… that is, according to the [San Jose, CA] Mercury News:

But how did the hundreds of lesser-known Victorian writers regard the world around them? This question and many others in fields like literature, philosophy and history may finally find an answer in the vast database of more than 12 million digital books that Google has scanned and archived. Google, scholars say, could boost the new and emerging field of digital humanities, …

Google recently named a dozen winners of its first-ever “Digital Humanities Awards,” setting aside about $1 million over two years to help teams of English professors, historians, bibliographers and other humanities scholars harness the Mountain View search giant’s algorithms and its unique database of digital books. Among the winners was Dan Cohen, a professor of history and new media at George Mason University, who hopes to come up with a much broader insight into the Victorian mind, overcoming what he calls “this problem of anecdotal history.” “What’s incredible about the Google database is that they are really approaching a complete database of Victorian books,” Cohen said. “So we have the possibility, for the first time, of going to something that’s less anecdotal, less based on a chosen few authors; to saying, ‘Does that jibe with what the majority of authors were saying at that time?’”

Besides the Victorian study, the winning teams include a partnership between UC Riverside and Eastern Connecticut State University to improve the identification of books published before 1801 in Google’s digital archive, and a team including UC Berkeley and two British universities to develop a “Google Ancient Places” index. It would allow anyone to query Google Books to find titles related to a geographic location and time period, and then visualize the results on digital maps. “We have the ability to harness vast amounts of information collected from different places,” said Eric Kansa, a UC Berkeley researcher working on the ancient places project, “and put them together to get a whole new picture of ancient cultures.”

Maybe our own Eric Kansa can explain a bit more about the Google Ancient Places project? The announcement stated: “Elton Barker, The Open University, Eric C. Kansa, University of California-Berkeley, Leif Isaksen, University of Southampton, United Kingdom. Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus.” They further wrote:

Google’s Digital Humanities Research Awards will support 12 university research groups with unrestricted grants for one year, with the possibility of renewal for an additional year. The recipients will receive some access to Google tools, technologies and expertise. Over the next year, we’ll provide selected subsets of the Google Books corpus—scans, text and derived data such as word histograms—to both the researchers and the rest of the world as laws permit. (Our collection of ancient Greek and Latin books is a taste of corpora to come.)

Martin Bailey has another good article in the June issue of The Art Newspaper, this time on UNESCO’s new assistant director-general for culture, Francesco Bandarin. He was promoted from within: he comes from the World Heritage Centre. A few excerpts:

From organising the restoration and re-erection of the 1,700-year-old Obelisk of Axum in Ethiopia more than 70 years after it was looted by Mussolini, to working to protect the ancient capital city of Samarra in war-torn Iraq, Italian-born Francesco Bandarin has been involved in many well-known projects during the decade he has served as the director of the World Heritage Centre, …

How damaging is tourism to the major world heritage sites?

FB: It is an issue of scale, and context. Machu Picchu now has one million tourists a year, which may not seem so many, but it is an isolated mountain site. This led to the development of the city of Aguas Calientes at the foot of Machu Picchu. All the rules of conservation have been overrun by the sheer volume of tourists. At Angkor Wat, in Cambodia, the temples are being conserved, but we did not realise that nearby, at Siem Reap, 150 luxury hotels have sprung up like mushrooms.

Should Unesco have been tougher in monitoring Angkor Wat?

FB: We were distracted because we were focusing on conservation of the temples, not on the environment. Now it is a problem. We are not an international police force, but we do run a substantial monitoring system. This year we will be reporting on 180 World Heritage Sites, out of 890. Sometimes monitoring works in terms of results and sometimes it doesn’t. It’s a bit frustrating.

Culture is not a luxury, it is a constituent of development, both economic and social. Culture is not entertainment, it is actually production or capital for development. … Unesco deals with four aspects of culture. First, conservation of heritage sites, both cultural and natural. Secondly, preservation of intangible culture. That comprises traditional knowledge, such as rituals, dance or skills. For instance, the Tango was born in Argentina and Uruguay, but it is now found all around the world. Thirdly, museums. And finally, intercultural dialogue.

I will be organising a major international conference on the future of the book. The book is the most important cultural object, but Unesco has been absent from the debate. The argument between Google and the French government is not healthy, and I think we should provide a forum for the actors [French publishers are resisting Google’s attempts to scan their books]. There is the issue between the Anglo-Saxons and the rest of the world, with English dominating language and technology. Amazon did not exist a few years ago. Books won’t disappear, but they will mutate.

As a follow-up to the previous post about the British Museum’s collaboration with Wikipedia, I’d like to publish a text that was distributed originally on the private agade mailing list. It is written by A.J. Cave.


At 3:14 UTC on June 8th, 2010, English Wikipedia had 3,317,225 articles and 12,495,212 registered users.  At 4:29 on June 8th there were 3,317,230 articles and 12,495,394 registered users.  In one hour and 15 minutes, Wikipedia had added 5 new articles and 82 new registered users (that is 1.1 registered users per minute!).

Now these numbers might not mean much to you and me, but they mean a lot to online search engines.

Google loves constant change, so it gives preference in its search algorithms to anything posted on Wikipedia above other less active web-based sources.  Google search bots comb through Wikipedia pages regularly like giant spiders, devouring, adding and indexing the ever-growing volume of information.

In the early days of Wikipedia, many techies joined and started writing too.  Some [like me] were more interested in testing the underlying technology and seeing how another web “startup” could shape the internet than in writing an online encyclopedia.

Since those early days in 2001, Wikipedia has grown into one of the largest websites, with an estimated 800+ million [give or take a few] visitors a year.  There are more than 91,000 active contributors working mostly collaboratively on more than 15,000,000 articles in over 270 languages.  About 75,000 editors, from expert scholars to casual readers, regularly edit Wikipedia.

Wikipedia articles usually rank in the top 5 search results depending on the topic.

As mobile search heats up thanks to smartphones that have more capabilities than old personal computers, being among the top 5 search results on a tiny screen becomes even more important.  There is even a Wikipedia website for mobile access at:

I googled British Museum and the Wikipedia article on the British Museum showed up as number 4 on the search list, right after the map of the museum and 2 links to the museum’s website.  Another Google search on Cyrus Cylinder, part of the British Museum’s current collection, placed the Wikipedia article in the number 1 spot, with a link to the British Museum website at number 4.

Not all Wikipedia articles are of encyclopedic quality and since there is no systematic process to force an all-volunteer army of Wikipedians to write about every topic considered “obviously important” by others, Wikipedia does contain oversights and omissions.

Due to its nature, Wikipedia needs more subject matter experts and specialists in many areas.

So it is not hard to see the motive behind the recent news about the collaboration between the British Museum and a group of London area-based Wikipedians to ensure the museum collection is adequately reflected on the virtual pages of Wikipedia.

The key advantage of Wikipedia over traditional paper encyclopedias is the short editorial cycle, where Wikipedians can update an article anytime with the most recent events and scholarship.  For example, the publicly announced results of the upcoming British Museum Workshop on the Cyrus Cylinder in late June could be added to the corresponding Wikipedia article by one of the Wikipedians with a “backstage” pass to the museum before they reach other online and print news sources.

Wikipedia has a set of rules that have developed over the years and there is no need to cover them in detail here.  If you are interested, you can click on ‘About Wikipedia’ and read them.  These rules are important because there are a few million Wikipedians and blood would flow in the streets of Wikidom if there were no rules.

While it is a good idea to read Wikipedia’s tutorials, policies and guidelines, sorting through volumes of information can be intimidating for newcomers.  So here are a few helpful hints:

1. No matter what you do, you can’t break Wikipedia.  Wikipedia has robust version controls, so you cannot accidentally do permanent harm if you make a mistake in your editing.  All mistakes can be quickly and easily reversed or fixed by any other editor.

2. Start small.  The best way to break in and feel comfortable is to do minor edits first.

3. You can remain anonymous when editing an article, but to create a new article you have to register with a valid email userid and a password.  If you are concerned about privacy and anonymity, you may prefer to create a user name for yourself in order to hide your IP address.

4. Before starting a major edit, announce your intentions on the “Discussion” page of the article.

5. Wikipedians are expected to be civil and neutral, respecting all points of view, and only add verifiable and factual information with cited external sources rather than personal views and opinions.

6. An ideal Wikipedia article aims to be well-researched, well-written, balanced, and neutral with verifiable information, suited for an encyclopedia.  However, many Wikipedia articles start as a “stub”. A stub is an article containing only a few sentences of text which is too short to provide encyclopedic coverage of a topic, but not so short as to provide no useful information, and it should be capable of expansion.

7. Wikipedia articles are always a work in progress and vary in quality and maturity.  Given that anyone can edit any article, it is possible for biased, outdated, or incorrect information to be posted.

8. Wikipedia does not allow original research and there is no elaborate system of scholarly peer review.

9. All articles are susceptible to vandalism and insertion of false information – particularly articles on popular and controversial topics.  But they eventually get cleaned up, either via consensus among Wikipedians or through intervention by the editors using Wikipedia’s conflict resolution systems.  A lock on an article’s page means the article is temporarily protected from editing by everyone and restricted to a few editors.

10. There are no content guarantees, so always check the History page to see if the article has been vandalized.

11. For those who teach, if you think your students have changed a Wikipedia article to match their research papers, just have them print out the History of the Wikipedia article and hand it over!
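For readers who would rather script the History check from hints 10 and 11 than click through the page, here is a minimal sketch. It assumes the public MediaWiki web API on English Wikipedia (`action=query` with `prop=revisions`); the helper name `history_query_url` is made up for illustration, and the sketch only builds the request URL without fetching anything.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki API endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def history_query_url(title, limit=10):
    """Build a URL asking for the most recent revisions of one article.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {
        "action": "query",                   # read-only query module
        "prop": "revisions",                 # ask for the revision history
        "titles": title,                     # article title, e.g. "Cyrus Cylinder"
        "rvprop": "timestamp|user|comment",  # who changed what, and when
        "rvlimit": limit,                    # how many revisions to list
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(history_query_url("Cyrus Cylinder", limit=5))
```

Opening the printed URL in a browser should list the latest edits with their timestamps, user names and edit summaries, which is roughly what the History tab shows.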

[Additional information at: ]

On the occasion of the spat between Google and the Chinese government, Reuters reports: “More than three-quarters of scientists in China use the search engine Google as a primary research tool and say their work would be significantly hampered if they were to lose it, a survey showed on Wednesday.” Just in case anyone still doubted how much today’s scholars rely on Google and the cornucopia of research and information available on the web, esp. in developing countries. “… asked by the Nature journal how much they rely on Google said it was vital for finding academic papers, information about discoveries or other research programs and finding scholarly literature.” “… science in China would not come to a halt without Google, but the search engine ‘has transformed information-seeking behaviors in academic communities.’”

A new report came out: The Future of the Internet IV, by J. Anderson and L. Rainie. It’s the 4th volume in this quasi-annual series (previous volumes also available online). This is an important study.

A survey of nearly 900 Internet stakeholders reveals fascinating new perspectives on the way the Internet is affecting human intelligence and the ways that information is being shared and rendered.

The web-based survey gathered opinions from prominent scientists, business leaders, consultants, writers and technology developers. It is the fourth in a series of Internet expert studies conducted by the Imagining the Internet Center at Elon University and the Pew Research Center’s Internet & American Life Project. In this report, we cover experts’ thoughts on the following issues:

“Three out of four experts said our use of the Internet enhances and augments human intelligence, and two-thirds said use of the Internet has improved reading, writing and rendering of knowledge,” said Janna Anderson, study co-author and director of the Imagining the Internet Center. “There are still many people, however, who are critics of the impact of Google, Wikipedia and other online tools.”

Microsoft has made a deal with the NSF to offer free cloud computing services to scientists, says The New York Times. “The goal of the three-year project is to give scientists the computing power to cope with exploding amounts of research data. It uses Microsoft’s Windows Azure computing system, …” “[Those systems] allow organizations and individuals to run computing tasks and Internet services remotely in relatively low-cost data centers.” “Microsoft’s commitment to scientific computing comes two years after a similar service was introduced by Google and I.B.M. … hoping to differentiate the new service by offering scientists a set of custom applications that simplified access to Azure and use of existing software applications like Microsoft Excel easily.” “… the explosion of data being collected by scientists had transformed the needs of the typical scientific research program on campus from a half-time graduate student one day a week to a full-time employee dedicated to managing the data. He said this kind of exponential growth in cost was increasingly hampering scientific research.”

A recent report—thanks to Clifford Lynch via Melinda Burns—by Kathy English, The Longtail of News: To Unpublish or Not to Unpublish, draws attention to an old issue that is gaining new prominence: published content can be challenged but open-access and Google-indexed content brings even passages of material that was “obscure in practice” out into the open. Newspapers and news websites are of course foremost confronted with this (I remember lawyers contacting me a couple of times when I was editing IW&A). People don’t like something published about them (or a pet cause), erroneously or not, and ask for it to be removed from an online archive, sometimes years after the fact. Before, one would easily move on and forget but, now that one can google oneself, old wounds are easily ripped open again, listed prominently in Google search results. In archaeology, we haven’t been subject to this kind of problem much yet—correct me if I’m wrong—but it may very well be only a matter of time. We all know how politically sensitive certain research can be, e.g., Native American repatriation, Biblical archaeology, national heritage vs. colonialism, etc. Personal issues (accusations, challenges, …) do interfere often in the study of the ancients too. A long-forgotten diatribe against an esteemed colleague, “buried” in a Festschrift or some other obscure volume, may suddenly pop up on the Google radar. Excavation notes could list certain artifacts as having been excavated by Ms. X while her arch rival, Mr. Y, remembers differently.

Paradoxically or by design, the pursuit of a better user experience leads to easier access to information: open access and Google indexing also mean exposure to legal and other potentially unpleasant challenges. Our academic gentlemen’s agreement on such issues may become antiquated. The general cultural context under which we operate influences our research and the way we communicate our research. The open-access movement is making great strides but there are counterforces. We are not insulated from them. Only time will tell how the balance will evolve, I suppose. One more thing: this also draws attention to archiving and retention policies of online collections. In the future, will outdated, controversial or neglected publications be included in the migration of a collection to the umpteenth new data standard? Who will decide and on what grounds?


(Crossposted with minor alterations from Heritage Bytes)

I’ve never had the opportunity to visit the impressive ruins of volcanically-conserved Pompeii in Italy. I know it from books, articles and the occasional glimpses from TV or movies, but now there’s another way to acquaint oneself with how it must’ve felt to actually walk the streets of the ancient Roman city: Google Maps Street View. For instance, you can walk around in a 3D version of the amphitheater or follow one of the streets. (With thanks to Jack M. Sasson’s agade mailing list.)

Cultural Heritage – A UKOLN Blog for the Cultural Heritage Sector discusses how a local UK museum has used Google Books to create an online version of its library. “The Wiltshire Heritage Museum library has just gone online with a full digital library created in just 5 months using the Google Books service. The Library has been collecting books about the history, environment and archaeology of Wiltshire for over 150 years, and has many rare and important books in its collection of over 8000 volumes. … Without Google, it would have cost tens of thousands of pounds, buying a computer system, exhaustive data entry and only a few of the books could have been scanned electronically.” A practical example perhaps?

The Coalition for Networked Information has launched a program called CNI Conversations, a series of sessions in which participants from member institutions take part in discussions on current topics. The first one took place on September 15 and focused on the Google Book proposed settlement, DataNet, library responses to the financial crisis, etc. The mp3 is available online.

