books


Clifford Lynch drew my attention to “an announcement from the UK Royal Society indicating that in celebration of Open Access Week they were opening their entire journal archive for free access until the end of the society’s 350th anniversary year, 30 November 2010. This is a great opportunity to get access to two issues of Philosophical Transactions of the Royal Society A from August and September 2010 that focus on e-Science and contain a number of outstanding papers.” See http://rsta.royalsocietypublishing.org/content/368/1925.toc and http://rsta.royalsocietypublishing.org/content/368/1926.toc

A few examples:

  • “Methodological commons: arts and humanities e-Science fundamentals” (abstract and pdf);
  • “Deploying general-purpose virtual research environments for humanities research” (abstract and pdf);
  • “Use of the Edinburgh geoparser for georeferencing digitized historical collections” (abstract and pdf);
  • “Adoption and use of Web 2.0 in scholarly communications” (abstract and pdf);
  • “Retaining volunteers in volunteer computing projects” (abstract and pdf).

Figure from “Use of the Edinburgh geoparser for georeferencing digitized historical collections”

Among Anglo-Saxons, tonight is Halloween, a rather frivolous holiday with some serious undertones. American movies and TV have spread the holiday so widely, however, that its lowest common denominator is fairly well known across the world: small (and big) kids dressing up and collecting candy (“trick-or-treating”). Here’s a pic of my kids six years ago:

Halloween 2004, copyright F. Deblauwe

Of course, the connection with superstitions about the Undead is easy to spot, whether disguised as the Christian holiday of All Souls or as el Día de los Muertos in Mexico. When I was a child in Belgium, we didn’t celebrate Halloween, but I did make a scary-face lantern around this time of year, albeit from a sugar beet rather than a pumpkin. If I remember correctly, popular lore somehow connected the lanterns with St. Maarten (St. Martin), a saint who in some parts of my home province of West Flanders even takes the place of Sint Niklaas (St. Nicholas, i.e., Santa) in his role as gift-giver to kids.

This is primarily an archaeological blog though. So what are the connections between digging up the past and zombies, witches and other scary critters and dark practices? Here are a few choice links:

Archaeology is a famously ghoulish pursuit whose practitioners are always on the look-out for dead bodies to gloat over. If we can’t find a grave, then at least we’ll try to get hold of animal bones from kitchen middens and sacrificial deposits. I’ve seen desperate Mesolithic researchers cackle with funereal glee over the toe bones of long-dead seals. Osteologists are of course the worst necrophiliacs of the lot. But nobody’s immune. There’s an anecdote going around about my old favourite teacher, where he lifts a pelvis out of a Middle Neolithic grave, licks his lips while turning the charnel thing over in his hands, and exclaims, “Now this was a very beautiful woman!”.

… that is, according to the [San Jose, CA] Mercury News:

But how did the hundreds of lesser-known Victorian writers regard the world around them? This question and many others in fields like literature, philosophy and history may finally find an answer in the vast database of more than 12 million digital books that Google has scanned and archived. Google, scholars say, could boost the new and emerging field of digital humanities, …

Google recently named a dozen winners of its first-ever “Digital Humanities Awards,” setting aside about $1 million over two years to help teams of English professors, historians, bibliographers and other humanities scholars harness the Mountain View search giant’s algorithms and its unique database of digital books. Among the winners was Dan Cohen, a professor of history and new media at George Mason University, who hopes to come up with a much broader insight into the Victorian mind, overcoming what he calls “this problem of anecdotal history.” “What’s incredible about the Google database is that they are really approaching a complete database of Victorian books,” Cohen said. “So we have the possibility, for the first time, of going to something that’s less anecdotal, less based on a chosen few authors; to saying, ‘Does that jibe with what the majority of authors were saying at that time?’”

Besides the Victorian study, the winning teams include a partnership between UC Riverside and Eastern Connecticut State University to improve the identification of books published before 1801 in Google’s digital archive, and a team including UC Berkeley and two British universities to develop a “Google Ancient Places” index. It would allow anyone to query Google Books to find titles related to a geographic location and time period, and then visualize the results on digital maps. “We have the ability to harness vast amounts of information collected from different places,” said Eric Kansa, a UC Berkeley researcher working on the ancient places project, “and put them together to get a whole new picture of ancient cultures.”

Maybe our own Eric Kansa can explain a bit more about the Google Ancient Places project? The announcement stated: “Elton Barker, The Open University, Eric C. Kansa, University of California-Berkeley, Leif Isaksen, University of Southampton, United Kingdom. Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus.” They further wrote:

Google’s Digital Humanities Research Awards will support 12 university research groups with unrestricted grants for one year, with the possibility of renewal for an additional year. The recipients will receive some access to Google tools, technologies and expertise. Over the next year, we’ll provide selected subsets of the Google Books corpus—scans, text and derived data such as word histograms—to both the researchers and the rest of the world as laws permit. (Our collection of ancient Greek and Latin books is a taste of corpora to come.)
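Google’s announcement mentions “derived data such as word histograms,” which is also the kind of thing a “non-consumptive” use of a book corpus produces: numbers computed from a text rather than the text itself. Here is a minimal sketch of such a histogram in Python, assuming plain-text input and a deliberately naive tokenizer; the function name `word_histogram` is hypothetical and is not anything Google has published.

```python
import re
from collections import Counter


def word_histogram(text: str) -> Counter:
    """Return a word-frequency histogram for a plain-text document.

    Hypothetical helper for illustration only. Tokenization is naive:
    lowercase the text, then keep runs of Unicode letters (digits,
    underscores and punctuation act as separators, so "doesn't"
    becomes "doesn" and "t").
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    hist = word_histogram(sample)
    # Print the three most frequent words with their counts.
    print(hist.most_common(3))
```

The result is only a table of counts: word order, and with it the readable text, is discarded.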

In light of the recent triennial review of exemptions to the Digital Millennium Copyright Act by the US Copyright Office (a division of the Library of Congress), which legalized “jailbreaking” iPhones, I thought it would be a good idea to point out some good, freely available materials on copyright relevant to archaeology and the humanities in general:

  • article about “Copyright Urban Legends” from the June 2010 issue of Research Library Issues;
  • implications of the US Copyright Office exemptions to the Digital Millennium Copyright Act for educators from Planned Obsolescence;
  • the edited volume Privilege and Property: Essays on the History of Copyright;
  • the report The Economics of Copyright, a last hurrah of the now suddenly disbanded Strategic Advisory Board for Intellectual Property Policy (“Providing [UK] government with independent, strategic, evidence-based advice on intellectual property policy”; no longer needed by the new Tory-Lib government, perhaps?).

Via AWOL came the announcement of a new open-access monograph series: Ancient Near East Monographs/Monografías sobre el Antiguo Cercano Oriente (ANEM/MACO; yes, it trips off the tongue, doesn’t it?). It publishes archaeological/historical/linguistic research on the ancient Near East (Egypt, Palestine, Mesopotamia, Anatolia, …). Such initiatives are always welcome, of course: the more peer-reviewed, open-access publications, the better. Have a look at Alan Lenzi’s impassioned and eloquent explanation of why open-access publishing is a good idea. I’m probably preaching to the choir here, but just in case…

Cultural Heritage – A UKOLN Blog for the Cultural Heritage Sector discusses how a local UK museum has used Google Books to create an online version of its library. “The Wiltshire Heritage Museum library has just gone online with a full digital library created in just 5 months using the Google Books service. The Library has been collecting books about the history, environment and archaeology of Wiltshire for over 150 years, and has many rare and important books in its collection of over 8000 volumes. … Without Google, it would have cost tens of thousands of pounds, buying a computer system, exhaustive data entry and only a few of the books could have been scanned electronically.” A practical example, perhaps?

The Coalition for Networked Information has launched a program called CNI Conversations, a series of sessions in which participants from member institutions take part in discussions on current topics. The first one took place on September 15 and covered the proposed Google Book Settlement, DataNet, and library responses to the financial crisis, among other topics. The MP3 recording is available online.

More fascinating and thoughtful debate about the Google Book Settlement in Mike Wilken’s comment thread.

I want to add just a bit more on the subject.

I think Ryan Shaw’s assessments in this discussion are spot on. The Settlement leaves us perplexed, and concerned about ambiguities (or defects) in its terms that could lead to bad outcomes.

Mike asks where the animosity toward Google comes from, and I think that’s a harder issue. Ryan responded that people had “Google on a pedestal” and were disappointed that Google didn’t fight harder for the public interest. There may be something to that. I’ve followed the “Access to Knowledge” movement for some years, and Google has often been seen there in a very positive light: “Look, you can make a profit and dramatically widen information access and use.”

However, I think the scale of the book corpus, together with Google’s other information services, makes people rightly concerned about Google, its future actions, and the power it wields. Even if Google’s current leadership is relatively enlightened, will it always be that way? Will the Google Books service and corpus someday be sold to Elsevier or News Corp? Would we still like the settlement then?

Some of the skepticism also comes from how this settlement changes Google’s profit and incentive models. The settlement makes Google a content provider, one that sells access to books. This is a very different position from its familiar role of providing search and discovery services. This issue links to the debate about Google’s “Knol” service, where Google aims to host user-generated articles in a manner similar to (or in competition with) Wikipedia. Several commentators have argued that this creates a conflict of interest, and people worry that if Google becomes a content provider, it will face pressure to bias search results toward its own content. So I think there are some legitimate worries about Google shifting from information discovery to becoming a publisher that promotes its own content.

So, to me, it makes sense to look at the settlement from the perspective of “what could go wrong?” When people think about risk, they usually multiply the probability of something going wrong by its impact. Given the high stakes involved, where the impact of a poor Settlement could be large and dreadful, I think caution is very reasonable.
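The probability-times-impact rule of thumb is the usual expected-loss formula. A worked example with purely hypothetical numbers (they are not estimates of anything in the Settlement) shows why a rare but severe outcome can outweigh a likelier, milder one:

```latex
\[
  R = p \times I, \qquad
  R_{\text{severe}} = 0.1 \times 100 = 10, \qquad
  R_{\text{mild}} = 0.5 \times 10 = 5
\]
```

Here p is the probability of an outcome and I its impact on an arbitrary scale: the severe outcome is five times less likely, yet it carries twice the risk.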

In prepping for the big day on Friday, when the UC Berkeley ISchool will host a conference on the Google Books Settlement (GBS), I’ve been poking around to get a sense of how researchers are reacting.

Matt Wilkens, a computationally inclined humanist, recently wrote a good argument for supporting the settlement. Although it is thought-provoking, I still can’t support the GBS without some key changes. In my mind (and this view is echoed in many places), the dangers of entrenching Google as a monopoly in this space far outweigh the benefits offered by the settlement.

There are other important objections regarding privacy and the capture of user data that the access and use restrictions will require. Remember, this is a company that already monitors a tremendous amount of user data (some 88% of all web traffic! See http://knowprivacy.org/) and is moving toward “behavioral advertising.”

What’s bad about this for scholars? I think the privacy issues can have a “chilling effect.” Google does not share the values of your university library, and it will exploit data about your use of its corpus. It can also remove works without notice or recourse, again unlike a university library.

All of these objections have been made by many others (more eloquently than here).

The Research Corpus

What has received somewhat less attention is the “non-consumptive” use of the so-called “research corpus.” The GBS would make the scanned book corpus available to qualified researchers for “non-consumptive” uses (I read these as uses that don’t primarily require a human to read the books). Nobody knows yet how these uses will play out. For researchers on the computational side, I think it’ll be a huge boon, since they’ll have a big data set on which to test new algorithms.

However, humanities scholars are on the more “applied” side of this. They’re more likely to want to use text-mining techniques to better understand a collection. The problem I see is that they will not have clear permission to share what they learn, especially in the form of a new service (say, one offering enhanced, discipline-specific metadata over a portion of the corpus), because such a service may “compete with Google” or with other “Rightsholders.” I really think that restriction matters.

The settlement also places restrictions on data extracted (through mining and other means) from copyrighted works. According to page 82 of the settlement, “Rightsholders” can also require researchers to strike all data extracted from a given book. I see this as a major problem because it weakens the public-domain status of facts and ideas. Another, more downstream worry lies in future services Google may offer on the corpus. If Google launches a Wolfram|Alpha-like service on this corpus, it will likely also act like Wolfram|Alpha and claim ownership of mined “facts.”

None of this is good for researchers in the long term. Now, I’m not saying this has to be a totally “open” resource (it can’t be, because of the copyright status of many of the books). All I’m saying is that we should be REALLY concerned, and we should push for some additional protections.

On that note, here’s a nice idea:
http://www.eff.org/deeplinks/2009/06/should-google-have-s