cultural heritage

More fascinating and thoughtful debate about the Google Book Settlement in Mike Wilken’s comment thread.

I want to add just a bit more about it.

I think Ryan Shaw’s assessments are spot on in this discussion. We’re left perplexed by the Settlement and concerned about ambiguities and scenarios where these ambiguities (or defects) in the Settlement can lead to bad outcomes.

Mike asks where the animosity toward Google comes from, and I think that’s a harder issue. Ryan responded that people had “Google on a pedestal” and are disappointed that Google didn’t fight harder for the public interest. There may be something to that. I’ve followed the “Access to Knowledge” movement for some years, and Google has often been seen in a very positive light: “Look, you can make a profit and dramatically widen information access and use.”

However, I think the scale of the book corpus, together with Google’s other information services, makes people rightfully concerned about Google, its future actions, and the power it wields. Even if the current leadership at Google is relatively enlightened, will it always be that way? Will the Google Books service and corpus someday be sold to Elsevier or NewsCorp? Would we still like the settlement then?

Some of the skepticism also comes from how this settlement changes Google’s profit and incentive models. The settlement makes Google a content provider, one that sells access to books. This is a very different position than its familiar role of providing search and discovery services. This issue links to the debate about Google’s “Knol” service, where Google aims to host user-generated articles in a manner similar to (or in competition with) the Wikipedia. Several have argued that this creates a conflict of interest, and people worry that if Google becomes a content provider it will face pressure to bias search results to its own content. So I think there are some legitimate worries about Google shifting from information discovery to becoming a publisher promoting its own content.

So, to me, it makes sense to look at the settlement from the perspective of “what could go wrong”. When people think about risk, they usually assess the probability of something going wrong times its impact. Given the high stakes involved, where the impact of a poor Settlement can be pretty large and dreadful, I think caution is very reasonable.

I’ve had a chance to digest our recent conference on the Google Books Settlement. Like many other observers, I came away from the event less clear about what the Settlement actually means and how it will shape the future landscape of information access. Mark Liberman, a conference participant and pioneer in computational humanities (and other areas), live-blogged the event here.

Unfortunately, Colin Evans from Metaweb caught the flu and had to cancel. I was really hoping to get their perspective, since Metaweb is an important player in the landscape of “texts as data”. Much of the data in Freebase (Metaweb’s service) comes from the Wikipedia and other public sources. To populate Freebase, Metaweb has performed a great deal of text-mining and entity extraction of Wikipedia articles. But one of the great things about the situation with Freebase is that they do not have exclusive control over their source datasets. If you don’t like the way Freebase mined the Wikipedia, you are free to download the Wikipedia yourself and have at it.

Google Books and the Google Books Settlement represent a very different set of circumstances.

The more I think about it, the more I’m worried about the whole thing. The Google Books corpus is unique and not likely to be replicated (especially because of the risk of future lawsuits around orphan-works). This gives Google exclusive control over an unrivaled information source that no competitor can ever approximate. Companies like Metaweb and Powerset (recently acquired by Microsoft) who, in large part, base their services on computational processing of large collections of texts, will be unable to compete with Google.

To make this point clearer, imagine if in 1997 Website owners and Yahoo! had agreed to a similar settlement about crawling and indexing websites. This hypothetical settlement would have kept new startups from crawling and indexing the Web and offering innovative new search services, because the startups would have faced the risk of ruinous copyright lawsuits. Research in new search technology might have continued, but under similar restrictions, where rival commercial or even noncommercial services could not be deployed. Given this hypothetical, would we even have Google now?

So why is it that crawling and indexing the Web is so different from digitizing and indexing books? In one area we have innovation and competition (sorta, given Google’s dominance), and now in the other area, we have one company poised to have exclusive control over a major part of our cultural, or at least literary, heritage.

Final Points

In our continuing dialogue about the settlement, Matthew Wilkens responds to my earlier complaints about the Google Books Settlement, noting (in comments):

Maybe Eric and others fear that Google and/or the publishers will construe ordinary research results as “competing services,” but I think that’s pretty effectively covered in the settlement. As an i-school person, he’s maybe more likely than I am to butt up against “service” issues. But I still don’t really see the problem; the settlement says you’re not entitled to Google’s database for purposes other than research. That strikes me as fair.

Fair enough, and yes, I’m just as concerned about creating scholarly “services” as I am about creating finished scholarly “products” (books, articles, and the like). I think that many exciting areas of scholarly production lie in the creation of services (“Scholarship-as-a-Service”; my own attempt at a lame catch-phrase). Essentially the idea is that some scholarly work can serve as working infrastructure for future scholarly work. I think the restrictions in the Google Book Settlement are too vague and open-ended and would inhibit researchers from designing and deploying new services of use to other researchers. So, although the settlement probably won’t be that much of a problem if your goal is to create a few research artifacts (books, papers), it can be a big problem if your goal is to make cyberinfrastructure others can use. Thus, even from the relatively narrow perspective of my interests as a researcher (and neglecting the larger social issue of the lack of competition in text-mining such a significant chunk of world literature), I have deep concerns about the settlement.

Last, in my panel, Dan Clancy of Google Books tried to respond to what would and would not be restricted in terms of “facts” that could be mined and freely shared from the Research Corpus, in “services” or in other forms. Despite his attempts to address the issues (and I really appreciate his efforts at reaching out to the community to explain Google’s position), I am still left very confused about what is, and what is not, restricted. Given that this corpus is so unique and unrivaled, this confusion worries me greatly.

In prepping for the big day on Friday, when the UC Berkeley ISchool will host a conference on the Google Books Settlement (GBS), I’ve been doing some poking around to get a sense of reactions from researchers.

Matt Wilkens, a computationally inclined humanist, recently wrote a good argument for supporting the settlement. Although it is thought-provoking, I still can’t agree with the GBS without some key changes. In my mind (and as echoed in many places), the dangers of entrenching Google as a monopoly in this space far outweigh the benefits offered by the settlement.

There are other important objections with regard to the privacy issues and user data capture that will be required under the access and use restrictions. Remember, this is a company that already monitors a tremendous amount of user data (some 88% of all web traffic!) and is moving toward “behavioral advertising”.

What’s bad about this for scholars? I think there can be a “chilling effect” with the privacy issues. Google does not have the same values found in your university library, and will exploit data about your use of their corpus. They can also remove works with no notice or recourse, again, not like a university library.

All of these objections have been made by many others (more eloquently than here).

The Research Corpus

What has received somewhat less attention is the “non-consumptive” use of the so-called “research corpus”. The GBS would make the scanned book corpus available to qualified researchers for “non-consumptive” uses (I read this as uses that don’t primarily require a human to read the books). Nobody knows how these uses will play out. I think for researchers on the computational side, it’ll be a huge boon, since they’ll have a big data set to use to test new algorithms.

However, humanities scholars are on the more “applied” side of this. They’re more likely to want to use text-mining techniques to better understand a collection. Where I see a problem is that they will not have clear permission to share their understandings, especially as a new service (say, one with enhanced, discipline-specific metadata over a portion of the corpus), because that service may “compete with Google” or other “Rightsholders”. I really think that restriction matters.

The settlement also places restrictions on data extracted (through mining and other means) from copyrighted works. On Page 82 of the settlement, “Rightsholders” can also require researchers to strike all data extracted from a given book. I see this as a major problem because it weakens the public domain status of facts and ideas. Another, more downstream worry lies in future services Google may offer on the corpus. If Google launches a Wolfram|Alpha-like service on this corpus, they will also likely act like Wolfram|Alpha and claim ownership of mined “facts”.

None of this is good for researchers in the long term. Now, I’m not saying this has to be a totally “open” resource (it can’t because of the copyright status of many of the books). All I’m saying is that we should be REALLY concerned. We should push for some additional protections.

On that note, here’s a nice idea:

A quick note to draw attention to an article in the latest issue of The Art Newspaper: “Facebook is more than a fad—and museums need to learn from it.”

A few quotes: “Social networks and blogs are the fastest growing online activities, according to a report published in March by research firm Nielsen Online. Almost 10% of all time spent on the internet …” “… a major factor in the success of social networks is that they allow people to select and share content. This has become a hobby, even considered by some to be a serious creative outlet, with web users spending time ‘curating’ their online space. Museums are well placed to appeal to this new generation of ‘curators’ because they offer rich and interesting content that can be virtually ‘cut-up’ and stuck back together online in numerous different ways to reflect the individual tastes of each user. If remixing, reinterpreting and sharing interesting content is, as Nielsen suggests, the kind of engaging interaction that draws people to social networks, then museums should embrace the idea that ‘everyone is a curator’, both online and offline.” “For example, the Art Museum of Estonia has gone against convention by actively encouraging visitors to photograph its collection; the MoMA website helps users to co-create content and share these creations with friends.”

DDIG member, Prof. Peter Bleed (University of Nebraska), sent this announcement of a website describing his research investigating battlefields of the Spanish-American War.

The website, with a rich array of maps, description, and images, is found at:

Check it out!

A series of lectures at Georgia Tech is now viewable online. They are interesting for all scholars of the digital inclination. For instance, Cliff Lynch, Executive Director of the Coalition for Networked Information, spoke on A Changing Society, Changing Scholarly Practices, and the New Landscape of Scholarly Communication. Other topics are The Current State of Journal Publishing & Open Access Journals 2.0; Repository Programs: What Can They Do for Faculty; and Cyber Infrastructure: Removing Barriers in Research and Scholarly Communications.

Also, a new report is now available as a pdf download: Working Together or Apart: Promoting the Next Generation of Digital Scholarship. Report of a Workshop Cosponsored by the Council on Library and Information Resources and The National Endowment for the Humanities, March, 2009. 78 pp. “As part of its ongoing programs in digital scholarship and the cyberinfrastructure to support teaching, learning and research, … CLIR in cooperation with the … NEH held a symposium on September 15, 2008 in which a group of some 30 leading scholars was invited to
• articulate the research challenges that will use the new media to advance the analysis and interpretations of text, images and other sources of interest to the humanities and social sciences
• and in so doing, pose interesting problems for ongoing computational research.”

The Art Newspaper of 4-17-09 has an interesting article on an archaeological issue in Indonesia that has reached the highest level of government. It’s not every day you see a minister apologize for disrespecting an archaeological site. There is hope after all! See the article for details.

Here’s some great news (esp. considering current economic conditions!) for those of you interested in digital data and archaeology:

Digital Antiquity Seeks a Founding Executive Director

Digital Antiquity seeks an entrepreneurial and visionary Executive Director who can play a central role in transforming the discipline of archaeology by leading the establishment of an on-line repository of the digital data and documents produced by archaeological research in the Americas. Digital Antiquity is a national initiative that is generously funded by the Andrew W. Mellon Foundation.

The Executive Director oversees all Digital Antiquity activities, including hiring and supervising staff, marketing repository services to the professional community, guiding software development, and managing acquisition of repository content.

During its startup phase, Digital Antiquity resides within Arizona State University and the Executive Director will hold the position of Research Professor at ASU with a 12 month, renewable appointment, excellent benefits, and a rank and attractive salary commensurate with experience. A fixed term secondment or IPA (paid transfer from another position) would also be considered.

Interested individuals may consult the full job announcement or contact Keith Kintigh for more information. Consideration of applications will begin May 1, 2009 and will continue until the position is filled.

What is authentic? What is original? What is fake? What is a replica? Can you answer those questions? Ever since an exhibition in a Hamburg museum, which featured eight real terracotta warrior statues from the world famous tomb of China’s emperor Qin, was closed down in December 2007, these questions are not purely academic any more.

Emperor Qin

Qin Shi Huangdi was China’s first emperor and the first to unite the country. Upon his death in 210 BC, he was buried along with an army of 8,099 larger-than-life soldiers and horses, made from terracotta. They were discovered in 1974 near Qin’s extensive funerary complex in Xi’an and have been under archaeological investigation ever since. Amazingly, every statue seems to have been modelled after an individual person so that no two are alike. The tomb itself has not yet been excavated. Since the discovery, it seems like some terracotta statues have always been travelling around the world to figure as centrepieces of blockbuster exhibitions. I remember attending one in Brussels, Belgium, in the 1980s. The museum officials involved in an upcoming exhibition in Maaseik, Belgium, claim that it takes about eight months and direct contact with the proper Chinese authorities in Xi’an to secure all the official paperwork and permissions for the exhibition. However, the Museum für Völkerkunde Hamburg (MVH; Hamburg Museum of Ethnology), which planned the “Power in Death” (Macht im Tod) exhibition, skipped the official Chinese channels and arranged to obtain the statues through the Leipzig-based Center of Chinese Arts and Culture (CCAC).

Qin tomb terracotta cavalryman and horse, Tokyo National Museum exhibition, 2005

Authentic, original, real: take your pick!

The latter institution, which had its own Chinese terracotta warrior exhibition in Leipzig through 2007 with replicas (not so evident on the website, I must say), claims they didn’t deceive anybody: the contract only stipulated “authentic”, which they take to be not the same as “original,” i.e., real and excavated. In other words, they delivered statues made in China, with the correct dimensions, made of fired clay and resembling the real ones. Authentic, right? The MVH director, Wulf Köpke, doesn’t agree and has already said they will likely sue the CCAC. However, the MVH doesn’t look totally credible either. For instance, the sculptures arrived by boat from China, which is contrary to the custom of transporting this type of highly valuable and fragile artefact by plane. Also, the start of the exhibition was delayed for a month or so while there were problems with the paperwork for the statues, again something that should have sent up warning flags. The museum is currently involved in a comprehensive rebuilding campaign, which has rendered its collections mostly inaccessible, hence the need for artefacts on loan to provide income from entrance fees. One can’t help but think that this may have influenced the museum in their willingness to press for full disclosure.

China: Intellectual property rights!

Chen Xianqi of the Shaanxi Provincial Bureau of Cultural Heritage in the city of Xi’an, where the terracotta army was found, angrily called it “… a serious act of fraud [which] has implications for intellectual property right[s]” and threatened legal action. He stated that it was illegal to have an exhibition of the real terracottas that wasn’t authorised by the Xi’an authorities. In fact, these rules do make practical sense, as copies of the genuine terracotta warriors are readily available in China. A local factory, for instance, is known to offer life-size replicas for 1,500 yuan ($220). In light of this, it surely is odd, however, that the official Chinese state broadcaster, CCTV, covered the opening of the Hamburg exhibition. The role of the Chinese consulate in Hamburg has also been questioned. The Guardian states that the Chinese authorities might actually on occasion allow exhibitions with certified replicas as long as everything goes through the proper channels. Were the Hamburg warriors authorised copies? We don’t know. So this case could possibly be more about being left out of the loop and PR damage than a real concern about heritage. As the blog “Culture Matters” pointed out, this type of blockbuster exhibition is all about making money, and the revenue-sharing deals are hard fought. The Xi’an heritage authorities may talk a good talk about the public having been cheated, but what they really may want is the share of the revenue that they normally would have negotiated.

Fake or real: Does it matter?

The irony of course lies in the fact that nearly 10,000 people happily came and visited the exhibition before it closed. They admired the warriors, horses, weapons and decorative objects. They studied the miniature version of the excavation site as well as the multimedia display about the archaeological investigations. Entrance tickets were hard to come by and visitors came from as far away as Austria and Switzerland (Hamburg is in the very north of Germany!). The leadership of the museum (a public institution) was very happy. When the first concerns surfaced in the media, a sign was set up stating that the authenticity of the statues was in doubt. After the show was closed, hardly any of the visitors took the MVH up on their offer of a no-questions-asked refund of their entrance fee. One wonders if it wouldn’t have made more sense to keep the exhibition open but with a clear explanation that the big terracotta statues (not the other artefacts) were replicas. There is, furthermore, a long history of successfully faked antiquities, for example, the works of Brigido Lara, the “post-pre-Columbian” ceramicist, and the Getty kouros, the authenticity of which is contested to this day.

Shiver me timbers – I’ve been pirated!

Some of the reactions in the Western media were definitely not without schadenfreude, as is proven by photo captions such as “I’m sure there was a Made in China sticker on here somewhere” and “Shiver me timbers – I’ve been pirated.” By the way, the MVH no longer has any mention of the infamous exhibition on its website. The site search function still yields results for it, but the links only lead to purged pages. Even the press release about the closure of the exhibition and the way to get a refund is nowhere to be found. Nor does the CCAC make any mention of the whole controversy on its website. To be continued in court?



Originally appeared in, Feb. 7, 2008

I recently returned from Athens, Greece and a fascinating meeting hosted by the Hellenic Ministry of Culture. The meeting (“Digital Heritage in the New Knowledge Environment: Shared Spaces & Open Paths to Cultural Content”) explored how the Greek cultural heritage sector is embracing and is challenged by the explosion of digital technologies and content that is currently reshaping the globe.

The meeting highlighted important tensions in the adoption of digital dissemination frameworks. For many of us who have been working with digital technologies for the past several years, the tensions are familiar, and at the risk of reducing them to caricature, I can summarize them below:



• Nearly free access to the full richness of the documented record of Greece’s cultural heritage, versus resistance to abandoning traditional models of “cost recovery” (subscription charges). Attempts to charge for content continue, even though the justifications for such charges seem poorly articulated.
• The possibility of using digital dissemination technologies to enhance the comprehensiveness, scope, and transparency of cultural heritage documentation and research, versus the social realities of micro-politics, personal rivalries, and established norms of professional practice, which inhibit transparency and create incentives for data-hoarding. As in many other parts of the world (US archaeology included!), paper publication still has more prestige than digital dissemination. A fetish for paper seems to be a common affliction in the humanities and social sciences.
• The capability of digital content to be easily and endlessly duplicated, adapted, and incorporated into new scholarly, educational, or artistic works, versus long-standing national copyright claims over Greek cultural patrimony. It seems that the Greek state has legislated ownership over its past. Releasing the documentary record of Greece’s past into a digital commons may pose some legal challenges. (See these discussions: one and two of intellectual property claims over national heritage)


The whole “copyrighting the past” argument is interesting. Though I have no formal legal training, I’ve picked up some expectations from living within the Anglo-American legal tradition. At least traditionally, we’ve got a very economic / practical view of copyright, and typically regard copyright as a convenient legal fiction to incentivize creative production. “Copyrighting” a work that is 2,500 years old obviously flies in the face of this tradition. However, parts of Continental Europe have different legal traditions. Copyright over the works of Classical Antiquity seems to be somehow in line with “moral rights” types of perspectives, where the goal of copyright is not only to protect commercial incentives, but also to protect, in perpetuity, the dignity and honor of the creator of works. That seemed to be some of the argument given in comments made at this conference.

Given Greece’s recent history of resistance to Ottoman imperialism, exploitation by Western powers, and transition out of “developing world” to “developed” status, attempts to guard the national honor and dignity of a past that is so important to Greece’s national identity make some sense. However, this perspective doesn’t seem to work so well in the new digital environment, where everything is global, remixable, and seemingly uncontrollable. Legislative mandates to protect “dignity” seem difficult if not impossible to enforce.

Oddly enough, the current situation may have the perverse effect of making it difficult for members of the public to use Greek cultural heritage for mainstream academic or instructional purposes. People who would be more likely to use Greek antiquity in obnoxious ways are probably precisely those people who would tend to ignore legislative restrictions.

It’ll be fascinating to watch how Greece will adapt its cultural heritage policies in this new world. 

Other conference participants have blogged about the meeting. Check out Leif Isaksen’s post and Stefano Costa’s post.

[UPDATED]: Mary Saunders also posted about her experiences at the conference, and she has some additional useful links to related content. 

I’ll update with even more links of blog reactions as I find them.


Final Note:

I want to thank the Hellenic Ministry of Culture for inviting me to attend this meeting. I deeply appreciated the opportunity to participate in this discussion.
