January 2008


Lately I sometimes wonder if I should just devote this entire blog to Google-related posts, since Google continues to reshape archaeology and just about every other area of the sciences, social sciences, and humanities.

The Wired Science blog has a post about the impending launch of Google’s latest venture into scientific research. I think it’s called “Palimpsest”. The Wired piece also links to the blog “Pimm”, which has a presentation about this project available on Slideshare. The Pimm post says that this project is strictly nonprofit (I can’t find any confirmation of this, and it doesn’t look like it’s part of Google’s foundation, Google.org).

Anyway, this is very exciting. With Open Context, we’ve been working with the Internet Archive, and we’ve started discussions with Metaweb about using Freebase for broader data sharing. Having access to Google’s tremendous infrastructure would also be very welcome.

Now for some caveats on this good news.

Google is really, really powerful, and despite its stated policy of not being “evil”, I can’t help but wonder if moving all our research and lives into the Googleverse is really a good idea. I also wonder whether this program’s “nonprofit” status will change, and what guarantees exist that the data will be kept free and available forever. Companies, even giants like Google, do have a habit of coming and going (creative destruction and all that). They can also change their policies if the bottom line suffers.

The open licensing of the data is a good thing, though, since it means the data can live in multiple places (offering redundancy). However, few other institutions have Google’s capacity to handle so much data, so even if the data are “open” there may not be many other places to put them (except in little chunks). Science (and the humanities) keeps creating ever larger datasets, and even though storage costs are declining, the community’s requirements for handling massive amounts of data keep expanding. I don’t see many other institutions that can match Google in offering this kind of service for “free”. It would probably be better for us all if more than one organization had the kind of infrastructure needed to support this as a free service.

We’ll see. At any rate, I’m looking forward to learning more about this.

Update: Peter Suber also discusses this announcement, and links back to this post announcing an earlier incarnation of Google’s science data program.

Looking at the blog Ars Technica, I ran into a post reviewing an interesting report by the British Library and JISC. The report looks at the Internet usage patterns of young people born after 1993 (side note: 1993! I met my WIFE that year! I’m feeling old…). The aim of the report is to help guide the development of digital library services.

The report details how young people, while comfortable with technology, are by no means always “expert users”. Relatively simple search forms are the most widely used, and “advanced search” functions see comparatively little use. According to the report:

Users make very little use of advanced search facilities, assuming that search engines “understand” their queries. They tend to move rapidly from page to page, spending little time reading or digesting information and they have difficulty making relevance judgements about the pages they retrieve.

That’s interesting. It suggests that most young people have big expectations for getting relevant information from a simple text box. I suppose that’s even more motivation for more intelligent natural language search. Academic repositories may want to look at Powerset, if it comes up with search tools (as Google has) that you can install on your own site.

Ars Technica also noted that the report claims “authority” is not dead for the Google Generation. This should give some comfort to professional scholars who worry that students will uncritically believe everything they see on the Web and won’t pay attention to traditional mechanisms for validating information (peer review, credentials, quality of sources, etc.).

Anyway, there’s much more to this report. Dig away!

My colleague Erik Wilde is organizing a workshop on Location and the Web. I’m helping to organize it and have already hit some of the email lists with a call for papers. The types of questions explored by this workshop will be directly relevant to researchers interested in using Google Earth or Second Life for visualization and analysis, for instance. Here’s his call for papers:

the paper submission deadline for the First Workshop on Location and the Web (LocWeb 2008) is only 18 days away. we now have a pretty strong program committee, and i am looking forward to the submitted papers and of course the workshop itself.

so if you are interested in location information and the web, please consider submitting a paper. the workshop is held in beijing and co-located with WWW2008, the 2008 edition of the world’s premier conference in the area of web technologies.

my personal hope for the workshop is that we will be able to get strong submissions in the area of how to make location information available as part of the web, not so much over the web. there are countless examples of applications with location as part of their data model, which are accessible through some web interface, but there are far fewer examples of applications which try to turn the web into a location-aware information system. the latter would be the perfect candidate for the workshop.

Shawn Graham got the ball rolling with his discussion of applying Second Life as an instructional platform for archaeology. It seems to have had some resonance with other archaeo-bloggers (see ClioAudio and ArchaeoGeek). ArchaeoGeek noted some fascinating work attempting to bring GIS-type capabilities into Second Life. They even have an elaborate model of downtown Berkeley, including the BART station.

Shawn also rightly discusses some concerns that people have voiced. These comments show some worry that we’re in danger of putting all our data eggs in one basket and becoming dependent on yet another commercial platform (as in my previous discussion of Google, and how much we’ve come to rely on it). Given all the data preservation problems caused by closed, proprietary file formats and software, these are valid issues.

However, Linden Lab is pretty good in this regard, and I wouldn’t put Second Life in the same realm as Microsoft or even Google. Mitch Kapor (of Lotus fame, and now Second Life’s major investor) recently gave a talk at the UC Berkeley iSchool about Second Life (link to podcast). He talked about how Linden Lab is doing much to open up its infrastructure: it has already “open sourced” its client and will soon do the same with its backend server software. Others will soon be able to run Second Life servers of their own. I think this kind of data portability in virtual worlds makes investing some effort in Second Life much more worthwhile and less risky.

In any event, beyond the general danger of relying on any one system, there are good immediate and practical reasons for avoiding such digital monoculture. Certain systems are best for certain types of applications. Second Life is great for visualization and for offering rich, shared experiences. But it’s probably not the kind of thing I’d use to run a statistical analysis of potsherd distributions. That said, Second Life doesn’t have to do that, because Linden Lab is making it easier to integrate with systems that do offer such capabilities.

I think a lot of interesting things will happen in systems like Second Life (and Google Earth). However, I think the most interesting things will happen between and among such systems as they work together as an ecosystem exchanging data, drawing upon a diverse array of powerful web services (delivering XML-encoded data, or similar formats like JSON) from data providers such as Nabonidus, Open Context, Freebase, Google Docs, the Portable Antiquities Scheme, and others.
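To make the idea concrete, here’s a minimal sketch of what consuming one of these services might look like. The endpoint URL and element names are hypothetical placeholders, not the actual API of Open Context or any other provider:

```python
# Minimal sketch of pulling machine-readable records from a data
# provider's web service. The URL and element names below are invented
# placeholders, not any provider's real API.
import urllib.request
import xml.etree.ElementTree as ET

def fetch_records(url):
    """Fetch an XML feed and extract a few fields from each record."""
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    records = []
    for item in tree.iter("item"):  # hypothetical element name
        records.append({
            "label": item.findtext("label"),
            "context": item.findtext("context"),
        })
    return records

# Usage, with a placeholder endpoint:
# finds = fetch_records("http://example.org/data/feed.xml")
```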

Of course, all this leads directly into standards questions. I tend to favor simple, incremental (or “gracefully degradable”) standards, since this approach seems like the most feasible way of exchanging at least some data. I’ll write some more on the standards question shortly.
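As a toy illustration of what I mean by “gracefully degradable” (all field names invented for the purpose): a consumer uses the core fields it understands, and richer optional fields simply improve the result when present.

```python
# Toy illustration of "gracefully degradable" data exchange: required
# core fields always work, and optional richer fields improve the
# result when present. All field names are invented.
def read_find(record):
    find = {
        "id": record["id"],                       # required core field
        "label": record.get("label", "unnamed"),  # optional, with fallback
    }
    # Optional precise 3D coordinates; degrade to None (context-level
    # placement) when a provider can't supply them.
    if all(k in record for k in ("x", "y", "z")):
        find["position"] = (record["x"], record["y"], record["z"])
    else:
        find["position"] = None
    return find

# A minimal record and a richer one both work:
print(read_find({"id": 1}))
print(read_find({"id": 2, "label": "lamp", "x": 1.0, "y": 2.0, "z": 0.5}))
```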

At the last ASOR (American Schools of Oriental Research) meeting, Gary L. Christopherson (University of Arizona) gave an interesting talk called “‘Google’ Archaeology: data and applications for everybody”. The talk discussed the huge and under-recognized impact Google is having on archaeological research. Google continues to add ever more free services, ranging from search and book-scanning (not without controversy) to mapping, visualization, and “software-as-a-service” applications (the office-suite tools in Google Docs). Without us really noticing, larger and larger chunks of our research activities are mediated by Google.

Where is this going? We should probably worry about being so dependent on one behemoth commercial service provider. Siva Vaidhyanathan has a fascinating blog, “The Googlization of Everything”, that takes a critical look at Google’s immense power in our society and economy.

Because Google is such a force, and something of an enigma, rumors and questions about its ambitions and intents flourish. Some of these rumors are fed directly by statements from Google’s leadership, such as when Larry Page told an audience at last year’s American Association for the Advancement of Science meeting that Google was working on developing an Artificial Intelligence, and would do it on a “large scale”. Sergey Brin is reported to have said that the perfect search engine would “look like the mind of God”. Similar ideas, less extravagantly worded, have come from Marissa Mayer, Google’s VP of Search Products and User Experience, when she talked about how Google’s massive data stores and sophisticated algorithms are acting more and more like “intelligence”.

Pretty heady stuff.

I really don’t know where “Moore’s Law” and other rapid technological changes are taking us. Some of the ideas seem really extreme (see the so-called “Singularity”). But I’m not a computer scientist or artificial life researcher, so I can’t dismiss these ideas out of hand, though I strongly suspect things will not work out in the ways expected by starry-eyed futurists or techno-determinists.

What seems far more likely is that Google’s statements in this area help fuel a mystique about Google as an unstoppable force that will shape the future. Who can contend with them if they have irresistible technologies on their side? It is powerful marketing, even if Artificial Intelligence remains 20, 2000, or 2 million years in the future, or always in the future.

But what seems absolutely clear is that all “digital archaeology” is now done in reference to Google. For better or worse, Google will continue to shape archaeological cyberinfrastructure, research, and education into any future I can see.

In a fascinating discussion of Google and its AI ambitions on Edge, George Dyson closed his article by quoting Simon Ings, and it seems fitting to do the same here:

“When our machines overtook us, too complex and efficient for us to control, they did it so fast and so smoothly and so usefully, only a fool or a prophet would have dared complain.”

This quote applies to Google, with or without “AI”. Its services are simply too useful and powerful. Hopefully, we’ll not all be fools for letting archaeology (not to mention other facets of our lives) become “Googlized”.

I have a short article over at iCommons about proposed Egyptian legislation to copyright antiquities. Here’s a link.

Shawn Graham over at the “Electric Archaeology” weblog has a post asking about the use of Second Life to teach archaeology. There is a UC Berkeley Çatalhöyük reconstruction in Second Life now, intended to be a teaching resource (it won an “Open Archaeology Prize”). He has some very interesting ideas about linking archaeological databases dynamically with the virtual world.

I think it’ll be really useful to connect Second Life with different archaeological databases for visualization. Second Life does support connections with other online data sources via web services (see link). I’ve never done any programming in Second Life, so I’m not sure what sorts of limits the system has in reading outside data.

At any rate, outside databases would have to express data in a machine-readable format so that the Second Life scripting language could parse the information. XML is an obvious choice, but there needs to be a lot of thought on how to apply it to support Second Life visualization.
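On the database side, the output might be as simple as a flat XML document with one element per find. Here’s a sketch in Python (rather than Second Life’s own scripting language, which I haven’t used); the element and attribute names are invented, since any real convention would need community discussion:

```python
# Sketch of what a database might emit for a virtual-world client to
# render: one <find> element per object, with a label and a position.
# Element and attribute names are invented for illustration only.
import xml.etree.ElementTree as ET

def finds_to_xml(finds):
    """Serialize a list of find dicts into a simple XML document."""
    root = ET.Element("finds")
    for f in finds:
        el = ET.SubElement(root, "find", id=str(f["id"]))
        ET.SubElement(el, "label").text = f["label"]
        ET.SubElement(el, "position",
                      x=str(f["x"]), y=str(f["y"]), z=str(f["z"]))
    return ET.tostring(root, encoding="unicode")

print(finds_to_xml([
    {"id": 1, "label": "sheep radius", "x": 10.5, "y": 3.2, "z": 0.0},
]))
```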

Most archaeological datasets that I’ve seen don’t have enough spatial information to make an easy and precise mapping into a virtual world. For example, many finds fall into a “bulk find” category, and you only know their spatial context approximately (say, from a specific contextual unit). The contextual units themselves, their size, shape, and relative positioning, may be very poorly recorded and documented. Thus, rendering in Second Life will require lots of guesstimation.
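For instance, a renderer might just scatter bulk finds at random within the (often itself estimated) bounding box of their contextual unit. A minimal sketch, with hypothetical data structures:

```python
# Minimal sketch of "guesstimated" placement: scatter bulk finds
# randomly within the bounding box of their contextual unit, since
# their exact find spots are unknown. Data structures are hypothetical.
import random

def scatter_in_unit(finds, unit_bbox, seed=42):
    """Assign each find an arbitrary position inside its unit's bounding box.

    unit_bbox: (min_x, min_y, min_z, max_x, max_y, max_z) in meters;
    these bounds would themselves often be rough estimates.
    """
    rng = random.Random(seed)  # seeded, so the layout is reproducible
    min_x, min_y, min_z, max_x, max_y, max_z = unit_bbox
    return [{
        "find": find,
        "x": rng.uniform(min_x, max_x),
        "y": rng.uniform(min_y, max_y),
        "z": rng.uniform(min_z, max_z),
    } for find in finds]
```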

Shawn mentions Open Context in his post as an example data source. Open Context does make XML data available for all media, locations & objects, and for its faceted browse. Examples:

(1) Here’s a link to XML data for all small finds from Petra that have pictures (from the faceted browse).

(2) Here’s a link to XML data for a specific sheep radius from Petra.

(3) Here’s another link to XML data for an elephant capital also from Petra.

Although there’s contextual information, the contexts don’t have very clear spatial referencing, so it’ll be hard to simply place these data into a good Second Life 3D view. Having a clear common standard for spatial referencing in 3D would be really useful, as would clear conventions on how to visualize archaeological data when detailed spatial referencing isn’t available.
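One possible shape for such a reference, offered purely as a strawman, is a point plus an explicit uncertainty radius, so that data with poor spatial control can still be exchanged honestly:

```python
# Strawman for a 3D spatial reference that stays honest about
# precision: a point plus an explicit uncertainty radius and a
# coordinate-system tag. Entirely hypothetical, just a starting point
# for the conventions discussed above.
from dataclasses import dataclass

@dataclass
class SpatialRef3D:
    x: float                 # meters, in some agreed site-local grid
    y: float
    z: float                 # elevation or depth
    uncertainty_m: float     # radius within which the true spot lies
    crs: str = "site-local"  # coordinate reference system identifier

# A bulk find known only to context level carries a large radius:
sherd_ref = SpatialRef3D(x=102.0, y=57.5, z=-1.2, uncertainty_m=2.5)
print(sherd_ref)
```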

Sarah Kansa just alerted me to an outrageous article in Time Magazine advocating the antiquities trade. She found a link to it, along with a critique, on the SAA’s website. Here’s a little excerpt:

The good news is that it is possible for the individual investor to buy antiquities — and for a surprisingly moderate sum. According to John Ambrose, founder and director of Fragments of Time, a Boston-area antiquities dealer, they’re within even a modest investor’s reach. “For under $10,000 a year you could acquire two to four quality objects with good provenance that you could expect would not only hold their value but increase in value over time,” he says. In the past, the increase was anywhere from 8 to 9% annually, but in recent years that figure has gone up.

Ugh. As if that’s the whole story. There’s not even a hint at the larger external costs and widespread destruction that are part of this trade. The article is particularly sad given all the devastation now being inflicted on Iraqi sites, especially since it opens with an account of the market value of a Mesopotamian figurine.

Time is one of the flagships of the mainstream media. But the beauty of the blogosphere is that even a niche community like archaeology can have a large voice and confront this kind of outrageous “reporting”. This could be a good time for the now fairly large archaeological blogosphere to flex a little muscle on this important issue. It’s well worth some strident commentary back to Time to let them know what’s missing from their fawning account of the antiquities trade.