I’m putting up a brief note to see whether the “pingback” features in the Open Context website are working and, if so, whether they are useful. Here’s an example:

An arrowhead from Petra.

Would blog links to “raw” archaeological data be useful? How many excavations maintain blogs, and if they do, would back-and-forth linking between a weblog and an archaeological data resource like Open Context help researchers interpret their observations?
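For anyone curious what happens behind the scenes, here is a rough sketch of the message a blog sends when it "pings" a linked page. The Pingback specification defines an XML-RPC method, pingback.ping, that takes the URI of the linking post and the URI of the linked page. The sketch only builds the request body and sends nothing; both URLs are made-up placeholders, not real blog or Open Context addresses, and I'm not claiming this is how Open Context itself handles pingbacks.

```python
import xmlrpc.client


def build_pingback_request(source_uri: str, target_uri: str) -> str:
    """Return the XML-RPC request body for a pingback.ping call.

    source_uri: the blog post that contains the link (placeholder below).
    target_uri: the page being linked to (placeholder below).
    """
    # xmlrpc.client.dumps serializes the parameters and method name to XML.
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


# Hypothetical example URLs, for illustration only.
request_body = build_pingback_request(
    "http://blog.example.org/petra-arrowhead-post",
    "http://data.example.org/item/1",
)
print(request_body)
```

A blog engine would POST this body to the pingback endpoint advertised by the target page; the receiving server then decides whether to record the link.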

I’ve been working on some citation issues in Open Context. One thing on the immediate horizon is implementing WebCite on Open Context. If all goes well, the Open Context citation button should send a copy of the cited web page to WebCite for archiving, and thereby gain a truly stable URL that will make Open Context content retrievable even if the site disappears. It’s not ideal, since it only archives the XHTML version of the cited page, but perhaps some markup can be added to convey more structure.

Also, Stuart Campbell (University of Manchester) sent some useful suggestions on how to cite sub-selections of data from a larger corpus. Together with co-director Elizabeth Carter (UCLA), he’s already contributed a substantial portion of his work at Domuztepe (Halaf period site in Turkey) to Open Context. The Domuztepe crew is in the midst of a big publication push and plans to add much more data. Thus the citation issues are becoming more pressing.

In practice, researchers will rarely want to cite an entire 10-year excavation dataset generated by a large team of specialist researchers. They’ll want to cite parts of such datasets, ranging from an individual specialist analysis to a selection of items that may span different specialist datasets but still not encompass an entire project dataset. People may also want to cite subsets of specially selected data from several different projects.

All this makes citation issues very complicated. Who do you credit and how? Project directors should be credited, but so should individual specialists, and even lowly trench supervisors who make observations in the field. You can quickly gain a very long list of people who need some form of citation credit.

In addition, some uses of other people’s data can be quite sophisticated and should see recognition also. If you systematically go through the effort to comb through other people’s datasets, and attempt to interpret and select items among them, you can be actively doing significant research. Your selection of data should be credited (or blamed) to you, since this activity highlights items of interest and interpretive value and can clearly contribute to knowledge creation. People using and selecting sets of data from different projects and collections should also be cited.
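To make the problem a bit more concrete, here is a minimal sketch of one way the credit information could be modeled. It is only an illustration, not Open Context's data model: the class names, the role labels, and the placeholder people and projects are all assumptions made up for the example. Each item carries its own list of credits, and a selection records who did the selecting as well as everyone credited on the selected items.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credit:
    name: str
    role: str  # e.g. "project director", "specialist", "trench supervisor"


@dataclass
class Item:
    uri: str      # hypothetical stable identifier for one record
    project: str
    credits: list[Credit] = field(default_factory=list)


@dataclass
class Selection:
    selector: str  # the person who chose and interpreted this subset
    items: list[Item] = field(default_factory=list)

    def contributors(self) -> list[Credit]:
        """Distinct credits across the selected items, in first-seen order."""
        seen: dict[Credit, None] = {}
        for item in self.items:
            for credit in item.credits:
                seen.setdefault(credit, None)
        return list(seen)

    def cite(self) -> str:
        """A plain-text citation naming the selector and every data contributor."""
        names = ", ".join(f"{c.name} ({c.role})" for c in self.contributors())
        projects = ", ".join(sorted({i.project for i in self.items}))
        return (
            f"Selection by {self.selector} of {len(self.items)} items "
            f"from {projects}. Data credits: {names}."
        )


# Placeholder people and projects, for illustration only.
director = Credit("Director A", "project director")
item_1 = Item("item/1", "Project X", [director, Credit("Specialist B", "specialist")])
item_2 = Item("item/2", "Project Y", [director, Credit("Supervisor C", "trench supervisor")])
selection = Selection("Reader D", [item_1, item_2])
print(selection.cite())
```

Even this toy version shows how quickly the credit list grows: two items from two projects already yield four names once the selector is counted.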

All of this is hard to convey in typical citation conventions. I think it’s time to get some conversation going about these issues, since my poor brain hurts thinking about this.

Tom Elliott (Pleiades Project) sent me a link to a pretty hilarious discussion attempting to place archaeologists into a taxonomy based on their data sharing habits.

Tom self-identifies as a “cranky space monkey”, and points to Bill Caraher, who thinks of himself as a squirrel. This was all touched off by Charles Watkinson, who said that “grey panthers” (tenured people at the top of their field) are far more likely to experiment with total data transparency than struggling junior faculty or graduate students.

Of course, Watkinson has a good point, and has some more good thoughts about ways to link data publication with narrative publication. Sebastian Heath added some interesting discussion about back-and-forth linking between primary data and published narratives. I’ve been thinking about these issues too, and am working with my colleague Erik Wilde on a (hopefully) elegant approach to the issue based on his work on Linkbases. We’ll try to have something to publicly demo in the next few months.

Back to the taxa. In general, I also think that “grey panthers” are more likely to publish data than junior scholars, because junior researchers have more reason to be risk averse. That said, like most things, there are plenty of exceptions. Some senior people may have excellent publication records but shoddy field documentation, and don’t like the idea of transparency. Some junior people act very openly with their material. Open Context has a mixture of datasets contributed by very prominent “grey panthers” (see Petra) and junior researchers who like this opportunity to advertise the quality of their research (see Justin Lev-Tov’s zooarch analysis of Hazor material).

As for my own taxonomic self-identification, that’s a hard question. Open Context has been my main project for some time now, and its main aim thus far has been to validate a common data model with lots of eclectic stuff (though we’re transitioning over to doing more thematic collection building). I’ve been eclectic and opportunistic in building Open Context content (and refining schema mapping processes etc.) with whatever people want to provide.

So I guess that makes me something like an Eastern Bluebird, since they build nests out of whatever is handy.

Hi Everyone.

This has nothing to do with archaeology, but I couldn’t help but note this interesting April 1st development. It’s something of a follow-up to my earlier posts on Google and its ambitions here and here. Please take a look at this short video by Google’s founders, Larry Page and Sergey Brin:

[Embedded YouTube video]

That’s right. They claim to be teaming up with Virgin Galactic to colonize Mars. Here’s Richard Branson on the “project”:

[Embedded YouTube video]

They’re calling it an “Open Source Planet”. The funny thing about this April Fools’ joke is that it comes from a wildly ambitious and seemingly unstoppable firm (however, note that even Google seems to be constrained by market forces). Given their other goals, colonizing Mars almost seems like business as usual for Google.

The annual Society for American Archaeology (SAA) conference in Vancouver is fast approaching and I wanted to send an announcement about forums and sessions that will be of interest to our members. Please remember that the DIGITAL DATA INTEREST GROUP MEETING will be held on Thurs March 27, 6:00 - 7:00pm, in the Hyatt Cypress Room.

Below are other digital data related events at the SAA meeting. Please be sure to look at the posters because fantastic work will be presented there as well, with the added benefit of greater interactivity and discussion with individual researchers. If I’ve missed one, please let me know and I will circulate:

Thanks!
-Eric

Thurs March 27, 1:00pm  SYMPOSIUM: International Curation Standards: What’s Working, What’s Not

Thurs March 27, 1:00pm  SYMPOSIUM: Geophysical Archaeology at World Heritage Sites

Thurs March 27, 3:00pm  FORUM: Digital Antiquity: Planning an Information Infrastructure for Archaeology

Thurs March 27, 3:15pm  SYMPOSIUM: Advances in Methodology: Survey Techniques, Computer Use and Interpretation

Fri March 28, 12:45pm  SYMPOSIUM: Web 2.0 and Beyond: New Tools for Collaboration and Communication

Sat March 29, 8:00am  FORUM: Modeling Paleoindian Sites and Assemblages: PIDBA (Paleoindian Database of the Americas) and Other Approaches

Sat March 29, 10:15am  FORUM: Converging Communities in Digital Heritage

Sun March 30, 8:00am  SYMPOSIUM: Southwest Heritage: Strategies for Managing and Preserving Cultural Resources

Sun March 30, 10:45am  GENERAL SESSION: Computer Modeling and Simulation

Archaeoinformatics Lecture Series 2008

The Archaeoinformatics Consortium is pleased to announce the participants in the 2007-2008 Virtual Lecture Series schedule. The Virtual Lecture series involves leaders from around the world and many disciplines who each will be presenting information on their cyberinfrastructure initiatives and strategies and the ways in which their lessons learned may be useful to archaeology. In addition there will be presentations from archaeologists describing their successful cyberinfrastructure efforts.

These lectures are presented every other week using the NSF-funded Access GRID video conferencing system. Many universities across the US, UK and Australia have Access GRID or compatible facilities. It is also possible to participate in the lectures by downloading the presentation slides and participating via a telephone bridge. Information on how to connect to the Access GRID system, and on the alternatives, is provided at http://archaeoinformatics.org/lecture_series.html. The lectures from the 2006-2007 series and this year’s lectures are also available as streaming video from the archaeoinformatics web site.

Archaeoinformatics.org

Archaeoinformatics.org has been established as a collaborative organization to design, seek funding for, and direct a set of cyberinfrastructure initiatives for archaeology. Archaeoinformatics.org seeks to coordinate with, and develop interoperability of its own projects with, other relevant data-sharing initiatives. It offers to work with professional organizations and federal agencies to promote policies that will foster the development of cyberinfrastructure for archaeology. More information is available at http://archaeoinformatics.org

Lecture 7
February 27, 2008, 10:30-12:30 CST

“Collaborative Adventures in Distributed Digital Preservation: The MetaArchive Cooperative and the Educopia Institute”

Katherine Skinner
Digital Projects Librarian at the Emory University Libraries

The challenges presented by the concept of digital preservation require and have inspired a number of institutions to work cooperatively in order to accomplish meaningful programmatic advances. Among these collaborative ventures, the MetaArchive Cooperative, established in 2004, has developed an organizational model and technical infrastructure (building on the LOCKSS software developed at Stanford University) for preserving the digital assets of archives, museums, data centers, and libraries in a geographically distributed framework. This lecture takes as its focus some of the strategies that the MetaArchive Cooperative has employed in order to support, sustain, and grow its cross-institutional collaboration. During the session, Katherine will explore some of the logistical and organizational issues that have arisen for the Cooperative over the last four years and will talk more generally about the strengths of different organizational structures for accomplishing particular goals.

The MetaArchive Cooperative (http://metaarchive.org) began in 2004 as a collaborative venture of Emory University, Georgia Institute of Technology, University of Louisville, Virginia Polytechnic Institute and State University, Auburn University, Florida State University, and the Library of Congress. The MetaArchive Cooperative has operated a distributed preservation network infrastructure for several years that is based on the LOCKSS software, and has now transformed into an independent, international membership association hosted by the Educopia Institute and based in Atlanta, Georgia.

Lately I sometimes wonder if I should just devote this entire blog to Google-related posts, since Google continues to reshape archaeology and just about every other area of the sciences, social sciences, and humanities.

The Wired Science blog has a post about the impending launch of Google’s latest move into the world of scientific research. I think it’s called “Palimpsest”. The Wired piece also links to this blog, “Pimm”, which has a presentation about this project available on Slideshare. Pimm’s blog said that this project is strictly nonprofit (I can’t find any confirmation of this, and it doesn’t look like it’s part of Google’s foundation, Google.org).

Anyway, this is very exciting. With Open Context, we’ve been working with the Internet Archive, and have started discussions with Metaweb for using Freebase for broader data sharing. Having access to Google’s tremendous infrastructure is also very welcome.

Now for some caveats on this good news.

Google is really, really powerful, and despite its stated policy of not being “evil”, I can’t help but wonder if moving all our research and lives into the Googleverse is really a good idea. I also wonder if this program’s “nonprofit” status will change and what guarantees exist that they will perpetuate the data freely and forever. Companies, even giants like Google, do have a habit of coming and going (creative destruction and all that). They can also change their policies if the bottom line is suffering.

The open licensing of the data is a good thing though since it means the data can be in multiple places (offering redundancy). However, few other institutions have Google’s capacity to handle so much data, so even if the data is “open” there may not be many other places to put it (except for little chunks of data). Science (and the humanities) just keeps on creating ever larger and larger datasets. Even though storage costs are in decline, the community’s requirements for handling massive amounts of data keep expanding. I don’t see many other institutions that can match Google in offering this kind of service for “free”. It probably would be better for us all to have more than one organization with the kind of infrastructure needed to support this as a free service.

We’ll see. At any rate, I’m looking forward to learning more about this.

Update: Peter Suber also discusses this announcement, and links back to this post announcing an earlier incarnation of Google’s science data program.

Looking at the blog Ars Technica, I ran into a post reviewing an interesting report by the British Library and JISC. The report looks at Internet usage patterns of young people born after 1993 (Side note! 1993! I met my WIFE that year! I’m feeling old…). The aim of the report is to help guide development of digital library services.

The report details how young people, while comfortable with technology, are by no means always “expert users”. Relatively simple search forms are the most widely used, and “advanced search” functions see comparatively little use. According to the report:

Users make very little use of advanced search facilities, assuming that search engines ‘understand’ their queries. They tend to move rapidly from page to page, spending little time reading or digesting information and they have difficulty making relevance judgements about the pages they retrieve.

That’s interesting. It suggests that most young people have big expectations for getting relevant information from a simple text box form. I suppose that’s even more motivation for more intelligent natural language search. Academic repositories may want to look at Powerset, if they come up with search tools (like Google) that you can install on your own sites.

Ars Technica also noted this report claims that “authority” is not dead for the Google Generation. This should give some comfort to professional scholars who worry that students will uncritically believe everything they see on the Web and won’t pay attention to traditional mechanisms for validating information (peer review, credentials, quality of sources, etc.).

Anyway, there’s much more to this report. Dig away!

My colleague Erik Wilde is organizing a workshop on Location and the Web. I’m helping to organize and have already hit some of the email lists with a call for papers. The types of questions explored by this workshop will be directly relevant to researchers interested in using GoogleEarth or Second Life for visualization and analysis (for instance). Here’s his call for papers:

the paper submission deadline for the First Workshop on Location and the Web (LocWeb 2008) is only 18 days away. we now have a pretty strong program committee, and i am looking forward to the submitted papers and of course the workshop itself.

so if you are interested in location information and the web, please consider submitting a paper. the workshop is held in beijing and co-located with WWW2008, the 2008 edition of the world’s premier conference in the area of web technologies.

my personal hope for the workshop is that we will be able to get strong submissions in the area of how to make location information available as part of the web, not so much over the web. there are countless examples of applications with location as part of their data model, which are accessible through some web interface, but there are far fewer examples of applications which try to turn the web into a location-aware information system. the latter would be the perfect candidate for the workshop.

Shawn Graham got the ball rolling with his discussion of applying Second Life as an instructional platform for archaeology. It seems to have had some resonance with other archaeo-bloggers (see ClioAudio, and ArchaeoGeek). ArchaeoGeek noted some fascinating work attempting to link GIS-type capabilities in Second Life. They even have an elaborate model of downtown Berkeley, including the BART station.

Shawn also rightly discusses some concerns that people have voiced. These comments show some worry that we’re in danger of putting our data eggs in one basket, and becoming dependent on yet another commercial platform (as in my previous discussion of Google, and how much we’ve come to rely on it). Given all the data preservation problems caused by closed, proprietary file formats and software, these are valid issues.

However, Linden Lab is pretty good in this regard, and I wouldn’t put Second Life in the same realm as Microsoft or even Google. Mitch Kapor (of Lotus fame, and now Second Life’s major investor) recently gave a talk at the UC Berkeley ISchool about Second Life (link to podcast). He talked about how Linden Lab is doing much to open up its infrastructure: it has “open sourced” its client and will soon do the same with its backend infrastructure software. Others will soon be able to run a Second Life server on their own. I think portability of the data in virtual worlds makes using Second Life, and investing some effort in playing with it, much more worthwhile and less risky.

In any event, reliance on any one system is probably dangerous, and there are also good immediate and practical reasons for avoiding such a digital monoculture. Certain systems are best for certain types of applications. Second Life is great for visualization, and for offering rich and shared experiences. But it’s probably not the kind of thing I’d use to run a statistical analysis of potsherd distributions. That said, Second Life doesn’t have to do that, because Linden Lab is making it easier to integrate with systems that do offer such capabilities.

I think a lot of interesting things will happen in systems like Second Life (and GoogleEarth). However, I think the most interesting things will happen between and among such systems, working together as an ecosystem that exchanges data. That means having the capability to draw upon a diverse array of powerful web services (delivering XML-encoded data, or similar formats like JSON) from data providers such as Nabonidus, Open Context, Freebase, GoogleDocs, the Portable Antiquities Scheme and others.

Of course, all this leads directly into standards questions. I tend to favor simple, incremental (or “gracefully degradable”) standards, since this approach seems like the most feasible way of exchanging at least some data. I’ll write some more on the standards question shortly.
