September 2006

Every so often, one runs into something truly exciting. OpenRecord clearly falls into this category. It is still early in its development, but it seems clear that this is a project worth watching.

OpenRecord is an open source initiative building a collaborative “wiki”-like tool for loosely structured database content. It will combine the analytic capabilities of a database with the community content development seen in standard text-based wikis. The highly generalized and loose data structure (defined by the Dojo Foundation, and similar to RDF) can be applied widely, including for many archaeological databases. Multiple users can add and edit content and establish relationships between many database items by using a very attractive AJAX-based user interface that runs on standard web browsers. Using a system like this can make it easier for researchers to collaborate (by creating, pooling, and editing content) in the course of a research project.
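To make the idea of a loose, item-based data structure more concrete, here is a minimal sketch in Python. It is not OpenRecord’s actual data model or API; the item identifiers, the attribute names, and the helper functions `describe` and `items_with` are all hypothetical, chosen only to show how facts stored as simple (item, attribute, value) statements, loosely in the spirit of RDF, can describe items and relate them without a fixed schema.

```python
from collections import defaultdict

# Each fact is an (item, attribute, value) statement.
# All identifiers and attribute names here are invented for illustration.
statements = [
    ("item:1", "type", "pottery sherd"),
    ("item:1", "found in", "item:2"),   # a relationship between two items
    ("item:2", "type", "excavation unit"),
    ("item:2", "depth (cm)", 40),
    ("item:3", "type", "pottery sherd"),
]

def describe(item_id, statements):
    """Collect every attribute recorded for one item; no fixed schema is required."""
    record = defaultdict(list)
    for subject, attribute, value in statements:
        if subject == item_id:
            record[attribute].append(value)
    return dict(record)

def items_with(attribute, value, statements):
    """Return the sorted ids of items that carry a given attribute/value pair."""
    return sorted({s for s, a, v in statements if a == attribute and v == value})

print(describe("item:1", statements))
print(items_with("type", "pottery sherd", statements))
```

Because any item can carry any attribute, a new kind of observation needs only a new statement, not a change to a table definition. The cost of that flexibility is that two projects may use different attribute names for the same concept.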

OpenRecord can also be a way to pool and share data from several projects. For these kinds of applications, a highly generalized data model is both useful and a challenge. Based on my reading of their development documentation, it’s not clear what sort of global search and query capabilities OpenRecord will support. It looks like OpenRecord can accommodate most archaeological data, but without some way to relate the meanings of these different datasets to one another more specifically, each project dataset would still be something of an island unto itself, and the pooled content may be hard to navigate.

In contrast, OCHRE and Open Context (both based on ArchaeoML, another item-based data model) have a bit more structure to help guide browsing and querying of pooled content.
The OCHRE project is working on some sophisticated ways of relating descriptive properties from different projects (see this discussion), and Open Context is experimenting with a user-tagging system that draws linkages between items in different projects (see example tag). Tools to relate different OpenRecord database items with one another will probably be needed to enable many research applications.
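As a rough illustration of how user tags can draw linkages between items in different projects, here is a small Python sketch. It does not reflect how Open Context actually implements tagging; the project names, item identifiers, tags, and the functions `build_tag_index` and `cross_project_tags` are hypothetical.

```python
from collections import defaultdict

# Hypothetical tag assignments: (project, item id, tag). None of this is real data.
taggings = [
    ("project-a", "a:17", "obsidian"),
    ("project-b", "b:204", "obsidian"),
    ("project-b", "b:310", "hearth"),
    ("project-a", "a:17", "trade"),
]

def build_tag_index(taggings):
    """Map each tag to the set of (project, item id) pairs that carry it."""
    index = defaultdict(set)
    for project, item_id, tag in taggings:
        index[tag].add((project, item_id))
    return index

def cross_project_tags(index):
    """Return the tags that link items from more than one project."""
    return sorted(
        tag for tag, items in index.items()
        if len({project for project, _ in items}) > 1
    )

index = build_tag_index(taggings)
print(cross_project_tags(index))
print(sorted(index["obsidian"]))
```

A shared tag acts as a lightweight bridge: nothing about either project’s own descriptive scheme has to change for the two items to be found together.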

Take a look at the OpenRecord Screencast (if you have 22 minutes).

DDIG aims to facilitate digital dissemination of archaeological data. As a follow-up to the recent post about the Evergreen Cemetery, I interviewed Gregory Vogel about this project and its development. Hopefully, this kind of discussion can help other archaeologists better understand some key issues in digital dissemination.

[Question] Why do you think it is valuable to put so much content up, with free and open access, on the Internet? What are your major concerns and worries about making so much research data available?

[GV] Let me begin with a caveat: I am a consumer and distributor of digital data, but certainly not an expert in the field! My background is primarily anthropological and archaeological. I have worked with computers and databases for some time now (I learned how to program in Basic and Fortran on the Grinnell College PDP-11/70 in Junior High), but I have only recently (within the past three years) learned how to put together a web page, and I’m still struggling with designing an on-line database interface. My comments here are drawn from personal experience with digital data dissemination, and not from any academic study of the topic.

I think the free and open access itself is the value of a great number of Internet resources. I’ve recently heard human beings described as ‘infovores’, or compulsive consumers of information. The Internet is a great source of information to feed that compulsion.

There are potential problems with this source, of course. Much of the ‘information’ on-line is not really information but data (un-interpreted facts), and much of the information is simply wrong. This is not unique to the Internet, of course, but because data and information can be shared more quickly and in much greater volume than in other formats (books, for example), these problems may be compounded.

I also worry about issues larger than digital data itself. I think there may be a growing ‘digital data divide’, as on-line information is only free and open to people with Internet access. I don’t have hard data on this, but I suspect that this currently accounts for less than 1/4 of the world’s population. If easy Internet access really is a valuable commodity (and I think most people would agree that it is), those of us who have it are gaining a competitive advantage over those who don’t. The information-rich get information-richer, while the information-poor…?

Still, I think the benefits of increasing information-sharing on-line outweigh any potential negative outcomes. I certainly hope so, because it doesn’t seem that the trend will be reversing anytime soon.

[Question] Your site has a great deal of student-developed content. How does this benefit student instruction? Do you have any suggestions on how best to incorporate student participation in the development of digital resources?

[GV] Students (and volunteers) generated most of the information on the Evergreen web site, but putting it on-line has not (yet) become an integral part of any class I’ve taught. So far, the education has taken place in the classroom and in the field (the cemetery) during recording sessions, and I’ve put the content on-line afterwards. This is partly because of the nature of the courses I’ve taught, which have been anthropologically- and archaeologically-oriented. Cemetery studies relate directly to these topics, but I’ve found it difficult to incorporate web development into the curriculum. Possibly this is because I’m very new to web development myself!

I think that students can participate more directly in developing on-line resources as part of their classroom education, and I’m currently working on ways to engage local high school students to do just that through programs here at the Center for American Archeology. Incorporating Information Technologies (IT) into education certainly seems to be a priority at many levels, and the NSF currently has multiple programs involving education and IT.

[Question] How is digital data dissemination different from formal print publication? What needs to be in place to best assure quality and high professional standards in digital dissemination?

[GV] Digital data resources of course hold potential for disseminating vast amounts of data (and information) very quickly. Digital data resources can also be much more dynamic than traditional print publications, with any mix of text, audio, and visual content, and links to related resources that are immediately available. Most of the Evergreen web site is essentially an ‘online book’ of static text and pictures, with the addition of links for navigation. The interactive map goes beyond that, but is so far quite limited, and I’m currently looking at other on-line resources for sharing geospatial information; MapServer, in particular, looks promising.

I’m not sure if there is a way to ensure quality or professional standards of digital dissemination overall. Just as with books or other forms of media, digital resources need to be viewed critically by consumers. We have systems set up to provide some level of quality control over particular publications, such as the peer-review process, but even this doesn’t guarantee that the information is complete or even correct. A basic set of guidelines or standards for archaeological data would be appropriate, to ensure that we can at least share digital data in compatible formats, but I don’t think such a set of standards is in place.

Aside from the issue of quality, there is a potentially disastrous problem with the diversity and rapid turnover of digital data formats. I wrote a little about this in a paper for Southeastern Archaeology (Vogel, Kay and Vogele, 2005, 21:28-45); the article concerns the profile of a prehistoric mound that we recorded as a digital photo-mosaic. The reviewers asked us to write about preserving the digital record we had created. I couldn’t find much written about this, particularly from an archaeological point of view. I came up with a few recommendations for preserving digital images, including incorporating metadata into the files themselves, storing them in non-proprietary formats, etc. The most important point I tried to make, however, is that methods must be put in place to ensure the upkeep and migration of the files to current formats and storage media. Digital data cannot be passively curated like artifacts. The purpose of archaeological curation is to ensure that the objects are not altered by the storage techniques, but for digital data the files must be altered to ensure that they are retrievable by modern hardware and software.
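As a rough illustration of the upkeep and migration point above (this is not taken from the paper or the interview), a periodic check of an archive listing might flag files stored in formats that are not on an agreed list of openly documented ones. In this Python sketch the `OPEN_FORMATS` list, the file names, and the `needs_migration` function are all assumptions made for the example; a real project would have to decide for itself which formats it trusts.

```python
from pathlib import PurePath

# Assumption for illustration only: extensions this imaginary project
# treats as openly documented, non-proprietary formats.
OPEN_FORMATS = {".tif", ".tiff", ".png", ".txt", ".csv", ".xml"}

def needs_migration(filenames, open_formats=OPEN_FORMATS):
    """Return the files whose extension is not on the open-format list."""
    return [
        name for name in filenames
        if PurePath(name).suffix.lower() not in open_formats
    ]

# Hypothetical archive listing.
archive = ["mound_profile.TIF", "mound_profile.psd", "field_notes.txt", "units.mdb"]
print(needs_migration(archive))
```

A check like this only identifies candidates for migration. Converting the files, verifying the results, and moving them to current storage media would still be separate, deliberate steps.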

[Question] What were the main technical hurdles you encountered when developing this resource?

[GV] The main challenges have dealt with putting the information in a useful format on-line. From student- and volunteer-generated field maps and recording forms I’ve been able to generate GIS coverages and a useful database fairly easily (students have helped with these efforts too, through independent study courses). Operationalizing these in a user-friendly, useful way on-line has been very difficult, though. This is of course due to my inexperience in this area. We received a grant from the Arkansas Humanities Council to have the University of Arkansas’ Center for Advanced Spatial Technologies (CAST) serve as consultants for this, and CAST has been a great help, but I still have quite a bit of work to do.

I think it is important to put the information on-line in a user-friendly fashion, because many of the people who are interested in cemeteries (genealogists, for example, and local historians) may not be particularly computer-savvy, and won’t spend a great deal of time trying to understand a complicated search interface.

[Question] Your site is very rich in content, but the terms of use are somewhat ambiguous. Would you have any objection to using the popular Creative Commons licenses to clarify permissions to use this content?

[GV] I think this is an important point you raise, and one that I hadn’t thought about at all! Since you brought this up, I’ve begun looking into the Creative Commons licensing system, and it certainly seems appropriate. It’s hard to tell how easy it would actually be to enforce such on-line licensing, but I think it may be a useful system simply to clarify the intended terms of use.

[Question] The Digital Data Interest Group is a new interest group in the SAA. What role do you think it should play to promote and enhance this kind of work?

[GV] I think the DDIG web site is a great platform for sharing ideas about digital data in archaeology, and the DDIG may be able to expand its role at conferences and in print. A DDIG session or two at the SAAs or regional conferences may generate quite a bit of interest and help disseminate and crystallize ideas, and potentially lead to a DDIG publication of guidelines on digital data specific to archaeology.

We have formal guidelines or recommendations for a great many things in archaeology, and standardized ways of going about archaeology that help us gather and share data: we generally dig test units in metric increments, describe soil colors using the Munsell soil-color charts, use the same broad typological categories for artifacts, etc. None of these are perfectly standardized in the way they are set up or implemented, of course, but we at least have common starting points to begin with. I don’t think we have a common starting point for sharing digital data in archaeology. If the DDIG were able to generate a set of guidelines that the archaeological community as a whole could agree upon and actually implement, this would be a great service to the profession.

Thanks to Gregory Vogel for the discussion!

John Loomis of CyArk, a nonprofit organization dedicated to “Preserving World Heritage Sites through collecting, archiving and providing open access to data created by laser scanning, digital modeling, and other state-of-the-art technologies”, recently circulated an important announcement of interest to DDIG members.

Their team has made a major contribution to data dissemination of significant world heritage sites. The new “CyArk 3D Heritage Archive” is a beautifully designed, highly professional, and easily navigated website with an impressive collection of media resources. All the media have clear spatial referencing and are easily reached through a map-based user interface. Professionals will eventually have access to underlying data files and other resources important for cultural and architectural resource management.

Most of their emphasis is on historic and architectural preservation, especially through laser scans, digital modeling, and other digital media documentation. These types of documentation have growing importance in many areas of archaeological research. Organizing these resources in an attractive and easily used framework is an impressive accomplishment, and can and should be combined with efforts to organize and share other important aspects of archaeological documentation, including excavation notes and analyses of finds.

Of equal importance to their technical accomplishments is their recognition that open access and open licensing add value to the media documentation that they offer. The open terms of Creative Commons licenses enable users to incorporate these high-quality media items into new original works. Use of Creative Commons licenses makes sense; after all, appeals to the “public interest” justify both the funding and the preservation laws and policies that support archaeology and historical preservation. These licenses help ensure that members of the public can draw value and creative inspiration from the documentary record of the past.