DDIG members may be interested in learning more about Omeka, a simple, open-source collections/content management application developed at George Mason University. I took part in using Omeka as the basis of the “Modern Art Iraq Archive” (MAIA). In this case, we used Omeka to publish a collection of modern art lost, looted, or destroyed during the 2003 US invasion of Iraq. The same software can be very useful for publishing small archaeological collections, particularly since Omeka has an active user and developer community that continually contributes new enhancements to the application.

For a bit of background, MAIA grew out of a long-term effort to document and preserve the modern artistic works from the Iraqi Museum of Modern Art in Baghdad, most of which were lost or damaged in the fires and looting that followed the 2003 US invasion of Iraq. As the site shows, very little is known about many of the works, including their current whereabouts and their original location in the Museum. The scarcity of documentation on modern Iraqi art prompted the project to expand to include supporting text. The site makes the works of art available as an open access database in order to raise public awareness of the many lost works and to encourage interested individuals to help document the museum’s original and/or lost holdings.

The MAIA site is the culmination of seven years of work by Project Director Nada Shabout, a professor of Art History and the Director of the Contemporary Arab and Muslim Cultural Studies Institute (CAMCSI) at the University of North Texas. Since 2003, Shabout has been collecting any and all information on the lost works through intensive research and interviews with artists, museum personnel, and art gallery owners. Shabout received two fellowships from the American Academic Research Institute in Iraq (TAARII), in 2006 and 2007, to conduct the first phase of data collection. In 2009, she teamed up with colleagues at the Alexandria Archive Institute, a California-based non-profit organization (and maintainer of this blog!) dedicated to opening up global cultural heritage for research, education, and creative works.

The team won a Digital Humanities Start-Up Grant from the U.S. National Endowment for the Humanities to develop MAIA.

The DDIG annual report is due to the SAA on Friday, February 4th. The SAA asks for a “report on interest group/representative activities” and any “action items” to be included. The text below is merely a report; I am not presently planning to submit any action items. I’m glad to take comments on the DRAFT text below, as well as proposals for action items.

The Digital Data Interest Group (DDIG) had a productive year in 2010. The expansion of information and communication technologies (ICTs) that create digital data from archaeological practice appears to be continuing at a rate equal to or higher than that generally found within the social sciences. This year saw the publication in SAA periodicals of a number of items on digital data use in archaeology. The annual SAA meeting in Saint Louis featured a variety of symposia and general contributions specifically addressing the implications of ICTs and digital data for archaeological practice, including a DDIG-sponsored digital symposium. This report first addresses SAA activities related to DDIG, and then provides a general assessment of broader digital data developments with the potential to affect American archaeology as construed in the SAA mission statement.

SAA periodicals published several items this year directly addressing digital data practices. Three items appeared in The SAA Archaeological Record: “Digital Antiquity: Transforming Archaeological Data into Knowledge” (McManamon and Kintigh 2010), “Fieldwork in the Age of Digital Reproduction: A Review of the Potentials and Limitations of Google Earth for Archaeologists” (Meyers 2010), and “Open Context in Context: Cyberinfrastructure and Distributed Approaches to Publish and Preserve Archaeological Data” (Kansa 2010). One article was published in American Antiquity: “Computational Modeling and Neolithic Socioecological Dynamics: A Case Study from Southwest Asia” (Barton et al. 2010). A number of articles in both publications also included ICTs and digital data indirectly as part of their subject matter.

The 2010 SAA meeting in Saint Louis featured a great deal of activity related to ICT and digital data use, including sessions on the Digital Archaeological Record (tDAR), digital curation, and digital publishing. The DDIG-sponsored digital symposium at the SAA, “Practical Methods of Data Production, Dissemination, and Preservation,” contained 13 individual presentations created by 27 contributors; the symposium highlighted results-driven applications of digital data management undertaken by DDIG members which could serve as examples of best practices in the field. Outside of specific symposia, at least 13 other presentations and posters at the meeting focused directly on ICT and digital data practices, as indicated by their titles and abstracts. Although not part of the SAA, the

This past year saw the emergence of two important developments on the subject of digital data, with the potential to profoundly influence archaeological practice: (1) the National Science Foundation (NSF) implemented a requirement that a data management plan be included with all proposals beginning January 18, 2011; and (2) the White House Office of Science and Technology Policy (OSTP) called for commentary on how a policy similar to that of the National Institutes of Health (NIH) might be constructed for other agencies, creating a requirement for public access to data resulting from publicly funded research.

The NSF requirement, now active, will have an immediate effect on archaeological practice in that all proposal writers must now make their data management plans explicit in no more than two pages. This is a generally positive development. To help mitigate the most onerous step for proposal writers, the NSF has proactively suggested (but not required) that proposal writers avail themselves of the expertise of two non-profit research organizations run by DDIG members: the Digital Archaeological Record (tDAR) and Open Context. However, to realize the full benefits of this new requirement, and of the practices it may inspire at other funding and regulatory agencies, the American archaeological community will need to engage in a substantive dialogue about data management standards, the ethics of data sharing, and citation practices. This is not a call for prescribed, one-size-fits-all requirements, but for recognition that the ongoing development of open community standards takes explicit work if researchers are to avoid producing data management plans with low levels of interoperability.

The OSTP call for comments closed on January 21, 2010. The last update on the subject was March 8, 2010, which indicated that input was still being reviewed. Five PDF files containing the emails and other materials sent in response to the call are available on the OSTP website. Comments from the archaeological community included a generally supportive letter co-authored by DDIG member Francis P. McManamon, executive director of tDAR, which also recognized the need for some measure of disciplinary cohesion in order to derive benefits from such openness. Similar statements were made by many commenters representing a wide swath of the sciences and humanities. This OSTP initiative will also raise significant questions about what constitutes proper citation of, and other recognition for, contributions made by previous researchers in professional reports of new findings involving curated public data.

The expansion of professional outreach and communication on digital data issues remains a top priority for DDIG. This expansion is devoted to developing greater awareness within the SAA community of the ways in which ICT use and the resulting digital data both structure work and create new affordances. The ability to capitalize on these new affordances increasingly depends on the development of recognized data standards and (note: not necessarily mandated) collaborative networks of users (researchers, managers, educators, etc.). The position of tDAR and Open Context as institutional points of reference will be exceedingly valuable in the near and medium term. However, without the appearance of a more engaged community of archaeological data practitioners in the medium to long term, the expansion of broad efforts like those at the NSF and OSTP may bring only limited benefit. Similarly, to ensure that archaeological practitioners remain prepared to create and maintain interoperable data sets and standards, it is time for disciplinary conversation and concerted action on what constitutes appropriate technical training at various levels of education.


Joshua J. Wells, Ph.D., R.P.A.
Convener, Digital Data Interest Group of the Society for American Archaeology
Assistant Professor of Social Informatics
Department of Sociology and Anthropology
& Department of Informatics
Indiana University South Bend

Just a quick note at the start of this holiday week. I have been remiss in not posting about The SAA Archaeological Record, an open access publication for SAA members. Over the past year, it has published a few papers about digital data preservation and access in archaeology. These include:

  McManamon, Francis P., and Keith W. Kintigh (2010) Digital Antiquity: Transforming Archaeological Data into Knowledge. The SAA Archaeological Record 10(2):37–40.
  Meyers, Adrian (2010) Fieldwork in the Age of Digital Reproduction: A Review of the Potentials and Limitations of Google Earth for Archaeologists. The SAA Archaeological Record 10(4):7–11.
  Kansa, Eric C. (2010) Open Context in Context: Cyberinfrastructure and Distributed Approaches to Publish and Preserve Archaeological Data. The SAA Archaeological Record 10(5):12–16.

If I missed any, please let me know and I will update this post! Thanks!

Clifford Lynch drew my attention to “an announcement from the UK Royal Society indicating that in celebration of Open Access week they were opening their entire journal archive for free access till the end of the society’s 350th anniversary year, 30 November 2010. This is a great opportunity to get access to two issues of Philosophical Transactions of the Royal Society A from August and September 2010 which focus on E-science and contain a number of outstanding papers.”

A few examples:

  • “Methodological commons: arts and humanities e-Science fundamentals”;
  • “Deploying general-purpose virtual research environments for humanities research”;
  • “Use of the Edinburgh geoparser for georeferencing digitized historical collections”;
  • “Adoption and use of Web 2.0 in scholarly communications”;
  • “Retaining volunteers in volunteer computing projects.”

Figure from “Use of the Edinburgh geoparser for georeferencing digitized historical collections”

From Chuck Jones comes the following announcement:

Today is the first day of Open Access Week (October 18–24, 2010). I am happy to announce, in collaboration with Phoebe Acheson of the University of Georgia Libraries and Becoming a Classics Librarian, the debut of a new blog: Ancient World Open Bibliographies. Please have a look and consider subscribing by RSS or email.

This new blog is for discussion and development of a project to collect and solicit annotated bibliographies about subjects relevant to studies of the ancient world. It is the first concrete step to come from a conversation Phoebe and I began after she posted Oxford Bibliographies Online: More Rant Than Review. In this blog we will collect existing open access bibliographies we find on the web, and discuss the goals, audience, format, and scope of the final project: a dedicated wiki site collecting bibliographies, which will be open access.

We welcome your participation in the conversation and your contributions to the collection of bibliographies.

I’m pleased to announce that the National Science Foundation (NSF) archaeology program now links to Open Context (see example here). Open Context is an open-access data publication system, and I lead its development. Obviously, a link from the NSF is a “big deal” to me, because it helps represent how data sharing is becoming a much more mainstream fact of life in the research world. After spending the better part of my post-PhD career on data sharing issues, I can’t describe how gratifying it is to witness this change.

Now for some context: Earlier this year, the NSF announced new data sharing requirements for grantees. Grant-seekers now need to supply data access and management plans in their proposals. This new requirement has the potential for improving transparency in research. Shared data also opens the door to new research programs that bring together results from multiple projects.

The downside is that grant seekers will now have additional work to create a data access and management plan. Many grant seekers will probably lack expertise and technical support in making data accessible. Thus, the new data access requirements will represent something of a burden, and many grant seekers may be confused about how to proceed.

That’s why it is useful for the NSF to link to specific systems and services. Along with Open Context, the NSF also links to Digital Antiquity’s tDAR system (kudos to Digital Antiquity!). Open Context offers researchers guidance on how to prepare datasets for presentation and how to budget for data dissemination and archiving (with the California Digital Library). Open Context also points to the “Good Practice” guides prepared by the Archaeology Data Service (and being revised with Digital Antiquity). Researchers can incorporate all of this information into their grant applications.

While the NSF did (informally) evaluate these systems for their technical merits, as you can see on the NSF pages, these links are not endorsements. Researchers can and should explore different options that best meet their needs. Nevertheless, these links do give grant-seekers some valuable information and services that can help meet the new data sharing requirements.

I came across a post in the Through the Kaleidoscope blog that got me thinking. “Crowd science – where masses of people participate in data collection for science projects – is growing … Astronomy is the area in which crowd science has been most frequently used, which makes sense given the field’s massive scale and large datasets. One example is the ten-year old SETI@home project …” I must admit here that I’ve been participating in the latter project since May 1999, which puts me in the 89th percentile of all 1.1 million SETI enthusiasts :-) I run the project using UC Berkeley’s BOINC, a commonly used, multiplatform open-source program for volunteer computing and grid computing. BOINC makes it possible to run several projects at the same time according to selected settings. For instance, I’m also active in other projects: Einstein@home, MilkyWay@home (astronomy), (climatology), Rosetta@home, (medical research), SZTAKI Desktop Grid (math), Quake Catcher Network (seismology). At one time, I also participated in non-BOINC projects, but that proved too cumbersome. The BOINC projects have attracted a lot of creative programmers, so that there are, for example, at least seven websites where you can easily access your statistics both by project and combined. Each project awards credits for work done, allowing cross-project comparison and combination of your “scores.” It all serves to involve the participants and make them feel invested. There is even a way to have important milestones in your efforts posted on your Facebook account, e.g., on September 3, I passed the 6,000 credit milestone for

So what could we do with this crowdsourced/distributed-computing approach in archaeology? After all, just like astronomy and medical research, we too enjoy a lot of goodwill from the general public. There has to be a way to channel some of it. Surely we can find some huge data sets that need processing and whose results would appeal to a general audience? The blog post above also discusses another angle, namely Galaxy Zoo, a project in which people help classify galaxies from Hubble Telescope images, a task that is hard to computerize. Some museums are letting the public tag artifacts online, a way to enhance the often brief information available in their databases (see the Steve Project). This is still primarily for art, though, not archaeological artifacts. We all know that our budgets won’t increase in the near future; quite the contrary. Let’s get creative!

There’s some thoughtful criticism and discussion about Chogha Mish in Open Context over at Secondary Refuse. I tried to post a comment directly on that blog, but Blogger kept giving me an error, so I’m posting here. At least it’s nice to know other systems also have bugs!

I very much agree with Secondary Refuse’s point about the difficulties associated with data sharing. Data sharing is a complex and theoretically challenging undertaking. However, the problem of misuse and misinterpretation is not unique to datasets. Journal papers can be, and are, misused both by novices and even by domain specialists who fail to read a paper carefully. Despite these problems and the potential for misuse, we still publish papers because the benefits outweigh the risks. Similarly, I think we should still publish researcher datasets, because such data can improve the transparency and analytic rigor of research.

One of the points of posting the Chogha Mish data was to illustrate some useful points about how to go about data sharing in a better way. The ICAZ poster associated with the project includes many recommendations regarding the need to contextualize data (including editorial oversight of data publication). Ideally, data publication should accompany print/narrative publication, since the two forms of communication can enhance each other. Most of the data in Open Context comes from projects with active publication efforts, and as these publications become available, Open Context and the publications will link back and forth.

As for why we published these data, the point is to make them available, free of charge and free of copyright barriers, for anyone to reuse. They can be used in a class to teach analytic methods (one can ask a class to interpret the kill-off patterns, or to critique the data and probe their ambiguities and limits). They can also be combined with other datasets in a larger research project, such as a regional synthesis. The “About Section” of Open Context explains more.

Last, Secondary Refuse found an interface flaw I had missed: downloadable tables associated with projects weren’t showing up. The bug is now fixed, and on the Chogha Mish Overview you’ll find a link to a table you can download and use in Excel or similar applications.

Kudos to Secondary Refuse’s author! Feedback like this is really important for us to learn how to improve Open Context. So this is much appreciated!!

In the New York Times, an article discusses how the “Venerable British Museum Enlists in the Wikipedia Revolution.” Have you looked at what Wikipedia says about your project/museum/archaeological site/etc. lately? If you think it is inadequate, consider doing what the BM is doing: collaborating with Wikipedia to ensure that its huge readership (admit it, it hasn’t been very long since you last consulted it too, right?) gets the correct information. After all, “‘[t]en years ago we were equal, and we were all fighting for position,’ Mr. Cock [BM webmaster] said. Now, he added, ‘people are gravitating to fewer and fewer sites. We have to shift with how we deal with the Web.’”

In other words, if you can’t beat ’em, join ’em. Once criticized as amateurism run amok, Wikipedia has become ingrained in the online world: it is consulted by millions of users when there is breaking news; its articles are frequently the first result when a search engine is used. This enhanced role has moved hand in hand with Wikipedia’s growing stability (some would say stagnation). With more than three million articles in English alone, there are fewer unexplored topics, and many of the most important articles have been edited thousands of times over a number of years. All of this means that in today’s Wikipedia there is renewed value in old-fashioned expertise, whether to provide obscure details to articles that have already been carefully edited or to find worthy topics that haven’t been written about yet. Mr. Cock, for example, estimated that there were thousands of British Museum objects (among the eight million total) that would be worth their own Wikipedia articles but don’t have them.

What unites them is each organization’s concern for educating the public: one has the artifacts and expertise, and the other has the online audience. Dividing them are issues of copyright and control, principally of images. Wikipedia’s parent, the Wikimedia Foundation, is strongly identified with the “free culture movement,” which generally holds that copyright laws are too restrictive. The foundation hosts an online “commons” with more than six million media files, photos, drawings and videos available under free licenses, which mean they can be copied by virtually anyone as long as there is a credit. That brought Wikipedia into a legal tussle with another prominent British institution, the National Portrait Gallery, when high-resolution copies of paintings from its collection were uploaded to the commons. A Wikipedia volunteer had cobbled the copies together from the gallery’s Web site, justifying his actions by noting that the paintings involved were no longer under copyright. Both the portrait gallery and the British Museum generate revenue by selling reprints and copies of pieces in their collections.

Archive ’10, the NSF Workshop on Archiving Experiments to Raise Scientific Standards, was just held on May 25-26 in Salt Lake City—sorry for not announcing this in advance, I just learnt about it myself via Clifford Lynch. The website states: “Archive ’10 will focus on the creation of archives of computer-based experiments: capturing and publishing entire experiments that are fully encapsulated, ready for immediate replay, and open to inspection. It will bring together a few areas of the scientific community that represent fairly advanced infrastructure for archiving experiments and data (physicists and biomedical researchers) with two areas of the computer systems community for which significant progress is still needed (networks and compilers). The workshop will also include experts in enabling technologies and publishing.”

The live video feed doesn’t seem to be working anymore. I hope it will be replaced with an archived version. A few of the position papers that stood out to me are:

This is not exactly archaeology, of course, but it is still a good idea to look to other disciplines for ideas and experiences.

