social software

As many of us know, the annual SAA conference is about to begin in Sacramento, California. As with all large conferences, scheduling is a complex and difficult juggling act, so it is not too much of a surprise when awkward schedule conflicts emerge. Unfortunately, this year two digitally themed sessions coincide in the schedule (see the Saturday schedule, 1-3ish PM slot).

The silver lining is that both sessions are digitally themed and make excellent use of the Web. That means you can connect with the ideas and people involved in these sessions asynchronously. Colleen Morgan organized a session on blogging in archaeology. As one would expect from the subject matter, a great deal of excellent and fascinating discussion can be found online, contributed by many thoughtful archaeological bloggers. Here’s a link to a post that kicked off the discussion. The other digitally themed session was organized by Josh Wells, convener of DDIG. This session, an electronic symposium, also has excellent Web content published on Visible Past. Visible Past is an electronic publication platform built on WordPress, a powerful blogging application. These papers (they are more formal and less conversational, so I’ll call them “papers”, not “posts”) can be found here:



This is a topical blog about archaeology and digital data, so this post may appear off topic at first, but trust me, it is not.

The Republican Party (or GOP), in its quest to appear like the party of “fiscal responsibility” [sic], has launched a new crowd-sourcing site to go after “questionable” grants made by the National Science Foundation (NSF). NSF funds some archaeology, so this development is of interest to readers of Digging Digitally.

While one can take issue with the wisdom of cutting NSF’s budget versus other areas of the federal budget, what makes this development noteworthy is the explicit use of crowd-sourcing to politicize specific funding decisions. The GOP-sponsored site asks users to:

In the “Search Award For” field, try some keywords, such as: success, culture, media, games, social norm, lawyers, museum, leisure, stimulus, etc. to bring up grants. If you find a grant that you believe is a waste of your taxdollars, be sure to record the award number.

OK. So does that mean “museums”, “social norms” and “culture” are all implicitly a waste of money? I guess “success” is a waste too. Naturally, you can’t cut any other area of government spending (like defense or entitlements) from the GOP site. It’s a nice way to make “crowd-sourcing” less than democratic, since essentially this website predetermines your choices in what you will cut. But I’m going off track…

More to the point, how should the average lay person understand an NSF award enough to evaluate it, especially when all that is available is a title and a short abstract? I’m not qualified to evaluate many grants in archaeology because different areas of specialization require so much background knowledge. I consider myself pretty scientifically literate and I can barely understand NSF award information in some areas of computer science, economics, climate research, etc.

Nevertheless, I trust that the NSF awards in these areas outside of my field are probably worthwhile. That’s because I generally trust the scientific community and scientific processes (grant reviews, peer-review). Science is not perfect, but it does tend to value skepticism, evidence, and intellectual freedom.

The GOP’s crowd-sourcing effort shows an implicit, but fundamental distrust of the scientific community. The GOP wants you to second-guess expert opinion, because scientific expertise is by its nature suspect in contemporary Republican Party ideology. No doubt this will further politicize climate science, evolutionary science, and many other areas archaeologists care about.

Lastly, the whole “fiscal responsibility” thing is pretty laughable. Via Twitter, Tom Scheinfeldt wrote:

Total NSF budget=$7 billion. Cost of yesterday’s tax cuts=$700 billion. Targeting NSF is just a smokescreen to keep budget hawks preoccupied

Good point! Via the GOP site, I politely sent a note relaying Tom’s point and suggesting that they could look for budget savings more fruitfully in entitlements or defense spending.

I came across a post in the Through the Kaleidoscope blog that got me thinking. “Crowd science – where masses of people participate in data collection for science projects – is growing … Astronomy is the area in which crowd science has been most frequently used, which makes sense given the field’s massive scale and large datasets. One example is the ten-year old SETI@home project …” I must admit here that I’ve been participating in the latter project since May 1999, which puts me in the 89th percentile of all 1.1 million SETI enthusiasts  :-)  I run the project using UC Berkeley’s BOINC, a commonly used, multiplatform open-source program for volunteer computing and grid computing. BOINC facilitates running several projects at the same time according to selected settings. For instance, I’m also active in other projects: Einstein@home, MilkyWay@home (astronomy), (climatology), Rosetta@home, (medical research), SZTAKI Desktop Grid (math), Quake Catcher Network (seismology). At one time, I also participated in non-BOINC projects, but that was too cumbersome. The BOINC projects have attracted a lot of creative programmers, so that there are, for example, at least seven websites where you can easily access your statistics, both by project and combined. Each project awards credits for work done, allowing cross-project comparison and combination of your “scores.” It all serves to involve the participants and make them feel invested. There is even a way to have important milestones in your efforts posted on your Facebook account, e.g., on September 3, I passed the 6,000 credit milestone for

So what could we do with this crowd-sourced/distributed-computing approach in archaeology? After all, just like astronomy and medical research, we too have a lot of goodwill from the general public directed at us. There has to be a way to channel some of this. Surely, we can find some huge data sets that need processing and whose results can be appealing to a general audience? The above blog post also discusses another angle: Galaxy Zoo, a project in which people help classify galaxies from Hubble Telescope images, a task that is hard to computerize. Some museums are letting the public tag artifacts online, a way to enhance the often-brief information available in the database (see the Steve Project). This is still primarily for art though, not archaeological artifacts. We all know that our budgets won’t increase in the near future; on the contrary. Let’s get creative!

And now for something a bit different: “… volunteers are gathering in cities around the world to help bolster relief groups and government first responders in a new way: by building free open-source technology tools that can help aid relief and recovery in Haiti. ‘We’ve figured out a way to bring the average citizen, literally around the world, to come and help in a crisis,’ says Noel Dickover, co-founder of Crisis Commons, which is organizing the effort.” (source: NYT article)

Update 2-17-10: Wired magazine has set up its own Haiti webpage: Haiti Rewired.

One brief additional note on Freebase:

Mia Ridge, another archaeologist with an interest in informatics, also has more on Freebase and pointed to an International Herald Tribune article about the system.

I’ve been poking around an interesting commercial initiative called “Freebase”, an open access / open licensed (using the Creative Commons attribution license) web-based data sharing system developed by Metaweb. Metaweb is a commercial enterprise, and according to their FAQ they plan on making money through some sort of fee structure on using their API (translation for archaeologists: an interface enabling machine-to-machine communication). Here’s a link to other bloggers’ reactions, with lots of interesting discussion of Freebase.

I haven’t had any luck finding out how Freebase works, or what its underlying architecture is like. Given the shape of the Metaweb logo (triple lobes), I can only guess they have an RDF data store (a big database of RDF triples). We’ll have an opportunity to learn more shortly, because Robert Cook of Metaweb has kindly agreed to speak about these efforts in our Information and Service Design Lecture series (at the UC Berkeley School of Information).

(Editing note: Here is a much more complete description of Freebase’s conceptual organization.)

However, my first impressions of surfing through Freebase remind me a lot of some of the data structures we’ve been using in Open Context, which is based on the OCHRE project’s ArchaeoML global schema (database structure). For example, Freebase seems to emphasize items of observation that have descriptive properties and contextual relationships with other items. Open Context works just like that, but, being designed for the field sciences and material collections, Open Context assumes observations have some spatial relationships with one another (especially spatial containment). The overall point is that these systems offer data contributors tremendous flexibility in how they organize and describe their observations, while still enabling interoperability and a common set of tools for exploring and using multiple datasets. It’s a way of sharing data without forcing people into inappropriate, rigid, or overspecified standards.
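
To make the idea concrete, here is a minimal sketch of that kind of structure: items that carry free-form descriptive properties, links to other items, and a spatial container. It is only an illustration under my own assumptions. The class name `Item`, its fields, and the sample records are hypothetical, and this is neither the ArchaeoML schema nor Freebase’s actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Item:
    """A hypothetical 'item of observation' (not ArchaeoML or Freebase itself)."""
    label: str
    # Free-form descriptive properties chosen by the data contributor.
    properties: Dict[str, str] = field(default_factory=dict)
    # Spatial containment: the item that physically contains this one.
    parent: Optional["Item"] = None
    # Other contextual relationships, stored as (relation, other item) pairs.
    links: List[Tuple[str, "Item"]] = field(default_factory=list)

    def context_path(self) -> List[str]:
        """Walk up the containment chain and return labels from outermost to innermost."""
        path = []
        node = self
        while node is not None:
            path.append(node.label)
            node = node.parent
        return list(reversed(path))


# Made-up sample data: a site contains a trench, which contains two finds.
site = Item("Example Site")
trench = Item("Trench 1", parent=site)
bone = Item("Bone 42", {"taxon": "Ovis aries", "element": "femur"}, parent=trench)
pot = Item("Pot 7", {"ware": "coarse"}, parent=trench)
pot.links.append(("found with", bone))

print(" / ".join(bone.context_path()))
```

Each contributor can use whatever property names suit their project, while tools that only understand labels, links, and containment can still browse every dataset the same way.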

Freebase looks more flexible in this regard (being designed for a wider set of applications). Freebase clearly has lots more professionalism in design and execution, and has an incredibly interesting API. It’s also great to see tools for data authors to share schemas (ways of organizing and describing datasets). All this shows you what great talent and venture capital funding delivers, and I’m duly impressed (and maybe a little jealous)!

We’re just now looking at RESTful web services for Open Context, and Freebase may offer an invaluable model, or set of design parameters, for opening up systems for machine-to-machine interactions. In fact, making Open Context “play well” with a powerful commercial service such as Freebase would offer great new opportunities for our user community (choices of interfaces and tools).

Archaeology is a broad and diverse discipline, and making sure archaeologists can easily move data between different tools (blogs, online databases, and visualization environments like Google Earth) is an important need. We should take a serious look at systems like Freebase to make sure we’re best serving our community when we build such “cyberinfrastructure” systems.

BTW, anyone is welcome to help work with us on an archaeological web-services project. Open Context, unlike Freebase (which is a service built on a commercial product), is open sourced and you can get the source code here. It might be fun to come up with interesting ways to connect Freebase with Open Context.

I sent out an email call for nominations to the ArchaeoInformatics advisory board to the 800 or so people on the DDIG email list. The response to the email was truly overwhelming, with 20 nominations coming in within 18 hours of my email (sent at 9:00 PM, PST).

In contrast, I received one response to the weblog post made the day before. It’s an interesting observation about communication in the scholarly, research, and maybe larger professional world. There’s something about an email that provokes a response. It is personally directed, it sits in your inbox highlighted as unread until you do something about it, and once you’ve responded, you feel like you’ve earned a little bit of your paycheck. An inbox is like a little to-do list that fills up every day.

A website like this blog contrasts greatly. One can visit anonymously and not get the same “to-do” list incentive to act on it. At least that’s my impression of how things work for many professional researchers and scholars.

All of this probably has some bearing on the success and failure of collaborative systems for scholarly communication. If you want participation, and want people to feel like they are acting productively, it seems important to leverage the psychology of the inbox.

Tom Elliott, Executive Director at the Pleiades Project, recently asked for my thoughts on where the field of digital humanities (especially with regard to archaeology) was going in the next few years. I basically wrote back saying I had no idea, but that there are some hints about greater access, interoperability, and linkages with the commercial sector. So, here are some random and poorly organized ideas I shared with him:

  • I think we’re seeing pretty explosive growth of a whole suite of online services for humanities. I’m struck by the growing awareness of the importance of standards (OAI-PMH, GeoRSS, COinS, etc.) and I think we’ll see increasing concern with interoperability, scalability, and extensibility in architectures. Initiatives that “play nicely” with each other will win out over stand-alone silos. Tom linked to this important page illustrating some essential features cross-service interoperability should support.
  • I’m not sure that some of the Web 2.0 developments (folksonomies, wikis, etc.) will catch on for scholars, but blogging will probably grow. I think online services will probably do more monitoring and data collection of user behavior and those data will be used to deliver better services.
  • I think we’re also going to see much more available in the way of open access and Creative Commons licensed materials, even by institutions that have resisted these moves in the past (scholarly societies and museums). I’m seeing some individual researchers open up their entire field projects to more or less comprehensive transparency. That will put a premium on methodological quality and project management. It may also lead to some embarrassment for senior scholars who look good on paper, but keep a very sloppy and incomplete record of their primary research activities.

Last, but not least, I think many of these efforts will capture a lot more commercial attention, since effective strategies in dealing with complex, semi-structured, and often spatially located content will have applications outside scholarship. Similarly, we’ll see more adaptation of commercial tools and services to meet scholarly needs. I think this interaction with the commercial sector will yield the biggest surprises with the most impact. It’s already happening with Google, but will continue in exciting ways.


Well that was fast. It took me all of 30 seconds to find discussion that contradicted my impression that Web 2.0 social / collaborative tools were not catching on in scholarship. Look at the impressions of Library 2.0 for a completely different take on the issue (link from the Stoa Consortium). It’s a very interesting read.

I just finished installing COinS metadata into parts of Open Context. COinS is a lightweight, relatively easy to implement standard for expressing Dublin Core metadata (or “information about information”, as in a library catalog). Dublin Core is a very widely used set of metadata. It’s found in RSS feeds and it is the standard used by the pioneering Archaeology Data Service (UK).
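
For readers curious what this looks like in practice, here is a minimal sketch of building a COinS tag in Python. A COinS record is an empty HTML `span` with the class `Z3988` whose `title` attribute holds an OpenURL ContextObject encoded as key/value pairs. The function name `coins_span` and the sample record are my own inventions for illustration, and this is not the code running in Open Context.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creator, date, identifier):
    """Build a COinS <span> carrying a few Dublin Core fields (illustrative helper)."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                   # OpenURL ContextObject version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),   # Dublin Core key/value format
        ("rft.title", title),
        ("rft.creator", creator),
        ("rft.date", date),
        ("rft.identifier", identifier),
    ]
    # URL-encode the pairs, then HTML-escape the result so "&" becomes "&amp;".
    query = urlencode(fields)
    return '<span class="Z3988" title="{}"></span>'.format(escape(query, quote=True))


# Made-up record, for demonstration only.
span = coins_span(
    title="Example Trench Record",
    creator="A. Archaeologist",
    date="2007",
    identifier="http://example.org/item/1",
)
print(span)
```

The span is invisible to human readers, but a tool that knows to look for the `Z3988` class can decode the `title` attribute back into a citation.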

Much discussion about metadata centers on interoperability of services and making information easier to find. To these ends, we’re also working on making Open Context compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
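
As a rough illustration of what harvesting involves, an OAI-PMH harvester sends plain HTTP requests with a `verb` parameter (such as `Identify` or `ListRecords`) and gets XML back. The sketch below only builds the request URLs. The base URL is a placeholder I made up, not a real Open Context endpoint, and `oai_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder endpoint, assumed for illustration only.
BASE_URL = "https://example.org/oai"


def oai_request(verb, **params):
    """Return the URL for an OAI-PMH request with the given verb and arguments."""
    query = urlencode([("verb", verb)] + sorted(params.items()))
    return "{}?{}".format(BASE_URL, query)


# Ask the repository to describe itself.
identify_url = oai_request("Identify")

# Ask for records expressed as unqualified Dublin Core (the "oai_dc" format).
list_records_url = oai_request("ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_records_url)
```

A real harvester would then fetch these URLs and parse the XML responses, following resumption tokens when a repository splits a long record list across several responses.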

Besides being important for back-end interoperability, metadata also has much more user-centered applications. RSS really popularized Dublin Core. It made it much more than a librarian issue, and turned virtually everyone with a weblog into a Dublin Core metadata author.

Zotero, a breakthrough project out of George Mason University, promises to make digital metadata much more a part of the daily lives of scholars. Zotero is a free, open source citation tool that plugs into the Firefox browser. It scans every webpage you view, ranging from weblog posts to articles in JSTOR, and looks for metadata. It uses this metadata to automatically capture bibliographic reference information. That saves researchers a great deal of tedium and reduces annoying typographic errors in building up their reference databases.

COinS is one of the standards for expressing Dublin Core supported by Zotero, and that’s why we use it in Open Context. And we’re not the only ones to realize the significance of Zotero’s automatic bibliographic tools. The Pleiades Project (an NEH-funded open access initiative developing scholarly resources and community around ancient geography) is also compliant with Zotero.

These types of tools will do much to bootstrap digital dissemination of research. Easy capture of bibliographic information makes Web resources very convenient. It’s also amazing how some of the simple features (COinS is very easy to implement) make such a difference in ease of use and relevance for scholarship.

It is very exciting to see these developments come together!

Stuart Jeffrey of the Archaeology Data Service recently forwarded the following conference announcement to share with DDIG members:

Data Sans Frontières: web portals and the historic environment

25 May 2007: The British Museum, London

Organised by the Historic Environment Information Resources Network (HEIRNET) and supported by the AHRC ICT Methods Network and the British Museum, this one-day conference takes a comprehensive look at exciting new opportunities for disseminating and integrating historic environment data using portal technologies and Web 2.0 approaches. Bringing together speakers from national organisations, national and local government and academia, options for cooperation at both national and international levels will be explored.

The aims of the conference are:

  • To raise awareness of current developments in the online dissemination of Historic Environment Data
  • To set developments in the historic environment sector in a wider national and European information context
  • To raise awareness of current portal and interoperability technologies
  • To create a vision for a way forward for joined up UK historic environment information provision

This conference should be of interest to heritage professionals, researchers and managers from all sectors.

The conference costs £12 and a full programme and online registration facilities are available online. There may be tickets available on the day, but space is limited so please register as soon as possible.
