semantic web


I’m happy to join a fantastic team, led by Tom Elliott, Sebastian Heath, and John Muccigrosso, on an NEH-funded “institute” called LAWDI (Linked Ancient World Data Institute). I promise it will have plenty of the enthusiasm and fervor implied by its acronym. To help spread the word, I’m reusing some of the text Tom Elliott circulated on the Antiquist email list:

The Institute for the Study of the Ancient World at New York University will host the Linked Ancient World Data Institute (LAWDI) from May 31st to June 2nd, 2012 in New York City. Applications are due 17 February 2012.

LAWDI, funded by the Office of Digital Humanities of the National Endowment for the Humanities, will bring together an international faculty of practitioners working in the field of Linked Data with twenty attendees who are implementing or planning the creation of digital resources.

More information, including a list of faculty and application instructions, is available at the LAWDI page on the Digital Classicist wiki:

http://wiki.digitalclassicist.org/Linked_Ancient_World_Data_Institute

(Cross posted on Heritage Bytes)

We’re delighted to announce that Archaeology 2.0: New Approaches to Communication and Collaboration is now available via the University of California’s eScholarship repository, at the following link: http://escholarship.org/uc/item/1r6137tb

This book explores the social use and context of the World Wide Web within the discipline of archaeology. While the Web has radically altered journalism, commerce, media, and social relationships, it has seen very uneven adoption in professional scholarly contexts. Case studies discussed in this book help illuminate patterns of adoption of, and resistance to, new forms of scholarly communication and data sharing. These case studies explore social media, digital preservation, and cultural representation concerns, as well as technical and semantic challenges and approaches toward data interoperability. Contributors to this volume debate the merits and sustainability of open access publishing and how the Web mediates interactions between professional and nonprofessional communities engaged in archaeology.


Archaeology 2.0 is the first book in the Cotsen Institute’s new Digital Archaeology Series (http://escholarship.org/uc/search?entity=cioa_cda). The editors want to thank all of the book’s contributors, and also the Cotsen Institute of Archaeology Press, especially Julie Nemer, Carol Leyba, and Willeke Wendrich. The printed version will be available for purchase shortly.

Sebastian Heath has an interesting discussion about museum identifiers. This is part of his ongoing project to document museum and online archaeological-collections identification schemes. Sebastian referenced a discussion circulated by Martin Doerr of the Center for Cultural Informatics on Crete (and of CIDOC fame) about aligning Web identifiers in museums toward some common design standards.

For instance, the Rosetta Stone has the PRN number YCA62958; hence the “official” URI of the Rosetta Stone is: http://collection.britishmuseum.org/object/YCA62958 . This URI should never become the direct address of a document.

I absolutely agree with Sebastian on his points about providing human-readable pages and avoiding divisions between the semantic and the “plain” Web (contra the second sentence in the quote above).
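To make that “one web” point concrete, here is a minimal Python sketch of how a single object URI can serve both people and machines through ordinary HTTP content negotiation. I’m assuming the British Museum’s server honors these Accept headers; the pattern is what matters, not this particular endpoint.

```python
# Minimal sketch of HTTP content negotiation: one URI, two
# representations. Whether this endpoint actually honors these
# Accept headers is my assumption.
import requests

uri = "http://collection.britishmuseum.org/object/YCA62958"

# Ask for machine-readable RDF...
rdf = requests.get(uri, headers={"Accept": "application/rdf+xml"})

# ...and for an ordinary web page, from the very same URI.
html = requests.get(uri, headers={"Accept": "text/html"})

print(rdf.status_code, rdf.headers.get("Content-Type"))
print(html.status_code, html.headers.get("Content-Type"))
```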

Beyond those architecture issues, however, I think the politics of naming and identifying cultural heritage will be a very interesting problem for semantic web approaches. Custody over the Rosetta Stone is in some dispute. The Elgin Marbles are even more contested. I’m sure that some people in Greece would have a problem with “britishmuseum.org” in the internationally recognized / official / canonical URI(s) for the Elgin Marbles. In other words, naming and identifying things can be somewhat political, and that will work against attempts to harmonize identifiers. I’m sure there will always be a need for third parties to cross-reference identifiers.

I suspect issues like this will pose big problems to attempts to rationalize identifiers. That’s part of the reason why some digital library folks favor opaque identifiers. Of course, this digital library perspective is not universally shared.

It will be interesting to see how this discussion unfolds in cultural heritage applications.

Updated (Nov. 2):

  1. I should also note that the “Museums and the machine-processable web” wiki (a fantastic resource and community hub!!) has some excellent discussion of these issues.
  2. Sebastian continued the discussion in this post.

Chris Rusbridge (Digital Curation Centre, Edinburgh, UK) wrote an interesting post in his Digital Curation Blog reflecting on, among other things, the book Data and Reality by William Kent:

The book is full of really scary ways in which the ambiguity of language can cause problems for what Kent often calls “data processing systems”. He quotes Metaxides: “Entities are a state of mind. No two people agree on what the real world view is.”

“… the thing that makes computers so hard is not their complexity, but their utter simplicity… [possessing] incredibly little ordinary intelligence.” I do commend this book to those (like me) who haven’t had formal training in data structures and modelling. I was reminded of this book by the very interesting attempt by Brian Kelly to find out whether Linked Data could be used to answer a fairly simple question. His challenge was ‘to make use of the data stored in DBpedia (which is harvested from Wikipedia) to answer the query “Which town or city in the UK has the highest proportion of students?”’

… the answer Cambridge. That’s a little surprising, but for a while you might convince yourself it’s right; after all, it’s not a large town and it has 2 universities based there. The table of results shows the student population as 38,696, while the population of the town is… hang on… 12? So the percentage of students is 3224%.

There is of course something faintly alarming about this. What’s the point of Linked Data if it can so easily produce such stupid results? Or worse, produce seriously wrong but not quite so obviously stupid results? But in the end, I don’t think this is the right reaction. If we care about our queries, we should care about our sources; we should use curated resources that we can trust. Resources from, say… the UK government? And that’s what Chris Wallace has done.

The answer he came up with was Milton Keynes which is the headquarters of the Open University which has practically no students locally as they are typically long-distance learners…

So if you read the query as “Which town or city in the UK is home to one or more universities whose registered students divided by the local population gives the largest percentage?”, then it would be fine. And hang on again. I just made an explicit transition there that has been implicit so far. We’ve been talking about students, and I’ve turned that into university students. We can be pretty sure that’s what Brian meant, but it’s not what he asked. If you start to include primary and secondary school students, …

My sense of Brian’s question is “Which town or city in the UK is home to one or more university campuses whose registered full or part time (non-distance) students divided by the local population gives the largest percentage?”. Or something like that (remember Metaxides, above). Go on, have a go at expressing your own version more precisely!

He ends his investigation with “I’m beginning to worry that Linked Data may be slightly dangerous except for very well-designed systems and very smart people…”
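For the curious, here is roughly what Brian’s challenge looks like in practice, as a small Python script against DBpedia’s public SPARQL endpoint. The property names (dbo:city, dbo:numberOfStudents, dbo:populationTotal) are my assumptions, since DBpedia’s vocabulary has shifted over time; and as the quoted discussion makes painfully clear, the hard part is not the syntax but deciding what the question actually means.

```python
# Sketch of a DBpedia query in the spirit of Brian Kelly's challenge.
# Property names are assumptions; results will be only as sensible as
# the underlying Wikipedia-derived data (see Cambridge, above).
import requests

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?town ?students ?population ((?students / ?population) AS ?proportion)
WHERE {
  ?uni a dbo:University ;
       dbo:city ?town ;
       dbo:numberOfStudents ?students .
  ?town dbo:populationTotal ?population .
}
ORDER BY DESC(?proportion)
LIMIT 10
"""

resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["town"]["value"], row["proportion"]["value"])
```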

Shawn Graham got the ball rolling with his discussion of using Second Life as an instructional platform for archaeology. It seems to have had some resonance with other archaeo-bloggers (see ClioAudio and ArchaeoGeek). ArchaeoGeek noted some fascinating work attempting to bring GIS-type capabilities into Second Life. There is even an elaborate model of downtown Berkeley, including a BART station.

Shawn also rightly discusses some concerns that people have voiced. These comments show some worry that we’re in danger of putting our data eggs in one basket, and becoming dependent on yet another commercial platform (as in my previous discussion of Google, and how much we’ve come to rely on it). Given all the data preservation problems caused by closed, proprietary file formats and software, these are valid issues.

However, Linden Labs is pretty good in this regard, and I wouldn’t put Second Life in the same realm as Microsoft or even Google. Mitch Kapor (of Lotus fame, and now Second Life’s major investor) recently gave a talk at the UC Berkeley iSchool about Second Life (link to podcast). He talked about how Linden Labs is doing much to open up its infrastructure: it has already “open sourced” its client software and will soon do the same with its backend infrastructure. Others will soon be able to run a Second Life server on their own. I think this kind of data portability in virtual worlds makes using Second Life, and investing some effort in playing with it, much more worthwhile and less risky.

In any event, reliance on any one system is probably dangerous, and there are good immediate and practical reasons for avoiding such digital monoculture. Certain systems are best for certain types of applications. Second Life is great for visualization and for offering rich, shared experiences. But it’s probably not the kind of thing I’d use to run a statistical analysis of potsherd distributions. That said, Second Life doesn’t have to do that, because Linden Labs is making it easier to integrate with systems that do offer such capabilities.

I think a lot of interesting things will happen in systems like Second Life (and Google Earth). However, I think the most interesting things will happen between and among such systems as they work together as an ecosystem exchanging data, drawing upon a diverse array of powerful web services (delivering XML-encoded data, or similar formats like JSON) from data providers such as Nabonidus, Open Context, Freebase, Google Docs, the Portable Antiquities Scheme, and others.
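The glue code for that kind of ecosystem can be remarkably small. Here is an illustrative Python sketch; the feed URL and field names are hypothetical, standing in for whatever a given provider actually exposes.

```python
# Hypothetical mash-up glue: pull JSON from one provider and reshape it
# for a mapping or visualization client. The URL and field names below
# are invented for illustration only.
import requests

FEED_URL = "https://example.org/api/finds.json"  # hypothetical provider

finds = requests.get(FEED_URL).json()

# Reshape into simple (label, lat, lon) tuples that a client such as
# Google Earth (via KML) or a Second Life object could consume.
placemarks = [(f["label"], f["lat"], f["lon"]) for f in finds]
print(placemarks[:5])
```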

Of course, all this leads directly into standards questions. I tend to favor simple, incremental (or “gracefully degradable”) standards, since this approach seems like the most feasible way of exchanging at least some data. I’ll write some more on the standards question shortly.

One brief additional note on Freebase:

Mia Ridge, another archaeologist with an interest in informatics, also has more on Freebase and pointed to an International Herald Tribune article about the system.

I’ve been poking around an interesting commercial initiative called “Freebase”, an open access / open licensed (using the Creative Commons attribution license) web-based data sharing system developed by Metaweb. Metaweb is a commercial enterprise, and according to their FAQ they plan on making money through some sort of fee structure for using their API (translation for archaeologists: an interface enabling machine-to-machine communication). Here’s a link to other blogger reactions, with lots of interesting discussion of Freebase.

I haven’t had any luck finding out how Freebase works, or what its underlying architecture is like. Given the shape of the Metaweb logo (triple lobes), I can only guess they have an RDF data store (a big database of RDF triples). We’ll have an opportunity to learn more shortly, because Robert Cook of Metaweb has kindly agreed to speak about these efforts in our Information and Service Design lecture series (at the UC Berkeley School of Information).
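For readers unfamiliar with the idea, here is a toy illustration in Python (my own, nothing from Metaweb) of what an RDF-style triple store amounts to: every assertion is a (subject, predicate, object) statement, and queries are just pattern matches over those statements.

```python
# Toy triple store: assertions are (subject, predicate, object) tuples.
triples = {
    ("rosetta_stone", "type", "inscription"),
    ("rosetta_stone", "held_by", "british_museum"),
    ("british_museum", "located_in", "london"),
}

def match(s=None, p=None, o=None):
    """Return every triple consistent with the pattern; None is a wildcard."""
    return [t for t in triples
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]

# "What do we know about the Rosetta Stone?"
print(match(s="rosetta_stone"))
```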

(Editing note: Here is a much more complete description of Freebase’s conceptual organization.)

However, my first impressions from surfing through Freebase remind me of some of the data structures we’ve been using in Open Context, which is based on the OCHRE project’s ArchaeoML global schema (database structure). For example, Freebase seems to emphasize items of observation that have descriptive properties and contextual relationships with other items. Open Context works just like that, but, being designed for the field sciences and material collections, Open Context assumes observations have some spatial relationships with one another (especially spatial containment). The overall point is that these systems offer data contributors tremendous flexibility in how they organize and describe their observations, while still enabling interoperability and a common set of tools for exploring and using multiple datasets. It’s a way of sharing data without forcing people into inappropriate, rigid, or over-specified standards.
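As a rough sketch of that item-centered pattern (my own simplification, not actual Open Context or ArchaeoML code): items of observation carry free-form descriptive properties plus a chain of spatial containment.

```python
# Simplified, hypothetical model of the ArchaeoML-style pattern described
# above: items with free-form properties and spatial containment.
from dataclasses import dataclass, field

@dataclass
class Item:
    label: str
    properties: dict = field(default_factory=dict)  # free-form descriptions
    contained_in: "Item | None" = None              # spatial containment

trench = Item("Trench 5")
locus = Item("Locus 12", {"soil color": "7.5YR 4/3"}, contained_in=trench)
sherd = Item("Sherd 12-044", {"ware": "cooking ware"}, contained_in=locus)

# Walking up the containment chain recovers an item's full context.
context, node = [], sherd
while node:
    context.append(node.label)
    node = node.contained_in
print(" < ".join(context))  # Sherd 12-044 < Locus 12 < Trench 5
```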

Freebase looks more flexible in this regard (being designed for a wider set of applications). Freebase clearly shows lots more professionalism in design and execution, and has an incredibly interesting API. It’s also great to see tools for data authors to share schemas (ways of organizing and describing datasets). All this shows what great talent and venture capital funding can deliver, and I’m duly impressed (and maybe a little jealous)!

We’re just now looking at RESTful web services for Open Context, and Freebase may offer an invaluable model, or set of design parameters, for opening up systems to machine-to-machine interactions. In fact, making Open Context “play well” with a powerful commercial service such as Freebase would offer great new opportunities for our user community (more choices of interfaces and tools).

Archaeology is a broad and diverse discipline, and making sure archaeologists can easily move data between different tools (blogs, online databases, and visualization environments like Google Earth) is an important need. We should take a serious look at systems like Freebase to make sure we’re best serving our community when we build such “cyberinfrastructure” systems.

BTW, anyone is welcome to work with us on an archaeological web-services project. Open Context, unlike Freebase (which is a service built on a commercial product), is open source, and you can get the source code here. It might be fun to come up with interesting ways to connect Freebase with Open Context.

Stuart Jeffrey of the Archaeology Data Service recently forwarded the following conference announcement to share with DDIG members:

Data Sans Frontières: web portals and the historic environment

25 May 2007: The British Museum, London

Organised by the Historic Environment Information Resources Network (HEIRNET) and supported by the AHRC ICT Methods Network and the British Museum, this one-day conference takes a comprehensive look at exciting new opportunities for disseminating and integrating historic environment data using portal technologies and Web 2.0 approaches. Bringing together speakers from national organisations, national and local government and academia, options for cooperation at both national and international levels will be explored.

The aims of the conference are:

  • To raise awareness of current developments in the online dissemination of Historic Environment Data
  • To set developments in the historic environment sector in a wider national and European information context
  • To raise awareness of current portal and interoperability technologies
  • To create a vision for a way forward for joined up UK historic environment information provision

This conference should be of interest to heritage professionals, researchers, and managers from all sectors.

The conference costs £12, and a full programme and online registration facilities are available at http://www.britarch.ac.uk/HEIRNET/. There may be tickets available on the day, but space is limited, so please register as soon as possible.

Wow! The Stoa Consortium group blog recently highlighted a very interesting development noted by the blog “Semantic Humanities”.
Here’s a really interesting collection of software resources and tools being developed at MIT under the banner of the SIMILE project. Most are based around RDF (the Resource Description Framework), a W3C standard that is the backbone of efforts to develop the Semantic Web.

It looks like many of the tools here can help in dealing with many of the interoperability concerns faced by researchers. For instance, archaeological datasets would be more valuable when seen against various environmental and ecological datasets. Tools like this can help enable “mash-ups” between resources developed in these various disciplines.

Their Timeline widget should be of immediate interest to archaeologists, because it provides a ready-made software tool for graphically representing chronology (in the form of an easy-to-navigate timeline). It looks very clean and easy to implement, and would make an impressive navigation tool for archaeological resources on the web. This would be a great browsing and visualization enhancement for Open Context.
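Timeline appears to load its events from a simple XML feed, so generating one from excavation phasing data should be straightforward. Here is a hedged Python sketch; the element and attribute names follow my reading of the Timeline examples and should be checked against the current documentation.

```python
# Sketch: generate a Timeline-style XML event feed from phasing data.
# Attribute names and date formats are my assumptions; verify against
# the SIMILE Timeline documentation before relying on them.
import xml.etree.ElementTree as ET

phases = [
    ("Early Bronze Age occupation", "-3000", "-2000"),
    ("Iron Age reuse", "-800", "-400"),
]

data = ET.Element("data")
for title, start, end in phases:
    ET.SubElement(data, "event",
                  {"start": start, "end": end,
                   "title": title, "isDuration": "true"})

ET.ElementTree(data).write("events.xml", encoding="utf-8")
```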

All their work is free and open source. It’s definitely worth exploring.