

I’m happy to join a fantastic team, led by Tom Elliott, Sebastian Heath, and John Muccigrosso, on an NEH-funded “institute” called LAWDI (Linked Ancient World Data Institute). I promise it will have plenty of the enthusiasm and fervor implied by its acronym. To help spread the word, I’m reusing some of the text that Tom Elliott circulated on the Antiquist email list:

The Institute for the Study of the Ancient World at New York University will host the Linked Ancient World Data Institute (LAWDI) from May 31st to June 2nd, 2012 in New York City. Applications are due 17 February 2012.

LAWDI, funded by the Office of Digital Humanities of the National Endowment for the Humanities, will bring together an international faculty of practitioners working in the field of Linked Data with twenty attendees who are implementing or planning the creation of digital resources.

More information, including a list of faculty and application instructions, is available at the LAWDI page on the Digital Classicist wiki:

http://wiki.digitalclassicist.org/Linked_Ancient_World_Data_Institute

(Cross posted on Heritage Bytes)

We’re delighted to announce that Archaeology 2.0: New Approaches to Communication and Collaboration is now available via the University of California’s eScholarship repository, at the following link: http://escholarship.org/uc/item/1r6137tb

This book explores the social use and context of the World Wide Web within the discipline of archaeology. While the Web has radically altered journalism, commerce, media, and social relationships, it has seen very uneven adoption in professional scholarly contexts. Case studies discussed in this book help illuminate patterns of adoption of, and resistance to, new forms of scholarly communication and data sharing. These case studies explore social media, digital preservation, and cultural representation concerns, as well as technical and semantic challenges and approaches toward data interoperability. Contributors to this volume debate the merits and sustainability of open access publishing and how the Web mediates interactions between professional and nonprofessional communities engaged in archaeology.


Archaeology 2.0 is the first book in the Cotsen Institute’s new Digital Archaeology Series (http://escholarship.org/uc/search?entity=cioa_cda). The editors want to thank all of the book’s contributors, and also the Cotsen Institute of Archaeology Press, especially Julie Nemer, Carol Leyba, and Willeke Wendrich. The printed version will be available for purchase shortly.

DDIG member Ethan Watrall (Asst. Professor of Anthropology @ MSU) sends us the following information about his upcoming Cultural Heritage Informatics (CHI) field school, which is part of the CHI Initiative at Michigan State University.

Excerpts quoted. For full details, please see this PDF LINK.

Site Link: <http://chi.matrix.msu.edu/fieldschool> Email: watrall@msu.edu

We are extremely happy to officially announce the Cultural Heritage Informatics Fieldschool (ANP491: Methods in Cultural Heritage Informatics). Taking place from May 31st to July 1st (2011) on the campus of Michigan State University, the Cultural Heritage Informatics Fieldschool will introduce students to the tools and techniques required to creatively apply information and computing technologies to cultural heritage materials and questions.

The Cultural Heritage Informatics Fieldschool is a unique experience that uses the model of an archaeological fieldschool (in which students come together for a period of 5 or 6 weeks to work on an archaeological site in order to learn how to do archaeology). Instead of working on an archaeological site, however, students in the Cultural Heritage Informatics Fieldschool will come together to work collaboratively on several cultural heritage informatics projects. In the process they will learn a great deal about what it takes to build applications and digital user experiences that serve the domain of cultural heritage: skills such as programming, user experience design, media design, project management, user-centered design, digital storytelling, etc. …

The Cultural Heritage Informatics Fieldschool is open to both graduate students and undergraduates. There are no prerequisites (beyond an interest in the topic). Students from a wide variety of departments, programs, and disciplines are welcome. Students are required to enroll in both sections 301 (3 credits) and 631 (3 credits) of ANP 491 (Methods in Cultural Heritage Informatics).

Admission to the Cultural Heritage Informatics Fieldschool is by application only.

To apply, please fill out the Cultural Heritage Informatics Fieldschool Application Form <http://chi.matrix.msu.edu/fieldschool/chi-fieldschool-application>. Applications are due no later than 5pm on March 14th. Students will be notified as to whether they have been accepted by March 25th.

Sebastian Heath has an interesting discussion about museum identifiers. This is part of his ongoing project to document museum and online archaeological-collections identification schemes. Sebastian referenced a discussion circulated by Martin Doerr of the Center for Cultural Informatics on Crete (and of CIDOC fame) about aligning Web identifiers in museums toward some common design standards.

For instance, the Rosetta Stone has the PRN number YCA62958; hence the “official” URI of the Rosetta Stone is http://collection.britishmuseum.org/object/YCA62958 . This URI should never become the direct address of a document.

I absolutely agree with Sebastian’s points about serving human-readable pages and avoiding divisions between the semantic web and the “plain web” (contra the second sentence in the quote above).
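For what it’s worth, HTTP content negotiation is one way to avoid that division: a single object URI can serve RDF to machines and send people to a human-readable page. Here is a minimal Python sketch (using the requests library); whether the British Museum’s server actually honors these Accept headers in this way is an assumption made purely for illustration:

```python
# A sketch of content negotiation against an object URI. The URI is
# the one from the quote above; whether the server supports these
# media types is assumed here purely for illustration.
import requests

uri = "http://collection.britishmuseum.org/object/YCA62958"

# Machines ask for RDF...
rdf = requests.get(uri, headers={"Accept": "application/rdf+xml"})

# ...people (via their browsers) ask for HTML, and the server can
# redirect each request to an appropriate document without the object
# URI itself ever "being" a document.
html = requests.get(uri, headers={"Accept": "text/html"})

print(rdf.status_code, rdf.headers.get("Content-Type"))
print(html.status_code, html.headers.get("Content-Type"))
```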

Beyond those architecture issues, however, I think the politics of naming and identifying cultural heritage will be a very interesting problem for semantic web approaches. Custody of the Rosetta Stone is in some dispute. The Elgin Marbles are even more contested. I’m sure that some people in Greece would have a problem with “britishmuseum.org” appearing in the internationally recognized / official / canonical URI(s) for the Elgin Marbles. In other words, naming and identifying things can be somewhat political, and that will work against attempts to harmonize identifiers. I’m sure there will always be a need for third parties to cross-reference identifiers.

I suspect issues like this will pose big problems for attempts to rationalize identifiers. That’s part of the reason why some digital library folks favor opaque identifiers. Of course, this digital library perspective is not universally shared.
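To make the opaque-identifier idea concrete, here is a hypothetical Python sketch of a third-party concordance: it mints identifiers that carry no institutional branding and simply records which institutional URIs refer to the same object. The function names and data structure are invented for illustration, not any existing registry’s design:

```python
# Hypothetical sketch: a third-party concordance that maps politically
# loaded institutional URIs onto neutral, opaque identifiers.
import uuid

concordance = {}  # opaque identifier -> set of equivalent institutional URIs

def mint(uris):
    """Mint an opaque identifier and record the URIs it cross-references."""
    oid = uuid.uuid4().hex  # no institution's name embedded anywhere
    concordance[oid] = set(uris)
    return oid

def add_alias(oid, uri):
    """A new (or competing) institutional URI can be added later."""
    concordance[oid].add(uri)

stone = mint(["http://collection.britishmuseum.org/object/YCA62958"])
print(stone, concordance[stone])
```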

It will be interesting to see how this discussion unfolds in cultural heritage applications.

Updated (Nov. 2):

  1. Also, I should note that the “Museums and the machine-processable web” wiki (a fantastic resource and community hub!!) has some excellent discussion of these issues.
  2. Sebastian continued the discussion in this post.

Chris Rusbridge (Digital Curation Centre, Edinburgh, UK) wrote an interesting post in his Digital Curation Blog reflecting on, among other things, the book Data and Reality by William Kent:

The book is full of really scary ways in which the ambiguity of language can cause problems for what Kent often calls “data processing systems”. He quotes Metaxides: “Entities are a state of mind. No two people agree on what the real world view is.”

“… the thing that makes computers so hard is not their complexity, but their utter simplicity… [possessing] incredibly little ordinary intelligence.” I do commend this book to those (like me) who haven’t had formal training in data structures and modelling. I was reminded of this book by the very interesting attempt by Brian Kelly to find out whether Linked Data could be used to answer a fairly simple question. His challenge was to make use of the data stored in DBpedia (which is harvested from Wikipedia) to answer the query “Which town or city in the UK has the highest proportion of students?”

… the answer was Cambridge. That’s a little surprising, but for a while you might convince yourself it’s right; after all, it’s not a large town and it has 2 universities based there. The table of results shows the student population as 38,696, while the population of the town is… hang on… 12? So the percentage of students is 3224%.

There is of course something faintly alarming about this. What’s the point of Linked Data if it can so easily produce such stupid results? Or worse, produce seriously wrong but not quite so obviously stupid results? But in the end, I don’t think this is the right reaction. If we care about our queries, we should care about our sources; we should use curated resources that we can trust. Resources from, say… the UK government? And that’s what Chris Wallace has done.

The answer he came up with was Milton Keynes, which is the headquarters of the Open University, which has practically no students locally as they are typically long-distance learners…

So if you read the query as “Which town or city in the UK is home to one or more universities whose registered students divided by the local population gives the largest percentage?”, then it would be fine. And hang on again. I just made an explicit transition there that has been implicit so far. We’ve been talking about students, and I’ve turned that into university students. We can be pretty sure that’s what Brian meant, but it’s not what he asked. If you start to include primary and secondary school students, …

My sense of Brian’s question is “Which town or city in the UK is home to one or more university campuses whose registered full or part time (non-distance) students divided by the local population gives the largest percentage?”. Or something like that (remember Metaxides, above). Go on, have a go at expressing your own version more precisely!

He ends his investigation with “I’m beginning to worry that Linked Data may be slightly dangerous except for very well-designed systems and very smart people…”
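His worry is easier to appreciate with an actual query in hand. Here is a sketch of roughly the kind of DBpedia query at issue, written in Python with the SPARQLWrapper library. The property names (dbo:populationTotal, dbp:numberOfStudents) are assumptions on my part; Wikipedia infoboxes are inconsistent about them, which is exactly the fragility Chris describes. Note also that nothing in it sanity-checks the population figures:

```python
# A sketch of roughly the kind of DBpedia query at issue. The property
# names used here are assumptions for illustration; nothing below
# checks whether a town's population figure (12, say) is remotely sane.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbp: <http://dbpedia.org/property/>
    SELECT ?place ?students ?pop ((?students / ?pop) AS ?ratio)
    WHERE {
      ?uni dbo:city ?place ;
           dbp:numberOfStudents ?students .
      ?place dbo:populationTotal ?pop ;
             dbo:country <http://dbpedia.org/resource/United_Kingdom> .
    }
    ORDER BY DESC(?ratio)
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print the top candidates; garbage in the source data sails straight
# through to the "answer".
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["place"]["value"], row["ratio"]["value"])
```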

Here’s some great news (esp. considering current economic conditions!) for those of you interested in digital data and archaeology:

Digital Antiquity Seeks a Founding Executive Director

Digital Antiquity seeks an entrepreneurial and visionary Executive Director who can play a central role in transforming the discipline of archaeology by leading the establishment of an on-line repository of the digital data and documents produced by archaeological research in the Americas. Digital Antiquity is a national initiative that is generously funded by the Andrew W. Mellon Foundation.

The Executive Director oversees all Digital Antiquity activities, including hiring and supervising staff, marketing repository services to the professional community, guiding software development, and managing acquisition of repository content.

During its startup phase, Digital Antiquity resides within Arizona State University, and the Executive Director will hold the position of Research Professor at ASU with a 12-month, renewable appointment, excellent benefits, and a rank and attractive salary commensurate with experience. A fixed-term secondment or IPA (paid transfer from another position) would also be considered.

A link to the full job announcement may be found at http://www.digitalantiquity.org/confluence/display/DIGITAQ/Executive+Director+Search. Interested individuals may also contact Keith Kintigh (kintigh@asu.edu) for more information. Consideration of applications will begin May 1, 2009 and will continue until the position is filled.

There’s a fairly close alignment of interests and goals between the folks working for open access to scholarship and open data in science (one of the main themes of this blog) and the folks working for greater government transparency. As is the case with science and scholarship, access to government data can enhance participation (of the civil society kind) and accountability. Our recent work relating to Recovery.gov (here, and here) attempted to bring some of the experience we had in “open data” (for science) to open data for government.

Initially, we were very optimistic. The Office of Management and Budget (OMB) issued guidelines on Feb 18th that required individual agencies participating in the recovery effort to publish feeds disclosing important information about their actions, their spending, and who received money. The great thing about these guidelines was that the very agencies who spent recovery dollars would reveal exactly how they spent the money. There were many missing pieces and unanswered questions in these guidelines, and my colleagues Erik Wilde, Raymond Yee, and I tried to fill in these blanks with this report and demonstration implementation.

However, OMB just issued a new set of revised guidelines that represent a big step backwards from their initial call for decentralized disclosure [UPDATED WITH CLARIFICATION; SEE BELOW]. The decentralized approach is now replaced by a centralized approach in which Recovery.gov publishes all the data. All the information flows, from the agencies to OMB to Recovery.gov, will be opaque to the public. (Actually, according to the guidelines, much of this will take place via email.)

This issue of centralization marks where our group diverges from other transparency advocates. For example, the transparency advocacy group OMB Watch explicitly called for a “Centralized Reporting System” (page 9 of this report) [UPDATED WITH CLARIFICATION; SEE BELOW]. While in some ways convenient, centralization is not required and, in our view, works against transparency. First, feeds can be readily aggregated: with feeds, the disclosure reports of distributed agencies can be brought together for convenient, “one stop shopping” monitoring, as the sketch below illustrates. Second, a centralized reporting source means that all the data gathering and reporting processes happen behind the scenes, in a manner that is not publicly visible. What’s happening in these back-end processes? How is the data being managed and processed? How is it transformed? You end up with “black-box transparency,” which is obviously an oxymoron.
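To show how little machinery aggregation actually requires, here is a minimal Python sketch using the feedparser library. The agency feed URLs are placeholders invented for illustration; the point is that a watchdog, journalist, or portal could build the “one stop shop” in a few lines:

```python
# Minimal sketch of aggregating decentralized agency disclosure feeds.
# The two feed URLs below are hypothetical placeholders.
import time
import feedparser

agency_feeds = [
    "http://agency-one.example.gov/recovery/feed.atom",  # hypothetical
    "http://agency-two.example.gov/recovery/feed.atom",  # hypothetical
]

# Pull every agency's entries into one pool.
entries = []
for url in agency_feeds:
    parsed = feedparser.parse(url)
    entries.extend(parsed.entries)

# Merge everything into a single reverse-chronological view.
entries.sort(key=lambda e: e.get("published_parsed") or time.gmtime(0),
             reverse=True)

for entry in entries[:20]:
    print(entry.get("published", "?"), entry.get("title", "?"))
```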

But that second point gets to the heart of the issue. Transparency advocacy groups need to be much more aware of the architecture issues behind “transparency”. Access to data is not enough. The processes by which the data is gathered, processed, and published also matter.

There’s much more to say about this issue, but in the interim, please look at Erik Wilde’s detailed discussion about why architectures of transparency matter.

Update: Over at the “Open House” discussion list, Gary Bass made an important comment regarding OMB Watch’s position on “centralization”. He wrote:

For the record, and to clarify your blog post, at no time did OMB Watch ever support only sending information to OMB to build a single database.  OMB Watch has always supported comprehensive machine readable feeds (APIs and syndications) from agencies. I also believe that is OMB’s intent based on our reading of the guidance.

His comment and clarification are very welcome, and I stand corrected. I’m glad that this important organization is taking a thoughtful position on this matter.

UPDATE about OMB’s guidelines, regarding page 68 of the revised version: it still says feeds are required, but a few lines down the text says that if an agency is unable to publish a feed, it can do something else (with some instructions about how to do the alternative). Of a 172-page document, only 3 pages (68-70) discuss feeds and their implementation. This suggests that feeds are being de-emphasized as a vehicle for disclosure.

Archaeoinformatics Lecture Series 2008

The Archaeoinformatics Consortium is pleased to announce the participants in the 2007-2008 Virtual Lecture Series schedule. The Virtual Lecture Series involves leaders from around the world and from many disciplines, each of whom will present information on their cyberinfrastructure initiatives and strategies and the ways in which their lessons learned may be useful to archaeology. In addition, there will be presentations from archaeologists describing their successful cyberinfrastructure efforts.

These lectures are presented every other week using the NSF-funded Access GRID video conferencing system. Many universities across the US, UK, and Australia have Access GRID or compatible facilities. It is also possible to participate in the lectures by downloading the presentation slides and joining via a telephone bridge. Information on how to connect to the Access GRID system and its alternatives is provided at http://archaeoinformatics.org/lecture_series.html. The lectures from the 2006-2007 series and this year’s lectures are also available as streaming video from the archaeoinformatics web site.

Archaeoinformatics.org

Archaeoinformatics.org has been established as a collaborative organization to design, seek funding for, and direct a set of cyberinfrastructure initiatives for archaeology. Archaeoinformatics.org seeks to coordinate with other relevant data-sharing initiatives and to develop interoperability between them and its own projects. It offers to work with professional organizations and federal agencies to promote policies that will foster the development of cyberinfrastructure for archaeology. More information is available at http://archaeoinformatics.org

Lecture 7
February 27, 2008, 10:30-12:30 CST

“Collaborative Adventures in Distributed Digital Preservation: The MetaArchive Cooperative and the Educopia Institute”

Katherine Skinner
Digital Projects Librarian at the Emory University Libraries

The challenges presented by digital preservation have required, and inspired, a number of institutions to work cooperatively in order to accomplish meaningful programmatic advances. Among these collaborative ventures, the MetaArchive Cooperative, established in 2004, has developed an organizational model and technical infrastructure (building on the LOCKSS software developed at Stanford University) for preserving the digital assets of archives, museums, data centers, and libraries in a geographically distributed framework. This lecture takes as its focus some of the strategies that the MetaArchive Cooperative has employed in order to support, sustain, and grow its cross-institutional collaboration. During the session, Katherine will explore some of the logistical and organizational issues that have arisen for the Cooperative over the last four years and will talk more generally about the strengths of different organizational structures for accomplishing particular goals.

The MetaArchive Cooperative (http://metaarchive.org) began in 2004 as a collaborative venture of Emory University, Georgia Institute of Technology, University of Louisville, Virginia Polytechnic Institute and State University, Auburn University, Florida State University, and the Library of Congress. The MetaArchive Cooperative has operated a distributed preservation network infrastructure for several years that is based on the LOCKSS software, and has now transformed into an independent, international membership association hosted by the Educopia Institute and based in Atlanta, Georgia.

My colleague Erik Wilde is organizing a workshop on Location and the Web. I’m helping to organize it and have already hit some of the email lists with a call for papers. The types of questions explored by this workshop will be directly relevant to researchers interested in using Google Earth or Second Life for visualization and analysis (for instance). Here’s his call for papers:

the paper submission deadline for the First Workshop on Location and the Web (LocWeb 2008) is only 18 days away. we now have a pretty strong program committee, and i am looking forward to the submitted papers and of course the workshop itself.

so if you are interested in location information and the web, please consider submitting a paper. the workshop is held in beijing and co-located with WWW2008, the 2008 edition of the world’s premier conference in the area of web technologies.

my personal hope for the workshop is that we will be able to get strong submissions in the area of how to make location information available as part of the web, not so much over the web. there are countless examples of applications with location as part of their data model, which are accessible through some web interface, but there are far fewer examples of applications which try to turn the web into a location-aware information system. the latter would be the perfect candidate for the workshop.

Shawn Graham over at the “Electric Archaeology” weblog has a post asking about the use of Second Life to teach archaeology. There is a UC Berkeley Catalhoyuk reconstruction in Second Life now, intended to be a teaching resource (it won an “Open Archaeology Prize”). He has some very interesting ideas about linking archaeological databases dynamically with the virtual world.

I think it’ll be really useful to connect Second Life with different archaeological databases for visualization. Second Life does support connections with other online data sources, or web services (see link). I’ve never done any programming in Second Life, so I’m not sure what sorts of limits the system has in reading outside data.

At any rate, outside databases would have to express their data in a machine-readable format so that the Second Life scripting language could parse the information. XML is an obvious choice, but there needs to be lots of thought on how to apply it to support Second Life visualization (see the parsing sketch at the end of this post).

Most archaeological datasets that I’ve seen don’t have enough spatial information to make an easy and precise mapping into a virtual world. For example, many finds are in the “bulk find” category, and you’ll only know their spatial context approximately (say, from a specific contextual unit). The contextual units themselves, their size, shape, and relative positioning, may be very poorly recorded and documented. Thus, rendering in Second Life will require lots of guesstimation.

Shawn mentions Open Context in his post as an example data source. Open Context does make XML data available for all media, locations & objects, and for its faceted browse. Examples:

(1) Here’s a link to XML data for all small finds from Petra that have pictures (from the faceted browse).

(2) Here’s a link to XML data for a specific sheep radius from Petra.

(3) Here’s another link to XML data for an elephant capital also from Petra.

Although there’s contextual information, the contexts don’t have very clear spatial referencing, so it’ll be hard to simply put these data into a good Second Life 3D view. Having a clear common standard for spatial referencing in 3D would be really useful, as would clear conventions on how to visualize archaeological data when detailed spatial referencing isn’t available.
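To illustrate the parsing step mentioned above, here is a small Python sketch of how an outside consumer (for example, a relay feeding a Second Life script) might read such a record. The XML is a made-up, simplified example, not Open Context’s actual schema:

```python
# Sketch of parsing a (hypothetical, simplified) archaeological XML
# record; this is not Open Context's actual schema.
import xml.etree.ElementTree as ET

record = """
<observation>
  <label>Sheep radius</label>
  <context>Petra / Trench 1 / Locus 12</context>
  <bulk>true</bulk>
</observation>
"""

root = ET.fromstring(record)
label = root.findtext("label")
context = root.findtext("context")
is_bulk = root.findtext("bulk") == "true"

# A bulk find has no precise coordinates, so a 3D view could only place
# it somewhere inside its contextual unit: the guesstimation problem
# noted above.
print(label, "|", context, "| precisely locatable:", not is_bulk)
```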
