I recently had a chance to take a look at the current state of play with the Recovery Act transparency measures. It seems that in the next month or so, some critical decisions will be made, and these decisions will likely have a profound impact on the shape of government transparency measures in the future.

Next week, OMB will issue new guidance for how agencies are required to report on their Recovery-related activities. Also, it looks like there will be some bidding or other processes for contracting out the work of developing a more robust infrastructure and reporting system for the Recovery. Once Recovery-related contracts and grants are made, there will be a tremendous volume of reports that will need management and dissemination. After all, nearly $800 billion in spending, spread over several agencies and countless recipients and sub-contractors, can generate a great deal of financial information.

So, while these plans are being formulated, it is useful to take stock of where we now stand. Recovery.gov still offers reporting information in HTML and Excel formats. These formats are clearly not adequate to the task of public reporting, since they both require the use of custom-developed software scrapers, and these scrapers are not reliable. The scrapers are also difficult to maintain. In monitoring Recovery.gov, we’ve noticed that the site seems to introduce a new Excel template every month or so. These templates alter how reporting data is expressed. They may add or drop fields and change layouts. All of these changes can play havoc with our scrapers. In fact, we usually notice a new template when our scraper crashes.
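To make the template problem concrete, here is a minimal sketch of the kind of check a scraper could run before parsing a report. It is not our actual scraper. The field names in `EXPECTED_FIELDS` and the `check_template` helper are hypothetical, and it assumes the header row has already been pulled out of the spreadsheet as a list of strings.

```python
# Hypothetical field names; the real Recovery.gov templates are not reproduced here.
EXPECTED_FIELDS = ["agency", "program", "amount"]


def check_template(header_row, expected=EXPECTED_FIELDS):
    """Compare a report's header row against the fields a scraper expects.

    Returns (missing, added) as two sorted lists of lower-cased field names.
    """
    seen = {name.strip().lower() for name in header_row}
    wanted = set(expected)
    return sorted(wanted - seen), sorted(seen - wanted)


# A header row with one extra column, as a new template might introduce.
missing, added = check_template(["Agency", "Program", "Amount", "State"])
if missing or added:
    print("Template changed. Missing:", missing, "Added:", added)
```

A check like this at least turns a crash into a readable message about which fields were added or dropped, though it does nothing about the underlying instability.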

But just as importantly, constant change in the templates (and schemas) of the reporting data makes it very difficult to aggregate reports, compare between reports, or do other analysis of pooled reporting data. Changes in the templates create incompatible data. All these changes, which come unannounced and without explanation, throw a monkey-wrench into “transparency”. At least this is a great learning experience. In addition to having structured data made available in open, machine-readable formats (ideally XML), we need to have some stability in the schemas used in the reporting data. Making data incompatible with last month’s reporting is just not helpful.

However, I am not in favor of setting a schema down in stone. Again, we’re all learning about how to “do transparency”, and it may be that some changes in the schemas of reports will be needed and helpful. For instance, as Erik Wilde noted, the latest reports from Recovery.gov have geographic information, and this opens up great possibilities for geographic analyses and visualizations. So kudos to the good folks at Recovery.gov for making this change! At the same time, however, while we need to be flexible and handle new requirements for our reporting data, backwards compatibility must be maintained. Ideally, reporting information should be made available in easily extensible schemas, and there should be good processes to determine how updates to these schemas will be made.
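One way to picture an extensible, backwards-compatible schema is a reader that insists on a small stable core and simply carries along anything new. The sketch below is only an illustration under stated assumptions. The element names (`report`, `agency`, `amount`, `state`) and the `read_report` helper are hypothetical, not the actual Recovery.gov schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical core fields that every version of the report must carry.
CORE_FIELDS = ("agency", "amount")


def read_report(xml_text):
    """Read one report, requiring core fields and keeping unknown ones as extras."""
    root = ET.fromstring(xml_text)
    values = {child.tag: (child.text or "").strip() for child in root}
    missing = [name for name in CORE_FIELDS if name not in values]
    if missing:
        raise ValueError("report lacks core fields: " + ", ".join(missing))
    core = {name: values[name] for name in CORE_FIELDS}
    extras = {k: v for k, v in values.items() if k not in CORE_FIELDS}
    return core, extras


# An older report and a newer one that adds a geographic field.
old_report = "<report><agency>DOE</agency><amount>1000</amount></report>"
new_report = (
    "<report><agency>DOE</agency><amount>1000</amount>"
    "<state>CA</state></report>"
)

for text in (old_report, new_report):
    core, extras = read_report(text)
    print(core, extras)
```

With this arrangement, adding a field such as geographic information does not break older readers, while dropping or renaming a core field fails loudly, which is the kind of change that deserves an announced process.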

Government transparency, while superficially about access to information, is a much larger and more difficult subject. There are important architectural issues, as discussed by Erik Wilde and myself. In addition, the experience of watching Recovery.gov and its changing templates also highlights how change management is a critical concern for transparency advocates.

A nice overview of the many digital preservation projects that are going on can be found in a recent article in the Wall Street Journal. It focuses on often-crumbling manuscripts and texts but is still interesting for archaeologists too (thanks to Jack Sasson for the tip).

The Next Age of Discovery by A. Alter, in WSJ, May 8, 2009

A quick note to draw attention to an article in the latest issue of The Art Newspaper: “Facebook is more than a fad—and museums need to learn from it.”

A few quotes: “Social networks and blogs are the fastest growing online activities, according to a report published in March by research firm Nielsen Online. Almost 10% of all time spent on the internet …” “… a major factor in the success of social networks is that they allow people to select and share content. This has become a hobby, even considered by some to be a serious creative outlet, with web users spending time ‘curating’ their online space. Museums are well placed to appeal to this new generation of ‘curators’ because they offer rich and interesting content that can be virtually ‘cut-up’ and stuck back together online in numerous different ways to reflect the individual tastes of each user. If remixing, reinterpreting and sharing interesting content is, as Nielsen suggests, the kind of engaging interaction that draws people to social networks, then museums should embrace the idea that ‘everyone is a curator’, both online and offline.” “For example, the Art Museum of Estonia has gone against convention by actively encouraging visitors to photograph its collection; the MoMA website helps users to co-create content and share these creations with friends.”

DDIG member, Prof. Peter Bleed (University of Nebraska), sent this announcement of a website describing his research investigating battlefields of the Spanish-American War.

The website, with a rich array of maps, description, and images, is found at: http://cdrh.unl.edu/cubanbattlefields/

Check it out!

A series of lectures at Georgia Tech is now viewable online. They are interesting for all scholars of the digital inclination. For instance, Cliff Lynch, Executive Director of the Coalition for Networked Information, spoke on A Changing Society, Changing Scholarly Practices, and the New Landscape of Scholarly Communication. Other topics are The Current State of Journal Publishing & Open Access Journals 2.0; Repository Programs: What Can They Do for Faculty; and Cyber Infrastructure: Removing Barriers in Research and Scholarly Communications.

Also, a new report is now available as a pdf download: Working Together or Apart: Promoting the Next Generation of Digital Scholarship. Report of a Workshop Cosponsored by the Council on Library and Information Resources and The National Endowment for the Humanities, March, 2009. 78 pp. “As part of its ongoing programs in digital scholarship and the cyberinfrastructure to support teaching, learning and research, … CLIR in cooperation with the … NEH held a symposium on September 15, 2008 in which a group of some 30 leading scholars was invited to
• articulate the research challenges that will use the new media to advance the analysis and interpretations of text, images and other sources of interest to the humanities and social sciences
• and in so doing, pose interesting problems for ongoing computational research.”

The Art Newspaper of 4-17-09 has an interesting article on an archaeological issue in Indonesia that has reached the highest level of government. It’s not every day you see a minister apologize for disrespecting an archaeological site. There is hope after all! See the article for details.

Here’s some great news (esp. considering current economic conditions!) for those of you interested in digital data and archaeology:

Digital Antiquity Seeks a Founding Executive Director

Digital Antiquity seeks an entrepreneurial and visionary Executive Director who can play a central role in transforming the discipline of archaeology by leading the establishment of an on-line repository of the digital data and documents produced by archaeological research in the Americas. Digital Antiquity is a national initiative that is generously funded by the Andrew W. Mellon Foundation.

The Executive Director oversees all Digital Antiquity activities, including hiring and supervising staff, marketing repository services to the professional community, guiding software development, and managing acquisition of repository content.

During its startup phase, Digital Antiquity resides within Arizona State University and the Executive Director will hold the position of Research Professor at ASU with a 12 month, renewable appointment, excellent benefits, and a rank and attractive salary commensurate with experience. A fixed term secondment or IPA (paid transfer from another position) would also be considered.

A link to the full job announcement may be found at http://www.digitalantiquity.org/confluence/display/DIGITAQ/Executive+Director+Search. Interested individuals may also contact Keith Kintigh (kintigh@asu.edu) for more information. Consideration of applications will begin May 1, 2009 and will continue until the position is filled.

DDIG Meeting, Friday April 24:

A final reminder— Please mark your calendars for the Digital Data Interest Group meeting, taking place next Friday, April 24th, from 6:30 – 7:30pm (Atlanta Marriott, Room L504/505). Non-DDIG members are also welcome to attend.

Web Tools Survey and Free Drinks:

Fill out a short survey about web tools and receive a free drink at the DDIG meeting! There are still a few drink coupons left, so hurry on over! The survey will close on Tuesday, April 21st. You can access it by clicking here or following this link:

http://www.surveymonkey.com/s.aspx?sm=Zs8zvtJye2vv7CNKItaHyw_3d_3d

Even if you’re not attending the upcoming SAA meeting, your thoughts and insights are valuable to us and we encourage you to take the survey anyway. An overview of the survey results will be posted on this blog in May.

SAA 2009 DDIG-Related Events:

Below I have identified (in order of occurrence) some of the workshops, sessions, individual papers and posters related to DDIG subject areas (please note- I have tried to be inclusive, but be sure to peruse the entire program for other presentations of interest):

  • [1A] WORKSHOP: New Developments in the Preservation of Digital Data for Archaeology (Wed. April 22, 1 – 4:30 pm; Room: L404)
  • [2B] WORKSHOP: Using High Precision Laser Scanning to Create Digital 3D Versions of Archaeological Materials for Analysis and Public Interpretation (Thurs. April 23, 8:30 am – 12:00pm; Room: L404)
  • [37] PAPER: Keith Kintigh and Jeffrey Altschul—Sustaining the Digital Archaeological Record (Thurs. April 23, 2pm; Room M202)
  • [40] GENERAL SESSION: Tracing Trails and Modeling Movement: Understanding Past Cultural Landscapes and Social Networks Through Least-Cost Analysis (Thurs. April 23, 1 – 3:45 pm; Room: M302)
  • [43] PAPER: Ivan Davis, Andy Bean and John Hall—The Statistical Research, Inc., Database (SRID): Flexible Integration of Large Diverse Datasets (Thurs., April 23, 1pm; Room M304)
  • [53] POSTER: Tamara Whitley and Elyssa Gutbrod—A GIS Analysis of Spatial Data From the Carrizo Plain National Monument (Thurs., April 23, 4 – 6pm; Room: Marquis Lobby)
  • [88] POSTER: David Anderson, D. Shane Miller, Derek T. Anderson, Stephen J. Yerka and Ashley Smallwood—Paleoindians in North America: Evidence from PIDBA (Paleoindian Database of the Americas) (Fri., April 24, 9 – 11am; Room: Marquis Lobby)
  • [88] POSTER: R. Kyle Bocinsky—Understanding and modeling turkey domestication in the American Southwest: A preliminary simulation module for Repast (Fri. April 24, 9 – 11am; Room: Marquis Lobby)
  • [88] POSTER: Amy Wood and Christopher McDaid—17th Century Predictive Modeling in the Chesapeake (Fri. April 24, 9 – 11am; Room: Marquis Lobby)
  • [99] POSTER: Susan Gillespie, Joshua Toney and Michael Volk—Mapping La Venta Complex A: Archival archaeology in the Digital age (Fri. April 24, 12 – 2pm; Room: Marquis Lobby)
  • [130] POSTER: Lucy Burgchardt, William T. Whitehead, Jonathan Palacek and Emily Stovel—A Database of South American Ceramics: Phase 2 (Fri., April 24, 3 – 5pm; Room: Marquis Lobby)
  • [134] GENERAL SESSION: Digital Data (Sat. April 25, 8 – 9:30am; Room: International A)
  • [147] POSTER: Britton Shepardson and Tim Jeffryes—Making GIS Data Accessible and Public: Terevaka.net Data Community (Sat. April 25, 9 – 11am; Room: Marquis Lobby)
  • [157] PAPER: John Chamblee and Mark Williams—Almost There! CRM Data and Macroregional Analysis in Georgia (Sat., April 25, 11:15am; Room: M302)
  • [167] PAPER: Carlos Zeballos Velarde—Landscape 3d Modeling And Animation For Public Outreach And Education (Sat. April 25, 3:45pm; Room: M202)
  • [174] POSTER: Thomas Penders, Lori Collins and Travis Doering—High Definition Digital Documentation of the Beehive Blockhouses, Launch Complex 31/32, Cape Canaveral Air Force Station, Brevard County, Florida (Sat. April 25, 2 – 4pm; Room: Marquis Lobby)
  • [174] POSTER: Mark Woodson and Angela Keller—Virtual Data: Making Web-based Data Sharing Work for Archaeology (Sat. April 25, 2 – 4pm; Room: Marquis Lobby)
  • [180] PAPER: Philip Mink—Investigating Grand Canyon Cultural Landscapes AD 400 – AD 1250: Recent Geophysical and Geospatial Mapping and Modeling (Sat. April 25, 3:30pm; Room M103)
  • [180] PAPER: Glendee Ane Osborne—Using Spatial Data Modeler for Predictive Modeling: Application on the Shivwits Plateau, NW AZ (Sat. April 25, 4:00pm; Room M103)

There’s a fairly close alignment of interests and goals between the folks working for open access to scholarship and open data in science (one of the main themes of this blog) and the folks working for greater government transparency. As is the case with science and scholarship, access to government data can enhance participation (of the civil society kind) and accountability. Our recent work relating to Recovery.gov (here, and here) attempted to bring some of the experience we had in “open data” (for science) to open data for government.

Initially, we were very optimistic. The Office of Management and Budget (OMB) issued guidelines on Feb 18th that required individual agencies participating in the recovery effort to publish feeds that disclosed important information about their actions, spending, and who received money. The great thing about these guidelines was that the very agencies that spent recovery dollars would reveal exactly how they spent the money. There were many missing pieces and unanswered questions in these guidelines, and my colleagues Erik Wilde, Raymond Yee, and I tried to fill in these blanks with this report and demonstration implementation.

However, OMB just issued a new set of revised guidelines that represent a big step backwards from their initial call for decentralized disclosure [UPDATED WITH CLARIFICATION SEE BELOW]. The decentralized approach is now replaced by a centralized approach of having Recovery.gov publish all the data. All the information flows from the agencies, to OMB, to Recovery.gov will be opaque to the public. (Actually, according to the guidelines, much of this will take place via email).

This issue of centralization marks how our group diverges from other transparency advocates. For example, the transparency advocacy group OMB Watch explicitly called for a “Centralized Reporting System” (page 9 of this report). [UPDATED WITH CLARIFICATION SEE BELOW]. While in some ways convenient, centralization is not required and, in our view, works against transparency. First off, feeds can be readily aggregated. With feeds, the disclosure reports of distributed agencies can be brought together for convenience and “one stop shopping” monitoring. Secondly, the call for a centralized reporting source means that all the data gathering and reporting processes happen behind the scenes in a manner that is not publicly visible. What’s happening in these back-end processes? How is the data being managed and processed? How is it transformed? You end up with “black-box transparency”, which is obviously an oxymoron.
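To illustrate how little machinery aggregation needs, here is a rough sketch that merges entries from several Atom feeds into one list. It is not the demonstration implementation mentioned above. The agency labels, the entry titles, and the `aggregate` and `make_feed` helpers are made up for illustration, and the feeds are inline strings rather than documents fetched from real agency URLs.

```python
import xml.etree.ElementTree as ET

# Atom namespace in the bracketed form ElementTree uses for tag names.
ATOM = "{http://www.w3.org/2005/Atom}"


def aggregate(feeds):
    """Merge entries from several Atom documents, newest first.

    `feeds` maps a source label to Atom XML text. Timestamps are assumed to be
    UTC strings in one format (e.g. 2009-04-01T00:00:00Z), so they sort as text.
    """
    merged = []
    for source, xml_text in feeds.items():
        root = ET.fromstring(xml_text)
        for entry in root.iter(ATOM + "entry"):
            merged.append({
                "source": source,
                "title": entry.findtext(ATOM + "title", default=""),
                "updated": entry.findtext(ATOM + "updated", default=""),
            })
    return sorted(merged, key=lambda e: e["updated"], reverse=True)


def make_feed(title, updated):
    """Build a tiny one-entry Atom document for the demonstration."""
    return (
        '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
        f"<title>{title}</title><updated>{updated}</updated>"
        "</entry></feed>"
    )


# Hypothetical agency feeds, invented for this example.
combined = aggregate({
    "agency-a": make_feed("Grant awarded", "2009-04-01T00:00:00Z"),
    "agency-b": make_feed("Contract signed", "2009-04-03T00:00:00Z"),
})
for item in combined:
    print(item["updated"], item["source"], item["title"])
```

Each entry keeps its source label, so the combined view still shows which agency published what. Anyone can build the "one stop shopping" list themselves without the original feeds disappearing behind a central database.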

But this gets to the heart of the issue. Transparency advocacy groups need to be much more aware of the architecture issues behind “transparency”. Access to data is not enough. The processes behind how the data is gathered, processed, and published also matter.

There’s much more to say about this issue, but in the interim, please look at Erik Wilde’s detailed discussion about why architectures of transparency matter.

Update: Over at the “Open House” discussion list, Gary Bass made an important comment regarding OMB Watch’s position on “centralization”. He wrote:

For the record, and to clarify your blog post, at no time did OMB Watch ever support only sending information to OMB to build a single database.  OMB Watch has always supported comprehensive machine readable feeds (APIs and syndications) from agencies. I also believe that is OMB’s intent based on our reading of the guidance.

His comment and statement on this matter are very welcome, and I stand corrected. I’m glad that this important organization is taking a thoughtful position on this matter.

UPDATE about OMB’s Guidelines: Page 68 of the OMB revised guidelines still says feeds are required, but a few lines down the text says that if an agency is unable to publish a feed, it can do something else (with some instructions about how to do the alternative). Of a 172-page document, only 3 pages (68-70) discuss feeds and their implementation. This suggests that feeds are being dropped as a vehicle for disclosure.

The annual Digital Data Interest Group meeting will take place on Friday April 24th at 6:30pm (Atlanta Marriott, Room L504/505).

We have a special offer for DDIG members this year: You can receive a coupon for a free drink from the DDIG meeting room bar! Simply take part in a short (10-15 minute) survey about web tools for publishing archaeological data by clicking here or following this link:

http://www.surveymonkey.com/s.aspx?sm=Zs8zvtJye2vv7CNKItaHyw_3d_3d

The first 50 respondents will receive a free drink coupon by email. Bring your coupon to the DDIG meeting and join us for drinks and socializing with other DDIG members. We will share the results of this survey and hear opinions and ideas from DDIG members about promoting better use of web technologies in archaeology.

Even if you’re not attending the upcoming SAA meeting, your thoughts and insights are valuable to us and we encourage you to take the survey anyway! An overview of the survey results will be posted on this blog in May.
