I recently had a chance to take a look at the current state of play with the Recovery Act transparency measures. It seems that in the next month or so, some critical decisions will be made, and these decisions will likely have a profound impact on the shape of government transparency measures in the future.

Next week, OMB will issue new guidance on how agencies must report on their Recovery-related activities. It also looks like there will be a bidding (or some other) process to contract out the work of developing a more robust infrastructure and reporting system for the Recovery. Once Recovery-related contracts and grants are made, there will be a tremendous volume of reports that will need to be managed and disseminated. After all, nearly $800 billion in spending, spread over several agencies and countless recipients and subcontractors, can generate a great deal of financial information.

So, while these plans are being formulated, it is useful to take stock of where we now stand. Recovery.gov still offers reporting information in HTML and Excel formats. These formats are clearly not adequate to the task of public reporting, since both require custom-developed scrapers to extract the data, and these scrapers are not reliable. The scrapers are also difficult to maintain. In monitoring Recovery.gov, we've noticed that they seem to introduce a new Excel template every month or so. These templates alter how reporting data is expressed. They may add or drop fields and change layouts. All of these changes can play havoc with our scrapers. In fact, we usually notice a new template when our scraper crashes.
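As a rough illustration of how a scraper can at least fail with an informative message, rather than simply crash, when a template changes, here is a minimal Python sketch. It assumes the spreadsheet has already been exported to CSV, and the column names (`Agency`, `Program`, `Obligations`, `Outlays`) are hypothetical placeholders; the real Recovery.gov templates used their own, changing, headings.

```python
import csv
import io

# Hypothetical column names, for illustration only. The actual
# Recovery.gov templates had their own (and changing) headings.
EXPECTED_FIELDS = ["Agency", "Program", "Obligations", "Outlays"]


def check_template(header):
    """Compare a report's header row against the expected template.

    Returns (missing, added): fields the scraper relies on that are gone,
    and fields it has never seen before.
    """
    missing = [f for f in EXPECTED_FIELDS if f not in header]
    added = [f for f in header if f not in EXPECTED_FIELDS]
    return missing, added


def read_report(text):
    """Parse a CSV export of a report, failing loudly on template drift."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    missing, added = check_template(header)
    if missing:
        # A dropped field means the data is no longer comparable; stop here.
        raise ValueError(f"template changed; missing fields: {missing}")
    if added:
        # New fields are tolerated but reported, so the change is noticed.
        print(f"note: new fields ignored: {added}")
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        record = dict(zip(header, row))
        # Keep only the expected fields, so moved columns do not matter.
        rows.append({f: record[f] for f in EXPECTED_FIELDS})
    return rows
```

Matching columns by name rather than by position means a reordered layout no longer breaks the scraper, and a dropped field produces a clear error instead of silently misaligned data. It does not solve the underlying problem, though: the scraper still has to be updated by hand whenever a template changes.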

But just as importantly, constant change in the templates (and schemas) of the reporting data makes it very difficult to aggregate reports, compare between reports, or do other analysis of pooled reporting data. Changes in the templates create incompatible data. All these changes, which come unannounced and without explanation, throw a monkey wrench into "transparency". At least this is a great learning experience. In addition to having structured data made available in open, machine-readable formats (ideally XML), we need some stability in the schemas used for the reporting data. Making data incompatible with last month's reports is just not helpful.

However, I am not in favor of setting a schema in stone. Again, we're all learning how to "do transparency", and some changes to report schemas may well prove necessary and helpful. For instance, as Erik Wilde noted, the latest reports from Recovery.gov include geographic information, and this opens up great possibilities for geographic analyses and visualizations. So kudos to the good folks at Recovery.gov for making this change! At the same time, however, while we need to be flexible and handle new requirements for our reporting data, backwards compatibility must be maintained. Ideally, reporting information should be made available in easily extensible schemas, and there should be good processes for deciding how updates to these schemas will be made.
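To make "extensible but backwards compatible" a bit more concrete, here is a minimal Python sketch of a reader for a hypothetical XML award report. The element names (`award`, `agency`, `recipient`, `amount`, `location`) and the `lat`/`lon` attributes are assumptions for illustration, not an actual Recovery.gov format. The idea is a stable core that every version must carry, plus optional extensions (such as the geographic information mentioned above) that older reports may lack.

```python
import xml.etree.ElementTree as ET

# Hypothetical core elements that every version of the report must carry.
CORE_FIELDS = ("agency", "recipient", "amount")


def parse_award(xml_text):
    """Read a hypothetical <award> report, tolerating added elements."""
    root = ET.fromstring(xml_text)
    award = {}
    for name in CORE_FIELDS:
        el = root.find(name)
        if el is None or el.text is None:
            # Dropping a core field would break backwards compatibility.
            raise ValueError(f"core field missing: {name}")
        award[name] = el.text.strip()
    # Optional extension (e.g. geographic info added in a later version).
    # Older reports simply lack it, and the reader does not break.
    loc = root.find("location")
    if loc is not None and loc.get("lat") and loc.get("lon"):
        award["location"] = (float(loc.get("lat")), float(loc.get("lon")))
    # Any other unknown elements are ignored rather than rejected.
    return award
```

The design choice here is that consumers depend only on the core elements, so a later version can add elements without breaking older readers, while newer readers still accept older reports that lack the extensions. What must not happen is the removal or renaming of a core element, which is exactly the kind of change a documented schema-update process should guard against.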

Government transparency, while superficially about access to information, is a much larger and more difficult subject. There are important architectural issues, as discussed by Erik Wilde and myself. In addition, the experience of watching Recovery.gov and its changing templates also highlights how change management is a critical concern for transparency advocates.