Reading the recent posts by Fennelle Miller and Kevin Schwarz got me to look into spatial data a bit more closely. Two issues that seem to crop up again and again are cost and complexity.

GIS data is still difficult to share dynamically over the Web, but things are changing. GoogleEarth, Google Maps, Open Layers, etc. provide great client-side tools for viewing and interacting with spatial data (not just points, but also vector lines and polygons). GoogleEarth and Google Maps are proprietary, but they are available as free downloads or free APIs. They also work with an XML format (KML) that is pretty simple, enjoys a wide user community, and can work with tools not developed by Google.

There are some tools for transforming the ubiquitous ESRI shape files into KML documents (see this blog post at PerryGeo, and also the post’s comments). Here’s a link to some “how to” discussions on using PHP to read MapInfo (.mif) files to use with Google Maps. Here’s a link to an open source PHP class that reads ESRI shape files, the first step needed in converting them on a server to KML or other formats. The point of all this is that, with some development work, we can transform (to some degree at least) typical GIS data into formats that work better on the Web.
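
To make the second half of that conversion concrete, here is a minimal sketch in Python (standard library only) of writing KML once the geometry has been read out of a shape file. It is an illustration under assumptions, not the method used by any of the tools linked above: the function name `features_to_kml` and the simple feature dictionaries are made up for this example, and reading the shape file itself is left out.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # OGC KML 2.2 namespace


def _el(parent, tag, text=None):
    """Append a namespaced KML child element and return it."""
    child = ET.SubElement(parent, f"{{{KML_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def features_to_kml(features):
    """Build a KML string from simple feature dicts (hypothetical structure).

    Each feature has 'name', 'type' ('Point', 'LineString' or 'Polygon')
    and 'coords', a list of (lon, lat) tuples. Polygon rings are expected
    to be closed already (first coordinate repeated as the last).
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = _el(kml, "Document")
    for feat in features:
        placemark = _el(doc, "Placemark")
        _el(placemark, "name", feat["name"])
        # KML expects "lon,lat" tuples separated by whitespace.
        coords = " ".join(f"{lon},{lat}" for lon, lat in feat["coords"])
        if feat["type"] == "Polygon":
            polygon = _el(placemark, "Polygon")
            ring = _el(_el(polygon, "outerBoundaryIs"), "LinearRing")
            _el(ring, "coordinates", coords)
        else:  # "Point" or "LineString"
            _el(_el(placemark, feat["type"]), "coordinates", coords)
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data, standing in for geometry read from a shape file.
    sample = [
        {"name": "Test pit 1", "type": "Point", "coords": [(-122.27, 37.87)]},
        {"name": "Survey area", "type": "Polygon",
         "coords": [(-122.3, 37.8), (-122.2, 37.8), (-122.2, 37.9),
                    (-122.3, 37.9), (-122.3, 37.8)]},
    ]
    print(features_to_kml(sample))
```

Note the coordinate order: KML puts longitude first, which is an easy thing to get backwards when the source data lists latitude first.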

Of course, GML (the community-developed open standard) is a better choice for GIS data than KML. KML is needed for Google’s great and easy-to-use visualization tools, but GML is a much more robust standard for GIS data. GML also has the advantage of being an open, non-proprietary XML format. You’re not locked into any one software vendor and you have important data longevity advantages with GML. It should be noted that Open Layers (the open source equivalent of Google Maps) supports GML.

However, I’m not sure of the immediate need to go through all this effort. Sure, it’s nice to have GIS data easily viewable in a web browser or a slick visualization tool like GoogleEarth. But the fundamentals of data access, longevity and discovery need to be in place before we put lots of effort into online visualization.

Instead, we should look at some strategies to make our GIS data easier to find and maintain. And we need to approach the issue pragmatically, since overly complex or elaborate requirements will mean little community uptake. Perhaps we can look at ways of registering GIS datasets (ideally stored in GML) in online directories with some simple metadata (“information about information”). A dataset’s general location (say Lat / Lon point coordinates), some information about authorship, keywords, etc. and a stable link to download the full GIS dataset would be an excellent and simple start. Simple point data describing the general location of a project dataset will be enough to develop an easy map interface for users to find information about locations.
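
As a rough sketch of how little such a directory entry needs, the Python below models one record with the fields suggested above and a bounding-box lookup of the kind a simple map interface might call. The class name `DatasetRecord`, its field names, and the example.org URLs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One directory entry: simple metadata plus a stable download link."""
    title: str
    authors: list
    lat: float            # general location only, in decimal degrees
    lon: float
    download_url: str     # stable link to the full GIS dataset (e.g. GML)
    keywords: list = field(default_factory=list)


def find_in_bbox(records, south, west, north, east):
    """Return records whose reference point falls inside a bounding box.

    Assumes the box does not cross the antimeridian (180 degrees longitude).
    """
    return [r for r in records
            if south <= r.lat <= north and west <= r.lon <= east]


def find_by_keyword(records, keyword):
    """Return records tagged with the keyword (case-insensitive match)."""
    wanted = keyword.lower()
    return [r for r in records if wanted in (k.lower() for k in r.keywords)]


if __name__ == "__main__":
    # Made-up entries; the URLs are placeholders, not real datasets.
    directory = [
        DatasetRecord("Valley survey", ["A. Author"], 37.87, -122.27,
                      "https://example.org/data/valley-survey.gml",
                      ["survey", "lithics"]),
        DatasetRecord("Coastal excavation", ["B. Author"], 34.05, -118.25,
                      "https://example.org/data/coastal-excavation.gml",
                      ["excavation"]),
    ]
    for rec in find_in_bbox(directory, 37.0, -123.0, 38.0, -122.0):
        print(rec.title, rec.download_url)
```

The point coordinates here only describe a dataset's general location, so they can be kept deliberately coarse.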

Such directories can be maintained by multiple organizations, and they can share/syndicate their content with tools such as GeoRSS feeds (RSS with geographic point data). It’s easy to develop aggregation services from such feeds. You can also use something like Yahoo Pipes to process these feeds into KML formats for use in GoogleEarth! (We do that with Open Context, though it still needs some troubleshooting).
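
To show roughly what such an aggregation service involves, here is a small Python sketch (standard library only) that reads GeoRSS-Simple points out of RSS feeds, merges several feeds, and writes the result as KML. It is not how Yahoo Pipes or Open Context do it; the function names and the sample feed are invented for illustration, and a real service would also need to fetch the feeds and handle malformed entries.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"   # GeoRSS-Simple namespace
KML_NS = "http://www.opengis.net/kml/2.2"    # OGC KML 2.2 namespace

# Made-up sample feed; the link is a placeholder.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Hypothetical dataset directory</title>
    <item>
      <title>Survey A</title>
      <link>https://example.org/datasets/survey-a</link>
      <georss:point>37.87 -122.27</georss:point>
    </item>
  </channel>
</rss>"""


def georss_items(feed_xml):
    """Extract title, link and point from each RSS item that has a georss:point."""
    items = []
    for item in ET.fromstring(feed_xml).iter("item"):
        point = item.findtext(f"{{{GEORSS_NS}}}point")
        if point is None:
            continue  # skip items without a location
        lat, lon = (float(v) for v in point.split())  # GeoRSS order: lat lon
        items.append({"title": item.findtext("title", default=""),
                      "link": item.findtext("link", default=""),
                      "lat": lat, "lon": lon})
    return items


def aggregate(feeds):
    """Merge items from several feed documents, dropping duplicate links."""
    seen, merged = set(), []
    for feed_xml in feeds:
        for it in georss_items(feed_xml):
            if it["link"] not in seen:
                seen.add(it["link"])
                merged.append(it)
    return merged


def items_to_kml(items):
    """Write the merged items as KML point placemarks."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for it in items:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = it["title"]
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = it["link"]
        pt = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML order is lon,lat: the reverse of GeoRSS.
        ET.SubElement(pt, f"{{{KML_NS}}}coordinates").text = f"{it['lon']},{it['lat']}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(items_to_kml(aggregate([SAMPLE_FEED])))
```

The one real trap is coordinate order: GeoRSS-Simple writes "lat lon" while KML writes "lon,lat".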

Also, Sean Gillies (with the Ancient World Mapping Center) is doing some fantastic work on “Mush”, his project for processing GeoRSS feeds. See this post and this post for details and examples. Thus, with simple tools like GeoRSS feeds, we can contribute toward a low-cost distributed system that makes archaeological datasets much easier to discover through map-based interfaces and some types of spatial querying (such as buffers). This may be a good way to address some of Fennelle Miller’s concerns about recovering and reusing all that hard-won geospatial data.
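
For a sense of what a buffer query over point data amounts to, here is a minimal Python sketch that keeps only the points within a given distance of a location. It is not taken from Mush; it simply assumes a spherical Earth and the haversine great-circle formula, and the names `haversine_km` and `within_buffer` are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(points, center, radius_km):
    """Return the (name, lat, lon) tuples within radius_km of center (lat, lon)."""
    clat, clon = center
    return [p for p in points
            if haversine_km(clat, clon, p[1], p[2]) <= radius_km]


if __name__ == "__main__":
    # Made-up dataset locations.
    datasets = [("Survey A", 37.87, -122.27), ("Survey B", 34.05, -118.25)]
    print(within_buffer(datasets, (37.80, -122.30), 50))
```

A distance filter like this is enough for point-based directory entries; buffers around lines or polygons would need a proper geometry library.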

Of course, site security is an important issue, and finding ways of making our data as accessible as possible without endangering sites or sacred locations is important. I’m glad Kevin Schwarz raised the issue, and it’ll be very useful to learn more about how he and his colleagues are dealing with it.