open data


It’s getting to the end of the year, and I’m feeling a little retrospective and I’m (anxiously) looking forward to the future. We have enjoyed a great year with Open Context (see here).

More generally, it’s obviously been a big year for all things “open.” The White House has embraced Open Access and Open Data policies, and even recognized the work of some advocates of reform, and that has been hugely exciting. It seems that the arguments for greater openness have finally led to some meaningful changes. All of these are signs of real progress.

However, I’m increasingly convinced that advocating for openness in research (or government) isn’t nearly enough. There’s been too much of an instrumentalist justification for open data and open access. Many advocates talk about how it will cut costs and speed up research and innovation. They also argue that it will make research more “reproducible” and transparent so interpretations can be better vetted by the wider community. Advocates for openness, particularly in open government, also talk about the wonderful commercial opportunities that will come from freeing research.

This last justification boils down to creating a “research commons” and removing impediments to (text and data) mining of that commons, in order to foster entrepreneurialism and create wealth. This is pretty explicit in this announcement from Europeana, the EU’s major open culture system (now threatened with devastating cuts). I don’t have a problem with wealth creation as an outcome of greater openness in research. Who doesn’t want more wealth? However, we need to ask: wealth creation for whom, and under what conditions? Will the lion’s share of the wealth created on newly freed research go only to a tiny elite class of investors? Will it simply mean a bit more profit for Google and a few other big aggregators? Will this wealth be taxed and redistributed enough to support and sustain the research commons exploited to feed it? The fact that the new OSTP embrace of Open Data in research is an unfunded mandate makes me worry about the prospect of “clear-cutting” the open data commons.

These are all very big policy issues, but they need to be asked if the Open Movement really stands for reform and not just a further expansion and entrenchment of Neoliberalism. I’m using the term “Neoliberalism” because it resonates as a convenient label for describing how and why so many things seem to suck in Academia. Exploding student debt, vanishing job security, increasing compensation for top administrators, expanding bureaucracy and committee work, corporate management methodologies (Taylorism), and intensified competition for ever-shrinking public funding all fall under the general rubric of Neoliberalism. Neoliberal universities primarily serve the needs of commerce. They need to churn out technically skilled human resources (made desperate for any work by high loads of debt) and easily monetized technical advancements.

This recent White House announcement about making universities “partner at the speed of business” could not be a clearer example of the Neoliberal mindset. It was written by Tom Kalil, one of the administration’s leading advocates for open science. The same White House that has embraced “open government,” “open science,” and “open data” has also ruthlessly fought whistle-blowers (Snowden), perpetuated ubiquitous surveillance (in conjunction with telecom and tech giants), hounded Aaron Swartz (my take here), and secretly negotiated the TPP, a far reaching expansion of intellectual property controls and punishments. All of these developments happened in a context of record corporate profits and exploding wealth inequality. And yes, I think these are all related trends.

How can something so wonderful and right as “openness” further promote Neoliberalism? After all, aren’t we the rebels blasting at the exhaust vents of Elsevier’s Death Star? But in selling openness to the heads of foundations, businesses, governments and universities, we often end up adopting the tropes of Neoliberalism. As a tactic, that’s perfectly reasonable. As a long-term strategy, I think it’s doomed.

The problem is not that the Open Movement is wrong. The problem is that the need for reform goes far deeper than simply making papers and data available under CC-By or CC-Zero. Exploitative publishing regimes are symptomatic of larger problems in the distribution of wealth and power. The concentration of wealth that warps so much of our political and economic life will inevitably warp the Open Movement toward unintended and unwanted outcomes.

Let them Eat Cake Open Data

Let’s face it. Most researchers I know who are lucky enough to be employed are doing the work of 4 or 5 people (see also this paper by Rosalind Gill). Even some of my friends, lucky enough to have tenure or tenure-track positions, seem miserable. Maybe it’s survivor guilt, but they are stressed, distracted, and harried. Time and attention are precious and spent judiciously, usually in a manner where rewards are clear and certain. Data management plans, data sharing, or collaboration on GitHub? Who has time for all that?! These don’t count for much in the academic rat-race, and so the normative reward structures of the Academy create perverse incentives for neglecting or outright hoarding data.

Data sharing advocates talk about how data should get rewarded just like other forms of publication. Data should “count” with measurable impacts. As a data sharing advocate, much of this really does appeal to me. Making data sharing and collaboration part of the mainstream would be fantastic. If we convince universities to monitor data citation metrics, they can “incentivize” more data sharing. We can also monitor participation in social media (Twitter), version control (GitHub), etc. All of these statistics can be compiled and collated to provide an even more totalizing picture of a researcher’s contributions.

But are more metrics (even Alt-metrics) really the solution to the perverse incentives embodied by our existing metrics? The much-derided “Impact Factor” started out as a way for librarians to make more informed choices about journal subscriptions (at least according to this account). In that context, the Impact Factor was relatively benign (see this history), but it then became a tool for Taylorism and the (coercive) monitoring of research outputs by university bureaucracies. That metric helps shape who gets hired and fired. And while metrics can be useful tools, the Impact Factor case shows how metrics can be used by bureaucracies to reward and punish.
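For readers who have never seen the metric itself, the standard two-year Impact Factor is a simple ratio: citations received in a given year to a journal’s items from the two preceding years, divided by the number of citable items published in those two years. A minimal sketch (the journal numbers below are invented for illustration):

```python
def impact_factor(citations_by_pub_year, citable_items_by_year, year):
    """Two-year impact factor for `year`.

    citations_by_pub_year: {publication_year: citations received in `year`}
    citable_items_by_year: {publication_year: number of citable items}
    """
    window = (year - 1, year - 2)
    # Citations in `year` to items published in the two prior years...
    cites = sum(citations_by_pub_year.get(y, 0) for y in window)
    # ...divided by the count of citable items from those same years.
    items = sum(citable_items_by_year.get(y, 0) for y in window)
    return cites / items if items else 0.0

# A journal publishing 40 + 60 citable items in 2011-2012 that drew
# 250 citations to them during 2013 has a 2013 impact factor of 2.5:
print(impact_factor({2012: 150, 2011: 100}, {2012: 60, 2011: 40}, 2013))
```

Note how much the formula leaves out: it says nothing about the distribution of citations across papers, which is typically highly skewed, yet the single aggregate number is what bureaucracies latch onto.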

What does all of this have to do with the Open Movement?

One’s position as a subordinate in today’s power structures is partially defined by living under the microscope of workplace monitoring. Does such monitoring promote conformity? The freedom, innovation, and creativity we hope to unlock through openness requires greater toleration for risk. Real and meaningful openness means encouraging out-of-the-ordinary projects that step out of the mainstream. Here is where I’m skeptical about relying upon metrics-based incentives to share data or collaborate on GitHub.

By the time metrics get incorporated into administrative structures, the behaviors they measure aren’t really innovative any more!

Worse, as certain metrics grow in significance (meaning – they’re used in the allocation of money), entrenched constituencies build around them. Such constituencies become interested parties in promoting and perpetuating a given metric, again leading to conformity.

Metrics, even better Alt-metrics, won’t make researchers or research more creative and innovative. The crux of the problem is a Hunger Games-style “winner take all” dynamic that pervades both commerce and the Academy. A rapidly shrinking minority has any hope of gaining job security or the time and resources needed for autonomous research. In an employment environment where one slip means complete ejection from the academy, risk-taking becomes quasi-suicidal. With employment increasingly precarious, professional pressures balloon in ways that make risk-taking and stepping outside of established norms unthinkable. Adding more or better metrics without addressing the underlying job-security issues just adds to the ways people can be ejected from the research community.

Metrics, while valuable, need to carry fewer professional consequences. In other words, researchers need freedom to experiment and fail and not make every last article, grant proposal, or tweet “count.”

Equity and Openness

“Big Data,” “Data Science,” and “Open Data” are now hot topics at universities. Investments are flowing into dedicated centers and programs to establish institutional leadership in all things related to data. I welcome the new Data Science effort at UC Berkeley to explore how to make research data professionalism fit into academic reward systems. That sounds great! But will these new data professionals have any real autonomy in shaping how they conduct their research and build their careers? Or will they simply be part of an expanding class of harried and contingent employees, hired and fired at the whims of creative destruction fueled by the latest corporate-academic hype cycle?

Researchers, including #AltAcs and “data professionals”, need a large measure of freedom. Miriam Posner’s discussion of the career and autonomy limits of Alt-academic-hood helps highlight these issues. Unfortunately, there’s only one area where innovation and failure seem survivable, and that’s the world of the start-up. I’ve noticed how much the “Entrepreneurial Spirit” gets celebrated in this space. I’m guilty of basking in it myself (10 years as a quasi-independent #altAc in a nonprofit I co-founded!).

But in the current Neoliberal setting, being an entrepreneur requires a singular focus on monetizing innovation. PeerJ and Figshare are nice, since they have business models that are less “evil” than Elsevier’s. But we need to stop fooling ourselves that the only institutions and programs we can and should sustain are the ones that can turn a profit. For every PeerJ or Figshare (and these are ultimately just as dependent on continued public financing of research as any grant-driven project), we also need more innovative organizations like the Internet Archive, wholly dedicated to the public good and not the relentless pressure to commoditize everything (especially their patrons’ privacy). We need to be much more critical about the kinds of programs, organizations, and financing strategies we (as a society) can support. I raised the political economy of sustainability issue at a recent THATCamp and hope to see more discussion.

In reality, so much of the Academy’s dysfunction is driven by our new Gilded Age’s artificial scarcity of money. With wealth concentrated in so few hands, it is very hard to finance risk-taking and entrepreneurialism in the scholarly community, especially any form of entrepreneurialism that does not turn a profit in a year or two.

Open Access and Open Data would make so much more of a difference if we had the same kind of dynamism in the academic and nonprofit sector as we have in the for-profit start-up sector. After all, Open Access and Open Data can be key enablers of much broader participation in research and education. However, broader participation still needs to be financed: you cannot eat an open access publication. We cannot gloss over this key issue.

We need more diverse institutional forms so that researchers can find (or found) the kinds of organizations that best channel their passions into contributions that enrich us all. We need more diverse sources of financing (new foundations, better financed Kickstarters) to connect innovative ideas with the capital needed to see them implemented. Such institutional reforms will make life in the research community much more livable, creative, and dynamic. It would give researchers more options for diverse and varied career trajectories (for-profit or not-for-profit) suited to their interests and contributions.

Making the case to reinvest in the public good will require a long, hard slog. It will be much harder than the campaign for Open Access and Open Data because it will mean contesting Neoliberal ideologies and constituencies that are deeply entrenched in our institutions. However, the constituencies harmed by Neoliberalism, particularly the student community now burdened by over $1 trillion in debt and the middle class more generally, are much larger and very much aware that something is badly amiss. As we celebrate the impressive strides made by the Open Movement in the past year, it’s time we broaden our goals to tackle the needs for wider reform in the financing and organization of research and education.

Editing Note: fixed a few typos on Friday, Dec. 13, 2013.

Mitch Allen, a publisher that I greatly respect, commented on my blog posts about Aaron Swartz and scholarly communications in archaeology. His comments got me thinking again about the issue in some depth, and I want to take the opportunity to write about it in preparation for the SAA conference in Hawaii.

Allen thought I was probably overstating the legal issues associated with sharing logins and sharing files to get scholarly publications. Sadly, I don’t think my statements were hyperbole:

  • Sharing logins to gain access to university library systems can involve grave legal risks. It involves the same sort of terms-of-service violations that made Aaron Swartz face up to 50 years in prison. For instance, JSTOR’s terms of service (which Swartz allegedly violated in his felony charges) specifically prohibited actions like sharing logins.
  • Sharing papers (mainly by email, but also via social networking sites) also carries risks, mainly under civil rather than criminal law (though that could change if something like SOPA passes). Mass copyright lawsuits with financially ruinous penalties happen, sometimes involving 100,000 people at a time, including children.
  • Litigiousness has entered the scholarly domain. University presses are suing universities over e-reserves to curtail “fair-use” (limitations in copyright law to allow research, instruction, critique, free speech).
  • Law Prof. John Tehranian published a study where he calculated a jaw-dropping $4.5 billion in potential copyright liability involved in routine academic research and instructional activities over the course of a single year.

I think the evidence is clear that current intellectual property rules carry significant legal risks for everyone. It’s worse for researchers at the margins of the profession who lack their own institutional logins.

Normative Publishing Practices and Antiquities Trading

Network security laws and copyright laws are unjust because they carry such disproportionate penalties. Huge commercial scientific publishers like Elsevier push to further strengthen these draconian laws. Elsevier lobbied in favor of SOPA, a bill that would have made even non-commercial infringement a felony offense. That would have put many routine library activities at risk. Copyright has expanded in scope into a more or less absolute and perpetual property right. No US copyrighted works entered the public domain last year.

Like it or not (and I don’t), this legal context shapes academic communication and shapes its ethics. Regarding my point about the antiquities trade, yes, that was purposeful polemic to highlight these ethical issues. To expand on this point, if archaeologists only communicate their results as all-rights-reserved intellectual property, they’re clearly engaged in a form of appropriation. The (more or less) absolute (no fair use) and perpetual (de facto unlimited copyright terms) nature of these property rights increasingly excludes all uses, save commercial transactions. Doesn’t that reduce the scholarly record of the past into commodities?

Status quo publishing practices also carry destructive externalities similar to those of the antiquities trade. In the antiquities trade, only beautiful or rare objects get valued; contextual information is neglected and destroyed because it has no market value. How different is Academia, then, when researchers think that only the final polished article or monograph has any value? What happens to all of that rich contextual information that can’t be squeezed into a 10-page paper? While researchers have very different and much more pro-social goals than antiquities traders, publishing incentives and practices clearly need to better align with those goals.

Open Access and Commerce

Lastly, the open access and open data movements are not anti-commercial. The public good that comes from public financing of research means making information resources that can be used commercially. The normative definitions of “Open Data” explicitly allow for commercial uses, as do open access publishers like PLoS. With Open Context, we happily work with commercial publishers to try to build incentives for the better treatment of primary data.

While Open Data and Open Access are not (usually) anti-commercial, these movements are anti-monopoly. They grew in response to the increasing absurdities of global intellectual property regimes that perpetuate the monopolies of big media conglomerates. My objection to the status quo is not that publishing involves commerce; I object to the fact that we’re largely failing to make any public goods (despite public funding), since the vast majority of academic communication happens in a monopolistic and exclusionary way.

Getting Past the Dysfunctional Status Quo

Something is obviously very screwed-up when university presses sue universities over e-reserves and many researchers lack the means to legally participate in their discipline’s communications. I don’t think the current situation works to anyone’s interest, except for large conglomerates like Elsevier. It certainly doesn’t help small publishers like Left Coast Press, since the cost escalations of the big commercial science publishers mean less budget to buy humanities and social science books (as eloquently noted by Cathy Davidson). It is self-defeating for archaeology’s professional societies to fight (or avoid) open access, since they are simply helping to perpetuate cost-escalations in the areas of scientific publishing (chemistry, biology, computer science) that university administrators prioritize over the humanities and social sciences. Our professional societies need to consider this larger economic reality when determining their positions on open access.

The work of publishers like Mitch Allen is important to the health of archaeology. His efforts add value and quality to archaeological communications. I am very open to debate about what constitutes the right balance between public and private in archaeology’s information resources, and also about how we finance quality publishing. However, I stand by my point that our current policy of investing almost nothing in public (open) information resources hurts our discipline and puts many of its practitioners in legal jeopardy.

UPDATE

Lawyers at the Electronic Frontier Foundation just posted a piece about the issues of felony violations of terms of service. Look at Point 4, substitute Pandora with JSTOR or a university library and you’ll see how all this applies to scholarship. See also this discussion of library licensing terms, since:

It is, however, very clear that licensing terms, which govern an increasingly large proportion of our collections, are a fundamental issue in the present and future usability of library resources by our campus populations.


In case you all didn’t know, today is the last day of 6th annual Open Access Week. I’ve been very busy lately with software updates to Open Context, an open access data publishing service for archaeology, so I haven’t had a chance to cover archaeology developments as much as I would like.

However, I recently submitted a paper about open access in archaeology that was accepted to a special issue of World Archaeology.  Like most of archaeology’s mainstream, conventional journals, World Archaeology is a closed, toll-access venue. Participating in this kind of publishing is not ideal, since it perpetuates a high cost scholarly communications system that impedes access, opportunities for new research (especially text-mining), and uses public research funding to, in effect, subsidize the creation of private intellectual property. Most people who read blogs like this know the story.

However, I decided to publish there because I thought it important to reach a different audience, one that does not follow blogs or discussions about scholarly communications. Mainstream archaeology needs to participate in arguments about open access, and needs to understand why open access is an important issue. The highly problematic stance of the Archaeological Institute of America serves as a case in point (see Ancient World Online, Doug’s Archaeology, and this letter Jessica Ogden wrote that I co-signed).

My paper introduces some of the basic arguments in favor of open access to a mainstream archaeological audience. None of these arguments are especially new to folks following the issue on the Web, but I think it’s useful to enter into a conversation with other members of our profession less familiar with the topic. Also, the paper introduces ideas about Open Data, a related area of innovation in researcher communications.

One area that I touch on in this paper is the issue of “open architectures.” It’s an emerging area of interest to me, and one where I’m still formulating some thoughts. But I think it’s as important an issue as licensing and access for the future of archaeological communications. It directly touches on the issue of centralization and decentralization in archaeological information systems. Centralization can save money and has other efficiencies, especially in performance for searches and analysis. However, it can also reduce and constrain freedom and innovation, since implementation choices, technologies, interfaces, and development directions are under the control of one group with its own set of agendas. Decentralization, on the other hand, allows wider participation and choice in development strategies. However, decentralization can dilute resources too widely, leading to lots of varied, under-supported, and poorly coordinated implementations. Decentralized systems can also have performance and user-experience problems. For instance, a distributed search across lots of different systems involves many trade-offs. It is only as fast as the slowest participant in the distributed network offering search results.
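That last trade-off can be made concrete with a toy sketch. Nothing below corresponds to any real archaeological system; the endpoint names and latencies are invented to show how a federated search is hostage to its slowest participant unless it enforces a deadline and drops stragglers:

```python
import concurrent.futures
import time

def search_endpoint(name, latency, query):
    # Stand-in for a network round trip to a partner repository.
    time.sleep(latency)
    return (name, [f"{query}-result-from-{name}"])

# Hypothetical partners with wildly different response times (seconds).
ENDPOINTS = {"repo_a": 0.05, "repo_b": 0.1, "repo_c": 2.0}

def federated_search(query, timeout=0.5):
    """Query all endpoints concurrently; keep what arrives in time."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(search_endpoint, n, lat, query): n
                   for n, lat in ENDPOINTS.items()}
        done, pending = concurrent.futures.wait(futures, timeout=timeout)
        for f in done:
            name, hits = f.result()
            results[name] = hits
        # Anything still pending missed the deadline and is dropped.
        dropped = [futures[f] for f in pending]
    return results, dropped

hits, dropped = federated_search("chogha mish")
# repo_c (2.0 s) misses the 0.5 s deadline; its results never appear.
```

The deadline keeps the user experience tolerable, but at the cost of silently incomplete results, which is exactly the kind of compromise a centralized index avoids.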

I wonder about ways we can reconcile the polar opposites of centralized versus decentralized systems. When you think about it, the distinction between centralization and decentralization depends on how narrowly or broadly you see your environment. In archaeology, the big centralized systems are the Archaeology Data Service repository and the tDAR repository. But, in the larger world of scholarly communications and scientific data sharing, these are just two of a wide number of systems serving different constituencies. Which gets me to the point of this post.

Openness and interoperability are vital because even big and centralized systems (within the scope of archaeology) are still small when one considers the bigger picture of the world of research. This is particularly important for archaeology, because archaeology is inherently multidisciplinary. We will always need to link and reference data and other content from other disciplines. Those disciplines will have their own data systems and repositories. So we can’t escape the need to think about building distributed systems.

Can we find ways to have our cake and eat it too, and enjoy the benefits of both approaches while mitigating their problems? I think the Pelagios approach may point in a good direction. In Pelagios, several distributed systems offer data according to a simple common standard. The Pelagios team harvested these data and built a centralized index facilitating fast and efficient search and retrieval of resources from these different collections. Pelagios is also interesting because it achieves much with very little effort and cost, and its participating collections have widely varying disciplinary themes and emphases (only some of which were archaeological).
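The harvest-and-index pattern is simple enough to sketch. The record format and collection names below are invented for illustration, not Pelagios’s actual standard: each partner publishes records in a shared minimal shape, and a central service merges them into one index that a single query can search:

```python
def harvest(collections):
    """Merge per-collection records into one index keyed by place name."""
    index = {}
    for source, records in collections.items():
        for rec in records:
            # Each partner keeps its own data; the index only stores
            # lightweight pointers (source + title) per place.
            index.setdefault(rec["place"], []).append(
                {"source": source, "title": rec["title"]})
    return index

# Two hypothetical partner feeds, already fetched and parsed:
feeds = {
    "coins_db": [{"place": "Ephesus", "title": "Coin hoard 12"}],
    "texts_db": [{"place": "Ephesus", "title": "Inscription IV.3"},
                 {"place": "Corinth", "title": "Ostracon 7"}],
}

index = harvest(feeds)
# One query against the central index now spans every collection:
print(index["Ephesus"])
```

The partners stay decentralized and autonomous; only the thin, common-format layer is centralized, which is what keeps the coordination cost so low.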

This is an important point. Centralization is indeed useful, but people will need to define the focus of centralization in very different ways, and only sometimes will the need to centralize align with traditional disciplinary boundaries. In a later blog post, I will follow up with more on centralization versus decentralization. But for now, please enjoy a pre-print draft of my paper on open access for World Archaeology.

Openness and Archaeology’s Information Ecosystem


Yesterday was Archaeology Day, organized by the AIA. (BTW, in case you didn’t notice, despite some prophetic warnings, the world apparently did not end and ruin Archaeology Day.)

It’s also Archaeology Month here in California. “Archaeology Months” are sponsored by various state historical societies and various state and federal government agencies. They help spotlight local archaeology and archaeologists, and offer a focus for organizing, reaching out to a larger community, and highlighting accomplishments and challenges. The Society for California Archaeology runs a great annual poster competition that helps encapsulate some of the activities of an Archaeology Month.

Which brings us to the last calendar alignment I’ll note: next week is Open Access Week! It’s a fortuitous alignment, especially with respect to the themes long explored by this blog, namely archaeology and open access.

I see open access (and open data) as an important aspect of making archaeology broadly relevant and a more integral part of scientific, policy, and cultural debates. Open access is a necessary precondition to making archaeology part of larger conversations. It’s also an important issue when so many of our colleagues work outside of university settings and have to live, work, and make their research contributions without access to JSTOR or subscriptions to other publishers. While there’s been lots of discussion about how “grey literature” (that is, research content that’s hard to discover and sees very limited circulation) is bad for the discipline, few in archaeology have noted that many mainstream archaeological journals are “grey literature” to people outside the academy.

Of course, most people, including most archaeologists, are outside of the academy. If we want our publicly supported (through direct funding and grants, or through regulatory mandates) research to have any positive impact on our peers inside and outside of our discipline, we need to consider access issues. At the same time, we need to consider access issues when thinking about how archaeology relates to many different communities in the larger public. From the outset, it’s clear open access is not sufficient in itself to make archaeology intelligible to the public. It often takes lots of work to help guide non-archaeologists through often very technical archaeological findings. But at the very least, open access to archaeological literature can make it easier for outside communities to learn, even through simple Google searches, that archaeology has something (though probably very technical) to say on many different issues and in many different places.

So, I’m glad these chance calendar alignments help put some focus on these issues.

BTW: In keeping with these themes, the e-journal Internet Archaeology (an essential resource for some of the best in digital archaeology) is going fully open access this week! So fire up Zotero and go get some great papers while you can!

Clifford Lynch drew my attention to “an announcement from the UK Royal Society indicating that in celebration of Open Access week they were opening their entire journal archive for free access till the end of the society’s 350th anniversary year, 30 November 2010.” This is a great opportunity to get access to two issues of Philosophical Transactions of the Royal Society A from August and September 2010 which focus on E-science and contain a number of outstanding papers. See http://rsta.royalsocietypublishing.org/content/368/1925.toc and http://rsta.royalsocietypublishing.org/content/368/1926.toc.

A few examples:

  • “Methodological commons: arts and humanities e-Science fundamentals” (abstract and pdf);
  • “Deploying general-purpose virtual research environments for humanities research” (abstract and pdf);
  • “Use of the Edinburgh geoparser for georeferencing digitized historical collections” (abstract and pdf);
  • “Adoption and use of Web 2.0 in scholarly communications” (abstract and pdf);
  • “Retaining volunteers in volunteer computing projects” (abstract and pdf).

figure from “Use of the Edinburgh geoparser for georeferencing digitized historical collections”

I’m pleased to announce that the National Science Foundation (NSF) archaeology program now links to Open Context (see example here). Open Context is an open-access data publication system, and I lead its development. Obviously, a link from the NSF is a “big deal” to me, because it helps represent how data sharing is becoming a much more mainstream fact of life in the research world. After spending the better part of my post-PhD career on data sharing issues, I can’t describe how gratifying it is to witness this change.

Now for some context: Earlier this year, the NSF announced new data sharing requirements for grantees. Grant-seekers now need to supply data access and management plans in their proposals. This new requirement has the potential for improving transparency in research. Shared data also opens the door to new research programs that bring together results from multiple projects.

The downside is that grant seekers will now have additional work to create a data access and management plan. Many grant seekers will probably lack expertise and technical support in making data accessible. Thus, the new data access requirements will represent something of a burden, and many grant seekers may be confused about how to proceed.

That’s why it is useful for the NSF to link to specific systems and services. Along with Open Context, the NSF also links to Digital Antiquity’s tDAR system (kudos to Digital Antiquity!). Open Context offers researchers guidance on how to prepare datasets for presentation and how to budget for data dissemination and archiving (with the California Digital Library). Open Context also points to the “Good Practice” guides prepared by the Archaeology Data Service (now being revised with Digital Antiquity). Researchers can incorporate all of this information into their grant applications.

While the NSF did (informally) evaluate these systems for their technical merits, as you can see on the NSF pages, these links are not endorsements. Researchers can and should explore different options that best meet their needs. Nevertheless, these links do give grant-seekers some valuable information and services that can help meet the new data sharing requirements.

There’s some thoughtful criticism and discussion about Chogha Mish in Open Context over at Secondary Refuse. I tried to post a comment directly to that blog, but Blogger kept giving me an error, so I’m posting here. At least it’s nice to know other systems also have bug issues!

I very much agree with Secondary Refuse’s point about the difficulties associated with data sharing. Data sharing is a complex and theoretically challenging undertaking. However, the problem of misuse and misinterpretation is not unique to datasets. Journal papers can be and are misused, both by novices and by domain specialists who fail to give a paper a careful read. Despite these problems and the potential for misuse, we still publish papers because the benefits outweigh the risks. Similarly, I think we should still publish researcher datasets, because such data can improve the transparency and analytic rigor of analysis.

One of the reasons for posting the Chogha Mish data was to illustrate some useful points about how to go about data sharing in a better way. If you look at the ICAZ poster associated with the project, you’ll find many recommendations regarding the need to contextualize data (including editorial oversight of data publication). Ideally, data publication should accompany print/narrative publication, since the two forms of communication can enhance each other. Most of the data in Open Context comes from projects with active publication efforts, and as these publications become available, Open Context and the publications will link back and forth.

Regarding why we published these data, the point is to make them available, free of charge and free of copyright barriers, for anyone to reuse. They can be used in a class to teach analytic methods (one can ask a class to interpret the kill-off patterns, or ask them to critique the data and probe its ambiguities and limits). The dataset can also be combined with other datasets in larger research projects involving regional synthesis. The “About” section of Open Context explains more.

Lastly, Secondary Refuse found an interface flaw I had missed. We had a bug where downloadable tables associated with projects weren’t showing up. The bug is now fixed, and when you look at the Chogha Mish Overview, you’ll find a link to a table you can download and use in Excel or similar applications.
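As an aside, once you have a table like this downloaded, even a few lines of code can start the kind of classroom exercise mentioned above. Here is a minimal sketch in Python; the column names and the tiny inline sample are purely hypothetical illustrations, not the actual Chogha Mish schema:

```python
# Hypothetical sketch: tabulating specimens from a downloaded Open Context
# table. The columns and rows below are illustrative only.
import csv
import io
from collections import Counter

sample = """Taxon,Element,Fusion
Ovis/Capra,Femur,Unfused
Ovis/Capra,Tibia,Fused
Bos taurus,Humerus,Fused
Ovis/Capra,Femur,Unfused
"""

# In practice you would use open("downloaded_table.csv") instead of io.StringIO.
rows = list(csv.DictReader(io.StringIO(sample)))

# Count specimens per taxon.
taxon_counts = Counter(r["Taxon"] for r in rows)

# Share of unfused (i.e., young) specimens per taxon: a crude proxy for
# kill-off patterns a class could interrogate and critique.
unfused = Counter(r["Taxon"] for r in rows if r["Fusion"] == "Unfused")
kill_off = {taxon: unfused[taxon] / n for taxon, n in taxon_counts.items()}
```

A real exercise would of course wrestle with the ambiguities in the actual data (uncertain identifications, mixed contexts), which is exactly the point of publishing it.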

Kudos to Secondary Refuse’s author! Feedback like this is really important for helping us learn how to improve Open Context, so it is much appreciated!

We are proud to announce the arrival of a new, exciting project in the Open Context database, co-authored by Levent Atici (University of Nevada Las Vegas), Justin S.E. Lev-Tov (Statistical Research, Inc.) and our own Sarah Whitcher Kansa.

Chogha Mish Fauna

This project uses the publicly available dataset of over 30,000 animal bone specimens from excavations at Chogha Mish, Iran during the 1960s and 1970s. The specimens were identified by Jane Wheeler Pires-Ferreira in the 1960s, and though she never analyzed the data or produced a report, her identifications were saved and later transferred to punch cards and then to Excel. This ‘orphan’ dataset was made available on the web in 2008 by Abbas Alizadeh (University of Chicago) at the time of his publication of Chogha Mish, Volume II.

The site of Chogha Mish spans the Archaic through Elamite periods, with later Achaemenid occupation as well. These phases are further subdivided into several subphases, and some of those chronological divisions are also represented in this dataset. Thus the timespan begins in the mid-seventh millennium and continues into the third millennium B.C.E. In terms of cultural development in the region, these periods are key, spanning the later Neolithic (after the period of caprid and cattle domestication, but possibly during the eras in which pigs and horses were domesticated) through the development of truly settled life, cities, supra-regional trade, and even the early empires or state societies of Mesopotamia and Iran. Potential questions to address with this data collection therefore include:

  1. The extent to which domesticated animals were utilized, and how/whether this changed over time
  2. The development of centralized places
  3. Increasing economic specialization
  4. General changes in subsistence economy
  5. The development of social complexity/stratification.

Publication of this dataset accompanied a study of data-sharing needs in zooarchaeology. Preliminary results of this study were presented as a poster titled “Other People’s Data: Blind Analysis and Report Writing as a Demonstration of the Imperative of Data Publication”. The poster was presented at the 11th International Conference of ICAZ (International Council for Archaeozoology), in Paris (August 2010), in Session 2-4, “Archaeozoology in a Digital World: New Approaches to Communication and Collaboration”. The poster presented at this conference accompanies this project.

(more…)

Archive ’10, the NSF Workshop on Archiving Experiments to Raise Scientific Standards, was just held on May 25-26 in Salt Lake City—sorry for not announcing this in advance, I just learnt about it myself via Clifford Lynch. The website states: “Archive ’10 will focus on the creation of archives of computer-based experiments: capturing and publishing entire experiments that are fully encapsulated, ready for immediate replay, and open to inspection. It will bring together a few areas of the scientific community that represent fairly advanced infrastructure for archiving experiments and data (physicists and biomedical researchers) with two areas of the computer systems community for which significant progress is still needed (networks and compilers). The workshop will also include experts in enabling technologies and publishing.”

The live video feed doesn’t seem to be working anymore. I hope it will be replaced with an archived version. A few of the position papers that stood out to me are:

This is not exactly archaeology of course but it still is a good idea to check on other disciplines for ideas and experiences.

The National Science Foundation sent out a press release on the new data management requirements for applicants (see earlier post). “[O]n or around October, 2010, NSF is planning to require that all proposals include a data management plan in the form of a two-page supplementary document.” “‘The change reflects a move to the Digital Age, where scientific breakthroughs will be powered by advanced computing techniques that help researchers explore and mine datasets,’ said Jeannette Wing, assistant director for NSF’s Computer & Information Science & Engineering directorate. ‘Digital data are both the products of research and the foundation for new scientific insights and discoveries that drive innovation.’”
