public domain


Mitch Allen, a publisher whom I greatly respect, commented on my blog posts about Aaron Swartz and scholarly communications in archaeology. His comments got me thinking again about the issue in some depth, and I want to take the opportunity to write about it in preparation for the SAA conference in Hawaii.

Allen thought I was probably overstating the legal issues associated with sharing logins and sharing files to get scholarly publications. Sadly, I don’t think my statements were hyperbole:

  • Sharing logins to gain access to university library systems can carry grave legal risks. It involves the same sort of terms-of-service violations that made Aaron Swartz face 50 years in prison. For instance, JSTOR’s terms of service (which Swartz allegedly violated, according to his felony charges) specifically prohibited actions like sharing logins.
  • Sharing papers (mainly in email, but also on social networking sites) also carries risks, mainly in civil and not criminal law (but that could change if something like SOPA passes). Mass copyright lawsuits with financially ruinous penalties happen, even involving 100,000 people at a time, including children.
  • Litigiousness has entered the scholarly domain. University presses are suing universities over e-reserves to curtail “fair-use” (limitations in copyright law to allow research, instruction, critique, free speech).
  • Law Prof. John Tehranian published a study where he calculated a jaw-dropping $4.5 billion in potential copyright liability involved in routine academic research and instructional activities over the course of a single year.

I think the evidence is clear that current intellectual property rules carry significant legal risks for everyone. It’s worse for researchers at the margins of the profession who lack their own institutional logins.

Normative Publishing Practices and Antiquities Trading

Network security laws and copyright laws are unjust because they carry such disproportionate penalties. Huge commercial scientific publishers like Elsevier push to further strengthen these draconian laws. Elsevier lobbied in favor of SOPA, a bill that would have made even non-commercial infringement a felony offense. That would have put many routine library activities at risk. Copyright has expanded in scope into a more or less absolute and perpetual property right. No US copyrighted works entered into the public domain last year.

Like it or not (and I don’t), this legal context shapes academic communication and shapes its ethics. Regarding my point about the antiquities trade, yes, that was purposeful polemic to highlight these ethical issues. To expand on this point, if archaeologists only communicate their results as all-rights-reserved intellectual property, they’re clearly engaged in a form of appropriation. The (more or less) absolute (no fair use) and perpetual (de facto unlimited copyright terms) nature of these property rights increasingly excludes all uses, save commercial transactions. Doesn’t that reduce the scholarly record of the past into commodities?

Status quo publishing practices also carry destructive externalities similar to those of the antiquities trade. In the antiquities trade, only beautiful or rare objects get valued, and contextual information is neglected and destroyed because it has no market value. How different is Academia then, when researchers think that only the final polished article or monograph has any value? What happens to all of that rich contextual information that can’t be squeezed into a 10-page paper? While researchers have very different and much more pro-social goals than antiquities traders, publishing incentives and practices clearly need to better align with those goals.

Open Access and Commerce

Lastly, the open access and open data movements are not anti-commercial. The public good that comes from public financing of research means making information resources that can be used commercially. The normative definitions of “Open Data” explicitly allow for commercial uses, as do open access publishers like PLoS. With Open Context, we happily work with commercial publishers to try to build incentives for the better treatment of primary data.

While Open Data and Open Access are not (usually) anti-commercial, these movements are anti-monopoly. They grew in response to the increasing absurdities of global intellectual property regimes that perpetuate monopolies of big media conglomerates. My objection to the status quo is not that publishing involves commerce. I object to the fact that we’re largely failing to make any public goods (despite public funding), since the vast majority of academic communication happens in a monopolistic and exclusionary way.

Getting Past the Dysfunctional Status Quo

Something is obviously very screwed-up when university presses sue universities over e-reserves and many researchers lack the means to legally participate in their discipline’s communications. I don’t think the current situation works in anyone’s interest, except for large conglomerates like Elsevier. It certainly doesn’t help small publishers like Left Coast Press, since the cost escalations of the big commercial science publishers mean less budget to buy humanities and social science books (as eloquently noted by Cathy Davidson). It is self-defeating for archaeology’s professional societies to fight (or avoid) open access, since they are simply helping to perpetuate cost escalations in the areas of scientific publishing (chemistry, biology, computer science) that university administrators prioritize over the humanities and social sciences. Our professional societies need to consider this larger economic reality when determining their positions on open access.

The work of publishers like Mitch Allen is important to the health of archaeology. His efforts add value and quality to archaeological communications. I am very open to debate about what constitutes the right balance between public and private in archaeology’s information resources, and also to debate about how we finance quality publishing. However, I stand by my point that our current policy of investing almost nothing in public (open) information resources hurts our discipline and puts many of its practitioners in legal jeopardy.

UPDATE

Lawyers at the Electronic Frontier Foundation just posted a piece about the issues of felony violations of terms of service. Look at Point 4, substitute Pandora with JSTOR or a university library and you’ll see how all this applies to scholarship. See also this discussion of library licensing terms, since:

It is, however, very clear that licensing terms, which govern an increasingly large proportion of our collections, are a fundamental issue in the present and future usability of library resources by our campus populations.

 

 

In case you all didn’t know, today is the last day of the 6th annual Open Access Week. I’ve been very busy lately with software updates to Open Context, an open access data publishing service for archaeology, so I haven’t had a chance to cover archaeology developments as much as I would like.

However, I recently submitted a paper about open access in archaeology that was accepted to a special issue of World Archaeology. Like most of archaeology’s mainstream, conventional journals, World Archaeology is a closed, toll-access venue. Participating in this kind of publishing is not ideal, since it perpetuates a high-cost scholarly communications system that impedes access and opportunities for new research (especially text-mining), and uses public research funding to, in effect, subsidize the creation of private intellectual property. Most people who read blogs like this know the story.

However, I decided to publish there because I thought it important to reach a different audience, one that does not follow blogs or discussions about scholarly communications. Mainstream archaeology needs to participate in arguments about open access, and needs to understand why open access is an important issue. The highly problematic stance of the Archaeological Institute of America serves as a case in point (see Ancient World Online, Doug’s Archaeology, and this letter Jessica Ogden wrote that I co-signed).

My paper introduces some of the basic arguments in favor of open access to a mainstream archaeological audience. None of these arguments are especially new to folks following the issue on the Web, but I think it’s useful to enter into a conversation with other members of our profession less familiar with the topic. Also, the paper introduces ideas about Open Data, a related area of innovation in researcher communications.

One area that I touch on in this paper is an issue of “open architectures.” It’s an emerging area of interest to me, and one where I’m still formulating some thoughts. But I think it’s as important an issue as licensing and access for the future of archaeological communications. It directly touches on the issue of centralization and decentralization in archaeological information systems. Centralization can save money, and has other efficiencies, especially in performance for searches and analysis. However, it can also reduce and constrain freedom and innovation, since implementation choices, technologies, interfaces, and development directions are under the control of one group with its own set of agendas. Decentralization, on the other hand, allows wider participation and choice in development strategies. However, decentralization can dilute resources too widely, leading to lots of varied, under-supported, and poorly coordinated implementations. Decentralized systems can also have performance and user experience problems. For instance, a distributed search across lots of different systems involves many trade-offs. It is only as fast as the slowest participant in the distributed network offering search results.
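To make that last trade-off concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the repository names and response times are hypothetical, and it assumes every participant is queried in parallel. Waiting for everyone means waiting for the slowest system; imposing a timeout speeds things up, but only by dropping the slow participants' results.

```python
from typing import Dict, List, Tuple


def federated_search_time(response_times: Dict[str, float]) -> float:
    """Time until a parallel fan-out search is complete.

    All participants are queried at once, so the full result set is
    ready only when the slowest participant has answered.
    """
    return max(response_times.values())


def federated_search_with_timeout(
    response_times: Dict[str, float], timeout: float
) -> Tuple[float, List[str]]:
    """Trade completeness for speed by giving up on slow participants.

    Returns the elapsed time and the (sorted) names of participants
    whose results were dropped because they exceeded the timeout.
    """
    dropped = sorted(name for name, t in response_times.items() if t > timeout)
    elapsed = timeout if dropped else max(response_times.values())
    return elapsed, dropped


# Hypothetical response times, in seconds, for three made-up repositories.
response_times = {"repo_a": 0.2, "repo_b": 0.5, "repo_c": 3.0}

print(federated_search_time(response_times))
print(federated_search_with_timeout(response_times, timeout=1.0))
```

Neither option is free: the first makes every user wait on the weakest system, and the second silently returns incomplete results.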

I wonder about ways we can reconcile the polar opposites of centralized versus decentralized systems. When you think about it, the distinction between centralization and decentralization depends on how narrowly or broadly you see your environment. In archaeology, the big centralized systems are the Archaeology Data Service repository and the tDAR repository. But, in the larger world of scholarly communications and scientific data sharing, these are just two of a large number of systems serving different constituencies. Which gets me to the point of this post.

Openness and interoperability are vital because even big and centralized systems (within the scope of archaeology) are still small when one considers the bigger picture of the world of research. This is particularly important for archaeology, because archaeology is inherently multidisciplinary. We will always need to link and reference data and other content from other disciplines. Those disciplines will have their own data systems and repositories. So we can’t escape the need to think about building distributed systems.

Can we find ways to have our cake and eat it too, and enjoy benefits of both approaches while mitigating their problems? I think the Pelagios approach may point to a good direction. In Pelagios, several distributed systems offer data according to a simple common standard. The Pelagios team harvested these data and built a centralized index facilitating fast and efficient search and retrieval of resources from these different collections. Pelagios is also interesting because it achieves much with very little effort and cost and its participating collections have such widely varying disciplinary themes and emphases (only some of which were archaeological).
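The harvest-then-index pattern described above can be sketched in a few lines of Python. This is not the Pelagios implementation. The record fields (`place`, `uri`), the collection names, and the example.org URIs are all hypothetical, and the sketch simply assumes each collection publishes its records in one agreed, simple format keyed by a shared place identifier.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A record is a dict in the (hypothetical) common format every collection agrees on.
Record = Dict[str, str]


def harvest(collections: Dict[str, List[Record]]) -> Dict[str, List[Tuple[str, str]]]:
    """Build one central index from many independently run collections.

    The index maps a shared place identifier to (collection name, record URI)
    pairs, so the records themselves stay with their home collections.
    """
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, records in collections.items():
        for record in records:
            index[record["place"]].append((name, record["uri"]))
    return index


def search(index: Dict[str, List[Tuple[str, str]]], place: str) -> List[Tuple[str, str]]:
    """Answer a query from the central index alone, without contacting any collection."""
    return index.get(place, [])


# Two made-up collections with different disciplinary emphases.
collections = {
    "excavation_archive": [
        {"place": "place:petra", "uri": "http://example.org/dig/1"},
    ],
    "coin_catalogue": [
        {"place": "place:petra", "uri": "http://example.org/coins/7"},
        {"place": "place:rome", "uri": "http://example.org/coins/9"},
    ],
}

index = harvest(collections)
print(search(index, "place:petra"))
```

The design choice worth noting is that only the index is centralized. Each collection keeps control of its own data and interface, while searches run against one fast local structure.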

This is an important point. Centralization is indeed useful, but people will need to define the focus of centralization in very different ways, and only sometimes will the need to centralize align with traditional disciplinary boundaries. In a later blog post, I will follow up with more on centralization versus decentralization. But for now, please enjoy a pre-print draft of my paper on open access for World Archaeology.

Openness and Archaeology’s Information Ecosystem

 

 

President Obama’s Office of Science and Technology Policy (OSTP) has been conducting “a public consultation on Public Access Policy. The Administration is seeking public input on access to publicly-funded research results, such as those that appear in academic and scholarly journal articles. Currently, the National Institutes of Health require that research funded by its grants be made available to the public online at no charge within 12 months of publication. The Administration is seeking views as to whether this policy should be extended to other science agencies and, if so, how it should be implemented.” “To that end, OSTP is currently conducting an interactive, online discussion that began Thursday, December 10, 2009. We will focus on three major areas of interest: Implementation (Dec. 10 to 20); Features and Technology (Dec. 21 to Dec 31); Management (Jan. 1 to Jan. 7). UPDATE: Due to a high number of requests, all three phases of the Public Access Policy Forum will remain open through Jan. 21, 2010.”

Surely NSF-funded archaeological research would be covered under this proposal, and maybe even NEH-funded research. Archaeologists are bound to be affected by this. I for one would definitely applaud the inclusion of our field and the humanities in this mandate. Now is the time to express our opinion in this forum…

When you are looking for public-domain images (handy for the underfunded archaeologist), this is a good web page to keep in mind: Wikipedia’s Public Domain Image Resources. Here are the sections:

  • 1 Wikimedia operated
  • 2 History
  • 3 Art
  • 4 Books
  • 5 Logos and flags
  • 6 Postage stamps
  • 7 Culture
  • 8 General collections
  • 9 Computer-generated public domain images
  • 10 Public domain image meta-resources
  • 11 Uncategorized links
  • 12 U.S. Government sites
  • 13 Search Engines

Wikipedia actually has its own search engine designed specifically for searching public-domain images: FIST – Free Image Search Tool. It isn’t very user-friendly and doesn’t always return results promptly, but maybe I haven’t grasped how to use it properly (or it might be improved upon in the future?). A generic “archaeology” search yields this result page.

    “Excavations at the Roman Forum in Rome, Italy, are being mapped by these archaeologists. Photographed by myself (Adrian Pingstone) in June 2007 and placed in the public domain.”

    One of my favorite topics for discussion on this blog is the subject of Open Data. In following this interest, I worked with Erik Wilde and Raymond Yee in developing a site to help guide implementation of Recovery.gov transparency measures. The site is located at:

    http://isd.ischool.berkeley.edu/stimulus/2009-029/

    The site has demonstrations and an accompanying report (all under a Creative Commons attribution license). We’ve developed a set of simulated data that conforms to the Office of Management and Budget’s (OMB) February 18th specifications for disclosure. These data are offered through a variety of human- and machine-readable RESTful web services. We hope that these simulated data will act as a guide for implementation by federal agencies.

    With machine-readable XML data, it was pretty simple to do a variety of “mashup” things:

    However, one topic that needs more attention is the issue about what kind of information is required for “transparency”. To help answer this question, we’re seeking feedback from the wider community. Do these data really help in offering a more meaningful level of transparency? What additional information would be required to make this even more useful for community oversight?

    Information architectures, services, and machine-readable data are all essential requirements for making data open and encouraging transparency in both research and policy. However, in some ways, these are the easy questions. What’s harder is knowing the specifics about what information is required to make open data actually meaningful for wider communities, whether it’s for research, instruction, or public oversight of government.

    Any feedback and help on these questions would be most welcome!

    PS. See Erik Wilde’s blog post for more.

    Thanks to an invitation from Charles Ellwood Jones, I just wrote a post on “Archaeological Openness” over at the Ancient World Bloggers group blog. It’s a critical examination of the Open Data Protocol announced by Science Commons.

    It’s been a long time since I’ve had much of an opportunity to blog, or digest many of the important developments that have been taking place in the world of open access and open science.

    First off, I had the pleasure of attending the Creative Commons 5th anniversary party. Besides enjoying myself, it was a great chance to reflect on how far Creative Commons has come. Millions of people and sites using their licenses, important projects in Science and Education, and continuing and growing buzz and energy.

    Creative Commons made some announcements at the party and afterward that are important for archaeologists and for our planning for cyber-infrastructure. Here’s an update:

    (1) CC-Zero: CC-Zero is a new protocol that lets people assert that content has no legal restrictions associated with it, or lets them sign a waiver removing all rights associated with a work. It is similar to a public domain dedication, but (in the words of Creative Commons) “The key addition is that the assertion that content is in the public domain will be vouched for by users, so that there is a platform for reputation systems to develop. People will then be able to judge the reliability of content’s copyright status based on who has done the certifying.” In other words, this seems like a way of trying to encourage certainty that an item of content really IS in the public domain.

    (2) Community Norms: One of the interesting recent developments out of Creative Commons (especially Science Commons) is the growing emphasis on relying on social norms (see this link). Social norms are an important force in science (and the humanities), and many researchers probably mistake the social norms of their fields for copyright or other legal protections. For example, archaeologists (and other researchers) have been publishing non-copyrightable facts for a long time; these include counts of species, dimensions of artifacts, etc. You’re still expected to cite the people who published these facts, even though the facts are in the public domain. This is an example of a powerful and good social norm.

    (3) Open Data Protocol: Here’s where everything above comes together. The new Science Commons “Open Data Protocol” discusses the issues of copyright, factual (non copyrightable) data, interoperability, attribution, and community norms. It is an impressive and needed document, and makes a very clear and compelling case for moving away from the traditional Creative Commons approach of leveraging copyright licenses to encourage (or mandate) good behavior such as citation and attribution.

    Scientific data repositories are typically full of content that is both “expressive” (subject to copyright) and “factual” (not subject to copyright). Data sources are also highly global, and therefore subject to all sorts of legal jurisdictions and rules. Because of all of these legal complexities, it becomes very difficult to achieve legal interoperability between data from different data repositories (that may have different terms of use, license frameworks, or may be subject to different legal systems). This case is clearly made in their document.

    The solution that Science Commons advocates is to essentially move all open science data repositories to a common legal baseline, which is basically the public domain. This is very different from the “traditional” (if one can speak of traditional in the context of a 5-year-old organization) Creative Commons approach of “some rights reserved” copyright licensing. Science Commons argues against using licensing or other legal instruments to mandate “good” behavior (such as citation and attribution) for data resources, because these seem legally unworkable and have many practical problems. Instead, citation and acknowledgment of data contributions should be the province of social norms, and not legally enforced. Scientific data repositories that want to be “open” should shape their terms of use, copyright, and other policies so their content is essentially public domain and freely remixable with other resources.

    Comments: Wow! This Science Commons development is very impressive, and very compelling.

    Unfortunately, I think it’s very likely to freak out a large portion of the archaeological community. Science Commons is simply so far ahead of our community, that I worry about how this will be received.

    I just gave a paper at the recent American Schools of Oriental Research meeting in San Diego. I presented our recent work with Prof. Martha Joukowsky to publish 15 years of her excavations at the Great Temple of Petra (see here, and here) in Open Context. This content is licensed with the Creative Commons Attribution license, the most open of their licensing choices. This was very generous and forward looking of Prof. Joukowsky.

    To illustrate how far we have to go to advocate “openness”, in the Q&A part of my talk, someone in the audience advocated legally registering the copyright of archaeological content. That way, one could sue for damages in infringement. Ugh, so the discussion turned from how to share our data to how to sue each other for copyright infringement. Given some of the audience reaction to the discussion, it seems very clear that in Near Eastern archaeology, there is very little faith in “social norms”. A good fraction of my audience seemed to believe that their community was so dysfunctional that they needed legal protections.

    Thus, while I agree with the arguments in the Open Data Protocol, I believe a large portion of the archaeological community has a long way to go. It will take many more examples of datasets like Petra to move this community from secrecy and suspicion. Sigh, we truly have our work cut out for us.

    Update: Peter Suber (as always) has excellent commentary on these developments.

    The NEH funded Pleiades discussion list recently picked up on my last post about copyright and scientific data. Several contributors to that list had important points and resources to add, especially about geospatial data. These include:

    • Here’s an interesting post by Chris Holmes, “Promoting freely available geodata”. It touches on many of these themes, and also notes that Creative Commons and Science Commons are reluctant to develop licensing mechanisms around factual data. He also explores some of the policy implications of “copyleft”-type contracts that are not based on copyright law.
    • Another contributor to the Pleiades discussion list rightly pointed out that geospatial data sees very different legal regulatory frameworks internationally. I should also add that the EU has greater copyright protection for database content than the US. James Boyle (who’s on the Board of Creative Commons), wrote an interesting piece in the Financial Times about how the EU database protection laws have not helped the European database industry. This perspective helps explain why Creative Commons and Science Commons are very reluctant to get involved in licensing factual data. “Protecting” such content with licenses (even with “some rights reserved” licenses) may do more damage than good.

    Aside from the fact that it seems we all need some good lawyers, these discussions help illustrate the importance of community social norms. Scholars are already (largely) a self-regulating community. Inviting in lawyers to craft custom licenses and contracts may not make the most sense, unless the law directly impedes our work (as is the case with standard “all rights reserved” copyright, where Creative Commons licenses are a vast improvement). Developing positive social norms is something of an art, but there are many examples of successful online communities. Hopefully we can learn from these examples and adapt them to help make open research in everyone’s enlightened self-interest.

    Additional Note:

    Before someone else points out my error, I was remiss in not linking to the original blog post over at the Open Knowledge Foundation that started all this discussion. Jamie Boyle’s article is already well discussed in this first post! It clearly pays to thoroughly read one’s primary sources before posting to a weblog. My apologies!

    Peter Suber, an essential source of scholarly open access news, recently posted a discussion about the copyright status of “data”, and whether Creative Commons licenses were appropriate for such content. Copyright law makes a distinction between “facts” (and/or ideas) and “expressions”. Original expressions are protected by copyright, but the ideas and facts being communicated by these expressions are public. If I write “Stratum B at site X dates to between 7500 – 7000 BP”, this specific sentence is an original expression and is copyright protected. However, you are free to “abstract” the ideas and facts out of my sentence and put them into a new expression such as the following table:

    Site      Phase        Est. Dates (BP)
    Site X    Stratum B    7500-7000

    Because the ideas and facts in my original sentence are not copyright protected, no permissions need to be asked to re-express them in a new way, like the table above. Legally, citation isn’t even required, though citation is a very important social norm for the scholarly community, even when it involves crediting non-copyrightable facts.

    The legal distinctions between “facts” and “expressions” are important to consider when we develop online data-sharing systems. Creative Commons licenses are wonderful tools for the research community to share expressive (copyright protected) content. Each Creative Commons license requires attribution for all uses of a licensed work. Attributing researchers for their contributions is very important, since it helps them build their reputation.

    However, Creative Commons licenses are copyright licenses. They only work with copyrightable material. Many scientific databases lack enough original expression and are too factual to be copyrightable. Their contents are therefore public domain and can’t be licensed with Creative Commons licenses. Here’s a great paper (“Geographic Information Legal Issues”) by Harlan Onsrud that explores these issues. He noted a legal case involving the copyright status of an alphabetically organized phonebook, where a court decided that the content (names and phone numbers) lacked sufficient originality of expression to make it copyrightable. Peter Suber also links to the Science Commons FAQ about databases and copyright, which is also an excellent resource.

    So what’s the threshold for original expression to make content copyrightable? The answer is ambiguous. For archaeology, which so often sees documentation expressed in free-form notes and drawings, copyright will probably often apply. In such cases, Creative Commons licenses can and should be used. However, some areas of archaeology capture much less expressive and more “factual” kinds of data (archaeometry, zooarchaeology, some studies involving GIS, etc.). In these cases Creative Commons licenses shouldn’t be used.

    The public domain nature of factual data raises an incentive problem. Factual data can be legally copied and used without attribution. Again, even traditionally published factual data can be legally used without attribution. However, putting such resources up in open online archives would make such legal appropriation very easy. Without some reasonable expectation of attribution, why would any researcher share their hard-earned data?

    Therefore, developing online archives of factual data requires developing social norms to regulate their use. Just as we expect citation even when we publish “facts” in traditional paper media, we should expect citation in online publication of our data. Professional ethical codes should be updated to reflect these needs, and journal editors and reviewers should be aware of these issues to help prevent cheating.

    In addition, data archives may want to consider “terms and conditions of use” contracts that require end-users to attribute sources of factual data. Such contracts need not be based on copyright (as are Creative Commons licenses), but are made as a condition for using a data archive. While these should be explored, we should be very careful about such legal “solutions”. There may be hidden costs and unwanted problems associated with such end-user agreements. Nevertheless, I welcome such discussion, since, as a developer of tools for open access data archives, I’m keenly interested in incentives!