Peter Suber, an essential source of scholarly open access news, recently posted a discussion about the copyright status of “data”, and whether Creative Commons licenses were appropriate for such content. Copyright law makes a distinction between “facts” (and/or ideas) and “expressions”. Original expressions are protected by copyright, but the ideas and facts being communicated by these expressions are public. If I write “Stratum B at site X dates to between 7500 – 7000 BP”, this specific sentence is an original expression and is copyright protected. However, you are free to “abstract” the ideas and facts out of my sentence and put them into a new expression such as the following table:

| Site   | Phase     | Est. Dates (BP) |
|--------|-----------|-----------------|
| Site X | Stratum B | 7500-7000       |

Because the ideas and facts in my original sentence are not copyright protected, no permissions need to be asked to re-express them in a new way, like the table above. Legally, citation isn’t even required, though citation is a very important social norm for the scholarly community, even when it involves crediting non-copyrightable facts.

The legal distinctions between “facts” and “expressions” are important to consider when we develop online data-sharing systems. Creative Commons licenses are wonderful tools for the research community to share expressive (copyright protected) content. Each Creative Commons license requires attribution for all uses of a licensed work. Crediting researchers for their contributions is very important, since it helps them build their reputations.

However, Creative Commons licenses are copyright licenses. They only work with copyrightable material. Many scientific databases lack enough original expression and are too factual to be copyrightable. Their contents are therefore in the public domain and can’t be licensed with Creative Commons licenses. Here’s a great paper (“Geographic Information Legal Issues”) by Harlan Onsrud that explores these issues. He noted a legal case involving the copyright status of an alphabetically organized phonebook, where a court decided that the content (names and phone numbers) lacked sufficient originality of expression to make it copyrightable. Peter Suber also links to the Science Commons FAQ about databases and copyright, another excellent resource.

So what’s the threshold for original expression to make content copyrightable? The answer is ambiguous. For archaeology, which so often sees documentation expressed in free-form notes and drawings, copyright will probably often apply. In such cases, Creative Commons licenses can and should be used. However, some areas of archaeology capture much less expressive and more “factual” kinds of data (archaeometry, zooarchaeology, some studies involving GIS, etc.). In these cases Creative Commons licenses shouldn’t be used.

The public domain nature of factual data raises an incentive problem. Factual data can be legally copied and used without attribution, and this is true even of factual data published in traditional media. However, putting such resources up in open online archives would make such legal appropriation very easy. Without some reasonable expectation of attribution, why would any researcher share their hard-earned data?

Therefore, developing online archives of factual data requires developing social norms to regulate their use. Just as we expect citation even when we publish “facts” in traditional paper media, we should expect citation in online publication of our data. Professional ethical codes should be updated to reflect these needs, and journal editors and reviewers should be aware of these issues to help prevent cheating.

In addition, data archives may want to consider “terms and conditions of use” contracts that require end-users to attribute sources of factual data. Such contracts need not be based on copyright (as are Creative Commons licenses), but are made as a condition for using a data archive. While these should be explored, we should be very careful about such legal “solutions”. There may be hidden costs and unwanted problems associated with such end-user agreements. Nevertheless, I welcome such discussion, since, as a developer of tools for open access data archives, I’m keenly interested in incentives!