How to free your facts

May 12th, 2008 by dwentworth

With the open access movement surging — and the discussion surrounding open data gaining velocity — we’re getting more emails with questions about how best to share collections of factual data. One of the most common questions: How do I mark my data explicitly as “open access” and free for anyone to use?

In general, we encourage you to choose waivers, like the Open Data Commons Public Domain Dedication and License (ODC-PDDL) or the Creative Commons CC0 waiver, rather than licenses such as CC-BY or the FDL.

The issues surrounding how to treat factual data are complex. To help bring more clarity for those of you exploring your options, here’s a short overview of the reasons why we generally advise using waivers, prepared by Science Commons Counsel Thinh Nguyen.

Facts are (and should be) free
There is a long tradition in science and law of recognizing basic facts and ideas as existing in the public domain of open discourse. At Science Commons we summarize that by saying “facts are free.”

Of course you can patent some ideas, but you can’t stop people from talking about or referring to them. In fact, the patent system was established to encourage public disclosure of facts and ideas, so that we can discuss them in the open. When Congress wrote the Copyright Act, it made sure to spell out that facts cannot be subject to copyright. “In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” (Section 102(b) of the United States Copyright Act)

And there are good reasons for this. Imagine if you couldn’t reference basic facts, like the height of Mount Everest, without permission. Imagine you couldn’t use the law of gravity in a calculation without crediting Isaac Newton each time. Or imagine you had to get a license from the heirs of Charles Darwin to talk or write about evolution. Such a world would be absurd, and we can easily understand why. We all need access to a basic pool of ideas and concepts in order to have any kind of meaningful discourse. So copyright is supposed to protect creative expression (the unique and individual ways we express ourselves) but not the invariant concepts and ideas that we need to think and carry on a conversation.

Licensing facts can cause legal uncertainty and confusion
So why is it that increasingly, especially online, there is talk about licensing factual data, that is, asserting rights and obligations over assertions of fact? Part of the answer is that as facts get represented in formats that look more like computer code, the impulse is to treat them like any other computer code. And that means putting a license on them. Part of the answer is that the law is still struggling with how to treat databases, and in some countries, database rights have expanded (particularly in Europe under the database directive). Other countries have loosened copyright standards to allow purely factual databases to be protected. (For a more detailed discussion of these issues, see the Science Commons paper, Freedom to Research: Keeping Scientific Data Open, Accessible, and Interoperable [PDF].)

But even if you could find a legal angle from which to impose licensing or contractual controls over factual data, why would you want to? Doesn’t this just create the very absurdity that Congress and the scientific tradition have been able to avoid for many years?

Attribution for facts can add complexity and hamper reuse
Many people cite the desire to receive attribution. In scientific papers, we have a tradition of citing sources for facts and ideas. But those traditions evolved over hundreds of years. There’s a lot of discretion and judgment that goes into deciding whom to cite and when. At some point, you don’t need to cite Isaac Newton any more for the formula for gravity, or Darwin for the idea of evolution. Sometimes you need to cite and sometimes you don’t, and that’s a matter of common sense. But what happens to common sense when you convert that tradition into a legal requirement? Can a license represent these complex norms and traditions? We don’t think so.

Imposing licensing on data creates all kinds of unanticipated problems. If you have a database with thousands or hundreds of thousands of facts, does each fact have to come with its own attribution and licensing data? How do we aggregate and recombine such data? If we use a tiny piece of that data to make an assertion about the world, to carry on a discourse, do we still have to attribute, and how far does that obligation go? In the future, will every database need its own database of attribution? Will every book need another book in which every word and idea and fact comes with its own genealogy detailing how it made its way through various databases, web sites and so on?

This problem, which we call “attribution stacking,” can saddle science with an unbearable administrative burden. It could shut down present and future sites that aggregate and federate data from many different sources. It could stifle entire fields of research that rely on summarizing, annotating, translating and integrating many different kinds and sources of data.
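
To make attribution stacking concrete, here is a rough Python sketch of what could happen when data under attribution-style licenses passes through several aggregators. It is an illustration under stated assumptions, not a model of any real license: the Fact class, the aggregate and credits_required functions, and the source names ("Lab A", "Portal X" and so on) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """One factual assertion plus every source that demands credit for it."""
    statement: str
    attributions: frozenset = field(default_factory=frozenset)


def aggregate(name, *datasets):
    """Combine datasets. Assumption: each aggregator adds itself to the
    credit list and must carry along all earlier credits."""
    merged = []
    for dataset in datasets:
        for fact in dataset:
            merged.append(Fact(fact.statement, fact.attributions | {name}))
    return merged


def credits_required(dataset):
    """Every distinct source a reuser must credit to use the whole dataset."""
    required = set()
    for fact in dataset:
        required |= fact.attributions
    return required


# Hypothetical sources; the statements are placeholders, not real data.
lab_a = [Fact("fact 1", frozenset({"Lab A"}))]
lab_b = [Fact("fact 2", frozenset({"Lab B"}))]
portal = aggregate("Portal X", lab_a, lab_b)
meta = aggregate("Meta-portal Y", portal)

for fact in meta:
    print(fact.statement, sorted(fact.attributions))
print(len(credits_required(meta)), "sources to credit")
```

In this toy run, two facts passing through only two layers of aggregation already carry three credits each, and a reuser of the combined set has four sources to track. With a waiver, the attributions field could simply stay empty, and credit would be given by scientific custom rather than by legal obligation.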

The solution: use a waiver for factual data, not a license or contract
Can licensing facts create its own technological absurdities? We think it can, and it will unless we resist the impulse to license. We think the best answer is to go back to what scientists themselves have been doing for centuries: giving attribution without legal requirements. We think Congress got it right when it excluded facts and ideas from copyright protection. And we think it should stay that way, even when those facts happen to get incorporated into databases. That’s why we published the Science Commons Data Protocol and the accompanying FAQ.

We hope that if you are preparing to publish a compilation of factual data, you will choose to waive any rights to the data, whatever they may be.

