Science Commons was re-integrated with Creative Commons. This content is no longer maintained and remains only for reference.

Comments on the Open Database License Proposed by Open Data Commons

Background

Creative Commons is a non-profit organization that provides free tools that let authors, scientists, artists, and educators easily mark their creative work with the freedoms they want it to carry. In addition to our copyright licenses, Creative Commons has recently released CC0, a waiver of copyright and related rights in data and databases. Science Commons, a project of Creative Commons, has also published a number of recommendations on data exchange protocols within the scientific community, including the Protocol for Implementing Open Access Data. [1]

Based on this work, Creative Commons wishes to take this opportunity to provide comments on the proposed Open Database License (“ODbL”), as published in the public beta hosted on the Open Data Commons website (http://www.opendatacommons.org/licenses/odbl/).

Our comments are based primarily on the application of principles and guidelines that we discuss in greater detail in the Protocol for Implementing Open Access Data, and we would like to direct your attention to that document for a more in-depth policy discussion. In this comment, we summarize how those principles and guidelines affect our view of the ODbL.

In general, we believe that the interests of both providers and users of data and databases, particularly in science, education, and other areas where the ability to exchange and re-use data freely is critical to achieving the objectives of the data exchange community, are best served by reducing unnecessary transaction costs, simplifying legal tools, and providing as much clarity and certainty to providers and users of their respective rights and obligations as the law allows.

We believe that the ODbL fails to achieve these objectives for public data sharing. First, by adopting a licensing approach for data and databases, ODbL fails to promote legal predictability and certainty for both providers and users of databases in cases when the copyrightability of a database, or the availability of other legal protections, is itself unclear. Second, the complexity of the licensing scheme embodied by the ODbL can make it difficult for many users, particularly those not legally trained, to understand their rights and obligations. Third, the ODbL’s requirements can impose substantial legal and administrative burdens on data providers and data users, which can raise transaction costs that impede the free exchange and reuse of data and databases. Fourth, the ODbL claims to be not only a license but a contract premised on “access” rather than on the existence of copyright or other underlying statutory protection. This approach is over-broad and expands restrictions and copyright-like controls beyond the scope of copyrightable subject matter and introduces additional questions and uncertainties associated with contract formation. Fifth, the ODbL can cause legal interoperability problems for data sharing through its copyleft or share-alike provisions, which may conflict with pre-existing obligations on data providers or data users with respect to underlying content.

We discuss each of these points further below. In brief, we believe that the ODbL is incompatible with the goals we have articulated in the Science Commons Protocol for Implementing Open Access Data, that it creates barriers that can impede the free and open exchange and reuse of data and databases, that it may create unintended consequences for both data providers and users, and that it does not well serve the essential purposes of public data sharing, particularly in education and science.

Given the complexity and divergences in U.S. laws and the laws of other countries, as they apply to data and databases, we believe these objectives are best served by converging on the public domain as the preferred form for data sharing.

The ODbL Fails to Promote Legal Predictability and Certainty Over Use of Databases

The first principle of the Protocol for Implementing Open Access Data is that it should strive to provide both providers and users with a high degree of predictability and certainty with respect to their rights and obligations. This is based on comments we have received throughout our development of CC0, as well as prior guidelines we have developed for using Creative Commons licenses with databases. Many scientific commentators felt that legal tools that require them to parse and apply complex legal doctrines not only do nothing to further transparency and usability, but may actively impede data sharing by introducing complexity, confusion, and uncertainty. Such tools can require them to consult their institution’s lawyers or retain their own lawyers to perform the necessary legal analysis (including contract language, copyright doctrine, fair use, and many other issues), creating both costs and burdens that can significantly chill the ability to share and use data.

Because the ODbL is a license that imposes restrictions and obligations tied to the existence of copyright, neighboring rights, or sui generis database rights, it can create uncertainty for providers and users because the content, standards, and applicability of these laws vary from country to country. (See note [2].) Therefore, simply providing an answer to a basic question such as “Is this database protected by copyright or other laws?” can be surprisingly difficult, especially when the contributors and users of the database reside in different countries, as may often be the case for international data sharing projects. For example, under U.S. law, a certain degree of creative expression is a constitutional requirement for copyright protection. While this standard, like any legal standard, can itself be complicated enough to apply in any particular case, the problem is magnified many times over when we consider international data sharing, in which the standards in other countries for copyright protection are different, with some countries adopting a “sweat of the brow” copyright theory while others have intermediate standards. Similarly, “sui generis” database rights primarily apply to database creators in Europe, but do not apply to U.S.-based database creators or those in other countries. In an international database project that involves collaborators and contributors from many different jurisdictions, what are the relevant rights and standards when we consider the question, “Is this database protected by copyright or other laws?”

Yet, it is essential that we are able to answer this question in a licensing regime, because any license is premised on the existence of some underlying right that is being licensed. If a database does not qualify for any relevant legal protection under applicable law, then accepting a “license” is unnecessary, at least in the absence of a binding contract (a topic discussed below). But if we cannot easily determine whether there are underlying property rights, or if answering this question requires a significant amount of research or analysis, then neither the provider nor the user will have any reasonable degree of certainty over what exactly they are licensing—and whether it is necessary to “accept” or comply with the license as a precondition to using the publicly available data or database.

It is impractical to expect data providers and users, and database project managers, to analyze and apply these different rules and standards within the context of a licensing regime where the existence of these rights makes a difference in terms of permissions and obligations defined in a license. Answering these basic questions in the context of an international collaboration would be an exceedingly complex task for even an experienced international licensing lawyer, let alone scientists, educators, or other non-lawyers. While it may be tempting to achieve the desired level of uniformity through boot-strapping contract law, in addition to license requirements, the contract approach has its own problems, which are discussed further below.

In summary, the complexities and uncertain application of copyright and other laws that might apply to data and database protection in any given case, not to mention significant variations among countries for international data sharing, make attempts to impose obligations or restrictions through a licensing approach extremely problematic from the perspective of providing clear and predictable rules and standards of behavior to providers and users and from the perspective of compliance.

The ODbL Is Complex and Difficult for Non-Lawyers to Understand and Apply

While some complexities are introduced by differences in background legal doctrines, others are introduced by the ODbL scheme itself.

The ODbL is a ten-page document of legal requirements; some are quite complicated and written in ways that may be difficult for non-lawyers to understand. For example, the distinction between rights in the database and rights in data contained in the database may cause significant confusion over when the terms of the license apply, to whom, and how. The distinction between “Collective Databases” and “Derivative Databases” is another potential area for confusion.

We believe that legal tools aimed at the public should be brief and written in simple language that can be readily understood by non-experts. This will facilitate comprehension by a broader base of users, and reduce the need to involve lawyers in reviewing every instance of data exchange and use.

The ODbL Can Result in High Transaction Costs on the Data Sharing Community

A goal of public data sharing, as with any large network, is to reduce transaction costs for each individual transaction, because even relatively small individual transaction costs, when aggregated over large communities, can add up to significant burdens. For example, if it is necessary to obtain expert legal advice to interpret a license associated with each database, then the cumulative legal burden on one user of many databases, or on many users of one database, can result in significant legal costs, delays, and inefficiencies.

Are these costs justified in light of the benefits to be achieved? This is a question that every data provider considering the use of a license on a database must answer for himself or herself, but we believe that in the vast majority of cases, these transaction costs are at odds with the goals and mission of many public data sharing projects, which are to disseminate the fruits of research or education widely and to reach large communities, with a minimum of friction.

Furthermore, the need to obtain expert legal advice can put users without access to legal resources at a significant disadvantage compared to more sophisticated users. This can be an unintended consequence of licensing, when the goal of many public Web-based education or research data sharing projects is to achieve precisely the opposite: to reduce barriers to access for broader audiences, including those traditionally disadvantaged in terms of access to such materials.

In science, researchers must draw on many databases, often in the hundreds or thousands, to perform research. Furthermore, each of those databases can contain summaries, annotations, or data derived from hundreds or thousands of other databases. To attempt to analyze the licensing terms associated with each database would impose transaction costs that would not only slow down research and education, but that may in some cases make large-scale attempts to federate data or do large-scale querying and analysis impossible. At a very basic level, as one scientist told us at a data conference hosted by Science Commons and CODATA, “if I am doing research at 2 am in the morning, and I want to access an online database, I don’t want to have to wait until I can get to my university’s general counsel’s office the next day, or the next month, for an analysis of the license before I can use that data.” The same can be said of students doing research or experiments.

Therefore, the transaction costs imposed by license compliance, combined with the legal complexities discussed, multiplied over many hundreds or thousands of databases, or over many hundreds or thousands of users, providers, and contributors, can create systemic “friction” that can harm the goals and mission of many public data sharing projects or make them impossible in practice.

The ODbL Imposes Contractual Obligations Even in the Absence of Copyright

Section 2.2(c) of the proposed ODbL explicitly makes the ODbL a contract, in addition to being a license. As mentioned above, accepting a license is usually only necessary when there exists some underlying property right. However, a contract can be based simply on mutual agreement, provided that the requisite requirements of contract formation (meeting of the minds, consideration, etc.) are met. The result is that the ODbL can in certain circumstances impose obligations and restrictions on users under a contract theory, rather than based on a protection afforded by statute, common law, or other recognized right.

Thus, it is not clear under the ODbL whether providers would have an independent breach of contract claim, in addition to an infringement claim, or even in the absence of an infringement claim, for any violations of the “license” (or alternatively, contract).

This is important for several reasons. First, as discussed above, due to legal variations in copyright doctrines among different countries, as well as the availability of sui generis protection in some countries but not in others, there may be cases where an infringement claim is not available to a provider because no underlying property right exists. However, in such cases, could the provider seek to enforce a provision of the ODbL, such as the share-alike provision, under a contract theory instead? And if it could do so, would that constitute an extension of protection beyond the scope intended by existing statutory schemes? For example, could data or databases that fail to qualify for copyright protection under U.S. law due to lack of the requisite level of creativity nevertheless be made subject to the share-alike provision in the U.S. under a contract theory? Could this be applied to individual data elements that are not themselves copyrightable—such as sensor readings or basic facts and ideas? Could European sui generis database rights be enforced against a U.S. user on the basis of the existence of a contractual relationship created by the ODbL?

In addition to these issues, contract theories introduce inherent uncertainties of their own, including all the questions associated with contract formation. Is access to a publicly available database adequate consideration to support a contract? Is there sufficient consideration when there is no underlying statutory protection for the database? Is the ODbL enforceable on downstream recipients of the database or data who, for whatever reason, may have never had the opportunity to review the ODbL or manifest assent to its terms (and thus lack “privity”)? If the ODbL functions in some cases as a contract, and not as a license, then what formalities must be observed in contract formation? Does the provider have to obtain agreement in the form of a “click-wrap” agreement, or would it be sufficient to have a notice of the license in its Web site terms of use? These are important questions for contract theories of enforcement, and it is unclear how the ODbL would interact with these issues when it is treated as a contract and not a license.

The ODbL Can Cause Legal Non-Interoperability for Data Sharing

Because the ODbL is a share-alike or copyleft type of license, which says that derivative works can only be distributed publicly under its terms, it can cause problems for users of databases who must combine or use works from different sources.

For example, can a database licensed under ODbL be used in combination with underlying content licensed under GPL, FDL, or CC-BY-SA? Even though the ODbL distinguishes between “Collective Databases” and “Derivative Databases” in many cases, it may be in practice difficult to apply that distinction for all the different ways that data and databases can be combined, annotated, summarized, or re-used.

These licensing incompatibilities would mean that a data user must be able to comply with every license, or not be able to share the work at all. But this can be impossible under a copyleft license that specifies that derivative works can only be distributed under its own terms. Thus, a possible unintended consequence in the long run is the possibility of creating systems of data sharing that have embedded within them the seeds of license incompatibility—seeds that once planted may mature into future interoperability problems that were not only unanticipated and unintended, but that are also too complex to solve or even to understand completely. These problems may undermine the very goals of public data sharing and artificially limit data exchange and collaboration between communities—not for justifiable technical or scientific constraints but merely for license incompatibility reasons. There is evidence that such problems have already arisen throughout the open data community and may become more severe unless we take steps to address them at an early stage through developing a workable policy consensus. [3]
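The incompatibility described above can be illustrated with a minimal sketch in Python. It assumes a deliberately simplified model and is not a statement of law: each source carries a license label and a flag indicating whether that license requires derivative works to be distributed only under its own terms. The names `Source`, `required_output_licenses`, and `can_distribute`, as well as the example sources, are hypothetical, and the sketch ignores the distinction between collective and derivative works. Actual compatibility depends on the text of each license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    license: str
    # Simplifying assumption: True means a derivative work may only be
    # distributed under this same license.
    share_alike: bool


def required_output_licenses(sources):
    """Return the distinct licenses that share-alike sources impose on a derivative."""
    return {s.license for s in sources if s.share_alike}


def can_distribute(sources):
    """In this model a derivative is distributable only if at most one
    distinct share-alike license applies to it."""
    return len(required_output_licenses(sources)) <= 1


# Hypothetical combination of three sources into one derivative database.
combined = [
    Source("map database", "ODbL", share_alike=True),
    Source("annotations", "CC-BY-SA", share_alike=True),
    Source("sensor readings", "CC0", share_alike=False),
]
```

Under these assumptions, `can_distribute(combined)` is false because two different share-alike licenses each demand to govern the derivative, while dropping either share-alike source makes the combination distributable. Sources dedicated to the public domain never add a constraint in this model, which is the property the following section relies on.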

Open Data Sharing Should Converge on the Public Domain

For the reasons discussed above, we believe that “open” data sharing (by which we mean public data sharing with the intended benefits of widespread dissemination and use) should come with the fewest possible restrictions and obligations. It should rely on established scientific and scholarly standards, norms, and usages for citation and quality control rather than on complex licenses or other legal tools that impose requirements or obligations that are difficult for users to understand and that, as a consequence, raise transactional barriers and obstacles at odds with the stated goals and mission of open data projects. We believe that the PDDL, CC0, and other public domain dedications or copyright waivers provide a far simpler, more consistent, and more benign approach that closely mirrors a long history and tradition of scientific, educational, and cultural sharing practices. For these communities, these norms converge on the public domain.

Notes:

1. Science Commons, “Protocol for Implementing Open Access Data.” http://sciencecommons.org/projects/publishing/open-access-data-protocol

2. Thinh Nguyen, “Freedom to Research: Keeping Scientific Data Open, Accessible, and Interoperable.” http://sciencecommons.org/wp-content/uploads/freedom-to-research.pdf

3. Melanie Dulong du Rosnay, “Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness,” Nature Precedings, doi:10.1038/npre.2008.2083.1, available at http://sciencecommons.org/wp-content/uploads/npre20082083-1.pdf.