Open licenses for data

How to choose and apply an open license to your research data

June 9, 2023 9 mins


Open licenses allow anyone to use, re-use, and redistribute research data with minimal restrictions. In this article, you’ll find everything you need to know about open licenses, including the most common open licenses, how to apply an open license to your data, and the benefits of doing so.

It’s incredibly beneficial for society to share data. Open data can accelerate scientific discovery, increase collaboration, and improve reproducibility. Open licenses also facilitate data sharing by managing copyright and other restrictions that might otherwise limit its dissemination or reuse.

Moreover, many publishing venues, funders, and governments now enforce data sharing policies and mandates. At F1000, all datasets associated with articles submitted to an F1000 publishing venue must have an open license, permitting maximum reuse by others with minimal restrictions. 

Read on to find out how to apply a license to your research data.

What is an open license?

An open license is how you, as the rights holder, grant others permission to use your work under copyright law. When researchers apply open licenses, their research data can be reused with minimal restrictions. By permitting reuse, you support reproducibility and transparency in research and allow others to build on your findings.

Open licenses are not to be confused with open source software licenses. Open source licenses allow others to use, modify, and share your research software. Software licensing allows you to communicate what other researchers can and cannot do with your software. Still, choosing the best license for your project can be challenging.

In this blog post, we offer preliminary guidance on selecting an open license for research data.

Why apply an open license to your research data?

The two most effective ways of communicating permissions to potential reusers of data are licenses and waivers. If you don’t apply a license to your research data, the default position of “all rights reserved” will apply to your work. Reserving all rights prevents other researchers from reusing your data.

By applying a license, you make your expectations around reuse clear and place the obligation to respect your wishes and rights on the user.

How to apply an open license to research data

Applying an open license to your dataset is a straightforward process.

Choose your license

First, choose which license best suits what you want others to do with your dataset. Your choice of license depends on the following:

  • The type of research data
  • The extent of reuse you wish to allow
  • Compliance with relevant funder, institution, or government policies

Additionally, if you plan to submit your paper to a specific publisher, they may require you to apply a specific license to your data. For example, to submit your work to an F1000 publishing venue, you must apply a CC0 or CC-BY license to your research data.

When choosing an open license, you’ll also need to consider who owns the research data. Only rights holders can apply licenses. Intellectual property rights (IPR) and good data management affect how you can use other people’s research data and how they can use yours. If you fail to clarify the rights in your primary data and the permissions for using secondary data at the start of your research, it can affect your ability to use and disseminate the data. It can also lead to legal trouble if you infringe another party’s IPR, for example, by publishing data without authorization.

Communicate your license clearly

Usually, the repository in which you are hosting your dataset should display the license associated with the data. If you are sharing open source software or code, the open source license, such as an Apache License or GNU General Public License (GPL), can also be included in a README file.

You should also consider all the information you’ll need to describe the data and provide context for your work. Metadata and documentation allow data users to have sufficient information to understand the research data’s source, strengths, weaknesses, and analytical limitations to make informed decisions when using it.
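
One way to keep the license visible alongside the files is to generate a short README section that names it. The sketch below is illustrative only: the function name, the field layout, and the example dataset title are assumptions made for this example, and the small mapping uses the short SPDX-style identifiers commonly used for these licenses rather than an exhaustive registry.

```python
# Illustrative sketch: build a plain-text README section that states the license.
# The mapping is a small hand-picked sample, not a complete list of licenses.
LICENSE_NAMES = {
    "CC0-1.0": "Creative Commons Zero 1.0 Universal (public domain dedication)",
    "CC-BY-4.0": "Creative Commons Attribution 4.0 International",
    "ODbL-1.0": "Open Data Commons Open Database License 1.0",
}


def readme_license_section(title: str, license_id: str, creators: list[str]) -> str:
    """Return README text naming the dataset, its creators, and its license."""
    if license_id not in LICENSE_NAMES:
        raise ValueError(f"Unknown license identifier: {license_id}")
    lines = [
        f"Dataset: {title}",
        f"Creators: {', '.join(creators)}",
        f"License: {LICENSE_NAMES[license_id]} ({license_id})",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset title and creator, used only to show the output shape.
    print(readme_license_section("Example survey data", "CC-BY-4.0", ["A. Researcher"]))
```

Rejecting unknown identifiers is a deliberate choice here: a misspelled license name in a README is easy to miss and leaves reusers unsure of their permissions.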

Write a data availability statement

Write a data availability statement as part of your article stating which license you have chosen to apply to your dataset, the name of the data repository where the data is stored, and any other information required by your specific publisher.

At F1000, a data availability statement is a required section of a manuscript that tells the reader how, where, and under what conditions the data associated with your research can be accessed and reused. 

Your statement should reference all data associated with your article and details of any software you used to process results. 
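
As a rough illustration of the elements described above, the following sketch assembles a data availability statement from the repository name, dataset title, identifier, and license. The function, its parameters, and the wording of the output are assumptions for illustration, not a publisher's required template, and the identifier shown is a placeholder.

```python
from typing import Optional


def data_availability_statement(
    repository: str,
    dataset_title: str,
    identifier: str,
    license_name: str,
    software: Optional[str] = None,
) -> str:
    """Assemble a short statement: where the data live, how to find them, and the license."""
    sentences = [
        f"{repository}: {dataset_title}. {identifier}",
        f"Data are available under the terms of the {license_name} license.",
    ]
    if software:
        # Optional detail about software used to process the results.
        sentences.append(f"Software used to process the results: {software}.")
    return "\n".join(sentences)


if __name__ == "__main__":
    print(
        data_availability_statement(
            repository="Example Repository",
            dataset_title="Example survey data",
            identifier="https://doi.org/10.xxxx/example",  # placeholder, not a real DOI
            license_name="CC-BY 4.0",
        )
    )
```

Always check the exact wording and required fields against your publisher's own guidelines before submitting.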

How to apply an open license to software or code

If you have generated software or code, this should also be licensed openly so that other users can reuse it. Like openly licensed datasets, open source software or code can be freely used, modified, and shared by others. You should ideally choose a license approved by the Open Source Initiative (OSI) to enable reuse. Popular OSI licenses include: MIT, GNU General Public License, and Apache License 2.0. 

What are the benefits of applying a license to my research data?

Supports reproducibility

Sharing your research data openly enhances research rigor by making it easier for others to validate, replicate, and reproduce your findings. By making your intentions for reuse clear, you support reproducibility and research integrity.

Increases citations

Open data enables validation of your research and a deeper understanding of the research findings. Research shows that articles that share data in repositories can receive up to 25% more citations than articles that do not.

Provides clarity and certainty to users

Copyright and database laws can be complex. While data might be intended to be open, a license provides explicit legal clarity for users. Open data licenses assure users that the data can be used and shared for a wide range of purposes. Without a license, users might not know that they have permission to use and cite your work.

Enables innovation

By applying an open license to your research data, you help facilitate experimentation by encouraging the innovative use of your dataset. Allowing others to reuse your data could lead to further developments in your field of research and beyond.

Common questions about open licenses

How can I ensure others cite my data?

Some open licenses require that researchers receive due credit for their work and their research outputs through what’s known as ‘attribution.’ Attribution means crediting the person who created the work you have used in your research, for example, by recognizing the creators who generated the original version of a dataset that informed your research output. In scholarly publishing, this usually takes the form of a citation.

What is a Creative Commons license?

Researchers worldwide use Creative Commons licenses and public domain tools to share their research and data. Creative Commons open licenses allow researchers to retain their copyright while allowing others to copy, distribute, and use their work.

What are some examples of open licenses?

Here are some examples of popular open licenses, including Creative Commons licenses:

Creative Commons Public Domain Dedication (CC0)

CC0 (or Creative Commons zero) is a public domain dedication tool that has no restrictions on reuse at all, so reusers can distribute, remix, adapt and build upon the dataset in any medium or format with no conditions. Citing CC0 datasets in a research context is generally accepted and expected, though attribution is not required. Datasets under this license can be used for commercial purposes.

Creative Commons Attribution Only (CC-BY)

Data used under the CC-BY license must be attributed to the creator. It allows reusers to distribute, remix, adapt, and build upon the materials, so long as the creator is credited. Datasets may also be reused for commercial purposes under this license. It is your responsibility to ensure that your chosen license complies with all relevant funder, institutional, legal, or ethical guidelines.

The Open Data Commons Attribution License (ODC-By) 

This license applies to the reuse of databases. It enables you to share your research data for redistribution and reuse, and allows others to produce works from it and to modify, transform, and build upon it for any purpose. Databases derived from the original can be re-licensed under a compatible license.

Open Data Commons Open Database License (ODbL)

The ODbL allows users to freely share, modify, and use a database while maintaining the same freedom for others. It permits the use and distribution of data or work derived from the original, provided that the original work is kept intact and referenced. Moreover, it must be made known that the new work is derived from the original. To ensure the work remains open for all to use and build upon, any derived works must carry the same license.

Creative Commons Attribution NonCommercial NoDerivs (CC-BY-NC-ND)

This is the most restrictive license offered by Creative Commons. With this license, users can only share the work, and they must attribute the original creator. The NoDerivs condition prohibits derivative or modified works, and the NonCommercial condition prohibits commercial use, so users cannot change the work in any way or use it commercially.

Creative Commons Attribution NonCommercial ShareAlike License (CC BY-NC-SA)

This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you. If users adapt or build upon your scientific research data, they must license the modified and derivative works under the same terms.

Creative Commons Attribution 4.0 International License (CC BY 4.0)

The Creative Commons Attribution 4.0 International license allows users to copy, modify and distribute data in any format for any purpose, including commercial use. Users are only obligated to give appropriate credit (attribution) and indicate if they have made any changes, including translations.
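
The Creative Commons options described above differ along a few conditions: attribution, commercial use, derivatives, and share-alike. As a minimal sketch, assuming a hypothetical helper name and parameter names chosen for this example, the function below maps those conditions to a short license code. It is a reading aid rather than legal advice, and it covers only Creative Commons tools, not the Open Data Commons licenses.

```python
def creative_commons_code(
    require_attribution: bool,
    allow_commercial: bool = True,
    allow_derivatives: bool = True,
    share_alike: bool = False,
) -> str:
    """Map reuse conditions to a Creative Commons short code such as 'CC-BY-NC-SA'."""
    if not require_attribution:
        # CC0 places no conditions on reuse, so it cannot carry other restrictions.
        if not (allow_commercial and allow_derivatives) or share_alike:
            raise ValueError("Restrictions require an attribution-based license, not CC0")
        return "CC0"
    if share_alike and not allow_derivatives:
        # ShareAlike governs how derivatives are licensed, so it needs derivatives to exist.
        raise ValueError("ShareAlike only applies when derivatives are allowed")
    parts = ["CC", "BY"]
    if not allow_commercial:
        parts.append("NC")
    if not allow_derivatives:
        parts.append("ND")
    elif share_alike:
        parts.append("SA")
    return "-".join(parts)


if __name__ == "__main__":
    print(creative_commons_code(require_attribution=False))
    print(creative_commons_code(require_attribution=True))
    print(creative_commons_code(True, allow_commercial=False, allow_derivatives=False))
    print(creative_commons_code(True, allow_commercial=False, share_alike=True))
```

Only the first two results, CC0 and CC-BY, impose no restrictions beyond attribution; the remaining codes add conditions on commercial use or derivatives.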

To submit your work to an F1000 publishing venue, you will need to apply a CC0 or CC-BY license to your research data.  

Checklist for publishing with F1000

If you plan to submit your research to an F1000 publishing venue, ensure you meet our progressive open data policy by following this data sharing checklist:

  • Choose which license to apply to your dataset (CC0 or CC-BY).

  • If you have generated software or code, choose an appropriate OSI-approved license.

  • Add these license(s) to the repository record and/or to the README file for your dataset, software, or code.

  • Write a data availability statement to accompany your article, stating which license you have applied.

  • If you have used a dataset from a third party, the original creators of the dataset should have applied an open license to the data. You must report the existing license in your data availability statement and state how you accessed the data, e.g. via a repository.
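
The checklist above can also be expressed as a simple self-check before submission. The sketch below is hypothetical: the `Submission` fields, the function name, and the short sample of OSI-approved license names are assumptions for illustration, not an official F1000 tool or a complete list of acceptable software licenses.

```python
from dataclasses import dataclass
from typing import Optional

DATA_LICENSES = {"CC0", "CC-BY"}
# Small sample of popular OSI-approved licenses; the full OSI list is much longer.
SAMPLE_OSI_LICENSES = {"MIT", "GNU General Public License", "Apache License 2.0"}


@dataclass
class Submission:
    data_license: Optional[str]
    has_code: bool = False
    code_license: Optional[str] = None
    license_in_repository_or_readme: bool = False
    has_data_availability_statement: bool = False


def unmet_checklist_items(s: Submission) -> list[str]:
    """Return the checklist items that still need attention (empty list if none)."""
    problems = []
    if s.data_license not in DATA_LICENSES:
        problems.append("Apply a CC0 or CC-BY license to the dataset.")
    if s.has_code and s.code_license not in SAMPLE_OSI_LICENSES:
        problems.append("Choose an OSI-approved license for software or code.")
    if not s.license_in_repository_or_readme:
        problems.append("Add the license(s) to the repository record or README.")
    if not s.has_data_availability_statement:
        problems.append("Write a data availability statement naming the license.")
    return problems


if __name__ == "__main__":
    draft = Submission(data_license="CC-BY", has_code=True, code_license="MIT")
    for item in unmet_checklist_items(draft):
        print("TODO:", item)
```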

Open licenses are an essential part of making your research data available for reuse and supporting reproducibility in research. By applying a CC0 or CC-BY license to your dataset, you open the doors to greater visibility for your research and enable others to build upon your findings. Over the past decade, data has become a priority for academic stakeholders, including governments, funders, institutions, and publishers worldwide. In fact, today nearly all major scientific journals have an open data policy. It’s clear that data sharing is here to stay, but are researchers ready for the additional time, skills, and effort it requires?

