Open data

Everything you need to know to make your research data open and FAIR

What is open data?

Open data is data that is available for everyone to access, use, and share. For researchers, this refers to any datasets collected or created as part of your research project.

In some cases, data sharing is not appropriate for legal, ethical, data protection, or confidentiality reasons. F1000 recommends researchers strive to make their data as open as possible and as closed as necessary. This means you should only restrict access to data where essential, for example, for security or confidentiality reasons.

Types of data in research

Research data is the evidence that underpins your findings and can be used to validate the claims made in your publication. There are many types of research data, both quantitative and qualitative, including:

Survey results
Software
Models and algorithms
Interviews and transcripts
Images, videos, and audio files
Genome sequences

DOWNLOAD OUR FREE eBook

Open data demystified: the essential toolkit for researchers

Expert guidance on how to collect, store, format, share, and publish your data.

Still unsure about research data?

If you find yourself wondering which parts of your work are research data, The Turing Way suggests asking yourself:

What information did I need to answer my research question?
What information would I need to back up my findings?
What information would someone else need to reproduce my work?

The answer is your research data.

Benefits of open data

Sharing your data benefits your career, other researchers, and society. In recent years, open data has become a priority for academic stakeholders globally. So how can you benefit from open data?


Benefits for your career

Increase the discoverability of your research: linking your open data and published research outputs can increase the readership of your research.
Increase citations: research shows that articles with links to datasets shared in repositories generated up to 25% more citations than articles that did not share data in repositories.
Enhance the credibility of your work: when the data supporting your findings is openly available, others can replicate your work to validate your results and conclusions.
Establish ownership and get credit for your data: uploading it to a repository allows you to establish ownership through a persistent identifier so other researchers can cite it.
Facilitate collaboration and new partnerships: researchers in your field and beyond can access and use your data, leading to greater collaboration and new research projects.


Benefits for the community

Supports reproducibility: open data enhances research rigor by making it easier for others to validate, replicate, and reproduce your findings.
Reduces research waste: when data is openly available, research becomes more efficient because other researchers avoid duplicating your efforts.
Enables others to reuse your data: sharing data can lead to reuse by providing a foundation for others to build on.
Preserves data more securely over time: data hosted on a repository is more secure than data hosted on a website or personal files.


Benefits for society

Gives greater visibility into the results of publicly funded research: because research is often publicly funded, open data offers a chance to make its results openly available as a public good.
Can lead to real-world impact: when data is open, we can accelerate the pace of research discovery to solve societal challenges in real time.
Fosters trust in research: transparency and accountability help to foster public trust in the research process and results.

How to share your research data

Write a data management plan before your project begins

Planning how you will manage and share your data makes it much easier to open your data at the end of your project. Before research begins, create a detailed data management plan (DMP). A DMP is a living document that describes how your research outputs will be generated, stored, used, and shared; it can change and evolve throughout your research project. Even where a funder or publisher does not require a DMP, writing one helps ensure efficient data management and makes it easier to make your data FAIR.


Prepare the data for sharing

You’ve collected your data; now it’s time to prepare it for sharing. While some restrictions may make it impossible to share your dataset, in other cases, you can share sensitive data provided you take the necessary precautions to protect the confidentiality of research participants. Once you’ve determined the extent to which you can share your data, you’ll need to format your data, label your files for sharing, and prepare any additional materials needed to understand and use the data. For example, you may include a data dictionary and details of any software needed to process the data. Different disciplines and data repositories may have different standards around formatting data, so research this before you get started.
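For tabular data, a first draft of a data dictionary can be generated from the data itself and then completed by hand. The sketch below is a minimal Python example under assumed inputs: the column names, descriptions, and units are hypothetical, and the "type" column simply reports the Python types of the observed values rather than any formal standard.

```python
import csv
import io

# Hypothetical example rows; in practice, load these from your own data files.
rows = [
    {"participant_id": "P001", "age": 34, "score": 7.5},
    {"participant_id": "P002", "age": 41, "score": 6.0},
]

# Descriptions and units must be written by the researcher; they cannot be inferred.
descriptions = {
    "participant_id": ("Pseudonymous participant identifier", ""),
    "age": ("Age at enrollment", "years"),
    "score": ("Total questionnaire score", "points"),
}

FIELDS = ["variable", "type", "unit", "description", "example"]


def build_data_dictionary(rows, descriptions):
    """Return one entry per column: name, observed type, unit, description, example value."""
    entries = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        # Label each column with the Python type names seen in its values (a rough guide only).
        observed_types = "/".join(sorted({type(value).__name__ for value in values}))
        description, unit = descriptions.get(column, ("", ""))
        entries.append({
            "variable": column,
            "type": observed_types,
            "unit": unit,
            "description": description,
            "example": values[0],
        })
    return entries


def to_csv(entries):
    """Serialize the data dictionary as CSV text so it can be shared alongside the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


data_dictionary = build_data_dictionary(rows, descriptions)
print(to_csv(data_dictionary))
```

A plain CSV file is used here because almost any reader or repository can open it; your discipline or repository may expect a different format.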


Deposit your data in a repository

A repository is an online storage infrastructure for researchers to store data, code, and other research outputs. Depositing your data in a publicly accessible, recognized repository ensures that your dataset continues to be available to both humans and machines in a usable form. Uploading data to a repository helps preserve it more securely over time than hosting it on a website. Plus, you’ll receive a persistent identifier (PID) to establish ownership and enable others to cite the data. Your institutional librarian, funder, and colleagues can likely guide you in choosing a repository relevant to your discipline.
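Before uploading, it can help to record a checksum for every file so that you, or anyone reusing the data later, can confirm the files have not been altered. The sketch below is a minimal Python example, assuming your files sit in one local folder; the manifest file name is a hypothetical choice, and this is a personal safeguard rather than something any particular repository requires.

```python
import hashlib
import tempfile
from pathlib import Path

MANIFEST_NAME = "MANIFEST-sha256.txt"  # hypothetical file name


def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA-256 checksum in chunks, so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Return sorted (relative path, checksum) pairs for every data file under folder."""
    folder = Path(folder)
    return [
        (path.relative_to(folder).as_posix(), sha256_of(path))
        for path in sorted(folder.rglob("*"))
        if path.is_file() and path.name != MANIFEST_NAME
    ]


def write_manifest(folder):
    """Write 'checksum  path' lines (the layout sha256sum uses) next to the data."""
    manifest = build_manifest(folder)
    lines = [f"{checksum}  {name}" for name, checksum in manifest]
    (Path(folder) / MANIFEST_NAME).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


# Usage: checksum a throwaway folder containing one small CSV file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.csv").write_bytes(b"id,value\n1,42\n")
    example_manifest = write_manifest(tmp)
    print(example_manifest)
```

The manifest file is skipped when building the list, so re-running the script does not checksum its own output.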


Apply an open license to the data

Apply an open license to your data to permit others to reuse it with minimal restrictions. Permitting reuse supports reproducibility and transparency in research and allows others to build on your findings. The Creative Commons Public Domain Dedication (CC0) and the Creative Commons Attribution (CC BY) license are popular examples of open licenses. Both allow reusers to distribute, remix, adapt, and build upon the material in any medium or format. The critical difference is that CC0 has no requirement for attribution, while CC BY requires reusers to credit the original creator.


Make your data easy to find

Always cite your dataset in your published article and include a data availability statement. A data availability statement is a short section of text that tells the reader how, where, and under what conditions the data associated with your research can be accessed and reused. Once your research is published, some repositories allow you to add the article’s digital object identifier (DOI) to the metadata of your dataset, creating a permanent link between these two outputs of your research. You can also publish a Data Note to maximize the potential of your research data. Data Notes are a peer-reviewed article type that describes why and how your data was collected, analyzed, and validated.
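If you keep your dataset's details in one place, the citation and a first draft of the data availability statement can be produced from the same record, so they never disagree. The sketch below is a minimal Python illustration: the record fields, the example values (creators, repository, DOI), and the wording of the statement are all hypothetical, and the citation only loosely follows the pattern DataCite recommends (Creator (Year). Title. Version. Publisher. Identifier). Always check your publisher's required format.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical fields describing a deposited dataset."""
    creators: list
    year: int
    title: str
    version: str
    repository: str
    doi: str
    license_name: str


def format_citation(record: DatasetRecord) -> str:
    """Citation loosely modeled on DataCite: Creator (Year). Title. Version. Publisher. Identifier."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"Version {record.version}. {record.repository}. https://doi.org/{record.doi}"
    )


def availability_statement(record: DatasetRecord) -> str:
    """Draft a statement covering where, how, and under what conditions the data can be reused."""
    return (
        f"The data underlying this article are openly available in {record.repository} "
        f"at https://doi.org/{record.doi} (version {record.version}) "
        f"under the terms of the {record.license_name} license."
    )


# All values below are placeholders, not a real dataset.
survey_record = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example survey dataset",
    version="1.0",
    repository="Example Repository",
    doi="10.5555/example.5678",
    license_name="CC BY 4.0",
)
print(format_citation(survey_record))
print(availability_statement(survey_record))
```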


What is FAIR data?

The FAIR guiding principles for scientific data management and stewardship were developed in 2016 to ensure research data is:

Findable

Data should be deposited in a repository that assigns it a digital object identifier (DOI) or another persistent identifier (PID). Use metadata to give a detailed description of your data.

Accessible

The repository must make the data and metadata retrievable through a standardized, open protocol such as HTTPS. The repository must continue to provide a landing page and the metadata even if the dataset is no longer available.

Interoperable

The metadata used to describe the data should use standard, community-agreed vocabularies and be machine-readable. You can find the relevant standards for your subject at FAIRsharing.org.
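One common way to make metadata machine-readable is to express it with a shared vocabulary such as schema.org's `Dataset` type, serialized as JSON-LD. This is one illustration, not a requirement of the FAIR principles or of any particular repository; the function name and every value in the sketch below are hypothetical, while the property names (`name`, `description`, `identifier`, `license`, `creator`, `keywords`, `dateCreated`, `version`) are standard schema.org terms.

```python
import json


def dataset_jsonld(title, description, doi, license_url, creators, keywords, date_created, version):
    """Describe a dataset using schema.org's Dataset type, ready to serialize as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "creator": [{"@type": "Person", "name": name} for name in creators],
        "keywords": keywords,
        "dateCreated": date_created,
        "version": version,
    }


# Placeholder values for illustration only.
survey_metadata = dataset_jsonld(
    title="Example survey dataset",
    description="Anonymized responses to a hypothetical wellbeing questionnaire.",
    doi="10.5555/example.5678",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["Jane Doe", "Richard Roe"],
    keywords=["wellbeing", "survey"],
    date_created="2024-03-01",
    version="1.0",
)
print(json.dumps(survey_metadata, indent=2))
```

Because the keys come from a published vocabulary, software that understands schema.org can interpret the record without knowing anything about your project.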

Reusable

The metadata which describes the data is accurate and relevant. An explicit data license has been applied to the data, explaining what other users can and cannot do.

The FAIR principles play a vital role in ensuring your research data is as reusable as possible.

Understanding sensitive data

Sensitive data are data that could identify an individual or otherwise cause a risk of harm if shared openly. Sensitive data may involve human research participants, plant or animal species, commercially sensitive information, or information relating to national security, amongst other topics.

If you are working with human participants, your data may be considered sensitive even if no personal identifiers are collected. Sensitive data must be protected from unauthorized access or unwarranted disclosure.

If you are in doubt about the sensitivity of your data, check with your institution’s Ethics Office or Committee.

Datasets that contain sensitive data can often be shared in part or in full if informed consent has been provided, with appropriate use of anonymization or controlled access repositories.

FAQs

What is metadata and what information should it include?

Metadata is data about other data. It aids both discoverability and understanding of the data. Metadata can contain descriptive information about the dataset and administrative or structural information.

The content and format of metadata are often guided by a specific discipline and/or repository. Still, metadata records in a repository typically include information such as creation date, file format, data creator, keywords, location, how the data was generated, and version information, amongst other things.
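Before depositing, a quick check can flag which of the commonly expected fields are still missing from your metadata. The sketch below is a minimal Python example: the field list is taken from the paragraph above, but the key names are hypothetical, and your repository's own schema should take precedence.

```python
# Recommended fields taken from the list above; the key names are hypothetical.
RECOMMENDED_FIELDS = {
    "creation_date": "creation date",
    "file_format": "file format",
    "creator": "data creator",
    "keywords": "keywords",
    "location": "location",
    "methods": "how the data was generated",
    "version": "version information",
}


def missing_fields(metadata):
    """List the recommended fields that are absent or left empty in a metadata record."""
    return [label for key, label in RECOMMENDED_FIELDS.items() if not metadata.get(key)]


draft = {
    "creation_date": "2024-03-01",
    "file_format": "text/csv",
    "creator": "Jane Doe",
    "keywords": [],
    "version": "1.0",
}
print(missing_fields(draft))  # ['keywords', 'location', 'how the data was generated']
```

Empty values such as an empty keyword list are treated as missing, since they add nothing for someone trying to find the data.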

What is a DOI or persistent identifier?

DOI stands for digital object identifier, a type of persistent identifier. A DOI is a unique string of numbers, letters, and symbols that identifies a research output and enables citation.

Persistent identifiers are vital because they remain constant even if the digital research output moves to a new location. While a URL may change, a persistent identifier continues to resolve to the output's new location.
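Because a DOI stays the same while the landing page may move, it is good practice to record the DOI itself and cite it as a link through the `https://doi.org/` resolver. The sketch below is a minimal Python example that tidies the forms in which DOIs are often pasted. Its prefix list and pattern are simplifying assumptions: the pattern checks only the usual "10." + registrant code + "/" + suffix shape and will reject some unusual but valid DOIs, which should then be checked by hand.

```python
import re

# Simplified DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix without spaces.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written before the identifier itself.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value):
    """Strip resolver or 'doi:' prefixes and lowercase the result (DOI names are case-insensitive)."""
    doi = value.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.strip().lower()
    if not DOI_PATTERN.match(doi):
        # Raise so a person can inspect the value rather than silently citing a bad link.
        raise ValueError(f"not a recognizable DOI: {value!r}")
    return doi


def resolver_url(value):
    """Return the https://doi.org/ link, which keeps working if the landing page moves."""
    return "https://doi.org/" + normalize_doi(value)


# Placeholder DOIs for illustration.
print(resolver_url("doi:10.5555/Example.5678"))
print(resolver_url("https://dx.doi.org/10.5555/example.5678"))
```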

What is versioning and why is it important?

Some repositories accommodate changes to deposited datasets through versioning. Selecting a repository with versioning gives you the flexibility to add new data, restructure, and improve your dataset. Each version of your dataset is uniquely identifiable and maintained so others can find, access, reuse, and cite whichever version of the dataset they require. At F1000, we require authors to use repositories that support versioning.
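The idea behind versioning can be modeled in a few lines: the dataset as a whole has one identifier, each version gets its own, and older versions remain resolvable so earlier citations keep pointing at exactly the data they used. The sketch below is a hypothetical Python model for illustration only; the class, its methods, and the identifier format are made up and do not reflect how any particular repository assigns identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VersionedDataset:
    """Hypothetical model: one identifier for the dataset as a whole, one per version."""
    concept_id: str
    versions: List[Tuple[str, str]] = field(default_factory=list)  # (version label, identifier)

    def add_version(self, label: str) -> str:
        """Record a new version; earlier versions stay resolvable. The identifier format is made up."""
        if any(existing == label for existing, _ in self.versions):
            raise ValueError(f"version {label!r} already exists")
        identifier = f"{self.concept_id}.v{len(self.versions) + 1}"
        self.versions.append((label, identifier))
        return identifier

    def resolve(self, label: Optional[str] = None) -> str:
        """Return the identifier for a given version, or for the latest version by default."""
        if not self.versions:
            raise LookupError("no versions deposited yet")
        if label is None:
            return self.versions[-1][1]
        for existing, identifier in self.versions:
            if existing == label:
                return identifier
        raise KeyError(label)


dataset = VersionedDataset(concept_id="example-dataset")
dataset.add_version("1.0")
dataset.add_version("1.1")  # e.g. after adding new data
print(dataset.resolve())       # latest version
print(dataset.resolve("1.0"))  # the exact version cited in an earlier article
```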

What is a data availability statement?

A data availability statement is a short section of text which tells the reader how, where, and under what conditions the data associated with your research can be accessed and reused.

Will the data I share be peer reviewed?

If your dataset is associated with a publication, your publisher may ask peer reviewers to review the data as part of the formal peer review process. For example, F1000 asks peer reviewers for F1000Research to review the dataset as part of their assessment of the research.

Are there instances in which I shouldn’t share my data?

In some situations, sharing your data would not be legal or ethical. This includes: when you don’t have ownership of the data; when sharing the data would conflict with the need to protect personal identities; when data is commercially sensitive or could cause a security risk; or if data is sub judice and public discussion is prohibited. Your institution’s research ethics committee can provide guidance if you are unsure about sharing your specific dataset.

How can I ensure my data won’t be scooped?

A key benefit of open data is that it establishes precedence for a dataset through persistent identifiers, timestamps, and links to the authors. This makes clear which researcher established the idea and created the results first. When researchers deposit their data in a public repository, they establish that they created the dataset and should receive credit for it.

Where can I deposit my data?

There are many options to choose from when sharing your data openly. There are institutional repositories, discipline-specific repositories, general data repositories, and controlled access repositories.

How can I protect the privacy of research participants?

If your research involves sensitive human data, you need to take extra steps to maintain the privacy of your research participants and share your data in an ethically and legally compliant way. First, anonymize the data. Anonymization techniques include removing any identifying information, using pseudonyms, and generalizing where possible. In some cases, you may want to limit access to the data to specific parties who will treat the data carefully. Controlled access allows you to upload your data to a repository and keep the files private. You can share access with others if certain requirements are met.
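The three techniques named above (removing identifiers, using pseudonyms, and generalizing) can be sketched for a single tabular record. This is a minimal Python illustration with hypothetical column names, a UK-style postcode, and a 10-year age band chosen as assumptions. Note that replacing IDs with a keyed hash is pseudonymization: anyone holding the key can re-link records, and combinations of the remaining fields may still identify people, so check your approach with your ethics committee.

```python
import hashlib
import hmac

# Direct identifiers to remove entirely (hypothetical column names).
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonym(participant_id, secret_key):
    """Replace an ID with a keyed hash; without the key, the mapping cannot be recomputed."""
    return hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record, secret_key):
    """Drop direct identifiers, pseudonymize the ID, and generalize age and postcode."""
    cleaned = {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonym(record["participant_id"], secret_key)
    cleaned["age"] = age_band(record["age"])
    # Keep only the first part of a UK-style postcode, e.g. "CB2 1TN" -> "CB2".
    cleaned["postcode"] = record["postcode"].split()[0]
    return cleaned


# A made-up participant record.
participant = {
    "participant_id": "P001",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "age": 34,
    "postcode": "CB2 1TN",
    "score": 7.5,
}
# Store the key securely and never share it with the dataset.
print(deidentify(participant, secret_key=b"replace-with-a-secret-key"))
```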

