Open data
Everything you need to know to make your research data open and FAIR
What is open data?
Open data is data that is available for everyone to access, use, and share. For researchers, this refers to any datasets collected or created as part of your research project.
In some cases, data sharing is not appropriate for legal, ethical, data protection, or confidentiality reasons. F1000 recommends researchers strive to make their data as open as possible and as closed as necessary. This means you should only restrict access to data where essential, for example, for security or confidentiality reasons.
Types of data in research
Research data is the evidence that underpins your findings and can be used to validate the claims made in your publication. There are many types of research data, both quantitative and qualitative. These include:
- Survey results
- Software
- Models and algorithms
- Interviews and transcripts
- Images, videos, and audio files
- Genome sequences
Still unsure about research data?
If you find yourself wondering which parts of your work are research data, The Turing Way suggests asking yourself:
- What information did I need to answer my research question?
- What information would I need to back up my findings?
- What information would someone else need to reproduce my work?
The answer is your research data.
Benefits of open data
Sharing your data benefits your career, other researchers, and society. In recent years, open data has become a priority for academic stakeholders globally. So how can you benefit from open data?
Benefits for your career
Benefits for the community
Benefits for society
How to share your research data
Write a data management plan before your project begins
Planning how you will manage and share your data makes it much easier to open your data at the end of your project. Before research begins, create a detailed data management plan (DMP). A DMP is a living document that describes how your research outputs will be generated, stored, used, and shared. The document can change and evolve throughout your research project. Even where your funder or publisher does not require one, a DMP helps ensure efficient data management and makes it easier to make your data FAIR.
Prepare the data for sharing
You’ve collected your data; now it’s time to prepare it for sharing. While some restrictions may make it impossible to share your dataset, in other cases, you can share sensitive data provided you take the necessary precautions to protect the confidentiality of research participants. Once you’ve determined the extent to which you can share your data, you’ll need to format your data, label your files for sharing, and prepare any additional materials needed to understand and use the data. For example, you may include a data dictionary and details of any software needed to process the data. Different disciplines and data repositories may have different standards around formatting data, so research this before you get started.
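As a rough illustration of the data dictionary mentioned above, the sketch below builds one entry per column of a CSV file. The column names, the sample rows, and the helper names `infer_type` and `build_data_dictionary` are hypothetical, and the type guessing is deliberately simple. Your discipline or repository may expect a different layout.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label from the non-empty string values of a column."""
    if not values:
        return "unknown"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text, descriptions):
    """Return one entry per column: variable name, guessed type, example, description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Skip empty or missing cells so they do not distort the type guess.
        values = [row[column] for row in rows if row.get(column)]
        entries.append({
            "variable": column,
            "type": infer_type(values),
            "example": values[0] if values else "",
            "description": descriptions.get(column, "TODO: describe this variable"),
        })
    return entries


# Hypothetical survey extract, used only to exercise the functions.
sample = "participant_id,age,response\nP01,34,agree\nP02,41,disagree\n"
dictionary = build_data_dictionary(sample, {"age": "Age in years at time of survey"})
for entry in dictionary:
    print(entry)
```

Columns without a supplied description are flagged with a TODO so that gaps are visible before the dataset is deposited.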
Deposit your data in a repository
A repository is an online storage infrastructure for researchers to store data, code, and other research outputs. Depositing your data in a publicly accessible, recognized repository ensures that your dataset continues to be available to both humans and machines in a usable form. Uploading data to a repository helps preserve it more securely over time than hosting it on a website. Plus, you’ll receive a persistent identifier (PID) to establish ownership and enable others to cite the data. Your institutional librarian, funder, and colleagues can likely guide you in choosing a repository relevant to your discipline.
Apply an open license to the data
Apply an open license to your data to permit others to reuse it with minimal restrictions. Permitting reuse supports reproducibility and transparency in research and allows others to build on your findings. The Creative Commons Public Domain Dedication (CC0) and the Creative Commons Attribution Only (CC BY) licenses are popular examples of open licenses. Both licenses allow reusers to distribute, remix, adapt and build upon the materials in any medium or format. The critical difference is that the CC0 license has no requirement for attribution, while the CC BY license requires reusers to credit the original creator.
Make your data easy to find
Always cite your dataset in your published article and include a data availability statement. A data availability statement is a short section of text that tells the reader how, where, and under what conditions the data associated with your research can be accessed and reused. Once your research is published, some repositories allow you to add the article’s digital object identifier (DOI) to the metadata of your dataset to establish a permanent link between these two outputs of your research. You can also choose to publish a Data Note to maximize the potential of your research data. A Data Note is a peer-reviewed article type that describes why and how your data was collected, analyzed, and validated.
What is FAIR data?
The FAIR guiding principles for scientific data management and stewardship were developed in 2016 to ensure research data is:
Findable
Data should be deposited in a repository that assigns a persistent identifier (PID), such as a digital object identifier (DOI). Use metadata to give a detailed description of your data.
Accessible
The repository must use a standard protocol such as HTTP. The repository must continue to provide a landing page and the metadata even if the dataset is removed.
Interoperable
The metadata used to describe the data should be based on standard subject vocabularies and should be machine-readable. You can find the subject standards at FAIRsharing.org.
Reusable
The metadata that describe the data should be accurate and relevant. An explicit data license should be applied to the data, explaining what other users can and cannot do.
The FAIR principles play a vital role in ensuring your research data is as reusable as possible.
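To make the idea of machine-readable metadata more concrete, here is a minimal sketch of a metadata record with a basic completeness check. The field names loosely follow the schema.org Dataset type. Treating these five fields as a minimum, the function name `missing_fair_fields`, and every value in the record (including the placeholder identifier and the example.org landing page) are assumptions made for illustration, not requirements of the FAIR principles or of any particular repository.

```python
import json

# Hypothetical minimum set of fields, each mapped to the principle it supports.
REQUIRED_FIELDS = {
    "identifier": "Findable: a DOI or other persistent identifier",
    "description": "Findable: a detailed description of the data",
    "url": "Accessible: a landing page reachable over a standard protocol",
    "keywords": "Interoperable: terms from a standard subject vocabulary",
    "license": "Reusable: an explicit data license",
}


def missing_fair_fields(record):
    """Return a note for every required field that is absent or empty."""
    problems = [note for field, note in REQUIRED_FIELDS.items() if not record.get(field)]
    url = record.get("url", "")
    if url and not url.startswith(("http://", "https://")):
        problems.append("Accessible: landing page URL should use HTTP or HTTPS")
    return problems


# Illustrative record only; the identifier is a placeholder, not a real DOI.
record = {
    "name": "Example survey dataset",
    "description": "Responses to a hypothetical 2-question survey.",
    "identifier": "https://doi.org/10.xxxx/placeholder",
    "url": "https://repository.example.org/datasets/123",
    "keywords": ["survey", "attitudes"],
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}

print(json.dumps(record, indent=2))
print(missing_fair_fields(record))
```

In practice the repository you deposit in will define its own metadata schema. A check like this is only useful as a reminder before you fill in the deposit form.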
Understanding sensitive data
Sensitive data are data that might identify an individual or otherwise cause risk of harm if shared openly. Sensitive data may involve human research participants, plant or animal species, commercially sensitive information, or information relating to national security, amongst other topics.
If you are working with human participants, your data may be considered sensitive even if no personal identifiers are collected. As such, sensitive data must be protected from unauthorized access or unwarranted disclosure.
If you are in doubt about the sensitivity of your data, check with your institution’s Ethics Office or Committee.
Datasets that contain sensitive data can often be shared in part or in full if informed consent has been provided, with appropriate use of anonymization or controlled access repositories.
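One small part of preparing such a dataset can be sketched as follows: dropping direct identifiers and replacing participant IDs with keyed pseudonyms. Strictly speaking this is pseudonymization, not full anonymization, because indirect identifiers may still allow re-identification. The column names, the `pseudonymize` helper, and the sample rows are hypothetical, and your institution's Ethics Office or Committee should confirm what is appropriate for your data.

```python
import hashlib
import hmac

# Hypothetical columns that identify a person directly and must not be shared.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(rows, secret_key, id_field="participant_id"):
    """Drop direct identifiers and replace each ID with a keyed hash (pseudonym)."""
    cleaned = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
        digest = hmac.new(secret_key, row[id_field].encode("utf-8"), hashlib.sha256)
        # Same ID and key always give the same pseudonym, so records stay linkable.
        kept[id_field] = digest.hexdigest()[:12]
        cleaned.append(kept)
    return cleaned


# Made-up rows for illustration; keep the real key private and out of the dataset.
rows = [
    {"participant_id": "P01", "name": "A. Example", "email": "a@example.org", "score": "7"},
    {"participant_id": "P02", "name": "B. Example", "email": "b@example.org", "score": "4"},
    {"participant_id": "P01", "name": "A. Example", "email": "a@example.org", "score": "9"},
]
shared = pseudonymize(rows, secret_key=b"replace-with-a-private-key")
for row in shared:
    print(row)
```

A keyed hash is used rather than a plain hash so that someone who knows the original ID format cannot simply recompute the pseudonyms without the key.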