Open data | Learn how to make your research data open and FAIR

Open data

Everything you need to know to make your research data open and FAIR

What is open data?

Open data is data that is available for everyone to access, use, and share. For researchers, this refers to any datasets collected or created as part of your research project.  

In some cases, data sharing is not appropriate for legal, ethical, data protection, or confidentiality reasons. F1000 recommends researchers strive to make their data as open as possible and as closed as necessary. This means you should only restrict access to data where essential, for example, for security or confidentiality reasons.

Types of data in research

Research data is the evidence that underpins your findings and can be used to validate the claims made in your publication. There are many types of research data, both quantitative and qualitative. These include:

  • Survey results 
  • Software 
  • Models and algorithms
  • Interviews and transcripts
  • Images, videos, and audio files  
  • Genome sequences  

Still unsure about research data?

If you find yourself wondering which parts of your work are research data, The Turing Way suggests asking yourself: 

  • What information did I need to answer my research question? 
  • What information would I need to back up my findings? 
  • What information would someone else need to reproduce my work? 

The answer is your research data. 

Benefits of open data

Sharing your data benefits your career, other researchers, and society. In recent years, open data has become a priority for academic stakeholders globally. So how can you benefit from open data? 

Benefits for your career

  • Increase the discoverability of your research: linking your open data and published research outputs can increase the readership of your research.

  • Increase citations: research shows that articles with links to datasets shared in repositories generated up to 25% more citations than articles that did not share data in repositories.

  • Enhance the credibility of your work: when the data supporting your findings is openly available, others can replicate your work to validate your results and conclusions. 

  • Establish ownership and get credit for your data: uploading it to a repository allows you to establish ownership through a persistent identifier so other researchers can cite it.

  • Facilitate collaboration and new partnerships: researchers in your field and beyond can access and use your data, leading to greater collaboration and new research projects.

Benefits for the community

  • Supports reproducibility: open data enhances research rigor by making it easier for others to validate, replicate, and reproduce your findings.

  • Reduces research waste: when data is openly available, research becomes more efficient because other researchers do not need to duplicate work that has already been done.

  • Enables others to reuse your data: sharing data can lead to reuse by providing a foundation for others to build on. 

  • Preserves data more securely over time: data hosted in a repository is more secure than data hosted on a website or stored in personal files.

Benefits for society

  • Gives greater visibility to the results of publicly funded research: because research is often publicly funded, open data offers a chance to make its results openly available as a public good.

  • Can lead to real-world impact: when data is open, we can accelerate the pace of research discovery to solve societal challenges in real time.

  • Fosters trust in research: transparency and accountability help to foster public trust in the research process and results. 

How to share your research data

What is FAIR data?

The FAIR guiding principles for scientific data management and stewardship were published in 2016 to ensure research data is:

Findable

Data should be deposited in a repository that assigns a persistent identifier (PID), such as a digital object identifier (DOI). Use metadata to give a detailed description of your data.

Accessible

The repository must use a standard protocol such as HTTP or HTTPS. The repository must continue to provide a landing page and the metadata even if the dataset is removed.

Interoperable

The metadata used to describe the data should be based on standard subject vocabularies and should be machine-readable. You can find the subject standards at FAIRsharing.org.

Reusable

The metadata that describes the data is accurate and relevant. An explicit data license has been applied to the data, explaining what other users can and cannot do.

The FAIR principles play a vital role in ensuring your research data is as reusable as possible.

Understanding sensitive data

Sensitive data are data that might identify an individual or otherwise cause a risk of harm if shared openly. Sensitive data may involve human research participants, plant or animal species, commercially sensitive information, or information relating to national security, amongst other topics.

If you are working with human participants, your data may be considered sensitive even if no personal identifiers are collected. Sensitive data must be protected from unauthorized access or unwarranted disclosure.

If you are in doubt about the sensitivity of your data, check with your institution’s Ethics Office or Committee.

Datasets that contain sensitive data can often be shared in part or in full if informed consent has been obtained and anonymization or a controlled-access repository is used as appropriate.

FAQs

Metadata is data about other data. It aids both discoverability and understanding of the data. Metadata can contain descriptive information about the dataset and administrative or structural information.

The content and format of metadata are often guided by a specific discipline and/or repository. Still, metadata records in a repository typically include information such as creation date, file format, data creator, keywords, location, how the data was generated, and version information, amongst other things.
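To make this concrete, here is a minimal sketch of what a machine-readable metadata record might look like, written in Python and serialized to JSON. The field names, the placeholder values, and the list of required fields are assumptions for illustration only; your discipline or repository will define its own schema.

```python
import json

# Hypothetical metadata record. Every value is a placeholder, and the field
# names are illustrative rather than taken from any repository's schema.
metadata = {
    "title": "Example survey dataset",
    "creator": "A. Researcher",
    "creation_date": "2023-01-15",
    "file_format": "text/csv",
    "keywords": ["survey", "example"],
    "location": "Example City",
    "method": "Online questionnaire",
    "version": "1.0",
}

# Assumed minimum set of fields; a real repository may require more or fewer.
REQUIRED_FIELDS = [
    "title", "creator", "creation_date", "file_format", "keywords", "version",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Serialize to JSON so that software, not just people, can read the record.
record_json = json.dumps(metadata, indent=2, sort_keys=True)
```

Running `missing_fields(metadata)` before you deposit is a quick way to spot gaps in a record like this one.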

DOI stands for digital object identifier, a type of persistent identifier. A DOI is a string of numbers, letters, and symbols that uniquely identifies a research output and enables citation.

Persistent identifiers are vital as they remain constant, even if the location of the digital research output moves. While a URL may change, a persistent identifier continues to point to the output at its new location.
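As an illustration, the sketch below turns a DOI into a link through the doi.org resolver, which redirects to wherever the output currently lives. The function name, the example DOI, and the validation pattern are assumptions: the pattern is a common heuristic for modern DOIs, not the full DOI syntax.

```python
import re

# Heuristic check: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption that covers most modern DOIs, not every valid one.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI (hypothetical helper)."""
    doi = doi.strip()
    # Accept common written forms such as "doi:10.xxxx/..." or a full link.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Placeholder DOI for illustration; it does not identify a real dataset.
link = doi_to_url("doi:10.1234/example.5678")
```

Citing the resolver link rather than the repository's page address means your citation keeps working if the repository reorganizes its site.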

Some repositories accommodate changes to deposited datasets through versioning. Selecting a repository with versioning gives you the flexibility to add new data, restructure, and improve your dataset. Each version of your dataset is uniquely identifiable and maintained so others can find, access, reuse, and cite whichever version of the dataset they require. At F1000, we require authors to use repositories that support versioning.

A data availability statement is a short section of text which tells the reader how, where, and under what conditions the data associated with your research can be accessed and reused.

If your dataset is associated with a publication, your publisher may ask peer reviewers to review the data as part of their formal peer review process. For example, F1000 asks peer reviewers for F1000Research to review the dataset as part of their assessment of the research.

In some situations, sharing your data would not be legal or ethical. This includes: when you don’t have ownership of the data; when sharing the data would conflict with the need to protect personal identities; when data is commercially sensitive or could cause a security risk; or if data is sub judice and public discussion is prohibited. Your institution’s research ethics committee can provide guidance if you are unsure about sharing your specific dataset.

A key benefit of open data is that it establishes precedence for a dataset through persistent identifiers, timestamps, and links to the authors. This clarifies which researcher established the idea and created the results first. When a researcher deposits their data in a public repository, they have established that they are the creator of the dataset and should receive credit for it.

There are many options to choose from when sharing your data openly. There are institutional repositories, discipline-specific repositories, general data repositories, and controlled access repositories.

If your research involves sensitive human data, you need to take extra steps to maintain the privacy of your research participants and share your data in an ethically and legally compliant way. First, anonymize the data. Anonymization techniques include removing any identifying information, using pseudonyms, and generalizing where possible. In some cases, you may want to limit access to the data to specific parties who will treat the data carefully. Controlled access allows you to upload your data to a repository and keep the files private. You can share access with others if certain requirements are met.
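The three anonymization techniques mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the field names (`name`, `email`, `age`), the pseudonym format, and the ten-year age bands are all hypothetical, and real datasets usually need a fuller assessment of re-identification risk before sharing.

```python
def anonymize(records, id_field="name", drop_fields=("email",), age_field="age"):
    """Return copies of the records with identifiers removed or generalized.

    All field names are hypothetical defaults chosen for this example.
    """
    pseudonyms = {}  # maps each original identifier to a stable pseudonym
    result = []
    for rec in records:
        original_id = rec[id_field]
        if original_id not in pseudonyms:
            pseudonyms[original_id] = f"P{len(pseudonyms) + 1:03d}"

        # 1. Remove identifying information (direct identifiers and exact age).
        clean = {
            key: value
            for key, value in rec.items()
            if key not in drop_fields and key not in (id_field, age_field)
        }
        # 2. Use a pseudonym in place of the participant's name.
        clean["participant"] = pseudonyms[original_id]
        # 3. Generalize: replace the exact age with a ten-year band.
        low = (rec[age_field] // 10) * 10
        clean["age_band"] = f"{low}-{low + 9}"
        result.append(clean)
    return result


# Invented example records; no real participants are described.
raw = [
    {"name": "Alice", "email": "alice@example.org", "age": 34, "score": 7},
    {"name": "Bob", "email": "bob@example.org", "age": 58, "score": 5},
    {"name": "Alice", "email": "alice@example.org", "age": 34, "score": 9},
]
shared = anonymize(raw)
```

Note that the same participant receives the same pseudonym across records, so repeated measurements can still be linked without revealing who provided them.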
