Heritage collections: the invisible data source in academic publishing?
At F1000, we believe open data is an essential part of the shift towards more open research. While there are already established policies and practices for all forms of openness in the sciences, humanities researchers are still finding their feet with open data. The challenge facing humanities researchers is twofold: they must not only determine what constitutes ‘data’ in their field, but also how to manage, share, and cite it. Here, Rebecca Grant (Head of Data and Software Publishing, F1000) and Frances Madden (Research Associate, the British Library) discuss the role of heritage collections in data sharing and their hopes for the future of open data in the humanities.
Meet the authors
Frances Madden is a Research Associate at the British Library where she works on the Heritage PIDs project. Heritage PIDs aims to increase the adoption of persistent identifiers (PIDs) in heritage organizations such as galleries, libraries, archives and museums. Persistent identifiers are long-lasting digital references to resources that researchers can use to create reliable citations to both digital and physical resources.
Rebecca Grant is Head of Data and Software Publishing at F1000, where she supports the provision of open data sharing policies and innovative methods of data publishing. She is also co-chair of the STM Association’s Research Data Program Humanities sub-group, which is exploring the ways that publishers can create data sharing policies that are relevant and appropriate for authors in the humanities.
What are heritage collections?
Rebecca: Frances, what should humanities authors know about citing heritage collections in their reference lists?
Frances: Heritage collections such as archive material, rare books, and museum collection items are long-standing subjects of research, maybe even the longest standing. As a result, there is a well-established citation practice in reference lists, but these references have no fixed format, and the information they contain can vary. This can make it very difficult to trace back to the exact item.
References hardly ever contain a link to an online version of a resource or a link that a computer can analyze. As collections have been digitized, and the collection items themselves or information about them have moved online, citation practices have not kept up. Often the physical objects are cited, even when the researchers have used digitized material.
Not including links to the digitally available material reduces the prospect of others consulting the material and makes your work less transparent and reproducible.
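To make the idea of “a link that a computer can analyze” concrete, here is a minimal sketch in Python that checks whether a reference string contains any machine-readable link. The patterns, the function name `find_links`, and the example references are all illustrative assumptions rather than a standard or anything used by F1000 or the British Library; real identifier formats are richer than these simplified patterns.

```python
import re

# Simplified, illustrative patterns (assumptions, not full identifier grammars).
PATTERNS = {
    "doi": re.compile(r"\b10\.\d{4,9}/\S+"),
    "ark": re.compile(r"\bark:/?\d{5,}/\S+"),
    "url": re.compile(r"https?://\S+"),
}


def find_links(reference: str) -> dict[str, list[str]]:
    """Return the machine-readable links found in a reference, grouped by type."""
    found = {}
    for kind, pattern in PATTERNS.items():
        # Strip trailing punctuation that belongs to the sentence, not the link.
        matches = [m.rstrip(".,;)") for m in pattern.findall(reference)]
        if matches:
            found[kind] = matches
    return found


# Hypothetical references: one to a physical item only, one with a link added.
physical_only = "Example Archive, MS 123, Letter to a friend, 1850."
with_link = physical_only + " https://example.org/ark:/12345/abc123."

print(find_links(physical_only))
print(find_links(with_link))
```

A reference in the first style gives a reader enough to request the item in a reading room, but nothing that software can follow, count, or link back to the collection.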
Transparency is crucial
Rebecca: For journals and publishing platforms, like F1000Research and Routledge Open Research, transparency is crucial. We want readers to identify the sources that our authors use. We also want to ensure that claims in the body of articles are supported.
Additionally, we can link reference lists and citations to the “currency” of research: academic credit. When we think about citing heritage collections, we might reframe this as “acknowledgment.” Authors are highly motivated to cite and be cited. Still, heritage organizations that provide access to their collections must also receive credit through a citation.
Frances: Has something changed? It appears publishers are starting to push humanities authors to cite “data” now rather than “sources.”
Rebecca: This is an emerging challenge for our humanities authors. Many publishers, including F1000, have robust research data sharing policies that impact any author publishing with us. There’s a growing expectation that authors will include a data availability statement with their article and add a data citation to their reference list. Now sources like museum collections are referred to as “data” in journal policies and author instructions.
The challenges of citation
Rebecca: But do you think it’s intrinsically challenging to cite heritage sources, even without confusion about what constitutes “data”?
Frances: There are several reasons it can be hard to know how and what to cite when using heritage materials. Many journals do not have clear policies or guidance, and sometimes editors dislike long URLs. In addition, links can break, making their inclusion redundant.
To avoid the issue of broken links, heritage organizations can provide persistent identifiers or PIDs. PIDs are long-lasting links that organizations guarantee to maintain. However, not all heritage organizations use PIDs. As a result, it can be hard to know what to do when working across different collections. Even if organizations have implemented PIDs, recommended citation guidance can be challenging to find. It may not even be explicit anywhere.
Differing views about what a citation refers to add another layer of complexity. In the natural history community, CETAF Stable Identifiers refer to the physical item. Within the library and archives sphere, it is often unclear whether a digital identifier refers to the object’s physical or digital version.
Rebecca: Absolutely. Suppose a physical and digital version of the object exists. Should authors cite the digital version, as it is more readily accessible to the reader? Does this rule still apply if the author accessed the physical version, and perhaps even integrated analysis of the object’s physicality into their research?
In attempting to create this guidance, publishers might be in danger of creating lengthy author instructions that authors skip over entirely. Creating a policy that covers every possible use case is also challenging.
The importance of proper citation
Rebecca: Can you explain why correct citation of these collections is essential to heritage organizations? Why should authors (and publishers) do their best to get it right?
Frances: The proper and full citation of heritage collections opens enormous possibilities for further research. Heritage organizations want to know how frequently researchers are using their collections. Usage statistics can be vital in securing funding for additional digitization work, website revamps, and new tools and access mechanisms. See the V&A’s Raphael Cartoons site as a great example.
As part of the Heritage PIDs project, we created the demonstrator tool. Our aim was to illustrate how bi-directional linking between collection items can help researchers navigate different resources. It also shows the difference between the publication and its underlying data.
However, not all organizations can provide clear guidance online about how authors should cite their collections, perhaps due to resource constraints or the lack of clarity across the sector. Ideally, the page on which you access a digitized resource or metadata about a physical resource would contain a recommended citation, as is the case at the British Library or Royal Botanic Garden Edinburgh. Unfortunately, this is not always the case.
Rebecca: I agree. It can be challenging for authors to identify what “best practice” looks like when several stakeholders are involved (e.g., publishers, heritage organizations, creators of citation standards). The Heritage PIDs project will be so valuable in this regard.
The role authors can play
Rebecca: Ideally, what would you like authors and editors to do when citing heritage collections?
Frances: Where heritage organizations provide guidance on citing their collections, authors should follow it where possible. Editors can encourage the citation of digital items or the inclusion of digital references to physical materials.
Heritage organizations should make it as straightforward as possible to cite digital versions of resources, for example by providing citation guidance on a resource’s landing page, on their main citation guidance pages, or both. They should also try to encourage the inclusion of digital references for physical works.
Does this align with what publishers expect?
Rebecca: Yes. While most publishers have some form of citation guidance, it’s essential that authors and editors acknowledge the heritage organizations’ recommendations. The heritage organization should be the author’s first stop to find out what they should include in the reference list.
Heritage collections offer enormous possibilities for the future of humanities research. They also open the door to important conversations around what openness looks like in the humanities and how it works in practice. At F1000, our goal has always been to empower researchers to communicate their findings openly and easily. We’re committed to supporting humanities researchers on their journey to more open research and finding ways to ensure open data can be seen as an opportunity, not an obstacle. Find out more about our work with research communities here.