3.2 Data Publishing: Linking data to publications
Description
In this unit you will learn to:
- Identify the various practices of linking data and publications
Learning resources
Data publishing: linking data to publications
Linking publications to data promotes transparency, reproducibility and collaboration across disciplines. When researchers make their datasets and other research outputs available (see the module on data in SSH), they enable the scientific community to verify results, replicate studies and build on existing work.
To maximise the impact of linking publications to data, it is essential to adhere to standardised data formats and metadata conventions to ensure that data are both discoverable and interpretable. This requires collaboration between researchers, publishers and funders to establish and enforce best practices for data sharing.
As a practical aid, the following publication checklist is worth keeping in mind (table based on Stockholm University: Publish research data):
Checklist |
---|
1. Choose understandable filenames that indicate the contents |
2. Store files in common and, ideally, open formats. Describe the file formats and indicate the relevant software |
3. Consider providing additional documentation on the variables used |
4. Link data and metadata to standardised vocabularies, standards and ontologies |
5. Provide complete reference records, including PIDs |
6. Link the deposited resource to the authors, institutions and other entities involved in or supporting the research project. Use PIDs (e.g. ORCIDs) |
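To make checklist items 5 and 6 concrete, the Python sketch below assembles a small metadata record that links a dataset to its creator (via an ORCID) and to the publication it accompanies (via a DOI). It is an illustration under stated assumptions, not a repository's deposit format: the field names are modelled on the DataCite Metadata Schema (`creators`, `nameIdentifiers`, `relatedIdentifiers`, `relationType`), the helper `build_record`, the DOIs, the title and the affiliation are hypothetical placeholders, and `0000-0002-1825-0097` is ORCID's public example identifier. Before linking, the sketch checks each ORCID's final character, which ORCID computes as an ISO 7064 MOD 11-2 check digit.

```python
import json

# Hypothetical placeholders for illustration; these DOIs do not exist.
DATASET_DOI = "10.1234/example-dataset"
ARTICLE_DOI = "10.5678/example-article"


def orcid_is_valid(orcid: str) -> bool:
    """Verify the final ORCID character (ISO 7064 MOD 11-2 check digit)."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15].upper() == expected


def build_record(doi, title, year, creators, publication_doi):
    """Hypothetical helper: link a dataset to its creators (ORCID) and to the
    publication it supports (DOI). Field names modelled on DataCite."""
    for c in creators:
        if not orcid_is_valid(c["orcid"]):
            raise ValueError(f"Invalid ORCID: {c['orcid']}")
    return {
        "doi": doi,
        "titles": [{"title": title}],
        "publicationYear": year,
        "creators": [
            {
                "name": c["name"],
                "affiliation": [{"name": c["affiliation"]}],
                "nameIdentifiers": [
                    {
                        "nameIdentifier": "https://orcid.org/" + c["orcid"],
                        "nameIdentifierScheme": "ORCID",
                    }
                ],
            }
            for c in creators
        ],
        # The link back to the publication that the dataset accompanies.
        "relatedIdentifiers": [
            {
                "relatedIdentifier": publication_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsSupplementTo",
            }
        ],
    }


record = build_record(
    doi=DATASET_DOI,
    title="Example survey dataset",  # hypothetical title
    year=2024,
    creators=[{
        "name": "Carberry, Josiah",  # ORCID's fictional example researcher
        "orcid": "0000-0002-1825-0097",
        "affiliation": "Example University",  # hypothetical affiliation
    }],
    publication_doi=ARTICLE_DOI,
)
print(json.dumps(record, indent=2))
```

The `IsSupplementTo` relation states that the dataset supplements the article; in practice, a repository's deposit form may capture the same link under its own label, so check the repository's documentation before relying on these field names.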
Data repositories - Examples
DataverseNO - National Research Data Repository (Norway)
DataverseNO is a national, multi-institutional repository for open research data in Norway. It provides a secure and reliable platform for researchers to share, publish and preserve their datasets, thereby increasing transparency and reproducibility in research. Managed by UiT The Arctic University of Norway, DataverseNO supports a wide range of disciplines and promotes best practices in data management, ensuring that research data remain accessible and reusable in the long term.
DANS Data Station for Social Sciences and Humanities
The DANS Data Station for Social Sciences and Humanities is a prominent repository managed by the Dutch organisation Data Archiving and Networked Services (DANS). It provides a specialised platform for storing, sharing and preserving research data in the social sciences and humanities. DANS provides comprehensive support and resources to researchers, helping to ensure that data are FAIR (Findable, Accessible, Interoperable and Reusable), fostering academic collaboration and facilitating the reuse of data for new research.
Swedish National Data Service
The Swedish National Data Service (SND) is a leading national infrastructure for open access to research data in Sweden. It supports researchers in the social sciences, humanities and health sciences by providing services for data management, archiving and dissemination. SND aims to promote the availability and reuse of research data, thereby increasing the transparency and reproducibility of scientific research. By collaborating with Swedish universities and research institutions, SND ensures that high-quality data resources are preserved and made available for future research.
Finnish Social Science Data Archive
The Finnish Social Science Data Archive (FSD) is a central repository for social science research data in Finland. It provides a wide range of services to researchers, including data acquisition, preservation and dissemination. FSD supports the FAIR principles by ensuring that data are findable, accessible, interoperable and reusable. By facilitating access to high-quality data, FSD promotes the advancement of social science research and encourages the reuse of data in new studies, thus contributing to the growth of knowledge and innovation in the field.
Data Availability Statement
What is a data availability statement? It is a section of a publication in which the authors state whether and where the data underlying the publication can be accessed, and under what conditions.
By including a data availability statement, researchers not only improve the quality and reliability of their work, but also contribute to a more open, reproducible and impactful scientific enterprise.
Benefits
So what are the benefits of providing a data availability statement?
- Promotes transparency, helping researchers understand the published findings and verify the results
- Facilitates reproducibility, allowing researchers to replicate the study
- Enhances discoverability, allowing researchers to access and reuse the data
- Encourages ethical practices and compliance with relevant standards
- Increases impact, as data become easily findable and citable
However, it is essential that data are properly prepared before publication. Researchers need to ensure that their datasets are managed in a way that maximises their utility and impact. This includes assigning unique identifiers to datasets to improve discoverability, storing data in accessible repositories with clear licences for use, and adhering to standardised formats and metadata conventions to facilitate compatibility and integration between different systems.
Scenario
A guest editorial published by COPE (Committee on Publication Ethics) highlights the discrepancies between data-sharing principles and actual practice: the gap between promise and reality (Jamshidi-Naeini et al., 2023).
Linking monographs and data
Data are the foundation of empirical research and theoretical argumentation. In SSH, data (whether quantitative or qualitative) provide the essential evidence needed to understand and analyse complex social phenomena. Quantitative data, derived from surveys, experiments and administrative records, allow researchers to identify patterns and test hypotheses. Qualitative data, such as interviews, multimedia resources and textual resources, provide contextual insights.
In the social sciences and humanities, monographs are the primary form of scholarly output because they allow for extensive examination of primary sources, the development of sophisticated arguments and the presentation of a coherent thesis, often shaped by data analysis and interpretation. Research data are therefore often part of the published output itself. This is not necessarily the case in other disciplines, where data are deposited or published separately, for example as supplementary materials or standalone datasets.
Therefore, linking data to monographs increases the transparency, credibility, and impact of scholarly work.
- The case of Open Book Publishers (OBP) as an example of a comprehensive data publication policy.
Examples:
Where possible, authors also store the resources in their own repository as a further backup. If the data are hosted online as an additional resource, OBP links to the online resources from the book, usually as part of the table of contents.
Even if the dataset is hosted elsewhere, OBP prefers that a copy is also hosted on the book's website, as this keeps all the key resources together on the book's landing page (which is also where the book's DOI resolves).
- Image, Knife and Gluepot by Kate Rudy: the book includes an appendix, presented as a digital resource and hosted on OBP's website. It is a spreadsheet of data recording where all the pieces of a dismantled manuscript are kept. The author spent many years tracking them down in various archives, and the manuscript, its dismantling and its afterlives are the subject of the book.
- Health Care in the Information Society by David Ingram also has a large number of appendices, hosted on OBP's website, relating to legal and policy changes in health provision in the UK.
- William Moorcroft, Potter: Individuality by Design by Jonathan Mallinson has online image portfolios and unpublished documents hosted on OBP's website.
- A Lexicon of Medieval Nordic Law by Inger Larsson, Ulrika Djärv, Jeffrey Love, Christine Peel and Erik Simensen is a suite of printed and digital editions of the lexicon's website, with the content drawn directly from the online database.
Data Citation
Researchers should cite the datasets that underpin their work. A formal data citation includes the dataset's authors, title, year of publication, version and persistent identifier (PID).
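As an illustration, the citation elements listed above can be held in a small record and rendered as a citation string. This Python sketch is hypothetical: the `DatasetCitation` class, the sample authors, title and DOI are placeholders, and the element order and punctuation follow an assumed, loosely APA-like style rather than a prescribed format. The PID is always rendered as a resolvable `https://doi.org/` link so that the citation is actionable for both humans and machines.

```python
from dataclasses import dataclass
from typing import List

DOI_RESOLVER = "https://doi.org/"


@dataclass
class DatasetCitation:
    """Hypothetical record holding the elements of a formal data citation."""
    authors: List[str]  # "Surname, Given name"
    title: str
    year: int
    version: str
    pid: str  # a DOI, with or without the resolver prefix

    def pid_url(self) -> str:
        # Express the PID as a resolvable link, avoiding a doubled prefix.
        if self.pid.startswith(DOI_RESOLVER):
            return self.pid
        return DOI_RESOLVER + self.pid

    def render(self) -> str:
        # Assumed, loosely APA-like style; follow your publisher's style guide.
        authors = "; ".join(self.authors)
        return (f"{authors} ({self.year}). {self.title} "
                f"(Version {self.version}) [Data set]. {self.pid_url()}")


citation = DatasetCitation(
    authors=["Doe, Jane", "Roe, Richard"],  # hypothetical authors
    title="Example interview transcripts",  # hypothetical dataset
    year=2024,
    version="1.0",
    pid="10.1234/example-dataset",  # hypothetical DOI
)
print(citation.render())
# Doe, Jane; Roe, Richard (2024). Example interview transcripts (Version 1.0) [Data set]. https://doi.org/10.1234/example-dataset
```

Keeping the version as a separate field matters for the specificity principle: a citation should point to the exact version of the data that supports a claim.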
Data Citation Principles
The eight data citation principles, as defined in the Joint Declaration of Data Citation Principles (Data Citation Synthesis Group, 2014), are:
- Importance: Data should be regarded as legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects.
- Credit and attribution: Data citations should facilitate the provision of scholarly credit and normative and legal attribution to all contributors to the data.
- Evidence: Whenever and wherever a claim is based on data, the corresponding data should be cited.
- Unique identification: A data citation should include a persistent method of identification that is machine-processable, globally unique, and widely used by a community.
- Access: Data citations should facilitate access to the data itself and to associated metadata, documentation, code, and other materials necessary for humans and machines to make informed use of the referenced data.
- Persistence: Unique identifiers and metadata describing the data and their disposition should persist, even beyond the lifetime of the data they describe.
- Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity.
- Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the different practices of different communities, but should not differ so much as to compromise the interoperability of data citation practices.
Data Citation - Example
Dataverse.org provides an example and a breakdown of how these principles are put into practice in a data citation.
References
Data Citation Synthesis Group. (2014). Joint Declaration of Data Citation Principles (M. Martone, Ed.). San Diego, CA: FORCE11. https://doi.org/10.25490/a97f-egyk
Jamshidi-Naeini, J., Brown, A. W., Najam, W., Vorland, C. J., Dickinson, S., & Allison, D. B. (2023, July 24). Guest editorial: Data availability statements. Committee on Publication Ethics. Retrieved from https://publicationethics.org/news/guest-editorial-data-availability-statements
Stockholm University. (n.d.). Publish research data. Retrieved from https://www.su.se/staff/researchers/research-data/preserve-research-information/publish-research-data-1.598234#Checklist%20for%20publishing