Description
In this unit you will learn to:
- Select relevant open file formats
Learning resources
File formats
Research data comes from a wide variety of instruments and software, and can even include handwritten notes. These data can exist in either physical or digital formats. In the context of open science, where reproducibility, replicability and the FAIR principles (Findability, Accessibility, Interoperability and Reusability) are paramount, it is crucial to make all data and research materials available in non-proprietary formats whenever possible. This practice ensures that data can be accessed, used and shared across platforms and by different users. The following sections provide comparisons and examples to illustrate the importance and benefits of adopting open file formats.
Open File Formats
There are several key factors to consider when implementing data conversion to open file formats:
-
Open: It is important to choose formats that are open, meaning that the specifications for the format are publicly available and free to use. This ensures that anyone can read, edit and share the data without the need for proprietary software.
-
Standard: Choose formats that are standardised, meaning they are widely accepted and used by the community. Standardised formats often have better support, more robust documentation and a higher likelihood of compatibility across different systems and applications.
-
Interchangeable: The chosen formats should be easily interchangeable, allowing seamless data exchange between different software and platforms. This interoperability is essential for collaboration, data integration and long-term accessibility.
-
Sustainable: Ensure that formats are sustainable and long-lived, meaning that they are likely to be used and supported over time. This helps preserve data for future use and prevents obsolescence that can occur with proprietary or less widely adopted formats.
By considering these factors, you can ensure that your data remains accessible, usable and compatible over the long term, facilitating better data management and preservation. For guidance about how to think around file formats in general please have a look at:
- CESSDA Data Management Expert Guide - File formats and data conversion
- Swedish National Data Service - File format
- Geo Data - Support for researchers by Utrecht University - Open and FAIR File Formats
- Data Archiving and Networked Services - File formats
- Howtofair.dk - File formats
Guidance on file formats
There are two important questions to ask in order to understand what to do with open file formats: What domain or sub-domain are you working with? And what repositories do you want to upload to? The responses should dictate what type of file formats you may need.
However, several agencies have provided guidance on this type of issue. Usually the advice is simple, suggesting recommended or preferred formats as opposed to acceptable or non-preferred ones. The assumption is that sometimes it is indeed not possible to convert to a non-proprietary format, either because of the specific type of software needed to do the analysis, or because there is not yet an open source software alternative.
What we have done here is to bring together advice from a number of data specialists to give you an idea of what file formats are being discussed. It is up to you to decide what is or is not appropriate for your domain and specific research area. Please read the table with caution: some organisations may say that a format is not preferred, but others may find it acceptable. This is exactly the kind of decision-making that is involved in deciding which file formats to work with or convert to. The sources we looked into were:
- UK Data Service - Recommended formats
- 4TU. Research Data - Preferred file formats
- Data Archiving and Networked Services - File formats
- Swedish National Data Service - Choosing a file format
Types | Recommended/Preferred | Accepted | Non-preferred |
---|---|---|---|
Text | .xml, .rtf, .txt, .pdf, .json | .doc, .docx, .html | |
Audio | .mxf, .flac, | .mp3, .aif | .wav, .m4a |
Digital Image | .tif, .tiff, .jpeg, .jpg, .png, .svg | .dxf, .psd, .raw, .bmp | .ai, .eps. wmf, .emf |
Statistics (Spreadsheets) | .csv, .ods, .tsv | .xls, .xlsx, | |
Archives | .zip |
There are several ways to categorise and classify data types. Some organisations do this by the type of research data, e.g. qualitative or quantitative, and others focus on whether the data is textual or numerical, audio or video. Therefore, our table is not exhaustive and an extended list of further links can be found at Further Reading section.
Note:
- Where numerical values are involved in the form of a spreadsheet .csv is preferable to ensure interoperability with different softwares. For example, you could easily import .csv in most open or closed source software.
- Plain text .txt is the simplest form of textual data one can share. For example, all ReadMe files are shared in .txt format to ensure that guidance on how to reuse and interpret research data are at a minimum fully open and reusable.
- When it comes to images, it is important to make an informed decision on what type of format serves best the content of the image. Utrecht University, provides some great guidance on file functionality with a specific example on how to think about images: FAIR File Formats
References
CESSDA Data Management Expert Guide. (n.d.). CESSDA Data Management Expert Guide - File formats and data conversion. Retrieved from https://dmeg.cessda.eu/Data-Management-Expert-Guide/3.-Process/File-formats-and-data-conversion
Swedish National Data Service. (n.d.). Swedish National Data Service - File format. Retrieved from https://snd.se/en/manage-data/organise/file-format
Swedish National Data Service. (n.d.). Swedish National Data Service - Choosing a file format. Retrieved from https://snd.se/en/manage-data/guides/choosing-file-format
Geo Data - Support for researchers by Utrecht University. (n.d.). Geo Data - Support for researchers by Utrecht University - Open and FAIR File Formats. Retrieved from https://geo-data-support.sites.uu.nl/open-science-open-data/fair-file-formats/
Howtofair.dk. (n.d.). Howtofair.dk - File formats. Retrieved from https://www.howtofair.dk/how-to-fair/file-formats/
UK Data Service. (n.d.). UK Data Service - Recommended formats. Retrieved from https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/recommended-formats/
4TU. Research Data. (2023). 4TU. Research Data - Preferred file formats. Retrieved from https://data.4tu.nl/s/documents/Preferred_File_Formats_2023.pdf
Data Archiving and Networked Services. (n.d.). Data Archiving and Networked Services - File formats. Retrieved from https://dans.knaw.nl/en/file-formats/