Skip to content

2.1 Metadata types and standards

Description

In this unit you will learn to:

  • Identify different types and standards of metadata

Learning resources

Metadata

Simply put, metadata is data about data. However, its role in scholarly communication is far more profound. Borgman (2015) states, "Data will remain 'dark matter' in scholarly communication unless it is described, curated, and made discoverable." This highlights the critical need for metadata to make research data accessible and useful.

Metadata performs multiple functions: it describes, organises and enables the discovery of data, ensuring that data does not remain obscure and under-utilised. Its importance in research cannot be overstated, as it increases the transparency, reproducibility and long-term usability of data.

To understand the full scope of metadata's functions, importance and relevance to research, we will explore different types of metadata and the standards that govern them. By understanding these elements, researchers can effectively manage their data, making it a valuable asset in the academic and scientific community.

Types of metadata

Metadata can be broadly categorised into three types: Administrative, Descriptive and Structural. Each type serves a different purpose in the effective management and use of data.

Administrative Metadata:

  • Purpose: Manages a resource.
  • Components: Contains information about the creation, acquisition and use of data.
  • Examples: File type, creation date, access rights and preservation metadata. This type of metadata ensures the proper management and long-term accessibility of data.

Descriptive metadata:

  • Purpose: Describes a resource.
  • Components: Provides information to help identify and discover a resource.
  • Examples: Title, author, abstract, keywords and subject classifications. Descriptive metadata is critical to search and retrieval processes, making data discoverable and understandable.

Structural metadata:

  • Purpose: Indicates the provenance of a resource.
  • Components: Shows how different components of a resource are organised.
  • Examples: Tables of contents, page numbers, chapters, sections, and relationships between records. Structural metadata ensures that data is organised logically and that relationships are clear.

Understanding these types of metadata is essential for effective data management, ensuring that data is accurately described, easily discoverable, and properly organised for future use.

Metadata Standards

Deutz et al. (2020), in the howtofair.dk webpage, provide an explanation on what is a metadata standard:

A metadata standard is a subject-specific guide to your metadata. Metadata elements are grouped into sets designed for a specific purpose and given a standard name and definition. Rules on what content must be included, what syntax must be used, or a controlled vocabulary can also be included in a metadata standard.

Focusing on SSH, there are several standards which could be useful to researchers according always to the type of research. You can explore the full list of metadata standards in the RDA Metadata Standards Catalog. Another fun and interesting depiction of the multiplicity of metadata, is the work produced by Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe

Metadata Transcription Template

The Data Archiving and Networked Services (DANS) with the contribution of University of Amsterdam produced a Metadata transcription template. The template provides useful information about the context and process of the interview. Besides the most well-known descriptive metadata fields, such as title and subtitle, we would like to emphasise for example the following metadata fields:

  • Recording by: Name and type of recording (audio/video), naming device

    • Note: The name of the device can enable others who wish to conduct interviews to use similar technology. It is also essential for replication if one wishes to test the functions and capabilities of the device as reported in a research report.
  • Setting: Describe the setting and atmosphere of the interview to illustrate what cannot be sensed by simply reading the text in the transcript.

    • Note: This section of the template is used, for example, to provide contextual information about a situation that could add qualitative elements, for example if interviews take place in situations where refugees are involved, or for some reason there is stress or reasons that influence aspects of the interview.

Similar to the transcription template, DANS provides guidelines as to what metadata crucial to be created before uploading data in the repository, however, without providing a template this time. We provide the information, which is derived verbatim from the DANS Website:

  • Describe the variable labels and value labels.
  • Describe the questionnaires and/or other research tools.
  • Include the fieldwork report (if available).
  • Include a codebook: a description of variables and information about population, types of data (units of observation/analysis), sample procedure, response/non-response, data collection method, weighting variables, constructed and/or derived variables.
  • Ensure that the language of the variable labels and value labels correspond to the language of the rest of the dataset.

Finally, we would like to close this section with a reference to two well-known metadata standards that are widely applied in several repositories: Dublin Core and DataCite.

Dublin Core

The Dublin Core Metadata Initiative focuses on providing a simple and generic framework for describing digital resources. It comprises 15 core metadata elements designed to cover three primary aspects: content, intellectual property and instantiation. This approach aims to be accessible and applicable to different types of resources, making it a versatile tool for metadata creation. The Dublin Core elements are widely used in institutional repositories where they help to effectively manage and preserve digital content. This framework supports consistent and comprehensive metadata descriptions that facilitate the discovery, sharing and reuse of digital resources.

Example of Dublin Core: Dublin Core and the Digital Repository of Ireland

DataCite

The DataCite initiative focuses on assigning persistent identifiers to research data, ensuring that datasets can be cited and shared. It is widely used for data and research materials and provides a structured framework for metadata, including mandatory, recommended and optional fields. DataCite standards evolve with new versions over time, allowing for continuous improvement and adaptation to the needs of the research community. In particular, platforms such as Zenodo have adopted versions of DataCite and are using its framework to improve the discoverability and citation of research outputs. This initiative plays a crucial role in the scholarly communication ecosystem by promoting data accessibility and reuse.

More details about the DataCite Metadata Schema here: DataCite Metadata Schema 4.5 (Released 22 Jan, 2024)

References

Borgman, C. L. (2017). Big data, little data, no data: Scholarship in the networked world. MIT press.

D.B. Deutz, M.C.H. Buss, J. S. Hansen, K. K. Hansen, K.G. Kjelmann, A.V. Larsen, E. Vlachos, K.F. Holmstrand (2020). How to FAIR: a Danish website to guide researchers on making research data more FAIR https://doi.org/10.5281/zenodo.3712065

European Data Portal. (n.d.). Introduction to metadata management (Version 1.4). Retrieved from https://data.europa.eu/sites/default/files/d2.1.2_training_module_1.4_introduction_to_metadata_management_en_edp.pdf

Further reading

Avanço, K. (January 19, 2023). What is metadata for publication and how is it used? Part 1: Introduction to metadata. The road to FAIR. Retrieved July 17, 2024 from https://doi.org/10.58079/trrx

Avanço, K. (February 3, 2023). What is metadata for publication and how is it used? Part 2: Metadata standards. The road to FAIR. Retrieved July 17, 2024 from https://doi.org/10.58079/trry

Riley, J. (2018). Seeing Standards: A Visualization of the Metadata Universe (Version V3) [Dataset]. Borealis. https://doi.org/10.5683/SP2/UOHPVH