1.6 Data in the SSH
Description
In this unit you will learn to:
- Recognise SSH specificities within the research workflow during knowledge production from an ethical, legal and methodological perspective.
Learning resources
Data in the SSH
Data is a multidimensional concept that can be interpreted in several ways (Giglia, 2021a, 2021b):
First, data as a process: it is a dynamic output that is generated, processed, refined and used in an ongoing cycle that spans the entire research life cycle. This interpretation emphasises the continuous nature of data, highlighting its evolution and the transformative processes it undergoes.
Second, data as a record obtained through a method: in the social sciences and humanities (SSH), data represent information obtained through systematic observation, measurement, and experimental procedures, such as surveys, interviews, or ethnographic studies. This perspective emphasises the importance of methodological rigour in the collection of both qualitative and quantitative data.
Third, data as anything formalised through language: language acts as a 'data carrier', containing rich information about human history, culture and society. Language records also document the evolution of language, semantics and usage patterns over time, providing valuable insights into linguistic and cultural development.
Finally, data in the form of books, corpora and codices: these resources serve as repositories of information, providing structured and/or unstructured data that can be analysed for various purposes. This interpretation recognises the role of traditional and digital texts as important sources of knowledge and data for analysis.
With these interpretations in mind, some working definitions can be attempted:
Humanities:
“We could then define data in the humanities broadly as all materials and assets scholars collect, generate, and use during all stages of the research cycle” (ALLEA report Sustainable and FAIR Data Sharing in the Humanities; Harrower et al., 2020).
Social Sciences:
In the social sciences, "data" refers to quantitative or qualitative information collected through observation, measurement, or inquiry that is used to understand, interpret, and analyse human behaviour, social structures, and societal trends.
In the SSH, the word "data" may therefore be better understood as research material. Interpreting what counts as data raises several challenges. The first is subjectivity: because there is no single agreed definition, different kinds of resources are interpreted as data. The second is contextualisation: what counts as data depends on factors such as the perspective of the researcher and the specific context of the study. The third is the multiplicity of data types and formats: data in the SSH span such a wide range of formats and types that standardisation and consistent interpretation are difficult.
Data in the Humanities
In the humanities, data encompasses a diverse array of formats, each serving distinct roles in research and analysis. Metadata provides essential contextual information that supports digital workflows, enabling visualisation and analysis. Tabular data, which is structured data organised in rows and columns, serves as a foundational source for organising, analysing, and visualising information. Structured text refers to text formatted in a consistent and predictable way, facilitating its use in various applications. In contrast, non-structured text includes free-form, natural language texts such as transcriptions, narratives, social media posts, and correspondence, offering rich, albeit less organised, information. Image files and digitised manuscripts preserve visual and textual historical artefacts, while sound files capture audio data crucial for various analyses. Maps offer geographical data representations, essential for spatial analysis. Lastly, relational databases store tabular data in a structured format where each row represents a record and each column denotes a specific attribute or field, allowing for efficient data retrieval and management (Arnold, Valencia, & Arènes, 2022).
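The row-and-column structure of a relational database described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory database. The table name, the column names and the correspondence records are hypothetical and invented purely for illustration; they do not come from any real collection or from the cited sources.

```python
import sqlite3


def build_letters_db():
    """Create an in-memory database with one table of (invented) letters."""
    conn = sqlite3.connect(":memory:")
    # Each column is an attribute (field) of a letter.
    conn.execute(
        "CREATE TABLE letters ("
        "id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT, year INTEGER)"
    )
    # Each row is one record. All values below are made up.
    conn.executemany(
        "INSERT INTO letters (sender, recipient, year) VALUES (?, ?, ?)",
        [
            ("Writer A", "Writer B", 1850),
            ("Writer B", "Writer A", 1851),
            ("Writer A", "Writer C", 1852),
        ],
    )
    conn.commit()
    return conn


def letters_sent_by(conn, sender):
    """Return (recipient, year) pairs for one sender, oldest first."""
    cursor = conn.execute(
        "SELECT recipient, year FROM letters WHERE sender = ? ORDER BY year",
        (sender,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_letters_db()
    for recipient, year in letters_sent_by(db, "Writer A"):
        print(recipient, year)
```

Because every record shares the same named attributes, a question such as "which letters did one person send, and when?" becomes a single query. This is the kind of efficient retrieval the paragraph refers to.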
Data in the Social Sciences
Similarly, data in the social sciences is characterised by a diverse range of formats, each of which contributes a distinct perspective to the research and analysis process.
Social science research frequently employs both quantitative and qualitative data, which differ in their nature, characteristics, and methods of collection and analysis. Quantitative data comprises numerical information that can be measured and statistically analysed, whereas qualitative data encompasses non-numerical insights, often gathered through interviews, observations, and textual analysis, providing a deeper understanding of human behaviour and social phenomena. Drawing on both kinds of data, with attention to how each is collected and handled, supports robust and ethically sound research practice in the social sciences (CESSDA Training Team, 2020).
Observational data is collected in real time, frequently through field notes or social experiments, thereby providing insights into human behaviour and interactions. Experimental data is derived from controlled experiments, which provide a structured and replicable methodology for testing hypotheses. Simulation data, generated from test models, allows researchers to explore scenarios and predict outcomes in a controlled virtual environment. Survey data, derived from individual or aggregate survey results, provides valuable quantitative and qualitative information on various social phenomena. Records, which are documentation-based data, offer a wealth of historical and administrative information. Secondary data, previously collected, processed, and analysed by others, can be an invaluable resource, providing context and supporting comparative analysis. Collectively, these diverse data types enable a comprehensive and nuanced understanding of social dynamics and structures (Jeng, He, & Oh, 2016).
The CESSDA Training Team (2020) describes further categories of data in the social sciences, each of which presents its own considerations and challenges, particularly where personal or sensitive data is involved. The term "personal and sensitive data" covers any information that can identify an individual, directly or indirectly, as well as data that is deemed particularly sensitive or confidential and therefore needs special protection. Researchers must be aware of their ethical and legal obligations and comply with current legislation and requirements, including the General Data Protection Regulation (GDPR). They should implement recommended practices, such as obtaining informed consent and anonymising data, to protect the identity of participants where this is required by law or requested.
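As a rough illustration of one step towards protecting participant identities, the sketch below replaces direct identifiers in survey records with participant codes. It is a minimal example under stated assumptions, not a recommended or complete procedure. The field names ("name", "email"), the code format and the sample records are hypothetical. Note that this is pseudonymisation rather than full anonymisation: as long as the key table exists, the data can generally still be linked back to individuals, and indirect identifiers in free-text answers are not handled here.

```python
# Hypothetical set of fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = frozenset({"name", "email"})


def pseudonymise(records, identifier_fields=DIRECT_IDENTIFIERS, key_field="name"):
    """Return (cleaned_records, key_table).

    cleaned_records: copies of the records without the identifier fields,
        each carrying a 'participant_code' instead.
    key_table: mapping from the original key_field value to its code.
        It must be stored separately and securely, or destroyed.
    """
    key_table = {}
    cleaned = []
    for record in records:
        person = record[key_field]
        if person not in key_table:
            # Assign the next sequential code, e.g. P001, P002, ...
            key_table[person] = f"P{len(key_table) + 1:03d}"
        kept = {k: v for k, v in record.items() if k not in identifier_fields}
        kept["participant_code"] = key_table[person]
        cleaned.append(kept)
    return cleaned, key_table


if __name__ == "__main__":
    # Invented sample responses, for illustration only.
    sample = [
        {"name": "Respondent One", "email": "one@example.org", "answer": "agree"},
        {"name": "Respondent Two", "email": "two@example.org", "answer": "disagree"},
    ]
    cleaned_records, keys = pseudonymise(sample)
    print(cleaned_records)
```

The same participant always receives the same code, so repeated responses can still be analysed together without the name or email appearing in the working dataset.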
References
Arnold, M., Valencia, O., & Arènes, C. (2022, May 9). Humanities and FAIR data. Zenodo. https://doi.org/10.5281/zenodo.6531506
CESSDA Training Team. (2020). CESSDA Data Management Expert Guide. CESSDA ERIC. https://doi.org/10.5281/zenodo.3820473
Giglia, E. (2021a, January 28). CO-OPERAS: FAIR data in the SSH. Zenodo. https://doi.org/10.5281/zenodo.4475487
Giglia, E. (2021b, September 15). FAIR data in the Humanities. Zenodo. https://doi.org/10.5281/zenodo.5510388
Harrower, N., Maryl, M., Biro, T., Immenhauser, B., & ALLEA Working Group E-Humanities. (2020). Sustainable and FAIR Data Sharing in the Humanities: Recommendations of the ALLEA Working Group E-Humanities. Digital Repository of Ireland. https://doi.org/10.7486/DRI.tq582c863
Jeng, W., He, D., & Oh, J. S. (2016). Toward a conceptual framework for data sharing practices in social sciences: A profile approach. Proceedings of the Association for Information Science and Technology, 53(1), 1-10. https://doi.org/10.1002/pra2.2016.14505301037
Further reading
Edmond, J. (Ed.). (2020). Digital Technology and the Practices of Humanities Research. Open Book Publishers.
Gualandi, B., Pareschi, L., & Peroni, S. (2023). What do we mean by “data”? A proposed classification of data types in the arts and humanities. Journal of Documentation, 79(7), 51-71. https://doi.org/10.1108/JD-07-2022-0146
Whyte, A., Green, D., Avanço, K., Di Giorgio, S., Gingold, A., Horton, L., Koteska, B., Kyprianou, K., Prnjat, O., Rauste, P., Schirru, L., Sowinski, C., Torres Ramos, G., van Leersum, N., Sharma, C., Méndez, E., & Lazzeri, E. (2023). D2.1 Catalogue of Open Science Career Profiles - Minimum Viable Skillsets (v1.2). Zenodo. https://doi.org/10.5281/zenodo.8101903