Gubler, Kaspar (4 June 2021). SNSF Spark Project 'Dynamic Data Ingestion' for server-side data harmonisation: Creating a database with 200k students and scholars 1200-1800: Method, concept and practical implementation (Unpublished). In: nodegoat day 2021: From source to visualization: Data modeling and analysis with Nodegoat. Universität Bern, 4 June 2021.
Supplemental material: Nodegoat-Day-2021-Programm.pdf (conference programme, nodegoat day 2021), available under the BORIS Standard License.
The linking of research data has been a dominant topic for years, especially in digital history, and Linked Open Data (LOD) is the buzzword at conferences and in research projects. The greatest challenge, however, is not collecting such data from the internet but harmonising it, because research databases are usually structured differently. It is therefore not surprising that, despite many initiatives, no research project in digital history has yet managed to harmonise data across several structural levels of the databases involved. This means not only linking persons across databases by their names, but going deeper into the data structure to harmonise, for example, a person's geographical origin or the attributes of their education. Yet that is the aim: to answer scientific questions through structural data harmonisation. This is where our SPARK project comes in. The third and final phase of the project (Episode 3) was completed in January 2021.

What are the core results of this project? In essence, they are a software module (the DDI module for 'dynamic data ingestion') and a method: research data is collected from different source databases and ingested on a central server by the module according to the spider principle, creating a new metadatabase. The collected data is harmonised as far as possible already during ingestion by mapping the database fields of the source databases onto the corresponding fields of the new metadatabase. If such a mapping is not possible, or only partially possible, because the fields of the source database and the metadatabase are too dissimilar, an algorithm can be applied in a second step, once the data is stored on the central server, to bring uniformity to the data through reconciliation. In addition, the data can be automatically reclassified in order to standardise it. These measures prepare the data for analysis and ultimately for publication, both of which can be done in the virtual research environment Nodegoat.
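The following is a minimal sketch, in Python, of the map-on-ingestion-then-reconcile approach described above. The source URL, field names, gazetteer and the reconcile step are illustrative assumptions only; they do not reproduce the actual DDI module, which operates server-side within nodegoat.

```python
# Illustrative sketch of field mapping at ingestion time, with a later
# reconciliation pass. Field names and data sources are hypothetical.
import requests

# Hypothetical mapping from one source database's fields to the metadatabase schema.
FIELD_MAP = {
    "persName": "person_name",
    "placeOfOrigin": "origin_place",
    "universityEntry": "matriculation_date",
}

def ingest(source_url: str, field_map: dict) -> list:
    """Fetch records from one source database and map them onto the shared
    metadatabase schema (harmonisation during ingestion)."""
    records = requests.get(source_url, timeout=30).json()
    harmonised = []
    for record in records:
        mapped = {target: record[source]
                  for source, target in field_map.items()
                  if source in record}
        # Fields that could not be mapped directly are kept aside for a
        # later reconciliation pass on the central server.
        mapped["_unmapped"] = {k: v for k, v in record.items()
                               if k not in field_map}
        harmonised.append(mapped)
    return harmonised

def reconcile(record: dict, gazetteer: dict) -> dict:
    """Second step: normalise values that field mapping alone could not
    harmonise, e.g. spelling variants of place names."""
    place = record.get("origin_place")
    if place and place in gazetteer:
        record["origin_place"] = gazetteer[place]
    return record
```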
Item Type: Conference or Workshop Item (Speech)
Division/Institute: 06 Faculty of Humanities > Department of History and Archaeology > Institute of History; 06 Faculty of Humanities > Department of History and Archaeology > Institute of History > Medieval History
UniBE Contributor: Gubler, Kaspar
Subjects: 900 History > 940 History of Europe
Language: English
Submitter: Kaspar Gubler
Date Deposited: 05 Jul 2024 14:53
Last Modified: 08 Jul 2024 08:43
Uncontrolled Keywords: nodegoat, data modeling, data analysis, data visualisation, digital humanities
BORIS DOI: 10.48350/198516
URI: https://boris.unibe.ch/id/eprint/198516