Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done

Romein, C. Annemieke; Hodel, Tobias; Gordijn, Femke; Zundert, Joris J. van; Chagué, Alix; Lange, Milan van; Jensen, Helle Strandgaard; Stauder, Andy; Purcell, Jake; Terras, Melissa M.; Heuvel, Pauline van den; Keijzer, Carlijn; Rabus, Achim; Sitaram, Chantal; Bhatia, Aakriti; Depuydt, Katrien; Afolabi-Adeolu, Mary Aderonke; Anikina, Anastasiia; Bastianello, Elisa; Benzinger, Lukas Vincent; ... (2024). Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done. Journal of Data Mining and Digital Humanities. Episciences. DOI: 10.46298/jdmdh.10403

Exploring_Data_Provenance_11-3.pdf - Published Version
Available under License Creative Commons: Attribution (CC-BY).


This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for ATR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.

Item Type: Journal Article (Original Article)
Division/Institute: 06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg) > Digital Humanities; 06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg)
UniBE Contributor: Hodel, Tobias Mathias
Subjects: 100 Philosophy; 800 Literature, rhetoric & criticism; 900 History
ISSN: 2416-5999
Publisher: Episciences
Language: English
Submitter: Tobias Mathias Hodel
Date Deposited: 27 Mar 2024 09:42
Last Modified: 27 Mar 2024 09:42
Publisher DOI: 10.46298/jdmdh.10403
Uncontrolled Keywords: Handwritten Text Recognition, Ground Truth, Crowdsourcing, Citizen Science
BORIS DOI: 10.48350/194575
URI: https://boris.unibe.ch/id/eprint/194575
