Ströbel, Phillip Benjamin; Hodel, Tobias; Boente, Walter; Volk, Martin (2023). The Adaptability of a Transformer-Based OCR Model for Historical Documents. In: Coustaty, Mickael; Fornés, Alicia (eds.) Document Analysis and Recognition - ICDAR 2023 Workshops. Lecture Notes in Computer Science: Vol. 14193 (pp. 34-48). Cham: Springer. doi:10.1007/978-3-031-41498-5_3
We tested the capabilities of Transformer-based text recognition technology when dealing with (multilingual) real-world datasets. This is a crucial aspect for libraries and archives that must digitise various sources. The digitisation process cannot rely solely on manual transcription due to the complexity and diversity of historical materials. Therefore, text recognition models must be able to adapt to various printed texts and manuscripts, especially regarding different handwriting styles. Our findings demonstrate that Transformer-based models can recognise text from printed and handwritten documents, even in multilingual environments. These models require minimal training data and are a suitable solution for digitising libraries and archives. However, it is essential to note that the quality of the recognised text can be affected by the handwriting style.
Item Type: | Book Section (Book Chapter) |
---|---|
Division/Institute: | 06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg) > Digital Humanities; 06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg) |
UniBE Contributor: | Ströbel, Phillip Benjamin; Hodel, Tobias Mathias |
Subjects: | 100 Philosophy; 800 Literature, rhetoric & criticism; 900 History; 000 Computer science, knowledge & systems |
ISSN: | 1611-3349 |
ISBN: | 978-3-031-41498-5 |
Series: | Lecture Notes in Computer Science |
Publisher: | Springer |
Funders: | [159] Hasler Foundation |
Language: | English |
Submitter: | Tobias Mathias Hodel |
Date Deposited: | 22 Sep 2023 13:47 |
Last Modified: | 22 Sep 2023 13:47 |
Publisher DOI: | 10.1007/978-3-031-41498-5_3 |
BORIS DOI: | 10.48350/186497 |
URI: | https://boris.unibe.ch/id/eprint/186497 |