The Adaptability of a Transformer-Based OCR Model for Historical Documents

Ströbel, Phillip Benjamin; Hodel, Tobias; Boente, Walter; Volk, Martin (2023). The Adaptability of a Transformer-Based OCR Model for Historical Documents. In: Coustaty, Mickael; Fornés, Alicia (eds.) Document Analysis and Recognition - ICDAR 2023 Workshops. Lecture Notes in Computer Science: Vol. 14193 (pp. 34-48). Cham: Springer. DOI: 10.1007/978-3-031-41498-5_3

License: the publisher holds the copyright.

We tested how well Transformer-based text recognition handles real-world datasets, including multilingual ones. This matters for libraries and archives, which must digitise a wide variety of sources. Because historical materials are so complex and diverse, digitisation cannot rely solely on manual transcription. Text recognition models must therefore adapt to many kinds of printed texts and manuscripts, and in particular to different handwriting styles. Our findings show that Transformer-based models can recognise text in both printed and handwritten documents, even in multilingual settings. Because they require minimal training data, these models are a suitable solution for digitising library and archive holdings. However, the quality of the recognised text can vary with the handwriting style.

Item Type:

Book Section (Book Chapter)

Division/Institute:

06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg) > Digital Humanities
06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg)

UniBE Contributor:

Ströbel, Phillip Benjamin; Hodel, Tobias Mathias

Subjects:

100 Philosophy
800 Literature, rhetoric & criticism
900 History
000 Computer science, knowledge & systems

ISSN:

1611-3349

ISBN:

978-3-031-41498-5

Series:

Lecture Notes in Computer Science

Publisher:

Springer

Funders:

Hasler Foundation

Language:

English

Submitter:

Tobias Mathias Hodel

Date Deposited:

22 Sep 2023 13:47

Last Modified:

22 Sep 2023 13:47

Publisher DOI:

10.1007/978-3-031-41498-5_3

BORIS DOI:

10.48350/186497

URI:

https://boris.unibe.ch/id/eprint/186497