Chapter 6: Supervised and Unsupervised: Approaches to Machine Learning for Textual Entities

Hodel, Tobias (2022). Chapter 6: Supervised and Unsupervised: Approaches to Machine Learning for Textual Entities. In: Archives, Access and Artificial Intelligence. Working with Born-digital and Digitized Archival Collections. Digital Humanities Research: Vol. 2 (pp. 157-177). Bielefeld: Transcript

Hodel.pdf - Published Version
Available under License Creative Commons: Attribution (CC-BY).

Applications that feed text into machine learning algorithms have existed for more than a decade, but it took multiple developments to make machine learning an exciting methodological approach to questions grounded in the humanities. The latest developments in handwritten text recognition (HTR) show the capabilities of supervised deep learning. However, the success of the technology comes at a price: it yields methods that are complicated to grasp in theory and difficult to train in practice, and that rest on algorithms that are not comprehensible to humans at all. By focusing on the two most frequently used approaches in machine learning (unsupervised and supervised), this paper lays out ways to use machine learning algorithms critically in the humanities. At the same time, we argue that these approaches help us to understand the epistemological assumptions of our disciplines and our methods.
Topic modeling applied to large text corpora yields new insights into which topics occur and into the tendencies of a corpus. The approach uses unsupervised machine learning: a set of algorithms identifies which words frequently appear together and so might indicate a topic. Topic modeling puts scholars at the end of the process, where they must still interpret the output of the algorithms.
In deciphering handwriting, supervised deep learning approaches have led to astonishing results, but also to new problems induced by the algorithm. The algorithm tries to adapt to the desired output, raising epistemological questions about transcribing and transliterating. The scholar can alter only the input, not how the algorithm manipulates it.
Based on these two examples, this paper promises a deeper understanding of a technology that is currently remodeling the way we do our research and that will increasingly intervene in our scholarship and even our daily lives in the future.
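
The abstract describes topic modeling as identifying words that frequently appear together. The following is a minimal sketch of that co-occurrence idea only, written for illustration; it is not the method used in the chapter and it is not a full topic model such as LDA. The function name `cooccurrence_counts`, the toy corpus, and the stopword list are all hypothetical and were invented for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, stopwords=frozenset()):
    """Count how often each unordered word pair occurs in the same document."""
    counts = Counter()
    for text in documents:
        # Lowercase, strip surrounding punctuation, and keep each word once per document.
        words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
        words = sorted(w for w in words if w and w not in stopwords)
        # Sorting first means every pair is stored in one canonical order.
        counts.update(combinations(words, 2))
    return counts


# Hypothetical toy corpus (assumption: invented sentences, not taken from the chapter).
docs = [
    "The abbot sealed the charter",
    "The charter names the abbot and the monastery",
    "Grain prices rose at the market",
    "The market sold grain and wine",
]
stop = frozenset({"the", "and", "at"})

pairs = cooccurrence_counts(docs, stop)
top_pairs = pairs.most_common(2)
print(top_pairs)
```

In this toy corpus the pairs ("abbot", "charter") and ("grain", "market") each occur in two documents, while every other pair occurs in one. Whether such pairs amount to a "topic" is exactly the interpretive step that, as the abstract notes, remains with the scholar.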

Item Type: Book Section (Book Chapter)
Division/Institute: 06 Faculty of Humanities > Other Institutions > Walter Benjamin Kolleg (WBKolleg) > Digital Humanities
UniBE Contributor: Hodel, Tobias Mathias
Subjects: 000 Computer science, knowledge & systems; 900 History
ISSN: 2749-1986
ISBN: 978-3-8394-5584-5
Series: Digital Humanities Research
Publisher: Transcript
Language: English
Submitter: Tobias Mathias Hodel
Date Deposited: 05 May 2022 09:50
Last Modified: 05 Dec 2022 16:18
Uncontrolled Keywords: Machine Learning, Text Recognition, Topic Modeling, Bias
BORIS DOI: 10.48350/169050
URI: https://boris.unibe.ch/id/eprint/169050
