Using the Europarl corpus for linguistic research

Cartoni, Bruno; Meyer, Thomas; Zufferey, Sandrine (2013). Using the Europarl corpus for linguistic research. Belgian Journal of Linguistics, 27(1), pp. 23-42. John Benjamins. doi:10.1075/bjl.27.02car

Full text not available from this repository.

Europarl is a large multilingual corpus containing the minutes of the debates at the European Parliament. This article presents a method to extract different corpora from Europarl: monolingual and multilingual comparable corpora, as well as parallel corpora. Using state-of-the-art measures of homogeneity, we show that these corpora are very similar. In addition, we argue that they present many advantages for research in various fields of linguistics and translation studies, and we also discuss some of their limitations. We conclude by reviewing a number of previous studies that made use of these corpora, emphasizing in each case the possibilities offered by Europarl.

Item Type:	Journal Article (Original Article)
Division/Institute:	06 Faculty of Humanities > Department of Linguistics and Literary Studies > Institute of French Language and Literature
UniBE Contributor:	Zufferey, Sandrine
Subjects:	800 Literature, rhetoric & criticism > 840 French & related literatures; 400 Language > 440 French & related languages
ISSN:	0774-5141
Publisher:	John Benjamins
Language:	English
Submitter:	Sandrine Zufferey
Date Deposited:	25 Apr 2016 11:21
Last Modified:	05 Dec 2022 14:53
Publisher DOI:	10.1075/bjl.27.02car
URI:	https://boris.unibe.ch/id/eprint/78532
