How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives

Zufferey, Sandrine; Cartoni, Bruno; Popescu-Belis, Andrei; Meyer, Thomas (2011). How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. In: Proceedings of the 4th Workshop on Building and Using Comparable Corpora. Portland, Oregon. 24.06.2011.

p78-cartoni.pdf - Published Version
Restricted to registered users only
Available under License Publisher holds Copyright.

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
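
The two kinds of measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `chi_square` and `connective_rate`, the `top_n` cutoff, the naive tokenizer, and the tiny connective set are all assumptions made for illustration. The first function computes a chi-square statistic over the most frequent words of two sub-corpora (lower values suggest more similar vocabularies); the second counts single-word connectives per 1,000 tokens.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace, trim punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]


def chi_square(tokens_a, tokens_b, top_n=500):
    """Chi-square statistic over the top_n most frequent words of both sub-corpora."""
    if not tokens_a or not tokens_b:
        raise ValueError("both sub-corpora must contain at least one token")
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    joint = count_a + count_b
    statistic = 0.0
    for word, total in joint.most_common(top_n):
        for observed, size in ((count_a[word], size_a), (count_b[word], size_b)):
            # Expected count if the word were spread in proportion to corpus size.
            expected = total * size / (size_a + size_b)
            statistic += (observed - expected) ** 2 / expected
    return statistic


def connective_rate(tokens, connectives, per=1000):
    """Occurrences of single-word connectives per `per` tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in connectives)
    return per * hits / len(tokens)


# Hypothetical usage with a toy connective list (not taken from the paper).
original = tokenize("The vote was delayed. However, the report was adopted.")
translated = tokenize("The vote was delayed, but the report was nevertheless adopted.")
toy_connectives = {"however", "but", "nevertheless"}

print(chi_square(original, translated))
print(connective_rate(original, toy_connectives))
print(connective_rate(translated, toy_connectives))
```

Note that this sketch only matches single-token connectives; multi-word connectives and connectives with non-discourse uses would need phrase matching and disambiguation, which are outside its scope.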

Item Type:

Conference or Workshop Item (Paper)

Division/Institute:

06 Faculty of Humanities > Department of Linguistics and Literary Studies > Institute of French Language and Literature

UniBE Contributor:

Zufferey, Sandrine

Subjects:

800 Literature, rhetoric & criticism > 840 French & related literatures
400 Language > 440 French & related languages

ISBN:

978-1-937284-015

Language:

English

Submitter:

Sandrine Zufferey

Date Deposited:

25 Apr 2016 11:46

Last Modified:

05 Dec 2022 14:53

BORIS DOI:

10.7892/boris.78680

URI:

https://boris.unibe.ch/id/eprint/78680
