Zufferey, Sandrine; Cartoni, Bruno; Popescu-Belis, Andrei; Meyer, Thomas (2011). How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. In: Proceedings of 4th Workshop on Building and Using Comparable Corpora. Portland, Oregon. 24.06.2011.
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² with those obtained by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Division/Institute: | 06 Faculty of Humanities > Department of Linguistics and Literary Studies > Institute of French Language and Literature |
| UniBE Contributor: | Zufferey, Sandrine |
| Subjects: | 800 Literature, rhetoric & criticism > 840 French & related literatures; 400 Language > 440 French & related languages |
| ISBN: | 978-1-937284-015 |
| Language: | English |
| Submitter: | Sandrine Zufferey |
| Date Deposited: | 25 Apr 2016 11:46 |
| Last Modified: | 05 Dec 2022 14:53 |
| BORIS DOI: | 10.7892/boris.78680 |
| URI: | https://boris.unibe.ch/id/eprint/78680 |