Bitext alignment

J Tiedemann - 2011 - books.google.com
This book provides an overview of various techniques for the alignment of bitexts. It
describes general concepts and strategies that can be applied to map corresponding parts …

Little strokes fell great oaks: Creating CoRoLa, the reference corpus of contemporary Romanian

D Tufiș, V Barbu Mititelu, E Irimia, V Păiș, R Ion… - 2019 - dspace.bcu-iasi.ro
The paper presents the quite long-standing tradition of Romanian corpus acquisition and
processing, which reaches its peak with the reference corpus of contemporary Romanian …

CombAlign: a tool for obtaining high-quality word alignments

S Steingrímsson, H Loftsson, A Way - Proceedings of the 23rd …, 2021 - aclanthology.org
Being able to generate accurate word alignments is useful for a variety of tasks. While
statistical word aligners can work well, especially when parallel training data are plentiful …

Hybrid parallel sentence mining from comparable corpora

D Stefanescu, R Ion, S Hunsicker - Proceedings of the 16th annual …, 2012 - aclanthology.org
This paper presents a fast and accurate parallel sentence mining algorithm for comparable
corpora called LEXACC based on the Cross-Language Information Retrieval framework …

Romanian TimeBank: An annotated parallel corpus for temporal information

C Forăscu, D Tufiş - … of the Eighth International Conference on …, 2012 - metanet4u.eu
The paper describes the main steps for the construction, annotation and validation of the
Romanian version of the TimeBank corpus. Starting from the English TimeBank corpus–the …

Automatic Lexical Alignment between Syntactically Weak Related Languages. Application for English and Romanian

M Colhon - … and Applications: 5th International Conference, ICCCI …, 2013 - Springer
In this paper we describe an alignment system that takes English-Romanian parallel
sentences (bitexts) and aligns them at their content-word level. A syntactic feature approach …

Collecting and using comparable corpora for statistical machine translation

I Skadiņa, A Aker, N Mastropavlos, F Su… - Proceedings of the 8th …, 2012 - academia.edu
Lack of sufficient parallel data for many languages and domains is currently one of the major
obstacles to further advancement of automated translation. The ACCURAT project is …

Analysis and evaluation of comparable corpora for under-resourced areas of machine translation

I Skadiņa, A Vasiļjevs, R Skadiņš… - The 5th Workshop on …, 2012 - perso.limsi.fr
This abstract presents the FP7 project ACCURAT that aims to research methods and create
tools that find, measure, and use bi/multilingual comparable corpora to improve the quality of …

Transferring Coreference Chains through Word Alignment.

O Postolache, D Cristea, C Orasan - LREC, 2006 - lrec-conf.org
This paper investigates the problem of automatically annotating resources with NP
coreference information using a parallel corpus, English-Romanian, in order to transfer …

Knowledge-lean projection of coreference chains across languages

Y Grishina, M Stede - Proceedings of the Eighth Workshop on …, 2015 - aclanthology.org
Common technologies for automatic coreference resolution require either a language-
specific rule set or large collections of manually annotated data, which is typically limited to …