The FRENK datasets of socially unacceptable discourse in Slovene and English

N Ljubešić, D Fišer, T Erjavec - … , September 11–13, 2019, Proceedings 22, 2019 - Springer
In this paper we present datasets of Facebook comment threads to mainstream media posts
in Slovene and English developed inside the Slovene national project FRENK (the acronym …

Analysing terminology translation errors in statistical and neural machine translation

R Haque, M Hasanuzzaman, A Way - Machine Translation, 2020 - Springer
Terminology translation plays a critical role in domain-specific machine translation (MT).
Phrase-based statistical MT (PB-SMT) has been the dominant approach to MT for the past …

The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics

B QasemiZadeh, S Handschuh - Proceedings of the 4th …, 2014 - aclanthology.org
This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification
of terms from literature in the domain of computational linguistics. The dataset is derived …

Orchestrating NLP services for the legal domain

J Moreno-Schneider, G Rehm… - arXiv preprint arXiv …, 2020 - arxiv.org
Legal technology is currently receiving a lot of attention from various angles. In this
contribution we describe the main technical components of a system that is currently under …

Extracting bilingual terminologies from comparable corpora

A Aker, ML Paramita, R Gaizauskas - Proceedings of the 51st …, 2013 - aclanthology.org
In this paper we present a method for extracting bilingual terminologies from comparable
corpora. In our approach we treat bilingual term extraction as a classification problem. For …

A graph-based approach to topic clustering for online comments to news

A Aker, E Kurtic, AR Balamurali, M Paramita… - Advances in Information …, 2016 - Springer
This paper investigates graph-based approaches to labeled topic clustering of reader
comments in online news. For graph-based clustering we propose a linear regression model …

TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction

R Haque, S Penkale, A Way - Language Resources and Evaluation, 2018 - Springer
Bilingual termbanks are important for many natural language processing applications,
especially in translation workflows in industrial settings. In this paper, we apply a log …

Enhancing statistical machine translation with bilingual terminology in a CAT environment

M Arcan, M Turchi, S Tonelli… - Proceedings of the 11th …, 2014 - aclanthology.org
In this paper, we address the problem of extracting and integrating bilingual terminology into
a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool …

Leveraging bilingual terminology to improve machine translation in a CAT environment

M Arcan, M Turchi, S Tonelli… - Natural Language …, 2017 - cambridge.org
This work focuses on the extraction and integration of automatically aligned bilingual
terminology into a Statistical Machine Translation (SMT) system in a Computer Aided …

The KAS corpus of Slovenian academic writing

T Erjavec, D Fišer, N Ljubešić - Language Resources and Evaluation, 2021 - Springer
The paper presents the KAS corpus of Slovenian academic writing, which consists of almost
65,000 BA/B.Sc., 16,000 MA/M.Sc. and 1600 Ph.D. theses (5 million pages or 1.7 billion …