Rolling out text categorization for language learning assessment supported by language technology

A Branco, J Rodrigues, F Costa, J Silva… - … Processing of the …, 2014 - Springer
This paper is concerned with a tool that supports human experts in their task of classifying
text excerpts suitable to be used in quizzes for learning materials and as items of exams that …

Coping with highly imbalanced datasets: A case study with definition extraction in a multilingual setting

R Del Gaudio, G Batista, A Branco - Natural Language Engineering, 2014 - cambridge.org
This paper addresses the task of automatic extraction of definitions by thoroughly exploring
an approach that solely relies on machine learning techniques, and by focusing on the issue …

Assessing automatic text classification for interactive language learning

A Branco, J Rodrigues, F Costa… - … on Information Society …, 2014 - ieeexplore.ieee.org
In this paper we discuss the design options for a language processing tool that supports
humans in their task of classifying text excerpts according to CEFR levels of language …

Universal grammatical dependencies for Portuguese with CINTIL data, LX processing and CLARIN support

A Branco, J Silva, L Gomes… - Proceedings of the …, 2022 - aclanthology.org
The grammatical framework for the mapping between linguistic form and meaning
representation known as Universal Dependencies relies on a non-constituency syntactic …

Neural text categorization with transformers for learning Portuguese as a second language

R Santos, J Rodrigues, A Branco, R Vaz - Progress in Artificial Intelligence …, 2021 - Springer
We report on the application of a neural network based approach to the problem of
automatically categorizing texts according to their proficiency levels and suitability for …

LemPORT: a high-accuracy cross-platform lemmatizer for Portuguese

R Rodrigues, H Gonçalo Oliveira… - 3rd Symposium on …, 2014 - drops.dagstuhl.de
Although lemmatization is a very common subtask in many natural language processing
tasks, there is a lack of available true cross-platform lemmatization tools specifically targeted …

The BDCamões collection of Portuguese literary documents: a research resource for digital humanities and language technology

S Grilo, M Bolrinha, J Silva, R Vaz… - Proceedings of the …, 2020 - aclanthology.org
This paper presents the BDCamões Collection of Portuguese Literary Documents, a new
corpus of literary texts written in Portuguese that in its inaugural version includes close to 4 …

Combining a double clustering approach with sentence simplification to produce highly informative multi-document summaries

SB Silveira, A Branco - … on Information Reuse & Integration (IRI), 2012 - ieeexplore.ieee.org
This paper presents a method for extractive multi-document summarization that explores a
two-phase clustering approach that, combined with a sentence simplification procedure …

Historical Portuguese corpora: a survey

TF Osório, H Lopes Cardoso - Language Resources and Evaluation, 2024 - Springer
This survey aims to thoroughly examine and evaluate the current landscape of electronic
corpora in historical Portuguese. This is achieved through a comprehensive analysis of …

From greatest simplicity to full power: Research-Infrastructure-as-a-Service for language science and technology

L Gomes, A Branco, J Silva, R Branco - Language Resources and …, 2024 - Springer
While language processing services are key assets for the science and technology of
language, the possible ways under which they may be made available to the widest range of …