On the cost-effectiveness of neural and non-neural approaches and representations for text classification: A comprehensive comparative study

W Cunha, V Mangaravite, C Gomes, S Canuto… - Information Processing …, 2021 - Elsevier
This article brings two major contributions. First, we present the results of a critical analysis
of recent scientific articles about neural and non-neural approaches and representations for …

Latent semantic indexing (LSI) and hierarchical Dirichlet process (HDP) models on news data

AR Lubis, S Prayudani, Y Fatmi… - 2022 5th International …, 2022 - ieeexplore.ieee.org
News has become a very important need in modern society. Almost every level of society
needs information such as news. Online news gets the attention of writers because there are …

A novel two-step fine-tuning pipeline for cold-start active learning in text classification tasks

F Belém, W Cunha, C França, C Andrade… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the first work to investigate the effectiveness of BERT-based contextual embeddings
in active learning (AL) tasks on cold-start scenarios, where traditional fine-tuning is …

Exploiting contextual embeddings in hierarchical topic modeling and investigating the limits of the current evaluation metrics

F Viegas, A Pereira, W Cunha, C França… - Computational …, 2024 - direct.mit.edu
We investigate two essential challenges in the context of Hierarchical Topic Modeling (HTM)–
(i) the impact of data representation and (ii) topic evaluation. The data representation directly …

Evaluating topic modeling pre-processing pipelines for Portuguese texts

APDS Júnior, P Cecilio, F Viegas, W Cunha… - Proceedings of the …, 2022 - dl.acm.org
Topic Modeling (TM) is among the most exploited approaches to extracting and organizing
information from large amounts of data. Basically, these approaches aim to find semantic …

Semantic N-Gram Topic Modeling.

P Kherwa, P Bansal - EAI Endorsed Transactions on Scalable …, 2020 - researchgate.net
In this paper a novel approach for effective topic modeling is presented. The approach is
different from traditional vector space model-based topic modeling, where the Bag of Words …

Combining representations for effective citation classification

CMV de Andrade, MA Gonçalves - Proceedings of the 8th …, 2020 - aclanthology.org
In this paper, we describe our participation in two tasks organized by WOSP 2020,
consisting of classifying the context of a citation (e.g., background, motivational, extension) …

PATopics: An automatic framework to extract useful information from pharmaceutical patents documents

P Cecilio, A Perreira, JSR Viegas, W Cunha… - arXiv preprint arXiv …, 2024 - arxiv.org
Pharmaceutical patents play an important role by protecting the innovation from copies but
also drive researchers to innovate, create new products, and promote disruptive innovations …

Fusing parallel social contexts within flexible-order proximity for microblog topic detection

H Liu, R He, H Wang, B Wang - Proceedings of the 29th ACM …, 2020 - dl.acm.org
Topic detection in social media is a challenging task due to large-scale short, noisy and
informal nature of messages. Most existing methods only consider textual content or …

Novel semantic tagging detection algorithms based non-negative matrix factorization

FS Gadelrab, MH Haggag, RA Sadek - SN Applied Sciences, 2020 - Springer
The tagging aims to address a challenge to search relevant text-documents given a set of
tags. In addition, the tag-based approaches received a wide attention as a possible solution …