A global patent dataset of bioeconomy-related inventions

L Kriesch, S Losacker - Scientific Data, 2024 - nature.com
Many governments worldwide have proposed transitioning from a fossil-based economy to a
bioeconomy to address climate change, resource depletion, and other environmental …

On the limitations of simulating active learning

K Margatina, N Aletras - arXiv preprint arXiv:2305.13342, 2023 - arxiv.org
Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects
informative unlabeled data for human annotation, aiming to improve over random sampling …

Is more data better? Re-thinking the importance of efficiency in abusive language detection with transformers-based active learning

HR Kirk, B Vidgen, SA Hale - arXiv preprint arXiv:2209.10193, 2022 - arxiv.org
Annotating abusive language is expensive, logistically complex and creates a risk of
psychological harm. However, most machine learning research has prioritized maximizing …

A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice

HY Huang, Y Yang, Z Zhang, S Lee, Y Wu - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) continue to evolve, understanding and quantifying the
uncertainty in their predictions is critical for enhancing application credibility. However, the …

BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis

D Kartchner, I Al-Hussaini, H Turner, J Deng… - Proceedings of the 46th …, 2023 - dl.acm.org
This work presents a new, original document classification dataset, BioSift, to expedite the
initial selection and labeling of studies for drug repurposing. The dataset consists of 10,000 …

ALToolbox: a set of tools for active learning annotation of natural language texts

A Tsvigun, L Sanochkin, D Larionov… - Proceedings of the …, 2022 - aclanthology.org
We present ALToolbox–an open-source framework for active learning (AL) annotation in
natural language processing. Currently, the framework supports text classification, sequence …

STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

L Zhang, J Wu, D Zhou, G Xu - arXiv preprint arXiv:2403.01165, 2024 - arxiv.org
Though Large Language Models (LLMs) have demonstrated the powerful capabilities of few-shot
learning through prompting methods, supervised training is still necessary for complex …

Active learning for identifying disaster-related tweets: A comparison with keyword filtering and generic fine-tuning

D Hanny, S Schmidt, B Resch - Intelligent Systems Conference, 2024 - Springer
Information from social media can provide essential information for emergency
response during natural disasters in near real-time. However, it is a difficult task to identify …

To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models

J Gonsior, C Falkenberg, S Magino, A Reusch… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite achieving state-of-the-art results in nearly all Natural Language Processing
applications, fine-tuning Transformer-based language models still requires a significant …

ALPET: Active Few-shot Learning for Citation Worthiness Detection in Low-Resource Wikipedia Languages

A Halitaj, A Zubiaga - arXiv preprint arXiv:2502.03292, 2025 - arxiv.org
Citation Worthiness Detection (CWD) consists in determining which sentences, within an
article or collection, should be backed up with a citation to validate the information it …