A global patent dataset of bioeconomy-related inventions
Many governments worldwide have proposed transitioning from a fossil-based economy to a
bioeconomy to address climate change, resource depletion, and other environmental …
On the limitations of simulating active learning
Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects
informative unlabeled data for human annotation, aiming to improve over random sampling …
Is more data better? Re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
Annotating abusive language is expensive, logistically complex and creates a risk of
psychological harm. However, most machine learning research has prioritized maximizing …
A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice
As large language models (LLMs) continue to evolve, understanding and quantifying the
uncertainty in their predictions is critical for enhancing application credibility. However, the …
BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis
This work presents a new, original document classification dataset, BioSift, to expedite the
initial selection and labeling of studies for drug repurposing. The dataset consists of 10,000 …
ALToolbox: a set of tools for active learning annotation of natural language texts
We present ALToolbox, an open-source framework for active learning (AL) annotation in
natural language processing. Currently, the framework supports text classification, sequence …
STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models
Though Large Language Models (LLMs) have demonstrated the powerful capabilities of few-
shot learning through prompting methods, supervised training is still necessary for complex …
Active learning for identifying disaster-related tweets: A comparison with keyword filtering and generic fine-tuning
Information from social media can provide essential information for emergency
response during natural disasters in near real-time. However, it is a difficult task to identify …
To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models
Despite achieving state-of-the-art results in nearly all Natural Language Processing
applications, fine-tuning Transformer-based language models still requires a significant …
ALPET: Active Few-shot Learning for Citation Worthiness Detection in Low-Resource Wikipedia Languages
Citation Worthiness Detection (CWD) consists in determining which sentences, within an
article or collection, should be backed up with a citation to validate the information it …