Machine learning methods for small data challenges in molecular science

B Dou, Z Zhu, E Merkurjev, L Ke, L Chen… - Chemical …, 2023 - ACS Publications
Small data are often used in scientific and engineering research due to the presence of
various constraints, such as time, cost, ethics, privacy, security, and technical limitations in …

Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

Making sense of citizens' input through artificial intelligence: a review of methods for computational text analysis to support the evaluation of contributions in public …

J Romberg, T Escher - Digital Government: Research and Practice, 2024 - dl.acm.org
Public sector institutions that consult citizens to inform decision-making face the challenge of
evaluating the contributions made by citizens. This evaluation has important democratic …

Large language models as annotators: Enhancing generalization of NLP models at minimal cost

P Bansal, A Sharma - arXiv preprint arXiv:2306.15766, 2023 - arxiv.org
State-of-the-art supervised NLP models achieve high accuracy but are also susceptible to
failures on inputs from low-data regimes, such as domains that are not represented in …
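
The pipeline the snippet points at is simple enough to sketch: an LLM supplies cheap "silver" labels for in-domain text, and a small supervised model is trained on them. A minimal sketch, assuming scikit-learn; `llm_label` is a hypothetical stand-in for an actual LLM call, and the texts and labeling rule are illustrative, not from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def llm_label(text: str) -> str:
    """Hypothetical stand-in for an LLM annotation call; returns a silver label."""
    return "positive" if "good" in text.lower() else "negative"  # placeholder rule

unlabeled = ["Good value for money.", "Broke after one day.", "Good support overall."]
silver = [(t, llm_label(t)) for t in unlabeled]

# Train a cheap supervised student on the LLM-provided (silver) labels.
vec = TfidfVectorizer()
X = vec.fit_transform([t for t, _ in silver])
clf = LogisticRegression().fit(X, [label for _, label in silver])
print(clf.predict(vec.transform(["Good packaging."])))
```

The design point is cost: the LLM is queried once per unlabeled example at annotation time, while the small student model handles all subsequent inference traffic.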

Active instruction tuning: Improving cross-task generalization by training on prompt sensitive tasks

PN Kung, F Yin, D Wu, KW Chang, N Peng - arXiv preprint arXiv …, 2023 - arxiv.org
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large
language models (LLMs) on a massive number of diverse tasks with instructions. However …
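
The selection criterion named in the title, prompt sensitivity, can be illustrated as score variance across paraphrased instructions for the same task. A minimal sketch under that assumption; `task_score` is a hypothetical placeholder for a real evaluation (e.g., accuracy or log-likelihood), and the paraphrases are illustrative, not the paper's exact recipe:

```python
import numpy as np

def task_score(instruction: str, example: str) -> float:
    """Hypothetical per-prompt quality score (accuracy, log-prob, etc.)."""
    rng = np.random.default_rng(abs(hash(instruction + example)) % 2**32)
    return float(rng.uniform(0.4, 0.9))  # placeholder for a real model eval

paraphrases = [
    "Classify the sentiment of the sentence.",
    "Decide whether the sentence is positive or negative.",
    "Label the sentence's sentiment.",
]
example = "The plot was thin but the acting saved it."

scores = [task_score(p, example) for p in paraphrases]
# High variance across paraphrases marks the task as prompt sensitive,
# hence a candidate for the next round of instruction tuning.
print(f"sensitivity = {np.std(scores):.3f}")
```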

Which examples to annotate for in-context learning? towards effective and efficient selection

C Mavromatis, B Srinivasan, Z Shen, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is
efficient as it does not require any parameter updates to the trained LLM, but only a few …
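
One common instantiation of "effective and efficient selection" is retrieving the demonstrations nearest to the query in embedding space. A minimal sketch, assuming the sentence-transformers library; the model name, candidate pool, and query are illustrative placeholders, not the paper's method:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Annotated candidate pool of (text, label) demonstrations.
pool = [
    ("The movie was wonderful.", "positive"),
    ("Terrible service, never again.", "negative"),
    ("An average, forgettable film.", "neutral"),
]
query = "I really enjoyed this restaurant."

# Embed the candidates and the query with normalized vectors.
emb = model.encode([t for t, _ in pool] + [query], normalize_embeddings=True)
cand, q = emb[:-1], emb[-1]

# Pick the k nearest demonstrations by cosine similarity and build the prompt.
k = 2
top = np.argsort(cand @ q)[::-1][:k]
prompt = "\n".join(f"Input: {pool[i][0]}\nLabel: {pool[i][1]}" for i in top)
prompt += f"\nInput: {query}\nLabel:"
print(prompt)
```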

Interactive multi-fidelity learning for cost-effective adaptation of language model with sparse human supervision

J Zhang, Z Li, K Das, S Kumar - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have demonstrated remarkable capabilities in various tasks.
However, their suitability for domain-specific tasks is limited due to their immense scale at …

VideoCoT: A video chain-of-thought dataset with active annotation tool

Y Wang, Y Zeng, J Zheng, X Xing, J Xu, X Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) are flourishing, but they mainly focus on images,
paying less attention to videos, especially in sub-fields such as prompt engineering, video chain …

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …

On the limitations of simulating active learning

K Margatina, N Aletras - arXiv preprint arXiv:2305.13342, 2023 - arxiv.org
Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects
informative unlabeled data for human annotation, aiming to improve over random sampling …
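
The loop the snippet describes is easy to state in code. A minimal sketch of pool-based active learning with least-confidence sampling, assuming scikit-learn; the dataset and budget are illustrative, and the oracle below simply reveals held-out labels, which is exactly the simulation setting whose limitations the paper examines:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Small stratified seed set; everything else goes into the unlabeled pool.
labeled = [int(i) for i in np.where(y == 0)[0][:5]] + \
          [int(i) for i in np.where(y == 1)[0][:5]]
pool = [i for i in range(len(y)) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(5):  # five annotation rounds
    clf.fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    # Least confidence: query the pool instance the model is most unsure about.
    idx = int(np.argmin(probs.max(axis=1)))
    labeled.append(pool.pop(idx))  # simulated oracle reveals the held-out label
print(f"labeled set size: {len(labeled)}")
```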