Focused transformer: Contrastive training for context scaling

S Tworkowski, K Staniszewski… - Advances in …, 2024 - proceedings.neurips.cc
Large language models have an exceptional capability to incorporate new information in a
contextual manner. However, the full potential of such an approach is often restrained due to …

Finetuned language models are zero-shot learners

J Wei, M Bosma, VY Zhao, K Guu, AW Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores a simple method for improving the zero-shot learning abilities of language models …

Escaping the big data paradigm with compact transformers

A Hassani, S Walton, N Shah, A Abuduweili… - arXiv preprint arXiv …, 2021 - arxiv.org
With the rise of Transformers as the standard for language processing, and their
advancements in computer vision, there has been a corresponding growth in parameter size …

Multitask prompted training enables zero-shot task generalization

V Sanh, A Webson, C Raffel, SH Bach… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models have recently been shown to attain reasonable zero-shot
generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that …

Sentence-BERT: Sentence embeddings using Siamese BERT-networks

N Reimers, I Gurevych - arXiv preprint arXiv:1908.10084, 2019 - fq.pkwyx.com
BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art
performance on sentence-pair regression tasks like semantic textual similarity (STS) …

BAE: BERT-based adversarial examples for text classification

S Garg, G Ramakrishnan - arXiv preprint arXiv:2004.01970, 2020 - arxiv.org
Modern text classification models are susceptible to adversarial examples, perturbed
versions of the original text indiscernible by humans which get misclassified by the model …