Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Pre-trained language models and their applications

H Wang, J Li, H Wu, E Hovy, Y Sun - Engineering, 2023 - Elsevier
Pre-trained language models have achieved striking success in natural language
processing (NLP), leading to a paradigm shift from supervised learning to pre-training …

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

BioGPT: generative pre-trained transformer for biomedical text generation and mining

R Luo, L Sun, Y Xia, T Qin, S Zhang… - Briefings in …, 2022 - academic.oup.com
Pre-trained language models have attracted increasing attention in the biomedical domain,
inspired by their great success in the general natural language domain. Among the two main …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …

Impact of code language models on automated program repair

N Jiang, K Liu, T Lutellier, L Tan - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Automated program repair (APR) aims to help developers improve software reliability by
generating patches for buggy programs. Although many code language models (CLM) are …

Learning inverse folding from millions of predicted structures

C Hsu, R Verkuil, J Liu, Z Lin, B Hie… - International …, 2022 - proceedings.mlr.press
We consider the problem of predicting a protein sequence from its backbone atom
coordinates. Machine learning approaches to this problem to date have been limited by the …

data2vec: A general framework for self-supervised learning in speech, vision and language

A Baevski, WN Hsu, Q Xu, A Babu… - … on Machine Learning, 2022 - proceedings.mlr.press
While the general idea of self-supervised learning is identical across modalities, the actual
algorithms and objectives differ widely because they were developed with a single modality …

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …