AutoML: A survey of the state-of-the-art

X He, K Zhao, X Chu - Knowledge-Based Systems, 2021 - Elsevier
Deep learning (DL) techniques have obtained remarkable achievements on various tasks,
such as image recognition, object detection, and language modeling. However, building a …

[PDF][PDF] Language models are unsupervised multitask learners

A Radford, J Wu, R Child, D Luan… - OpenAI …, 2019 - insightcivic.s3.us-east-1.amazonaws …
Natural language processing tasks, such as question answering, machine translation,
reading comprehension, and summarization, are typically approached with supervised …

[HTML][HTML] Augmenting organizational decision-making with deep learning algorithms: Principles, promises, and challenges

YR Shrestha, V Krishna, G von Krogh - Journal of Business Research, 2021 - Elsevier
The current expansion of theory and research on artificial intelligence in management and
organization studies has revitalized the theory and research on decision-making in …

Cost-efficient large language model serving for multi-turn conversations with CachedAttention

B Gao, Z He, P Sharma, Q Kang, D Jevdjic… - 2024 USENIX Annual …, 2024 - usenix.org
Interacting with humans through multi-turn conversations is a fundamental feature of large
language models (LLMs). However, existing LLM serving engines executing multi-turn …

Big code != big vocabulary: Open-vocabulary models for source code

RM Karampatsis, H Babii, R Robbes, C Sutton… - Proceedings of the …, 2020 - dl.acm.org
Statistical language modeling techniques have successfully been applied to large source
code corpora, yielding a variety of new software development tools, such as tools for code …

BPE-dropout: Simple and effective subword regularization

I Provilkov, D Emelianenko, E Voita - arXiv preprint arXiv:1910.13267, 2019 - arxiv.org
Subword segmentation is widely used to address the open vocabulary problem in machine
translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE) …

Charformer: Fast character transformers via gradient-based subword tokenization

Y Tay, VQ Tran, S Ruder, J Gupta, HW Chung… - arXiv preprint arXiv …, 2021 - arxiv.org
State-of-the-art models in natural language processing rely on separate rigid subword
tokenization algorithms, which limit their generalization ability and adaptation to new …

Barack's wife Hillary: Using knowledge-graphs for fact-aware language modeling

RL Logan IV, NF Liu, ME Peters, M Gardner… - arXiv preprint arXiv …, 2019 - arxiv.org
Modeling human language requires the ability to not only generate fluent text but also
encode factual knowledge. However, traditional language models are only capable of …

Representation degeneration problem in training natural language generation models

J Gao, D He, X Tan, T Qin, L Wang, TY Liu - arXiv preprint arXiv …, 2019 - arxiv.org
We study an interesting problem in training neural network-based models for natural
language generation tasks, which we call the representation degeneration problem …

Event knowledge in large language models: the gap between the impossible and the unlikely

C Kauf, AA Ivanova, G Rambelli, E Chersoni… - Cognitive …, 2023 - Wiley Online Library
Word co-occurrence patterns in language corpora contain a surprising amount of
conceptual knowledge. Large language models (LLMs), trained to predict words in context …