AutoML: A survey of the state-of-the-art
Deep learning (DL) techniques have obtained remarkable achievements on various tasks,
such as image recognition, object detection, and language modeling. However, building a …
Language models are unsupervised multitask learners
Natural language processing tasks, such as question answering, machine translation,
reading comprehension, and summarization, are typically approached with supervised …
Augmenting organizational decision-making with deep learning algorithms: Principles, promises, and challenges
The current expansion of theory and research on artificial intelligence in management and
organization studies has revitalized the theory and research on decision-making in …
Cost-efficient large language model serving for multi-turn conversations with CachedAttention
Interacting with humans through multi-turn conversations is a fundamental feature of large
language models (LLMs). However, existing LLM serving engines executing multi-turn …
Big code != big vocabulary: Open-vocabulary models for source code
Statistical language modeling techniques have successfully been applied to large source
code corpora, yielding a variety of new software development tools, such as tools for code …
BPE-dropout: Simple and effective subword regularization
Subword segmentation is widely used to address the open vocabulary problem in machine
translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE) …
Charformer: Fast character transformers via gradient-based subword tokenization
State-of-the-art models in natural language processing rely on separate rigid subword
tokenization algorithms, which limit their generalization ability and adaptation to new …
Barack's wife Hillary: Using knowledge graphs for fact-aware language modeling
Modeling human language requires the ability to not only generate fluent text but also
encode factual knowledge. However, traditional language models are only capable of …
Representation degeneration problem in training natural language generation models
We study an interesting problem in training neural network-based models for natural
language generation tasks, which we call the representation degeneration problem …
Event knowledge in large language models: the gap between the impossible and the unlikely
Word co-occurrence patterns in language corpora contain a surprising amount of
conceptual knowledge. Large language models (LLMs), trained to predict words in context …