Transformers learn shortcuts to automata
Algorithmic reasoning requires capabilities which are most naturally understood through
recurrent models of computation, like the Turing machine. However, Transformer models …
Open-world story generation with structured knowledge enhancement: A comprehensive survey
Storytelling and narrative are fundamental to human experience, intertwined with our social
and cultural engagement. As such, researchers have long attempted to create systems that …
Skeleton-of-thought: Large language models can do parallel decoding
This work aims at decreasing the end-to-end generation latency of large language models
(LLMs). One of the major causes of the high generation latency is the sequential decoding …
Medusa: Simple LLM inference acceleration framework with multiple decoding heads
The inference process in Large Language Models (LLMs) is often limited due to the absence
of parallelism in the auto-regressive decoding process, resulting in most operations being …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
Boosting consistency in story visualization with rich-contextual conditional diffusion models
Recent research showcases the considerable potential of conditional diffusion models for
generating consistent stories. However, current methods, which predominantly generate …
Retrosynthesis prediction with an iterative string editing model
Retrosynthesis is a crucial task in drug discovery and organic synthesis, where artificial
intelligence (AI) is increasingly employed to expedite the process. However, existing …
Text generation with diffusion language models: A pre-training approach with continuous paragraph denoise
In this paper, we introduce a novel dIffusion language modEl pre-training framework for text
generation, which we call GENIE. GENIE is a large-scale pre-trained diffusion language …
Accelerating transformer inference for translation via parallel decoding
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
The community proposed specific network architectures and learning-based methods to …
Model-enhanced vector index
Embedding-based retrieval methods construct vector indices to search for document
representations that are most similar to the query representations. They are widely used in …