An empirical survey on long document summarization: Datasets, models, and metrics
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …
format to detail out important issues and complicated subjects that require extra attention. An …
VoiceCraft: Zero-shot speech editing and text-to-speech in the wild
We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-
of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on …
SpiRit-LM: Interleaved Spoken and Written Language Model
We introduce SpiRit-lm, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a 7B pretrained text language model that we extend to the …
SummScreen: A dataset for abstractive screenplay summarization
We introduce SummScreen, a summarization dataset comprised of pairs of TV series
transcripts and human written recaps. The dataset provides a challenging testbed for …
Expresso: A benchmark and analysis of discrete expressive speech resynthesis
Recent work has shown that it is possible to resynthesize high-quality speech based, not on
text, but on low bitrate discrete units that have been learned in a self-supervised fashion and …
Building real-world meeting summarization systems using large language models: A practical perspective
This paper studies how to effectively build meeting summarization systems for real-world
usage using large language models (LLMs). For this purpose, we conduct an extensive …
Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment
Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …
MeetingBank: A benchmark dataset for meeting summarization
As the number of recorded meetings increases, it becomes increasingly important to utilize
summarization technology to create useful summaries of these recordings. However, there is …
Long-span summarization via local attention and content selection
Transformer-based models have achieved state-of-the-art results in a wide range of natural
language processing (NLP) tasks including document summarization. Typically these …
How might we create better benchmarks for speech recognition?
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …