An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM Computing Surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

VoiceCraft: Zero-shot speech editing and text-to-speech in the wild

P Peng, PY Huang, SW Li, A Mohamed… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-
of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on …

SpiRit-LM: Interleaved Spoken and Written Language Model

TA Nguyen, B Muller, B Yu, MR Costa-Jussa… - Transactions of the …, 2025 - direct.mit.edu
We introduce SpiRit-LM, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a 7B pretrained text language model that we extend to the …

SummScreen: A dataset for abstractive screenplay summarization

M Chen, Z Chu, S Wiseman, K Gimpel - arXiv preprint arXiv:2104.07091, 2021 - arxiv.org
We introduce SummScreen, a summarization dataset comprised of pairs of TV series
transcripts and human written recaps. The dataset provides a challenging testbed for …

Expresso: A benchmark and analysis of discrete expressive speech resynthesis

TA Nguyen, WN Hsu, A d'Avirro, B Shi, I Gat… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent work has shown that it is possible to resynthesize high-quality speech based, not on
text, but on low bitrate discrete units that have been learned in a self-supervised fashion and …

Building real-world meeting summarization systems using large language models: A practical perspective

MTR Laskar, XY Fu, C Chen, SB Tn - arXiv preprint arXiv:2310.19233, 2023 - arxiv.org
This paper studies how to effectively build meeting summarization systems for real-world
usage using large language models (LLMs). For this purpose, we conduct an extensive …

Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

T Yu, H Gao, TE Lin, M Yang, Y Wu, W Ma… - Proceedings of the …, 2023 - aclanthology.org
Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …

MeetingBank: A benchmark dataset for meeting summarization

Y Hu, T Ganter, H Deilamsalehy, F Dernoncourt… - arXiv preprint arXiv …, 2023 - arxiv.org
As the number of recorded meetings increases, it becomes increasingly important to utilize
summarization technology to create useful summaries of these recordings. However, there is …

Long-span summarization via local attention and content selection

P Manakul, MJF Gales - arXiv preprint arXiv:2105.03801, 2021 - arxiv.org
Transformer-based models have achieved state-of-the-art results in a wide range of natural
language processing (NLP) tasks including document summarization. Typically these …

How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …