Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora

A Warstadt, A Mueller, L Choshen… - … of the BabyLM …, 2023 - research-collection.ethz.ch
Children can acquire language from less than 100 million words of input. Large language
models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data …

Quantifying the redundancy between prosody and text

L Wolf, T Pimentel, E Fedorenko, R Cotterell… - arXiv preprint arXiv …, 2023 - arxiv.org
Prosody--the suprasegmental component of speech, including pitch, loudness, and tempo--
carries critical aspects of meaning. However, the relationship between the information …