Having beer after prayer? Measuring cultural bias in large language models

T Naous, MJ Ryan, A Ritter, W Xu - arXiv preprint arXiv:2305.14456, 2023 - arxiv.org
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …

Algorithms and theory for multiple-source adaptation

J Hoffman, M Mohri, N Zhang - Advances in neural …, 2018 - proceedings.neurips.cc
We present a number of novel contributions to the multiple-source adaptation problem. We
derive new normalized solutions with strong theoretical guarantees for the cross-entropy …

Validating large language models with ReLM

M Kuchnik, V Smith… - Proceedings of Machine …, 2023 - proceedings.mlsys.org
Although large language models (LLMs) have been touted for their ability to generate
natural-sounding text, there are growing concerns around possible negative effects of LLMs …

Neural models of text normalization for speech applications

H Zhang, R Sproat, AH Ng, F Stahlberg… - Computational …, 2019 - direct.mit.edu
Machine learning, including neural network techniques, have been applied to
virtually every domain in natural language processing. One problem that has been …

Hierarchical structure guides rapid linguistic predictions during naturalistic listening

JR Brennan, JT Hale - PLoS ONE, 2019 - journals.plos.org
The grammar, or syntax, of human language is typically understood in terms of abstract
hierarchical structures. However, theories of language processing that emphasize …

RNN approaches to text normalization: A challenge

R Sproat, N Jaitly - arXiv preprint arXiv:1611.00068, 2016 - arxiv.org
This paper presents a challenge to the community: given a large corpus of written text
aligned to its normalized spoken form, train an RNN to learn the correct normalization …

Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework

JR Novak, N Minematsu, K Hirose - Natural Language Engineering, 2016 - cambridge.org
This paper provides an analysis of several practical issues related to the theory and
implementation of Grapheme-to-Phoneme (G2P) conversion systems utilizing the Weighted …

The SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion

K Gorman, LFE Ashby, A Goyzueta… - Proceedings of the …, 2020 - aclanthology.org
We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual
grapheme-to-phoneme conversion. Participants were asked to submit systems which take in …

The Kestrel TTS text normalization system

P Ebden, R Sproat - Natural Language Engineering, 2015 - cambridge.org
This paper describes the Kestrel text normalization system, a component of the Google text-
to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars …

Recognition and Information Extraction in Historical Handwritten Tables: Toward Understanding Early 20th Century Paris Census

T Constum, N Kempf, T Paquet, P Tranouez… - … Workshop on Document …, 2022 - Springer
We aim to build a vast database (up to 9 million individuals) from the handwritten tabular
nominal census of Paris of 1926, 1931 and 1936, each composed of about 100,000 …