Computational sociophonetics using automatic speech recognition

R Coto-Solano - Language and Linguistics Compass, 2022 - Wiley Online Library
Recent years have seen numerous advances in natural language processing that can help
accelerate sociophonetic work. These include software to align speech recordings with their …

Automatic speech recognition for supporting endangered language documentation

E Prud'hommeaux, R Jimerson, R Hatcher… - 2021 - scholarspace.manoa.hawaii.edu
Generating accurate word-level transcripts of recorded speech for language documentation
is difficult and time-consuming, even for skilled speakers of the target language. Automatic …

"It's how you do things that matters": Attending to Process to Better Serve Indigenous Communities with Language Technologies

N Cooper, C Heldreth, B Hutchinson - arXiv preprint arXiv:2402.02639, 2024 - arxiv.org
Indigenous languages are historically under-served by Natural Language Processing (NLP)
technologies, but this is changing for some languages with the recent scaling of large …

Sparse transcription

S Bird - Computational Linguistics, 2021 - direct.mit.edu
The transcription bottleneck is often cited as a major obstacle for efforts to document the
world's endangered languages and supply them with language technologies. One solution …

Writing system and speaker metadata for 2,800+ language varieties

D van Esch, T Lucassen, S Ruder… - Proceedings of the …, 2022 - aclanthology.org
We describe an open-source dataset providing metadata for about 2,800 language varieties
used in the world today. Specifically, the dataset provides the attested writing system(s) for …

Recent advances in technologies for resource creation and mobilization in language documentation

AL Berez-Kroeker, S Gabber… - Annual Review of …, 2023 - annualreviews.org
Language documentation as a subfield of linguistics has arisen over the past roughly two
and a half decades more or less simultaneously with the widespread availability of …

Development of automatic speech recognition for the documentation of Cook Islands Māori

R Coto-Solano, SA Nicholas, S Datta, V Quint, P Wills… - 2022 - mro.massey.ac.nz
This paper describes the process of data processing and training of an automatic speech
recognition (ASR) system for Cook Islands Māori (CIM), an Indigenous language spoken by …

Balancing Social Impact, Opportunities, and Ethical Constraints of Using AI in the Documentation and Vitalization of Indigenous Languages

CS Pinhanez, PR Cavalin, M Vasconcelos, J Nogima - IJCAI, 2023 - ijcai.org
In this paper we discuss how AI can contribute to support the documentation and vitalization
of Indigenous languages and how that involves a delicate balancing of ensuring social …

Learning from failure: Data capture in an Australian Aboriginal community

É Le Ferrand, S Bird, L Besacier - … of the 60th Annual Meeting of …, 2022 - aclanthology.org
Most low resource language technology development is premised on the need to collect
data for training statistical models. When we follow the typical process of recording and …

Enabling interactive transcription in an Indigenous community

É Le Ferrand, S Bird, L Besacier - arXiv preprint arXiv:2011.06198, 2020 - arxiv.org
We propose a novel transcription workflow which combines spoken term detection and
human-in-the-loop, together with a pilot experiment. This work is grounded in an almost zero …