| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages | TA Chang, C Arnett, Z Tu, BK Bergen | arXiv preprint arXiv:2311.09205 | 20 | 2023 |
| Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models | JA Michaelov, C Arnett, TA Chang, BK Bergen | arXiv preprint arXiv:2311.09194 | 16 | 2023 |
| Goldfish: Monolingual Language Models for 350 Languages | TA Chang, C Arnett, Z Tu, BK Bergen | arXiv preprint arXiv:2408.10441 | 3 | 2024 |
| Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement | C Arnett, PD Rivière, TA Chang, S Trott | arXiv preprint arXiv:2403.13754 | 3 | 2024 |
| Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models | C Arnett, TA Chang, JA Michaelov, BK Bergen | arXiv preprint arXiv:2310.07929 | 2 | 2023 |
| Why do language models perform worse for morphologically complex languages? | C Arnett, BK Bergen | arXiv preprint arXiv:2411.14198 | 1 | 2024 |
| Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics | JA Michaelov, C Arnett, BK Bergen | arXiv preprint arXiv:2404.19178 | 1 | 2024 |
| A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages | C Arnett, TA Chang, BK Bergen | arXiv preprint arXiv:2403.00686 | 1 | 2024 |
| BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training | P Chizhov, C Arnett, E Korotkova, I Yamshchikov | Proceedings of the 2024 Conference on Empirical Methods in Natural Language … | | 2024 |
| Toxicity of the Commons: Curating Open-Source Pre-Training Data | C Arnett, E Jones, IP Yamshchikov, PC Langlais | arXiv preprint arXiv:2410.22587 | | 2024 |
| Syntax drives default language selection in bilingual connected speech production | J Quinn, M Goldrick, C Arnett, VS Ferreira, TH Gollan | Journal of Experimental Psychology: Learning, Memory, and Cognition | | 2024 |