Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs

P Lerner, F Yvon - … of the 31st International Conference on …, 2025 - aclanthology.org
Large Language Models (LLMs) rely on subword vocabularies to process and
generate text. However, because subwords are marked as initial- or intra-word, we find that …

Towards the Machine Translation of Scientific Neologisms

P Lerner, F Yvon - 2025 - hal.science
Scientific research continually discovers and invents new concepts, which are then referred
to by new terms, neologisms, or neonyms in this context. As the vast majority of publications …

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

A Naik, K Zhang, N Robinson, A Mysore, C Marr… - arXiv preprint arXiv …, 2024 - arxiv.org
Historical linguists have long written a kind of incompletely formalized "program" that converts
reconstructed words in an ancestor language into words in one of its attested descendants …

Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

A Naik, D Agrawal, H Sng, C Marr, K Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Historical linguists have long written "programs" that convert reconstructed words in an
ancestor language into their attested descendants via ordered string rewrite functions …

Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks

YS Chuang, AR Sarkar, YC Hsu, N Mohammed… - arXiv preprint arXiv …, 2024 - arxiv.org
This study examines integrating EHRs and NLP with large language models (LLMs) to
improve healthcare data management and patient care. It focuses on using advanced …