Latxa: An open language model and evaluation suite for Basque

J Etxaniz, O Sainz, N Miguel, I Aldabe… - Proceedings of the …, 2024 - aclanthology.org
We introduce Latxa, a family of large language models for Basque ranging from 7 to 70
billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new …

A survey of large language models for European languages

W Ali, S Pyysalo - arXiv preprint arXiv:2408.15040, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention due to their high
performance on a wide range of natural language tasks since the release of ChatGPT. The …

Evaluating Morphological Compositional Generalization in Large Language Models

M Ismayilzada, D Circi, J Sälevä, H Sirin… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated significant progress in various natural
language generation and understanding tasks. However, their linguistic generalization …

Overview of ELOQUENT 2024—shared tasks for evaluating generative language model quality

J Karlgren, L Dürlich, E Gogoulou, L Guillou… - … Conference of the Cross …, 2024 - Springer
ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative
language models. ELOQUENT aims to apply high-level quality criteria, grounded in …

LiCoEval: Evaluating LLMs on License Compliance in Code Generation

W Xu, K Gao, H He, M Zhou - arXiv preprint arXiv:2408.02487, 2024 - arxiv.org
Recent advances in Large Language Models (LLMs) have revolutionized code generation,
leading to widespread adoption of AI coding tools by developers. However, LLMs can …

LLMs for Extremely Low-Resource Finno-Ugric Languages

T Purason, HA Kuulmets, M Fishel - arXiv preprint arXiv:2410.18902, 2024 - arxiv.org
The advancement of large language models (LLMs) has predominantly focused on
high-resource languages, leaving low-resource languages, such as those in the Finno-Ugric …

OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches

J Kanerva, C Ledins, S Käpyaho, F Ginter - arXiv preprint arXiv …, 2025 - arxiv.org
Optical Character Recognition (OCR) systems often introduce errors when transcribing
historical documents, leaving room for post-correction to improve text quality. This study …

GECKO: Generative Language Model for English, Code and Korean

S Oh, D Kim - arXiv preprint arXiv:2405.15640, 2024 - arxiv.org
We introduce GECKO, a bilingual large language model (LLM) optimized for Korean and
English, along with programming languages. GECKO is pretrained on the balanced, high …

Large Language Model Agent as Insurance Law Assistant

A Ingher - 2024 - aaltodoc.aalto.fi
Traffic insurance law can be a complex domain for ordinary individuals to grasp, thus
affecting its accessibility. Although the traffic insurance law itself and precedents are publicly …

Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search

M Merler - 2024 - aaltodoc.aalto.fi
In this thesis we consider Code World Models, world models generated by a Large
Language Model (LLM) in the form of Python code for offline model-based Reinforcement …