xLSTM: Extended Long Short-Term Memory

M Beck, K Pöppel, M Spanring, A Auer… - arXiv preprint arXiv …, 2024 - arxiv.org
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …
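
The "constant error carousel" named in the abstract is the additive cell-state update at the heart of the LSTM. Below is a minimal NumPy sketch of the classic LSTM cell, not the paper's xLSTM; weight names and shapes are illustrative.

    # Minimal sketch of the classic LSTM cell. The update
    # c_t = f_t * c_{t-1} + i_t * g_t is the constant error carousel:
    # an additive path that lets gradients flow across time steps.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell(x, h_prev, c_prev, W, b):
        # W projects [x; h_prev] onto the four pre-activations
        # (input gate, forget gate, cell candidate, output gate).
        z = W @ np.concatenate([x, h_prev]) + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates
        g = np.tanh(g)                                # candidate cell input
        c = f * c_prev + i * g    # constant error carousel
        h = o * np.tanh(c)        # gated hidden output
        return h, c

    # One step with input size 3 and hidden size 4.
    rng = np.random.default_rng(0)
    d_in, d_h = 3, 4
    W = 0.1 * rng.normal(size=(4 * d_h, d_in + d_h))
    b = np.zeros(4 * d_h)
    h, c = lstm_cell(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, b)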

What Makes a High-Quality Training Dataset for Large Language Models: A Practitioners' Perspective

X Yu, Z Zhang, F Niu, X Hu, X Xia… - Proceedings of the 39th …, 2024 - dl.acm.org
Large Language Models (LLMs) have demonstrated remarkable performance in various
application domains, largely due to their self-supervised pre-training on extensive high …
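
The self-supervised pre-training the abstract credits is, for most LLMs, next-token prediction: minimizing the negative log-likelihood of each token given its prefix. A toy sketch, with made-up logits and targets purely for illustration:

    import numpy as np

    def next_token_loss(logits, targets):
        # logits: (seq_len, vocab) unnormalized scores; targets: (seq_len,) token ids.
        logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    logits = np.random.default_rng(0).normal(size=(5, 10))  # 5 positions, vocab of 10
    targets = np.array([3, 1, 4, 1, 5])
    print(next_token_loss(logits, targets))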

Data contamination report from the 2024 CONDA shared task

O Sainz, I García-Ferrero, A Jacovi, JA Campos… - arXiv preprint arXiv …, 2024 - arxiv.org
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of
data contamination in natural language processing, where data contamination is understood …
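
One common heuristic for probing contamination, not necessarily the shared task's protocol, is checking n-gram overlap between evaluation examples and the training corpus. A generic sketch; the default n-gram length is an illustrative choice, not taken from the report:

    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def is_contaminated(eval_tokens, train_tokens, n=13):
        # Flag the eval example if any of its n-grams occurs in the training text.
        return bool(ngrams(eval_tokens, n) & ngrams(train_tokens, n))

    train = "the quick brown fox jumps over the lazy dog".split()
    sample = "a quick brown fox jumps over the lazy cat".split()
    print(is_contaminated(sample, train, n=5))  # True: the texts share a 5-gram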

Beyond perplexity: Multi-dimensional safety evaluation of LLM compression

Z Xu, A Gupta, T Li, O Bentham, V Srikumar - arXiv preprint arXiv …, 2024 - arxiv.org
Increasingly, model compression techniques enable large language models (LLMs) to be
deployed in real-world applications. As a result of this momentum towards local deployment …
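
For reference, the perplexity the title proposes to go beyond is the exponentiated average negative log-likelihood per token. A small worked sketch with made-up per-token probabilities:

    import math

    def perplexity(token_probs):
        # token_probs: probability the model assigned to each observed token.
        nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
        return math.exp(nll)

    print(perplexity([0.25, 0.5, 0.1, 0.4]))  # larger when the model is more surprised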

How to Synthesize Text Data without Model Collapse?

X Zhu, D Cheng, H Li, K Zhang, E Hua, X Lv… - arXiv preprint arXiv …, 2024 - arxiv.org
Model collapse in synthetic data indicates that iterative training on self-generated data leads
to a gradual decline in performance. With the proliferation of AI models, synthetic data will …
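
The iterative loop under which collapse arises can be shown on a toy distribution: each generation is fit to samples drawn from the previous generation's model, and the fitted spread tends to drift downward. This schematic stands in for full LLM training and decoding; nothing here reproduces the paper's method.

    import numpy as np

    def fit_gaussian(data):            # stands in for "train a model on the data"
        return data.mean(), data.std()

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=50)             # real data, generation 0
    for gen in range(1, 11):
        mean, std = fit_gaussian(data)
        data = rng.normal(mean, std, size=50)        # replace data with model output
        print(f"gen {gen}: fitted std = {std:.3f}")  # spread tends to shrink over generations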