A survey of AI-generated content (AIGC)

Y Cao, S Li, Y Liu, Z Yan, Y Dai, P Yu, L Sun - ACM Computing Surveys, 2025 - dl.acm.org
Recently, Artificial Intelligence Generated Content (AIGC) has gained significant attention
from society, especially with the rise of Generative AI (GAI) techniques such as ChatGPT …

A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Embers of autoregression show how large language models are shaped by the problem they are trained to solve

RT McCoy, S Yao, D Friedman, MD Hardy… - Proceedings of the …, 2024 - pnas.org
The widespread adoption of large language models (LLMs) makes it important to recognize
their strengths and limitations. We argue that to develop a holistic understanding of these …

Molmo and PixMo: Open weights and open data for state-of-the-art multimodal models

M Deitke, C Clark, S Lee, R Tripathi, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Today's most advanced multimodal models remain proprietary. The strongest open-weight
models rely heavily on synthetic data from proprietary VLMs to achieve good performance …

Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …

Consent in crisis: The rapid decline of the AI data commons

S Longpre, R Mahari, A Lee, C Lund, H Oderinwale… - NeurIPS, 2024 - hal.science
General-purpose artificial intelligence (AI) systems are built on massive swathes of public
web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge …

Leave no context behind: Efficient infinite context transformers with Infini-attention

T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key …

Generative language models exhibit social identity biases

T Hu, Y Kyrychenko, S Rathje, N Collier… - Nature Computational …, 2024 - nature.com
Social identity biases, particularly the tendency to favor one's own group (ingroup solidarity)
and derogate other groups (outgroup hostility), are deeply rooted in human psychology and …

Generalization vs Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

X Wang, A Antoniades, Y Elazar, A Amayuelas… - arXiv preprint arXiv …, 2024 - arxiv.org
The impressive capabilities of large language models (LLMs) have sparked debate over
whether these models genuinely generalize to unseen tasks or predominantly rely on …