On efficient training of large-scale deep learning models: A literature review
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …
OBELICS: An open web-scale filtered dataset of interleaved image-text documents
Large multimodal models trained on natural documents, which interleave images and text,
outperform models trained on image-text pairs on various multimodal benchmarks …
Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
The BigScience ROOTS corpus: A 1.6 TB composite multilingual dataset
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …
Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus
Large language models have led to remarkable progress on many NLP tasks, and
researchers are turning to ever-larger text corpora to train them. Some of the largest corpora …
The interplay of variant, size, and task type in Arabic pre-trained language models
In this paper, we explore the effects of language variants, data sizes, and fine-tuning task
types in Arabic pre-trained language models. To do so, we build three pre-trained language …
CulturaX: A cleaned, enormous, and multilingual dataset for large language models in 167 languages
The driving factors behind the development of large language models (LLMs) with
impressive learning capabilities are their colossal model sizes and extensive training …
KUISAIL at SemEval-2020 Task 12: BERT-CNN for offensive speech identification in social media
In this paper, we describe our approach to utilize pre-trained BERT models with
Convolutional Neural Networks for sub-task A of the Multilingual Offensive Language …
Small data? No problem! Exploring the viability of pretrained multilingual language models for low-resourced languages
Pretrained multilingual language models have been shown to work well on many languages
for a variety of downstream NLP tasks. However, these models are known to require a lot of …
AI psychometrics: Assessing the psychological profiles of large language models through psychometric inventories
We illustrate how standard psychometric inventories originally designed for assessing
noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous …