A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv…, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

C-Pack: Packed resources for general Chinese embeddings

S Xiao, Z Liu, P Zhang, N Muennighoff, D Lian… - Proceedings of the 47th …, 2024 - dl.acm.org
We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …

BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - arXiv preprint arXiv…, 2024 - arxiv.org
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …

LongBench: A bilingual, multitask benchmark for long context understanding

Y Bai, X Lv, J Zhang, H Lyu, J Tang, Z Huang… - arXiv preprint arXiv…, 2023 - arxiv.org
Although large language models (LLMs) demonstrate impressive performance for many
language tasks, most of them can only handle texts a few thousand tokens long, limiting their …

Hallucination detection: Robustly discerning reliable answers in large language models

Y Chen, Q Fu, Y Yuan, Z Wen, G Fan, D Liu… - Proceedings of the …, 2023 - dl.acm.org
Large language models (LLMs) have gained widespread adoption in various natural
language processing tasks, including question answering and dialogue systems. However …

The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation

Y Sun, S Wang, S Feng, S Ding, C Pang… - arXiv preprint arXiv…, 2021 - arxiv.org
Pre-trained models have achieved state-of-the-art results in various Natural Language
Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up …

QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

JH Clark, E Choi, M Collins, D Garrette… - Transactions of the …, 2020 - direct.mit.edu
Confidently making progress on multilingual modeling requires challenging, trustworthy
evaluations. We present TyDi QA—a question answering dataset covering 11 typologically …