- Academic Search

Towards precision medicine

EA Ashley - Nature Reviews Genetics, 2016 - nature.com

There is great potential for genome sequencing to enhance patient care through improved
diagnostic sensitivity and more precise therapeutic targeting. To maximize this potential …‏

Cited by 1112 - All 7 versions

[PDF] arxiv.org

From matching to generation: A survey on generative information retrieval‏

X Li, J Jin, Y Zhou, Y Zhang, P Zhang, Y Zhu… - arxiv preprint arxiv …, 2024 - arxiv.org

Information Retrieval (IR) systems are crucial tools for users to access information, widely
applied in scenarios like search engines, question answering, and recommendation …‏

Cited by 36 - All 3 versions

[PDF] arxiv.org

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions‏

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025‏ - dl.acm.org‏

The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …‏

Cited by 963 - All 4 versions

[PDF] arxiv.org

The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only‏

G Penedo, Q Malartic, D Hesslow, R Cojocaru… - arxiv preprint arxiv …, 2023‏ - arxiv.org‏

Large language models are commonly trained on a mixture of filtered web data and curated
high-quality corpora, such as social media conversations, books, or technical papers. This …‏

Cited by 751 - All 3 versions

[PDF] neurips.cc

The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only

G Penedo, Q Malartic, D Hesslow… - Advances in …, 2023‏ - proceedings.neurips.cc‏

Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical …

Cited by 119 - All 5 versions

[PDF] neurips.cc

The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022‏ - proceedings.neurips.cc‏

As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …‏

Cited by 191 - All 22 versions

[PDF] arxiv.org

Deduplicating training data makes language models better‏

K Lee, D Ippolito, A Nystrom, C Zhang, D Eck… - arxiv preprint arxiv …, 2021‏ - arxiv.org‏

We find that existing language modeling datasets contain many near-duplicate examples
and long repetitive substrings. As a result, over 1% of the unprompted output of language …‏

Cited by 613 - All 7 versions

[PDF] arxiv.org

REST: Retrieval-based speculative decoding

Z He, Z Zhong, T Cai, JD Lee, D He - arxiv preprint arxiv:2311.08252, 2023‏ - arxiv.org‏

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to
speed up language model generation. The key insight driving the development of REST is …‏

Cited by 79 - All 4 versions

[PDF] oup.com

OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more‏

AM Altenhoff, CM Train, KJ Gilbert… - Nucleic acids …, 2021‏ - academic.oup.com‏

OMA is an established resource to elucidate evolutionary relationships among genes from
currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise …‏

Cited by 198 - All 18 versions

[PDF] arxiv.org

A survey of multimodal large language model from a data-centric perspective‏

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arxiv preprint arxiv …, 2024‏ - arxiv.org‏

Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …‏

Cited by 38 - All 3 versions

Suffix arrays: a new method for on-line string searches

Towards precision medicine‏