RedPajama: an open dataset for training large language models

M Weber, D Fu, Q Anthony, Y Oren… - Advances in …, 2025 - proceedings.neurips.cc
Large language models are increasingly becoming a cornerstone technology in artificial
intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset …

Searching for best practices in retrieval-augmented generation

X Wang, Z Wang, X Gao, F Zhang, Y Wu… - Proceedings of the …, 2024 - aclanthology.org
Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating
up-to-date information, mitigating hallucinations, and enhancing response quality …

RCAgent: Cloud root cause analysis by autonomous agents with tool-augmented large language models

Z Wang, Z Liu, Y Zhang, A Zhong, J Wang… - Proceedings of the 33rd …, 2024 - dl.acm.org
Large language model (LLM) applications in cloud root cause analysis (RCA) have been
actively explored recently. However, current methods are still reliant on manual workflow …

Algorithmic management in the gig economy: A systematic review and research integration

I Kadolkar, S Kepes… - Journal of Organizational …, 2024 - Wiley Online Library
Rapid growth in the gig economy has been facilitated by the increased use of algorithmic
management (AM) in online platforms (OPs) coordinating gig work. There has been a …

BRIGHT: A realistic and challenging benchmark for reasoning-intensive retrieval

H Su, H Yen, M Xia, W Shi, N Muennighoff… - arxiv preprint arxiv …, 2024 - arxiv.org
Existing retrieval benchmarks primarily consist of information-seeking queries (e.g.,
aggregated questions from search engines) where keyword or semantic-based retrieval is …

Gecko: Versatile text embeddings distilled from large language models

J Lee, Z Dai, X Ren, B Chen, D Cer, JR Cole… - arxiv preprint arxiv …, 2024 - arxiv.org
We present Gecko, a compact and versatile text embedding model. Gecko achieves strong
retrieval performance by leveraging a key idea: distilling knowledge from large language …

Making text embedders few-shot learners

C Li, MH Qin, S Xiao, J Chen, K Luo, Y Shao… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) with decoder-only architectures demonstrate remarkable in-
context learning (ICL) capabilities. This feature enables them to effectively handle both …

LLMs4OL 2024 overview: The 1st large language models for ontology learning challenge

HB Giglou, J D'Souza, S Auer - arxiv preprint arxiv:2409.10146, 2024 - arxiv.org
This paper outlines the LLMs4OL 2024, the first edition of the Large Language Models for
Ontology Learning Challenge. LLMs4OL is a community development initiative collocated …

Simple is effective: The roles of graphs and large language models in knowledge-graph-based retrieval-augmented generation

M Li, S Miao, P Li - arxiv preprint arxiv:2410.20724, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations
such as hallucinations and outdated knowledge. Knowledge Graph (KG)-based Retrieval …

WebLINX: Real-world website navigation with multi-turn dialogue

XH Lù, Z Kasner, S Reddy - arxiv preprint arxiv:2402.05930, 2024 - arxiv.org
We propose the problem of conversational web navigation, where a digital agent controls a
web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue …