LongRAG: Enhancing retrieval-augmented generation with long-context LLMs

Z Jiang, X Ma, W Chen - arXiv preprint arXiv:2406.15319, 2024 - arxiv.org
In the traditional RAG framework, the basic retrieval units are normally short. Common
retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design …
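
A minimal sketch, in Python, of the short fixed-size retrieval units the snippet describes. It assumes whitespace tokenization and disjoint (non-overlapping) windows; `split_into_passages` is a hypothetical helper for illustration, not code from DPR or LongRAG.

```python
def split_into_passages(text: str, words_per_passage: int = 100) -> list[str]:
    """Split text into disjoint passages of at most `words_per_passage` words.

    Hypothetical helper: whitespace tokenization is an assumption, not the
    exact preprocessing used by DPR or LongRAG.
    """
    if words_per_passage <= 0:
        raise ValueError("words_per_passage must be positive")
    words = text.split()
    # Step through the word list in fixed strides; the last passage may be shorter.
    return [
        " ".join(words[i : i + words_per_passage])
        for i in range(0, len(words), words_per_passage)
    ]


doc = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(doc)
print(len(passages))  # 3
print([len(p.split()) for p in passages])  # [100, 100, 50]
```

LongRAG's point is that units this short force the retriever to search a very large pool; raising `words_per_passage` trades a smaller pool for longer units the reader model must handle.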

Summary of a haystack: A challenge to long-context LLMs and RAG systems

P Laban, AR Fabbri, C Xiong, CS Wu - arXiv preprint arXiv:2407.01370, 2024 - arxiv.org
LLMs and RAG systems are now capable of handling millions of input tokens or more.
However, evaluating the output quality of such systems on long-context tasks remains …

mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval

X Zhang, Y Zhang, D Long, W Xie, Z Dai, J Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present systematic efforts in building a long-context multilingual text representation model
(TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base …

Financial Forecasting from Textual and Tabular Time Series

R Koval, N Andrews, X Yan - Findings of the Association for …, 2024 - aclanthology.org
There is a variety of multimodal data pertinent to public companies, spanning from
accounting statements, macroeconomic statistics, earnings conference calls, and financial …

Understanding performance of long-document ranking models through comprehensive evaluation and leaderboarding

L Boytsov, D Akinpelu, T Lin, F Gao, Y Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org
We evaluated 20+ Transformer models for ranking of long documents (including recent
LongP models trained with FlashAttention) and compared them with a simple FirstP …

Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval

H Zhang, C Chen, L Mei, Q Liu, J Mao - Proceedings of the 33rd ACM …, 2024 - dl.acm.org
In the information retrieval (IR) area, dense retrieval (DR) models use deep learning
techniques to encode queries and passages into embedding space to compute their …
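
A minimal sketch of the scoring step in dense retrieval, assuming the query and passage embeddings have already been produced by some encoder. The 3-dimensional vectors below are hypothetical stand-ins for encoder outputs, and cosine similarity is one common choice of score (inner product is another); `cosine` and `rank_passages` are illustrative names, not the Mamba Retriever API.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_passages(query_vec: list[float], passage_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the top-k passages by cosine similarity to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical toy embeddings standing in for encoder outputs.
query = [1.0, 0.0, 1.0]
passages = [
    [0.9, 0.1, 0.8],  # close to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.0],  # partially aligned
]
print(rank_passages(query, passages))  # [0, 2]
```

Because passages are encoded independently of the query, their vectors can be computed once offline; only the query is encoded at search time.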

MCTASmRNA: A deep learning framework for alternative splicing events classification

JY Zheng, G Jiang, FH Gao, SN Ren, CY Zhu… - International Journal of …, 2025 - Elsevier
Alternative splicing (AS) plays a crucial role in post-transcriptional regulation of gene function in
eukaryotes. Despite progress in studying AS at the RNA level, existing methods for AS event …

GeAR: Generation Augmented Retrieval

H Liu, S Huang, J Liu, Y Zhan, H Sun, W Deng… - arXiv preprint arXiv …, 2025 - arxiv.org
Document retrieval techniques form the foundation for the development of large-scale
information systems. The prevailing methodology is to construct a bi-encoder and compute …

Drowning in Documents: Consequences of Scaling Reranker Inference

M Jacob, E Lindgren, M Zaharia, M Carbin… - arXiv preprint arXiv …, 2024 - arxiv.org
Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by
cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be …
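
A minimal sketch of the two-stage retrieve-then-rerank pipeline the snippet describes. Both scorers are hypothetical stand-ins chosen for illustration: `term_overlap` plays the cheap first-stage retriever, and `toy_reranker` stands in for an expensive cross-encoder. Neither is taken from the paper.

```python
from collections.abc import Callable, Sequence

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    k: int,
) -> list[int]:
    """Cheap scorer picks the top-k candidates; the (assumed expensive)
    reranker re-scores only those k and fixes the final order."""
    candidates = sorted(
        range(len(docs)), key=lambda i: first_stage(query, docs[i]), reverse=True
    )[:k]
    return sorted(candidates, key=lambda i: reranker(query, docs[i]), reverse=True)


def term_overlap(query: str, doc: str) -> float:
    """Hypothetical first-stage stand-in: count of shared lowercase terms."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_reranker(query: str, doc: str) -> float:
    """Hypothetical cross-encoder stand-in: shared terms divided by doc length."""
    return term_overlap(query, doc) / max(len(doc.split()), 1)


docs = [
    "rerankers rescore retrieved documents from cheaper systems at higher cost",
    "rerankers rescore",
    "unrelated text about cooking",
]
print(retrieve_then_rerank("rerankers rescore documents", docs, term_overlap, toy_reranker, k=2))
# [1, 0]
```

Only the `k` first-stage candidates ever reach the reranker, which is what keeps the expensive model affordable; here it promotes the short, fully matching document above the longer one. The paper's question is what happens as `k` is scaled up.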

Similarity Evaluation and Fine-Tuning of Embedding Models from Various Linguistic Perspectives

H Kang, JK Jung - Available at SSRN 5040806 - papers.ssrn.com
Recent advancements in embedding models have brought significant progress in the field of
natural language processing (NLP). In particular, embedding models play a critical role in …