LongRAG: Enhancing retrieval-augmented generation with long-context LLMs
In traditional RAG framework, the basic retrieval units are normally short. The common
retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design …
Summary of a haystack: A challenge to long-context LLMs and RAG systems
LLMs and RAG systems are now capable of handling millions of input tokens or more.
However, evaluating the output quality of such systems on long-context tasks remains …
mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval
We present systematic efforts in building long-context multilingual text representation model
(TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base …
Financial Forecasting from Textual and Tabular Time Series
There is a variety of multimodal data pertinent to public companies, spanning from
accounting statements, macroeconomic statistics, earnings conference calls, and financial …
Understanding performance of long-document ranking models through comprehensive evaluation and leaderboarding
L Boytsov, D Akinpelu, T Lin, F Gao, Y Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org
We evaluated 20+ Transformer models for ranking of long documents (including recent
LongP models trained with FlashAttention) and compared them with a simple FirstP …
Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval
In the information retrieval (IR) area, dense retrieval (DR) models use deep learning
techniques to encode queries and passages into embedding space to compute their …
MCTASmRNA: A deep learning framework for alternative splicing events classification
JY Zheng, G Jiang, FH Gao, SN Ren, CY Zhu… - International Journal of …, 2025 - Elsevier
Alternative Splicing (AS) plays crucial post-transcriptional gene function regulation roles in
eukaryotic. Despite progress in studying AS at the RNA level, existing methods for AS event …
GeAR: Generation Augmented Retrieval
Document retrieval techniques form the foundation for the development of large-scale
information systems. The prevailing methodology is to construct a bi-encoder and compute …
Drowning in Documents: Consequences of Scaling Reranker Inference
Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by
cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be …
Similarity Evaluation and Fine-Tuning of Embedding Models from Various Linguistic Perspectives
H Kang, JK Jung - Available at SSRN 5040806 - papers.ssrn.com
Recent advancements in embedding models have brought significant progress in the field of
natural language processing (NLP). In particular, embedding models play a critical role in …