mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval
We present systematic efforts in building a long-context multilingual text representation model
(TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base …
Normalization and effective learning rates in reinforcement learning
Normalization layers have recently experienced a renaissance in the deep reinforcement
learning and continual learning literature, with several works highlighting diverse benefits …
On the overlooked structure of stochastic gradients
Stochastic gradients closely relate to both optimization and generalization of deep neural
networks (DNNs). Some works attempted to explain the success of stochastic optimization …
Neural networks with (low-precision) polynomial approximations: New insights and techniques for accuracy improvement
Replacing non-polynomial functions (e.g., non-linear activation functions such as ReLU) in a
neural network with their polynomial approximations is a standard practice in privacy …
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Storing open-source fine-tuned models separately introduces redundancy and increases
response times in applications utilizing multiple models. Delta-parameter pruning (DPP) …
ConoDL: a deep learning framework for rapid generation and prediction of conotoxins
M Guo, Z Li, X Deng, D Luo, J Yang, Y Chen… - Journal of Computer …, 2025 - Springer
Conotoxins, being small disulfide-rich and bioactive peptides, manifest notable
pharmacological potential and find extensive applications. However, the exploration of …
Neural Field Classifiers via Target Encoding and Classification Loss
Neural field methods have seen great progress in various long-standing tasks in computer
vision and computer graphics, including novel view synthesis and geometry reconstruction …
Weight decay induces low-rank attention layers
The effect of regularizers such as weight decay when training deep neural networks is not
well understood. We study the influence of weight decay as well as $L2$-regularization …
Avoiding Catastrophic Forgetting Via Neuronal Decay
RO Malashin, MA Mikhalkova - 2024 Wave Electronics and its …, 2024 - ieeexplore.ieee.org
In continual learning settings, a neural network is taught different tasks sequentially and the
network is prone to catastrophic forgetting. We investigate the role of regularization methods …