mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval

X Zhang, Y Zhang, D Long, W Xie, Z Dai, J Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present systematic efforts in building long-context multilingual text representation model
(TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base …

Normalization and effective learning rates in reinforcement learning

C Lyle, Z Zheng, K Khetarpal, J Martens… - arXiv preprint arXiv …, 2024 - arxiv.org
Normalization layers have recently experienced a renaissance in the deep reinforcement
learning and continual learning literature, with several works highlighting diverse benefits …

On the overlooked structure of stochastic gradients

Z Xie, QY Tang, M Sun, P Li - Advances in Neural …, 2023 - proceedings.neurips.cc
Stochastic gradients closely relate to both optimization and generalization of deep neural
networks (DNNs). Some works attempted to explain the success of stochastic optimization …

Neural networks with (low-precision) polynomial approximations: New insights and techniques for accuracy improvement

C Zhang, J Fan, MH Au, SM Yiu - arXiv preprint arXiv:2402.11224, 2024 - arxiv.org
Replacing non-polynomial functions (e.g., non-linear activation functions such as ReLU) in a
neural network with their polynomial approximations is a standard practice in privacy …

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

W Deng, Y Zhao, V Vakilian, M Chen, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Storing open-source fine-tuned models separately introduces redundancy and increases
response times in applications utilizing multiple models. Delta-parameter pruning (DPP) …

ConoDL: a deep learning framework for rapid generation and prediction of conotoxins

M Guo, Z Li, X Deng, D Luo, J Yang, Y Chen… - Journal of Computer …, 2025 - Springer
Conotoxins, being small disulfide-rich and bioactive peptides, manifest notable
pharmacological potential and find extensive applications. However, the exploration of …

Neural Field Classifiers via Target Encoding and Classification Loss

X Yang, Z Xie, X Zhou, B Liu, B Liu, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural field methods have seen great progress in various long-standing tasks in computer
vision and computer graphics, including novel view synthesis and geometry reconstruction …

Weight decay induces low-rank attention layers

S Kobayashi, Y Akram, J Von Oswald - arXiv preprint arXiv:2410.23819, 2024 - arxiv.org
The effect of regularizers such as weight decay when training deep neural networks is not
well understood. We study the influence of weight decay as well as $L2$-regularization …

Avoiding Catastrophic Forgetting Via Neuronal Decay

RO Malashin, MA Mikhalkova - 2024 Wave Electronics and its …, 2024 - ieeexplore.ieee.org
In continual learning settings, a neural network is taught different tasks sequentially and the
network is prone to catastrophic forgetting. We investigate the role of regularization methods …