SinKD: Sinkhorn Distance Minimization for Knowledge Distillation
Knowledge distillation (KD) has been widely adopted to compress large language models
(LLMs). Existing KD methods investigate various divergence measures including the …
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
X Cui, M Zhu, Y Qin, L Xie, W Zhou, H Li - arXiv preprint arXiv:2412.14528, 2024 - arxiv.org
Knowledge distillation (KD) has become a prevalent technique for compressing large
language models (LLMs). Existing KD methods are constrained by the need for identical …
Graph Exploration for Effective Multiagent Q-Learning
A Zhaikhan, AH Sayed - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
This article proposes an exploration technique for multiagent reinforcement learning (MARL)
with graph-based communication among agents. We assume that the individual rewards …
Terminal Line-of-Sight Angle-Constrained Target Tracking Guidance for Unmanned Surface Vehicles
B Du, K Yang, W Zhang, H Chen - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
This paper investigates a Terminal Line-of-Sight Angle-Constrained Target Tracking
Guidance (TLATT) for Unmanned Surface Vehicles (USVs). Dynamic target tracking …
Intelligent towing and pushing system for unmanned tugboats under wind and wave disturbances
B Du, W Zhang - Ships and Offshore Structures, 2024 - Taylor & Francis
Intelligent unmanned tugboats have the potential to significantly improve the efficiency of
barge manipulation. This paper investigates the kinetic modelling, optimal force distribution …
A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering
Q Qi, X Yang, G Xia, DWC Ho, P Tang - arXiv preprint arXiv:2410.06847, 2024 - arxiv.org
This paper proposes a safety modulator actor-critic (SMAC) method to address safety
constraint and overestimation mitigation in model-free safe reinforcement learning (RL). A …
Extended reality-based training and adaptive machine learning-based optimization for thermoforming process
I Jalilvand - 2025 - open.library.ubc.ca
Extended reality (XR) and machine learning (ML) are becoming pivotal enabling
technologies in smart manufacturing, particularly for optimizing complex/multi-step …