Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like
Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose …
CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation
Large language models (LLMs) have shown great potential in natural language processing
tasks, but their application to machine translation (MT) remains challenging due to …
Can Automatic Metrics Assess High-Quality Translations?
Automatic metrics for evaluating translation quality are typically validated by measuring how
well they correlate with human assessments. However, correlation methods tend to capture …