Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …
A practical guide to multi-objective reinforcement learning and planning
Real-world sequential decision-making tasks are generally complex, requiring trade-offs
between multiple, often conflicting, objectives. Despite this, the majority of research in …
Multi-objective multi-agent decision making: a utility-based analysis and survey
The majority of multi-agent system implementations aim to optimise agents' policies with
respect to a single objective, despite the fact that many real-world problem domains are …
MO-MIX: Multi-objective multi-agent cooperative decision-making with deep reinforcement learning
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-
making problems. In many real-world scenarios, tasks often have several conflicting …
Multi-objective deep reinforcement learning
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-
objective decision problems where the relative importances of the objectives are not known …
Human-aligned artificial intelligence is a multiobjective problem
As the capabilities of artificial intelligence (AI) systems improve, it becomes important to
constrain their actions to ensure their behaviour remains beneficial to humanity. A variety of …
A multi-objective deep reinforcement learning framework
This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL)
framework based on deep Q-networks. We develop a high-performance MODRL framework …
[BOOK][B] Metrics and benchmarks for self-aware computing systems
N Herbst, S Becker, S Kounev, H Koziolek, M Maggio… - 2017 - Springer
In this chapter, we propose a list of metrics grouped by the MAPE-K paradigm for quantifying
properties of self-aware computing systems. This set of metrics can be seen as a starting …
Autonomy and intelligence in the computing continuum: Challenges, enablers, and future directions for orchestration
Future AI applications require performance, reliability and privacy that the existing, cloud-
dependant system architectures cannot provide. In this article, we study orchestration in the …
Self-improving system integration: Mastering continuous change
The research initiative “self-improving system integration” (SISSY) was established with the
goal to master the ever-changing demands of system organisation in the presence of …