OpenDataVal: a unified benchmark for data valuation
Assessing the quality and impact of individual data points is critical for improving model
performance and mitigating undesirable biases within the training dataset. Several data …
Performance scaling via optimal transport: Enabling data selection from partially revealed sources
Traditionally, data selection has been studied in settings where all samples from prospective
sources are fully revealed to a machine learning developer. However, in practical data …
Triage: Characterizing and auditing training data for improved regression
Data quality is crucial for robust machine learning algorithms, with the recent interest in data-
centric AI emphasizing the importance of training data characterization. However, current …
Data Shapley in one training run
Data Shapley provides a principled framework for attributing data's contribution within
machine learning contexts. However, existing approaches require re-training models on …
Rethinking Data Shapley for data selection tasks: Misleads and merits
Data Shapley provides a principled approach to data valuation and plays a crucial role in
data-centric machine learning (ML) research. Data selection is considered a standard …
Selectivity drives productivity: efficient dataset pruning for enhanced transfer learning
Massive data is often considered essential for deep learning applications, but it also incurs
significant computational and infrastructural costs. Therefore, dataset pruning (DP) has …
Get more for less: Principled data selection for warming up fine-tuning in LLMs
This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-
tune a pre-trained language model. The goal is to minimize the need for costly domain …
Distributionally robust data valuation
Data valuation quantifies the contribution of each data point to the performance of a machine
learning model. Existing works typically define the value of data by its improvement of the …
Data valuation and detections in federated learning
Federated Learning (FL) enables collaborative model training while preserving the privacy
of raw data. A challenge in this framework is the fair and efficient valuation of data, which is …
Finding needles in a haystack: A black-box approach to invisible watermark detection
In this paper, we propose WaterMark Detector (WMD), the first invisible watermark detection
method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary …