Spanning training progress: Temporal dual-depth scoring (TDDS) for enhanced dataset pruning

X Zhang, J Du, Y Li, W Xie… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Dataset pruning aims to construct a coreset capable of achieving performance comparable
to the original full dataset. Most existing dataset pruning methods rely on snapshot-based …

Learning with noisy labels for robust fatigue detection

M Wang, R Hu, X Zhu, D Zhu, X Wang - Knowledge-Based Systems, 2024 - Elsevier
Fatigue is a significant safety concern across various domains, and accurate detection is
vital. However, the commonly employed fine-grained labels (seconds-based) frequently …

Refined coreset selection: Towards minimal coreset size under model performance constraints

X Xia, J Liu, S Zhang, Q Wu, H Wei, T Liu - arXiv preprint arXiv:2311.08675, 2023 - arxiv.org
Coreset selection is powerful in reducing computational costs and accelerating data
processing for deep learning algorithms. It strives to identify a small subset from large-scale …

CLIPCleaner: Cleaning noisy labels with CLIP

C Feng, G Tzimiropoulos, I Patras - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Learning with Noisy Labels (LNL) poses a significant challenge for the machine learning
community. Some of the most widely used approaches select as clean samples for …

GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning

G Zhang, H Dong, Y Zhang, Z Li, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Training high-quality deep models necessitates vast amounts of data, resulting in
overwhelming computational and memory demands. Recently, data pruning, distillation, and …

Perplexed by perplexity: Perplexity-based pruning with small reference models

Z Ankner, C Blakeney, K Sreenivasan… - ICLR 2024 Workshop …, 2024 - openreview.net
In this work, we consider whether pretraining on a pruned high-quality subset of a large-
scale text dataset can improve LLM performance. While existing work has shown that …

Prioritizing Informative Features and Examples for Deep Learning from Noisy Data

D Park - arXiv preprint arXiv:2403.00013, 2024 - arxiv.org
In this dissertation, we propose a systemic framework that prioritizes informative features
and examples to enhance each stage of the development process. Specifically, we prioritize …

Lightweight spatial-channel feature disentanglement modeling with confidence evaluation for uncertain industrial image

L Lei, HX Li, HD Yang - Applied Mathematical Modelling, 2025 - Elsevier
Process uncertainty has a significant impact on industrial image processing. Existing deep
learning methods are established on high-quality datasets without considering the …

DynImpt: A Dynamic Data Selection Method for Improving Model Training Efficiency

W Huang, Y Zhang, S Guo, Y Shang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Selecting key data subsets for model training is an effective way to improve training
efficiency. Existing methods generally utilize a well-trained model to evaluate samples and …

Co-active: an efficient selective relabeling model for resource constrained edge AI

C Hou, K Jiang, T Li, M Zhou, J Jiang - Wireless Networks, 2025 - Springer
With high-quality annotation data, edge AI has emerged as a pivotal technology in various
domains. Unfortunately, due to sensor errors and discrepancies in data collection, datasets …