A survey on evaluation of out-of-distribution generalization

H Yu, J Liu, X Zhang, J Wu, P Cui - arXiv preprint arXiv:2403.01874, 2024 - arxiv.org
Machine learning models, while progressively advanced, rely heavily on the IID assumption,
which is often unfulfilled in practice due to inevitable distribution shifts. This renders them …

Energy-based automated model evaluation

R Peng, H Zou, H Wang, Y Zeng, Z Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The conventional evaluation protocols on machine learning models rely heavily on a
labeled, iid-assumed testing dataset, which is not often present in real world applications …

Test optimization in DNN testing: a survey

Q Hu, Y Guo, X Xie, M Cordy, L Ma… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive survey on test optimization in deep neural network
(DNN) testing. Here, test optimization refers to testing with low data labeling effort. We …

Source-Free Domain-Invariant Performance Prediction

E Khramtsova, M Baktashmotlagh, G Zuccon… - … on Computer Vision, 2024 - Springer
Accurately estimating model performance poses a significant challenge, particularly in
scenarios where the source and target domains follow different data distributions. Most …

Label-free evaluation for performance of fault diagnosis model on unknown distribution dataset

Z Liu, H Zheng, H Liu, W Jia, J Tan - Advanced Engineering Informatics, 2024 - Elsevier
Real-time data may undergo distribution drift due to changes in operating conditions and
other factors, which can affect the classification accuracy of online fault diagnosis models …

CIFAR-10-Warehouse: Broad and more realistic testbeds in model generalization analysis

X Sun, X Leng, Z Wang, Y Yang, Z Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Analyzing model performance in various unseen environments is a critical research problem
in the machine learning community. To study this problem, it is important to construct a …

Active Testing of Large Language Model via Multi-Stage Sampling

Y Huang, J Song, Q Hu, F Juefei-Xu, L Ma - arXiv preprint arXiv …, 2024 - arxiv.org
Performance evaluation plays a crucial role in the development life cycle of large language
models (LLMs). It estimates the model's capability, elucidates behavior characteristics, and …

Learning diverse features in vision transformers for improved generalization

AM Nicolicioiu, AL Nicolicioiu, B Alexe… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning models often rely only on a small set of features even when there is a rich set
of predictive signals in the training data. This makes models brittle and sensitive to …

Methodology for Evaluating the Generalization of ResNet

A Du, Q Zhou, Y Dai - Applied Sciences, 2024 - mdpi.com
Convolutional neural networks (CNNs) have achieved promising results in many tasks, and
evaluating the model's generalization ability based on the trained model and training data is …

Towards Efficient Multi-Domain Knowledge Fusion Adaptation via Low-Rank Reparameterization and Noisy Label Learning: Application to Source-Free Cross …

Y Lin, Y Wang, M Zhang, H Cao, L Ma… - IEEE Internet of …, 2024 - ieeexplore.ieee.org
Domain adaptation in fault diagnosis can efficiently handle different data distributions by
co-training source and target domain data. However, the source domain data may not be …