Spatio-temporal masked autoencoder-based phonetic segments classification from ultrasound

X Dan, K Xu, Y Zhou, C Yang, Y Chen, Y Dou… - Speech …, 2025 - Elsevier
Abstract The integration of Ultrasound Tongue Imaging (UTI) into clinical linguistics and
phonetics research facilitates the examination of articulatory patterns and the correlation …

Test-time Adaptation for Cross-modal Retrieval with Query Shift

H Li, P Hu, Q Zhang, X Peng, X Liu, M Yang - arXiv preprint arXiv …, 2024 - arxiv.org
The success of most existing cross-modal retrieval methods heavily relies on the assumption
that the given queries follow the same distribution as the source domain. However, such an …

Automatic Meter Pointer Reading Based on Knowledge Distillation

R Sun, W Yang, F Zhang, Y **ang, H Wang… - … on Knowledge Science …, 2024 - Springer
With the rapid development of industrial automation, automatic reading of pointer meters has
become a trend of data monitoring and efficient measurement in the industrial field. In the …

Enhancing Image Generation Fidelity via Progressive Prompts

Z **ong, Y Li, C Yang, T Tan, Z Zhu, S Li… - arXiv preprint arXiv …, 2025 - arxiv.org
The diffusion transformer (DiT) architecture has attracted significant attention in image
generation, achieving better fidelity, performance, and diversity. However, most existing DiT …