Glitch tokens in large language models: Categorization taxonomy and effective detection

Y Li, Y Liu, G Deng, Y Zhang, W Song, L Shi… - Proceedings of the …, 2024 - dl.acm.org
With the expanding application of Large Language Models (LLMs) in various domains, it
becomes imperative to comprehensively investigate their unforeseen behaviors and …

An empirical study on large language models in accuracy and robustness under Chinese industrial scenarios

Z Li, W Qiu, P Ma, Y Li, Y Li, S He, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have witnessed the rapid development of large language models (LLMs) in
various domains. To better serve the large number of Chinese users, many commercial …

MORTAR: Metamorphic Multi-turn Testing for LLM-based Dialogue Systems

G Guo, A Aleti, N Neelofar… - arXiv preprint arXiv …, 2024 - arxiv.org
With the widespread application of LLM-based dialogue systems in daily life, quality
assurance has become more important than ever. Recent research has successfully …

Combating Missed Recalls in E-commerce Search: A CoT-Prompting Testing Approach

S Wu, Y Hu, Y Wang, J Gu, J Meng, L Fan… - … Proceedings of the …, 2024 - dl.acm.org
Search components in e-commerce apps, often complex AI-based systems, are prone to
bugs that can lead to missed recalls—situations where items that should be listed in search …

SPOLRE: Semantic Preserving Object Layout Reconstruction for Image Captioning System Testing

Y Liu, G Wang, X Zheng, G Deng, K Wang, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Image captioning (IC) systems, such as Microsoft Azure Cognitive Service, translate image
content into descriptive language but can generate inaccuracies leading to …

MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems

X Zhu, M Jiang, XY Zhang, L Nie, Z Ding - Proceedings of the ACM on …, 2024 - dl.acm.org
Abstractive summarization (AS) systems, which aim to generate a text for summarizing
crucial information of the original document, have been widely adopted in recent years …

VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing

Z Chang, M Li, J Wang, C Li, Q Wang - arXiv preprint arXiv:2403.02581, 2024 - arxiv.org
Visual entailment (VE) is a multimodal reasoning task consisting of image-sentence pairs
whereby a premise is defined by an image, and a hypothesis is described by a sentence …