Google Scholar

Glitch tokens in large language models: Categorization taxonomy and effective detection

Y Li, Y Liu, G Deng, Y Zhang, W Song, L Shi… - Proceedings of the …, 2024 - dl.acm.org

With the expanding application of Large Language Models (LLMs) in various domains, it
becomes imperative to comprehensively investigate their unforeseen behaviors and …

Cited by 18

An empirical study on large language models in accuracy and robustness under Chinese industrial scenarios

Z Li, W Qiu, P Ma, Y Li, Y Li, S He, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent years have witnessed the rapid development of large language models (LLMs) in
various domains. To better serve the large number of Chinese users, many commercial …

Cited by 3

MORTAR: Metamorphic Multi-turn Testing for LLM-based Dialogue Systems

G Guo, A Aleti, N Neelofar… - arXiv preprint arXiv …, 2024 - arxiv.org

With the widespread application of LLM-based dialogue systems in daily life, quality
assurance has become more important than ever. Recent research has successfully …

Cited by 1

Combating Missed Recalls in E-commerce Search: A CoT-Prompting Testing Approach

S Wu, Y Hu, Y Wang, J Gu, J Meng, L Fan… - … Proceedings of the …, 2024 - dl.acm.org

Search components in e-commerce apps, often complex AI-based systems, are prone to
bugs that can lead to missed recalls—situations where items that should be listed in search …

Cited by 1

SPOLRE: Semantic Preserving Object Layout Reconstruction for Image Captioning System Testing

Y Liu, G Wang, X Zheng, G Deng, K Wang, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Image captioning (IC) systems, such as Microsoft Azure Cognitive Service, translate image
content into descriptive language but can generate inaccuracies leading to …

MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems

X Zhu, M Jiang, XY Zhang, L Nie, Z Ding - Proceedings of the ACM on …, 2024 - dl.acm.org

Abstractive summarization (AS) systems, which aim to generate a text for summarizing
crucial information of the original document, have been widely adopted in recent years …

VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing

Z Chang, M Li, J Wang, C Li, Q Wang - arXiv preprint arXiv:2403.02581, 2024 - arxiv.org

Visual entailment (VE) is a multimodal reasoning task consisting of image-sentence pairs
whereby a premise is defined by an image, and a hypothesis is described by a sentence …

MTTM: metamorphic testing for textual content moderation software. In 2023 IEEE/ACM 45th...

Glitch tokens in large language models: Categorization taxonomy and effective detection
