A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

TrustLLM: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv …, 2024 - mosis.eecs.utk.edu
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

On the robustness of ChatGPT: An adversarial and out-of-distribution perspective

J Wang, X Hu, W Hou, H Chen, R Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
ChatGPT is a recent chatbot service released by OpenAI and is receiving increasing
attention over the past few months. While evaluations of various aspects of ChatGPT have …

PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization

Y Wang, Z Yu, Z Zeng, L Yang, C Wang, H Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Instruction tuning large language models (LLMs) remains a challenging task, owing to the
complexity of hyperparameter selection and the difficulty involved in evaluating the tuned …

PromptBench: Towards evaluating the robustness of large language models on adversarial prompts

K Zhu, J Wang, J Zhou, Z Wang, H Chen… - arXiv e …, 2023 - ui.adsabs.harvard.edu
The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

Risk taxonomy, mitigation, and assessment benchmarks of large language model systems

T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have strong capabilities in solving diverse natural language
processing tasks. However, the safety and security issues of LLM systems have become the …