A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
Because" out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …

Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation

N Díaz-Rodríguez, J Del Ser, M Coeckelbergh… - Information …, 2023 - Elsevier
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements
sustained over three main pillars that should be met throughout the system's entire life cycle …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Can large language models be an alternative to human evaluations?

CH Chiang, H Lee - arXiv preprint arXiv:2305.01937, 2023 - arxiv.org
Human evaluation is indispensable and inevitable for assessing the quality of texts
generated by machine learning models or written by humans. However, human evaluation is …

Representation engineering: A top-down approach to AI transparency

A Zou, L Phan, S Chen, J Campbell, P Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we identify and characterize the emerging area of representation engineering
(RepE), an approach to enhancing the transparency of AI systems that draws on insights …

Tree of attacks: Jailbreaking black-box LLMs automatically

A Mehrotra, M Zampetakis… - Advances in …, 2025 - proceedings.neurips.cc
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …

Trustworthy LLMs: a survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …