The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Explainable ai: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Tree of attacks: Jailbreaking black-box llms automatically

A Mehrotra, M Zampetakis… - Advances in …, 2025 - proceedings.neurips.cc
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …

Promptbench: Towards evaluating the robustness of large language models on adversarial prompts

K Zhu, J Wang, J Zhou, Z Wang, H Chen… - arxiv e …, 2023 - ui.adsabs.harvard.edu
The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …

Smoothllm: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arxiv preprint arxiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have ignited a new wave of AI interest through their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

In chatgpt we trust? measuring and characterizing the reliability of chatgpt

X Shen, Z Chen, M Backes, Y Zhang - arxiv preprint arxiv:2304.08979, 2023 - arxiv.org
The way users acquire information is undergoing a paradigm shift with the advent of
ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the …

Sneakyprompt: Jailbreaking text-to-image generative models

Y Yang, B Hui, H Yuan, N Gong… - 2024 IEEE symposium on …, 2024 - ieeexplore.ieee.org
Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical
concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones …

Bert-attack: Adversarial attack against bert using bert

L Li, R Ma, Q Guo, X Xue, X Qiu - arxiv preprint arxiv:2004.09984, 2020 - arxiv.org
Adversarial attacks on discrete data (such as text) have proved significantly more
challenging than those on continuous data (such as images), since it is difficult to generate adversarial …