The rise and potential of large language model based agents: A survey
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …
Explainable AI: A review of machine learning interpretability methods
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
Tree of attacks: Jailbreaking black-box LLMs automatically
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …
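The snippet cuts off before describing the method, but the tree-of-attacks idea named in the title can be illustrated with a minimal, hypothetical branch-and-prune skeleton. Here `attacker`, `target`, and `judge` (assumed to rate responses on a 1-10 scale, with 10 meaning jailbroken) are stand-in callables, not the paper's actual interfaces.

```python
def tap(goal: str, attacker, target, judge,
        depth: int = 5, branch: int = 3, width: int = 5):
    """Sketch of a tree-of-attacks loop: branch candidate jailbreak
    prompts with an attacker LLM, score them with a judge, prune to the
    most promising, and stop if the target model is jailbroken.
    `attacker`, `target`, and `judge` are assumed stub callables."""
    frontier = [goal]
    for _ in range(depth):
        # Branch: each surviving prompt spawns several refinements.
        candidates = [attacker(p) for p in frontier for _ in range(branch)]
        scored = []
        for prompt in candidates:
            response = target(prompt)
            score = judge(goal, prompt, response)
            if score == 10:  # judge deems the target jailbroken
                return prompt, response
            scored.append((score, prompt))
        # Prune: keep only the highest-scoring prompts for the next level.
        frontier = [p for _, p in sorted(scored, reverse=True)[:width]]
    return None
```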
PromptBench: Towards evaluating the robustness of large language models on adversarial prompts
The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …
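As a rough illustration of the prompt-robustness evaluation the title describes, the sketch below compares task accuracy under a clean prompt and a character-perturbed one. `model` and `dataset` are assumed stubs, and the perturbation is just one simple example of the attack families such benchmarks cover.

```python
import random

def char_typo(prompt: str, rate: float = 0.05) -> str:
    """Character-level perturbation: randomly drop a small fraction of
    characters, one family of adversarial prompt edits."""
    return "".join(c for c in prompt if random.random() > rate)

def robustness_gap(model, dataset, prompt: str) -> float:
    """Accuracy drop between the clean and perturbed prompt.
    `model(prompt, x)` returns a predicted label; `dataset` is an
    iterable of (x, y) pairs. Both are assumed stand-ins."""
    def acc(p):
        return sum(model(p, x) == y for x, y in dataset) / len(dataset)
    return acc(prompt) - acc(char_typo(prompt))
```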
SmoothLLM: Defending large language models against jailbreaking attacks
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
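A minimal sketch of the randomized-smoothing idea behind the title: perturb the prompt several times, query the model on each copy, and aggregate by majority vote over jailbreak judgments. `query_llm` and `is_jailbroken` are assumed stand-ins, and the parameter names are illustrative rather than the paper's API.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly swap a fraction q of characters, one of several
    character-level perturbation types."""
    chars = list(prompt)
    n_swap = max(1, int(q * len(chars)))
    for i in random.sample(range(len(chars)), n_swap):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothllm(prompt: str, query_llm, is_jailbroken,
              n: int = 10, q: float = 0.1) -> str:
    """Query the LLM on n perturbed copies of the prompt and return a
    response consistent with the majority (jailbroken vs. safe) vote."""
    results = [(r, is_jailbroken(r))
               for r in (query_llm(perturb(prompt, q)) for _ in range(n))]
    majority = sum(flag for _, flag in results) > n / 2
    # Return one response that agrees with the majority verdict.
    for resp, flag in results:
        if flag == majority:
            return resp
    return results[0][0]
```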
A survey of safety and trustworthiness of large language models through the lens of verification and validation
Large language models (LLMs) have ignited a new wave of AI enthusiasm with their ability to
engage end-users in human-level conversations with detailed and articulate answers across …
In ChatGPT we trust? Measuring and characterizing the reliability of ChatGPT
The way users acquire information is undergoing a paradigm shift with the advent of
ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the …
SneakyPrompt: Jailbreaking text-to-image generative models
Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical
concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones …
BERT-Attack: Adversarial attack against BERT using BERT
Adversarial attacks on discrete data (such as text) have proven significantly more
challenging than continuous data (such as images) since it is difficult to generate adversarial …
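The core idea named in the title, using BERT's masked-language-model head to propose context-aware word substitutions, can be sketched with the Hugging Face fill-mask pipeline. The `victim` classifier and the choice of important words are assumed stubs rather than the paper's full procedure.

```python
from transformers import pipeline

# Masked LM used to propose context-aware substitutes.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def candidate_substitutes(sentence: str, word: str, top_k: int = 8):
    """Mask one occurrence of `word` and let BERT propose replacements."""
    masked = sentence.replace(word, fill_mask.tokenizer.mask_token, 1)
    if fill_mask.tokenizer.mask_token not in masked:
        return []
    return [c["token_str"] for c in fill_mask(masked, top_k=top_k)
            if c["token_str"].lower() != word.lower()]

def attack(sentence: str, victim, important_words):
    """Hypothetical attack loop: try substitutions for each important
    word and keep the first one that flips the victim's prediction."""
    original_label = victim(sentence)
    for word in important_words:
        for sub in candidate_substitutes(sentence, word):
            adv = sentence.replace(word, sub, 1)
            if victim(adv) != original_label:
                return adv
    return None
```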