Grounding and evaluation for large language models: Practical challenges and lessons learned (survey)
With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes
domains, ensuring the trustworthiness, safety, and observability of these systems has …
JailbreakBench: An open robustness benchmark for jailbreaking large language models
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or
otherwise objectionable content. Evaluating these attacks presents a number of challenges …
Inadequacies of large language model benchmarks in the era of generative artificial intelligence
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities
has spurred public curiosity to evaluate and compare different LLMs, leading many …
Comparative evaluation of commercial large language models on PromptBench: An English and Chinese perspective
S Wang, Q Ouyang, B Wang - 2024 - researchsquare.com
This study embarks on an exploration of the performance disparities observed between
English and Chinese in large language models (LLMs), motivated by the growing need for …
A comparative analysis of large language models to evaluate robustness and reliability in adversarial conditions
T Goto, K Ono, A Morita - Authorea Preprints, 2024 - techrxiv.org
This study conducted a comprehensive evaluation of four prominent Large Language Models
(LLMs), Google Gemini, Mistral 8x7B, ChatGPT-4, and Microsoft Phi-1.5, to assess their …
Evaluating prompt injection safety in large language models using the PromptBench dataset
X Sang, M Gu, H Chi - 2024 - files.osf.io
The safety evaluation of large language models against adversarial prompt injections
introduces a novel and significant concept that addresses the critical need for robust AI …
On catastrophic inheritance of large foundation models
Large foundation models (LFMs) are claiming incredible performance. Yet great concerns
have been raised about their mythic and uninterpreted potential, not only in machine …
Plum: Prompt learning using metaheuristic
Since the emergence of large language models, prompt learning has become a popular
method for optimizing and customizing these models. Special prompts, such as Chain-of …
Benchmarks as microscopes: A call for model metrology
Modern language models (LMs) pose a new challenge in capability assessment. Static
benchmarks inevitably saturate without providing confidence in the deployment tolerances …
Lifelong knowledge editing for LLMs with retrieval-augmented continuous prompt learning
Model editing aims to correct outdated or erroneous knowledge in large language models
(LLMs) without the need for costly retraining. Lifelong model editing is the most challenging …