A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …
Benchmark data contamination of large language models: A survey
The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and
Gemini has transformed the field of natural language processing. However, it has also …
Chatbot Arena: An open platform for evaluating LLMs by human preference
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating the alignment with human preferences still poses significant challenges. To …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
Don't make your LLM an evaluation benchmark cheater
Large language models (LLMs) have greatly advanced the frontiers of artificial intelligence,
attaining remarkable improvement in model capacity. To assess the model performance, a …
LiveCodeBench: Holistic and contamination free evaluation of large language models for code
Large Language Models (LLMs) applied to code-related applications have emerged as a
prominent field, attracting significant interest from both academia and industry. However, as …
LLM Dataset Inference: Did you train on my dataset?
The proliferation of large language models (LLMs) in the real world has come with a rise in
copyright cases against companies for training their models on unlicensed data from the …
Bridging language and items for retrieval and recommendation
This paper introduces BLaIR, a series of pretrained sentence embedding models
specialized for recommendation scenarios. BLaIR is trained to learn correlations between …
Task contamination: Language models may not be few-shot anymore
Large language models (LLMs) offer impressive performance in various zero-shot and
few-shot tasks. However, their success in zero-shot or few-shot settings may be affected by task …
Hallucination-free? Assessing the reliability of leading AI legal research tools
Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI).
Such tools are designed to assist with a wide range of core legal tasks, from search and …