A survey on evaluation of large language models
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …
Survey on factuality in large language models: Knowledge, retrieval and domain-specificity
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …
Trustllm: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
Pandalm: An automatic evaluation benchmark for llm instruction tuning optimization
Instruction tuning large language models (LLMs) remains a challenging task, owing to the
complexity of hyperparameter selection and the difficulty involved in evaluating the tuned …
Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
Does fine-tuning LLMs on new knowledge encourage hallucinations?
When large language models are aligned via supervised fine-tuning, they may encounter
new factual information that was not acquired through pre-training. It is often conjectured that …
Investigating the factual knowledge boundary of large language models with retrieval augmentation
Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a
substantial amount of factual knowledge and often rely on external information for …
Leave no document behind: Benchmarking long-context llms with extended multi-doc qa
Long-context modeling capabilities of Large Language Models (LLMs) have garnered
widespread attention, leading to the emergence of LLMs with ultra-context windows …
Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in llms
Due to the widespread use of large language models (LLMs), we need to understand
whether they embed a specific “worldview” and what these views reflect. Recent studies …
Unveiling the clinical incapabilities: a benchmarking study of GPT-4V(ision) for ophthalmic multimodal image analysis
Purpose: To evaluate the capabilities and incapabilities of a GPT-4V(ision)-based chatbot in
interpreting ocular multimodal images. Methods: We developed a digital ophthalmologist app …