" My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
The open-ended nature of language generation makes the evaluation of autoregressive
large language models (LLMs) challenging. One common evaluation approach uses …
CVQA: Culturally-diverse multilingual visual question answering benchmark
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used
to test the ability of vision-language models to understand and reason on knowledge …
Are Large Language Models Consistent over Value-laden Questions?
Large language models (LLMs) appear to bias their survey answers toward certain values.
Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are …
Take care of your prompt bias! Investigating and mitigating prompt bias in factual knowledge extraction
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias"
in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels …
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Predictable behavior from scaling advanced AI systems is an extremely desirable property.
Although a well-established literature exists on how pretraining performance scales, the …
Look at the text: Instruction-tuned language models are more robust multiple choice selectors than you think
Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large
language models (LLMs). One common way to evaluate the model response is to rank the …
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Vision and language model (VLM) decoders are currently the best-performing architectures
on multimodal tasks. Next to answers, they are able to produce natural language …
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering
A Khatun, DG Brown - arXiv preprint arXiv:2401.07955, 2024 - arxiv.org
The widespread adoption of Large Language Models (LLMs) has become commonplace,
particularly with the emergence of open-source models. More importantly, smaller models …
(Perhaps) beyond human translation: Harnessing multi-agent collaboration for translating ultra-long literary texts
Recent advancements in machine translation (MT) have significantly enhanced translation
quality across various domains. However, the translation of literary texts remains a …
Benchmarking Distributional Alignment of Large Language Models
Language models (LMs) are increasingly used as simulacra for people, yet their ability to
match the distribution of views of a specific demographic group and be distributionally …