[CITATION][C] Reasoning with transformer-based models: Deep learning, but shallow reasoning
C Helwe, C Clavel, F Suchanek - International Conference on …, 2021 - imt.hal.science
Recent years have seen impressive performance of transformer-based models on different
natural language processing tasks. However, it is not clear to what degree the transformers …
Do as I can, not as I say: Grounding language in robotic affordances
M Ahn, A Brohan, N Brown, Y Chebotar… - arxiv preprint arxiv …, 2022 - arxiv.org
Large language models can encode a wealth of semantic knowledge about the world. Such
knowledge could be extremely useful to robots aiming to act upon high-level, temporally …
Evaluating large language models at evaluating instruction following
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …
Language models show human-like content effects on reasoning tasks
I Dasgupta, AK Lampinen, SCY Chan… - arxiv preprint arxiv …, 2022 - arxiv.org
Reasoning is a key ability for an intelligent system. Large language models (LMs) achieve
above-chance performance on abstract reasoning tasks, but exhibit many imperfections …
Chinese CLIP: Contrastive vision-language pretraining in Chinese
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and
application of contrastive learning for vision-language pretraining. In this work, we construct …
CRUXEval: A benchmark for code reasoning, understanding and execution
A Gu, B Rozière, H Leather, A Solar-Lezama… - arxiv preprint arxiv …, 2024 - arxiv.org
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …
Consistency analysis of ChatGPT
ME Jang, T Lukasiewicz - arxiv preprint arxiv:2303.06273, 2023 - arxiv.org
ChatGPT has gained a huge popularity since its introduction. Its positive aspects have been
reported through many media platforms, and some analyses even showed that ChatGPT …
ProsocialDialog: A prosocial backbone for conversational agents
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances
by either ignoring or passively agreeing with them. To address this issue, we introduce …
Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models
H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arxiv preprint arxiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …
Small models are valuable plug-ins for large language models
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are
often publicly unavailable and their immense sizes make the models difficult to be tuned with …