[CITATION] Reasoning with transformer-based models: Deep learning, but shallow reasoning

C Helwe, C Clavel, F Suchanek - International Conference on …, 2021 - imt.hal.science
Recent years have seen impressive performance of transformer-based models on different
natural language processing tasks. However, it is not clear to what degree the transformers …

Do as I can, not as I say: Grounding language in robotic affordances

M Ahn, A Brohan, N Brown, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models can encode a wealth of semantic knowledge about the world. Such
knowledge could be extremely useful to robots aiming to act upon high-level, temporally …

Evaluating large language models at evaluating instruction following

Z Zeng, J Yu, T Gao, Y Meng, T Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …

Language models show human-like content effects on reasoning tasks

I Dasgupta, AK Lampinen, SCY Chan… - arXiv preprint arXiv …, 2022 - arxiv.org
Reasoning is a key ability for an intelligent system. Large language models (LMs) achieve
above-chance performance on abstract reasoning tasks, but exhibit many imperfections …

Chinese CLIP: Contrastive vision-language pretraining in Chinese

A Yang, J Pan, J Lin, R Men, Y Zhang, J Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and
application of contrastive learning for vision-language pretraining. In this work, we construct …

CRUXEval: A benchmark for code reasoning, understanding and execution

A Gu, B Rozière, H Leather, A Solar-Lezama… - arXiv preprint arXiv …, 2024 - arxiv.org
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …
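The CRUXEval snippet above describes items built from short Python functions. As a rough illustration only, assuming an input-prediction / output-prediction framing, the sketch below checks a model's guess by executing the function rather than by string comparison. The function `f` and the helpers `check_output_prediction` and `check_input_prediction` are hypothetical and are not taken from the benchmark or its tooling.

```python
# Hypothetical CRUXEval-style item: a short function plus one input.
# This function is illustrative only and is NOT taken from the benchmark.
def f(text: str) -> str:
    words = text.split()  # split on runs of whitespace
    return "-".join(w[::-1] for w in words)  # reverse each word, join with "-"


def check_output_prediction(func, arg, predicted_output) -> bool:
    """Output prediction (assumed framing): does the guessed output match execution?"""
    return func(arg) == predicted_output


def check_input_prediction(func, predicted_arg, expected_output) -> bool:
    """Input prediction (assumed framing): does the guessed input reproduce the output?"""
    return func(predicted_arg) == expected_output


sample_input = "ab cd"
print(f(sample_input))  # prints ba-dc
print(check_output_prediction(f, sample_input, "ba-dc"))  # prints True
print(check_input_prediction(f, "ab cd", "ba-dc"))  # prints True
```

Input prediction can have several correct answers: because `str.split()` collapses runs of whitespace, `"ab  cd"` (two spaces) also maps to `"ba-dc"`. Checking by execution accepts every valid input, whereas matching against one reference input would not.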

Consistency analysis of ChatGPT

ME Jang, T Lukasiewicz - arXiv preprint arXiv:2303.06273, 2023 - arxiv.org
ChatGPT has gained a huge popularity since its introduction. Its positive aspects have been
reported through many media platforms, and some analyses even showed that ChatGPT …

ProsocialDialog: A prosocial backbone for conversational agents

H Kim, Y Yu, L Jiang, X Lu, D Khashabi, G Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances
by either ignoring or passively agreeing with them. To address this issue, we introduce …

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Small models are valuable plug-ins for large language models

C Xu, Y Xu, S Wang, Y Liu, C Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are
often publicly unavailable and their immense sizes make the models difficult to be tuned with …