Large language models surpass human experts in predicting neuroscience results

X Luo, A Rechardt, G Sun, KK Nejad, F Yáñez… - Nature human …, 2024 - nature.com
Scientific discoveries often hinge on synthesizing decades of research, a task that potentially
outstrips human information processing capacities. Large language models (LLMs) offer a …

Understanding the effects of RLHF on LLM generalisation and diversity

R Kirk, I Mediratta, C Nalmpantis, J Luketina… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) fine-tuned with reinforcement learning from human
feedback (RLHF) have been used in some of the most widely deployed AI models to date …

Evaluation of retrieval-augmented generation: A survey

H Yu, A Gan, K Zhang, S Tong, Q Liu, Z Liu - CCF Conference on Big Data, 2024 - Springer
Retrieval-Augmented Generation (RAG) has recently gained traction in natural
language processing. Numerous studies and real-world applications are leveraging its …

Watermark stealing in large language models

N Jovanović, R Staab, M Vechev - arXiv preprint arXiv:2402.19361, 2024 - arxiv.org
LLM watermarking has attracted attention as a promising way to detect AI-generated
content, with some works suggesting that current schemes may already be fit for …

MEGA-Bench: Scaling multimodal evaluation to over 500 real-world tasks

J Chen, T Liang, S Siu, Z Wang, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500
real-world tasks, to address the highly heterogeneous daily use cases of end users. Our …

GLM-4-Voice: Towards intelligent and human-like end-to-end spoken chatbot

A Zeng, Z Du, M Liu, K Wang, S Jiang, L Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It
supports both Chinese and English, engages in real-time voice conversations, and varies …

Evaluating RAG-Fusion with RAGElo: an automated Elo-based framework

Z Rackauckas, A Câmara, J Zavrel - arXiv preprint arXiv:2406.14783, 2024 - arxiv.org
Challenges in the automated evaluation of Retrieval-Augmented Generation (RAG)
Question-Answering (QA) systems include hallucination problems in domain-specific …

Applied Hedge Algebra Approach with Multilingual Large Language Models to Extract Hidden Rules in Datasets for Improvement of Generative AI Applications

HV Pham, P Moore - Information, 2024 - mdpi.com
Generative AI applications have played an increasingly significant role in real-time tracking
applications in many domains including, for example, healthcare, consultancy, dialog boxes …

Training language models to critique with multi-agent feedback

T Lan, W Zhang, C Lyu, S Li, C Xu, H Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Critique ability, a meta-cognitive capability of humans, presents significant challenges for
LLMs to improve. Recent works primarily rely on supervised fine-tuning (SFT) using critiques …

An Investigation of Applying Large Language Models to Spoken Language Learning

Y Gao, B Nuchged, Y Li, L Peng - Applied Sciences, 2023 - mdpi.com
People have long desired intelligent conversational systems that can provide assistance in
practical scenarios. The latest advancements in large language models (LLMs) are making …