MINT: Evaluating LLMs in multi-turn interaction with tools and language feedback

X Wang, Z Wang, J Liu, Y Chen, L Yuan… - arXiv preprint arXiv …, 2023 - arxiv.org

To solve complex tasks, large language models (LLMs) often require multiple rounds of
interactions with the user, sometimes assisted by external tools. However, current evaluation …

Cited by 98 · All 3 versions

Agent-as-a-judge: Evaluate agents with agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

Cited by 15 · All 2 versions

EHRAgent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records

W Shi, R Xu, Y Zhuang, Y Yu, J Zhang… - Proceedings of the …, 2024 - aclanthology.org

Clinicians often rely on data engineers to retrieve complex patient information from
electronic health record (EHR) systems, a process that is both inefficient and time …

Cited by 16 · All 2 versions

Advancing LLM reasoning generalists with preference trees

L Yuan, G Cui, H Wang, N Ding, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art …

Cited by 60 · All 3 versions

Generative AI Agents for Knowledge Work Augmentation in Finance

S Ganesh, L Ardon, D Borrajo, D Garg… - Annual Review of …, 2024 - annualreviews.org

The development of software agents that can autonomously take actions to achieve goals
has been a long-standing foundational objective in the field of AI. Recent advances in …

Learning to use tools via cooperative and interactive agents

Z Shi, S Gao, X Chen, Y Feng, L Yan, H Shi… - arXiv preprint arXiv …, 2024 - arxiv.org

Tool learning empowers large language models (LLMs) as agents to use external tools and
extend their utility. Existing methods employ one single LLM-based agent to iteratively select …

Cited by 11 · All 3 versions

CodexGraph: Bridging large language models and code repositories via code graph databases

X Liu, B Lan, Z Hu, Y Liu, Z Zhang, F Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and
MBPP, but struggle with handling entire code repositories. This challenge has prompted …

Cited by 9 · All 3 versions

A Single Transformer for Scalable Vision-Language Modeling

Y Chen, X Wang, H Peng, H Ji - arXiv preprint arXiv:2407.06438, 2024 - arxiv.org

We present SOLO, a single transformer for Scalable visiOn-Language mOdeling. Current
large vision-language models (LVLMs) such as LLaVA mostly employ heterogeneous …

Cited by 6 · All 4 versions

WaitGPT: Monitoring and steering conversational LLM agent in data analysis with on-the-fly code visualization

L Xie, C Zheng, H Xia, H Qu, C Zhu-Tian - Proceedings of the 37th …, 2024 - dl.acm.org

Large language models (LLMs) support data analysis through conversational user
interfaces, as exemplified in OpenAI's ChatGPT (formally known as Advanced Data Analysis …

Cited by 4 · All 5 versions

ToolSandbox: A stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities

J Lu, T Holleis, Y Zhang, B Aumayer, F Nan… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent large language models (LLMs) advancements sparked a growing research interest
in tool assisted LLMs solving real-world challenges, which calls for comprehensive …

Cited by 13 · All 3 versions
