Google Scholar

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arxiv preprint arxiv …, 2024 - arxiv.org

Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Cited by 25

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

J Chen, X Li, X Ye, C Li, Z Fan, H Zhao - arxiv preprint arxiv:2404.04363, 2024 - arxiv.org

With the success of 2D diffusion models, 2D AIGC content has already transformed our lives.
Recently, this success has been extended to 3D AIGC, with state-of-the-art methods …

Cited by 4

The AI Agent Index

S Casper, L Bailey, R Hunter, C Ezell, E Cabalé… - arxiv preprint arxiv …, 2025 - arxiv.org

Leading AI developers and startups are increasingly deploying agentic AI systems that can
plan and execute complex tasks with limited human involvement. However, there is currently …

Cited by 1

MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control

J Lee, D Hahm, JS Choi, WB Knox, K Lee - arxiv preprint arxiv …, 2024 - arxiv.org

Autonomous agents powered by large language models (LLMs) show promising potential in
assistive tasks across various domains, including mobile device control. As these agents …

Cited by 1

AI Cyber Risk Benchmark: Automated Exploitation Capabilities

D Ristea, V Mavroudis, C Hicks - arxiv preprint arxiv:2410.21939, 2024 - arxiv.org

We introduce a new benchmark for assessing AI models' capabilities and risks in automated
software exploitation, focusing on their ability to detect and exploit vulnerabilities in real …

Cited by 1

G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems

S Wang, G Zhang, M Yu, G Wan, F Meng, C Guo… - arxiv preprint arxiv …, 2025 - arxiv.org

Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated
remarkable capabilities in various complex tasks, ranging from collaborative problem …

The Science of Evaluating Foundation Models

J Yuan, J Zhang, A Wen, X Hu - arxiv preprint arxiv:2502.09670, 2025 - arxiv.org

The emergent phenomena of large foundation models have revolutionized natural language
processing. However, evaluating these models presents significant challenges due to their …

SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach

R Sun, J Chang, H Pearce, C Xiao, B Li, Q Wu… - arxiv preprint arxiv …, 2024 - arxiv.org

Multimodal foundation models (MFMs) represent a significant advancement in artificial
intelligence, combining diverse data modalities to enhance learning and understanding …

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

K Zhou, C Liu, X Zhao, S Jangam, J Srinivasa… - arxiv preprint arxiv …, 2025 - arxiv.org

The rapid development of large reasoning models, such as OpenAI-o3 and DeepSeek-R1,
has led to significant improvements in complex reasoning over non-reasoning large …

Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers

D Beaglehole, A Radhakrishnan, E Boix-Adserà… - arxiv preprint arxiv …, 2025 - arxiv.org

A trained Large Language Model (LLM) contains much of human knowledge. Yet, it is
difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always "know …

Saved to My Library

AgentHarm: A benchmark for measuring harmfulness of LLM agents

A Survey on LLM-as-a-Judge

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

The AI Agent Index

MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control

AI Cyber Risk Benchmark: Automated Exploitation Capabilities

G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems

The Science of Evaluating Foundation Models

SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers