Google Scholar

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Cited by 10

MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control

J Lee, D Hahm, JS Choi, WB Knox, K Lee - arXiv preprint arXiv …, 2024 - arxiv.org

Autonomous agents powered by large language models (LLMs) show promising potential in
assistive tasks across various domains, including mobile device control. As these agents …

Cited by 1

AI Cyber Risk Benchmark: Automated Exploitation Capabilities

D Ristea, V Mavroudis, C Hicks - arXiv preprint arXiv:2410.21939, 2024 - arxiv.org

We introduce a new benchmark for assessing AI models' capabilities and risks in automated
software exploitation, focusing on their ability to detect and exploit vulnerabilities in real …

Cited by 1

SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach

R Sun, J Chang, H Pearce, C Xiao, B Li, Q Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal foundation models (MFMs) represent a significant advancement in artificial
intelligence, combining diverse data modalities to enhance learning and understanding …

The AI Agent Index

S Casper, L Bailey, R Hunter, C Ezell, E Cabalé… - arXiv preprint arXiv …, 2025 - arxiv.org

Leading AI developers and startups are increasingly deploying agentic AI systems that can
plan and execute complex tasks with limited human involvement. However, there is currently …

Benchmarking OpenAI o1 in Cyber Security

D Ristea, V Mavroudis, C Hicks - arXiv preprint arXiv:2410.21939, 2024 - researchgate.net

We evaluate OpenAI's o1-preview and o1-mini models, benchmarking their performance
against the earlier GPT-4o model. Our evaluation focuses on their ability to detect …

Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads

A Maurya - 2024 - search.proquest.com

The exponential growth of data-intensive scientific simulations and deep learning workloads
presents significant challenges for high-performance computing (HPC) systems. These …
