A Survey on LLM-as-a-Judge
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …
MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control
Autonomous agents powered by large language models (LLMs) show promising potential in
assistive tasks across various domains, including mobile device control. As these agents …
AI Cyber Risk Benchmark: Automated Exploitation Capabilities
We introduce a new benchmark for assessing AI models' capabilities and risks in automated
software exploitation, focusing on their ability to detect and exploit vulnerabilities in real …
SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
Multimodal foundation models (MFMs) represent a significant advancement in artificial
intelligence, combining diverse data modalities to enhance learning and understanding …
The AI Agent Index
Leading AI developers and startups are increasingly deploying agentic AI systems that can
plan and execute complex tasks with limited human involvement. However, there is currently …
Benchmarking OpenAI o1 in Cyber Security
We evaluate OpenAI's o1-preview and o1-mini models, benchmarking their performance
against the earlier GPT-4o model. Our evaluation focuses on their ability to detect …
Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads
A Maurya - 2024 - search.proquest.com
The exponential growth of data-intensive scientific simulations and deep learning workloads
presents significant challenges for high-performance computing (HPC) systems. These …