Evaluation of OpenAI o1: Opportunities and challenges of AGI

T Zhong, Z Liu, Y Pan, Y Zhang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
This comprehensive study evaluates the performance of OpenAI's o1-preview large
language model across a diverse array of complex reasoning tasks, spanning multiple …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

A reasoning and value alignment test to assess advanced GPT reasoning

TR McIntosh, T Liu, T Susnjak, P Watters… - ACM Transactions on …, 2024 - dl.acm.org
In response to diverse perspectives on artificial general intelligence (AGI), ranging from
potential safety and ethical concerns to more extreme views about the threats it poses to …

Hallucination-free? Assessing the reliability of leading AI legal research tools

V Magesh, F Surani, M Dahl, M Suzgun… - arXiv preprint arXiv …, 2024 - arxiv.org
Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI).
Such tools are designed to assist with a wide range of core legal tasks, from search and …

OR-Bench: An over-refusal benchmark for large language models

J Cui, WL Chiang, I Stoica, CJ Hsieh - arXiv preprint arXiv:2405.20947, 2024 - arxiv.org
Large Language Models (LLMs) require careful safety alignment to prevent malicious
outputs. While significant research focuses on mitigating harmful content generation, the …

LocalValueBench: A collaboratively built and extensible benchmark for evaluating localized value alignment and ethical safety in large language models

GI Meadows, NWL Lau, EA Susanto, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of large language models (LLMs) requires robust evaluation of their
alignment with local values and ethical standards, especially as existing benchmarks often …

Efficient training and inference: Techniques for large language models using Llama

SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
Enhancing the efficiency of language models involves optimizing their training and
inference processes to reduce computational demands while maintaining high performance …

Automated summarization of multiple document abstracts and contents using large language models

O Langston, B Ashford - Authorea Preprints, 2024 - techrxiv.org
The exponential growth of textual data across various domains necessitates the
development of efficient and accurate summarization techniques to facilitate quick …