A survey on evaluation of large language models
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …
Gemma 2: Improving open language models at a practical size
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …
AI generates covertly racist decisions about people based on their dialect
Hundreds of millions of people now interact with language models, with uses ranging from
help with writing, to informing hiring decisions. However, these language models are known …
Larger and more instructable language models become less reliable
The prevailing methods to make large language models more powerful and amenable have
been based on continuous scaling up (that is, increasing their size, data volume and …
Fairness in serving large language models
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …
Online speculative decoding
Speculative decoding is a pivotal technique to accelerate the inference of large language
models (LLMs) by employing a smaller draft model to predict the target model's outputs …
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the
MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to …
Power-aware Deep Learning Model Serving with μ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …
AgentOhana: Design unified data and training pipeline for effective agent learning
Autonomous agents powered by large language models (LLMs) have garnered significant
research attention. However, fully harnessing the potential of LLMs for agent-based tasks …
Generative language models exhibit social identity biases
Social identity biases, particularly the tendency to favor one's own group (ingroup solidarity)
and derogate other groups (outgroup hostility), are deeply rooted in human psychology and …