Identifying and mitigating the security risks of generative AI

C Barrett, B Boyd, E Bursztein, N Carlini… - … and Trends® in …, 2023 - nowpublishers.com
Every major technical invention resurfaces the dual-use dilemma—the new technology has
the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such …

Longbench: A bilingual, multitask benchmark for long context understanding

Y Bai, X Lv, J Zhang, H Lyu, J Tang, Z Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) demonstrate impressive performance for many
language tasks, most of them can only handle texts a few thousand tokens long, limiting their …

Embrace divergence for richer insights: A multi-document summarization benchmark and a case study on summarizing diverse information from news articles

KH Huang, P Laban, AR Fabbri, PK Choubey… - arXiv preprint arXiv …, 2023 - arxiv.org
Previous research in multi-document news summarization has typically concentrated on
collating information that all sources agree upon. However, to our knowledge, the …

L-Eval: Instituting standardized evaluation for long context language models

C An, S Gong, M Zhong, X Zhao, M Li, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, there has been growing interest in extending the context length of large language
models (LLMs), aiming to effectively process long inputs of one turn or conversations with …

Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training

J He, K Pan, X Dong, Z Song, LYB LiuYiBo… - Proceedings of the …, 2024 - aclanthology.org
While large language models (LLMs) are equipped with longer text input capabilities than
before, they are struggling to seek correct information in long contexts. The “lost in the …

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

C Zhang, LF D'Haro, Y Chen, M Zhang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Automatic evaluation is an integral aspect of dialogue system research. The traditional
reference-based NLG metrics are generally found to be unsuitable for dialogue assessment …

Never lost in the middle: Improving large language models via attention strengthening question answering

H Junqing, P Kunhao, D Xiaoqun, S Zhuoyang… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) are equipped with longer text input capabilities than
before, they are struggling to seek correct information in long contexts. The "lost in the …