LexEval: A comprehensive Chinese legal benchmark for evaluating large language models

H Li, Y Chen, Q Ai, Y Wu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Large language models (LLMs) have made significant progress in natural language
processing tasks and demonstrate considerable potential in the legal domain. However …

A survey on large language models for critical societal domains: Finance, healthcare, and law

ZZ Chen, J Ma, X Zhang, N Hao, A Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as
GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law …

A comprehensive survey on generative AI for metaverse: enabling immersive experience

V Chamola, S Sai, A Bhargava, A Sahu, W Jiang… - Cognitive …, 2024 - Springer
Generative Artificial Intelligence models are Artificial Intelligence models that
generate new content based on a prompt or input. The output content can be in various …

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Pre-trained language models for keyphrase prediction: A review

U Muhammad, T Sultana, YK Lee - ICT Express, 2024 - Elsevier
Keyphrase Prediction (KP) is essential for identifying keyphrases in a document that can
summarize its content. However, recent Natural Language Processing (NLP) advances have …

Exploiting privacy vulnerabilities in open source LLMs using maliciously crafted prompts

G Choquet, A Aizier, G Bernollin - 2024 - researchsquare.com
The proliferation of AI technologies has brought to the forefront concerns regarding the
privacy and security of user data, particularly with the increasing deployment of powerful …

Multi-modal and multi-agent systems meet rationality: A survey

B Jiang, Y Xie, X Wang, WJ Su, CJ Taylor… - ICML 2024 Workshop …, 2024 - openreview.net
Rationality is characterized by logical thinking and decision-making that align with evidence
and logical rules. This quality is essential for effective problem-solving, as it ensures that …

Potential of multimodal large language models for data mining of medical images and free-text reports

Y Zhang, Y Pan, T Zhong, P Dong, K Xie, Y Liu, H Jiang… - Meta-Radiology, 2024 - Elsevier
Medical images and radiology reports are essential for physicians to diagnose medical
conditions. However, the vast diversity and cross-source heterogeneity inherent in these …

RUPBench: Benchmarking reasoning under perturbations for robustness evaluation in large language models

Y Wang, Y Zhao - arXiv preprint arXiv:2406.11020, 2024 - arxiv.org
With the increasing use of large language models (LLMs), ensuring reliable performance in
diverse, real-world environments is essential. Despite their remarkable achievements, LLMs …

Programming refusal with conditional activation steering

BW Lee, I Padhi, KN Ramamurthy, E Miehling… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs have shown remarkable capabilities, but precisely controlling their response behavior
remains challenging. Existing activation steering methods alter LLM behavior …