A survey on evaluation of large language models
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …
Using large language models in psychology
Large language models (LLMs), such as OpenAI's GPT-4, Google's Bard or Meta's LLaMa,
have created unprecedented opportunities for analysing and generating language data on a …
A metaverse: Taxonomy, components, applications, and open challenges
SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …
BLEURT: Learning robust metrics for text generation
Text generation has made significant advances in the last few years. Yet, evaluation metrics
have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate …
HotpotQA: A dataset for diverse, explainable multi-hop question answering
Existing question answering (QA) datasets fail to train QA systems to perform complex
reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset …
Towards a human-like open-domain chatbot
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and
filtered from public domain social media conversations. This 2.6B-parameter neural network …
ChatEval: Towards better LLM-based evaluators through multi-agent debate
Text evaluation has historically posed significant challenges, often demanding substantial
labor and time cost. With the emergence of large language models (LLMs), researchers …
All that's 'human' is not gold: Evaluating human evaluation of generated text
Human evaluations are typically considered the gold standard in natural language
generation, but as models' fluency improves, how well can evaluators detect and judge …
MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance
A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …
Advances and challenges in conversational recommender systems: A survey
Recommender systems exploit interaction history to estimate user preference, having been
heavily used in a wide range of industry applications. However, static recommendation …