- Academic Search

A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Cited by 2071

Using large language models in psychology

D Demszky, D Yang, DS Yeager, CJ Bryan… - Nature Reviews …, 2023 - nature.com

Large language models (LLMs), such as OpenAI's GPT-4, Google's Bard or Meta's LLaMa,
have created unprecedented opportunities for analysing and generating language data on a …

Cited by 191

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org

Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

Cited by 1849

BLEURT: Learning robust metrics for text generation

T Sellam, D Das, AP Parikh - arXiv preprint arXiv:2004.04696, 2020 - arxiv.org

Text generation has made significant advances in the last few years. Yet, evaluation metrics
have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate …

Cited by 1491

HotpotQA: A dataset for diverse, explainable multi-hop question answering

Z Yang, P Qi, S Zhang, Y Bengio, WW Cohen… - arXiv preprint arXiv …, 2018 - arxiv.org

Existing question answering (QA) datasets fail to train QA systems to perform complex
reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset …

Cited by 2363

Towards a human-like open-domain chatbot

D Adiwardana, MT Luong, DR So, J Hall… - arXiv preprint arXiv …, 2020 - arxiv.org

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and
filtered from public domain social media conversations. This 2.6B parameter neural network …

Cited by 1167

ChatEval: Towards better LLM-based evaluators through multi-agent debate

CM Chan, W Chen, Y Su, J Yu, W Xue, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Text evaluation has historically posed significant challenges, often demanding substantial
labor and time cost. With the emergence of large language models (LLMs), researchers …

Cited by 332

All that's 'human' is not gold: Evaluating human evaluation of generated text

E Clark, T August, S Serrano, N Haduong… - arXiv preprint arXiv …, 2021 - arxiv.org

Human evaluations are typically considered the gold standard in natural language
generation, but as models' fluency improves, how well can evaluators detect and judge …

Cited by 412

MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance

W Zhao, M Peyrard, F Liu, Y Gao, CM Meyer… - arXiv preprint arXiv …, 2019 - arxiv.org

A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …

Cited by 652

Advances and challenges in conversational recommender systems: A survey

C Gao, W Lei, X He, M de Rijke, TS Chua - AI Open, 2021 - Elsevier

Recommender systems exploit interaction history to estimate user preference, having been
heavily used in a wide range of industry applications. However, static recommendation …

Cited by 301

Why we need new evaluation metrics for NLG

A survey on evaluation of large language models
