Codenames as a Benchmark for Large Language Models
In this paper, we propose the popular word-based board game Codenames as a
benchmark for evaluating the reasoning capabilities of Large Language Models …