Feedback loops with language models drive in-context reward hacking

A Pan, E Jones, M Jagadeesan, J Steinhardt - ar…

… the Design Space of Teachable Social Media Feed Experiences
KJK Feng, X Koo, L Tan, A Bruckman… - Proceedings of the CHI …, 2024 - dl.acm.org
Social media feeds are deeply personal spaces that reflect individual values and
preferences. However, top-down, platform-wide content algorithms can reduce users' sense …