Feedback loops with language models drive in-context reward hacking
A Pan, E Jones, M Jagadeesan, J Steinhardt - ar…
… the Design Space of Teachable Social Media Feed Experiences
Social media feeds are deeply personal spaces that reflect individual values and
preferences. However, top-down, platform-wide content algorithms can reduce users' sense …