source: https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart.html
Increasing efficiency is permeating almost every aspect of our society. If the thing that is being made more efficient is beneficial, then the increased efficiency makes the world a better place (overall, the world seems to be becoming a better place). If the thing that is being made more efficient is socially harmful, then the consequences of greater efficiency are scary or depressing (think mass surveillance, or robotic weapons). What about the most common case though — where the thing we are making more efficient is related, but not identical, to beneficial outcomes? What happens when we get better at something which is merely correlated with outcomes we care about?
In that case, we can overfit, the same as we do in machine learning. The outcomes we care about will improve for a while … and then they will grow dramatically worse.
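To make the overfitting analogy concrete, here is a minimal toy sketch. It is not taken from any real system: the functions `goal` and `proxy`, the constants in them, the starting point, and the learning rate are all hypothetical choices made for illustration. The proxy is built to be correlated with the goal at first but to keep rewarding further pushing after the goal has peaked. Gradient ascent on the proxy then improves the goal for a few steps and afterwards drives it below where it started.

```python
# Toy illustration only: every function and constant below is an assumption
# chosen to make the effect visible, not a model of any real system.

def goal(x: float) -> float:
    """The outcome we actually care about. It is highest at x = 3."""
    return 6 * x - x ** 2


def proxy(x: float) -> float:
    """The thing we measure and optimize. It equals the goal plus a term
    that keeps rewarding larger x, so it is highest at x = 7."""
    return goal(x) + 8 * x


def proxy_gradient(x: float) -> float:
    """Derivative of proxy(x) = 14x - x^2 with respect to x."""
    return 14 - 2 * x


def optimize_proxy(steps: int = 30, learning_rate: float = 0.1):
    """Run gradient ascent on the proxy and record (x, proxy, goal) at each step."""
    x = 0.0
    history = [(x, proxy(x), goal(x))]
    for _ in range(steps):
        x += learning_rate * proxy_gradient(x)
        history.append((x, proxy(x), goal(x)))
    return history


history = optimize_proxy()
goals = [g for _, _, g in history]
peak_step = goals.index(max(goals))  # step at which the true goal was best

for step, (x, p, g) in enumerate(history):
    if step % 5 == 0 or step == peak_step:
        print(f"step {step:2d}  x={x:5.2f}  proxy={p:6.2f}  goal={g:6.2f}")
```

In this sketch the proxy never stops improving, so an optimizer that watches only the proxy has no signal telling it to stop. Stopping near `peak_step` would be the analogue of early stopping in machine learning.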
Below are a few, possibly facile, examples applying this analogy.
Goal: Educate children well
Proxy: Measure student and school performance on standardized tests
Strong version of Goodhart’s law leads to: Schools narrowly focus on teaching students to answer questions like those on the test, at the expense of the underlying skills the test is intended to measure
Goal: Rapid progress in science
Proxy: Pay researchers a cash bonus for every publication
Strong version of Goodhart’s law leads to: Publication of incorrect or incremental results, collusion between reviewers and authors, research paper mills
Goal: A well-lived life
Proxy: Maximize the reward pathway in the brain
Strong version of Goodhart’s law leads to: Substance addiction, gambling addiction, days lost to doomscrolling Twitter
Goal: Healthy population
Proxy: Access to nutrient-rich food
Strong version of Goodhart’s law leads to: Obesity epidemic
Goal: Leaders that act in the best interests of the population
Proxy: Leaders that have the most support in the population
Strong version of Goodhart’s law leads to: Leaders whose expertise and passions center narrowly around manipulating public opinion at the expense of social outcomes
Goal: An informed, thoughtful, and involved populace
Proxy: The ease with which people can share and find ideas
Strong version of Goodhart’s law leads to: Filter bubbles, conspiracy theories, parasitic memes, escalated tribalism
Goal: Distribution of labor and resources based upon the needs of society
Proxy: Capitalism
Strong version of Goodhart’s law leads to: Massive wealth disparities (with incomes ranging from hundreds of dollars per year to hundreds of dollars per second), with more than a billion people living in poverty
Goal: The owners of Paperclips Unlimited, LLC, become wealthy
Proxy: Number of paperclips made by the AI-run manufacturing plant
Strong version of Goodhart’s law leads to: The entire solar system, including the company owners, being converted to paperclips