Discussion about this post

Frans Zdyb

Great post! I highly recommend looking into the literature on assistance games, which is a specific proposal for how to get AI to infer our intentions rather than optimize a prespecified reward. See e.g. https://arxiv.org/abs/1606.03137

I think this area will become very relevant very soon, not just from a safety perspective, but even just for expanding the set of tasks that AI can take on - as you mention, reward design is not a great strategy.

Melanie Mitchell

Nice essay.

"While AI understands your end goal, its lack of commonsense means that it didn’t understand that it was not supposed to destroy the earth in the process as a sub-goal."

Many "AI safety" folks have countered that the issue is *not* that the hypothetical superintelligent AI doesn't *understand* that this is not what humans intended, but that it is programmed or trained to only *care* about goal/subgoals that it is explicitly given. I personally don't find this plausible, but this is how people have responded to me when I have critiqued such hypotheticals.

Anyway, great discussion.
