Great post! I highly recommend looking into the literature on assistance games, a specific proposal for getting AI to infer our intentions rather than optimize a prespecified reward. See, e.g., https://arxiv.org/abs/1606.03137

I think this area will become very relevant very soon, not just from a safety perspective but also for expanding the set of tasks that AI can take on; as you mention, reward design is not a great strategy.

Thanks, and thank you for the pointer. I'm looking into it.