
Interested to hear your reasoning! In particular, do you feel the model architectures aren't able to store these sorts of models even in theory, or that they could in principle but such models just can't be reached in practice using gradient descent etc.?
