Great ideas! Your framing of the HC as learning sequences of sensory perceptions makes so much more sense.

I have a question regarding the HC in humans, which I suppose handles learning sequences of sensory as well as more abstract "features". I'm also assuming that intelligence is a repurposing of navigation capabilities in a hyperspace of mental representations, and that problem solving is finding a mental "way" to an abstract "target".

In your paper you give the HC planning and schema-transfer capabilities, i.e., requirements for general intelligence. In patient H.M. specifically, intelligence was normal even without an HC (in Milner's 1967 paper, they even saw an improvement in arithmetic capabilities), so he was able to use previously learned sequences of abstract representations to solve daily tasks. How do you think he was able to achieve this without an HC?

This is something I'm still figuring out from the literature. Just started reading about it.

Absolutely lovely visuals and explanations. I've thought a lot about this too, but I'm glad someone has tackled it more thoroughly.

To turn sequence learning into a map, does the agent require exhaustive exploration of the environment, or can it somehow extrapolate or predict its destination if it, say, takes a shortcut? I always struggled with this missing component in the discussions at Numenta and have been working on finding a solution.

Not all input spaces can be exhaustively explored like the rat maze, so we would need some form of vector addition or cognitive-space interpolation to navigate complex environments. Vectors don't seem to exist in the brain the way we hope (directions and magnitudes do), so tools like sequence learning, perceptual clones, place cells, and grid cells need to be combined somehow to enable this spatial interpolation.

Thank you!

Schemas would be one way to speed up learning to explore unknown environments. Have a look at the schemas section in the write-up. Schemas can also be used hierarchically for exploration and planning.

Interesting! Not quite what I had in mind, but it was still a very illuminating approach. We can revisit my question later.

I can tell that your latent graphs are a kind of "computer-sciencey" way of describing cortical minicolumns :). I too have come up with several different names for this. The latest I used was cribbed from HMMs. In particular, your observations I would call "observable states" and your clones I called "hidden states". Then you could do a lot of mathematical nonsense like taking the Cartesian product of observable and hidden states to build all possible elements, and then the powerset of that is the space of all possible activations of the network. This diversion led to comfortable mathematical notation but not very much insight except for things like capacity.
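
To make that counting concrete (my notation, not anything from the post or the paper):

```latex
% My notation, not from the paper:
% O = observable states, H = hidden (clone) states.
E = O \times H, \qquad |E| = |O|\,|H|
% The space of all possible network activations is then the powerset of E:
\mathcal{A} = 2^{E}, \qquad |\mathcal{A}| = 2^{|O|\,|H|}
```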

I like your exposition on your version of schemas and how to accomplish a binding. I had actually despaired of understanding how role-binding could work in cortical circuits, but your approach has shown me the structural similarity between roles and knowledge generalization.

# Questions:

1) Would you view your concept of the emissions matrix as similar to the role of the thalamus in routing sensory data to the cortex?

I too have thought of this, but you could also use other cortical columns to perform the role of the emissions matrix. It's essentially mapping new data onto a trained computational unit. There needs to be some mechanism that keeps track of all the available trained mappings and either selects the appropriate one for the situation or learns a brand-new mapping.
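
A toy sketch of the bookkeeping I have in mind (every name here, `select_emission_matrix`, `emission_bank`, the threshold, is hypothetical, and the per-step scoring is a crude shortcut; a real system would run a proper HMM forward pass):

```python
import numpy as np

def select_emission_matrix(obs_seq, emission_bank, threshold=-5.0):
    """Hypothetical sketch: pick the stored emission matrix that best
    explains a new observation stream, or signal that a new mapping is
    needed. `emission_bank` is a list of (n_clones x n_symbols) matrices;
    the score is a crude per-step log-likelihood under a uniform
    clone-occupancy assumption."""
    scores = []
    for E in emission_bank:
        p_obs = E.mean(axis=0)[obs_seq]           # P(symbol) averaged over clones
        scores.append(np.log(p_obs + 1e-12).mean())
    best = int(np.argmax(scores))
    if scores[best] < threshold:                  # nothing fits well enough,
        return None                               # so the caller learns a new mapping
    return best
```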

2) In these demos, is there an "action" component to the state transitions, or is it just based on time + observations?

It seems there must be. I'm inferring that you are selecting from a set of 4 possible discrete actions, all with the same step distance and no possibility of error. From there, you are using the implicit topology of the learned graph for your navigational needs. There is no inherent sense of "direction" in this representation, just the traversal of a graph. (Boo)
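
To illustrate what I mean by "just traversing a graph", a toy sketch (mine, not code from the demos): with action-labelled edges, navigation reduces to plain graph search, with no vector arithmetic anywhere.

```python
from collections import deque

def plan(transitions, start, goal):
    """BFS over learned action-labelled edges. `transitions` maps
    (clone_state, action) -> next clone_state; returns an action
    sequence. No notion of direction, only the learned topology."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for (s, a), nxt in transitions.items():
            if s == state and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None  # goal unreachable in the learned graph

# Tiny 2x2 grid (states 0,1 on top, 2,3 below), 4 discrete actions:
T = {(0, "right"): 1, (1, "down"): 3, (0, "down"): 2, (2, "right"): 3,
     (1, "left"): 0, (3, "up"): 1, (2, "up"): 0, (3, "left"): 2}
print(plan(T, 0, 3))  # -> ['right', 'down']
```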

How are you ensuring that you have only one state anchored per discrete location? Do you have a latent-state merge step, or are you keeping track of your position to choose the proper latent state?

# End of Questions

With regard to grid cells, I see that you're inclined to treat grid cells as unreliable indicators of position. In this, we agree. I've thought a lot about what grid cells are actually for, and how they might be used in the context of building discrete cortical networks like your CSCGs. I have a lot of ideas on this.

The thing I've been stuck on is how grid cell modules are changed based on action inputs. That is, they could be hardwired with a fixed set of actions (like your 4 discrete steps), but how would we create, learn, and manage an "emissions matrix" for multi-modal motor efferents and route these to a fixed set of grid cell modules? And from the motor-emissions+grid-module system, can we compute a required motor action to go from one grid state to another? This relates to my first question about whether you can interpolate an action based on a desired state (a kind of vector computation).

The architectural questions are very interesting.

Yes... the connection to HMMs is quite explicit actually... you can see it in our paper.

1) I'm not sure the emission matrix is similar to the thalamus... emission matrices would map to the dentate gyrus. I can see a similarity in terms of the gating mechanism (which I haven't explained at all in the current paper... it comes into play when multiple emission bindings are involved).

See this paper for what I think about the thalamus: https://www.biorxiv.org/content/10.1101/2020.09.09.290601v1.full

2) Yes, all the arrows going from clones to clones are labelled by the actions.

Different action labels help, but interestingly the latent topology can be learned even when actions are not available (equivalent to saying there is only one action -- "next"), provided the aliasing is not too severe. Of course, having different action labels makes the learning quicker and better.
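
To picture the two cases (illustrative shapes only, not our implementation): the transitions form an action-conditioned tensor, and "no actions" just collapses it to a single "next" slice.

```python
import numpy as np

n_clones, n_actions = 50, 4   # made-up sizes for illustration

# With action labels: T[a, i, j] ~ P(next clone j | clone i, action a)
T_actions = np.random.dirichlet(np.ones(n_clones), size=(n_actions, n_clones))

# Without action labels: a single implicit action, "next"
T_next = np.random.dirichlet(np.ones(n_clones), size=(1, n_clones))

assert T_actions.shape == (n_actions, n_clones, n_clones)
assert T_next.shape == (1, n_clones, n_clones)
```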

We have experimented with noisy actions, and the learning is reasonably robust to such things. This is not described in the current paper, of course. Also, the actions do not have to land the agent in exactly the same spots (which will not happen if the actions do not transport you exact distances every time).

"How are you ensuring that you have only one state anchored per discrete location? Do you have a latent state merge step or are you keep tracking of your position to choose the proper latent state?"

No, we are not keeping track of the position -- that would be cheating.

We do the standard soft-EM learning followed by a Viterbi-EM. This results in a consolidation of the graphs. Have a look at the supplementary material of the paper... there are some figures that show how this works.
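
Roughly, the pipeline looks like this in a toy action-free setting (a minimal sketch under those assumptions, not the actual code from the paper): soft EM re-estimates the transitions from expected counts, and the final Viterbi-EM pass rebuilds them from hard path counts, which is what consolidates the graph.

```python
import numpy as np

def forward_backward(T, obs_lik):
    """Scaled forward-backward for fixed clone->symbol emissions.
    T: (S, S) transitions; obs_lik: (N, S) 0/1 state likelihoods.
    Returns expected transition counts (xi)."""
    N, S = obs_lik.shape
    alpha, beta, c = np.zeros((N, S)), np.zeros((N, S)), np.zeros(N)
    alpha[0] = obs_lik[0] / S
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ T) * obs_lik[n]
        c[n] = alpha[n].sum(); alpha[n] /= c[n]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = T @ (obs_lik[n + 1] * beta[n + 1]) / c[n + 1]
    xi = np.zeros((S, S))
    for n in range(N - 1):
        xi += T * np.outer(alpha[n], obs_lik[n + 1] * beta[n + 1]) / c[n + 1]
    return xi

def viterbi(T, obs_lik):
    """Most likely clone path: the hard assignments for the Viterbi-EM pass."""
    N, S = obs_lik.shape
    logT = np.log(T + 1e-12)
    delta = np.log(obs_lik[0] + 1e-12) - np.log(S)
    back = np.zeros((N, S), dtype=int)
    for n in range(1, N):
        scores = delta[:, None] + logT
        back[n] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(obs_lik[n] + 1e-12)
    path = np.empty(N, dtype=int); path[-1] = delta.argmax()
    for n in range(N - 2, -1, -1):
        path[n] = back[n + 1, path[n + 1]]
    return path

def train(obs, clone_of_symbol, n_iters=50):
    """Soft EM on the transitions, then one Viterbi-EM consolidation pass."""
    S = sum(len(c) for c in clone_of_symbol)
    obs_lik = np.zeros((len(obs), S))
    for n, o in enumerate(obs):                # deterministic emissions:
        obs_lik[n, clone_of_symbol[o]] = 1.0   # each clone emits one symbol
    T = np.random.dirichlet(np.ones(S), size=S)
    for _ in range(n_iters):                   # soft EM: expected-count M-step
        xi = forward_backward(T, obs_lik) + 1e-10
        T = xi / xi.sum(axis=1, keepdims=True)
    path = viterbi(T, obs_lik)                 # Viterbi-EM: hard-count M-step
    counts = np.full((S, S), 1e-10)
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# e.g. 3 symbols with 2 clones each:
# T = train(np.array([0, 1, 2, 1] * 25),
#           [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])])
```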

I'll think about the grid cells more...
