Yes, LLMs do not have the human-level capacity to produce and use mental models. And it seems likely that if AGI is to be developed, it will have to include such a capacity for mental modelling.

How might such a capacity be installed in AI? A recent paper of mine, published in the journal Biosystems, outlines how mental modelling evolved and develops in humans. This understanding can offer insights into how the capacity could be incorporated into AI. The paper, titled "The Evolution and Development of Consciousness: The Subject-Object Emergence Hypothesis", is freely available here: https://www.sciencedirect.com/science/article/pii/S0303264722000752


Awesome article! My question is: do abstract concepts and reasoning share more structure with the sensorimotor domain or the language domain?

Aug 8, 2023 · Liked by Dileep George

Interesting post. I agree with your general framing: people constantly make reference to mental models that are developed in large part through physical experiences, not acquired from language.

However, I believe that many useful mental models of the physical world CAN be constructed from language alone. Assertions that this is impossible typically seem to rest on thinking of the form, "Well, *I* don't see how it can be done, so it must be impossible..." That line of reasoning seems inadequate to me, and I hope you won't fall prey to it!

More compelling to me are experiments with toy models trained on (somewhat contrived) language from which one can extract and validate the learned mental model from the model parameters. Here are a few examples:

- Given a list of statements of the form "San Francisco is west of Reno" that give the position of one US city relative to another, a simple model generates a reasonably accurate map, which can be extracted from the model parameters and shown graphically. This seems to be exactly the sort of mental model that people use to reason about the physical world.

- Extending this example, if a model is then given statements of the form "Fargo is in North Dakota", a simple model learns state boundaries. After the city positions and boundaries are learned, the model can quite accurately guess which state contains a city outside this second-stage training data. Again, this "mental model" map can be extracted and displayed graphically. A human might acquire such a model by walking, driving, or looking at a map, but language alone is actually sufficient.

- Given a list of simple arithmetic assertions, even a toy language model develops algorithms to perform basic arithmetic. In some cases, these algorithms are comprehensible enough to extract and explain. This works even though the model starts with no a priori notion of quantity or addition, which a human might acquire through physical experience, e.g. seeing a group of two sheep join a group of three sheep.
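To make the first bullet concrete, here is a minimal toy sketch of the idea (my own reconstruction, not the cited experiments, and the city facts are invented for illustration): learn one 2D point per city by gradient descent on hinge-style constraints derived from "A is west of B" sentences, then read the "mental map" straight out of the learned parameters.

```python
import random

# Invented training "sentences", reduced to (city_a, relation, city_b) triples
facts = [
    ("SanFrancisco", "west_of", "Reno"),
    ("Reno", "west_of", "SaltLakeCity"),
    ("SaltLakeCity", "west_of", "Denver"),
    ("SanFrancisco", "south_of", "Portland"),
    ("Denver", "south_of", "Billings"),
]

cities = {c for a, _, b in facts for c in (a, b)}
random.seed(0)
# The "model parameters": one learnable (x, y) point per city
pos = {c: [random.uniform(-1, 1), random.uniform(-1, 1)] for c in cities}

lr, margin = 0.05, 1.0
for _ in range(2000):
    for a, rel, b in facts:
        axis = 0 if rel == "west_of" else 1  # x for west/east, y for south/north
        # "a west_of b" means a's x should sit at least `margin` left of b's;
        # nudge both points whenever the constraint is violated (a hinge loss)
        violation = pos[a][axis] + margin - pos[b][axis]
        if violation > 0:
            pos[a][axis] -= lr * violation
            pos[b][axis] += lr * violation

# The learned "mental map" is extracted directly from the parameters
west_to_east = sorted(cities, key=lambda c: pos[c][0])
print(west_to_east)
```

Of course the real experiments train a language model on raw text rather than pre-parsed triples; this sketch only shows the core point that relational sentences alone pin down a consistent spatial layout that can be read out of the parameters.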

In summary, I believe that learning mental models of the physical world from language alone is MUCH HARDER than learning from language together with physical experience, but I've seen no evidence that this is impossible. To the contrary, available evidence proves this is possible in some nontrivial cases, and I expect many more examples will follow.

Aug 3, 2023 · Liked by Dileep George

Hi, thanks for continuing to share.

The real question is not so much whether an LLM trained on text acquires the same world model as humans, but rather whether the generic prediction capabilities of LLMs can also work with sensorimotor data streams besides text.

Experiments with transformers integrating image data with text look promising in this regard.

To fully answer the question, the challenge would lie in designing and engineering the "correct" encoding of such streams and the (simulated, I guess) body-world-perception-action "experiencing" to feed an LLM.

Even your work on short-sequence world models suggests that when everything (not only words) is encoded (and remembered) as a set of short sequences ("perceptual phrases", if you like), the agent has to learn the "stitching" or proximity relationships among those sequences in order to build a convincing world model.

So I wouldn't bet against an LLM's ability to build a world model, if it is fed the right training data.


Great article! Thank you!

You might add as examples hearing a recipe and then following it, or hearing how to reach a distant place and then travelling there; the mental models at the beginning and at the end will be different.

If you add primitives related to each facet of sensorimotor activity to my model (https://ling.auf.net/lingbuzz/007345 and https://alexandernaumenko.substack.com/), it could be a good starting point or a boost for your research. I would love to join!


I think the recent trend of combining LLMs with external tools, like calculators and simulators, is a step towards building that pyramid you drew, but starting from the top. This is the opposite of the order of evolution, which started with locomotion, then sensation, then planning, and finally communication.
