Welcome to the exciting dirigibles era of AI

Dileep George

Mar 30, 2023

Notes for navigating large language models and beyond...

Read →

19 Comments

David Rimshnick

Mar 30, 2023

So good! I learned so much though now will spend rest of my night going down a rabbit hole on dirigibles :)

Question though - isn't there a distinction in that, in scaling LLMs, you get emergent effects (like induction heads etc). Does the analogy hold?

Expand full comment

Reply (2)

Gary Lupyan

Mar 30, 2023

I think the idea is that the same abilities may emerge with much less training if, for example, the models are allowed to explore virtual worlds rather than just learning from static/passive input

Expand full comment

Gary Lupyan

Mar 30, 2023

I think the idea is that the same abilities may emerge with much less training/compute if, for example, the models are allowed to explore virtual worlds rather than just learning from static/passive input

Expand full comment

Reply (1)

Dileep George

Mar 30, 2023

Yes, as Gary mentions, similar abilities but with broader generalization can emerge with less training.

Emergence is treated as some kind of unexplainable magic right now because we don't understand all the mechanisms. One could say that 'better stability emerged in larger blimps'. We don't say that because we know how/why it happens. I think we will figure out some of the reasons for the emergent capabilities.

Expand full comment

Reply (1)

Gary Lupyan

Mar 30, 2023Edited

I feel like we have a conceptual-level understanding of what's happening w/ these models (though judging by a lot of the discourse, it seems like a (very?) counter-intuitive idea: In the course of predicting words, a system ends up learning world models (outlines of physical laws, topics of conversation, a ton of common sense reasoning) -- not because we necessarily talk about these things but because inferring such models helps make sense of the observed language and therefore lower prediction error. Humans too do this when we learn, but arguably not to the same extent and instead we learn much more by actively trying things out and observing the consequences.

Expand full comment

Reply (1)

Dileep George

Mar 30, 2023Edited

I somewhat disagree about that. I don't think the model infers physical laws. Those are all expressed in words online, and sophisticated word filling-in using statistics is good enough to look like physical reasoning. (Not downplaying statistics in any way...I think it is cool). I don't think physical laws are inferable from words, but ways of filling in words describing physical laws can be learned from text describing physical laws.

What the model has is interesting generalization patterns for sequences, and when applied over language it looks smart and looks like it is inferring things. It learns small automata that can mix slots and content, because the architecture has biases for forcing it that way. I will write more about this soon...hopefully as a paper.

Expand full comment

Reply (1)

David Rimshnick

Apr 4, 2023

Interested to hear your reasoning! In particular, whether you feel the model architecture aren't able to theoretically store these sorts of models or whether they just aren't achievable to be built using gradient descent etc.

Expand full comment

Dmitri Chklovskii

Mar 30, 2023

Spot on, Dileep! Couldn't agree more!

Expand full comment

Magi_in_the_mountains

Apr 8, 2023

great article, I am very fond of your analysis and insights regarding AI and especially from brain based AI approaches. We don't know how the GPT_N++ will work out in a decade etc but it is my intuition that these systems will not create the value we think they would, we will need humans to shape them consistently and constantly, guide them and create narrow guidelines. Of course when no reliability is needed they will look magical and general AI's but for large time (t) they will show their problems and inabilities.

Expand full comment

David

Apr 4, 2023

Thank you for the welcome!

Great read with insightful information that enlightens the mind. Recently I tried my hand at some tech writing, AI is coming and I thought I’d put together a rounded article—would love your eyes and insights 😊 https://tumbleweedwords.substack.com/p/ai-enters-our-everyday-reflections

Expand full comment

mrbeastly

Apr 3, 2023

Good article. Found one typo:

1919: First non-stop transatlantic airplane fight. ->

1919: First non-stop transatlantic airplane flight.

Expand full comment

Rajaram

Apr 2, 2023

Thank you Dileep for bringing out a very different aspect which people outside the pure AI world did not realise...looking forward to the next blog

Expand full comment

Cezar T

Apr 2, 2023

Ok, should we try again to make a paper plane that actually flies?

If there are yet to be found principles/ideas that might work on a much smaller scale (and later scaled up), investigating that area shouldn't be too expensive.

Expand full comment

Reply (1)

Dileep George

Apr 3, 2023

of course yes.

Expand full comment

Mar 31, 2023

During the dirigible days, were there people who said, "No one will find the principles of aerodynamics" like some people today (me for example!) who say, "We will never find the principles of intelligence"? - all we can hope to do is "dirigibles"!

It is a bunch (huge bunch) of small things put together that acquires an emergent property that we call "intelligence" . . .?

Expand full comment

anhinga

Aug 8, 2023

>there are also avenues to combine the strengths of current approaches with an investigation into future architectures.

Yes, hybrids should be powerful. Especially when new architectures can read from the intermediate layers of LLMs (and, perhaps, write to them).

This is where this really beautiful analogy with LLMs as dirigibles stops working. A hybrid between an airplane and a dirigible is not a promising idea. A hybrid between a shiny new architecture and an LLM might be what one needs to progress from excelling in "toy problems" to convincingly beating stand-alone LLMs across the board.

Expand full comment

Oleg Alexandrov

May 7, 2023

I agree that scaling up LLM will hit a dead end. But people are going way beyond that. They are using LLM as language interface between tools, and those tools know what they are doing individually. They are reinforcing LLM with human feedback, and a framework that can be taught by human means rather than explicit coding is a huge deal. People are adding verification to LLM, so you could tell it, that a portion of its output is wrong, so it needs to be redone. It is too early to early to say that it won't grow.

Expand full comment

lukaemon

Apr 13, 2023

Looking forward to this line of work!!!

Expand full comment

Dad

Apr 12, 2023

Great analogy and helpful perspective for evaluating and extrapolating the performance of these exciting large ML models, compared to what we would really like to have.

Expand full comment

Artificial General Ideas

Welcome to the exciting dirigibles era of AI