So good! I learned so much, though now I will spend the rest of my night going down a rabbit hole on dirigibles :)
Question, though: isn't there a distinction in that, in scaling LLMs, you get emergent effects (like induction heads, etc.)? Does the analogy hold?
I think the idea is that the same abilities may emerge with much less training/compute if, for example, the models are allowed to explore virtual worlds rather than just learning from static/passive input
Yes, as Gary mentions, similar abilities but with broader generalization can emerge with less training.
Emergence is treated as some kind of unexplainable magic right now because we don't understand all the mechanisms. One could say that 'better stability emerged in larger blimps'. We don't say that because we know how/why it happens. I think we will figure out some of the reasons for the emergent capabilities.
I feel like we have a conceptual-level understanding of what's happening with these models (though judging by a lot of the discourse, it seems to be quite a counter-intuitive idea): in the course of predicting words, a system ends up learning world models (outlines of physical laws, topics of conversation, a ton of common-sense reasoning), not because we necessarily talk about these things directly, but because inferring such models helps make sense of the observed language and therefore lowers prediction error. Humans do this too when we learn, but arguably not to the same extent; instead, we learn much more by actively trying things out and observing the consequences.
I somewhat disagree about that. I don't think the model infers physical laws. Those are all expressed in words online, and sophisticated word filling-in using statistics is good enough to look like physical reasoning. (Not downplaying statistics in any way...I think it is cool). I don't think physical laws are inferable from words, but ways of filling in words describing physical laws can be learned from text describing physical laws.
What the model has are interesting generalization patterns for sequences, and when applied over language it looks smart and looks like it is inferring things. It learns small automata that can mix slots and content, because the architecture has biases that force it that way. I will write more about this soon... hopefully as a paper.
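For what it's worth, here is a toy illustration of the kind of "slots vs. content" sequence rule being described, written as an induction-head-style predictor. This is my own sketch, not the mechanism from the forthcoming paper: the structure "find an earlier repeat and copy its successor" is the slot machinery, and the actual tokens are the content.

```python
# Toy sketch of a "slots vs. content" sequence rule: an induction-head-like predictor
# that copies whatever token followed the most recent earlier occurrence of the current
# token, regardless of what the tokens mean.

def induction_predict(tokens):
    last = tokens[-1]
    # Scan backwards for an earlier occurrence of the current token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]  # copy the token that followed it last time
    return None  # no repeat found, no prediction

print(induction_predict(["the", "blue", "dirigible", "rose", ";", "the", "blue"]))  # -> "dirigible"
```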
Interested to hear your reasoning! In particular, whether you feel the model architectures aren't able, even in principle, to store these sorts of models, or whether they just can't be reached via gradient descent, etc.
Spot on, Dileep! Couldn't agree more!
Great article; I am very fond of your analysis and insights regarding AI, especially from brain-based AI approaches. We don't know how GPT_N++ will work out in a decade, but my intuition is that these systems will not create the value we think they will; we will need humans to shape them consistently and constantly, guide them, and set narrow guidelines. Of course, when no reliability is needed they will look like magical, general AIs, but for large time t they will show their problems and inabilities.
Thank you for the welcome!
Great read with insightful information that enlightens the mind. Recently I tried my hand at some tech writing; AI is coming, and I thought I'd put together a rounded article. Would love your eyes and insights 😊 https://tumbleweedwords.substack.com/p/ai-enters-our-everyday-reflections
Good article. Found one typo:
1919: First non-stop transatlantic airplane fight. ->
1919: First non-stop transatlantic airplane flight.
Thank you, Dileep, for bringing out a very different aspect that people outside the pure AI world did not realise... looking forward to the next blog.
Ok, should we try again to make a paper plane that actually flies?
If there are yet-to-be-found principles/ideas that might work at a much smaller scale (and later be scaled up), investigating that area shouldn't be too expensive.
Of course, yes.
During the dirigible days, were there people who said, "No one will find the principles of aerodynamics," like some people today (me, for example!) who say, "We will never find the principles of intelligence"? All we can hope to do is "dirigibles"!
It is a bunch (a huge bunch) of small things put together that acquires an emergent property that we call "intelligence"...?
>there are also avenues to combine the strengths of current approaches with an investigation into future architectures.
Yes, hybrids should be powerful. Especially when new architectures can read from the intermediate layers of LLMs (and, perhaps, write to them).
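To make that concrete, here is a minimal sketch of reading intermediate-layer activations that a hybrid module could consume, assuming the HuggingFace transformers library with GPT-2 standing in for the LLM (an illustration, not a prescription):

```python
# Minimal sketch: read an LLM's intermediate-layer activations so another module can
# consume them. Assumes the HuggingFace `transformers` library, with GPT-2 as a stand-in.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("Dirigibles scaled; airplanes needed new principles.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding layer plus one tensor per transformer block,
# each of shape (batch, sequence_length, hidden_dim).
middle = outputs.hidden_states[len(outputs.hidden_states) // 2]
print(middle.shape)
# A hybrid architecture could read from `middle` here; writing back would need forward
# hooks that overwrite activations, which is more invasive and not shown.
```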
This is where the really beautiful analogy of LLMs as dirigibles stops working. A hybrid between an airplane and a dirigible is not a promising idea. A hybrid between a shiny new architecture and an LLM, though, might be what one needs to progress from excelling at "toy problems" to convincingly beating stand-alone LLMs across the board.
I agree that scaling up LLMs will hit a dead end. But people are going way beyond that. They are using LLMs as a language interface between tools, and those tools know what they are doing individually. They are reinforcing LLMs with human feedback, and a framework that can be taught by human means rather than explicit coding is a huge deal. People are adding verification to LLMs, so you can tell the model that a portion of its output is wrong and needs to be redone. It is too early to say that it won't grow.
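A rough sketch of that verification pattern (the function arguments here are placeholders, not any particular framework's API): generate a draft, flag the portions that look wrong, and feed them back so the model redoes them.

```python
from typing import Callable, List

def generate_with_verification(
    prompt: str,
    llm_generate: Callable[[str], str],   # placeholder: any text-in / text-out model call
    verify: Callable[[str], List[str]],   # placeholder: returns descriptions of wrong portions
    max_retries: int = 3,
) -> str:
    """Generate a draft, verify it, and ask the model to redo flagged portions."""
    draft = llm_generate(prompt)
    for _ in range(max_retries):
        problems = verify(draft)  # could be a tool, a human reviewer, or another model
        if not problems:
            return draft
        feedback = "These portions look wrong; please redo them: " + "; ".join(problems)
        draft = llm_generate(prompt + "\n" + feedback + "\nPrevious draft:\n" + draft)
    return draft  # best effort after exhausting retries
```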
Looking forward to this line of work!!!
Great analogy and helpful perspective for evaluating and extrapolating the performance of these exciting large ML models, compared to what we would really like to have.