GPT-3 and the Future of AGI
With much fanfare within the Machine Learning and broader tech community, OpenAI recently released GPT-3. As the name indicates, GPT-3 is the third version of OpenAI's transformer-based neural network language model, the Generative Pre-trained Transformer (GPT).
There are many good articles and technical primers about the model that you can read here, here and here. A friend also shared this blog with me, which illustrates in a human-friendly way some of the interesting and amazing implications of the technology.
But in quick summary: GPT-3 continues the work on large pre-trained transformer architectures popularized by models like BERT, and makes further progress toward strong general-purpose language models that can be used for a wide variety of natural language processing (NLP) and understanding tasks. Rather than building a custom model for each NLP problem or task, these large pre-trained language models can transfer their knowledge to a new domain of NLP problems. This was the major breakthrough in NLP pre-training and transfer learning that emerged just a few years ago.
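To make the transfer-learning idea concrete, here is a minimal sketch of pointing a single pre-trained model at different NLP tasks without training anything task-specific from scratch. It assumes the Hugging Face transformers library and its pipeline helper, which the post itself doesn't mention, and the tasks and inputs are purely illustrative:

```python
# Minimal sketch of transfer learning with a pre-trained language model.
# Assumes the Hugging Face `transformers` library is installed; the tasks
# and example inputs below are illustrative, not from the original post.
from transformers import pipeline

# One pre-trained model can back a classic downstream task out of the box.
sentiment = pipeline("sentiment-analysis")
print(sentiment("GPT-3 produces impressive results."))

# Zero-shot classification: no task-specific labeled training data required.
classifier = pipeline("zero-shot-classification")
print(classifier(
    "I would like to take out a loan for a construction project.",
    candidate_labels=["banking", "travel", "cooking"],
))
```

The point of the sketch is simply that the heavy lifting (the language knowledge) is already in the pre-trained weights; each new task reuses it rather than starting over.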
What's interesting about these models, unlike supervised learning tasks, is that humans don't need to be involved to label data or train the system manually. The system is pre-trained on a huge corpus of text, including Wikipedia (something like ~500GB of text), has 175B parameters, and reportedly cost around $12M in compute to train. It's a big model :)
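The "no human labels" point is easy to see in code. Below is a toy sketch, in plain Python with an invented snippet of text, of how a language-modeling dataset builds itself: each training pair is just a window of raw text plus the word that follows it, so the corpus supplies its own labels. Real models work on subword tokens over hundreds of gigabytes of text, not whitespace-split words:

```python
# Toy sketch of self-supervised language-model training data.
# The text and window size are illustrative only.
text = "the quick brown fox jumps over the lazy dog"
tokens = text.split()

context_size = 4
examples = []
for i in range(len(tokens) - context_size):
    context = tokens[i : i + context_size]   # input: a window of prior words
    target = tokens[i + context_size]        # "label": the very next word
    examples.append((context, target))

for context, target in examples:
    print(context, "->", target)
```

No annotator ever touches the data; the supervision signal falls out of the text itself, which is what makes training at this scale feasible.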
There's been much debate about whether it's a game-changer or just a brute-force trick, and I think that debate goes to the heart of our approach to general intelligence.
I don’t think anyone has or could rightly claim that GPT-3 is a breakthrough in AGI — Artificial General Intelligence. But I do think most can agree that it does produce impressive results.
The main criticism against it is that, with 175B parameters (175B!?!), the model has simply memorized the English language.
This appears unsatisfying for two reasons:
- It feels like a statistical trick — without any real underlying learning.
- It does not appear generalizable to any other general intelligence problems.
I think this claim is only half-true.
As for the "trick": I think we can take comfort in the knowledge that fantastical technological progress doesn't always mimic nature.
Look at a hummingbird and a 747 commercial airplane, for example.
They both fly, but they accomplish their goals very differently. A hummingbird is dramatically more flexible and elegant; a 747 is an industrial behemoth capable of moving hundreds of people around the globe (pre-pandemic). While the 747 is clearly a brute-force technology of fixed-wing airfoils and massive jet engines, it meets our needs, even if it doesn't replicate the elegance of nature or enable a human to fly in the same way.
Isn't it possible, if not likely, that natural language understanding will be similar? That we will be able to accomplish much of what we need with language in the robot world without ever having learned to "flap our wings"? If we can have a perfectly good human-like conversation to order our food, take out a loan, or plan a construction project, why does it matter HOW it's accomplished?
On the other hand, as it pertains to artificial general intelligence, I do think it's clear that this model is not sufficient for the general kind of robust, transferable cognition that we expect to see in a truly intelligent agent.
I enjoyed a great interview with Jitendra Malik and Lex Fridman (my brief observations here) on the future of computer vision and cognition. Malik still believes that our best progress in AGI and machine learning will be inspired by human learning. He offers six areas, beyond supervised learning, in which agents can benefit from how children learn, namely:
- be multi-modal
- be incremental
- be physical
- explore
- be social
- use language
One of the big challenges Malik poses for the future of ML is whether an AI can read an entire novel and then answer arbitrary questions about the book.
In this regard, the very nature of the transformer architecture, with larger and larger language models trained simply to predict the next word, may not be sufficient.
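For a sense of what "predicting the next word" means mechanically, here is a toy greedy-generation loop. The bigram counts and seed word are invented for illustration (real models use neural networks over subword tokens), but the shape of the loop is the same: the model only ever asks "what word is most likely next?", which is exactly the limitation being pointed at:

```python
# Toy illustration of autoregressive generation: repeatedly append the most
# likely next word given the current context. Corpus and counts are invented.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Build simple bigram counts: which word tends to follow which.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

# Greedy generation: always take the single most likely next word.
sequence = ["the"]
for _ in range(6):
    next_word, _ = bigrams[sequence[-1]].most_common(1)[0]
    sequence.append(next_word)

print(" ".join(sequence))
```

A system built this way can produce fluent-looking text, but nothing in the loop requires it to hold a book-length argument in mind, which is why the novel-reading challenge is a useful test.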
While I too look forward to the next quantum leap in language understanding and cognition, we can acknowledge that progress is rarely linear. And regardless of how we march toward those longer-term goals, I am excited and curious to see how we can take advantage of these impressive models to substantially move forward many of today's practical open NLP problems, particularly conversational AI.