YouTube transcript

François Chollet: Why Scaling Alone Isn’t Enough for AGI

Y Combinator

[0:00]I think we're probably looking at AGI [0:02]2030, around the time [0:05]that we're going to be releasing like [0:06]maybe ARC 6 or ARC 7. You're not going [0:09]to stop AI progress. I think [0:12]it's too late for that. And so the next [0:14]question is, okay, AI progress is [0:16]here. [0:17]It's actually going to keep [0:18]accelerating. How do you make use of it? [0:20]How do you leverage? How do you ride the [0:21]wave? That's the question to ask.
[0:30]>> Today we're lucky to be joined by [0:32]François Chollet, founder of the ARC [0:34]Prize, a global competition to solve the [0:37]ARC-AGI benchmark. His latest project is [0:40]Ndea, a lab exploring a new paradigm [0:43]in frontier AI research. François is one [0:46]of the best people in the world to help [0:48]us understand the current AI moment and [0:51]where all of this is going. François, [0:53]thank you so much for joining us today [0:54]and congrats on the launch of ARC-AGI [0:57]V3.
[0:58]>> Thanks so much for having me. I'm super [0:59]excited to be here. Super exciting times [1:01]to talk about AI.
[1:02]>> So François, tell us a little bit about [1:04]Ndea. So what exactly is it and what [1:06]are you guys trying to achieve?
[1:08]>> Right. So Ndea is this new AGI [1:10]research lab and we are trying some very [1:13]different ideas. And so our goal is [1:16]basically to build this new branch of [1:18]machine learning that will be much [1:19]closer to optimal, unlike deep [1:22]learning.
[1:23]>> All of us right now are sort of taken by [1:26]what's going on with code. I have sort [1:28]of this viral moment right now where I [1:30]got to 40,000 stars this morning on [1:34]gstack. So it's like, oh, this is an open [1:36]source project that now is one of the [1:37]biggest ones and I have more than 100 [1:39]PRs from contributors to deal with.
I [1:42]guess you're, you know, one of the best [1:44]people to talk to about this because [1:46]you're actually literally coming [1:48]up with something that is a totally [1:49]different pathway.
[1:51]>> That's right. So [1:53]what we're doing at Ndea is we're [1:55]doing program synthesis research. And [1:57]when I talk about program synthesis, [1:59]often people ask me, "Oh, so are you [2:00]doing like code gen? Are you building an [2:02]alternative to coding agents?" And it's [2:04]actually not at all what we are doing. [2:06]We are working at a much, much [2:09]lower level than that. What we are [2:11]actually doing is that we are trying to [2:12]build a new branch of machine learning, [2:15]an alternative to deep learning itself, [2:18]rather than like coding agents. Coding [2:19]agents are like this very, very [2:20]high-level last-layer piece of the [2:23]stack. And we are actually trying to [2:25]rebuild the whole stack on [2:26]different foundations. So, we are [2:28]building a new learning substrate that's [2:31]very different from, you know, [2:33]parametric learning, deep learning. [2:35]So, if you go back to [2:37]the problem of machine learning, you [2:39]have some input data, some target data, [2:42]and you're trying to find a function [2:44]that will map the inputs to the targets [2:46]and that will hopefully generalize to new [2:49]inputs. And [2:51]if you're doing deep learning, what [2:52]you're doing is that you have this [2:54]parametric curve that serves as your [2:56]function, as your model, and you're [2:57]trying to fit the parameters of the [2:59]curve via gradient descent. And this [3:01]is basically what we are doing, except [3:04]we are replacing the parametric curve [3:06]with a symbolic model that is meant to [3:09]be as small as possible. It's like the [3:11]simplest [3:13]possible [3:14]model to explain the data, to model [3:16]what's going on.
[3:17]And of course, if you're doing that, you [3:19]cannot apply gradient descent anymore. [3:21]So, we are building something that we [3:23]call symbolic descent, which is like the [3:26]symbolic space equivalent of gradient [3:28]descent. The idea is to build this new [3:31]machine learning engine [3:32]that's giving you [3:35]extremely concise symbolic models of the [3:38]data you're feeding into it. And then we [3:40]are going to make it scale. And so, [3:42]everything you're doing with machine [3:43]learning today, with parametric curves, [3:45]we should be able to do it [3:47]with symbolic models in the future, in a [3:50]way that will be much closer [3:53]to optimality. [3:55]Much closer to optimality in the sense [3:57]that you're going to need much less data [3:59]to obtain the models. The models are [4:01]going to run much more efficiently at [4:04]inference time because they're going to [4:05]be so small. And because they're so [4:06]small, they will also generalize much [4:08]better and compose much better. You [4:10]know, the minimum description length [4:12]principle: that the model of the data [4:14]that is most likely to generalize is the [4:16]shortest. And I think you cannot find a [4:18]model like this if you're doing [4:20]parametric training; you [4:21]need to try something else.
[4:23]That's fascinating.
[4:24]>> So, the rest of the industry is just [4:26]pouring more and more billions of [4:27]dollars down an approach that was set [4:30]years ago. Can you like help make the [4:32]case for why you think that it's the [4:34]right thing to explore alternative [4:35]approaches instead of just to keep [4:37]putting more money into the current [4:38]approach?
[4:39]>> I mean, everybody is, [4:41]you know, building on top of the LLM [4:42]stack these days, which makes sense [4:44]because, you know, the returns are [4:46]there. Like it's actually working.
So, [4:48]it would seem very sensible for [4:50]everybody to just be doing what seems to [4:53]be the currently most productive [4:55]path. [4:56]But I think it's actually [4:57]counterproductive to have everybody [4:59]working on the same thing. Like, I [5:01]personally don't think that machine [5:03]learning or AI in 50 years is still [5:06]going to be built on this stack. I think [5:07]this is a stack that is [5:08]very nice. Maybe it even gets us to AGI. [5:11]But it's not as efficient as it should [5:14]be. I think it's inevitable that the [5:17]world of AI will trend over time towards [5:20]optimality. And so, I'm trying to sort [5:22]of like leapfrog directly [5:25]to optimality. Like to build [5:27]the foundations of optimal AI today. But [5:29]in general, you know, [5:30]our vision is very ambitious, and I'm [5:33]not saying that we're going to be [5:34]successful. Like we have maybe a 10 or [5:36]15% chance of success. [5:38]But that is enough [5:40]that it's worth trying, right? And I [5:42]think in general, like among [5:44]listeners, if you have [5:46]a big idea and it has a very low chance of [5:48]success, but if it works it's going to [5:51]be big and no one else is going to be [5:52]working on it, right? It's not [5:54]something popular. If [5:56]you don't do it no one else will do it. [5:57]And this is basically our situation. If [6:00]you're in this situation then [6:01]you should take a [6:02]chance, you know, you should go [6:04]and work on it. I mean, that's almost [6:05]like the mission statement of Y [6:06]Combinator, the thing that you just [6:07]said. [6:09]Yeah, the reason it's important is that, [6:10]again, if we don't do it no one else [6:12]will do it, right? So it's worth trying [6:13]even if we don't succeed.
[6:15]>> It's worth trying.
[6:15]>> Has the success very specifically of the [6:18]coding agents, I guess built on top of [6:19]the LLM stack, like has their success [6:22]surprised you at all, and in particular [6:24]like say over the last 6 months or so?
[6:26]>> Yeah, absolutely. I think they surprised [6:28]many people and it definitely did [6:29]surprise me. If you look at why [6:31]everything is starting to work so [6:33]well with coding agents, it's really [6:34]because [6:35]code provides you with a verifiable [6:38]reward signal. And I think right now [6:40]we're in this situation where any [6:42]problem where the solutions you propose [6:44]can be [6:45]formally verified and you can actually [6:47]trust the reward signal, it's not just [6:48]some guess made by a model, any domain [6:51]like this can be fully automated with [6:53]current technology, with the LLM-based [6:55]stack. And code is sort of like [6:58]the first domain to fall but there will [6:59]be many others in the future. I think [7:01]mathematics is also primed to [7:04]see a revolution in the next few years for [7:06]the same reasons, again because the [7:08]domain just gives you verifiable [7:10]rewards.
[7:11]>> I guess the challenge for a formally [7:13]verified domain is you have to [7:17]somehow take a domain and make it [7:19]verifiable, which is the trick. I mean, [7:21]code is very natural. [7:22]You can test, there's bugs, it compiles, [7:24]etc., and mathematics as well, where [7:27]all the theorems and proofs work out. [7:29]I guess it becomes more nebulous when [7:31]you go a couple of degrees off, where there are [7:34]fields that are not [7:36]naturally formally verified. You need to [7:38]come up, again, with some sort of a [7:40]function [7:42]to [7:43]come up with that [7:44]reward that makes it verifiable. With [7:47]very fuzzy things like, let's say, [7:49]English language and composing the [7:51]perfect essay, [7:52]how do you make that formally [7:54]verifiable?
[7:55]>> Yeah, yeah. Absolutely.
I mean, writing [7:57]essays is, you know, the typical example [7:59]of a domain that's not [8:01]verifiable. And so, what you're going to [8:03]see is that progress of reasoning models [8:05]and base LLMs on this type of [8:07]domain is, you know, going to [8:09]be very slow because the stack we're [8:11]using, like the LLM stack, is very, very [8:13]reliant on its training data. It's [8:16]basically just operationalizing the [8:18]training data. And for writing essays, [8:20]the training data is coming from human [8:23]experts, like annotating answers. And [8:27]that's costly. So, you're going to see [8:28]this very, very slow progress. Maybe [8:30]it's even going to stall. But for [8:32]any verifiable domain, like take [8:34]code for instance, what was the big [8:36]unlock is [8:37]when people started creating these [8:40]code-based training environments [8:42]for post-training, [8:44]where the reward signal, the [8:46]verification signal, is provided by [8:48]things like unit tests and so on. And [8:51]so, that means that the model was not [8:52]just working from human-provided [8:55]annotations. It was actually trying its [8:56]own things, [8:58]verifying the answer, and generating a [9:01]lot more training data in the [9:03]process. So, a much denser coverage of [9:05]the problem space. And not just coverage [9:08]in terms of like is the answer right [9:10]or wrong, but also starting to build [9:13]models of the execution traces, right? [9:16]So that the models could start [9:18]incorporating an execution model, very [9:20]much the way that [9:22]human programmers, you know, when they [9:23]look at code, they're sort of like [9:24]executing the code in their minds. They [9:26]keep track of the value of [9:27]variables and so on. It's also what the [9:29]models are trying to do now. And this is [9:31]why it's working so well.
And it's [9:32]possible because you're working with [9:34]this [9:35]fully verifiable environment. You cannot [9:37]do that with this. You cannot do that [9:39]with, you know, law [9:40]or many other problems.
[9:41]>> I really like how you define [9:43]intelligence [9:45]and how to measure it, which brings us to [9:47]the question of having you [9:49]share the history of ARC-AGI.
[9:52]>> Yeah, so my definition of general [9:55]intelligence. You know, many people [9:57]around the industry these days, they say [10:00]AGI is going to be a system that can [10:02]automate most economically [10:05]valuable tasks. [10:06]And to me that definition is [10:09]about automation. It's not about [10:11]intelligence. It's not about general [10:12]intelligence. So my definition is: [10:15]AGI is basically going to be a system [10:17]that can approach any new problem, any [10:20]new task, any new domain and make sense [10:23]of it, like model it, become competent [10:26]at it [10:27]with the same degree of efficiency as [10:30]a human could. So meaning it's going to [10:32]need basically the same amount of [10:33]training data and training compute [10:36]as a human would, which is [10:37]very little. Like humans are really, [10:39]really data efficient. So [10:42]general intelligence is human-level [10:45]skill acquisition efficiency on the [10:47]same scope of tasks that humans [10:49]could potentially [10:51]learn to do.
[10:52]>> Do you think it's possible that we will [10:53]accomplish the first definition of AGI, [10:56]the automate most economically useful [10:58]work, before we accomplish your [11:01]definition?
[11:01]>> Absolutely. I think that's the [11:02]trajectory that we're on right now.
And [11:05]I think it's already true that in [11:06]principle current technology can fully [11:09]automate, at human level or beyond, any [11:12]domain where you have verifiable [11:14]rewards, right? Code being the [11:16]first one. And I think figuring out AGI, [11:18]figuring out like human-level [11:21]learning efficiency over [11:23]arbitrary tasks, [11:24]that's probably going to take a [11:26]different sort of technology, a [11:27]different mindset, a [11:28]different approach.

[11:29]>> Do you think that LLMs can be bent to [11:31]have the same sample efficiency as [11:34]humans, or do you think it's like [11:35]fundamentally just impossible and we [11:37]need a new approach, and that's [11:39]the thing that you're hoping to [11:41]solve?
[11:41]>> With enough compute everything starts [11:43]looking like everything else. Every [11:46]approach starts looking the same. And I [11:48]think it's possible in principle to [11:51]build something that looks a lot like [11:52]AGI on top of the LLM stack. But it's [11:56]not going to be LLMs per se. It's going [11:58]to be this new layer, perhaps, you know, [12:00]it's going to be even a few layers above, [12:02]not just one layer above but a few [12:04]layers above. But you can [12:06]build it on top of LLMs because LLMs are [12:08]a kind of computer, right?
[12:10]>> Exactly.
[12:10]>> I do believe, however, this would be [12:12]the wrong thing to do because it would [12:14]be very inefficient. I think AI [12:17]research will have to trend towards not [12:20]just efficiency but in fact optimality [12:22]over time. And for this reason future AI [12:25]in a few decades, it's not going to be [12:27]this harness on top of a reasoning model [12:30]on top of a base LLM. It's going to [12:32]be much, much lower than that.
[12:35]>> To Diana's question, do you want to talk [12:37]about how you actually designed ARC-AGI [12:39]and why it's a good barometer of that?

[12:40]>> I mean, you know, I've been doing deep [12:42]learning for a very, very long time and [12:44]initially my take, my mindset was [12:47]that deep learning was going to be able [12:48]to do everything.
[12:50]>> You were the creator of Keras before [12:52]even all the other frameworks became [12:54]very popular.
[12:55]>> That's right. I was [12:56]training deep learning models [12:58]for natural language processing, in [13:00]fact, in 2014, and from that work, [13:05]you know, I actually started [13:07]developing this open source library [13:09]which I released, in fact, exactly [13:12]11 years ago, March 2015. So [13:16]it was Keras, and then it got popular [13:18]and then I ended up [13:19]sort of like doing less of the [13:21]research that I had started Keras [13:23]for and more of working on the framework [13:25]itself, just because it had really, really [13:27]good product market fit. [13:28]And so my take, you know, around that [13:30]time, around like 2015, 2016, was that [13:33]deep learning was extremely general, [13:34]that you could do everything with deep [13:36]learning, that you didn't need anything [13:38]else. It was Turing complete. So [13:41]my take was basically that deep [13:42]learning was differentiable programming. [13:45]So, anything you would do with [13:46]software, you could in principle train a [13:48]deep learning model on the right inputs [13:50]and outputs to do the same thing. [13:53]And in 2016, I was doing [13:56]research at Google Brain [13:59]on trying to train deep learning models [14:01]to help with reasoning problems, and [14:04]in particular first-order logic [14:06]problems, [14:08]theorem proving, and so on. And I [14:11]started finding that you could not [14:13]really get gradient descent to encode [14:17]sort of like reasoning-style [14:19]algorithms.
[14:20]It was not because the models could not [14:23]represent these algorithms. It was [14:25]because gradient descent could not find [14:27]them, right? So, the problem was that it [14:30]wasn't about deep learning not being [14:32]Turing complete or anything like that. [14:34]That was not the problem. The problem [14:35]was gradient descent, right? Gradient [14:37]descent would not find generalizable [14:39]programs. It would instead end up [14:42]doing overfit pattern matching, [14:43]right? Over sequences of [14:46]input tokens.
[14:47]>> So, I guess people could argue like [14:48]that's what's happening.
[14:50]>> I mean, this is still what's [14:51]happening today in a slightly...
[14:54]>> It's just a slightly [14:56]higher-level version of the...
[14:57]>> With a lot of data. So, it doesn't feel [14:59]like overfitting because the data has a [15:00]lot more distribution.
[15:01]>> Yeah. With a lot more data, and also I [15:03]think models today, they are a lot [15:05]more compressive after that. That's [15:07]why they generalize better.
[15:08]>> All models are wrong, but some models [15:11]are useful. And then I guess what I'm [15:13]hearing is like your method might find [15:15]the right model.
[15:16]>> That's right. [15:17]That's where the idea [15:20]came from. And I was like, you know, at [15:21]the time, you know, back in 2016, 2017, [15:24]I was like, "Okay, we're going to need [15:26]a benchmark to capture the ideas."
[15:29]>> We're going to need a program [15:30]synthesis benchmark. [15:32]And my mental model for that was [15:35]ImageNet.
[15:36]>> Mhm.
[15:36]>> I was like, "Oh, I'm going to make the [15:37]ImageNet of reasoning." So, I started [15:40]brainstorming a few ideas around like [15:42]2016, 2017. I explored many different [15:44]things.
[15:45]I tried working with, in particular, [15:47]cellular automata, like [15:50]a setup where you show a model [15:52]cellular automata outputs and it must [15:54]recreate the program that generated [15:56]them, that sort of thing. And [15:58]eventually I settled on the ARC [16:00]format [16:01]around like early 2018. You know, I [16:04]was doing this on the side. It was a [16:05]side project. Like my main project was [16:08]developing Keras at Google. I wasn't [16:10]moving very fast [16:12]on that. So, summer 2018, I [16:15]wrote the ARC task editor.
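
The cellular automata setup mentioned at [15:47] (show a model the outputs, have it recover the program that generated them) can be illustrated with a small sketch. This is not the speaker's actual experiment: it assumes elementary one-dimensional cellular automata with wrap-around edges, and it swaps in plain brute-force enumeration over the 256 possible rules in place of any learned model. The names `step`, `run`, and `consistent_rules` are hypothetical.

```python
from typing import List

Row = List[int]


def step(cells: Row, rule: int) -> Row:
    """Apply one update of an elementary cellular automaton (wrap-around edges)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # value in 0..7
        nxt.append((rule >> neighborhood) & 1)  # that bit of the rule number
    return nxt


def run(initial: Row, rule: int, steps: int) -> List[Row]:
    """Generate the observed 'output': the initial row plus `steps` updates."""
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


def consistent_rules(rows: List[Row]) -> List[int]:
    """Return every rule number that reproduces all observed transitions."""
    return [
        rule
        for rule in range(256)
        if all(step(rows[t], rule) == rows[t + 1] for t in range(len(rows) - 1))
    ]


if __name__ == "__main__":
    start = [0] * 16
    start[8] = 1
    observed = run(start, rule=110, steps=8)  # the hidden program is rule 110
    print(consistent_rules(observed))
```

Here the search space is tiny, so enumeration is enough; the recovered list narrows to a single rule once every three-cell neighborhood has appeared in the observations. The interesting research problem is doing the same thing when the program space is far too large to enumerate.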

[16:18]And then I started just making lots of [16:19]tasks [16:21]by [16:23]hand. [16:24]And so, I wrote up the paper that was [16:26]explaining what this was about, what the [16:28]big idea was, like intelligence as [16:31]skill acquisition efficiency. [16:33]And I published all of that in [16:35]2019.
[16:36]>> In parallel, GPT-3, in 2020, [16:39]was coming out and starting to show [16:40]signs, [16:42]until the ChatGPT moment around 2022, [16:45]end of the year. [16:46]And the industry took off with that. And [16:49]this was one of the benchmarks that [16:50]was performing really badly. And it [16:52]was very obscure. I don't think many [16:54]people knew about it. It was mostly [16:56]niche research communities that maybe [16:59]read your paper.
[17:00]>> Yeah, people who worked on program [17:01]synthesis knew about it. [17:03]But a lot of people who worked on [17:05]deep learning, on scaling up LLMs, they [17:06]didn't really care for it. And part of [17:08]the reason why is because LLMs did not [17:11]work well or at all on the benchmark. [17:14]For a benchmark to capture the attention [17:17]of the research community, it needs to [17:18]start working a little. [17:21]If it's too hard, people are [17:23]just going to dismiss it.
[17:24]>> You're just ahead of your time clearly, [17:26]because we're not on ARC-AGI 1 anymore [17:30]and then 2 is reaching saturation. [17:32]And then 3 is out now.

[17:35]>> Yes.
[17:36]>> And I think the cool thing about [17:38]ARC-AGI, it has been a very good [17:41]barometer for [17:43]the industry of the big changes that [17:45]happened, because [17:46]1 was not working at all [17:49]for a long time until 2025, [17:53]when reasoning models came out, right?
[17:55]>> Yeah, absolutely. If you look at [17:58]frontier AI performance on ARC V1 first [18:01]and then V2. So basically LLMs [18:04]were scoring extremely low on V1, like [18:06]sub 10% basically. And I mean it was [18:08]true of the original like GPT-3, [18:12]which was scoring zero. But that's even [18:14]true of the latest base LLMs today, you [18:17]know, as of March...
[18:18]>> Without reasoning.
[18:19]>> Without reasoning.
[18:19]>> Without reasoning.
[18:20]>> Yeah, so it's the base models. So [18:22]performance of [18:23]base LLMs on V1 stayed very, very [18:26]low even though in the meantime, you [18:28]know, we had scaled up these models by [18:30]50,000x, right? So it was really [18:33]telling you that, you know, [18:35]scaling up pre-training alone was not [18:37]going to crack the benchmark. This was [18:38]not enough to demonstrate that the model [18:41]had true intelligence. And then [18:44]the moment [18:46]models started performing well on ARC 1 [18:49]was with the first reasoning models, in [18:51]particular the OpenAI o1 and then o3 [18:55]models, which by the way were [18:56]demonstrated by OpenAI on ARC because it [18:59]was the one unsaturated reasoning [19:01]benchmark that was really showing that [19:03]this model was different. It had new [19:05]capabilities that we had not seen [19:07]before. And so with reasoning models, [19:09]you start seeing this sudden like step [19:12]function change [19:13]on ARC-1. And so, ARC-1 was really [19:15]the benchmark that signaled that at this [19:18]moment in time something was happening.

[19:20]And so...
[19:21]>> Something big.
[19:21]>> Yeah, something big. Like new [19:22]capabilities were emerging. Like [19:24]reasoning was new and different. And it [19:28]was actually not obvious at the time. [19:30]Like, you know, I don't know if you [19:31]remember when the o3 preview [19:34]was announced by OpenAI.
[19:36]>> That was end of 2024 actually.
[19:37]>> Yeah, December 2024. And sure, it [19:40]was like [19:42]huge step function progress on ARC. [19:45]But it was very expensive. It did not [19:47]really have product market fit, [19:49]effectively. But if you looked at ARC [19:51]results, you knew that this was big and [19:53]important.

[19:55]And then we released ARC-2, which was [19:57]the same format but more difficult, like [19:59]with more [20:00]composition [20:02]at the level of the reasoning chains. [20:05]And what happened is that the [20:07]earliest reasoning models started very, [20:09]very low on ARC-2. And then around the [20:11]same time as coding agents started [20:14]working, you saw this...
[20:16]>> Yeah. So very, very recent, just a few [20:18]months ago, you saw this [20:20]very, very fast like saturation of ARC-2. [20:24]And so again, like ARC-2 signaled that [20:26]yes, there was this new set of [20:28]capabilities emerging. So I think the [20:30]benchmark did a really good job at [20:31]capturing the advent of reasoning models [20:34]and then the advent of agentic coding. [20:37]Like this new paradigm where if you [20:39]have verifiable rewards, then you [20:41]can basically fully automate [20:43]the domain. Which by the way is true of [20:45]ARC. Like ARC does provide a verifiable [20:47]reward.
[20:48]>> I guess for V2, what caused the... So [20:50]one was clearly reasoning. Two, a [20:52]benchmark doesn't care how you solve it. [20:55]I guess embedded in what you said, like [20:57]were people using code gen to then [21:00]solve?
[21:01]>> That's right. So not necessarily [21:03]code gen [21:04]per se, but [21:06]frontier labs have been targeting ARC [21:08]V2. And the progress you saw on ARC [21:12]V2 is actually the result of this very, [21:14]very large-scale targeting. So, what you [21:17]can do to solve ARC V2 is you ask your [21:19]reasoning model to make more tasks like [21:23]those in the benchmark. [21:25]And then you try to solve them using, [21:27]let's say, program induction, [21:28]for instance, [21:30]still using your reasoning model. Then [21:32]you verify the solution. Again, it's [21:33]verifiable. So you can trust [21:36]the answer.
And then you fine-tune [21:39]the model on the successful reasoning [21:41]chains. And then you keep repeating: [21:42]generate new tasks, you solve them, you [21:44]verify the solution, you fine-tune the [21:46]model on the reasoning chains. And [21:49]you can keep doing this millions of [21:51]times, right? You just need [21:53]to spend more money.
[21:53]>> This is the RL loop that is happening, [21:56]yeah.
[21:56]>> And the new paradigm in AI is [21:58]basically that any domain where this is [22:00]true, where you have the ability to [22:02]generate these true [22:05]verification signals, you can run [22:07]this kind of loop, right? If you [22:08]can run this kind of loop, you can [22:11]brute-force mine [22:12]effectively the entire space and get [22:14]extremely high performance. This is [22:16]basically the process through which [22:17]ARC 2 was saturated. So, what it tells [22:20]you is that it's not so much that the [22:21]models have higher fluid intelligence [22:24]than they did with the [22:26]first reasoning models. It's just that [22:28]you have this new paradigm of [22:29]post-training. And this is exactly what [22:32]led to agentic coding. So, it does [22:34]matter. It is valuable. It is [22:35]useful.
[22:36]>> It's not that the models are [22:38]smarter, it's that they're suddenly more [22:40]useful. And it's possible to be more [22:43]useful in particular domains without [22:45]being smarter. Yeah, clearly, because [22:47]that means good things for me. I'm not [22:49]getting any smarter right now, like, [22:52]you know, age 45, but, you know, I can [22:55]learn how to do things. And that's sort [22:57]of what's happening with the models as [22:59]of late.
[23:00]>> Yeah, absolutely. When it comes to [23:02]competency, there's always a trade-off [23:04]between intelligence and knowledge.
If [23:07]you have more knowledge, if you have [23:08]better training, you need less [23:10]intelligence to be competent. And that's [23:12]exactly [23:14]what happened with the rise of [23:16]coding agents, right? The models don't [23:18]have higher fluid intelligence per se. [23:20]They don't have like a higher [23:22]IQ, so to speak. It's just that they're [23:24]way better trained. And they're way [23:26]better trained in two ways. So, [23:28]they're not just trained to complete [23:30]code anymore. They're actually trained [23:32]via trial and error in these RL [23:35]post-training environments with, you [23:36]know, true reward signals. And also, [23:38]they're trained to embed this [23:41]model of code execution, right? Where [23:43]they [23:45]learn to keep track of the value of [23:46]variables [23:47]over an execution cycle. And that's [23:50]what's leading to this extremely [23:52]strong product market fit of [23:54]agentic coding today. And really, it's [23:55]completely changing software [23:56]engineering.
[23:57]>> This has happened not too long ago, the [23:59]saturation. We actually had the founder [24:02]of Poetiq that came and spoke about the [24:05]approach, which really sounds like [24:08]this new way of getting LLMs to perform [24:10]is building this agent harness, right? [24:13]And the harness is basically structuring [24:15]a problem domain into [24:18]something that can be formally verified. [24:20]And they did that basically for ARC V2.

[24:23]When they released it, they [24:25]were at the top of the benchmark. But [24:27]then the crazy thing is I actually [24:28]worked with a company in the Winter 26 [24:30]batch not too long ago called Confluence [24:32]Lab, which actually ended up saturating [24:35]the V2 results with 97%, and I think [24:38]their task cost was a lot more [24:40]efficient, too. [24:41]And the approach they basically took is [24:43]similar to this. I think they [24:45]built the harnesses on top of it in [24:48]order to get the LLMs to go and build [24:51]different tasks and [24:53]program through it.
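
The loop described from [21:17] to [21:46] (generate more tasks, attempt them, verify, keep what passed for fine-tuning, repeat) can be sketched in toy form. Everything below is an illustrative assumption rather than any lab's pipeline: the four-function `PROGRAMS` table stands in for a program space, `propose` guesses at random where a real system would query a reasoning model, and the fine-tuning step is reduced to collecting the verified pairs.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy program space: named integer transformations.
PROGRAMS: Dict[str, Callable[[int], int]] = {
    "add_one": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

Task = List[Tuple[int, int]]  # input/output example pairs


def generate_task(rng: random.Random) -> Task:
    """Stand-in for asking the model to make more benchmark-like tasks."""
    name = rng.choice(sorted(PROGRAMS))
    inputs = rng.sample(range(2, 10), 3)
    return [(x, PROGRAMS[name](x)) for x in inputs]


def propose(task: Task, rng: random.Random) -> str:
    """Stand-in for the model proposing a solution; here, a random guess."""
    return rng.choice(sorted(PROGRAMS))


def verify(task: Task, name: str) -> bool:
    """The trusted check: does the candidate reproduce every example?"""
    return all(PROGRAMS[name](x) == y for x, y in task)


def collect_verified(num_tasks: int, attempts: int, seed: int = 0) -> List[Tuple[Task, str]]:
    """Run the generate/solve/verify loop and keep only verified solutions.

    A real pipeline would fine-tune the model on the kept examples and repeat.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(num_tasks):
        task = generate_task(rng)
        for _ in range(attempts):
            candidate = propose(task, rng)
            if verify(task, candidate):
                kept.append((task, candidate))
                break
    return kept


if __name__ == "__main__":
    data = collect_verified(num_tasks=20, attempts=50)
    print(len(data), "verified (task, program) pairs collected")
```

The point of the sketch is that `verify` needs no human judgment, so the amount of trustworthy training data is limited only by how many times the loop is run.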

[24:55]>> Yeah.
[24:55]>> Which then, [24:56]for me, I was like, "Wow, in this batch, [24:58]and during the batch they only worked on [25:00]it for a couple of months, they were [25:03]able to saturate this benchmark that has [25:04]been around for a long time. Like, [25:05]something special is happening."
[25:07]>> Yeah, yeah, there's a lot of progress [25:08]right now that's driven by custom [25:11]harnesses around the task, and the [25:13]harness is basically a way for the [25:15]human programmer to [25:18]input into the model like higher-level [25:20]solution strategies, [25:22]basically. I mean, to me the fact that [25:24]you need humans to engineer these [25:26]harnesses is also a sign that [25:29]we're short of AGI today, because if we [25:31]had AGI, you know, AGI would just make its [25:33]own harness. It would not need to be told [25:35]how to solve a problem. It would just [25:37]figure it out. But it is very effective. [25:39]Like, harnesses are not something that gets [25:40]us closer to AGI in any sense, but [25:43]it's a very valuable area of research [25:45]because that can lead to task automation [25:48]at scale.
[25:48]>> YC's next batch is now taking [25:51]applications. Got a startup in you? [25:53]Apply at ycombinator.com/apply. [25:56]It's never too early and filling out the [25:58]app will level up your idea. Okay, back [26:01]to the video.
[26:02]>> Can you tell us about then what V3 is [26:05]going to measure, that just got [26:07]released?
[26:08]>> Yeah, absolutely. So if you look at V1, V2, [26:11]it was really focusing on your ability [26:13]to [26:14]produce like causal models of a pattern [26:18]that was just given to you, like the data [26:20]was given to you.

[26:21]So it was static, it was [26:23]passive, and really focused on [26:26]modeling. And V3 is completely [26:29]different. We're trying to measure [26:31]agentic intelligence. So it's [26:33]interactive, it's active, like the data is [26:36]not provided to you, you must go get it. [26:38]The idea is that your agent is dropped [26:41]into an environment which is kind of [26:43]like a mini video game. [26:45]And it's not provided any instructions, [26:47]it's not told what to do, it's not told [26:50]what the goal even is or what the controls [26:53]even are, and it must figure out everything [26:56]on its own via trial and error. So, we [26:59]are not just measuring, you [27:01]know, the [27:02]AI's ability to model its [27:04]environment, we are also looking at [27:07]its exploration efficiency, its ability [27:09]to acquire goals on its own, like goal [27:12]setting, and of course its ability to [27:14]plan [27:15]through the model of the environment [27:17]it has created and to execute the [27:19]plan. And so, together, you know, [27:22]all of these abilities, we call that [27:23]agentic intelligence, and we are looking [27:26]for AI systems that could learn to play [27:29]these games and, you know, crack [27:31]them with the same degree of action [27:33]efficiency as a human. If you look at [27:35]a human, they are dropped into this [27:37]new environment, they try a few [27:38]things, they start understanding how [27:40]things work. [27:41]They can solve the [27:42]environment, you know, in a few [27:44]hundreds to thousands of actions. We are [27:46]trying to look for AI systems that could [27:48]match this efficiency. And by the [27:50]way, we know that all of these test [27:52]environments in our suite are solvable [27:54]by humans with no prior training because [27:56]we actually tested them on [27:59]regular people.
Yeah, at first you just [28:01]see this screen, and you have you know [28:04]you have these keys available, but you [28:05]don't know what they do, and you must figure [28:07]out everything from scratch. And humans [28:10]are really good at that, by the way. [28:12]They're really good at exploring [28:13]efficiently, at making sense of [28:15]something new, and eventually cracking [28:17]the game. And frontier models today are [28:19]not very good. [28:21]>> If the reasoning models cracked V1 [28:23]and the like reinforcement learning [28:25]environments cracked V2, do we need a [28:28]new advance to crack V3? Do [28:31]even the best techniques currently like [28:33]not work? [28:34]>> Yeah, I mean, I'm very curious to see [28:36]how frontier labs are going to react to [28:38]V3 and how they're going to start to [28:40]target it. [28:41]Um it is designed to be more resistant [28:44]uh to the same kind of targeting strategy [28:46]as what we saw for V2 in in particular.

[28:48]Like, of course you can try to just make [28:51]more ARC 3-like games and then train your [28:54]agents [28:55]in them. [28:57]Um but the thing is we've uh [28:59]deliberately tried to create a private [29:02]set of environments that is [29:04]significantly different from the public [29:06]set. Like, you can look at the public [29:07]set. It's not actually giving you that [29:09]much information about what's in the private [29:11]set. Uh in the private set we'll have [29:13]very different games with very different [29:15]concepts. And also the public set is [29:18]meant to be substantially easier. So, [29:20]your performance on the public set is not [29:22]actually It's not representative of how [29:24]well the system would do on the private set. [29:25]So, for these reasons it can be harder [29:27]to target. [29:28]>> Uh [29:28]>> And that makes it a better test of fluid [29:30]intelligence as opposed to a test of how [29:32]much effort you put into into cracking [29:34]it. [29:35]>> I'm so curious, how do you come up with [29:36]these games? They're so creative. [29:38]>> Yeah, we set up an entire uh video game [29:41]studio, right, to to create them. Uh so, [29:44]we got uh uh over 250 games. Uh and you [29:47]know, they're they're pretty quick to [29:48]play. Like, each game takes you maybe 10 [29:50]minutes or a bit less [29:53]uh to play from scratch like upon first [29:55]contact. And we have like 250 plus and [29:58]we set up this [30:00]uh very productive game studio where [30:02]any given week we had multiple games [30:05]in progress. We had like this this [30:07]pipeline [30:08]including, you know, design, [30:09]implementation, uh review, human [30:11]testing, and and many many iteration [30:15]cycles to to to make sure that the the [30:17]game comes out right. [30:18]>> Who who's working in the studio? [30:20]>> Right. Uh we have Yeah, we hired a a [30:23]team of game developers and we built our [30:25]own game engine. 
[30:26]>> Wow, so so it's actually people who like [30:28]previously worked in the game in the in [30:30]the [30:31]video game industry. [30:32]>> That's right. That's right. So, one [30:33]thing to keep in mind though is that the [30:35]games in ARC 3 are unique, right? They're [30:38]trying to not borrow elements, concepts [30:41]from previous video games. Uh they're [30:43]built entirely on top of uh core [30:45]knowledge priors. Like things like just [30:48]just you know, elementary knowledge like [30:50]basic physics, [30:52]understanding of objects, [30:53]understanding of the notion of agents [30:56]for instance, like an agent is an object [30:58]with goals and [30:59]intentions. But we are not incorporating [31:03]any language, any like cultural symbols [31:06]like you know, arrows for instance. Or [31:08]the color green meaning go and color red [31:11]meaning stop, that sort of thing. [31:13]There's no external knowledge that's [31:14]involved in these games. [31:17]>> It's like one of those IQ tests that are [31:18]just pattern matching, but now it has [31:19]time series.

[31:20]>> Yeah. [31:21]It's not just time series, it's [31:22]interactive. You must create your own [31:25]path through game space, right? You You [31:29]must [31:30]You know, in in in an IQ test like [31:33]problem like you know what ARC 1 and [31:35]2 are, the data that you must model is [31:38]provided to you. You already have the [31:40]data. You just You just need to find a [31:42]causal rule to explain it. With ARC 3 you [31:44]actually must gather the data. [31:46]And you must do so efficiently. Like of [31:49]course you could say, "Well, I'm just [31:50]going to you know, brute force mine [31:52]the space of every possible game state [31:55]and then I find the solution." You [31:57]cannot do that because if you try to do [31:58]that you score extremely low even if you [32:00]manage to solve the level. Because [32:02]you're scored on your efficiency. You [32:04]must match human level efficiency. [32:06]>> It's funny, it's like almost coming full [32:08]circle. This level of AGI [32:11]with games sort of is the match parity [32:13]OpenAI writing I mean, you know, Tom [32:16]Brown [32:17]one of the co-founders of Anthropic had [32:19]to write like the harness code to allow [32:22]like the you know, pre-GPT AI at OpenAI [32:25]to play StarCraft. [32:27]>> Yeah, yeah. OpenAI worked on the on the [32:30]in particular on the on Dota 2. [32:32]>> Mhm. [32:32]>> The OpenAI Five model, which was very good, [32:35]if I recall correctly. So this was like [32:37]not just pre-GPT, but it was mostly pre- [32:41]transformers because they were working [32:42]with a stack of LSTMs.

[32:43]>> Yeah. [32:44]>> Uh layers, if I recall correctly. And [32:46]even before OpenAI, uh DeepMind [32:49]worked a lot on video game uh you know, [32:51]solving video games. Yeah, Deep RL. Uh [32:54]and they were the first to do [32:56]uh Atari games, right back in 2013. That [33:00]you know, they were very very early. [33:01]They they were visionary in that sense [33:02]to to work on on these problems already [33:05]with these methods, [33:06]which are still very modern methods. So, [33:08]the big difference is that if you look [33:10]at um [33:11]at the Atari games for instance, or even [33:12]Dota, you're [33:14]training uh on on the same environment [33:17]as what you use for testing. So, [33:19]effectively, you're just trying to [33:21]memorize the best strategies. You're [33:24]trying to uh at at training time explore [33:28]the full uh space of possible game [33:30]states and productionize [33:34]operationalize [33:35]uh that knowledge into into into the [33:37]model. And then at inference time, [33:39]you're basically just recalling that [33:41]knowledge. And that's explicitly what [33:43]we're trying to avoid with Arc 3. Uh [33:47]you're not playing games uh that you've [33:49]seen before. You're not playing games [33:51]that you've been trained on like for [33:53]millions of hours. Like the the OpenAI Five [33:55]model for instance was playing uh [33:58]a restricted version of Dota 2 and it [34:00]was trained on like tens of thousands of [34:02]of hours of gameplay effectively. I [34:04]think maybe in millions. So, it's just [34:06]an insane amount of training data. With Arc [34:08]3, you're being evaluated on games that [34:10]you're seeing for the very first time. [34:12]And every action you spend exploring is [34:15]counted towards your efficiency score, [34:18]right? 
So, we're really focused on [34:20]measuring fluid intelligence, your [34:22]ability to efficiently explore, [34:24]efficiently produce a world model [34:27]uh of the environment, and then use this [34:29]model uh to infer goals, uh plan towards [34:32]these goals, uh, and and eventually [34:34]crack the game. [34:35]>> One of the arguments for, um, [34:38]you know, Ndea is that you're able to [34:40]do all of the intelligent tasks for, you [34:42]know, an ARC task might be like [34:44].3 cents, you know, cents for an [34:47]ARC task, but, you know, for the same [34:49]task on a foundation model with LLMs, [34:52]you know, a dollar to $10.

[34:54]And then there's this other aspect that [34:56]we've been tracking where it seems like [34:58]uh, more and more intelligence, [35:01]um, at least on the LLM side, [35:03]uh, can be distilled down into smaller [35:06]and smaller models. And so, on the one [35:08]hand, like, they're scaling up, but then [35:10]they're like distilling smarter and [35:12]smarter small models. I guess your [35:15]approach might indicate that it's not [35:17]billions of parameters, like the, you [35:19]know, Ndea achieving AGI might not be, [35:23]you know, sort of inherently a [35:25]scale thing at all. There's a platonic [35:27]ideal of the Ndea model that achieves [35:30]AGI.

[35:31]>> Yeah. [35:31]>> Do you ever think about it in terms of [35:33]like, well, it would fit on a floppy [35:34]disk? [35:35]>> Well, okay. There are There are two [35:36]things to separate. There's the, sort of [35:38]like, fluid intelligence engine. [35:40]>> Mhm. [35:40]>> I think it's going to be a very, very [35:42]small code base, uh, and a very small [35:44]set of models associated with it. And [35:48]it's probably going to be on the order [35:49]of megabytes, right? And then you have [35:52]the knowledge base, so to speak, uh, [35:55]that's going to be, [35:56]uh, layered below this this fluid [35:59]intelligence engine. Like, you know, [36:01]fluid intelligence has to draw on some [36:03]knowledge, and that knowledge is going [36:05]to take up a lot more space. So, I think [36:07]it's it's it's important to to [36:08]differentiate the two. I do believe [36:09]that, you know, when we create AGI, [36:12]retrospectively, it will turn out that [36:14]it's a code base that's less than 10,000 [36:17]lines of code. [36:18]>> Mhm. [36:18]>> And that if you had if you had known [36:21]about it back in the in the 1980s, you [36:24]could have done AGI back then using the [36:26]the computer resources back then. [36:28]>> Wow, that's a crazy prediction. [36:29]>> That's I I think retrospectively this [36:31]will turn out to be to be true. [36:33]>> Wow, so it was just like hiding under [36:34]our noses in plain sight for like 40 [36:36]years. It took us like 40 years to [36:38]figure it out. [36:38]>> right. That's right. [36:39]>> Well, that second thing sounds like [36:41]Douglas Lenat's like Cyc project. Or is [36:43]that the wrong way to think about it? [36:44]It's like there's sort of knowledge [36:46]about the world. [36:47]>> Yeah. [36:48]>> And then there's methods. Like the [36:49]program, what I hear is like the program [36:52]might be 10,000 lines, and then it [36:54]operates on like [36:55]>> on knowledge base that's very large. 
So, [36:57]the problem with Cyc, uh I mean there [36:59]there were many issues with it, but one [37:00]of the big issues is that uh there was [37:03]no learning involved. [37:04]>> Yeah. It's just the knowledge like it's [37:06]just like [37:06]>> was handcrafted. [37:07]>> It's like purely symbolic knowledge, and [37:09]it was probably inaccurate. [37:10]>> The way you want to be building AGI is [37:13]that you want to be removing humans [37:16]uh from from the improvement loop as [37:18]much as possible. You don't want a [37:19]system where every improvement in system [37:22]capability uh has to involve a human [37:24]engineer doing something. And that's [37:26]actually the strength uh of deep [37:28]learning and foundation models is that [37:31]you can just scale up the knowledge [37:32]base. Like an LLM is effectively a [37:34]knowledge base. It's a bank uh of uh of, [37:37]you know, marginal uh vector programs [37:39]that map patterns of input tokens to [37:41]patterns of output tokens. And you can [37:43]can scale up that knowledge base by just [37:45]adding training data and training [37:47]compute with no further human [37:50]involvement. I mean, of course, there's [37:51]still a little bit of human involvement [37:53]in in making sure the training job [37:54]completes, but it's it's minor. You've [37:56]managed to remove humans uh from this [37:59]improvement loop as much as possible. [38:01]And that's also [38:02]uh what we want for our system. We want [38:03]a system that's uh self-improving where [38:06]the improvements are compounding, [38:08]meaning that every time the system [38:10]increases its capabilities, it's also [38:12]increasing the rate at which it [38:14]increases its capabilities. [38:15]>> I think this is a PG is on. It's like, [38:17]"I'm sorry the essay is so long. [38:19]Uh if I had more time, I would make it [38:21]shorter." [38:22]>> Yeah. 
When you're looking at a hard [38:24]problem, it's [38:25]actually harder to produce a short, [38:28]elegant, concise solution than a messy [38:30]over-engineered solution. Yeah. [38:31]>> Yeah, you can brute force it, but you [38:33]know, the more elegant version is very, [38:35]very short. And that's kind of like what [38:37]you said with how this might come about. [38:39]>> This is this is Yeah, this is literally [38:41]the shape [38:42]of the type of AI approach uh we're [38:44]creating. And I think this is also the [38:46]shape of science itself. [38:50]Like science is fundamentally [38:53]a symbolic compression process [38:55]where you're looking at uh a big mess of [38:58]observations, like you know, the the [39:00]position of planets in the sky or [39:01]something like that. And you're [39:02]compressing that down to [39:05]uh [39:05]a very simple symbolic rule. You're [39:07]saying like, "Yeah, like [39:10]all these, you know, thousands of observations [39:11]actually just all follow uh this one simple [39:13]equation." That's symbolic compression. [39:15]And to do this, by the way, uh you need [39:17]the model [39:19]uh to be symbolic. Like you you you [39:21]cannot fit a curve and say, "Well, you [39:23]know, that that curve is my model." That [39:25]would never be optimal. It would never [39:27]be concise or elegant enough. And that's [39:29]not what science is doing. Science is [39:30]not about curve fitting. Science is [39:32]about finding the equation, finding the [39:34]most compressive symbolic model of your [39:37]pile of observations. And that's the [39:39]process that you're trying to recreate [39:41]in software form. Like you could say [39:42]that uh the Ndea approach to program [39:44]synthesis is that we are building [39:47]science incarnate, science the [39:49]scientific method in in in algorithmic [39:51]form. [39:51]>> I'm curious if you compare it to [39:54]biology. 
[39:56]Clearly, LLMs don't learn the way that [39:58]humans do cuz no baby reads the whole [39:59]internet. Do you think program synthesis [40:02]is closer to the way that humans learn, [40:04]or do you think that's yet a third [40:06]branch where even if program synthesis [40:08]is correct, there'll be some as yet [40:10]undiscovered third way to do it, which [40:12]is the thing that we do? [40:13]>> I think so. Uh I do think humans do some [40:17]amount of program synthesis. I think the [40:19]the way humans learn and the way the the [40:21]human mind works is very messy. It's not [40:23]like there's one simple elegant [40:25]principle behind it all. It's an [40:27]implementation of fundamental [40:29]principles, the fundamental principles [40:31]of of intelligence, which you know, I [40:33]think we can [40:35]identify these principles and [40:37]re-implement intelligence from scratch, [40:39]from first principles, in a way that [40:41]will be much more efficient than the [40:44]human brain. I think the human brain is [40:45]messy and it's it can be a good source [40:48]of inspiration for AI, but I think it [40:50]would be counterproductive to just try [40:53]to, you know, observe it and [40:54]re-implement it like [40:57]and and make it biologically plausible. [40:59]I think that's counterproductive. That's [41:00]not what we're trying to do at Ndea.
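
A toy illustration of the "symbolic compression" idea from the planets example above, offered as a hedged sketch rather than the lab's actual method. It assumes approximate textbook orbital values and searches a tiny space of power laws for the one that makes T^p / a^q roughly constant across planets. The names `PLANETS`, `relative_spread`, and `find_power_law` are hypothetical, and the exhaustive loop over nine candidates stands in for a real program search.

```python
from itertools import product

# Approximate orbital data: (semi-major axis in AU, period in Earth years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def relative_spread(values):
    """(max - min) / mean: close to zero when all values are about equal."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def find_power_law(data, max_exponent=3):
    """Try small integer exponents (p, q) and keep the pair for which
    period**p / axis**q varies least across the observations."""
    best = None
    for p, q in product(range(1, max_exponent + 1), repeat=2):
        ratios = [period**p / axis**q for axis, period in data.values()]
        score = relative_spread(ratios)
        if best is None or score < best[0]:
            best = (score, p, q)
    return best


spread, p, q = find_power_law(PLANETS)
print(f"T^{p} / a^{q} is roughly constant (relative spread {spread:.4f})")
```

On this data the search settles on p = 2, q = 3, which is Kepler's third law: six pairs of numbers compressed into one short rule, where a fitted curve would only interpolate between them.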

[41:01]We're really trying to find what are the [41:03]first principles of intelligence and [41:06]what is the system that would best [41:08]implement them. But yeah, I do believe [41:10]the human mind does, at the highest [41:12]level, [41:14]something that looks a lot like program [41:15]synthesis. Like we're currently building [41:17]causal models of our surroundings. Like [41:20]we're we're describing our surroundings [41:22]in our mind as, you know, a set of [41:24]objects and agents and and relations [41:27]between objects that are fundamentally [41:29]symbolic and causal in nature. This is [41:32]exactly the process that lets us [41:36]generalize so well and adapt so well to [41:39]novelty on the fly. [41:40]>> I'm curious about Ndea, the company and [41:43]as you're as you're building it. [41:45]We've all here heard of the OpenAI [41:47]founding story and but something that has [41:49]always stuck with me is just like both [41:51]Sam and Greg say that it was a little [41:53]odd in the early days cuz they didn't [41:54]actually know what to do. It's sort of [41:56]like a bunch of people like hanging out [41:57]in an apartment. I would love to hear [41:59]kind of what's that been like for Ndea? [42:01]Like what did like the day one look like [42:03]and just maybe for just people who are [42:05]interested in starting these alternative [42:06]approaches who don't have sort of a [42:09]researchy background, how should they [42:10]think about that? [42:11]>> Yeah, so we we started on day one with [42:13]the symbolic learning vision. Like we [42:15]basically knew that we wanted to do [42:18]symbolic program synthesis, that we [42:19]wanted to create a new approach to [42:21]machine learning where you replace [42:24]parametric curves with the shortest [42:26]possible symbolic models. And then the [42:28]big question was, okay, so how do we [42:29]find these models? 
We started from the [42:33]the the base idea, which is still the [42:35]idea that we're following today, which [42:36]is that we are doing we are going to do [42:39]uh [42:40]deep learning guided [42:42]program search. Like you have a a [42:44]symbolic search space to explore, and [42:46]it's big, it's in fact combinatorial. [42:48]You're not going to make progress if you [42:50]just use brute force. Uh it's not going [42:52]to scale. Uh you have to break the [42:55]combinatorial wall, and the way to do it [42:57]is to add is to add uh deep learning [42:59]guidance. It's actually very similar to [43:02]uh the principles that underlie [43:04]something like AlphaGo or AlphaZero. But [43:06]those were our our starting point. We [43:08]also, you know, didn't have very clear [43:10]ideas about how to how to build it. We [43:11]we tried many different things. We tried [43:13]many many different ideas. And uh it [43:16]took us half a year roughly [43:18]uh to to to get to good foundations [43:21]uh where we we could start building a [43:23]system that compounds. And I think [43:25]that's what's really important [43:27]uh when when doing a lab like this, that [43:28]you don't want to be in a situation [43:29]where you're you're constantly trying [43:31]something new, it's not reusing any [43:34]learnings, any findings [43:36]uh from the previous approaches. You [43:37]want a you want a compounding stack. You [43:40]want to build reusable foundations, and [43:42]then the next layer, and then the next [43:43]layer. And the the the of course you you [43:46]want to be building on top of the right [43:47]foundation. So don't commit to the to [43:50]the foundation layer too early, but also [43:52]make sure that at some point you're [43:54]building this this compounding [43:55]structure. And that that's that's the [43:57]situation that that we're in now. [43:59]>> Is Arc 3 the end, or will there be an [44:01]Arc 4, 5, 6? Can you keep making it [44:04]harder? 
[44:04]>> Yeah, yeah, I think there there will [44:06]absolutely be ARC 4 and and ARC 5. I [44:08]mean, we're currently planning ARC 5. Um [44:11]the the point of the ARC AGI benchmark [44:13]series is not to say that, well, you [44:15]know, here's this test. If you pass it, [44:17]this is AGI. [44:19]Um [44:19]instead what we're trying to do is [44:21]we're targeting [44:23]uh the residual gap of AI [44:25]capabilities. Like the frontier is [44:27]advancing, and we're saying, well, [44:30]uh if you compare it to to human [44:32]abilities, there there's all these [44:34]tasks, all these things, it's not doing [44:36]well. So, we're going to create a [44:37]benchmark to target that. Uh and so, [44:40]it's a moving target, right? It's it's [44:42]not a fixed point, it's a moving target. [44:43]So, there will be ARC 4, which will be [44:46]uh in the spirit of ARC 3, but more [44:48]focused on continual learning and and [44:50]curriculum learning at longer time [44:52]scales. So, you're going to have you're [44:54]going to have fewer games, [44:56]uh but they're going to have way more [44:57]levels. And the levels are going to be [44:59]compounding, meaning that for for each [45:01]level you need to reuse stuff that [45:03]you've learned before. And then there's [45:05]going to be ARC 5. And I'm I'm actually [45:07]really really excited about ARC 5. It's [45:08]very very new and different. Uh it's all [45:10]about invention. And I mean, you you [45:13]you'll see you'll see what that means. [45:14]Eventually, I expect we'll we'll run out [45:17]of things to test. Like as uh as we get [45:20]closer to AGI, um eventually, there will [45:23]be no measurable difference [45:25]uh between human capabilities and but [45:27]like human learning efficiency and and [45:29]frontier AI. And when that happens, when [45:32]when it becomes effectively impossible [45:33]to measure the gap, this is the AGI [45:35]moment. 
[45:36]>> Well, then the machines will take over, [45:38]and then they will create ARC ASI 1. [45:41]>> Yes, ARC ASI 1. [45:42]>> It will continue from there. [45:43]>> Yeah. [45:43]>> Yeah. If you had to put a guess, I mean, [45:46]years, decades, months? [45:50]>> Um my timeline to AGI, [45:52]you know, if you if you just try to to [45:55]extrapolate from the the current rate of [45:57]progress and the amount of investment [46:00]that's going into not just the LLM stack, [46:03]but also like uh side ideas, side bets [46:06]that might work out like, you know, [46:07]Ndea for instance. I think we're [46:10]probably looking at AGI 2030. [46:13]Early 2030s, uh [46:16]most likely. So, around the time uh that [46:19]we're going to be releasing like maybe [46:20]Arc 6 or Arc 7. [46:23]Uh that's probably going to be AGI. [46:25]>> You guys are doing a different approach [46:27]to LLMs. Um do you think there's room [46:30]for more startups to explore other new [46:33]approaches, and are there any other ones [46:34]that you think are promising that you don't [46:36]have time to explore yourself? [46:37]>> Yeah, absolutely. I mean, there are many [46:39]different approaches that you could try. [46:41]I've said that compute is the is the [46:43]great equalizer. I think if you look at [46:45]the amount of compute and resources that [46:47]we've thrown at uh deep learning and and [46:50]gradient descent and and scaling that [46:53]up, if you had thrown the same amount of [46:55]investment into almost anything else, [46:58]you would also have seen extremely [47:00]exciting results. Like, genetic [47:01]algorithms for instance. Uh if you try [47:04]to scale up genetic algorithms, I mean, [47:05]I'm sure you can do incredible things [47:07]with that. [47:08]Um you you could in fact probably do new [47:10]new science. Uh because uh that's based [47:13]on search, and search is the is the is [47:14]the best fit for uh automating the [47:17]scientific method. 
[47:18]Uh I think so right now, there's also [47:20]like approaches that uh build on top of [47:23]the current stack but are slightly [47:24]alternative, like uh state space models [47:26]for instance. Uh there's uh the the [47:29]xLSTM architecture. Like, you [47:31]basically, you know, [47:32]the current stack is is a stack of [47:34]things, and you you can take any layer [47:37]in the stack and try to propose an [47:38]alternative. Like, if you propose an [47:40]alternative architecture, uh you can be [47:43]doing for instance like, yeah, like more [47:44]like uh recurrent models instead of [47:46]transformers uh for the for the [47:48]architecture. Uh or you can do even [47:51]lower level. You're going to be like, [47:52]okay, [47:53]we're still going to be training uh [47:55]parametric curves, but we're going to [47:56]get rid of gradient descent. Right, [47:58]we're going to use like search. Maybe [47:59]you're going to do neuroevolution. Uh, [48:01]that's the second level. And the lowest [48:03]level is uh the low the level where [48:06]where we're operating where we're [48:07]saying, "Well, actually uh forget about [48:10]curves. [48:11]Uh forget about parameter learning. [48:12]Forget about gradient descent. We're [48:13]just going to do something completely [48:15]different." Um and I think if you want [48:17]to build optimal AI, you're kind of [48:20]forced to go back to the foundation of [48:23]the stack. It cannot be like uh uh [48:26]one one layer added on to the pile. [48:28]>> So, do you think for aspiring [48:30]researchers who want to do a new neo lab [48:32]with a different approach, they should [48:33]be reading research papers from the '70s [48:36]or '80s and [48:37]go deep on approaches that [48:39]are not as invested in nowadays? 
[48:41]>> That is actually a great idea because uh [48:44]earlier in the in the history of the AI [48:47]research timeline, people were exploring [48:50]more things and very different things.

[48:52]You've had this sort of like collapse of [48:54]everything into one approach. It's It's [48:57]actually kind of a bad idea. Uh like [48:59]consider that not too long ago, like [49:02]about about 20 years ago [49:03]>> We had the collapse into SVMs, too. [49:05]>> Yeah, I mean it's it wasn't I wouldn't [49:08]describe it as a collapse because there [49:09]weren't that many people doing SVMs and [49:11]AI was a much much uh smaller field back [49:13]then, but there was this [49:15]uh [49:16]uh widespread understanding that neural [49:18]networks were were a failed approach. [49:21]That neural networks didn't work. And it [49:23]was a waste of time to to to to keep [49:25]trying that. [49:26]>> right?

[49:26]>> Yeah. No, even even in the in the in the [49:28]late 2000s. This This was This was the [49:31]sort of thing. Uh basically like when I [49:33]got into into AI uh people were telling [49:36]me like, "Hey, neural networks, don't [49:37]don't try that." I was like, "Yeah, but [49:39]it it looks a lot like what the brain is [49:41]doing. Like I'm I'm interested in that." [49:43]If everybody is working on the same thing, [49:44]you are discarding ideas that will uh [49:47]actually turn out to be very productive [49:49]ideas, right? And yeah, like back in the [49:51]'70s, back in the '80s, people were [49:53]trying more things. And I think [49:54]genetic algorithms are actually a very good [49:56]example of that. [49:58]Uh I think [49:59]this is an approach that has a [50:01]tremendous amount of potential, but [50:03]there's there's not too many people [50:05]looking deeply into scaling it up.

[50:07]>> Are there any characteristics that you [50:09]would be looking for? I mean, is it as [50:11]simple as like, if there's a scaling law [50:13]that could happen, [50:15]then even if it's a different, or is it [50:18]is that too like, you know, thinking by [50:21]analogy? [50:22]>> I think you are looking for approaches [50:25]that scale. [50:26]>> Yeah. [50:26]>> Uh I think it's it's a non-starter. If [50:28]you're working on something, but the [50:30]only way to increase the capabilities of [50:32]the system is to have uh human engineers [50:35]and researchers spend time on it, [50:37]it will not work. Cuz even if the idea [50:39]is very clever and very elegant and [50:42]works really well, capabilities are [50:44]going to be bounded. They're going to be [50:45]bounded by human investment, right? You [50:47]want to be in a setup where the system [50:50]can improve its capabilities with no [50:52]human in the loop, with no human input. [50:54]>> like, don't just do it the way we did it [50:56]like 10 years ago, do it with the idea [50:59]that recursive self-improvement is baked [51:01]in at the beginning. [51:02]>> Yeah, not necessarily recursive [51:04]self-improvement, because deep learning [51:05]for instance is not is not recursively [51:07]self-improving, but with the idea of [51:10]scaling up with no human bottlenecks. [51:13]You want to remove the human from from [51:14]the improvement loop. The great strength [51:16]of deep learning is that the models got [51:18]better and better simply by adding [51:21]uh training training compute and [51:23]training data. I mean, it's it's a [51:25]little bit of caricature, because of [51:26]course, just adding these factors [51:29]requires a lot of human involvement, but [51:31]basically that's the idea that you have [51:32]this decoupling from uh [51:34]the improvement curve and the amount of [51:36]human effort that's needed to be [51:38]injected into the system. 
[51:39]>> I guess or human effort that's already [51:41]happened. Cuz the LLMs do actually [51:42]require an enormous amount of human [51:44]effort. It's just there was the human [51:45]effort to build the internet, and we'd [51:46]already built it. [51:47]>> Yeah. Actually, less and less now [51:50]that we are doing training in [51:52]interactive environments, cuz [51:55]then you only need a small amount of [51:57]human effort to create the environment, [52:00]and from that small amount of effort [52:01]you're creating exponentially more [52:03]training data. But at first, I think to [52:05]sort of like [52:07]prime the machine, you need this [52:08]tremendous amount [52:10]of of uh [52:12]uh human-generated abstractions [52:14]encoded in text data. And if you if you [52:17]don't start from that, you you cannot [52:19]get the system [52:20]into this loop. [52:21]>> Do you have any advice for me starting an [52:23]open-source project? Things to do, [52:25]things not to do in in the AI space [52:29]because [52:30]I am uh [52:32]not sure how I signed up for this in the [52:34]last 14 days, but I think I have I don't [52:37]know on the order of like 10 to 30,000 [52:39]people using G stack every day.

[52:42]>> That's wild. [52:43]>> Yeah. [52:44]I don't know like I have a job. [52:48]I guess like you know what was it like [52:49]to start Keras, and how did you keep [52:52]maintaining it? What's a good [52:53]maintainer? Like what did you learn from [52:55]that? I don't know. This might be a [52:56]whole hour. It was [52:57]>> Yeah, I mean [52:58]lots lots of learnings from [53:01]from growing growing Keras. [53:03]So right now I'm less involved with it. [53:05]There's a big team at Google that's [53:07]working on it and they're doing an [53:08]amazing job. [53:09]>> So it is possible to not to you know to [53:11]put people together to like [53:13]>> It is possible to start something. Yeah, [53:14]it's possible to start something. [53:16]>> That's a relief.

[53:17]>> And and and then get more people [53:18]involved and at some point it becomes [53:20]its own thing. It's just you know [53:22]it used to be your baby, but now it's [53:24]all it's all grown up. It's all adult [53:25]and and and going on with its own life. [53:28]So if you ask me the the factors that [53:30]really made Keras successful, [53:32]um and first of all is that there was [53:34]this big focus on [53:36]uh making the the API simple and [53:38]intuitive. There was this big So, big [53:40]focus on usability. [53:42]And this was inspired by scikit-learn. [53:44]Like scikit-learn was sort of like the [53:46]OG [53:47]uh machine learning library for Python. [53:49]And what made it successful was that it [53:51]was so easy to get started with it. [53:54]So, at first I was like, okay. Uh I'm [53:56]going to package uh all this [53:57]functionality I've created under a [53:59]really, really simple API. It's going to [54:00]be like the scikit-learn API. That was [54:02]like the big idea. The focus on [54:04]usability is not just making sure the [54:07]API is simple. It's also making sure the [54:09]entire um onboarding experience is nice [54:12]and easy. Like the docs should be very [54:14]informative. You should, you know, the [54:16]docs should be [54:17]not just telling you about how to use [54:20]this thing, but they should actually be [54:22]teaching you about the domain in the [54:24]first place. Because the the folks who [54:26]land on your website, they're not going [54:28]to be already deep learning experts.

[54:29]They're going to be people looking to [54:31]maybe start using deep learning. And so, [54:33]you you have to teach them not just how [54:35]to use the tool, but what the tool is [54:37]good for um and and the entire field [54:40]around it. And then uh you know, you [54:41]have to put a lot of investment into [54:43]community building. [54:45]Um one thing we uh we did a bit at [54:47]Google, in fact, you know, Google made [54:49]it kind of kind of difficult and and I [54:50]was sad about that, is uh [54:53]hire your power users. Like hire your [54:56]fans. This this is a really, really good [54:57]idea. Like find find the the most [55:01]enthusiastic users from your community [55:04]uh and and and just hire them on your [55:05]team.

[55:06]>> Amazing. [55:06]>> Yeah. And uh [55:09]they're the [55:09]always the best people, right? [55:11]>> All right, time to start gstack.org, [55:13]uh put in a bunch of my own money, and [55:15]then hire a bunch of people to work on [55:16]it. That sounds good. I think you've [55:18]been a leader and pioneer, and we're so [55:20]lucky to have you sit with us. There are [55:23]people watching who are at the beginning [55:25]of their, you know, adulthood even, like [55:27]their [55:28]certainly their professional careers. [55:30]Uh or actually like people just around [55:32]the world. They're like trying to [55:33]understand like what does this mean as [55:36]intelligence becomes broadly applicable, [55:39]like [55:40]what would you tell you know, if you [55:41]were 18 right now, what would you tell [55:43]them?

[55:44]>> Yeah. I mean, there's a lot of people [55:46]today with very pessimistic and negative [55:50]takes about the the rise in the [55:52]capabilities. They say, oh, you know, [55:54]I'm going to be out of a job soon. [55:56]There's going to be mass unemployment. [55:58]AI is just going to take over [56:00]completely. And my my take is actually, [56:03]you know, the more you know, the more [56:04]expertise you have with things like [56:06]programming for instance, the better [56:08]you're able to use and leverage these [56:12]tools for your own benefit. And with the [56:15]right kind of expertise, [56:16]all this AI progress is actually [56:18]empowerment. Like it's something that [56:20]you can leverage for yourself. I mean, [56:22]that's that's exactly what you did with [56:23]your project, right? [56:24]>> Yeah. [56:25]>> And yeah, more people should have this [56:26]mindset of trying to learn as much as [56:29]possible, not just about AI, [56:32]but about the the domain that they want [56:34]to apply AI to. All right, so that they [56:37]should they should seek to [56:39]turn this [56:41]this this new development into an [56:42]opportunity, into into a tool they can [56:44]use for themselves to improve their own [56:46]lives. I think that's that's the right [56:48]mindset because, you know, you're not [56:49]going to stop AI progress. I think I [56:52]think it's too late for that. And so, [56:54]the next question is, okay, like AI [56:55]progress is here. [56:57]It's actually going to keep [56:58]accelerating. How do you make use of it? [57:00]How do you leverage? How do you ride the [57:01]wave? That's the question to ask. [57:04]>> going for a couple hours cuz I'm sure we [57:06]could. Francois, thank you so much for [57:08]spending time with us. [57:09]>> Thanks so much for having me.
