Lecture: Are We (Still!) Not Giving Data Enough Credit?
Formal Metadata
Title: Are We (Still!) Not Giving Data Enough Credit?
Number of Parts: 11
License: No Open Access License: German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Identifiers: 10.5446/69288 (DOI)
Transcript (English, auto-generated)
00:25
it's really my pleasure to introduce the first speaker, our first Laureate Lecturer, Alexei Efros, who is a professor of Electrical Engineering and Computer Science at UC Berkeley, and the recipient of the ACM Prize in Computing in 2016
00:41
for groundbreaking data-driven approaches to computer graphics and computer vision. He is also a member of the HLFF Scientific Committee, and he will talk about: are we still not giving data enough credit? Thank you.
01:04
Good morning. Welcome to the first day of the scientific program. My field is computer vision, computer graphics, and all my career I've been worried about data
01:23
and arguing for data for more than 20 years, and it was a somewhat lonely fight for a while, but now it feels like maybe everybody's in agreement, we're all maybe on the same page, and I thought, should I still talk about data? And then I thought, maybe I should,
01:43
because there is a bunch of documents being circulated around Silicon Valley this summer, this is one of them, talking about the future of AI, and it's a very long document, and I'm just gonna summarize it with a wordle for you, so you'll get the gist, you know, it's AGI, AI, algorithmic, compute, efficiency,
02:02
GPT, human, models, progress, et cetera, et cetera. And what struck me was that data was this tiny little thing. So maybe, even though we talk about data now, maybe we should talk about data some more.
02:21
And so, hopefully this is going to be useful, and I will hopefully leave some time at the end for questions. So, because really, much of the magic that we are seeing right now in AI, and generative AI in particular, is due to the data. And let's start from the beginning,
02:42
because I like visual data, let's look at a picture. So if you go to Paris, take the TGV, go to Paris, go to the Musée d'Orsay, this is one of the wonderful Monet paintings hanging there, La Gare Saint-Lazare, and it's really, I think, very evocative. You feel like you're in this
03:02
turn-of-the-century train station, and you hear the hustle and the bustle, and this train is arriving at the platform right at the viewer, with the steam engine. You see the steam engine, yes? Are you sure you see the steam engine? Are you sure it's not just a splotch of paint on a canvas, or something that looks like a penguin
03:22
or a champagne bottle? Where is the steam engine? Well, the steam engine is not really in the pixels, is it? It's really in our heads. And each of you is probably seeing a slightly different version of that steam engine, depending on which steam engine museum you went to as a child.
03:41
And we are really good at guessing from sparse data. Here is a wonderful video by Antonio Torralba of our friend Rob Fergus, and it's a very blurry video, but I think you have no problem seeing what Rob is doing right here, yes?
04:00
In fact, I think you're doing a little bit too good of a job, because actually, he's talking on his shoe, he's computing on his trash can and listening to his beer cans. Check out the mouse, check out the printer, right? So again, it's not in the pixels, it's in our heads.
04:24
Moshe Bar, a neuroscientist, likes to say: our perception relies on memory as much as it does on incoming information, which blurs the border between perception and cognition. And Lance Williams says: mind is largely an emergent property of data.
04:41
And given that, in the old days, data didn't really get that much respect. It was all about the algorithms. If you want to get famous, if you want to publish your papers, you have to come up with an amazing algorithm, okay? And you work on your algorithm for a year, and then maybe a couple of weeks before the deadline,
05:01
you need to figure out, okay, I need to run it on some features, and then the night before the deadline, you say, oh, I need to find some data to run it on, and then you submit it to a conference, and you know, you get famous, right? Well, that kind of mentality has not served us well in these AI-related fields.
05:20
Let's start with an old example, a case study, from when I was a graduate student in the 90s, in computer vision. It was a very lonely time to be in computer vision because nothing worked, absolutely nothing worked. It was kind of depressing. And then, late 90s, something started to work, and one of the first things to start to work
05:41
was face detection. You know, you give a computer a photograph, and boom, boom, boom, it finds faces. And it was so magical, it was finally something was working. And, you know, if you were to take a class in computer vision, even now, 25 years later, 30 years later, there is, you know,
06:02
one paper that is always credited with kind of changing the field in face detection. Does anyone who has taken a computer vision class remember that paper? Anyway, the paper is Viola and Jones,
06:23
but the thing is, it wasn't the only one. In fact, there were three papers that all achieved very similar, very good performance on this face detection task. They all got awards; it's not like the others were forgotten. All three got Test of Time awards.
06:41
All three are very well cited. But somehow, in the textbooks, we still only talk about the last one. Why is that? Well, my theory is that it's because the first paper was using this very, very ancient, boring algorithm called neural networks. And in the 90s, neural networks were,
07:02
yeah, cool kids would not work with neural networks. That was, you know, old school. So nobody really paid attention. The second one, okay, the algorithm was naive Bayes. Yeah, a little bit naive, right? The third one, a boosted cascade. It's a beautiful algorithm. I still teach it in my machine learning class. It's a gorgeous algorithm.
07:22
But did it really make any difference? The performance was basically the same. And the thing is, no. Actually, the algorithm did not matter here at all. What mattered is that these were the first three papers that realized that in addition to the positive data, images with faces in them, it would also be nice to include some negative data: images without faces. And that's really what made this work.
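As a minimal sketch of that idea (this is not the pipeline of any of the three papers; `face_crops` is an assumed list of cropped face patches of a fixed size, and `photos` an assumed list of face-free images, and the classifier is deliberately generic, since the point is the data, not the algorithm):

```python
# Minimal sketch of the positive/negative training idea; purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_patches(photos, n, size=24, seed=0):
    """Sample random non-face patches from face-free photos as negatives."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n):
        img = photos[rng.integers(len(photos))]
        y = rng.integers(img.shape[0] - size)
        x = rng.integers(img.shape[1] - size)
        patches.append(img[y:y + size, x:x + size])
    return patches

positives = face_crops                                # patches WITH faces (assumed 24x24)
negatives = random_patches(photos, len(positives))    # patches WITHOUT faces

X = np.array([p.reshape(-1) for p in positives + negatives], dtype=float)
labels = np.array([1] * len(positives) + [0] * len(negatives))

# Any off-the-shelf classifier works here; the negative data is what matters.
detector = LogisticRegression(max_iter=1000).fit(X, labels)
```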
07:42
But in our psyche, we don't remember this. So I call this our scientific narcissism: all things being equal, we prefer to credit our own cleverness. And you know, the algorithm is the cleverness,
08:00
because the data is just sitting there, right? And I think this has made us go down the wrong path a number of times historically, and I think we need to learn our lesson. There's a wonderful paper out of Google, which is 15 years old now but still a very good read,
08:21
called The Unreasonable Effectiveness of Data, where the authors argue that parts of our world can be nicely explained by elegant mathematics: physics, chemistry, astronomy, et cetera. But there are also areas where elegant mathematics has not made great headway.
08:42
Psychology, sociology, genetics, famously economics, and AI, okay? And I think it's because these are all things that are evolutionary in nature, right? Things which evolved over huge amounts of time
09:03
with a lot of randomness, a lot of entropy along the way. A lot of these evolutionary things are the way they are just because of chance, because somewhere down the line some random event happened, and now we're stuck with it.
09:20
And these are exactly the areas where we see this amazing power of data, this magic of data. So maybe to give you a little bit of a kind of a example of what I mean, let's say that, you know, you are doing computer graphics
09:41
and you want to simulate smoke for your feature film. So you have a smoke simulator and, you know, it's, you know, it's Navier-Stokes, you plug it in, you run the simulation and you get really nice looking smoke, okay? And, you know, computer graphics have been doing this for many decades now,
10:03
just beautiful smoke, right? There are also things like, let's say, the bark of this gigantic redwood tree. How would you simulate the pattern of the bark on this tree? This is actually a much, much harder problem
10:21
because now we are going into kind of evolution. Now the pattern of bark on this tree depends on the genome of the tree, on where it was growing, on whether it had squirrels in it, what the weather conditions were, what the climate was, et cetera, et cetera. So you could still write out
10:41
some long parametric equation to describe all of that, but it will have, you know, tens of thousands of parameters. Much easier to just take pictures of a lot of trees and do this nonparametrically, okay? So, and the nice thing is that now
11:00
we don't have a problem with data. We have a data deluge, and so now it becomes possible, okay? And so with enough data, frankly brain-dead lookup, aka the nearest neighbor classifier, works surprisingly well. So here is an example from my own past. This is work with my second graduate student, James Hays,
11:22
where we said: okay, we have an image, we want to get rid of some foreground object, and then there is a hole, and we want to fill in the hole. How do we fill in the hole? Well, what we did is we downloaded two million images from Flickr. This was 2007, and that was the biggest data set ever in 2007. And then we just found
11:43
the most similar image from the data set, in L2 distance, and with a little bit of computer graphics, just filled it in, okay? And it just works, okay?
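A hedged sketch of that lookup (not the actual Hays and Efros system, which used gist descriptors and seam blending; here `images` is the downloaded collection, `query` the image with a hole, and `hole_mask` a boolean mask of the hole, all assumed to be same-format arrays):

```python
# "Brain-dead lookup": find the most similar image by L2 distance on small
# thumbnails, then paste its pixels into the hole. Purely illustrative.
import numpy as np

def thumbnail(img, size=32):
    # crude strided downsample; real code would resample properly
    ys = np.linspace(0, img.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, size).astype(int)
    return img[np.ix_(ys, xs)].astype(float).ravel()

descriptors = np.stack([thumbnail(im) for im in images])   # N x D database
q = thumbnail(query)
nearest = images[int(np.argmin(((descriptors - q) ** 2).sum(axis=1)))]

result = query.copy()
result[hole_mask] = nearest[hole_mask]   # naive fill; the real system blends seams
```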
12:00
Then the following year we thought: oh, we can actually apply this to some recognition tasks. For example, given an image, we can ask: where on Earth was this picture taken? And again, download; at that point we had six million images, three times more data. And we just found a set of closest images, nearest neighbors of that image, and because they had GPS location tags with them, we basically let them vote on the map of the world.
12:23
And the biggest vote, in this particular case somewhere in southern Utah, was actually often the correct answer, okay?
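The voting step is simple enough to sketch (in the spirit of the method described above, not its exact code; `descs` are assumed precomputed image descriptors, `gps` their tags in lat/lon degrees, and `q` a query descriptor in the same feature space):

```python
# k nearest neighbors vote with their GPS tags; the fullest cell wins.
import numpy as np

def guess_location(q, descs, gps, k=120, cell_deg=1.0):
    d2 = ((descs - q) ** 2).sum(axis=1)               # L2 distance to all photos
    nn = np.argsort(d2)[:k]                           # indices of k nearest images
    cells = np.round(gps[nn] / cell_deg).astype(int)  # bin votes into lat/lon cells
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    return uniq[counts.argmax()] * cell_deg           # winning cell's lat/lon
```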
12:44
So, interestingly, a little history-of-science story here: this was 2008, and 15 years later, after neural networks came around, some folks from Google decided to take another look at this geo-localization task using neural networks. And instead of nearest neighbor, they used their deep network architecture, latest and greatest, and they got much, much better results.
13:01
But look, there is a little bit of a trick here. Two things changed: the algorithm, from nearest neighbor to a deep network, but also the data size. Luckily, I was a reviewer on this paper, so in my review I said it would be nice to have an apples-to-apples comparison, right?
13:23
Keep one of them fixed. And the authors very kindly provided this comparison in their rebuttal and guess what? It turned out that if you keep the data size constant, the fancy schmancy Google neural network was doing no better than simple nearest neighbor.
13:42
In fact, in some regimes they were doing a little bit worse, okay? Now, I'm not saying forget neural networks, just do nearest neighbor. No, no, no. Of course, neural networks have many other properties and can do many more things than nearest neighbors. But in this particular setting, in this particular task, it was really data
14:02
that was doing all the heavy lifting, right? The neural network was basically acting like a fast nearest-neighbor machine. And so it's very, very important, whenever you're doing this kind of work, to try to figure out what is the contribution of the algorithm versus what is the contribution of the data.
14:22
So the good news is really stupid algorithms, lots of data, get unreasonable effectiveness. Now, the folks like myself who kind of always want to be some sort of biological plausibility, you know, we can ask, but is this really like biologically plausible?
14:40
Do humans remember so much data? Do they remember millions of images? Well, luckily this very question has been answered by my dear friend and colleague, Aude Oliva from MIT, looking at the capacity of human visual memory, okay? Now interestingly, there were actually some studies in the 70s by Standing, who showed that people
15:04
were remarkably good at remembering a large number of images. They had something like 83% recall on 10,000 images. And the problem there was that it wasn't clear how much people were actually remembering, because what he was showing his subjects were
15:22
very, very different images. You know, one was a beach, the other was a bedroom. And so it wasn't clear: were you just remembering the type of scene, or maybe a few details, or were you remembering a really dense representation, almost like a photographic memory copy?
15:41
And this is what Aude and her students decided to evaluate. And Aude has kindly provided me with her slides, so we are actually going to do an audience-participation psychophysics experiment with you today. We are basically going to evaluate
16:02
how much you can remember visually. And we are going to look, just like Standing, at different categories of objects. But we are also going to look at different objects of the same category, different instances of the same class,
16:21
or also maybe even the same instance but different state, okay? And we'll see how well we are doing, okay? So basically, your task is to see if you have seen something before, okay? So I'm gonna show you images, they're gonna go really, really fast. And the plan is you're gonna clap when you see something for the second time
16:43
or, you know, more than once, okay? Sounds good? Are you guys ready? It's gonna go fast, okay? Ready? Really ready? Okay, go.
18:15
We could go the whole five and a half hours, but I think you're starting to get the gist. Now let's test you.
18:22
Which one have you seen? A or B? A. Look at that. Which one have you seen? A or B? A. Very good. What about here? B. Uh-oh. How many As? How many Bs? Okay, it's B. Okay, good. Okay, results.
18:41
Okay, so people did very, very well in this task, even better than Standing. So this is basically the replication of Standing: 92% correct recall for different categories. But what about the harder task, the same category or, you know, the same object in a different state?
19:01
Well, the amazing result was that it was basically almost as good. So it seems like we're really remembering quite a bit about the visual data. Now, does that mean that all humans have photographic memory? Well, they did another test for that.
19:22
By sprinkling in some images that have the statistics of natural scenes but are nonsensical: meaningless textures that preserve the first- and second-order statistics of real scenes. So let's do another little extra test.
19:42
Ready? Okay.
20:29
Yeah, I think you get the point. Basically, with these random textures, people perform at chance, okay? So humans do not seem to just memorize.
20:41
Something more interesting is going on, something that is actually very, very interesting. And indeed, raw memorization, these nearest-neighbor methods, they're not great for rare events, what's called the long tail of the distribution, right?
21:02
The really rare stuff: maybe you've never seen it, or maybe you've seen it once or twice, and there these methods do not do well. And this is exactly where these modern generative models come in, because they are somehow able to interpolate and compose data from different sources
21:22
and do something that is better, something that's more than just memorization. Okay, so you've probably all seen this kind of impressive compositionality: a penguin or a dolphin in outer space, something that never happened before,
21:41
there's no data for this, and yet it's able to do this kind of composing. But again, everybody is always so excited about Stable Diffusion or whatever, the fancy algorithm or the fancy language features, and we're again maybe forgetting about the data. And so here is a little experiment that I did.
22:00
So I took three very, very different text-to-image approaches, and I gave them the same prompt: squirrel reaching for a nut. And these are the three results. You might like one of them more than the others, but all three are doing a reasonable job, I think. Yes, you will all agree. But they're all very, very different.
22:20
The first one is DALL·E, which is diffusion-based. The second one is autoregressive, Google's Parti. And the third one is actually GAN-based. So the algorithms are absolutely very, very different. But the results seem not that different, because they're all trained on more or less the same very large amount of training data, okay?
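If you want to try this kind of prompt experiment yourself, one open diffusion pipeline takes a few lines (the checkpoint name is just one public example, not one of the three systems on the slide):

```python
# One public text-to-image pipeline; swap in other models to compare results.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("squirrel reaching for a nut").images[0]
image.save("squirrel.png")
```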
22:41
In fact, there is a nice plot by Taesung Park who showed that basically what matters for the quality of results of these models is how much capacity they have for storing the training data. The bigger the capacity, the bigger the ability to hold the data, the better they do: the lower the FID error, for example, here, okay?
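For reference (the standard definition, not something specific to that plot): FID is the Fréchet distance between Gaussians fitted to deep features of real (r) and generated (g) images, so lower is better:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```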
23:01
But still, it is kind of magical. You can give this machine a prompt, "a green creature made of leaves and vines bursting out of the ground, ready to attack", and it just does it, right?
23:22
That is kind of magical, right? How does it do this? It seems so different from what presumably is in the training data. Well, this is exactly what we wanted to study. So we started looking at the influence of real training images on the results
23:43
of these generative models, okay? This is called data attribution. And in this paper, I'm not going to talk about the algorithm, but basically what we looked at is: for this image, we went to the data set, five billion images of training data, and we asked which images from the training data most influenced the generation of this synthetic image.
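The paper's attribution algorithm is not shown here; as a hedged baseline for intuition only, one could simply rank training images by similarity to the synthetic image in some feature space (`embed` is an assumed image-to-unit-vector encoder, for example a CLIP model):

```python
# Naive attribution baseline, not the paper's method: rank training images
# by cosine similarity to the synthetic image in an embedding space.
import numpy as np

def top_influencers(synthetic, train_images, embed, k=10):
    q = embed(synthetic)                              # unit-norm query feature
    feats = np.stack([embed(im) for im in train_images])
    sims = feats @ q                                  # cosine similarity per image
    return np.argsort(-sims)[:k]                      # indices of top-k matches
```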
24:01
And so here are the results. Are you ready? It's kind of interesting, isn't it? First, it turns out that these data sets actually contain things that do not look like real photographs:
24:21
these actually look kind of like paintings. And second, maybe now it's not looking as magical, because, okay, it's still kind of interesting, but it seems like it's drawing from a lot of the data that's already in there in the training set. So it's maybe not doing nearest neighbors, but maybe it's doing something
24:41
like linear interpolation, potentially, right? Because maybe it's not that crazy, and that's very interesting. So, to paraphrase Arthur C. Clarke: interpolation in a sufficiently high-dimensional space is indistinguishable from magic. And the thing is that maybe that space
25:00
is not really that crazy high-dimensional to begin with. So for this, let's again go back to the 1990s, when people were very excited about models that were able to generate novel faces by just looking at the linear subspace
25:25
of a set of existing faces. So you take a bunch of face images, maybe 200 or 300 or something, you put them in a linear subspace, and then you just see if you can create new faces by linear combinations of these existing faces.
25:43
And what was found is that it actually works really, really well. In fact, one of the classic papers is from Blanz and Vetter, here in Germany, 1999, I think, where they showed that, yeah, from 200 people you can basically reconstruct all of us.
26:03
It's kind of interesting: there are more people on Earth than there are faces, because you can make a lot of us from just a limited subset. And this is very, very interesting. But back in the 90s, it never really went anywhere beyond that; people tried it for things other than faces, and it didn't really work.
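The 1990s recipe fits in a few lines (a minimal eigenfaces-style sketch, assuming `faces` is an (N, H*W) array of aligned grayscale faces with N around a few hundred, and H, W are the image dimensions):

```python
# Fit a linear subspace to aligned faces, then sample new ones from it.
import numpy as np

mean_face = faces.mean(axis=0)
U, S, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
basis = Vt[:50]                                  # top 50 principal directions

def random_face(H, W, seed=0):
    rng = np.random.default_rng(seed)
    # coefficients drawn with the per-direction variance seen in the data
    coeffs = rng.standard_normal(50) * (S[:50] / np.sqrt(len(faces)))
    return (mean_face + coeffs @ basis).reshape(H, W)
```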
26:21
But in this paper from this summer, we decided to give it another go. And instead of looking at linear subspaces of faces, of images, or maybe of some latent vectors, we decided to do something even crazier. We're going to do linear subspaces
26:44
of the networks themselves, okay? What does this mean? So we actually took a bunch of generative models, diffusion models, and we took all their weights as a vector, right? Every network can be described by a set of weights.
27:01
We took all of those weights as vectors and basically made a linear subspace on those weights. And then what we were able to show is that, by just linearly combining these weights, you can create a new network completely from scratch
27:21
without any backpropagation at all. And that new network can produce images, for example, of me, even though I was never in the training set, okay? And it's not just one image, it's actually a proper model, a model that can generate as many images as you want. So what's the story here?
27:40
Basically, the idea is that we take a data set of models, 60,000 of them actually, and then just embed them in some sort of linear subspace; we call it weights2weights, okay? And now we can basically just treat them as points in this linear subspace.
28:01
How do we get so many models? Basically, what we did in this case is we took a standard pre-trained Stable Diffusion model, and then we fine-tuned it for the task of being specialized to a particular individual. So we took 60,000 different people and fine-tuned our model to generate only images
28:23
of that particular person, okay? So now we have a data set of 60,000 models, and now we can do wonderful things. We can, for example, sample new faces by just going into this linear subspace and sampling randomly, and so here is a new face
28:42
that has never existed before. We can edit models: for example, if we have some models that have beards and mustaches and others that don't, we can basically just take a model and add a vector that moves it towards having a beard, and there you go.
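In spirit, the weight-space arithmetic looks like this (illustrative only; `flatten`/`unflatten` are assumed helpers converting between a model and its weight vector, `models` is the set of 60,000 fine-tuned models, `has_beard` an assumed per-model label, and the actual weights2weights code differs):

```python
# Treat each fine-tuned model as one flattened weight vector, then do
# linear algebra on those vectors. No backpropagation anywhere.
import numpy as np

W = np.stack([flatten(m) for m in models])        # (60000, P) weight vectors
mean_w = W.mean(axis=0)

# Sample a brand-new model: a random point in a PCA subspace of the weights.
# (For huge P you would use a randomized low-rank PCA; this is a sketch.)
U, S, Vt = np.linalg.svd(W - mean_w, full_matrices=False)
z = np.random.default_rng(0).standard_normal(100) * (S[:100] / np.sqrt(len(W)))
sampled_model = unflatten(mean_w + z @ Vt[:100])

# Edit a model: move it along an attribute direction, e.g. "beard",
# estimated as the difference of class means in weight space.
beard_dir = W[has_beard].mean(axis=0) - W[~has_beard].mean(axis=0)
bearded_model = unflatten(flatten(my_model) + 1.5 * beard_dir)
```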
29:02
It just propagates that information into this new network, and then everything it generates of this guy has a beard, okay? Or you can do, as I showed before, inversions. You can take a new image, and then you essentially solve for the weights
29:23
that would reconstruct a model generating images of that person, okay? For example, me, okay? So here are some inversions of real images into these models.
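Inversion can be sketched as optimization over subspace coefficients (illustrative only: `generate(w)` stands in for rendering with a model whose weights are `w`, `mean_w_t` and `basis_t` are the assumed subspace mean and basis as torch tensors, and the real objective is diffusion-based rather than a pixel loss):

```python
# Solve for subspace coefficients whose model best reconstructs one photo.
import torch

coeffs = torch.zeros(100, requires_grad=True)   # coordinates in the subspace
opt = torch.optim.Adam([coeffs], lr=1e-2)
for step in range(500):
    w = mean_w_t + coeffs @ basis_t             # a point in weight space
    loss = ((generate(w) - target_photo) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
inverted_model = unflatten(mean_w_t + coeffs.detach() @ basis_t)
```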
29:41
And then you can, of course, play with different things. You can take something that's not an actual photograph, for example a cartoon or a sketch, and it basically gets projected onto the manifold of real images, okay? So you can imagine what it might look like. And you know how they say that pets often look like their owners? We wanted to check this and see what it can do there,
30:02
okay? And of course, it works not just for faces; it works for other things as well, okay? So, you know, linear or not, we still need a lot of data, okay? And in a way, when people ask me, you know, is this thing gonna work, is that gonna work,
30:21
I always look at how much data is available, because I think that's really the best predictor for success. For example, you know, text data, tons of it available. You know, two millennia of text is all stored on the internet nowadays. And you know, it works very well. Image data, not as much, but you know,
30:43
it's getting there, so the image generation is getting pretty good. Videos, actually not that much useful data so far, and the video models are not as good as the image models. And then if you look at something like robotics, where there's basically no data, it just doesn't work, okay? So the availability of data is actually
31:01
a pretty good marker for whether it's gonna succeed or not. And this kind of explains the idea of Moravec's paradox, right? Moravec's paradox is this idea from the 80s that in AI, often the hard things are easy, and the easy things are hard.
31:21
And I believe that on Thursday you're gonna be having a much more in-depth discussion of that, but the classic example of this is this image: Garry Kasparov being defeated by Deep Blue, what is it, 30 years ago. And the interesting thing here is that Deep Blue is playing really great chess, but there's still this guy to move the pieces.
31:43
And 30 years later, we still need the guy to move the pieces, we still don't have a robot that can reliably move the pieces on the chessboard. Because that turned out to be a much harder problem. And again, you know, with all of these fancy schmancy large language models, they can do a lot of things,
32:02
but at least for now, they're having trouble figuring out, you know, how many times a blue line crosses a red line, right? But this is okay, it's gonna get better, we'll get more data. And the interesting thing is, I think the future is hopefully going to be
32:24
bright, and it's probably going to be with this more sensory data. Because text data, we're almost at the end of it. We had these 2,000 years' worth of human culture and history, and we went through it.
32:40
Going forward, we only have Twitter and Reddit. It's not gonna be quality data. So I think we're kind of towards the end of getting what we could get from text. But image data, video data, robotics data: infinite supply. It's not very good data, it's much noisier data, but there is an infinite supply of it.
33:02
And so I think in that sense, there are much, much more exciting things to be had. So, as a kind of closing, I want to mention this wonderful parable that my colleague, the wonderful
33:20
developmental psychologist Alison Gopnik, recently told us about: the parable of the stone soup AI. Most of you have probably heard of the stone soup fairy tale; basically, most cultures have some version of this tale. So, you know, travelers come to a village
33:43
and they are hungry, but the villagers say: well, we don't have any food even for ourselves. And then the travelers say: oh, no problem, we're gonna make soup out of stones. Just give us a cauldron and some firewood, and we're gonna make amazing soup. So they get a few stones, they put them in the water, and it starts boiling.
34:01
It's like: oh, it's gonna be so good. And they try it: oh, it's already getting good, but if only it had some onion and some celery, oh, it would be even better. And then one of the villagers says: okay, here, I found some onions. And they say: oh, but with a little bit of potato and maybe cream, it's gonna be even better. And then somebody else brings that.
34:20
And they say: oh, you know, when we made it for the king, we also had a chicken, and that made it absolutely marvelous. And then of course somebody brings the chicken, and at the end, the whole village is eating this soup. And they say: oh, imagine, such an amazing soup. What a recipe: just from stones, such amazing soup.
34:40
So you can see where this is going. Alison says: look, think about it this way. Now tech execs come to a village and they say: we'll make amazing, magical AI with just a few magical algorithms, you know, gradient descent, transformers, next-token prediction. And everybody's like: great. And they say: you know, it's gonna be even better
35:01
if we get more data. Ah, it will be so much better with more data. And then of course they say: oh, you know what, it will be even more magical if we actually have some human feedback at the end, some RLHF feedback, just to make sure that it's doing the right thing. Okay, okay. Oh, and it will be even better
35:21
if humans learn how to ask the questions in the right way, kind of prompt-engineer it to do even better. And then of course, we're all sitting there eating this stone soup AI, saying: oh, wow, what an amazing set of algorithms that did all this magic from scratch. So I think you get the point here:
35:43
basically, the unsung hero of this AI revolution is actually sitting in the data, whether it's actual data, or it's data from humans doing the reinforcement feedback, or it's data from all of us learning
36:01
how we interact with these systems. And this is fine, but we just have to be aware of it. So the takeaways: is it all just data, with no place for algorithms? No, of course not. Of course not. Large-scale data is clearly necessary, but it's not sufficient. There are plenty of other things to be done.
36:23
And there's plenty of room for the flights of fancy and the inspiration and the great algorithmic insights. But we need to learn to be humble, and we need to learn to give data the credit it is due. Thank you very much.
36:55
Alexei, thank you very much for a fascinating talk. We have time for a couple of questions. I don't think that we will have time
37:01
for all the questions which might be there. But for a couple, yes. We need a microphone over there. Thanks. Can you hear me? Can you hear me? Okay, thank you so much for the wonderful presentation.
37:23
My name is Christiana. I'm Nigerian, but I study in Morocco. This is my second time attending the HLF. I attended years ago, and I met the presenter this year at a conference in the United States.
37:40
So my question is from your presentation. I could see how humans are doing wonderfully well, but I want to know whether humans are always so smart at recognizing pictures. Because, for example, when I was coming down here,
38:04
my appearance, that is, my facial appearance when I took the photo for my Schengen visa, was quite different, because I was not wearing my hair, I was not wearing glasses, earrings, and the like.
38:21
So it was difficult for the immigration officer to really believe I was the one in the visa. They called five people, and they still could not recognize me. So I felt, in that aspect, a model might have some key features that would give the model hints
38:45
that, yeah, this is actually Christiana. So can we really say humans are so smart, when I gave them my passport and they still could not recognize that it was me in the passport?
39:02
Thank you. Thank you for the question. So I think there are the specifics and the general. The specifics, about recognizing people from their passports: it's actually a very common and very well-known thing.
39:21
Just as, if you remember, we were very good at recognizing familiar pictures, but when we saw those random textures, we were not very good. This actually happens very much with people who do not look similar to us. So I go to Singapore, for example, I go to the passport control there,
39:41
and they cannot match my pictures, because I don't look as familiar to them as what they're used to. Humans are very good at dealing with cases where they have a lot of data; when they don't have a lot of data, their performance goes down. So it's not that surprising.
40:00
And I think, maybe more generally, you say people are so smart. I think this is actually a very interesting point, because a lot of people are worried that AI is going to be super smart, amazingly intelligent, just like humans. And I'm thinking, okay, I think it's kind of the other way around. What we're seeing is that maybe the humans
40:20
are not as smart as we think we are. Maybe we have a lot in common with these algorithms, not because the algorithms are so smart, but because we are less than what we think of ourselves. And we'll see where this goes, but I think this is definitely a possibility we should be open to.
40:42
So, let's take the gentleman in the white shirt. Okay, thank you very much for the talk. On the topic of us slowly running out of data, what's your opinion on the, let's say,
41:02
recent trend of synthetic data generation, models training on model outputs, and what do you think about model collapse due to this synthetic data? Okay, I'm maybe in the minority: I do not believe synthetic data is a good idea,
41:23
because you need to have some extra entropy. The data needs to bring in some information, right? Just looking at Shannon: it needs to give you some bits. And synthetic data is fundamentally, usually, not gonna bring you anything new.
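One standard way to make this Shannon argument precise (a textbook fact, offered here as a gloss on the answer, not the speaker's own formula): if the synthetic data Z is produced by a model trained only on real data Y, which in turn was drawn from the world X, then X, Y, Z form a Markov chain, and the data-processing inequality says the synthetic data cannot tell you more about the world than the real data it came from:

```latex
X \to Y \to Z \quad \Longrightarrow \quad I(X;Z) \le I(X;Y)
```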
41:41
It's basically just gonna linearly interpolate what you already know. So unless we have some really, really good, for example, in the visual domain, really good graphics models that can simulate a lot of our visual world, and then there is a lot of data in there
42:01
that I think is gonna work very well. But I think right now, even that we don't know how to do, we don't know how to make synthetic reality have the richness and complexity of our real reality. So at this point, I think we're stuck with real data. That's at least my bet.
42:21
But I might be wrong on this.