Nevanlinna Prize Winner
Formal Metadata
Number of parts: 33
License: CC Attribution 3.0 Germany: You may use or modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or changed form, provided that you credit the author/rights holder in the manner specified by them.
Identifiers: 10.5446/15970 (DOI)
Transcript: English (automatically generated)
00:10
It's a great pleasure for me to introduce the speaker of this afternoon, the winner of the Rolf Nevanlinna Prize, John Kleinberg.
00:21
Let me say a few words about the work of John Kleinberg and his career so far. He received his PhD from MIT in 1996, and he is now a professor in the computer science department at Cornell University.
00:41
The scientific work of John Kleinberg, given his age of 33 years, is impressively broad and spans diverse topics, including link analysis and modeling of the World Wide Web and related information networks, discrete optimization and network algorithms,
01:04
algorithmic approaches to clustering, indexing, and data mining. So in his work, he always tackles problems with great impact on practical applications. In the process of solving, he derives deep and structural
01:26
and mathematical insight, and he develops efficient algorithms and applies them to relevant experiments. Although his work has outstanding practical applications, his admirable contributions
01:45
are clearly theoretical, in the sense that good theory is best practice. Kleinberg's work has been acknowledged already, not only by the public, but also by several prestigious prizes, of which the Nevanlinna
02:03
Prize is certainly the highest recognition so far. Allow me to highlight three important topics which characterize, at least in my opinion, the scientific work of John Kleinberg. First, Kleinberg creates new concepts and definitions
02:26
by introducing mathematics to areas which had eluded mathematics until that point, let alone efficient algorithms. For example, in his most groundbreaking
02:41
and influential paper, "Authoritative Sources in a Hyperlinked Environment" from 1998, he introduced eigenvectors related to incidence matrices of graphs, and he used this to determine
03:00
a certain equilibrium between authorities and hubs, which he realized played an important role in the significance and relevance of a link. Secondly, he develops efficient algorithms by bringing mathematics from different areas to topics
03:21
where only qualitative results existed so far. For example, in his work on small-world phenomena, he introduced the clustering exponent alpha, and he proved a nonexistence result by showing that only the exponent
03:44
alpha equal to two has the property that there exists an efficient algorithm to find the target in a polynomial number of steps. Moreover, in his work on word bursts and temporal analysis,
04:00
he introduced the formalism of Markov chains, hidden Markov models, and Markov sources, and all this had an important impact on the field of network analysis. Third, the practical impact. I think John Kleinberg's impact on analysis
04:21
and development of the World Wide Web can hardly be overestimated. It's not easy to find a comparable success story in recent years of the immediate application of mathematics to solving practical problems. However, as I said, the conceptual originality
04:44
and depth is, besides the practical impact, perhaps the most impressive attribute of Kleinberg's work. And he has not only one outstanding result. Every few years, he comes up with new, spectacular results in different directions
05:03
with ongoing new and insightful ideas. I think we can expect many more deep and original contributions. The only thing which I think is hard to predict is in which domain this will happen. So this is all, and I should make a further announcement
05:23
before John Kleinberg starts his talk, namely that the talk on Tuesday will be canceled. That is, he will give that talk, or a similar talk, now. Thank you very much.
05:52
Okay, looks like everything is up and running here. Okay, well, so thanks very much. Thanks very much for the introduction.
06:01
Thanks very much for the honor, both to the prize committee and the organizing committee of this Congress. It's been a fascinating experience for me to be able to attend all these talks, and I've certainly learned a lot. So what I wanted to talk about today
06:21
is some issues at the intersection of mathematical models and the structure of information networks. And in order to situate this talk, it's important to provide some context. And the context begins in a sense
06:42
with the following dichotomy, which is that within computer science, the field has developed a very rich mathematical framework for posing research questions. And it has sort of in its most simplified form the following general flavor, namely a computer like this one here receives input,
07:05
it computes for a while, it produces some output. And obviously when faced with questions like that, you naturally gravitate toward issues like how efficiently can the output be computed? Or for certain problems, can we prove there is no fast or efficient algorithm?
07:23
Now, if we think about it though, if you think about how you use your computers every day, how the public uses computers every day, yes, deep down inside they're doing calculations, they get input, they calculate a lot, they produce output. But the way most of the world experiences computer use
07:41
is as a form of mediator, namely the computer is mediating their access to information and to other people. So if we think about mediating access to information, the way that people browse the web, use search engines and so forth, the way they connect to other people through things like email, instant messaging,
08:02
through their interactions with other people at electronic markets. And even the sort of in-between ground which you could say is sort of simultaneously about connecting with information and other people, things like blogging, like online discussion, like file sharing. So if this is how most of the world experiences computer use, namely that the algorithms are somehow in the internals
08:22
and what they're doing is interacting with information and other people, the natural question that a computer scientist has to ask himself is, how should one combine this rich framework for posing questions with these kinds of developments? And that's the issue that frames this talk,
08:41
how we can take the style of asking computer science research questions and apply it to these new kinds of domains. Now, one ingredient in this, of course, is not just the information, but the rich structures that grow up around information. And these structures often take the form of complex networks. So by networks here, I simply mean graphs,
09:03
either directed or undirected graphs. And I'm thinking of networks here at really all levels of the computing experience, right? So ranging from communication networks, right? The actual physical networks that send our packets when we communicate or search for information to the kind of more virtual information networks
09:22
like the World Wide Web, where the nodes are our web pages and the links and the edges are directed hyperlinks, to even more virtual kinds of networks like human social networks, where the nodes are people and the links represent either friendship, acquaintanceship, collaboration,
09:41
or simply the exchange of messages. And of course, one interesting thing about modern computer systems is that all these networks are sort of intertwined. They sort of inhabit the same spaces all at once. And one way to see examples like this is, for example, in pictures like the one drawn here, which is actually from Lada Adamic and Eytan Adar's study of the electronic mail network
10:01
at the corporate research lab for Hewlett-Packard. So this is about 400 people working at a corporate research lab. What's drawn here is the management hierarchy of the lab, but then the thin lines are all the links connecting people who exchanged a sufficient number of electronic mail messages in both directions. So we have a social network superimposed on an organizational network,
10:22
and you can see how the communication sort of follows the official hierarchy, but also sort of traverses it. And of course, we have not just the social network riding on top of this, but we have the communication network that's carrying the email, the information they're exchanging, all somehow all entangled at once. And essentially untangling what's going on
10:41
in pictures like this is part of the challenge that we face in asking these questions. There's something else going on when we think about information networks and social networks, and that's the following. That what we're really thinking about is networks as phenomena, namely unlike the great networks of the past century,
11:02
the telecommunications network, the power grid, the highway and rail systems, those were fundamentally engineered artifacts. Although there was lots of local variation and lots of perturbation, fundamentally those were mapped out, designed and engineered by a small group of people who had a global plan for the system.
11:22
On the other hand, something like the World Wide Web or something like the social network of all email communication in the world is clearly not something that's designed or engineered. It simply happened organically as a phenomenon. It exploded in growth over the past 10 years, really due to the completely asynchronous,
11:41
uncoordinated actions of hundreds of millions of people. As a result, that's how we have to think about these networks, not as things that we're gonna design, but as phenomena that have to be studied on their own terms. As a result, the kinds of models that we're gonna think about are rooted not just in graph theory and algorithms,
12:01
which has traditionally studied networks, but also in discrete probability through things like random graph models, which through randomized means are designed to capture the uncoordinated growth of these kinds of methods, of these kinds of networks. At the same time, these kinds of questions clearly and inevitably bring us into contact with the social sciences,
12:21
areas like economics and sociology. And one other thread running through this research area is the way in which algorithmic models have begun to merge with notions in the social sciences, trying to provide an algorithmic foundation for some of the social processes that we see.
12:42
And finally, although these are networks that are evolving on their own and hence have to be treated as phenomena, the models that we develop very rapidly become principles for analyzing the data we see and even in a circular sort of way become principles for designing new systems. Because obviously, although these systems
13:01
are growing through everyone's participation, they're also being shaped by the kinds of tools that we develop, tools like search engines or social networking systems. So we have this progression. We have information, we have networks, we have rich data, mathematical models, and eventually design principles.
13:20
And what I hope to do in this talk is pick one particular strand of research that embodies all of these themes and try to show you its progression more or less from beginning to end, necessarily omitting all sorts of parts of it simply due to the bounded amount of time
13:41
I have in which to talk. There are a number of examples I could actually choose to try showing this progression. I want to start off by mentioning a few of these examples just to convey what the scope is here. So the first and probably most well-known example
14:03
is the use of link analysis in the context of web search, as depicted here. So obviously one of the crucial challenges here is to understand the information content of the web and the way in which people access that information. And the power of mathematical models for the web
14:23
is apparent in facts like this: the spectral properties of the hyperlink graph are in fact a very effective way to rank web pages, and in fact form the basis for a lot of the ranking functions of modern search engines.
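The eigenvector idea behind this kind of link analysis can be sketched in a few lines. The following is a minimal illustration in the spirit of hubs and authorities (HITS), not a reproduction of any production ranking function; the tiny four-page graph is hypothetical.

```python
# Minimal sketch of eigenvector-based link analysis in the spirit of
# hubs and authorities (HITS). The example graph is hypothetical.
import numpy as np

def hits(adj, iters=100):
    """Power iteration to approximate authority and hub scores."""
    n = adj.shape[0]
    a = np.ones(n)  # authority scores
    h = np.ones(n)  # hub scores
    for _ in range(iters):
        a = adj.T @ h          # a good authority is pointed to by good hubs
        h = adj @ a            # a good hub points to good authorities
        a /= np.linalg.norm(a)  # normalize so the iteration converges
        h /= np.linalg.norm(h)
    return a, h

# Hypothetical 4-page web: pages 0 and 1 both link to pages 2 and 3.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
auth, hub = hits(A)
# Pages 2 and 3 receive the authority weight; pages 0 and 1 the hub weight.
```

The fixed points of this iteration are the principal eigenvectors of A^T A and A A^T, which is the sense in which "spectral properties of the hyperlink graph" yield a ranking.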
14:40
At the same time, if we want to analyze the network more deeply to try exposing some of the tightly knit communities of pages and communities of people that exist on the web, it's again turned out that analysis of the link graph in an algebraic way and in particular through the development of novel types of matrix factorization has proved very effective also.
15:03
And finally, all bundled into the evolution of the web as an information system, we see new questions in economics and game theory. So for example, the fact that search engines in the past few years have become immensely profitable is thanks to something that people call the search economy. This notion that you can actually bid for ads
15:21
on specific search keywords leads to new kinds of auction designs, new kinds of auction design, new kinds of game theoretic analysis. This intersection of algorithms and game theory, which is not going to be the focus of my talk here, is a very interesting area which actually Tim Roughgarden is going to be covering in his invited talk on Tuesday afternoon.
15:44
So these are some of the questions that one gets if one looks at the network structurally, the web pages and the links, and try understanding the structure and what it tells us about the information content. If we look up at a higher level at the dynamics taking place on these networks, then we see a different set of questions.
16:01
So moving up from structure to dynamics and function, we can look at the aggregate behavior of people as they actually access this information. So we've moved from the structural level of web pages to the macro scale of billions of people issuing queries. So a big search engine like Google receives several hundred million queries per day,
16:22
close to a billion, and as a result we can start to see trends that emerge in aggregate that'll be very hard to see at a micro level. So these are some examples. What I've drawn here, this is data on Google queries. So time on this axis, number of queries for a certain term on this axis. So this is the query Christmas.
16:41
Okay, so time runs this way. And so even if you had just arrived from Mars and knew nothing about human beings or language, you could quickly infer that this string of letters, Christmas, seems very important at certain times and particularly very, very important during the last week of December. And you would know that simply by observing aggregate behavior. This is a very simple example.
17:01
This second query shows a different example. Katrina, the name of the hurricane that struck the southern U.S. in August of 2005. And again you see very little querying for Katrina, except perhaps people actually had that as their name, up until here, and then suddenly a spike. And again you would know something clearly had happened there regardless of the fact
17:20
that you might not know anything about the domain. Now there are all sorts of questions here, right? So this is not at the structural level now with the level of aggregate behavior. For example, is there a basic vocabulary of usage patterns that we can invoke here to try decomposing these kinds of things and understand the behavior that's underlying them? And I should mention that this is more complicated
17:41
than simply trying to find spikes, right? You might look at this and say, well, this is easy. It's simply that everything just goes up. But note, for example, some more subtle features. I only have time here to mention one, which is that in a sort of strange way, the shape of the spike for Christmas is the shape of the spike for Katrina if time were running backwards.
18:01
And why is that? It's because Christmas is an expected event. You know from way in advance Christmas is coming, queries ramp up, and then actually after Christmas, everyone kind of, clearly the data says, moves on to other things, forgets all about it. Katrina, no one knew it was coming except a few meteorologists tracking it out in the Atlantic Ocean. And so when it first struck in the news,
18:21
the query spike went up nearly vertically and then fell off slowly as people continued searching for it over time, right? So in the sense, expected events look like unexpected events in the query logs with time running in the other direction. And these are phenomena that we'd like to understand at a deeper level, right, extrapolating from these examples. And the kind of analysis of this,
18:42
coupled with the need for efficient algorithms that operate on massive streams of this data, right? How do you not only look for patterns like this but look for patterns over every query term on a source of data that's getting several hundred million queries a day? This has turned into the area of streaming algorithms, which has close connections with a number of different probabilistic models with the areas of property testing
19:01
and communication complexity, and actually in some ways that are only now being explored, some nice questions at the intersection of combinatorics and harmonic analysis, actually not necessarily so different from some of the issues that Terence Tao brought up in his talk Wednesday morning. And again, these are things that are going to come up in talks later in the Congress also.
19:21
For example, some of the issues here, although in quite different forms, will come up in Luca Trevisan's and Omer Reingold's talks that afternoon. And the subject of streaming algorithms is also going to be the focus of Ronitt Rubinfeld's talk on Wednesday. But in this particular talk, I sort of wanted to interpolate between these two things,
19:40
the structure of the web and information networks on one hand, and on the other hand, these aggregate properties that we see that we'd like to understand better, the behavior of whole human populations. And in order to interpolate between these, we somehow need to build a bridge between the underlying structure and the macro level phenomenon that we see, this sort of micro to macro step. And so we need models of how individual people,
20:04
how individual algorithms, how agents interact with information, how they interact with networks. And that's actually going to be the focus of the bulk of the talk, namely how individual algorithms or individual agents or people explore information networks
20:21
using only the information that's available to them. So with networks this vast, one typically does not have global knowledge of one's entire social network or of the entire web. One has local information and one uses that to search for information. And somehow the combined actions of all those people doing that is what leads to this aggregate behavior that we see. And this is the particular thread of research
20:41
that I wanted to try tracing out from some of its early origins to some of the recent developments, trying to highlight along the way some of the questions that come up. So the origins of this issue, of course, are old and reach back into many areas, but the particular area in which I wanted to chase it specifically was some origins in social psychology,
21:03
the work of Stanley Milgram on the so-called small world phenomenon, which I'll talk about in the next slide. From there I'd like to move on to some models of this notion, small world properties, and in particular searching with local information.
21:20
Some very influential work of Watts and Strogatz in 1998, and then some work that I did on the algorithmic aspects in 2000. From these models, which are going to be intentionally very stylized and specific, so intentionally scaled down, we're gonna try abstracting a general pattern that we can then hunt for in real network data,
21:40
the kinds of pictures I've been showing on the previous slides. And we'll actually see this pattern in things like web hyperlinks in email communication and within at least the best data we can find on friendships online. If I have time, I'll briefly also talk about this circularity whereby the models turn back into design principles, in particular inform the structure of things
22:02
like decentralized peer-to-peer file sharing systems. And woven throughout this is going to be this last point, that a lot of these models are probabilistic somehow at the intersection of discrete probability and graph theory, and in particular raise some interesting questions for the topic of long-range percolation in graphs,
22:22
a notion that I'll define in a few slides. Okay, so let's start on this trajectory with the small world experiment of Stanley Milgram. So this is known to some of you, of course, but I wanted to actually go through the specifics of it because the specifics are gonna be important
22:40
for this talk. So what Stanley Milgram was trying to do was validate this, at the time, anecdotal notion that we're all connected by a small number of friendships, what we now think of, although we didn't then, as the six degrees of separation. So how could one test this experimentally? It's an anecdotal notion that we often seem to have friends in common with people,
23:01
but if we wanted to test it experimentally, and this is in 1967, Stanley Milgram operating without computers with an operating budget of a few hundred dollars. He picked a few hundred people at random out of the phone book of Omaha, Nebraska in the US because from his position at Harvard,
23:20
Nebraska to him, he said, seemed far enough out there that it was gonna be a good challenge. He sent letters to all these people. He said, we're conducting an experiment. We'd like you to forward this letter to the following designated target person, a friend of his who lived in a suburb of Boston. And he gave each starting person the target's name, their address, their occupation, and a few other details about them.
23:41
And the rules were that although you have their address, you can't send the letter directly to them. Instead, you have to mail this letter to someone you know on a first name basis, a friend of yours, with these instructions, with the goal of reaching the target as quickly as possible. And in this way, we're gonna try to find whether there really are these short chains of friendships that can get from essentially random people
24:01
to this essentially arbitrary target. And the chains continued onward. And actually, as opposed to what would be and has been the case now in versions of this experiment, in 1967, people were actually quite willing to participate. Around 75% of the chains took their first step onward,
24:22
and about a third of them reached their destination. And the dropouts were essentially random. And over the chains that completed, the median number of steps was six, a number that subsequently entered popular culture as the six degrees of separation. That on average, it took six steps to get from random people to an arbitrary person. Okay, so this actually caused some amount of surprise,
24:44
even among professional social scientists at the time. At the time when Milgram polled his colleagues and asked them how many steps do you think this will take, some thought 10, some thought 100, some thought it wouldn't work at all. So the intuition that we have now about this is much more refined than it was then. But the actual question of mathematical models
25:03
for this was an interesting challenge, partly because short paths in graphs is not, graph-theoretically, necessarily a surprising property. Almost any random graph model you make up is going to have short paths. And so the question is, why then is the small-world phenomenon surprising to us? And so a proposal that Watts and Strogatz made
25:22
in a very influential paper in 1998 was that the reason it's a surprise to us is that real social networks are somehow a superposition of a structured network and a random network. So the reason why it's surprising that we're all six degrees apart is because when we look around us, we have lots of friends, but all our friends seem to know each other. It doesn't seem to have that sort of exponential branching
25:42
that you would want to have the world be small. But of course, if you think about it a bit further, and this is what Watts and Strogatz were getting at, most of our friends know each other, but there are those few chance long-range connections, those sort of unexplainable chance friendships, and a few of those is, of course, enough to give us that exponential branching
26:00
that gives us short paths to almost anywhere in the network. And so what they proposed to abstract this was, again, an intentionally stylized, simplified model, which later in the talk we'll layer some more complexity onto. And that was the following. So I'm going to talk about a variation on the exact model they proposed. We're going to start with a structured grid network,
26:21
say a two-dimensional grid here. And again, metaphorically, this might model the fact that we all live on the surface of the Earth and we all know our neighbors, but we shouldn't take this too literally at this point. And then we'll add a very small number of random links. Again, for simplicity, say one random directed edge out of every node.
26:41
This models our one chance random friendship. Although again, later we can imagine generalizing the model to more links and so forth. Again, we're keeping things as simple as possible to observe this. Okay, and the point is, and this is actually a fact that was known from the area of random graphs, as you sprinkle in these random red edges here,
27:03
the diameter drops very, very quickly, even while local neighborhoods remain fairly clustered, fairly grid-like. And so the fundamental thing here was this Watts-Strogatz idea that we should model these low-diameter small-world networks as a superposition of two, the predictable structured one we know and the random one that we don't understand that makes the world small.
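The superposition idea is easy to simulate. The sketch below is a variation in the spirit of the model just described (an n-by-n grid plus one random directed long-range edge per node), not the exact Watts-Strogatz construction; it measures a breadth-first-search distance across the grid, which is guaranteed to be no worse than the pure grid distance and is typically far below it once the random edges are sprinkled in.

```python
# Sketch of the superposition model: a structured grid plus one random
# directed long-range ("red") edge per node. Illustrative, not the exact
# Watts-Strogatz construction.
import random
from collections import deque

def grid_plus_random(n, seed=0):
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(n) for j in range(n)]
    nbrs = {v: [] for v in nodes}
    for i, j in nodes:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # grid edges
            if (i + di, j + dj) in nbrs:
                nbrs[(i, j)].append((i + di, j + dj))
        nbrs[(i, j)].append(rng.choice(nodes))  # one random long-range edge
    return nbrs

def dist(nbrs, s, t):
    """Breadth-first-search distance from s to t (None if unreachable)."""
    seen, q = {s: 0}, deque([s])
    while q:
        v = q.popleft()
        if v == t:
            return seen[v]
        for w in nbrs[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                q.append(w)
    return None

nbrs = grid_plus_random(20)
d = dist(nbrs, (0, 0), (19, 19))
# Without the random edges the distance would be exactly 38 grid steps;
# with them it is typically much smaller, mirroring the diameter drop.
```

Repeating this over many seeds and grid sizes shows the diameter falling to roughly logarithmic in the number of nodes, while local neighborhoods remain grid-like, which is the Watts-Strogatz point.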
27:23
But the thing that interested me actually was not so much the structural question, but in a sense the striking algorithmic punchline of Milgram's experiment. Namely, if Milgram had actually wanted to find the shortest path from all those people in Nebraska to his target in Boston, then if you think about it,
27:40
in a sense he used the wrong algorithm. He should have said to his starters, take this letter, send it to all of your friends, tell them to send it to all of their friends, and everyone should count steps, and the shortest chain is going to be the actual shortest path, right? We should flood the network. Now, if you reflect on this for a while, there's certain obvious things involving US postal law and other things that prevented him
28:01
from doing this experiment. And so he was forced to embark on the much more interesting experiments that he actually conducted, right? Tunneling his way through the social network, forcing people to guess how to find the target. And the amazing fact is that even that succeeded, right? Namely, complete strangers without any global knowledge of the social network somehow knew implicitly that if they started forwarding the message
28:21
in this direction, and this is a sample chain from the experiment, that gradually would make its way to Boston, from Boston and into the suburb of Boston, Sharon, Massachusetts, where the target lived, wander around in Sharon for a step or two and then reach the target, right? What was it that was causing them to be able to navigate in this unknown social network? Well, so obviously at a very high level
28:40
we sort of know what was happening. No one had a map of the social network, but everyone knew some coordinates for the target, some geographic coordinates, as well as some social and occupational coordinates, and they sort of aimed in the correct direction, figuring that once it got to Boston, people would be able to aim even more accurately, and that's exactly what happened. That, however, is not something that random graph models recapitulate so well,
29:03
and so that's something where we actually need to think, what is a model where not only are the chains short, but unlike most random graph models, local information suffices to find it? Okay, so we should think about how to formalize this, and of course to actually pin down precisely what we mean by a decentralized algorithm
29:20
would take a few minutes, which I don't have right now, so I'll convey to you very quickly what I mean by a decentralized algorithm, and this will certainly suffice for this talk. A decentralized algorithm is one which, intuitively, as the message travels from its starting point to some node v, knows the grid (knows that everyone is on a grid), knows the coordinates of the target, but only knows the red edges it's seen so far.
29:43
In a way, the red edges it hasn't seen so far are only randomly generated once it gets to them. So specifically, a decentralized algorithm, when the letter in the analog of the Milgram experiment is at node v, it knows the path, it knows v's red edge, it knows these things, and what it has to produce for us from this information
30:01
is a choice of which neighbor to send it to. Right, if you try and get to the target, should you send it on to one of these next door neighbors or should you follow the red edge? And here presumably, if you were at least a human being in Milgram's experiment, your intuition would say to follow the red edge, it gets you awfully close to the target. So that's a decentralized algorithm. It sees part of the network,
30:22
the sort of known structure part, but not all of it. And our figure of merit, the way we're gonna evaluate the quality of a decentralized algorithm is by what I'll call its delivery time, namely the expected number of steps over a random generation of the graph, over a random choice of red edges, and a random choice of start and target, the expected number of steps it takes to reach the destination.
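Stated as code, this figure of merit is just a Monte Carlo average. This is a sketch, and the names `route` and `generate_network` are placeholders of my own for whatever routing rule and random-edge model you plug in:

```python
import random

def delivery_time(route, generate_network, n, trials=200):
    """Estimate the expected delivery time: the average number of steps
    over a random network, a random start, and a random target on the
    n x n grid. `route(start, target, network)` returns a step count;
    `generate_network()` draws the random red edges."""
    total = 0
    for _ in range(trials):
        network = generate_network()
        start = (random.randrange(n), random.randrange(n))
        target = (random.randrange(n), random.randrange(n))
        total += route(start, target, network)
    return total / trials
```

Any decentralized rule can be dropped in as `route`; nothing here is tied to a particular model.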
30:44
And in thinking about what makes for a good decentralized algorithm or what makes for a bad decentralized algorithm, our gold standard here is gonna be that although the network is an n by n grid of nodes, so it has n squared nodes, we want an algorithm that's exponentially faster than that. So we'd like something that's a polynomial function of log n, not n, a quantity that I'll call polylogarithmic.
31:04
And if you think about it, although Watts and Strogatz did not design their model to ask this question, in a sense it's the perfect minimal model for posing it, because if you wanna make this an interesting question, you need a few ingredients.
31:21
You need a network that's partially known and partially unknown to the participants because if it's completely known, then this is really just the problem of finding a short path on a roadmap. You have full information. If it's completely unknown, again, it's not interesting because if we all have random node IDs, I have no idea how to navigate. So we need this partial knowledge, partial lack of knowledge.
31:41
Clearly the known part should have very high diameter. If you actually wanted to get from Nebraska to Boston, always by just passing it to your next door neighbor, it's gonna take you a very long time. So the known part is useful for orienting yourself but not useful for doing much message passing. But the full graph has low diameter,
32:01
low path length between all pairs. Okay, and so considering that the Watts-Strogatz model was the perfect setup for asking the question, of course the first interesting question is: do decentralized algorithms work well in the Watts-Strogatz model? And the first result is actually negative, that in fact, although the diameter of the Watts-Strogatz model is at most
32:20
some constant times log n (so the maximum distance between two nodes is actually exponentially better than the size), no decentralized algorithm can find such paths. For any decentralized algorithm, namely for any rule based on local information, the expected delivery time is at least some constant times n to the two-thirds power, a bound which is actually tight. There's an algorithm that achieves that.
32:42
Okay, so in fact, the Watts-Strogatz model, although not showing why the Milgram experiment might have succeeded, instead shows something else surprising: that there really is a split between the existence question here and the algorithmic question. Short paths exist, but with local information you can't find them. Okay, so obviously this is not the end result we want
33:01
since we somehow want to capture networks where, like in the Milgram experiment, like in actual human social networks, you really can find the paths. So what might be a hopefully mild generalization of the Watts-Strogatz model where decentralized algorithms succeed? And here's a proposal. We're gonna add one extra parameter. So we're gonna keep our n-by-n grid
33:20
with these random red edges. But our further parameter alpha is gonna control, in a sense, the distribution of what these red edges look like. So for each node v, I'm gonna continue to add a directed link to a random node. But I'm gonna choose w as the other endpoint, not uniformly at random anymore (except in one extreme case), but with probability proportional to their lattice distance
33:41
to the minus alpha power, okay? Now it's important here, before I plunge ahead any further, that everyone appreciates that there are two kinds of distance going on now, right? There's the physical distance on the grid, how far apart you are in miles, so to speak, and there's the graph distance. What's the shortest number of hops you need to get from one to another, right?
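Concretely, the link generation just described might look like the following sketch. It enumerates the whole grid, so it's quadratic in the number of nodes, fine for small examples; the function name is mine, not from any library:

```python
import random

def sample_long_range_link(v, n, alpha):
    """Pick the endpoint w of v's long-range link on the n x n grid,
    with probability proportional to d(v, w)^(-alpha), where d is
    lattice (Manhattan) distance. alpha = 0 recovers the uniform
    Watts-Strogatz-style choice."""
    vx, vy = v
    nodes, weights = [], []
    for wx in range(n):
        for wy in range(n):
            if (wx, wy) == v:
                continue
            d = abs(wx - vx) + abs(wy - vy)
            nodes.append((wx, wy))
            weights.append(d ** (-alpha))
    return random.choices(nodes, weights=weights)[0]
```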
34:01
In the case of Nebraska to Boston, one is thousands of miles and the other is six. But here the distance I'm talking about is distance on the grid. So when I have d(v,w) to the minus alpha, alpha is like a one-dimensional knob I can turn. When I turn it all the way down to zero, I have the uniform distribution. And as I turn it from zero up to larger numbers,
34:21
I go from very random to this more kind of short-range thing, right? Even long links are less likely. Now, this is in fact a type of long-range percolation model, and I'll talk about the connections later on, although the point here is that the results on long-range percolation
34:41
and thinking about this question are not directly applicable since all of those were about structural existence questions for short paths or connectivity, whereas here we're asking the algorithmic question. Okay, so I have this one-dimensional knob alpha. I turn it all the way down to zero, and in a sense the network is too random. I can't find the paths that are there. But obviously if I turn alpha up extremely high, there are no long red edges,
35:01
and there are no short paths at all. So the question is, is there some kind of ideal operating point for such networks, somewhere in the middle, somewhere where the links are controllable but still long enough? And the answer is, in fact, that in this very stylized model there's a single exponent that works, and that's exponent two. So when links are generated from an inverse square distribution,
35:20
then there actually is a decentralized algorithm that delivers messages in time proportional to log squared n. And for any other exponent, as you move away from two, things get worse. There are lower bounds that are n to some power. Okay. I should mention of course that this trade-off is very sharp looking asymptotically, but even for very large values of n,
35:42
when you simulate these random graph models, the trade-off is much more forgiving, right? You have to have n pretty big to even see the difference between log squared n and, say, n to some small power. Okay, I want to show one slide with just a sketch of the proof of this, which is not very difficult,
36:02
even written out in full. First of all, so I'll simply sketch the analysis in the case that works, alpha equals two. So one reassuring thing is that the algorithm that works at alpha equals two is the simple algorithm that in fact Milgram's subjects were employing, which is the greedy algorithm. When you have the message,
36:20
just always aim as close to the target as possible. If your random red edge gets you a lot closer, take it. If it leads in the wrong direction, hand it to your next door neighbor and hope that they can do something better with it. That's the greedy algorithm. Always decrease your distance to target by as much as possible. Why is that a good algorithm? Well, so we're going to analyze the progress of the algorithm as follows. Suppose the message is currently at v here, at distance d from the target,
36:41
and let's look at the chances that it's going to halve its distance to the target right now, in this step. So I can show that with probability about one in log n, the message will actually enter this ball of radius half the current distance in the very next step. And therefore, in an expected number of about log n steps,
37:00
it'll succeed in halving its distance. You can only halve your distance log n times, and hence you get a delivery time bound of log squared n. So once you can show that there's this large probability mass here, you're all set. I should point out that essentially what's crucial about exponent two, therefore, is that this ball of half the radius, of radius d over two, has large probability mass regardless of what d is.
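The back-of-the-envelope calculation behind that one-in-log-n claim can be written out (a sketch with constants suppressed, not the full proof):

```latex
% Normalizing constant: at most 4j nodes sit at lattice distance j, so
\sum_{w \neq v} d(v,w)^{-2} \;\le\; \sum_{j=1}^{2n} (4j)\, j^{-2} \;=\; \Theta(\log n).
% The ball B of radius d/2 around the target has \Theta(d^2) nodes,
% each at distance at most 3d/2 < 2d from v, so
\Pr\bigl[\, v\text{'s long-range link lands in } B \,\bigr]
  \;\ge\; \frac{\Theta(d^2)\,(2d)^{-2}}{\Theta(\log n)}
  \;=\; \Theta\!\left(\frac{1}{\log n}\right).
```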
37:24
Exponent two is somehow where you're independent of d, where somehow the square provided by alpha equals two cancels the square provided by the area of this ball. And in a sense, that's really at a much higher, more qualitative level, why exponent two is the right exponent.
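For reference, the greedy rule itself is only a few lines. A sketch, where `long_link` is a hypothetical dict mapping each node to its red-edge contact:

```python
def greedy_route(start, target, long_link, n):
    """Greedy routing on the n x n grid with one long-range contact
    per node: always forward to the known contact (grid neighbor or
    red edge) closest to the target in lattice distance."""
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])
    v, steps = start, 0
    while v != target:
        x, y = v
        options = [(x + dx, y + dy)
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if 0 <= x + dx < n and 0 <= y + dy < n]
        options.append(long_link[v])
        v = min(options, key=lambda w: dist(w, target))
        steps += 1
    return steps
```

Some grid neighbor always reduces the distance by one, so the walk terminates even when the red edges are useless.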
37:41
Because what we should think about is not the original Watts-Strogatz picture of uniformly distributed random links. Instead, we should think about how links are distributed over what I'll call these scales of distance, right? All people at distance one to two from you, two to four, four to eight, between every two consecutive powers of two. Right, there are these log n distance scales. And when alpha equals two,
38:01
in the grid in two dimensions, that's exactly what you're uniform over. You're uniform over these exponentially increasing annuli around you. And so actually in k dimensions, the right exponent would be k; that's when you're uniform over these annuli. And quite early on in thinking about this, I threw away my primitive cartoon pictures of this phenomenon and replaced it with Saul Steinberg's 1976 New Yorker cover.
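That uniformity over scales is easy to check numerically. Ignoring grid boundary effects, there are about 4j nodes at lattice distance j, which gives the following back-of-the-envelope count (an illustration of my own, not data from the talk):

```python
def annulus_weight(i, alpha=2.0):
    """Total unnormalized link weight in the annulus of nodes at
    lattice distance 2^i .. 2^(i+1)-1 from a node deep inside a 2D
    grid: roughly 4*j nodes at distance j, each weighted j^(-alpha)."""
    return sum(4 * j * j ** (-alpha) for j in range(2 ** i, 2 ** (i + 1)))

# At alpha = 2, each doubling of distance carries about the same weight;
# at alpha = 1, the far annuli dominate instead.
scales = [annulus_weight(i) for i in range(3, 10)]
```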
38:23
Because in a sense, he hit it right on the head. Namely, this New Yorker's view of the world from Ninth Avenue, devotes the same amount of page real estate to the street corner at Ninth Avenue as it does to that neighborhood of Manhattan, as it does to the whole city, as it does to the eastern U.S.,
38:40
to the whole U.S., out there to China, Japan, Russia, these bumps on the horizon. It's exactly this layering of different distance scales that's the correct distribution, right? That's somehow the correct distribution of friends that you need in order to find targets efficiently in the network. Okay, so that's what I meant by abstracting a general pattern, right? Independent of the particular long-range percolation model,
39:01
the particular exponent, this is a pattern that we can now go and look for in other networks: uniformity over distance scales. So, before embarking on that, I wanted to talk about some of the connections to long-range percolation at a more concrete level. So, in the original models of long-range percolation, which were motivated by questions in physics,
39:22
it was a slightly different model. First, the graph was infinite: we take a copy of the d-dimensional integer lattice Z^d, and then for every pair of nodes v, w, we include an edge between them independently with probability the lattice distance rho(v,w) to the minus alpha, okay?
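A finite-box version of that construction might look like the following sketch (the original model lives on the infinite lattice; the `min(1, ...)` handles nearby pairs, whose formal probability would exceed one):

```python
import itertools
import random

def long_range_percolation(n, d, alpha):
    """Nodes of the grid {0..n-1}^d; each unordered pair {v, w} gets an
    undirected edge independently with probability
    min(1, rho(v, w)^(-alpha)), where rho is lattice distance."""
    nodes = list(itertools.product(range(n), repeat=d))
    edges = []
    for v, w in itertools.combinations(nodes, 2):
        rho = sum(abs(a - b) for a, b in zip(v, w))
        if random.random() < min(1.0, rho ** (-alpha)):
            edges.append((v, w))
    return nodes, edges
```

Note that nearest-neighbor pairs (rho = 1) are always connected, and the node degrees come out random rather than fixed.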
39:41
So, the graph is infinite, undirected, and also, the node degrees are no longer fixed. I no longer have a fixed number of friends in the world, but my degree is a random variable controlled by this thing. Okay. Now, the work in this area, as I said, the main difference is not these small, syntactic differences in the models, but the questions asked here were structural,
40:02
not algorithmic. It was not about searching these networks. And, in fact, the initial questions weren't even about short-path distances in the graphs. The initial questions were some very hard and deep problems about the existence of infinite connected components in these graphs, and which values of alpha this happens for, and some very interesting work in the 80s on this.
40:22
Motivated by the small-world model and these particular kinds of questions, there's been some recent work on long-range percolation which explicitly considers the graph diameter: how many hops do you need to get from one point to another, as a function of alpha? And, in particular, it lets us go back and ask the question, when do short paths at least exist, right?
40:40
So, at alpha equals two, you can find them, but for other values of alpha, they're there and simply invisible. So, just in one slide, here's what's known as a result of some work in the last few years. I should mention, first of all, so now we're back into a finite graph, right? So, basically an n by n by n grid in d dimensions. The case of alpha below d is sort of special
41:02
because in the infinite case it wouldn't really make sense. And even in finite graphs, you get very, very large degrees this way. And as a consequence, you actually have constant diameter in these graphs, via some results that were actually asking different questions, sort of as a spin-off.
41:20
At alpha equals d, the first transition happens. This is the searchable exponent, remember. The diameter becomes proportional to log n over log log n, a result of Coppersmith et al. Between d and 2d, the diameter is polylogarithmic, and, in fact, tight bounds are now known due to work of Biskup. And then, beyond 2d, the diameter jumps up again.
41:41
In fact, recent work has shown that it's linear in n. So, in fact, alpha equals 2d is a big open question here. And, essentially, it's the point of transition between what you might call the small world and the large world, between polylogarithmic length paths and linear length paths. And, it's actually known from work in the 80s on long-range percolation that alpha equals 2d
42:02
is a very hard exponent to analyze, even for the connectivity questions. Okay, so that's what's known about these kinds of things. I should mention that these models and the diameter questions, led to some other interesting algorithmic questions, which I'll only have time to really sketch very, very briefly.
42:21
But, if we go back to the question of decentralized search at alpha equals d in long-range percolation, there's something slightly different, right? A small difference in the model: now the degrees are random, and in expectation they're logarithmic, not constant. And so the greedy algorithm now finds paths of length proportional to log n, not log squared n.
42:41
Okay, but that's not a major difference. It's simply a consequence of having bigger degrees. But the diameter was log n over log log n. So, in fact, the greedy algorithm was not finding paths within a constant factor of the optimal length. And the actual question was, is there a decentralized algorithm that does? And this nice result of Manku, Naor, and Wieder from 2004
43:03
said that, in fact, if you wanna actually match the diameter, then it's okay. It still suffices to be local, but with slight look-ahead. So, suppose that the people in the Milgram experiment not only knew their own friends, but knew all the friends of their friends, okay? Then, in that case,
43:20
you could consider all the neighbors of neighbors and route to the one of them that's closest to the target. And, what they showed was that that actually gets you to within a constant factor of the diameter itself. So, this is, in a sense, the power of look-ahead. If you can look-ahead two steps, you can match diameter, and, of course, more look-ahead won't help you asymptotically, because you're already within a constant of the diameter. And then, there's been some other interesting
43:40
algorithmic work, which I won't have any time to talk about, but, essentially, in the area of distributed systems, people have tried to spread information rapidly through a network by telling random nodes pieces of information. It turns out that, for reasons based completely on scalability, and these systems were designed by people
44:01
not thinking at all about long-range percolation, the exponent that turns out to scale the best for them is exactly the hard exponent, alpha equals twice the dimension. And so, in fact, this lack of understanding at that exact point is an obstacle to the analysis of some of these systems. And again, unfortunately, I won't really have time to say much about that. What I wanted to get to was thinking about actual human social networks, right?
44:23
So, we now have these models that are very suggestive. They have certain diameter properties. They have certain long-range friendships as a function of distance. It's all very suggestive of what real social networks might look like. And it even makes a sort of startlingly specific prediction about what the exponent ought to be if the world is searchable.
44:40
And so, obviously, a very compelling question is whether the world actually looks like that. Does it look like a long-range percolation model, and does it have the right exponent? Now, if we want to address this question, we have to start moving away from our very stylized model of grids, very structured grids with random edges. We need things that can match the messiness of real data. And so, a number of other related models
45:04
have been proposed recently. Essentially, rather than hang random edges on a grid, the grid's serving as a scaffold for your random edges, these models hang the random edges on other kinds of scaffolds. So, for example, you could assume that the nodes are organized hierarchically, right? The way a topic hierarchy of web pages might work.
45:21
The nodes are all at the leaves of this tree, and the random edges are formed so that you're more likely to link to a nearby node, where closeness is measured by how high up the tree you have to go to find the other node. So nodes that are direct siblings are more likely to link than nodes that have to go all the way through the root. Okay, that's one kind of model. There are other kinds of models where you position the nodes, for example, in a metric space that has low combinatorial dimension,
45:43
right, in the sense of the various recent notions of combinatorial dimension that have been worked on. And again, the model specifies how long-range links behave as a function of metric distance. I want to talk about one of these models, designed to generalize a few of the earlier ones, the grids and the trees. And that's a model based on set systems.
46:03
Okay, so instead of imagining the nodes residing in a metric space, let's instead suppose that the nodes reside in a collection of sets. So, I have my nodes and there are just some sets that I define on them. And we can imagine this, again, as a metaphor for a social network. So, you and I might know each other
46:21
for a variety of reasons. It might be because we live in the same country or we have the same religion or because we're fans of the same obscure science fiction novelist, right? And the argument here is that you're more likely to know someone when you and they both belong to the same small group. So, the real reason that we're the most likely to know each other is perhaps if we both share the same very obscure interest rather than the fact
46:42
that we're citizens of the same country. So, we'll say that g(v,w) is the size of the smallest set in my collection containing both v and w, and the link probability decreases with the size of this set. So this shared set size is my notion of distance. Now, this analysis is not gonna work for every possible set system; if you think about it for a second, that would somehow be hopeless.
47:01
So, it's gonna work for set systems that have two properties. When you consider a set system C on a ground set of nodes V, there are two properties, parameterized by a number lambda less than one and a number kappa greater than one. The first is a nesting property, which simply says that if a node v belongs to a set S, then it also belongs to a proper subset that's not much smaller, only smaller by a factor of lambda.
47:21
So, there's sort of this nesting of sets that you belong to. And secondly, and this is somehow the set-system analog of low dimensionality, which is what we need: if I have an arbitrary bunch of sets with a non-empty intersection, then the size of their union is not much bigger than the size of the largest one, at most kappa times bigger, right? Sort of like a bunch of balls in a low-dimensional space.
47:41
If I pin a bunch of balls in a low-dimensional space all together at a point, then the ball that encloses them is not much bigger than the largest of them. Turns out this is actually all that you need to start working again with searchable networks. So, I'm gonna make up a random graph model based on this, essentially the same way that we did for the grid. Everyone chooses some number of outlinks, k of n,
48:01
and we have an exponent gamma, which is gonna play the same role as my exponent alpha; I've written it differently because the interesting values will be different. And each node v generates k(n) outlinks, choosing w as the endpoint of the i-th link independently with probability proportional to g(v,w) to the minus gamma, right? Scaling with the size of the smallest set they both belong to.
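In code, the generation rule might be sketched as follows (the helper names are mine, and this assumes every pair of nodes shares at least one set):

```python
import random

def g(v, w, sets):
    """Size of the smallest set in the collection containing both
    v and w."""
    return min(len(s) for s in sets if v in s and w in s)

def sample_outlink(v, nodes, sets, gamma=1.0):
    """Choose the endpoint w of one of v's outlinks with probability
    proportional to g(v, w)^(-gamma)."""
    others = [w for w in nodes if w != v]
    weights = [g(v, w, sets) ** (-gamma) for w in others]
    return random.choices(others, weights=weights)[0]
```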
48:21
And the theorem now is that at gamma equals one, for any set system satisfying these properties, and with polylogarithmic out-degree, you have a searchable network. And for any exponent less than one, any set system satisfying these properties, and any polylogarithmic out-degree, you don't have a searchable network. Okay? And one is the right value here,
48:40
because if you think about it, if two nodes are at distance d in the plane, in the two-dimensional grid, then the smallest set containing both of them has size about d squared. So one over the size of the smallest set is one over d squared, and there's a translation of the exponent by a factor of two. And so gamma equals one here generalizes the alpha equals two from the grid.
49:03
There is an interesting open question here: somehow we need an extra property to be able to say anything about gamma greater than one. There are sort of pathological counterexamples at this level of generality, and it'd be nice to have a model that also covered that case. Okay. Now we can finally ask about social network data,
49:20
because once we have results like this, where we can allow an arbitrary set system, we can actually start aligning these set systems to real data and ask: does this exponent really work? So let's go back to the picture from slide two, Adamic and Adar's social network of 436 researchers at Hewlett-Packard Labs.
49:41
And the way they built the network again was they observed them over a three month period and they joined pairs who exchanged at least six emails each way. That was the definition of a link. And they compared this to the set system model in the previous slide. And what they found actually when the groups were defined by subtrees of the corporate hierarchy was that the links scaled as group size to the minus three quarters.
50:01
Namely, monotone decreasing in group size, a bit too random to be searchable, but in a way surprisingly close to the searchable exponent. Especially surprising given that, of course, HP Labs is designed to do research to further the goals of Hewlett-Packard, the company, not to perform runs of the Milgram experiment. And there's no a priori reason why this should have matched a searchable exponent at all.
50:22
Okay, so that was the first one. I'm gonna talk about larger-scale measurement and then wrap up quickly with some conclusions. So, the Adamic-Adar experiment was very suggestive of what might be going on, but it was of course at the scale of 400 or 500 people. And it was on a tree structure.
50:41
And you begin to think that maybe it's hopeless to go back to the actual original setting of the Milgram experiment and look at how friendship probability scales with real physical distance. But in fact, the advent of all this recent large-scale online social networking data has actually made it possible to address even this question.
51:01
So some very interesting work of Liben-Nowell, Novak, Kumar, Raghavan, and Tomkins took data from this thing called LiveJournal, which you can think of as a very small version of MySpace, a social networking service that lots of teenagers and college students hang out on. And this had exactly the data you wanted, because it had at the time about half a million members
51:21
with user profile pages containing US zip codes, letting you geolocate them, and links to who their friends were. So it's exactly a way to measure how friendship probability decreases with distance. Okay, there are of course lots of challenges to overcome in adapting the grid-based model to this setting on LiveJournal, and I'll skip most of these
51:40
except for, in a sense, the conceptually most challenging one, which is the highly non-uniform population density. Namely, this is the density of LiveJournal users, and when you first look at it, certainly I thought, gee, that's funny, it seems only the Eastern US uses LiveJournal. But that's actually an illusion. In fact, this is roughly the population density of the US itself; LiveJournal samples it relatively well.
52:00
It's simply that the population density of the US is very non-uniform. So we have to get over that. Now, one could do that with the group-based model, defining groups by balls in the plane, but they chose a formulation that worked better for their kind of data analysis, this thing called rank-based friendship. So each node, say v, ranks all the other nodes
52:20
simply in order of distance from them. And the rank of w is simply the number of other nodes closer. So w is at rank one, two, three, four, five, six, seven from v. And again, this translates exponents downward by a factor of two, so exponent one should be the searchable exponent. And that's the theorem that they proved: for essentially any population density in the plane, if the link probability is proportional to one over the rank,
52:42
then you get searchability. And yes, that's a generalization of the earlier rule. And now they can go and actually measure this. And the punchline here is that in fact LiveJournal friendships very closely approximate one over the rank. So here's the empirical data. Here's the line you'd see if it were one over rank. Here's the line, almost parallel,
53:00
you'd see if it were one over rank to the 1.05. It's very, very closely tracking one over rank, at the searchable exponent in this particular long-range percolation model. Okay, so this is my next-to-last slide before I get to conclusions. And I just wanted to mention a few ways in which I found this a startling result. First, the original lattice-based model
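The rank-based rule is equally short to write down. A sketch, with ties in distance broken arbitrarily and ranks starting at one:

```python
import random

def ranks_from(v, nodes, dist):
    """Rank every other node by distance from v; the closest gets
    rank 1. Ties are broken arbitrarily."""
    others = sorted((w for w in nodes if w != v), key=lambda w: dist(v, w))
    return {w: i + 1 for i, w in enumerate(others)}

def sample_rank_based_friend(v, nodes, dist):
    """Rank-based friendship: choose w with probability proportional
    to 1 / rank(w)."""
    r = ranks_from(v, nodes, dist)
    others = list(r)
    return random.choices(others, weights=[1.0 / r[w] for w in others])[0]
```

Because ranks, not raw distances, drive the probabilities, the same rule adapts automatically to a non-uniform population density.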
53:23
with alpha equals two made, again, a very specific prediction about what you would need for searchability. And even though we knew the Milgram experiment succeeded 40 years ago, it's nonetheless surprising to see that exponent borne out in actual data.
53:41
Secondly, it's completely not obvious what might be driving the network toward that distribution, right? LiveJournal users use the system to chat with their friends, to share their interests, to write in blogs, and so forth. They don't use it to search through their social contacts to try to find short paths to random people. It's not clear what the selective pressure is
54:00
that's causing the network to evolve toward exactly that searchable exponent, yet there it is, right? There it is despite the fact that we don't really know what these links mean or what they're there for. And, one final startling point: all the more so in the online world. You might have imagined that with LiveJournal, people simply log in, get on the system, and sure, there's gonna be a big spike at distance zero
54:20
because high schoolers log in to talk to their other friends from high school who live in the same zip code as them. So you'd expect a lot of friendship links at distance roughly zero. But you might have expected that the rest of the links would be sort of uniformly distributed. After all, this is the online world. It knows no geographic boundaries. It ought to be just as easy to make friends with someone halfway around the world as locally. And the point is that is not what happened, right?
54:41
The distribution of friendships respects all these intermediate distance scales just as surely as it respects the very close ones. And so there's an interesting contrast with the conventional wisdom that in the online world we've broken down these geographic barriers: we still see very strong and predictable evidence of them here. And that's something else that I think would be very interesting to understand. Okay, so simply to wrap up with some reflections then.
55:04
In a sense, in this very rapid fashion, I've tried to show you a progression of research questions, ranging from early social network research, through online data and mathematical models, to some very recent and surprising facts that I think are really in need of much better theoretical explanation.
55:22
And along the way, it's really been this rich data that not only lets us study these questions but even lets us ask questions that would have been impossible to formulate 20 years ago. Questions that only make sense once a large fraction of the world's population is in a shared online system, searching for information and communicating. We clearly need more powerful frameworks
55:41
for thinking about this. One obviously is these algorithmic and probabilistic models that I've been talking about. A second thread that's been very interesting for reasoning about this is the use of techniques at the intersection of algorithms and game theory to model these interactions. These are some recent survey and book references
56:00
and again, Tim Roughgarden will be talking on Tuesday. And again, of course, some of you might worry seeing this that we're essentially learning all this by observing millions of people who had no intention of ever being observed. True, they made all this public on the internet and so all we're doing is looking at it. But there is this tension between privacy and these massive data sets on human activity.
56:21
This tension has clearly been consuming a lot of people's attention recently; it's a very pressing issue and is in itself an important research area in the mathematical aspects of computer science. Yet more topics that I'd be happy to talk about. But on this note, I'd like to stop here. I'd like to thank you again
56:42
for this fascinating Congress and again, thank you for giving me the opportunity to speak here today.