
The work of Jon Kleinberg


Formal Metadata

Title: The work of Jon Kleinberg
Number of Parts: 33
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: Laudatio on the occasion of the Nevanlinna Prize award to Jon Kleinberg.
Transcript (English, auto-generated)
Laudatio for the Rolf Nevanlinna Prize, awarded to Jon Kleinberg, by John Hopcroft, Cornell University, USA.
It's a great pleasure for me to be here today and present to you some of the work of Jon Kleinberg. Jon's research has helped lay the theoretical foundations for the information age. He has developed the theory that underlies search engines, collaborative filtering, and the organization and extraction of information from sources such as the World Wide Web, news streams, and large data collections. Jon's work is so broad that all I'm going to be able to do today is just to sample.
And I've picked five areas that I believe represent some of his important contributions. And the five topics that I picked are his work on hubs and authorities, on small worlds, on bursts, nearest neighbors, and on collaborative filtering.
And let me start with his work on hubs and authorities. In the early 60s, people in library science developed the vector space model.
And what they did is they took a document and they represented it by a vector where each component of that vector corresponded to one word. And they counted the number of occurrences of each word and used the vector that they got to represent the document.
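As a minimal sketch of that representation (the tiny vocabulary here is only an illustrative assumption, not the actual systems of the time), the idea looks like this:

```python
# A minimal sketch of the vector space model: represent a document by a vector of
# word counts over a fixed vocabulary. The vocabulary below is an illustrative assumption.
from collections import Counter

VOCABULARY = ["aardvark", "antitrust", "ceo", "microsoft", "windows"]

def document_vector(text, vocabulary=VOCABULARY):
    counts = Counter(text.lower().split())
    # one component per vocabulary word, holding the number of occurrences
    return [counts[word] for word in vocabulary]
```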
And so you can see in this particular document, the word aardvark does not occur. Backus does not occur. But antitrust appears 42 times, CEO 17, Microsoft 61, Windows 14.
And you can probably guess what this document is about. Now, early search engines used this vector space model to answer queries. One of Jon's contributions was to recognize, when the World Wide Web came along, that there was additional structure that could be used in answering queries. The World Wide Web was represented as a directed graph where web pages correspond to nodes, and if there was a link from one web page to another, then the graph contained a directed edge.
Now, Jon introduced the concepts of hubs and authorities. An authority is a web page on a particular area. So if the area was automobiles, an authority might be an article on a specific aspect of an automobile. And a hub is a web page that directs you to various authorities. So a hub page might belong to Car and Driver, which would have pointers to many articles dealing with various aspects of automobiles.
Now, in the mathematical definition, an authority is a node that has a large number of hubs pointing to it. And a hub is a node that points to a large number of authorities.
And you notice that there is a circularity in this definition, so to make it a useful concept, Jon had to develop the mathematics necessary to break that circularity.
What he did is, for each web page, he assigned two weights: one a hub weight and the other an authority weight. I've shown a simple example here where, on the right, I show the authority weights for a certain set of pages, and on the left, the hub weights. Then he started an iterative procedure. He took each node on the right and adjusted its authority weight by summing the hub weights of the hubs pointing to that authority. So if I do that, you notice that the first authority now has a weight of four, the next three, three, and so on. Then he adjusted the hub weights by summing up the authority weights of the nodes they point to.
And so he got the set of hub weights shown here. Now, to prevent these weights from simply growing, at each stage he normalized them so that the sum of their squares is one. And if you look at this method, you quickly recognize that the iterative procedure converges, so that the hub weights become the coordinates of the principal eigenvector of A A^T, where A is the adjacency matrix of the directed graph representing the World Wide Web, and the authority weights become the coordinates of the principal eigenvector of A^T A.
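To make the iteration concrete, here is a minimal sketch of the hub/authority computation just described; the graph representation (a dict from each page to the pages it links to) and the fixed iteration count are assumptions for illustration.

```python
# A minimal sketch of the hub/authority iteration, assuming the web graph is given
# as a dict mapping each page to the list of pages it links to.
import math

def hits(links, iterations=50):
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority weight: sum of the hub weights of the pages pointing to it
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # hub weight: sum of the authority weights of the pages it points to
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # normalize so the sum of squares is one, to keep the weights from growing
        for weights in (auth, hub):
            norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
            for p in weights:
                weights[p] /= norm
    return hub, auth
```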
Now the way you could use this in doing a search is by the following method that he described in his paper. He used a search engine such as AltaVista, which at that time used the vector space model to answer a query.
He would take, say, the 200 most relevant web pages returned by AltaVista. Then he recognized that you had to add to this set of web pages, because at that time the web page for a search engine did not have the words "search engine" on the page. So if you put in the query "search engine", you would not get the search engines of that day. What he did, then, is expand this set of 200 web pages by looking at the pages they point to
and the pages that point to these. And this would maybe give him on the order of 5,000 pages from which he would try to return the answer. And so what he would do then is he would take this small graph and find the hubs and authorities
and use these numbers to rank the web pages. This had a major impact on all modern search engines today. One thing I should address, because you're probably wondering, is how this differs from what Brin and Page did when they created Google. Well, they did something which is very similar but has a major difference. What they did is they said: let's take the World Wide Web and do a random walk on the web.
And what we will do is we will look at the stationary probabilities of this random walk. Now, they had to solve a problem because if there is a node such as the one on the right there
that has no outgoing edges, when your walk reaches that vertex or node, there's nowhere to go and so you lose probability. And in fact, they had to solve a more general problem. There might be a small connected component out there with no outgoing edges
and all of the stationary probability would end up in that component, and it would not give you any way of ranking the other pages. So what they did is they said: with some small probability epsilon, they would jump to a uniformly random vertex, and with probability one minus epsilon, they would take a random step along an edge of the graph. That is roughly equivalent to adding one new vertex: with probability epsilon you go to that new vertex and then step back to an arbitrary vertex of the graph.
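As a rough sketch of that random walk with restarts (not Google's actual implementation), the computation looks like this; the graph representation, the default value of epsilon, and the handling of dangling pages are assumptions for illustration.

```python
# A minimal sketch of the random-walk ranking described above, assuming the graph is
# a dict mapping each page to the list of pages it links to.
def pagerank(links, epsilon=0.15, iterations=50):
    pages = list(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: epsilon / n for p in pages}  # teleport with probability epsilon
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = (1.0 - epsilon) * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # a page with no outgoing edges spreads its probability over all pages
                for q in pages:
                    new_rank[q] += (1.0 - epsilon) * rank[p] / n
        rank = new_rank
    return rank
```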
Now, the difference between these two methods is the following. To run Brin and Page's algorithm, you have to compute the stationary distribution, the principal eigenvector, for a graph which has on the order of 25 billion vertices. Whereas with the method Jon developed, you have a much smaller graph in response to your query, and you only have to compute the hub and authority vectors for a graph with about 5,000 vertices,
something which is very easy to do. So that was one of his pieces of work. It has had a major impact on all search engines today. It has also created large numbers of industries because people want to figure out how to increase the page rank of their particular web page and things like this.
But it has also had a profound effect on all of us that teach theoretical computer science because this work has led to a large number of theoretical papers that those of us that teach simply cannot ignore in our classes.
So let me move on to another example of his work. And this time I'm going to pick a piece of work referred to as Small Worlds and it's had a major impact in the social sciences. Back in 1967, Stanley Milgram did some experiments
where he calculated sort of the distance between any two people in the United States. And the way he carried out this experiment is he would pick a person who he would call the source,
say in some state such as Nebraska, and he would pick a target person in another state, say possibly in Massachusetts, and he would tell the person in Nebraska the name of this person in Massachusetts, give them their address and occupation,
and ask the person to start a letter moving towards this target. But the rule he imposed was that you could only send the letter to somebody you know on a first-name basis, and you were to pass the same set of instructions on to that person. And so the letter might move to a close neighbor,
maybe then to the next door neighbor of that person and then to a distant relative and then to a close friend and so on until it reached the person in Massachusetts. And what Milgram observed is in those cases where the letter actually arrived, the number of edges along the path was between five and six.
And from this he concluded that the distance between any two people in the United States was only five or six steps. And this created a large amount of research in the social sciences where they studied the interaction between people in various social groups.
Now, implicit in this experiment is not only the fact that there is a short path between two people, but that we have a way, using only local information, of finding that path. This research became a lot more rigorous, first with some work of Watts and Strogatz, where they made the model a little more precise. They started with a graph where I've shown the vertices around the circumference of a circle, and they put an edge from each vertex to each of its adjacent vertices,
and in fact they put edges to vertices which are a distance one or two or three around the circle. So they got a graph something like this. But in this graph you will notice that if you want to find a short path
from a vertex on one side of the circle to a vertex on the other and there are n vertices on the circle, the path is going to be about n over four hops. So what they did is they added to the model a few random edges and then they were able to prove that with high probability any two vertices were connected by a short path.
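A minimal sketch of that construction, with edges to the three nearest vertices on each side plus a few random shortcuts (the exact parameter choices here are assumptions), might look like this; comparing path lengths with and without the extra edges shows how even a few random edges shorten the typical path.

```python
# A minimal sketch of the ring-lattice-plus-random-edges construction described above,
# in the spirit of Watts and Strogatz. Parameter values are illustrative assumptions.
import random
from collections import deque

def small_world_graph(n=1000, k=3, extra_edges=20):
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):        # edges to vertices 1, 2, ..., k steps around the circle
            adj[v].add((v + d) % n)
            adj[(v + d) % n].add(v)
    for _ in range(extra_edges):          # a few random long-range edges
        u, w = random.sample(range(n), 2)
        adj[u].add(w)
        adj[w].add(u)
    return adj

def path_length(adj, s, t):
    # breadth-first search for the shortest path from s to t
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None
```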
But what they did not consider is the question of how you find the path, and that is where Jon's research came in: he fundamentally changed the nature of this research.
Now, Jon used a slightly different model. He used a two-dimensional grid where each vertex has an edge to each of its four neighbors, and then, in addition, at each vertex he added one random long-range edge; we'll have to talk a little bit about how he put in that random edge. But first let me go back and point out that early work following that of Stanley Milgram really focused on the existence of short paths,
and the fundamental change Kleinberg made in this field is that he asked the question: how does one find these short paths between two vertices using only local information? Now, when Jon put in these long-range edges, he had to decide with what probability he would put in a particular edge. What he did for the long-range edges is put one random edge out of each vertex, where the probability of that edge going from a vertex u to a vertex v was given by some constant divided by the distance from u to v raised to the r-th power, that is, proportional to 1/d(u,v)^r; the constant is just there to normalize so that the probabilities sum to one. Jon then looked at this probability distribution for different values of r,
and the surprising result he came up with was, first of all, that if r is equal to two, then he was able to show that there is a polynomial-time algorithm, using only local information, for finding short paths. However, if r is less than two, the long-range edges tend to be too long and too unstructured: short paths exist, but there is no efficient algorithm using only local information to find them. Similarly, he showed that for r greater than two, where the long-range edges tend to be short, there is no efficient algorithm to find short paths. So what he was able to prove is that the only time you can efficiently find these short paths is when the value of this parameter r is exactly two.
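Here is a minimal simulation sketch of that model and of the greedy rule behind the decentralized algorithm: each vertex gets one long-range contact chosen with probability proportional to 1/d^r, and a message is always forwarded to the neighbor closest to the target. The grid size, the brute-force sampling of contacts, and the routing loop are assumptions for illustration.

```python
# A minimal sketch of Kleinberg's grid model with one long-range edge per vertex and
# greedy routing using only local information. Intended for small n; not optimized.
import random

def lattice_distance(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def build_long_range_edges(n, r):
    vertices = [(i, j) for i in range(n) for j in range(n)]
    long_edge = {}
    for u in vertices:
        others = [v for v in vertices if v != u]
        # probability of choosing v is proportional to 1 / d(u, v)^r
        weights = [lattice_distance(u, v) ** (-r) for v in others]
        long_edge[u] = random.choices(others, weights=weights)[0]
    return long_edge

def greedy_route(n, long_edge, source, target):
    current, steps = source, 0
    while current != target:
        i, j = current
        neighbors = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < n and 0 <= j + dj < n]
        neighbors.append(long_edge[current])
        # forward to the neighbor closest to the target, using only local information
        current = min(neighbors, key=lambda v: lattice_distance(v, target))
        steps += 1
    return steps
```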
I was going to try and go through one of these proofs, but I think what I'm going to do is just mention the outline, because I'm going to run out of time otherwise. So what he did, and this is the proof that you can find a short path for r equal to two: he looked at the distance from the current vertex to the destination, and he said the algorithm is in phase j if the distance to the destination lies in the interval from 2^j to 2^(j+1). And so you can see that, since there are order n or n^2 vertices in that graph, depending on whether n is the length of a side or the area, there are only log n phases; and if you can show that you get out of a phase within log n steps, then the algorithm takes time at most log^2 n.
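For completeness, the counting in that outline can be written out as follows; this is only a sketch of the argument for r = 2, following the phase decomposition just described.

```latex
% Outline of the phase argument for r = 2 (a sketch of the counting only).
The algorithm is in phase $j$ while the current lattice distance $d$ to the target
satisfies $2^{j} \le d < 2^{j+1}$. Since $d \le 2n$ on an $n \times n$ grid, there are
at most $\log_2(2n) = O(\log n)$ phases. Kleinberg shows that each step of phase $j$
ends the phase with probability $\Omega(1/\log n)$, because the long-range contact of
the current vertex lands within distance $2^{j}$ of the target with at least that
probability; hence the expected number of steps spent in any one phase is $O(\log n)$,
and the expected total is $O(\log n) \cdot O(\log n) = O(\log^{2} n)$.
```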
I am going to skip over the rest of the proof so that I can say a little about some of his other research. But what I will point out is that this research certainly influenced research in the social sciences in a fundamental way, and it has also had applications in many other areas; I will just mention one of them, peer-to-peer file-sharing systems. The other thing I will mention is that if you look at real data sets, many of them show this quadratic decrease in probability, which is essential for having efficient algorithms for finding these short paths.
Let me quickly move on to Jon's work on detecting bursts in data streams. If you have a data stream, you can organize the information in many different ways: by topic, by the time something occurred, by its frequency, or by other parameters. Jon observed that bursts of activity provide structure for identifying or organizing data. Now, what Jon did is develop a model for a data stream and then, using this model, show how to detect bursts.
Jon's model says that the gaps x between events follow an exponential distribution of the form p(x) = alpha * e^(-alpha * x), so the expected value of the gap is 1/alpha. Jon then created a model with a number of states, in fact an infinite number of states, which I've labeled q0, q1, q2, and so on, and each state qi has its own arrival rate,
and in q0 this rate is 1/g, so the expected gap in the base state is g; as you go up to higher-numbered states, the rate increases by a scale factor s. From each state there are two transitions: you can move either one state up or one state down, and he charged a cost every time the model moved to a higher-numbered state. He then developed the mathematics to find the parameters and the state sequence that best fit a particular data stream.
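Here is a minimal sketch of fitting a state sequence to a stream of gaps, using a two-state simplification of the infinite automaton just described (base rate 1/g, burst rate s/g, a cost per upward transition). The parameter choices and the transition cost are assumptions for illustration, not the values from Kleinberg's paper.

```python
# A minimal two-state simplification of the burst automaton: choose, by dynamic
# programming, the cheapest state sequence explaining the observed gaps, where
# state 1 (the "burst" state) has a higher event rate and each upward move is charged.
import math

def burst_states(gaps, s=2.0, gamma=1.0):
    n = len(gaps)
    g = sum(gaps) / n                      # average gap; base rate is 1/g
    rates = [1.0 / g, s / g]               # state 0 = baseline, state 1 = burst
    trans_cost = gamma * math.log(n)       # cost charged for moving up a state (assumption)

    def neg_log_density(rate, x):          # -ln(rate * exp(-rate * x))
        return -math.log(rate) + rate * x

    cost = [0.0, trans_cost]               # start in the base state
    parent = []
    for x in gaps:
        step, new_cost = [], []
        for q in (0, 1):
            best_prev = min((0, 1), key=lambda p: cost[p] + (trans_cost if q > p else 0.0))
            c = cost[best_prev] + (trans_cost if q > best_prev else 0.0) + neg_log_density(rates[q], x)
            step.append(best_prev)
            new_cost.append(c)
        parent.append(step)
        cost = new_cost

    # trace back the cheapest state sequence; stretches of state 1 mark the bursts
    q = 0 if cost[0] <= cost[1] else 1
    states = [q]
    for step in reversed(parent[1:]):
        q = step[q]
        states.append(q)
    return list(reversed(states))
```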
Intervals spent in the high-numbered states then tell you when there is a burst. He applied this to a number of areas. One of them was two important conferences in computer science: he looked at the words that appeared, and sure enough, you can see how various intellectual areas became important and then ceased to be important over time. If you look at this list, it agrees very well with what I believe were the important areas at those times. Since I'm very short on time, let me just quickly mention his work on nearest neighbors.
One fundamental problem is this: you are in a high-dimensional space, say d equal to 10,000, with n a large number of points, say a million, and these points represent, say, documents or web pages. You have a query, which is also a vector in this space, and what you would like to do is find the closest point to your query, with an efficient algorithm.
There was a large amount of work in this area, but what Jon was able to do is get much more efficient algorithms by random projections, by projecting points onto a line. One thing you would hope is that if two points are close together, then when you project them, their projections will be close together. However, you could have two points which are far apart whose projections end up coincident, because you happen to project along the line through the two points. What Jon did, using some very sophisticated mathematics, is show that if you do enough projections, then you can indeed, with high probability, find the nearest neighbor.
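As a toy illustration of the projection idea only (not Kleinberg's actual construction; a real scheme would index the projections rather than scan every point), one might do something like the following. The number of projections and the shortlist size are assumptions.

```python
# A toy illustration of random projections for approximate nearest-neighbor search:
# project points onto a few random directions, shortlist the candidates whose
# projections look closest to the query's, then check true distances on the shortlist.
import math
import random

def random_direction(d):
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def approx_nearest(points, query, num_projections=20, shortlist=50):
    d = len(query)
    directions = [random_direction(d) for _ in range(num_projections)]
    q_proj = [dot(query, w) for w in directions]
    # score each point by how close its projections are to the query's projections
    scores = []
    for idx, p in enumerate(points):
        score = sum(abs(dot(p, w) - qp) for w, qp in zip(directions, q_proj))
        scores.append((score, idx))
    # check true distances only for the most promising candidates
    candidates = [idx for _, idx in sorted(scores)[:shortlist]]
    return min(candidates,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], query)))
```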
What I have to do is apologize to him for not actually covering even five of his papers, which are just a small portion of his work, but I should conclude since I'm out of time. Let me just go to my conclusion slide. His research has impacted our understanding of social networks, of information, and of the World Wide Web. It has influenced every modern search engine, and thus the daily lives of researchers throughout the world, and it has laid the theoretical foundations for the future of computer science. With this I will conclude, and I just have to say I'm very, very proud to have had this opportunity to give you a glimpse of some of his work. Thank you.
We close this laudatio session with warm applause for all the awardees and the presenters.