The work of Jon Kleinberg
11 views
Formal Metadata
Title 
The work of Jon Kleinberg

Title of Series  
Number of Parts 
33

Author 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
DOI  
Publisher 
Instituto de Ciencias Matemáticas (ICMAT)

Release Date 
2006

Language 
English

Content Metadata
Subject Area  
Abstract 
Laudatio on the occasion of the Nevanlinna Prize award to Jon Kleinberg.

00:01
I'm not sure there'll from by child Gurkha Cornell University U.S. it's a great
00:21
pleasure for me to be here today and presented to use some of the work of the John Klein at Johns research has helped Lane the theoretical foundation for the information age is developed the theory that underlies search engines collaborative filtering
00:45
organizing in extracting information from sources such as the World Wide Web new strains In large selections john's work is so broad that all I'm going to be able to do today is just a sample and eyes picked 5 areas that I believe represent some of his important contribution and the 5 topics that I picked are his work on Hobson authorities on small world on burst nearest neighbors and and on collaborative filtering and let let me start with his work on on Hobson authority but in the early sixties people in library science developed the vector space model and what they did is they tougher document and they represented by a vector where each component of that vector corresponded to 1 word dog and they counted the number of occurrences of each word and used a vectors that they got to represent the document and so you can see in this particular document the word Nevada does not occur it back this not occur but and I trust appears 42 times CEO 17 Microsoft's 61 windows 14 and you could probably guess what this document is is about now hour early Sarah Chan use this space models Julie answer queries and what 1 of John's contributions is recognized when the World Wide Web came along that there was additional structure that could be used in answering queries so the World Wide Web was represented as a directed graph where Web pages corresponded to nodes and if there was a link from 1 Web page to another then the graph contained a a directed now John introduced the concept of pubs and authorities and authority some web page on a particular the area so what if it wins on automobiles authority might be an article on the Pacific aspect Of normal In a hub is a Web page the direction various authorities so subpage might belong to Car and Driver of which would have pointers to many articles dealing with various aspects of our big now in the mathematical definition and authorities Is it doesn't know they have a large number of Hobbs pointing to and a half is note that points to a large number of authority and you notice that there is this circularity in the definition and so to be a useful concept John had to develop the the mathematics necessary to break visitors circularity and what he did History each Web page he assigned to wait 1 of them was off public and the other was authority and so I've shown a simple example here where on the right side I showed the authority weights for a certain set of pages and on the left the house of weights and then what he did as he started an iterative procedure he took and each node on the right and adjusted its authority by summing the weights the harbor weights the weights associated to the hubs pointing to that authority so if I do that you notice that the 1st authority now has a waiter for the next 3 3 and so on they what he did Is he adjusted the hub weights by summing up the authority weights that they appointed 2 and so he got a set of How weights Sean here now to prevent these weights from this growing at each stage normalize them so the sum of the squares adds to want him if if you look at this baffert I do quickly recognized that the iterative procedure converges so that the Harlem weights the Hornets Of the major eigenvectors of a transpose a where a is adjacency matrix the directed graph representing the World Wide Web and the authority weights are quartet of the major eigenvectors of any 8 a transfer now the way of a search engine will you could produces and doing research His by the following method that he described in In this paper he used a search engines such as AltaVista which at that time use vectors model to answer queries him he would take say 200 most relevant Web pages returned by AltaVista and they be recognized that you have added to this set of Web pages because at that time the EU Web page for a search engine did not have the words search engine on the page so if you did it queries put in the query search engine you would not get the search engines of that day so what he did Is he expanded this 200 Web pages by looking at the pages they point to end the pages that I point to today's and maybe give him on the order of 5 thousand pages from which he would try to read the answer what he would do if they aren't is he would take this Graf and find that clubs and authorities in use please numbers To terrain the Web page and this had a major impact on our old water search engines today and 1 of the things I should do because you're probably wondering How does this differ from what bridge Page when they created Google where they did something which is very similar but but has a major difference would what they did I use they said let's take the World Wide Web and let's do a random walk on the on the way out and what we will do is we will look at b stationary probabilities Of this random walk now they had to to solve a problem because if there is a need to know such as the 1 on the right there that has no outgoing edges when you walk reaches that vertex anode there's nowhere to go and so you lose probability in exactly how this all the more general problem there might be a small connected component out there was no outgoing edges and all of the stationary probability would end up that component and it would not give you any way of ranking earlier pages so what they it is they with probability small probability of it Epsilon they would walk to just a ran anyway random Texas and then with probability 1 minus Epsilon they would take a random walk using an edge the graft and that sort of equivalent to adding attacks and with probability epsilon you go to that New offer tax and was take a step 2 an arbitrary protectors of breath Hey a now but the difference
09:30
between these 2 methods is this is the following Brennan to settle some
09:37
you've got to calculate the singular vectors of raft which has the order the 25 billion furnace where is the method that John developed but you have a much smaller Graf in response to your query and you only have to calculate the a singular factor for a graph with about 5 thousand vertices something which is very easy to do so so that was that was 1 of his death 1 of those pieces of work it has had a major impact on on all search engines today but it has also created a large number of industries because people want a figure out how to increase Page rank their particular Web page and things like this but it is also had a profound effect on all of us the theoretical computer science because he's this work has has led a jewel of a large number of theoretical papers that those of us that teach simply cannot ignore in an hour class so let me move on to look to another example of his work and this time I I'm going to pick a piece of work of refer to a small worlds and it had a major impact in the social sciences up back in 1967 Stanley Milgram I did some experiments where if calculated sort of distance between any 2 people in the United States and the way he carried out this experiment is he would get a person who he would call the sources say and some state such as Nebraska
11:24
and he would pick up part persons in other states say possibly in Massachusetts and he would ask that you would tell the person in Nebraska the name of the person in Massachusetts give them their address and occupation and asked the 1st to start of latter moving toward target but the role that he asked is that you could only send a letter to to somebody you know on a firstname basis and due to pass the same set of instructions on to that person and so the letter might move to a close neighborhood began to the next door neighbor of that 1st and then to a distant relative of the men to a close friend and so on until it reached the person in Massachusetts and what Meldrum observed is in those cases where the latter actually brought the number of edges along the path was between 5 and insects and from this he concluded that the distance between any 2 people in the United States was only 5 or 6 steps and created a large amount of research in the social sciences where they studied the interaction between people in various social groups now implicit in this experiment is not only the fact that there is a short pass between 2 people but we have a always using only local information a finding that path now disk His research became a lot more rigorous 1st with some work of Watson's program made the model precise they started with a graph right shoulder vertices around the circumference of a circle and they put an edge from each vertex to each of its adjacent vertices and in fact they put edges Chu vertices Witcher distance wonder 2 or 3 around around the circle so they gotta graph something like that but in this graph you will notice that if you want a standard all signed a short pass from the vertex on 1 side of the circle to avert tax on the other hand there and vertices on the circle of the path is going to be about an over for pops so what they did is they added the model a few random gadgets and then that they were able to prove that was high probability of any 2 vertices were connected by by a short pass but what they didn't do is they didn't consider the question of How do you find the path in the word John research female and he fundamentally changed the nature of this research now John used a slightly different model but he used a twodimensional grid where each vertex has a short to each of its neighborhoods and then in addition at each vertex he added 1 random longrange pass him will have to talk a little bit about how he put him that random but 1st let me go back and point out that early work following that of Stanley note from really focused on the existence of short path and the fundamental change that Kleinberg in this field is easy asked the question how does 1 find short payouts between 2 neighbors using only local information now John when he put in these long range edges headed decide what with what probability he would put a particular and so what he did for the longrange edges up he but a random edge from From tax in the probability of that edge going from a vertex Utah vertex beat art was given by some constant divided by the distance from you the V raised the art house in the constant is just there so you could normalize this summer the probabilities 1 am I John looked at this probability distribution for different values of and the surprising results that he came up with was 1st of all there is or is equal to Chile but then he was able to show that there is a parlor mealtime algorithm using only local information for finding short pacts however if are was less than 2 there would be many long random edges and there would be short path but there was no efficient algorithms using only local information defined short pass and similar he showed that are greater than to where there were fewer long range edges but there was still no there is no efficient algorithm defined short Pappas so when he was able to prove it is only time you can efficiently finding short palace is in the case where the value of this Trammer draw who is exactly equal to about I I was going to try and go through 1 of these groups but I think we're going to do is just on bench the very outline of it because of time of so what he did just this as the trophy the terror you could find a short pass for Oracle to art he looked at the distance from the current vertex to the destination and said the
17:32
algorithm would be in place J. if the distance to the destination was in the interval from to the GATT to the plus 1 and so you can see that since there order and organ vertices Graf any weather and is the length of aside or the area that there's only Morgan phases and if you could show that you were going to get out of phase with Morgan steps and the algorithm would take time most logs where they have a nice I think what I'm going to do is I'm going to use that skip over the proof and so I just suggestible but about some of the other research but what I would point out is that this research influenced its influence research in the social sciences the fundamental way but it also has had application in many other areas I just mentioned 1 of them in peertopeer filesharing systems the other thing that I will mention is that if you look at real bad assets many of them have about this squad dramatic decrease in probability which is essential for having efficient algorithms for finding a short let me quickly move on to 2 Johnson worked in detecting that string if you have a dad a strange but you could organize the information by many different weights by the topic but the time suffered incurred by its frequency or other parameters and John observed burst of activity provide structure for identifying or organizing that now which John did he developed a model for further adapt string and then using this model showed how to do it decamped burst and Giants model here's me said that the gaps between events satisfy a probability distribution of the former copy equals some constant times each of the miners L X and this model the expected value of a gap is 1 over health now John created a model which had a number of states in fact an infinite number of states labeled them 0 2 1 2 2 and each July has a as a distribution for arrival rates and few zeros but this radio it is 1 of Fergie initially blocked a higher member states the rate increases by scale factor yes and in each state their transition UP you go to a higher number state 1 higher number or 1 Lord and he charged to cost every time you went to a higher number stake and what he did them Is he developed the mathematics too find best fit the model parameters among the best bet for a particular Batiste strain and what the state's plants would be and then looking to see when you're in a high state this tailored to win when there is the 1st ought to be applied to a number of areas 1 of them was to hot to important conferences in computer science and he looked at the works it appeared and sure enough what you could do is see how various intellectual areas became important and then ceased to be important with time and if you look at this lest it very much agrees With what would I believe were word for various times since I'm very short on time let quickly mention all his his work on nearest neighbors 1 fundamental problem is it you're highdimensional space he is equal to 10 thousand and you and there's some large number say a million points and these points represents state documents a Web pages and you have a query which is also a factor in this space and what you would like to do it mind our closest . 2 to your query and what you would like to do you'd like to do this with an efficient out there was a large amount of work in this area but which I was able to do is be get much more efficient algorithms by random projection by projecting points on onto a line and the books 1 of the things you would hope is 2 points was on a 2 point close together when you project on their projections a close together however you could have 2 points which are far apart which end up to be coincidence because you have rejected them along the line through the 2 points and what time it is using some very sophisticated mathematics showed that if you do enough projections all that you can indeed with high probability find find the nearest neighbor but I have to do is apologize to him for not actually covering even but this is they researchers at the mall a portion of his work but what what I did I should do it is concluded since I'm out of time but let me just go to my conclusions slide art is His research is impacted our understanding of social networks our understanding of information are understanding of the World Wide Web odd its influenced every modern search engine and thus the daily life of researchers throughout the the world and it is laid the theoretical foundation for the future of computers sign and with their side I will conclude that is set to stand very very proud to have this opportunity to give you a glimpse of some of his work but we
24:09
close please all of that you there were no longer to award east
24:17
of the opposite