Reduced Google Matrix analysis of Wikipedia networks

Video in TIB AV-Portal: Reduced Google Matrix analysis of Wikipedia networks

Formal Metadata

Title
Reduced Google Matrix analysis of Wikipedia networks
Title of Series
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
The workshop will address fundamental features that determine the efficiency and control of information flow on directed networks, information retrieval, including also such fundamental properties of Google matrix as the fractal Weyl law, Anderson localization transition for Google matrix eigenstates. The highlights and future developments of this research field will be analyzed 20 years after the seminal article of Brin and Page (1998).
Keywords: Markov chains, complex networks, Google matrix, information, financial flows, Wikipedia, cancer networks, recommender systems
[Music] Hello everyone. I'm happy to share the screen [inaudible]. So hello everyone, I'm very pleased to be here today, and I would like to thank the organizers for inviting me to give this talk about the reduced Google matrix analysis of Wikipedia networks. I'm not used to this format of presentation, and I'm sorry not to be with you today; a last-minute problem prevented me from coming. The work I'm going to present today has mainly been a collaboration with Samer El Zant, Dima Shepelyansky and Klaus Frahm. Most of the results I'm going to present come from the PhD of Samer, and I think you already had a very nice introduction this morning from Klaus about the fundamentals of the Google matrix and, I guess, of the reduced Google matrix analysis. So this talk will mostly focus on how to... Oh, you don't see my slides? Sorry, let me try. Let's try again, I'm sorry. Share screen... Does that work
better now? Oh, thank you, thank you. So most of the talk will discuss how to use Google matrix tools to extract some really interesting macroscopic information from Wikipedia. As you all know, Wikipedia is a very large, free, collaboratively written encyclopedia. It is now very interesting and holds a lot of very good information, and what is really interesting for us is that it relies on a hyperlink structure: articles are tied together with hyperlinks. For instance, the web page of France directly points to the web page of Western Europe, which might as well point to the web page of England or to any other web page. We are going to leverage this hyperlink structure and build the so-called directed network, where the vertices are the articles of Wikipedia (each article has a topic and a name) and the edges that interconnect these vertices are given by the hyperlinks. Wikipedia can thus be directly mapped to a directed network of topics, which has a scale-free shape. Here we will concentrate on a couple of Wikipedia editions. You all know, I think, that Wikipedia has been written in various language editions. You have the English version, which is the most commonly used and which has the largest set of contributors worldwide, and as you can see the English edition of 2017 has over five million nodes and 122 million links. This is getting to be a very large network. You have other editions that are pretty interesting and more or less large depending on the set of contributors: the French edition, the German edition or the Arabic edition. We have looked at all of these and at how we could extract some meaningful information from these editions.
Okay. Of course, since we are here today at this very nice conference, we will leverage the Google matrix analysis of this large directed network of Wikipedia. For these Wikipedia networks we can build a Google matrix. As you all know, this Google matrix represents the Markov transitions of a random surfer who, a fraction alpha of the time, uses the hyperlink structure to travel on this hyperlink network and, the remaining 1 minus alpha of the time, jumps randomly somewhere else. From this Google matrix it is of course well known that you can use the eigenvector corresponding to the largest eigenvalue, which is called the PageRank probability vector, to capture interesting nodes. These interesting nodes are called central nodes in Wikipedia, and several studies in the past have really looked at how Wikipedia can be understood using PageRank and Google matrix analysis. A couple of works exist. Some very interesting ones are the one where the researchers have extracted the ranking of historical figures over 35 centuries, which is in good agreement with the Hart ranking, and another one where the ranking of world universities has been captured, which is really in good agreement as well with the Shanghai academic ranking. Of course, here we only look at the hyperlink structure and extract the ranking only for the subset of nodes of interest, either universities or historical figures. You can as well look at Wikipedia through the inverted directed network, where you take the transposed version of the adjacency matrix; you then compute on it the eigenvector of the largest eigenvalue, which we call the CheiRank. In this case you extract good diffusion nodes, the nodes which have good diffusion capabilities. As a small PageRank example, I'm going to start introducing the various studies we have done. One of the studies really looked at geopolitical interactions among countries, either worldwide or at the scale of Europe.
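The random-surfer construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the four article names and their links are a hypothetical toy network, the damping value alpha = 0.85 is a conventional choice that the talk does not specify, and dangling articles (no outgoing links) are assumed to jump uniformly to any article. PageRank is obtained by power iteration on the Google matrix, and CheiRank by the same procedure on the network with all links reversed.

```python
def google_matrix(n, links, alpha=0.85):
    """Column-stochastic Google matrix; links are (source, target) pairs."""
    out_degree = [0] * n
    for source, _ in links:
        out_degree[source] += 1
    # Random jump part: probability (1 - alpha) / n towards every article.
    G = [[(1.0 - alpha) / n] * n for _ in range(n)]
    for j in range(n):
        if out_degree[j] == 0:  # dangling article: jump anywhere (assumption)
            for i in range(n):
                G[i][j] += alpha / n
    # Hyperlink part: follow one of the outgoing links with probability alpha.
    for source, target in links:
        G[target][source] += alpha / out_degree[source]
    return G


def pagerank(G, tol=1e-12, max_iter=10000):
    """Leading eigenvector of G (eigenvalue 1) by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(q, p)) < tol:
            return q
        p = q
    return p


# Hypothetical toy network of four articles.
names = ["France", "Western Europe", "England", "Toulouse"]
links = [(0, 1), (1, 2), (1, 0), (2, 1), (3, 0)]
n = len(names)

G = google_matrix(n, links)
pagerank_vector = pagerank(G)
ranking = sorted(range(n), key=lambda k: -pagerank_vector[k])

# CheiRank: same computation on the network with every link reversed.
reversed_links = [(target, source) for source, target in links]
cheirank_vector = pagerank(google_matrix(n, reversed_links))
```

In this toy network "Western Europe" collects the most incoming probability and ranks first, while "Toulouse", which nobody links to, ranks last. For a real edition with millions of articles one would use sparse matrices rather than dense lists.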
We have colored on the map the countries which are the top 30 countries according to the PageRank index for the English version of Wikipedia. Here the most important country in the English Wikipedia is the United States, the second ranked is France, then you have the UK, then Germany, Canada, etc. We have grouped all these countries into sets that are represented by the colors: you have the English-speaking countries in orange, the former USSR bloc in blue, Europe in red, the Arabic countries of the top 40 in yellow, Southeast Asia in pink, the Chinese bloc in purple and, of course, Latin America in green. So here you can have a good ranking of all these countries with PageRank. But of course, as you may already know, the PageRank index and the PageRank probabilities really depend on the language edition of Wikipedia. There is some cultural bias here, and the PageRank index varies depending on whether you are looking at the English, Russian or Arabic edition. For instance, in the top table you will see that, as said before, for the English Wikipedia the United States is ranked first, while for the Russian edition it is of course Russia. So in order to find some cross-edition ranking we have built a theta score (previous authors have worked on this as well) to offer a global ranking across several editions of Wikipedia. This global ranking looks at the top hundred nodes in each edition of Wikipedia: for each edition we take 101 minus the rank of the node, and we sum that over all editions.
This gives us a ranking measure where the largest values give us the most important nodes across all editions. We can as well, of course, compute an average PageRank probability and build a different ranking from that, but from our experience it was much more interesting to have this theta ranking than this average ranking, which is pictured here for a set of 30 pages. In this slide we have selected another subset of Wikipedia nodes for our research: we have calculated the theta score for all the painters that are listed in the English Wikipedia, computed over seven editions of Wikipedia, and we have picked the 40 most important ones according to this theta score. You will see in this list of 40 painters very famous painters such as da Vinci, Picasso, van Gogh, [inaudible] and others. So I have shown you the two sets we have extracted, the set of 40 countries and the set of 40 painters; this extracted set of 40 painters is tied to the seven editions that we are looking at. Based on these subsets, we will try to find a better representation of the interactions among all these painters and all these countries, and of how they interact together, using the tool which is called the reduced Google matrix. Klaus, I think, has already introduced the theoretical foundation of the reduced Google matrix. The idea is to use this powerful tool to create a synthesized network view, a limited view of the full Google matrix, for a set of Wikipedia articles. For instance, I am interested in painting and I want to know, among all those 40 very important painters, how they influence each other. This is a question I can ask myself, and also how they are related to world countries, like the 40 top countries I've identified before. So is everything clear? I don't see the audience, so it's difficult for me to get any feedback. Okay, thank you.
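The theta score described above (101 minus the rank within each edition's top 100, summed over editions) can be sketched as follows. The function name and the data are illustrative: the first places of the English list and Russia's first place in the Russian list follow the talk, the remaining positions are hypothetical, and the sketch assumes that article names have already been matched across language editions.

```python
def theta_scores(rankings, top=100):
    """Sum of (top + 1 - rank) over all editions where the article is in the top list."""
    scores = {}
    for ordered_names in rankings.values():
        for rank, name in enumerate(ordered_names[:top], start=1):
            scores[name] = scores.get(name, 0) + (top + 1 - rank)
    return scores


# Per-edition PageRank orderings, best first (partly hypothetical, truncated lists).
rankings = {
    "EN": ["United States", "France", "United Kingdom", "Germany"],
    "RU": ["Russia", "United States", "France", "Germany"],
}

scores = theta_scores(rankings)
global_ranking = sorted(scores, key=scores.get, reverse=True)
```

An article that is absent from an edition's top list simply contributes nothing for that edition, which is why a country ranked highly in several editions ends up above one that leads a single edition.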
The little red circles here simply represent a subset of articles that I'm looking at, and from that we are building what Klaus introduced to you earlier, which is the reduced Google matrix. I'm just going to quickly go over it again. We consider here reduced networks of N_r nodes; those N_r nodes are the ones we are focusing on, for instance those forty painters or those forty world countries. To construct the reduced Google matrix we have to reorder the regular Google matrix G. We put in the top left corner all the elements of the N_r x N_r matrix G_rr. In the lower right-hand corner we put the N_s x N_s elements that represent the scattering matrix G_ss; this scattering matrix represents all the other nodes of the network, which are not nodes of interest. Then we have, in the top right and bottom left corners, the probabilities which let us travel from the reduced set to the scattering set and back from the scattering set to the reduced set, G_rs and G_sr. From G you can of course compute P, the PageRank probability vector, which we order the same way, to have on top the N_r first probabilities related to the nodes of the reduced network, P_r, and on the bottom the N_s probabilities of the rest of the nodes, P_s. We want to calculate a new reduced Google matrix G_R such that applying G_R to P_r gives back the same steady-state PageRank P_r. From the definition of G and the reordering made before, we can define G_R as G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr. Klaus told you, I think, that N_s is of course too large for a direct evaluation of (1 - G_ss)^(-1), because we are looking at just a really small subset of our nodes.
So he has proposed the following numerical evaluation, saying that it is possible to invert 1 - G_ss, assuming of course that 1 - G_ss is not singular. We can extract lambda_c, the leading eigenvalue of G_ss, using a power iteration, and from that extract as well P_c, the projector onto the eigenspace of lambda_c, and its complementary projector Q_c. From that it is possible to compute the two parts of the inversion: the first part, on the left, P_c / (1 - lambda_c), represents the projector component, and on the right-hand side you have the part with the complementary projector. When you integrate this derivation into the definition of the reduced Google matrix, you can see that G_R is simply a sum of three components. The first component is G_rr. The second is the projector component G_pr = G_rs P_c G_sr / (1 - lambda_c); this is a part which is closely related to the PageRank vector, as you will see. And you have G_qr, which represents the indirect interactions through the rest of the nodes, and where the largest eigenvalue of G_ss has no impact. Let me illustrate these components of G_R.
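The block structure and the formula G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr can be checked on a toy example. The sketch below is not the projector-based procedure described in the talk: for a handful of nodes it is enough to solve (1 - G_ss) X = G_sr by the naive fixed-point iteration X <- G_sr + G_ss X, which converges because the spectral radius of G_ss is below 1 here. The six-node network, alpha = 0.85 and all function names are assumptions made for illustration.

```python
def google_matrix(n, links, alpha=0.85):
    """Column-stochastic Google matrix; links are (source, target) pairs."""
    out_degree = [0] * n
    for source, _ in links:
        out_degree[source] += 1
    G = [[(1.0 - alpha) / n] * n for _ in range(n)]
    for j in range(n):
        if out_degree[j] == 0:  # dangling node: jump anywhere (assumption)
            for i in range(n):
                G[i][j] += alpha / n
    for source, target in links:
        G[target][source] += alpha / out_degree[source]
    return G


def pagerank(G, tol=1e-12, max_iter=10000):
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(q, p)) < tol:
            return q
        p = q
    return p


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def block(G, rows, cols):
    return [[G[i][j] for j in cols] for i in rows]


def reduced_google_matrix(G, reduced, tol=1e-14, max_iter=10000):
    """G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr, naive version for toy sizes."""
    rest = [k for k in range(len(G)) if k not in reduced]
    G_rr, G_rs = block(G, reduced, reduced), block(G, reduced, rest)
    G_sr, G_ss = block(G, rest, reduced), block(G, rest, rest)
    # Solve (1 - G_ss) X = G_sr by iterating X <- G_sr + G_ss X.
    X = [row[:] for row in G_sr]
    for _ in range(max_iter):
        X_new = matadd(G_sr, matmul(G_ss, X))
        diff = max(abs(a - b) for ra, rb in zip(X_new, X) for a, b in zip(ra, rb))
        X = X_new
        if diff < tol:
            break
    return matadd(G_rr, matmul(G_rs, X))


# Hypothetical six-node network; nodes 0 and 1 are the nodes of interest.
links = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5), (5, 0), (5, 3)]
G = google_matrix(6, links)
reduced = [0, 1]
G_R = reduced_google_matrix(G, reduced)

# Consistency check: G_R applied to the PageRank of the reduced nodes returns it.
P = pagerank(G)
P_r = [P[k] for k in reduced]
G_R_P_r = [sum(G_R[i][j] * P_r[j] for j in range(len(reduced))) for i in range(len(reduced))]
```

The columns of G_R sum to one, and G_R_P_r agrees with P_r up to the iteration tolerance, which is the defining property stated in the talk. For the Wikipedia networks, where N_s is in the millions, this naive iteration would be far too slow, which is the reason for the projector-based evaluation.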
As a first illustration, I am going to show you the reduced matrix we get for a set of 27 European countries selected in Wikipedia. Here we have selected the 27 countries that made up the European Union for the 2013 Wikipedia edition, and we have looked at what G_R is for them in the Wikipedia network of the English edition. On the left-hand side I plot the reduced matrix G_R and on the right-hand side the projector component G_pr. As you can see, the two matrices look alike: G_R is really dominated by the projector component, and almost 95 to 97% of the total column sum of G_R is given by this projector component G_pr. I didn't tell you, but on the left-hand side all the rows are ordered by increasing PageRank index, so the largest PageRank probability is on top, with France, then Great Britain, Germany, Italy, and the same order holds along the x-axis at the bottom: all the columns are ordered with the same ranking. So you can see that the projector component doesn't really give us any new information compared to the regular PageRank probability vector, and all the interesting information for us is captured mostly in those 3 to 5 percent of the total column sum which are given by G_rr and G_qr.
If you look a little bit at the weights of these matrices, for this network of 27 EU countries and for the 30 top worldwide set of countries I have given you before, you can really see that the projector component accounts for a big chunk of the weight of the matrix, where the weight is obtained by summing all elements of the matrix and normalizing by the number of columns. Only a small part is captured by W_rr and W_qr, which correspond to the sums for G_rr and G_qr respectively.
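To make the three components and their weights concrete, here is a toy sketch under stated assumptions. It takes the projector component as G_pr = G_rs P_c G_sr / (1 - lambda_c), with P_c built from the leading right and left eigenvectors of G_ss found by power iteration, and obtains G_qr as the remainder G_R - G_rr - G_pr, where G_R itself comes from a naive series that is only viable for tiny networks. The six-node network, alpha = 0.85 and the function names are hypothetical.

```python
def google_matrix(n, links, alpha=0.85):
    """Column-stochastic Google matrix; links are (source, target) pairs."""
    out_degree = [0] * n
    for source, _ in links:
        out_degree[source] += 1
    G = [[(1.0 - alpha) / n] * n for _ in range(n)]
    for j in range(n):
        if out_degree[j] == 0:  # dangling node: jump anywhere (assumption)
            for i in range(n):
                G[i][j] += alpha / n
    for source, target in links:
        G[target][source] += alpha / out_degree[source]
    return G


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def block(G, rows, cols):
    return [[G[i][j] for j in cols] for i in rows]


def leading_eigen(M, iters=2000):
    """Leading eigenvalue and sum-normalized eigenvector of a positive matrix."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)  # valid because v sums to one
        v = [x / lam for x in w]
    return lam, v


def decompose(G, reduced, series_terms=5000):
    """Return G_rr, G_pr, G_qr and lambda_c for the chosen reduced nodes."""
    n_r = len(reduced)
    rest = [k for k in range(len(G)) if k not in reduced]
    G_rr, G_rs = block(G, reduced, reduced), block(G, reduced, rest)
    G_sr, G_ss = block(G, rest, reduced), block(G, rest, rest)
    # Full G_R from the naive series X = sum over l of G_ss^l G_sr.
    X = [row[:] for row in G_sr]
    for _ in range(series_terms):
        GX = matmul(G_ss, X)
        X = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(G_sr, GX)]
    G_rs_X = matmul(G_rs, X)
    G_R = [[G_rr[i][j] + G_rs_X[i][j] for j in range(n_r)] for i in range(n_r)]
    # Projector component from the leading right and left eigenvectors of G_ss.
    lam_c, psi_right = leading_eigen(G_ss)
    _, psi_left = leading_eigen(transpose(G_ss))
    overlap = sum(l * r for l, r in zip(psi_left, psi_right))
    into_r = [sum(g * r for g, r in zip(row, psi_right)) for row in G_rs]
    from_r = [sum(l * g for l, g in zip(psi_left, col)) for col in zip(*G_sr)]
    scale = overlap * (1.0 - lam_c)
    G_pr = [[a * b / scale for b in from_r] for a in into_r]
    # Remainder: indirect paths not carried by the leading eigenvalue.
    G_qr = [[G_R[i][j] - G_rr[i][j] - G_pr[i][j] for j in range(n_r)] for i in range(n_r)]
    return G_rr, G_pr, G_qr, lam_c


def weight(M):
    """Sum of all elements divided by the number of columns."""
    return sum(sum(row) for row in M) / len(M[0])


links = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5), (5, 0), (5, 3)]
G = google_matrix(6, links)
G_rr, G_pr, G_qr, lam_c = decompose(G, reduced=[0, 1])
W_rr, W_pr, W_qr = weight(G_rr), weight(G_pr), weight(G_qr)
```

The three weights add up to one because the columns of G_R sum to one. On such a tiny network lambda_c is not close to 1, so the projector weight need not reach the 95 to 97% quoted in the talk for the Wikipedia networks, where the scattering set is almost the whole network.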
What can we see in G_rr and G_qr for this network of 27 EU countries? Here we have new information, and most of this new information is in the right-hand figure, in G_qr. For G_qr we did not picture the diagonal terms, because they are pretty large and this blurs the color scale a little bit. As before, in the color scale red values are the maximum values and blue the minimum values. On the left-hand side you see the direct links, represented by G_rr, which is a direct view of the regular Google matrix. With this view of the regular Google matrix you cannot, since it is column normalized, compare what happens between columns. But on the right-hand side, for G_qr, you can compare column-wise what happens along one line. G_qr represents, I want to say it again, the scattering: in the Markov chain, it is the contribution of the random walks that go through the scattering matrix. So it represents not the direct interactions between the nodes of our network but the indirect interactions, all the possible paths that go through the scattering matrix and come back to the nodes of interest. To understand this a little bit better: if I want to see a strong interaction that is indirect and that does not really exist in the direct links on the left-hand side, I can look at what happens here for this red point. This red point means that when I am on Finland I have a high chance of using the scattering matrix and an indirect link to go to Sweden, so there is a strong interaction between Finland and Sweden. This is not captured as well by the direct interaction between Finland, which is here, and Sweden, which is here: it is not among the largest ones in the left matrix, but it is more
visible on the right-hand side. Here you have a very strong interaction between Belgium and Luxembourg, which is totally meaningful, and as well between France and Luxembourg. You also have a strong interaction between Sweden and Denmark, and another between Sweden and [inaudible]. So there is a lot of information in these two matrices, mostly in the one on the right-hand side.
Of course, these indirect links that you can see on the right-hand side also capture the cultural views of the different language editions. For instance, if you compute G_qr for the English version of Wikipedia and, on the right-hand side, you build the matrix G_qr for the same 27 EU countries only for the French Wikipedia, you see that the indirect links are not exactly the same and don't have the same magnitude. You still find a strong influence between Finland and Sweden. So the question we are asking ourselves is: is there a fundamental difference between those indirect networks for different language editions, or are there some common traits that represent some common knowledge that is true across different editions of Wikipedia? So to
investigate that, we have created what we call networks of friends. We say that the top friends of country j are obtained by ranking all the countries in column j by descending value in the matrix of interest, and we pick the top four friends, for instance, to build a little directed network as represented on the right-hand side of this slide. On the left-hand side we have pictured G_R for the English Wikipedia, and I have selected five countries in Europe, which are Sweden, France, Great Britain, Spain and Poland. These five countries are identified as the most influential ones in terms of PageRank within each set of countries that entered the European Union at the same time: for the founders we have France, then you had Great Britain joining, then a set of countries joining with Spain, another set with Sweden, etc. For these five important countries, pictured with the largest circles, we have looked into G_R for the top four friends as defined before. And we can see that France's top four friends (not Sweden's, sorry) are given by Great Britain, by Germany and [inaudible]. This network is really, let's say, dominated by the countries which have a high PageRank. This is completely expected, since G_R is dominated by the PageRank description, and the top friends of a country j are likely to be the top PageRank countries in the set of 27 EU countries.
Now, when you build the same type of network of friends from G_qr and G_rr, you see much more diversity, because you have removed this important eigenvector and you only see the contribution of the rest of the network. It is interesting to build the same top four friends for the English Wikipedia and for the French Wikipedia with the same set of important countries: we have selected as well Spain, France, Great Britain, Poland and Sweden, and we have built this network exactly the same way as before. We see with the black arrows that they first capture important nodes that are not necessarily the ones that have a high PageRank in general; this is because there is no PageRank contribution in G_qr and G_rr. These plots have been drawn with an automatic tool from Gephi, an algorithm called force-directed layout, which defines where the nodes are and groups together the nodes which are more highly interconnected. What I didn't tell you yet is that the red arrows represent the edges that have been computed as the top four friends of the friends of those important nodes, and we have added these recursive friendship edges until no new vertices are added to the network. It means that red arrows are friends-of-friends interactions, and black arrows are direct friendship interactions whose origin is one of those important countries. What is interesting is that even if you are looking at two different editions, you can still see a clear community structure that you can find in both editions. You can see that France is most of the time related to the Benelux countries and the Netherlands, you can see that the red countries around Poland are almost the same as well, etc., and you always see that Portugal and Spain are put together. So this is some, let's say, common knowledge that you can find in both editions, and this is true for other editions as well.
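The network-of-friends construction described above can be sketched as follows: the top friends of node j are the rows with the largest values in column j, black edges start from the chosen leading nodes, and red edges are added recursively for each newly reached node until no new vertex appears. The matrix values and country labels below are hypothetical, and excluding a node from its own friend list (the diagonal) is an assumption of this sketch.

```python
def top_friends(M, j, k):
    """Indices of the k largest entries of column j, excluding j itself."""
    others = [i for i in range(len(M)) if i != j]
    return sorted(others, key=lambda i: M[i][j], reverse=True)[:k]


def friends_network(M, leaders, k=4):
    """Return (black_edges, red_edges) as lists of (origin, friend) pairs."""
    black, red = [], []
    seen = set(leaders)
    frontier = []
    # Black edges: direct friends of the leading nodes.
    for j in leaders:
        for i in top_friends(M, j, k):
            black.append((j, i))
            if i not in seen:
                seen.add(i)
                frontier.append(i)
    # Red edges: friends of friends, repeated until no new vertex is added.
    while frontier:
        next_frontier = []
        for j in frontier:
            for i in top_friends(M, j, k):
                red.append((j, i))
                if i not in seen:
                    seen.add(i)
                    next_frontier.append(i)
        frontier = next_frontier
    return black, red


# Hypothetical 5 x 5 matrix (for instance G_rr + G_qr); entry M[i][j] is the
# weight of the interaction from column node j to row node i.
names = ["FR", "DE", "BE", "LU", "ES"]
M = [
    [0.00, 0.03, 0.06, 0.04, 0.05],
    [0.02, 0.00, 0.01, 0.03, 0.01],
    [0.05, 0.01, 0.00, 0.07, 0.002],
    [0.04, 0.02, 0.05, 0.00, 0.001],
    [0.01, 0.005, 0.002, 0.001, 0.00],
]

black, red = friends_network(M, leaders=[0], k=2)
black_named = [(names[j], names[i]) for j, i in black]
red_named = [(names[j], names[i]) for j, i in red]
```

With FR as the only leading node and two friends per node, the black edges go to BE and LU, and the red edges stay within FR, BE and LU, so the recursion stops after one round. Counting how often the same edge appears across networks built from several editions gives the cross-edition comparison mentioned in the talk.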
This also holds on different examples. To highlight those cross-edition friendships a little bit, we have counted how often, over the five editions we have investigated, which are English, French, Russian, German and Arabic, a friendship interaction exists in all five editions, in four editions out of five, or in three editions out of five. There is always a friendship relation between France and Belgium or between France and Spain, and there is always as well a friendship relation between Great Britain and Ireland or between Poland and the Czech Republic; you can read the rest of these results in this table. So there is some really common knowledge in these network structures. I will show you as well another type of network, of painters, since I have introduced painters before. What we have done here: we have selected thirty painters, six for each of several painting movements, namely Cubism, Fauvism, Impressionism, the Great Masters and Modern painting. We have associated a color code, we have picked these 30 painters with the theta score ranking, and we have identified for each category the painter with the most important PageRank: Picasso for Cubism, Matisse for Fauvism, Monet for Impressionism, da Vinci for the Great Masters and Dalí for Modern painting. From that we have created as well a network of painters, the network of top four friends, sorry, of top three friends. Similarly, the black arrows represent the top friends of the leading painter of each category, and the red arrows represent the friends-of-friends interactions that are obtained recursively until no new vertices are added to the graph. On the left-hand side we have the English Wikipedia, on the right-hand side the French Wikipedia, and we
can clearly see that the chronological development of these painting movements is really coherent. On top you have in orange the Great Masters, who come from the late Middle Ages and the Renaissance. This set of orange Great Masters is densely interconnected, with da Vinci influencing Degas and other leading painters of the Impressionist movement. The Impressionist painters are as well very densely interconnected, and they in turn really influenced the blue group, which is Fauvism. Then you have a nice imbrication between the Fauvism, Modern and Cubism painters, who of course are more closely interrelated. You can see the same type of development on the right-hand side for the French Wikipedia. So it is nice to see that two different cultures have built the same type of knowledge; of course there are some local differences, but the macroscopic view is very interesting to see. Then we have looked at building a friendship network for the interactions between painters and countries. To do that, we have built a reduced network which accounts for both the set of 40 painters and the set of 40 countries, and we have extracted the top three country friends for each of the 40 painters we identified before. For this we have not used G_qr alone: we have used the sum of G_rr and G_qr, so that we have the direct interactions together with the contribution of the indirect paths. We have done this for the English Wikipedia. The black arrows represent interactions where the direct interaction is more important than the indirect one, while the red ones represent the opposite. You can see that France and Italy are really central in this network, and that a lot of those painters are related to the development of art in France, Italy or Spain. We have built the same network for the French
Wikipedia as well. In the French Wikipedia you also see a central position of France, Spain and Italy, but the Netherlands seems to be more central than it was in the previous one. So these are interesting views as well. Now, is there any question in the audience related to this part? Because I'm going to move to another part.
Now that I have identified my subset of nodes, my network, and I am able to picture it and to capture the direct and indirect interactions, I would like to know how a relative link variation will impact the reduced network structure: what happens if an interaction gets stronger in the network, which nodes are going to suffer from it, and which interactions will develop based on this change. So we have developed a sensitivity analysis on this reduced network and the reduced Google matrix. What we do is look at what happens if there is a local change in a given relationship. For instance, I take the relationship going from nation j to nation i in the reduced network. This time we look at G_R: I modify with a slight variation the element of G_R at location (i, j), I then have to renormalize column j to keep the column-stochastic property of this new, changed matrix, and I calculate the modified PageRank values with this change. Then I observe the change in the importance of the nodes in the network by calculating the logarithmic derivative of the PageRank probability for a given node k. It can be node i or node j, but here we mostly look at a node k which is not necessarily part of the link that we have changed. This measure is what we call the sensitivity of nation k to a link variation from j to i. With this type of analysis we have found various example results. For the data set of 27 European countries, we have looked at the impact of an increase of the interaction from Italy to France: we have changed the link from Italy to France, and we look at what happens for the other 25 countries in terms of ranking, and at the average sensitivity that we observe over three different Wikipedia editions.
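The sensitivity measure described above can be sketched as a finite difference. Under the assumptions of this sketch, the element G_R(i, j) is multiplied by (1 + delta), column j is renormalized to sum to one, the stationary vector is recomputed, and the sensitivity of node k is (P_mod(k) - P(k)) / (delta * P(k)), which approximates the logarithmic derivative of the PageRank probability. The 3 x 3 matrix, the value delta = 0.01 and the function names are hypothetical.

```python
def stationary(G, tol=1e-14, max_iter=10000):
    """Stationary vector of a column-stochastic matrix by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        total = sum(q)
        q = [x / total for x in q]
        if sum(abs(a - b) for a, b in zip(q, p)) < tol:
            return q
        p = q
    return p


def sensitivity(G_R, i, j, delta=0.01):
    """Sensitivity of every node k to a relative increase of the link j -> i."""
    n = len(G_R)
    P = stationary(G_R)
    G_mod = [row[:] for row in G_R]
    G_mod[i][j] *= 1.0 + delta              # strengthen the link from j to i
    column_sum = sum(G_mod[r][j] for r in range(n))
    for r in range(n):                      # renormalize column j
        G_mod[r][j] /= column_sum
    P_mod = stationary(G_mod)
    return [(pm - p) / (delta * p) for pm, p in zip(P_mod, P)]


# Hypothetical reduced Google matrix for three nodes (columns sum to one).
G_R = [
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

P = stationary(G_R)
D = sensitivity(G_R, i=0, j=1)  # strengthen the link from node 1 to node 0
```

In this toy matrix node 0 gains from the strengthened link (D[0] is positive), while nodes 1 and 2 lose (negative values), which mirrors the way third countries are said to suffer in the examples of the talk. Because both PageRank vectors sum to one, the sensitivities weighted by P sum to zero.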
have averaged it over the three editions, for the English, French and German Wikipedia. What we can see in this analysis is that the country which is the most affected by this increase of cooperation between Italy and France is Slovenia. It is true that Slovenia has a lot of economic exchanges with Italy, and if Italy increases its relationship to France, it is most probably Slovenia that will see its cooperation with Italy decrease, and thus will lose in terms of importance in this global ranking in the network. The second country which seems to suffer the most is Greece, which has as well a cooperation with Italy in terms of economy, history or geography. Now we have looked as well at the set of 40 worldwide countries, and we have looked at the impact of an increase of the relationship from China to the United States, and we have mapped on this figure the change of the sensitivity that all the other countries will observe if this change happens. So here the lower values are in red and the larger values are in [unclear]; median values are in blue. So the countries which would be mostly affected by an increase of collaboration going from China to the US would be, of course, the border countries which have more exchanges with China, like India, like Japan, or Southeast Asia, and the ones that would benefit the most are the ones that are located close to the United States, because they are of course cooperating with the United States more often. What we can observe as well with this type of sensitivity analysis is that it's possible to identify clusters of countries that really function together. So here we have captured the fact, which of course is well known, that Sweden, Denmark and Finland work together, and on the top line we have looked at the average sensitivity for a link
modification going from any of these Nordic countries to France or to Germany. On the top it is going to France and on the bottom, plot D... no, it's the opposite: A, B and C are going to Germany and D is going to France. So you see that every time one of these countries increases its relationship to an important European country, the other ones will suffer, because it will reduce their cooperation with this country, which is a bit leaving this cluster. And then we have looked as well at the average sensitivity of countries to painters. So here we have calculated what would be the sensitivity of the 40 countries if you increase the relationship from Van Gogh to the Netherlands. You all know that Van Gogh is originally coming from the Netherlands and that he spent only the last four years of his life in France, but these were very productive years; he didn't get famous at that time, but most of his masterpieces were drawn there. And if you artificially increase the interaction between Van Gogh and the Netherlands in this reduced network, you clearly see that it's France that would suffer from this increase of interaction, because it is highly tied to Van Gogh as well in Wikipedia. Okay, and this is when you increase the sensitivity... sorry, there is a little typo here, it's the sensitivity from Van Gogh to France... I don't think this is the right picture, I'm sorry, that is not the right picture. The caption is wrong: this is da Vinci to France. You can find all these updated figures in a paper that I've cited at the end of the talk. But here is the increase of interaction from da Vinci to France, so of course you will see that Italy is the country that will suffer from this change. So here we have looked at unidirectional sensitivity, and it is possible of course to look at what
happens when the link is changed from i to j and modified as well from j to i. So we have simply calculated the bidirectional sensitivity, which we call the two-way sensitivity, to measure the sensitivity of a nation to the changes in both directions, on the link i to j and the link j to i, and we have observed
that for, for instance, the relationship between painters and countries. So here on this plot we've represented the diagonal sensitivity of the top 20 countries to the top 20 painters. On the left-hand side you have the top 20 painters of the top-40 category I presented earlier, and on the bottom we have the 20 most important countries that we have identified worldwide. How you can read that is that we have calculated the two-way sensitivity for the top 20 countries on the bottom when the interaction between this country and the painter on the left-hand side is modified. How to interpret that is the following: it says, for instance, that France is mostly impacted by, as well, da Vinci, if the link between da Vinci and France is being changed. In the same way, for Spain you have Picasso, who is really important as well as a leading painting figure, and you can see that he is pretty important for most of the countries that are presented here. So this helped us capture very easily, in a synthetic way, the importance of some specific painters in different cultures and for different countries.
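The sensitivity procedure described in the talk (scale one element of the reduced Google matrix, renormalize its column, recompute the PageRank, and look at the relative change per node) can be sketched as follows. This is only a minimal illustration under assumptions: the 3x3 matrix `G_R` is a made-up toy, the function names are hypothetical, the derivative is approximated by a finite difference with a small `delta`, and the two-way variant is taken here as perturbing both links in the same matrix, which is one possible reading of the talk rather than the authors' confirmed definition.

```python
import numpy as np

def pagerank(G, tol=1e-14, max_iter=10_000):
    """Stationary vector of a column-stochastic matrix G, by power iteration."""
    p = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        new = G @ p
        new /= new.sum()
        if np.abs(new - p).sum() < tol:
            return new
        p = new
    return p

def perturb(G, links, delta):
    """Scale G[i, j] by (1 + delta) for every link (j, i), i.e. from j to i,
    then renormalize each touched column so that it sums to one again."""
    H = G.astype(float).copy()
    for j, i in links:
        H[i, j] *= 1.0 + delta
    for j in {j for j, _ in links}:
        H[:, j] /= H[:, j].sum()
    return H

def sensitivity(G, links, delta=1e-3):
    """Finite-difference estimate of d ln P(k) / d delta for every node k."""
    P = pagerank(G)
    P_mod = pagerank(perturb(G, links, delta))
    return (P_mod - P) / (delta * P)

# Toy column-stochastic matrix (assumed values, not data from the talk).
# Entry G_R[i, j] is the transition weight from node j to node i.
G_R = np.array([[0.6, 0.2, 0.3],
                [0.3, 0.5, 0.3],
                [0.1, 0.3, 0.4]])

one_way = sensitivity(G_R, [(1, 0)])          # link from node 1 to node 0
two_way = sensitivity(G_R, [(1, 0), (0, 1)])  # both directions at once
print(one_way)
print(two_way)
```

Because both PageRank vectors sum to one, the changes weighted by the PageRank probabilities cancel out, which matches the remark in the discussion that the node receiving the stronger link gains while the others compensate.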
Another, different little analysis which follows from this two-way sensitivity is what we call the relationship imbalance between two nations. Here we would like to know, in the relationship between two countries, which one is the strongest nation, which one has more influence on the other one when there is a link change in both directions between both countries. This relationship imbalance is calculated in the following way: it is the difference between the two-way sensitivity for the two-way variation between A and B observed at country A, minus the two-way sensitivity calculated as well for the two-way variation between A and B and observed at nation B. This relationship imbalance can be interpreted in the following way: if this F between countries A and B is positive, it's B that is the strongest nation and is going to lead, and if F(A, B) is negative, A is the strongest nation. We have plotted these outcomes here for the case of the 27 EU network, to see what we get in terms of relationship imbalance analysis. Of course we have only presented half of it, because it's a perfectly symmetric matrix. The x-axis represents country A and the y-axis represents country B; the blue values represent negative values, and reddish, orangish values represent positive values. So here we can see that France is really a culture which is dominating. This is the English Wikipedia: in the English Wikipedia, France really dominates all other countries in the sensitivity analysis. Germany is another really important one, and Italy is an important one as well, and Austria [unclear]. We have done the same type of development for the worldwide network, and you can see that the US is really the strongest, so strong that the variation of other countries' influence on each other is pretty much dimmed by this. So you have the US, which is important in the English Wikipedia, France, which has a strong impact, and Germany.
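Written as a formula, the relationship imbalance described above might look like this. The notation is an assumption made for illustration: D with subscript (A <-> B) stands for the two-way sensitivity to a link change in both directions between A and B, and the argument in parentheses is the nation at which it is observed.

```latex
F(A,B) = D_{(A \leftrightarrow B)}(A) - D_{(A \leftrightarrow B)}(B),
\qquad
F(B,A) = -F(A,B).
```

Under this definition, swapping A and B only flips the sign, so one half of the matrix carries all the information. A positive F(A,B) means that A reacts more strongly than B to the same two-way change, which is the case the talk reads as B being the strongest nation.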
This concludes my talk. What I wanted to show you is how the Google matrix can be nicely leveraged to analyze Wikipedia. It really offers a nice framework to automatically learn coarse-grained embedded information and, likewise, to have a microscopic view of some elements which are really interesting. Most of the results I have shown you are results that are of course pretty obvious; we wanted to check whether the type of information that we could obtain was meaningful and reasonable. What is nice to do with Google matrix analysis is that you can of course capture important nodes with PageRank or other related metrics across editions, but you can as well exhibit interactions within a subnetwork, like in this schematic view of this directed network, with the reduced Google matrix analysis. And what is interesting as well is to understand the influence of links and nodes on the network with the sensitivity analysis. Now, a little in terms of perspectives: the Google matrix has very nice properties to become, for me, a major tool for artificial intelligence and automatic information extraction from such large and very large information networks. But there, in my opinion, we still have to be able to automatically extract this subset of articles, the subset of nodes that we want to investigate further. So an automatic procedure that would be able to extract the subset of articles for a given study and create this subnetwork would be really interesting, to go beyond and have artificial intelligence leveraged more efficiently in this network. And as well, as you may have seen, Wikipedia is changing every year: you are adding lots of new articles, new links, etc., and how to capture this variation in the reduced network at a reduced cost would be very interesting, as would trying to understand which parts really are important in this evolution or not. So most of the works I have presented here are to be
found in this recent literature, so if you need some further explanations or more developments, don't hesitate to go and look for these papers. The two first are free access and pretty efficient. Ok, I try to go back to
not sharing my screen. Yes, right.
interactions between countries. [Music] Can you see my slides now? Because I've got to share. Okay, just... I can't find... I'm sorry, a technical matter, I can't find them. This was... yeah, I can find it.
[Music] This strong indirect relation: I noticed
that Finland and Sweden is here, and you have one of those strong terms in the column. I noticed that most red nodes, or red elements, are above the diagonal, probably because of the way you sort... Because of the way we sorted rows and columns. Countries are sorted by importance, right? Yeah, yes, by importance, I agree. So do you have an interpretation of why Finland has an indirect relation to Sweden but not the other way around? My interpretation is the following: you have an important number of paths linking Sweden and Finland indirectly, and you have a reduced number of direct paths compared to all the possible paths that are going out of Finland and Sweden. I mean that here you have a lot of subjects which are integrated in Wikipedia; you can have two articles that are interconnected, related to issues of politics, related to agriculture, related to [unclear] and a lot of different things, and these are cultural strengths that are mostly captured by the indirect links, while direct links are the ones where you have a direct link on the page of Sweden going to Finland. So the fact of having a culture which is closely intertwined for the two countries, I think, increases the rate of indirect links that go through the scattering matrix. Yeah, my observation was more that the indirect links tend to be more important from less important countries to more important countries than the other way around, because the red dots are more above the diagonal than below. And a sort of related question is about sensitivity: when you measure the sensitivity of adding a link from Italy to France, I
think... why are all the numbers negative? You mean that the only... Yeah, there's one that is positive, it's France, you see. Most of the values... well, France must be positive, because when you add a link to a node, that node increases in score. But you mean that all the other ones are negative? It is an average value as well, over three editions, but most of them are negative, this is true. Do you have an interpretation of this? It's like the only country that gets more important is France. Yes, because it's the one that you are pushing: you increase the probability and then you renormalize, so it makes sense that the other ones are negative, because you have to compensate for this increase in the normalization step. Yes? Would it make sense to use relative values? Because we often see that small countries are underrepresented, and it would maybe make sense to have a relation between the population and
the quantity of links. You mean normalization by the number of
inhabitants? Yeah. Okay, did you get this? So someone was commenting that it may make sense to normalize the values by population, because the countries have different sizes, so it may make sense to have a normalized... Did you try this? No. No more questions? No. Let's thank the speaker again. [Music]