5th HLF – Lecture: Approximate Elimination
Formal Metadata
Title: Approximate Elimination
Title of Series: 5th Heidelberg Laureate Forum (HLF)
Number of Parts: 49
Author: Daniel Spielman
License: No Open Access License: German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Identifiers: 10.5446/40131 (DOI)
Language: English
Transcript: English (auto-generated)
00:01
My name is Helge Holden, and I'm the secretary of the International Mathematical Union, which is responsible for awarding two of the prizes represented this week, namely the Nevanlinna Prize and the Fields Medal, and we will have one Nevanlinna Prize winner and one Fields Medalist in this session. So it's a pleasure for me to introduce Daniel Spielman.
00:22
He was born in 1970, and he received the Nevanlinna Prize in 2010 for his work on smoothed analysis of linear programming and the application of graph theory to numerical computation. He holds a PhD from MIT but is currently a professor at Yale University. He is a Henry Ford Professor of Computer Science,
00:44
Mathematics, and Applied Mathematics, which basically covers the complete range of interests of this audience. So Dan, the floor is yours. Okay. Thank you very much. This is going to be a slightly more technical talk than the ones that we saw this morning.
01:02
I'm going to tell you about some very fast algorithms for solving systems of linear equations in Laplacian matrices, which are a type of matrix I like a lot. So to give you an outline, I will first remind you of the first algorithm we usually learn that we recognize as an algorithm when we're learning it. That is Gaussian elimination.
01:23
I mean earlier in education, we are taught algorithms. We're taught how to multiply and add, but when we do that I don't think we realize they're algorithms. When you learn Gaussian elimination, I think that is when it clicks in your brain, ah, this is an algorithm. So we'll remind you of how that works, and I will try to give you a somewhat different presentation of it than you originally learned.
01:45
Then I will tell you about Laplacian matrices. That's a very nice family of matrices for which we are going to be able to solve systems of linear equations faster than we could through Gaussian elimination. And in particular, I'll tell you about a modification of Gaussian elimination we call approximate elimination, and it is roughly what you think to begin with. We're going to not do elimination exactly.
02:06
We'll do it approximately, and it'll be much faster. And I'll tell you how we can use that for Laplacian matrices, and then I'll give you some experimental results. So to begin, I want to put this research in context. This takes place in the field that analyzes the complexity of algorithms.
02:23
So people like myself who analyze the complexity of algorithms are interested in designing and really mathematically proving theorems about algorithms for mathematically well specified problems. And I should emphasize that latter restriction. For example,
02:40
I can't study algorithms for, say, translating English into German and back, because that is not a mathematically well specified problem. And even people who understand both languages might disagree about the correct translation of a sentence. I am interested in problems where we have an abstract mathematical definition of the problem, like say solving systems of linear equations.
03:05
Then when I'm working on those problems, I'm interested in finding the best algorithms I can. Now what does best mean? For me it usually means the fastest algorithms, the algorithms that require as few operations as possible. And if we're getting approximate solutions, then the algorithms that are as accurate as possible.
03:24
Now I want to give you a little bit of a summary of the state of the field from my long-term collaborator and good friend, Shang-Hua Teng. He said, unfortunately, there are almost no perfect results in algorithms. So what did he mean by this? I mean, he's not a pessimist. He's a very happy guy.
03:43
But perfect is a lot to hope for. What you want for a perfect algorithmic result is something where there's a beautiful theorem, where also the same algorithm works in practice, and you're solving a problem people actually want to solve.
04:02
So, okay, the last is actually a tall order. Once you restrict yourself to solving things people really want to solve, there are not as many. And during this talk, I think I'll explain a little bit about why the first two are different. But let me begin by giving you a rough idea. When you are designing an algorithm, you are making design choices. So the first design choices we usually make are trying to make an algorithm that we can prove a theorem about.
04:27
That actually really restricts your choice of what you are going to do. It means there are certain complicated things you will probably avoid doing. It's very different than the sort of choices you will make if you want to make an algorithm work as well in
04:40
practice as possible. And it is only in sort of miraculous cases that you can unify the two. So this talk is about a result where I think we are close to such a perfect result. If someone can prove a theorem about the algorithm I will explain at the end of the talk, then I think we'll have gotten there.
05:01
Okay, so let me review Gaussian elimination, and I hope you will bear with me. Most of you are familiar with it, but I'll hopefully give you a slightly different take on it. So first, at least if you're taking numerical linear algebra, the way you learn Gaussian elimination is you learn that if I want to solve systems of linear equations in a matrix A,
05:21
I write it as a product of a lower triangular matrix L and an upper triangular matrix U. We call this an LU factorization, like this here. And then if I want to solve systems of linear equations in A, like I want to solve systems like Ax equals b, what I do is I take b and I multiply it by L inverse and then U inverse, and that gives me the solution.
05:45
Now you might be wondering why not just compute A inverse? Well, sort of that is what you were doing along the way. But your row operations and column operations you learn in Gaussian elimination give you this matrix L and U. And actually it's better to have this L and U than to have A inverse.
06:01
The reason is that to apply L inverse and U inverse to b, you don't actually ever have to form L inverse and U inverse. You probably once learned, perhaps implicitly, though no one may have made it explicit, that when you want to solve a system of linear equations in a triangular matrix, whether by forward or backward substitution,
06:24
you can do this in time proportional to the number of nonzero entries in that matrix without ever forming the inverse. So this is the key point. We don't need to form L inverse and U inverse in order to apply those operators. And this is good because often it's possible to make L and U
06:42
be mostly zeros, in which case you can apply L inverse and U inverse very quickly. However, L inverse and U inverse might be dense and have, you know, as many entries as possible. Okay, I'm going to always be talking about symmetric matrices in this talk. Just to tell you, when you do Gaussian elimination symmetrically,
07:03
that's called Cholesky factorization. The important point is you can take L to be U transpose. So I'll talk about finding a factorization of A as U transpose U. Another warning: most people instead talk about L L transpose instead of U transpose U, but for us the letter L is reserved for the Laplacian, so I won't use it for a lower triangular matrix.
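As a concrete illustration of that point (not from the talk, just a minimal Python sketch using SciPy's triangular solvers), here is how you would solve A x = b from a Cholesky factor A = U^T U by forward and back substitution, without ever forming an inverse:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Minimal sketch: solve A x = b given A = U^T U with U upper triangular,
# applying U^{-T} and U^{-1} by substitution instead of forming any inverse.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric positive definite example
b = np.array([1.0, 2.0, 3.0])

U = cholesky(A)                          # upper triangular by default
y = solve_triangular(U, b, trans='T')    # forward substitution: U^T y = b
x = solve_triangular(U, y)               # back substitution:    U x = y

assert np.allclose(A @ x, b)
```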
07:23
Okay, so let me remind you how Gaussian elimination works, and I don't want to describe it in terms of row and column operations. To tell you in advance, the reason is that row and column operations on matrices are multiplicative. I'm going to want to apply tools from random matrix theory, which apply much more naturally to sums of random matrices.
07:43
So I need an additive description of Gaussian elimination. Okay, here's how Gaussian elimination works additively. You take your matrix and find a rank one matrix, or rather the rank one matrix, that agrees with your matrix on the first row and column.
08:00
Oh, I'm sorry, my first row and column were supposed to be highlighted. That's unfortunately, that color is not showing up. But okay, so you should see that the matrix I've written down below here agrees with my initial matrix on the first row and column. And you can tell it's a rank one matrix because I've written it as a vector times its transpose.
08:20
That's going to be rank one. You find the rank one matrix that agrees with the first row and column, and then subtract it off. And after you subtract it off, you have a matrix which is zeros in the first row and column. And when you do that, that corresponds to eliminating the first variable and the first equation.
08:41
That's what actually goes on when you're eliminating the first variable. Okay, so now we proceed inductively. You take your matrix and find the rank one matrix that agrees on the next row and column of it. Again, not highlighted, unfortunately, but here it is. And I will write that as an outer product of a vector with itself,
09:00
so you can tell it's rank one. And then we subtract that off. So, yes, okay. So, Bill Kahan knows too much about elimination. This can go wrong. Sometimes you have to permute your rows and columns.
09:22
But we're going to ignore that nicety for now. But at least I promise you that with Laplacian matrices, we will never have to. So, we will subtract that rank one matrix, and you keep going. And when you're done, what happens at the end is you've taken your initial matrix and written it as
09:43
a sum of rank one matrices. Each was the outer product of a vector with itself, in this case, because it's a symmetric matrix. And if you then take those vectors and assemble them into matrices, you can write the matrix as a lower triangular matrix times an upper triangular matrix.
10:00
You just literally take the vectors that were in those outer products and you assemble them together. This is how Gaussian elimination gives you an LU factorization. If I want to write it as U transpose U, I just take the upper triangular matrix and its transpose. And this is the Cholesky factorization that I want for symmetric matrices.
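To make the additive picture concrete, here is a minimal sketch (not from the talk) of Cholesky factorization for a symmetric positive definite matrix, written as repeated subtraction of rank-one matrices:

```python
import numpy as np

def cholesky_by_rank_one(A):
    """Additive Cholesky: repeatedly subtract the rank-one matrix that agrees
    with the current matrix on row/column i (A must be symmetric positive
    definite; no pivoting here)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    U = np.zeros_like(A)
    for i in range(n):
        u = A[i, :] / np.sqrt(A[i, i])   # agrees with A on row and column i
        A -= np.outer(u, u)              # eliminates variable i
        U[i, :] = u                      # collect the vectors into U
    return U                             # upper triangular, original A = U^T U

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
U = cholesky_by_rank_one(A)
assert np.allclose(U.T @ U, A)
```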
10:22
Okay, now I care about the running time of this algorithm. The dominant step in this algorithm is the subtraction of rank one matrices and updating things. When I subtract one matrix from another, the amount of time that takes will be the number of entries in the matrix I'm subtracting. And that time is proportional to the square of the number of nonzero entries in the vectors
10:43
I have. Because each matrix I'm subtracting is one of those vectors times its transpose. So it's really the nonzero entries in those vectors that cause these things to take a lot of time. And if you add it all up, if I have an n by n matrix, the running time of this algorithm I've described is theta of n cubed.
11:02
It is at most n cubed and it typically is n cubed. n cubed is really bad for me. I don't like n cubed. That's a big number. One reason I don't like that is most of the matrices I encounter, even if they're n by n, are usually sparse. That means they only have order n nonzero entries.
11:21
It turns out that even if A, your initial matrix, is mostly zeros, it's still true that this algorithm that I described to you will take around n cubed steps. At least, depending on the matrix, it can. And again, I don't like that because I'm often solving systems of linear equations with millions of variables. 10 to the 6 is okay. 10 to the 6 cubed is 10 to the 18. That's too long.
11:43
Now some of you know you can speed this up. There are algorithms based on fast matrix multiplication, starting with the work of Strassen, which now run in, I'm saying, order n to the 2.37. It's a way of rearranging these computations. It gives you the same result. That's asymptotically faster. Still not fast enough for me for the sparse matrices that I also care about.
12:03
But I should also mention it's not a perfect result, because the constant hidden in front means that you really can't actually use it on matrices that we care about. Not at the sizes that we really have. Okay, so how are we going to try to speed this up? What we're going to do is not go for an exact factorization. We'll go for an approximate factorization.
12:24
You might naively begin to think, well, let's do that by taking these vectors and, I don't know if you can see as I transitioned, just making many of the entries zero; to compensate, I rounded some others up. That doesn't quite work, but we're going to do something close to that, where we're going to make many of these entries zero and get a good approximate factorization. And I want to mention that the first work
12:46
I know of that went in this direction was a paper of Ken Clarkson, which was a beautiful idea with a lot of experiments, but the theory had not yet been developed. Okay, but before I tell you about how we're going to do that, I need to say
13:01
what does it mean to approximate one matrix by another? Or what is a sufficient approximation for our purposes today? So I will say that one positive semi-definite matrix A, and I'll really only care about those, will be an epsilon approximation of a matrix B if A inverse B is very close to the identity. In my case, I want to say the norm of A inverse B minus the identity is at most epsilon.
13:24
That's one way of defining it. An equivalent way is you take a look at the quadratic forms, which is what I usually do. So I want that for all real vectors X, the quadratic form in B is approximately equal to the quadratic form in A. So you take X transpose B X divided by X transpose A X, that should always be close to one.
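As a quick numerical illustration (not from the talk), the following sketch checks both of these equivalent views on a small positive definite pair A, B:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30 * np.eye(30)             # positive definite
B = A + 0.02 * np.diag(np.diag(A))        # a nearby matrix

# First view: operator norm of A^{-1} B - I.
gap = np.linalg.norm(np.linalg.solve(A, B) - np.eye(30), 2)

# Second view: the extreme values of x^T B x / x^T A x are the generalized
# eigenvalues of (B, A); they all lie within `gap` of one.
Linv = np.linalg.inv(np.linalg.cholesky(A))     # A = L L^T
ratios = np.linalg.eigvalsh(Linv @ B @ Linv.T)
print(gap, ratios.min(), ratios.max())
```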
13:43
What this gets you is that if B is a good approximation of A, then you can solve systems of linear equations in A by solving systems of linear equations in B. It turns out that if I want to figure out how to solve Ax equals a little vector b, unfortunately it's the same letter, you solve the system in B, and that gives you a good approximation.
14:01
And then you compute the residual and you do it again and you do it again. And in a few steps of solving systems in B, you can solve systems in A. So for my purposes now, I just need to be able to get a good approximation of any matrix A and be able to solve systems in that approximation quickly.
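Here is a minimal sketch (not from the talk) of that residual-correction loop; solve_B stands in for whatever fast solver we have for the approximation B:

```python
import numpy as np

def refine(A, solve_B, b, steps=20):
    """Iterative refinement: repeatedly solve in the approximation B and
    correct by the residual; converges when B approximates A well."""
    x = np.zeros_like(b)
    for _ in range(steps):
        r = b - A @ x          # residual of the current guess
        x = x + solve_B(r)     # correct using the approximate solver
    return x

# Toy usage: B is a mild perturbation of A, solved directly here.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
B = A + 0.05 * np.diag(np.diag(A))
b = rng.standard_normal(50)
x = refine(A, lambda r: np.linalg.solve(B, r), b)
assert np.linalg.norm(A @ x - b) < 1e-8 * np.linalg.norm(b)
```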
14:20
The matrices we're going to apply this to are Laplacian matrices of graphs. These are the first ones for which we can analyze a procedure like this rigorously. So here's a very brief introduction to Laplacian matrices of graphs. Let's say I have a graph. I'll define the Laplacian first by the Laplacian quadratic form.
14:40
So the quadratic form takes a function that assigns a real number to every vertex. Or, if you like, number the vertices one through n, and then you take an n-dimensional vector as input. What the Laplacian quadratic form does is it computes the sum of the squares of the differences of that function across the edges in the graph. So as you sum up over all edges,
15:01
take a look at x of a minus x of b, where a and b are the vertices at the endpoints of the edge, and square that. And if it's a weighted graph, multiply by the weight. The weights, by the way, I have to make positive. I can't deal with negative weights. That's the Laplacian quadratic form. If you want the matrix, here's what the matrix looks like. So let's say here's a graph.
15:21
For every edge in the graph, we have a non-zero off-diagonal in the matrix. If all the edges in the graph have weight one, like I do here, the off-diagonals would be minus one. So for example, I have a red edge between node two and node six that corresponds to the minus one in row two and column six, and vice versa, row six and column two.
15:43
The diagonal entries are the degrees of the nodes, the number of edges attached to them. For our purposes, really what matters is that the off-diagonals are non-positive, and the matrix is diagonally dominant, meaning each diagonal is at least as big as the sum of the absolute values of the off-diagonal entries in its row. So one of the first places we encounter these, just to remind you,
16:01
was in a physics class at some point, when you studied resistor networks, probably. So if you take a look at a network of resistors, and you do something like attach two of the nodes to the terminals of a battery, what physics tells you is that the induced voltages will minimize the Laplacian quadratic form subject to the boundary constraints.
16:24
So actually they will minimize that Laplacian quadratic form up there, where each edge corresponds to a resistor; those are the induced voltages. And the reason I've written the resistance inverted is that I think of the weight of an edge as one over the resistance. Zero weight for me is no edge.
16:40
No edge is infinite resistance. Very, very low resistance is what you get when you have a very strong connection and a very high weight edge. Okay, I can speak for hours about Laplacian matrices. For now, let me just mention they come up in a lot of places. I teach a course in spectral graph theory. That's mostly studying eigenvalues and eigenvectors of the Laplacian matrix.
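To make the definition concrete, here is a small sketch (not from the talk) that builds the Laplacian of a weighted graph edge by edge and checks it against the quadratic form described above:

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian: weighted degree on the diagonal, minus the edge
    weight on each off-diagonal. Weights must be positive."""
    L = np.zeros((n, n))
    for a, b, w in edges:
        L[a, a] += w
        L[b, b] += w
        L[a, b] -= w
        L[b, a] -= w
    return L

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.0), (2, 3, 3.0)]
L = laplacian(4, edges)

# x^T L x equals the sum over edges of w * (x[a] - x[b])^2.
x = np.array([0.5, -1.0, 2.0, 0.0])
quad = sum(w * (x[a] - x[b]) ** 2 for a, b, w in edges)
assert np.isclose(x @ L @ x, quad)
```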
17:02
They come up, of course, in scientific computing, and the resistor networks are just one example. They come up a bunch in machine learning and in optimization, which I will mention later. If you want to maximize or minimize a function over a graph, and you're doing it by something like Newton's method, your Hessian is usually a Laplacian matrix,
17:22
or a very small change to it. So they come up a lot in optimization. Okay, there's been a lot of work on solving systems of equations in Laplacians. One of the end results of this is you can do it very, very quickly. So the fastest algorithm we know right now, by Cohen, Kyng, Pachocki, Peng, and Rao,
17:41
solves systems of linear equations in Laplacian matrices in time, I'll break this down, order m, where m is the number of non-zeros in the matrix, times the square root of log n, where n is the dimension, times a log of one over epsilon, where epsilon is the accuracy parameter. Okay, let's not worry about what I mean by approximately solving, but a log of anything is small.
18:01
So log accuracy is good. The key point here is this is less time than it would take you to sort the non-zero entries in the off diagonals of the matrix. That would be order n log n, right? This is m root log n. So this is incredibly fast asymptotically.
18:20
It is not yet a perfect theorem because we can't really implement this algorithm. It's pretty complicated, and the terms hidden in the big O are big. Well, actually, we could implement it, but it would be slower than algorithms that look asymptotically worse. The jumping-off point for this talk is a result by Yin Tat Lee, Richard Peng, and myself,
18:42
that tells us that Laplacian matrices do have very sparse approximate factorizations. By the way, the first result doesn't have anything to do with elimination. We know that every Laplacian matrix can be written in the form, sorry, approximated by a matrix of the form u transpose u, where u only has order n non-zero entries,
19:00
where n is the dimension. So we know that very sparse approximate factorizations exist, and the question is, can you get them? So Kyng and Sachdeva give us a way of getting something almost that good. So this is how they do it. I'll give you the brief story of approximate elimination for Laplacian matrices.
19:23
So the first thing you should know about Laplacians is they are closed under these elimination operations. So if I start out with a Laplacian matrix and I subtract off the rank one matrix that agrees on the first row and column, I get another Laplacian. That's very useful. And we can understand what this does to the graph.
19:40
When I eliminate the first row and column, that corresponds to removing, say, node one from this graph. I will get rid of that node and the edges attached to it, and I get a new graph where I now have added a clique on the neighbors of node one. It's a weighted clique, but we can figure out exactly what it is, and it has a very nice description. What Kyng and Sachdeva suggest doing
20:01
is when we eliminate a node, instead of adding that clique, we'll add a sparse approximation of that clique, and we know how to make one. And actually, I'll tell you exactly how they make one. The most important thing though is they can make it without ever building the clique itself. Building the clique would take too long or could take too long. And this is the first step towards the elimination.
20:21
So what they do is, if they eliminate a node, let's say it had degree D, in this case degree five, they will make a graph on its neighbors with D edges. Every one of its neighbors will get one edge that it picks, and it picks that edge by choosing another neighbor
20:42
with probability proportional to the weight of its edge to node one. At least, it's a very short description. You can look up the details later, but I promise you that does it. It's a very, very simple algorithm. You can implement this algorithm.
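Here is a rough sketch of that one elimination step (not from the talk; the new-edge weights below are a simplified, hypothetical choice, while the actual Kyng–Sachdeva scheme picks them so that the expectation matches the eliminated clique exactly):

```python
import random

def eliminate_approximately(graph, v, rng=random):
    """Eliminate vertex v from a weighted graph (dict of dicts), replacing
    the clique on v's neighbors by a sparse sample with deg(v) edges: each
    neighbor a gets one edge to a second neighbor b, chosen with probability
    proportional to the weight of b's edge to v."""
    nbrs = list(graph[v].keys())
    weights = [graph[v][b] for b in nbrs]
    total = sum(weights)
    for a in nbrs:
        b = rng.choices(nbrs, weights=weights, k=1)[0]
        if b != a:
            # Hypothetical weight; the real scheme scales it so the expected
            # Laplacian equals the one that exact elimination produces.
            w_new = graph[v][a] * graph[v][b] / total
            graph[a][b] = graph[a].get(b, 0.0) + w_new
            graph[b][a] = graph[b].get(a, 0.0) + w_new
    for a in nbrs:                 # finally remove v and its edges
        del graph[a][v]
    del graph[v]

# Toy usage: eliminate the center of a small weighted graph.
g = {0: {1: 1.0, 2: 2.0, 3: 1.0},
     1: {0: 1.0},
     2: {0: 2.0, 3: 1.0},
     3: {0: 1.0, 2: 1.0}}
eliminate_approximately(g, 0)
```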
21:02
There are two caveats to making the theory work. By the way, I just told you what one elimination looks like. You just keep doing that, and you've eliminated everything. Okay, so what do you need to make the theory work? First, you need to actually begin by subdividing every edge. This doesn't necessarily work if you start with the original graph. We take every edge and we make order log squared copies of it.
21:21
Then the algorithm works, but it means you've blown up the number of edges by order log squared, and you've decreased their weight. It also needs a random permutation of the vertices to begin. Then we're all done. This is time order m log cubed n. Now, m log cubed n is not asymptotically as good as the algorithms I've shown you before, but the constant in the big O is really small,
21:41
and this algorithm we've implemented, it's pretty good. It's not the fastest, but it's okay. I should tell you just one word about how you analyze this. You're looking at a random process as they do this. It's not hard to show that the expectation is correct.
22:02
It's very easy to show that the expectation of u transpose u is equal to Laplacian. If you write this by looking at Gaussian elimination as the sum of random matrices the correct way, well, they're not independent random matrices like you'd like. It's a matrix martingale. Each step depends on the previous, but there have been a lot of recent advances
22:22
in random matrix theory, and some of them are summarized in a beautiful paper by Tropp in which he includes bounds for matrix martingales. And you can use what's called his matrix Freedman inequality to analyze this process and prove that it gives you a good approximation. And this is where the random permutation and the copying of edges came from.
22:41
And it turns out, by the way, that both of those, which were necessary for the theory, are also necessary in practice, meaning there are fairly simple graphs that break without either of those. Okay, so now let me tell you how we will improve this algorithm just a little bit in practice. I don't have a theory yet for exactly why this improvement works;
23:03
if you can get that, then you've got the perfect result that I'm looking for. So here's what I will do a little differently. Instead of thinking about Gaussian elimination as eliminating a row and column all at once, let's do it more slowly, the way we first learn it. The first thing you learn to do is row operations. Let's say I want to eliminate that non-zero,
23:22
that minus two. That's the purple edge between node one and node two. I do it by taking a multiple of the first row and adding it to the second, and that will eliminate that non-zero, and then it adds in some more non-zeros. They're further down. Those are those purple edges between node two and node three, just so you see those weren't there before.
23:40
Okay, that does the elimination, but I want to keep things symmetric. So I'm going to do the same operation on the column, and now I've gotten rid of that non-zero and I have a symmetric matrix, but it's got two new edges. I'm now going to do something that seems unnecessary, but it helps. I'm going to scale the first row and column. In this case, I multiply them both by six sevenths.
24:02
That keeps this matrix a Laplacian matrix. It's going to be very important for me to keep everything a Laplacian, at least for the analysis. After that rescaling, here's our result, and here's where we do something approximate. I've added two new edges there. We're only going to keep one of them. How do I choose it?
24:21
I choose it with probability proportional to its weight. So with probability seven twelfths, I choose the edge of weight one, and I round up its weight and I get rid of the other, or with probability five twelfths, I choose the other edge. So of the edges we've added, we only keep one with probability proportional to its weight.
24:42
So here's what we've done. We started out with the matrix on the left. I've just shown you part of it. I've eliminated the edge between node one and two, and in the process created one new edge and changed the weights on some others. This is the algorithm. You just keep doing this. This works very well.
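A tiny sketch of that sampling choice (not from the talk's slides; giving the kept edge the combined weight is an assumed, natural choice because it preserves the Laplacian in expectation):

```python
import random

def keep_one(w1, w2, rng=random):
    """Of the two new edges, keep one with probability proportional to its
    weight, and give the kept edge the combined weight so that the expected
    contribution of each edge is unchanged."""
    total = w1 + w2
    kept = 1 if rng.random() < w1 / total else 2
    return kept, total

# E.g. with weights 7 and 5, edge 1 is kept with probability 7/12.
print(keep_one(7.0, 5.0))
```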
25:00
Now, okay, I'd like to thump my chest and say this is the best algorithm there is. Life is never so clean. I mean, when I teach undergraduate algorithms, I teach students an algorithm, and then I teach them an algorithm that is always better. That usually doesn't happen. How an algorithm performs depends on your input instances. So I'll give you a comparison of a few algorithms.
25:23
So ours here is the approximate elimination. I'll also mention an algorithm called combinatorial multigrid, due to Ioannis Koutis. It doesn't have an analysis. It appears in a thesis with analyses of other algorithms, but this one doesn't. And Lean Algebraic Multigrid (LAMG), which in this case also doesn't have an analysis, but is very useful in a lot of cases.
25:43
Okay, so what do we run these on? Well, the first thing a lot of people try is to say, let's take a look at a grid graph, a thousand by a thousand grid. In this case, combinatorial multigrid is really fast. It solves the system in three seconds. Ours is around six, and LAMG is around 15 seconds. So at this point you might think,
26:01
okay, I'm going to use CMG. But you don't always care about 2D grids. I don't know, maybe a 3D grid, a hundred by a hundred by a hundred. Okay, the running times are going to be of similar orders. Not much has changed yet. But there are other systems to consider. What about a random graph? Well, if I run on a random graph,
26:21
things are fairly different. The algorithms take a little longer. Now CMG and LAMG both beat our approximation algorithm, but I don't feel so bad because there's another algorithm called the conjugate gradient, which is 10 times faster on these systems. The conjugate gradient is a classic. And if you want to solve random systems of equations or systems of equations in random graphs,
26:40
that's what you use. How about a preferential attachment graph? That's the sort of thing that comes up a lot in social networks. Well, now if you look at preferential attachment, LAMG is the fastest of these algorithms. But incomplete Cholesky, one of the long-standing great heuristics in numerical linear algebra,
27:01
is the fastest I know on preferential attachment graphs. So that wins here. Okay, so you're beginning to get the idea it depends on your graph. So how am I going to compare these? Well, one, I have a generator that makes difficult graphs. I call it chimera. It combines different graphs together to make sort of monsters and I use it to break a lot of algorithms.
27:20
So here's just one run that was particularly nice for us. On this chimera, our algorithm took 28 seconds. CMG took 10 times longer and Lean Algebraic Multigrid took three times longer. Okay, but that's just a generator. I don't really want to solve those. The real reason I care about solving systems
27:40
of Laplacian equations is they come up in a lot of applications. I'm using it as a subroutine. So what happens when I start using it as a subroutine? Well, one group and I recently have been implementing it in algorithms for the minimum cost flow problem. They use interior point methods. They call Newton's method. They use Laplacian solvers as a subroutine. Here are just two run times that you get out of this.
28:04
Our algorithm on both calls took around six seconds. On one of them, Lean Algebraic Multigrid took a long time, and on another they took much longer than ours. Okay, so here I get to thump my chest and say, yes, on these particular minimum cost flow problems we were doing very well. Let me tell you what I take away in summary from this.
28:23
This algorithm that I've described to you is incredibly consistent. It consistently solves systems. I was running it to get eight digits of accuracy. The time it takes works out to about 300,000 to 500,000 non-zero entries per second.
28:40
It never deviates from that too much. These other algorithms are sometimes faster on some inputs, but sometimes they take 10 times longer. Sometimes I can't even wait and I hit control C. And actually, you really got to do that when you're comparing things. This algorithm is very reliable when you want to use it as a subroutine
29:01
and cannot tolerate very inconsistent run times. But I will say the matrix martingale techniques we use to analyze the algorithm of Kyng and Sachdeva break down when you're trying to analyze this particular algorithm, which is very similar to theirs. It has a few changes, so we don't know how to analyze it yet. But it's very reasonable to assume that you could.
29:23
Okay, I will stop there with a few pointers to where you could learn more about this. So this algorithm is implemented in a package called Laplacians.jl written in Julia. If you want to do experiments with Laplacians, there's a lot of related code there. On my web page, I have a link titled something like Laplacian linear equations,
29:41
sparsification, and so on, where you can find references to papers on these topics. Some of the latest developments have been very exciting developments about applying ideas like this to other families of linear equations. I'm most excited about work, most of it coming out of MIT, on solving systems in directed Laplacian equations, which lets you analyze Markov chains and things
30:01
in a really revolutionary way. Some of this I teach, so I have class notes for my classes on spectral graph theory and graphs and networks. So if you want to learn a lot of how the theorems are proved related to this material, there are students, or sorry, people who were students who wrote great theses
30:20
in these areas. I recommend Richard Peng, Aaron Sidford, Yin Tat Lee, and most recently, Rasmus Kyng. And finally, there's a wonderful monograph about Laplacian matrices and solving Laplacian equations called Lx = b by Nisheeth Vishnoi. Thank you.