
LIBRSB: Universal Sparse BLAS Library


Formal Metadata

Title
LIBRSB: Universal Sparse BLAS Library
Subtitle
A highly interoperable Library for Sparse Basic Linear Algebra Subroutines and more for Multicore CPUs
Title of Series
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Many science and engineering models reduce to problems involving huge sparse matrices -- matrices where most of the values are zeroes. Such computations are resource-intensive (time, memory, energy), and much research was devoted into data structures ("formats") and algorithms leading to fast sparse matrix operations. Yet, most such formats are highly specialized and seldom make it into a solid software package apt for general use. The RSB (Recursive Sparse Blocks) data structure is a format that addresses performance concerns for current shared-memory multicore CPUs, while also avoiding dead ends in terms of usability. The LIBRSB library implements RSB with all the necessary operations to manipulate sparse matrices in most computations, in the most popular programming languages, and on many hardware platforms. This talk will give an overview of the concepts behind LIBRSB and its main usage modes. Intended audience: Developers of Linear Systems Solvers based on Iterative Methods, or General Computing Packages. Expected prior knowledge: Familiarity in any of C, C++, Fortran, Python, GNU Octave.
Transcript: English (auto-generated)
All right, we're going to start the next talk. If you're going to stay in the room, please take a seat. If you want to leave, please leave.
OK, so the next speaker is Michele, who's going to talk about a universal sparse BLAS library.
So, yes. At the core of many technical or scientific computing problems, we end up reducing the problem to solving a system of linear equations.
If the system of linear equations were a simple one, like a two-by-two one, the methods for solving it would be pretty simple. And in any case, it would involve representing the linear system by the data structure of a matrix.
So a table of symbols, or usually numbers, and a few vectors of numbers. The matrix is the basic structure of scientific computing. In the case of such toy problems or school problems,
we have exact, direct solutions at our disposal, which work fine. However, once we go into problems involving the simulation of larger domains, so engineering problems, the linear systems to be solved become large.
And the methods that we use for smaller systems are not applicable here anymore, because the numerical stability we have with, let's say, toy problems or small problems is not there anymore.
With those methods the results simply diverge, and the time to solution also grows prohibitively with the problem size. So they're simply infeasible and don't make sense. We use completely different techniques
for large linear systems. Furthermore, if the systems are not only large, but the matrices are also full of zeros, what do we call such systems, and what are we dealing with here?
We are dealing, perhaps, with sparse systems or sparse problems; this is what we call them. So in this acoustics matrix, or matrix coming from acoustics, we observe that, on average, less than half a percent of each row
consists of non-zero elements. So we would call this a sparse system, perhaps, the system coming from this matrix. Indeed, usually we are happy with the definition of Jim Wilkinson, where we say a problem or a matrix is sparse
if, with our technology or our technique, we can make use of the amount of zeros there to our advantage. So this is the definition. It's not about numbers; it's really about what we are able to do with the way
the matrix looks like. So among the different matrices we can encounter, we could have matrices from a circuit simulation, which look like this and have such clustered elements in them. Sometimes the elements are clustered more
around the diagonal, like in this quantum chromodynamics matrix, I think. Computational fluid dynamics matrices are a bit more regular, you could say. So it means that you can exploit all of those matrices, perhaps, in different ways. This is why I'm showing you this gallery.
This is another CFD, so computational fluid dynamics, matrix, a structural matrix, another material-problem matrix, structural, and so on. This is also CFD. So this was just to tell you that sparsity really
is related to the technology we use to deal with it. So usually we are happy using iterative methods with sparse systems. Iterative methods, because something is being iterated; so there is a loop. And with the most common methods, Krylov methods,
the loop usually has a bottleneck, a core operation, which is predominantly multiplying the system matrix by one vector or many vectors. It depends a bit on the technique; there are several of them.
Here I'm showing a GNU Octave implementation of one such iterative method. So there are two kernel operations, or main operations: the multiplication of the matrix by many vectors, or let's say another dense matrix, and the triangular solve, so the solving with matrices
which are called preconditioner matrices, but are sparse. And these are the core operations which we are interested in. And I want to mention that those operations, the sparse matrix by vector or by multi-vector multiplication, can have many variants.
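In formulas, and roughly following the Sparse BLAS naming that comes up later in this talk, these kernels are:

    y \leftarrow \alpha\,\mathrm{op}(A)\,x + y          (usmv: sparse matrix times vector)
    C \leftarrow \alpha\,\mathrm{op}(A)\,B + C          (usmm: sparse matrix times dense multi-vector)
    x \leftarrow \alpha\,\mathrm{op}(T)^{-1}\,x         (ussv: sparse triangular solve with a preconditioner factor T)

where op(A) is A, its transpose, or its conjugate transpose.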
The variants can be on the matrix, which could perhaps be complex, and Hermitian and/or symmetric. It doesn't always have to be square; it could be rectangular. And perhaps it already has a unit diagonal, and we can exploit this; this is what I mean here.
Many things change if the matrix has a different numerical type, for example complex numbers, or long complex numbers like the speaker before me talked about. That changes the balance in the performance profile here. And other things might change.
And all of this has an influence on the specific kernels. And if you think, as Ludovic has spoken about, of the different variants that one might want to build for different architectures, you see that you end up with code bloat if you really want to optimize each subcase. Also, the operands have their own variants,
so in the way the data are laid out in the dense matrix. Similarly for the triangular solve operation: there, too, you have different variants, which leads to a multitude of different kernels,
or ways in which you may wish to write them. So this led a committee of people at the end of the 90s, mostly from the US, but also with a delegation from Europe, to meet and standardize an API which they called Sparse BLAS,
Sparse Basic Linear Algebra Subroutines, to somehow give an API to the different variations of the operations that I spoke about. It's not like the full BLAS, if you know the dense BLAS; it's mostly about creating sparse matrices, destroying them, and doing a few operations,
not only those ones, but these are really the core operations. And they talked about C and Fortran, because the final document they produced is twenty-something years old. Now, after 20 years, we could say that what they wrote is, especially in my opinion, perfectly portable,
and allows some parallelization, even if that is not specified at all. They didn't foresee extensions, but they are possible: if you look at the API, you see that you can have extensions, so they're not blocked somehow. The name Sparse BLAS
has been copied by every major vendor you can imagine. The sad thing is that each major vendor has violated the API, changing something in a slightly incompatible way, which is simply sad. And the original Sparse BLAS didn't think about GPUs, but actually, in my experience,
looking at how people program, I see so much technical debt that I think you can make compromises, and with small adaptations you could adapt the Sparse BLAS to GPU computing to some extent. So I think you can save this API to a good extent.
And this is the reason why I'm here. I've written a library which respects the original Sparse BLAS. A Sparse BLAS program can look like this, where you have a notation for the Sparse BLAS operations which is logical if you know BLAS a bit, so you can understand it.
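As a rough sketch of what such a program looks like in C (routine names as in the standard Sparse BLAS C binding, which librsb ships as blas_sparse.h; error checking is omitted and the exact prototypes should be verified against that header):

    #include <blas_sparse.h>  /* Sparse BLAS C binding (shipped with librsb) */

    int main(void)
    {
        /* a 2 x 2 matrix with two non-zero entries, in coordinate (COO) form */
        const int IA[] = { 0, 1 }, JA[] = { 0, 1 };
        const double VA[] = { 4.0, 4.0 };
        double X[] = { 1.0, 2.0 }, Y[] = { 0.0, 0.0 };

        blas_sparse_matrix A = BLAS_duscr_begin(2, 2); /* begin a double-precision matrix */
        BLAS_duscr_insert_entries(A, 2, VA, IA, JA);   /* insert the non-zeros            */
        BLAS_duscr_end(A);                             /* finish the assembly             */

        /* y <- 1.0 * A * x + y : the core iterative-solver kernel (usmv) */
        BLAS_dusmv(blas_no_trans, 1.0, A, X, 1, Y, 1);

        BLAS_usds(A);                                  /* release the matrix              */
        return 0;
    }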
Going in the direction of my library: it's centered around a data format, a sparse matrix format, which I came up with. It's called Recursive Sparse Blocks, because there is a recursive subdivision into blocks which are sparse.
And the reason, the motivation, for this data structure is to not exclude the Sparse BLAS operations. So I have made compromises in order to allow the Sparse BLAS operations to be there; I didn't want to preclude these operations. So it's a compromise. And the core idea here is to have,
let's say, cache-sized blocks, more or less, and a way to give each core of a multicore CPU something to work with. So it's oriented towards multicore; it's not for GPUs, at least not at the moment. So for the matrices which you have seen before,
with this data format the data structure looks a bit like this. The color is based on the population, on how many non-zeros are there. Then there is another color coding with other information. But this is just to tell you that the irregular aspect of those matrices
is reflected also here, to some extent. So the library itself wants to provide the Sparse BLAS, so building blocks for iterative solvers, and it is a pretty compatible library.
It works with C, C++, Fortran, Octave, and Python. I say it's quite compatible: it uses, let's say, established technologies, and it's also quite compatible with your software, in the sense that it doesn't require you to adopt custom data structures. The only data structure which is custom is the matrix; you don't need extra data structures for vectors.
And the program you saw before, written against the Sparse BLAS, needs just one extra init and one finalize function when using librsb. I really respect that API. However, it's also nice to write the fifteenth standard,
or, I'm joking, this is not a fifteenth standard, but just a native API. So if you want, you can exploit the internal API of librsb, or not internal, rather the native one. (Please tell me when I'm at ten minutes.) It is primarily in C.
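A minimal sketch of that native API, with names as in rsb.h (I am reproducing the commonly documented calls from memory, so verify the exact argument lists against the header or the Doxygen documentation):

    #include <rsb.h>  /* native librsb C API */

    int main(void)
    {
        rsb_err_t errval = RSB_ERR_NO_ERROR;

        /* the "one extra init and finalize" pair mentioned above */
        rsb_lib_init(RSB_NULL_INIT_OPTIONS);

        /* the same 2 x 2 example, given as COO triplets */
        const rsb_coo_idx_t IA[] = { 0, 1 }, JA[] = { 0, 1 };
        const double VA[] = { 4.0, 4.0 };
        const double alpha = 1.0, beta = 1.0;
        double X[] = { 1.0, 2.0 }, Y[] = { 0.0, 0.0 };

        struct rsb_mtx_t *mtxAp = rsb_mtx_alloc_from_coo_const(
            VA, IA, JA, 2,             /* values, row and column indices, nnz      */
            RSB_NUMERICAL_TYPE_DOUBLE, /* numerical type                           */
            2, 2,                      /* rows, columns                            */
            0, 0,                      /* blocking parameters, left to the library */
            RSB_FLAG_NOFLAGS, &errval);

        /* Y <- alpha * A * X + beta * Y */
        rsb_spmv(RSB_TRANSPOSITION_N, &alpha, mtxAp, X, 1, &beta, Y, 1);

        rsb_mtx_free(mtxAp);
        rsb_lib_exit(RSB_NULL_EXIT_OPTIONS);
        return 0;
    }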
Then you have wrappers in C++, and there is also the Fortran one. These are the native APIs. And what is specific about RSB is that it's not so clear which blocking is best,
because, yeah, depending on how you block, you can get better or worse performance. And for this reason, there is the idea of using automated empirical optimization in this library. There is a special call, a function which you call,
where you invest time to ask the library to optimize the data structure a bit. So you sacrifice a minute, perhaps, for optimizing the data structure, and you do this in the hope that the many hours
during which you will be using this matrix afterwards will profit, will be decreased, thanks to the optimization. Because, as I said, this is meant to be used for iterative methods, so you will be running this for many hours, and therefore spending a few minutes in automated optimization should pay off.
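In the Sparse BLAS interface this can be requested as an extension property on the matrix handle; the property name below, blas_rsb_autotune_next_operation, is taken from librsb's documented extensions, so treat it as an assumption and confirm it in blas_sparse.h. In the native API, the corresponding entry points are rsb_tune_spmv() and rsb_tune_spmm(), whose longer argument lists are documented in rsb.h.

    /* sketch, continuing from the Sparse BLAS example above:             */
    /* request autotuning, so the next multiply also optimizes matrix A   */
    BLAS_ussp(A, blas_rsb_autotune_next_operation);
    BLAS_dusmv(blas_no_trans, 1.0, A, X, 1, Y, 1);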
There is no guarantee, but that's the idea, and that is usually how it goes. To give an idea, the C++ API is what you would expect: there is a class templated on the numerical type, so there is type safety here. You say: this is my matrix,
these are my non-zeros, because this is what we are representing here. We have flags, C-style flags, for options like symmetry, or for asking to discard zeros rather than keeping them, because sometimes you want to keep structural zeros,
for modifying them later, for instance. So you have many such options here, and this is why I'm showing this slide: to tell you that there are many options which I'm not showing you here. And the only data structure here is the RSB matrix, no other custom stuff. And you can exploit, you can enjoy, the span interface
of C++20, so it doesn't force you to have any weird custom vector type apart from the standard C++ ones. If you want to use, for instance, GNU Octave and enjoy the multicore speedup from librsb, you can use the sparsersb plugin, which I wrote,
which uses librsb pretty efficiently. So, apart from a few conversions, it should have almost native performance. Similarly for Python: the PyRSB package,
a standalone package, has an interface which is copied from the csr_matrix one. So you use it mostly in the same way, but underneath, librsb runs; you don't see it.
Or you see it if you ask it to use the auto-tuning routine, because, as I said, in all of those language implementations you can also use all of the functionality of librsb, which, I think, includes the auto-tuning also here in Octave. And I want to stress this:
GNU Octave doesn't have multi-threaded sparse operations; with librsb, you can have them. Same for scipy.sparse: as far as I know, it's not multi-threaded; with librsb, you get that. Librsb is, by default, licensed as the Lesser GPL 3,
which means that, as long as you don't modify it, you can distribute it with your proprietary code. If you modify it, well, it's more complicated: you have to release the modified version. As for the librsb library, if you want to learn to use it,
it makes absolute sense to use a packaged version from Debian, Ubuntu, or most other Linux distributions. Or, if you use Windows, you can use Cygwin. Then, once you want the performance,
I mean, you can just compile it by yourself, because it's quite trivial. Or enjoy what our colleagues here from Spack and EasyBuild have done and use the packaged versions from those distributions. And some people have written wrappers for Rust and Julia.
I don't know those languages, so I didn't use them. I think the Rust one covers, like, the entire API. I think the Julia one is more in Julia style, so just the core functionality is there, I think. Yeah, that was everything.
I don't know how much time I took. 15 minutes. So, thanks.