Concurrent data structures: CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure
Formal metadata
Title: CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure
Series: Concurrent data structures
Number of parts: 30
License: CC Attribution 4.0 International: You may use, change and reproduce the work or its contents in changed or unchanged form for any legal purpose, distribute it and make it publicly available, provided you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/52867 (DOI)
Language: English
Transcript: English (auto-generated)
00:00
My name is Soukaina Firmli and I'm a PhD student at the Mohammadia School of Engineers. This work is in collaboration with Oracle Labs and it's called CSR++, a fast, scalable, update-friendly graph data structure. So this work has been published as a conference paper and accepted at the 24th International Conference
00:21
on Principles of Distributed Systems (OPODIS), and it's mainly centered on designing and implementing an efficient data structure to store graphs and to enable mutations, meaning updates on graphs.
00:41
Okay, so to start with, I'm gonna give a bit of context on graph analytics and systems. So initially, graphs allow us to model real-world data as entities called vertices, for example, Ali, Mark and Jane in this example,
01:02
and relationships between them called edges, to gain new insights that are not seen when using classic data models such as relational models. So graph systems are actually aimed
01:22
to process really large graph data, and they rely on efficient data structures to support fast read-only workloads, meaning analytic workloads such as PageRank, weakly connected components and other algorithms. They also rely on efficient data structures
01:43
to allow fast mutability, meaning that we need to be able to update graphs in a timely manner, in a fast way, but they also need to allow for low memory consumption, a low memory footprint, since we're talking about large-scale big-data graphs
02:01
that can hold billions of edges and need to be stored efficiently in memory. As proof that graphs are becoming very important in today's technology trends,
02:20
we see that Gartner ranked graphs as the number five trend in their top 10 data and analytics technology trends for 2019. So to give more details about the work that we've been carrying out, I'm gonna talk about graph data structures and then mutations.
02:41
So graph representations aim to store vertices and edges in an efficient way. As classic examples of those data structures, we find the adjacency list and then CSR. The main gain from implementing a graph as an adjacency list is its high mutability performance.
03:06
CSR, on the other hand, is a compact way to store graphs. It's the fastest read-only data structure there is. It has really high analytic performance, since we only store
03:23
compact arrays of vertices and edges. It has a really low memory footprint. So when we're talking about graph updates, we're talking about insertions and then deletions of vertices and edges. There are multiple techniques, but the main ones are actually listed here.
03:42
So we have the in-place updates, meaning that we don't have to perform copies of the whole graph. Here we have, for example, the Boost Graph Library, which we're gonna evaluate later; we're gonna show some evaluation numbers on that. Then batching, meaning that we actually collect
04:01
a number of updates, collect a number of edge insertions, for example, into batches and then perform them at once to allow for optimal update operations. And then we have this technique called snapshotting or creating deltas that is mainly useful
04:23
when we have data structures that are immutable like CSR. Whenever we have to perform analytics and perform scans, we will have to scan both the original data structure, which is for example, CSR, and then we'll have to scan the delta as well.
04:42
So that is the trade-off, and that lowers the analytic performance. This naturally leads to asking the main question, which is how to enable fast in-place updates while maintaining high analytic performance and a low memory footprint. The answer that we developed is CSR++,
05:06
which is a fast, scalable, update-friendly graph data structure. And we're gonna talk in more detail about the design ideas and solutions to answer this main question.
05:20
So first of all, we're gonna talk about the design, meaning the vertex edge data structures in CSR++. We're gonna talk more about the properties since we're mainly addressing property graphs in this case. And then we're gonna talk about the update protocol.
05:42
And next we're gonna give a breakdown of the performance analysis of CSR++, and then we're gonna close with a small conclusion. So initially, since we talked about the graph data structures, we gave two main examples,
06:03
the adjacency list and then CSR. So the adjacency list allows for per-vertex flexible edge updates, meaning that if we want to insert an edge, for example, we only have to access the edge array
06:23
and then perform updates on it. Whereas in CSR, we'll have to copy the whole edge array, because the edges are stored in a compact, contiguous manner. So on one side, CSR is actually very fast,
06:40
since we have a compact format with fast array-indexed access, so it's more cache-friendly and it allows for better locality. But the adjacency list allows for faster updates
07:02
and faster edge array handling in case of mutation. So the idea here is to combine the array contiguity of CSR, meaning that we design CSR++ as compact arrays of vertices, for example,
07:22
with the flexibility of updates of adjacency list. So to present a high-level design of CSR++, we can give the details on how we store the vertices and how we store the edges.
07:42
So the vertices are stored in segments, meaning the graph is actually a list of segments. Each segment stores a fixed number of vertices. And then this way, we have a cache-friendly way to traverse these vertices.
08:02
And then we have this optimization, which is low-degree inlining, meaning that unlike the adjacency list, where every vertex has to have a pointer to its neighbor list, in CSR++ we have this optimization
08:21
where, in case a vertex has a single neighbor, we actually embed that edge in the vertex array. That way, we can store the actual edge instead of the pointer, which allows for memory optimization. And the edges themselves are actually arrays.
08:42
They are expandable arrays, meaning that when we want to insert a new edge into an array of edges and we don't have space, we can actually grow that array: we can allocate double the original size. And that allows for really fast in-place updates.
09:01
And then, in a multi-threaded context, we mentioned that CSR++ is actually a concurrent data structure. So in case we have multiple writes, we have a synchronization mechanism
09:22
that's based on locks. Those locks actually protect each segment from concurrent writes and allow for synchronized updates. It is also worth mentioning that we keep the edges sorted, meaning we keep the semi-sorted property of the graph
09:42
for better cache usage. It actually gives much better performance when we're applying analytic read-only workloads. So basically, we talked about how CSR++
10:01
combines ideas from the adjacency list and CSR to allow for fast analytics as well as fast in-place updates. And when we're talking about CSR, we're talking about compact ways to store entities. So here, in CSR++,
10:22
we have compact vertex arrays, as well as compact edge arrays when we're loading the graph, meaning that we have a smart loading by implementing a smart allocation protocol that could allow us to allocate the edges initially
10:42
in a contiguous manner. So here we're gonna talk about the details of how segments are designed. Basically, the graph in CSR++ is represented as chunks called segments
11:01
holding a fixed number of vertices. So here we have an example of a segment. Each segment actually has a lock, and that lock enables synchronized writes. So the segment is basically composed of a lock,
11:22
a list of vertices, which is here, like we have four vertices, for example, and then we have a vector of pointers to vertex properties. We're gonna talk about vertex properties later, but here we're talking about topology, meaning like the vertex and the edge structures. So basically, each vertex here
11:42
holds three pieces of data. We have the length, which stands for the degree of the vertex, the number of neighbors, and then we have a pointer to the neighbor list. So here we have a union structure,
12:01
meaning that if the length is greater than one, then we're gonna have a pointer to the edge list. But if the length is equal to one, as I said before, we're gonna inline the edge: we're gonna embed it in the segment structure
12:21
and store it there instead of the pointer. That allows for really good memory consumption. As for the edges, how they're represented: they actually hold three fields. First of all, the deleted flag, and we're gonna talk about deletions later
12:41
in the update protocol; then the vertex ID, meaning the index of the vertex in the segment; and then the segment ID. Since the graph is represented as a list of segments, we need to actually locate in which segment the edge is.
13:04
So we're going to talk in this slide about how CSR++ stores properties. So we need to store basically vertex properties and then edge properties. So in this example, we have, for example, two segments, and then the way we store
13:20
vertex properties in CSR++ is in arrays that are parallel to the segments. So as I mentioned before, each segment holds a vector of pointers, and each pointer actually points to an array for vertex properties. So for example, if we have the index of a vertex,
13:41
we can deduce the index of the property value of that vertex in a given vertex property array. So this way, this actually allows for fast index accesses to the vertex properties. So for edge properties, actually,
14:00
we have the same segmentation approach as for edges. That is to say, for each edge array, we have a parallel edge property array, which stores the edge property values in an aligned manner. So for example, we have two edge properties
14:23
in the example here below. All the edge property values for the two properties are actually stored in the same array, but we have this logical formula to access the edge property value for a given edge
14:41
and for a given index. So this actually allows for fast per-vertex updates, meaning that if we want, for example, to update an edge, we don't have to copy the whole edge property array of the whole segment,
15:04
of all the vertices of the same segment; we can update only a certain, specific edge property array separately. That also allows for easy relocation
15:21
of the property arrays, and then easier reordering. Next, now that we've talked about the design of the topology, meaning how we store vertices and edges and then properties,
15:41
we're gonna talk about the update protocol of CSR++. So the update actually, the updates are, can be defined as three main update protocols, the vertex insertion, the edge insertions, and then the deletions for both vertices and edges.
16:03
So for vertex insertion, we're gonna start with this one. So this slide presents how the vertex insertion is implemented. So to be able to insert a new vertex, we always check for the last segment. So usually, after we load, we populate the segments,
16:26
we have either some space left in the last segment or we don't. In case we do, we're gonna actually make a vertex valid.
16:40
So initially, the length, the degree of the vertex, is actually minus one. That's how we know that the vertex is invalid. So we're gonna check the last segment and see if we have enough space. And then if we do, we're gonna find the first vertex,
17:00
first invalid vertex, and then make it valid by incrementing the degree of that vertex. The other case is if the last segment is full: then we actually allocate a new segment, and allocate new vertex property arrays along with it. And then the way we actually keep track
17:23
of those newly allocated segments is that we have this indirection layer here. This indirection layer is nothing but an array of pointers to those segments. And if we want to allocate a new segment, we only have to expand this array
17:42
by doing a copy-on-write. And this is usually very fast. So the only cost here, as I mentioned, is the copy-on-write of the segment pointer array, what we call the indirection layer. So next is the edge insertion update protocol.
18:04
So as I mentioned before, in CSR++ we implement growable, expandable arrays, meaning that when we want to insert a new edge and we don't have enough space in an array, we actually allocate a new array,
18:24
doubling the size of the original, and then perform a copy-on-write. We use the reallocation method in C++, but this can actually be implemented in a smarter way, allowing for faster copy-on-write,
18:43
faster relocation. So the thing here is that, for example, here we have four edges. In order to insert a new edge, we actually double the edge array, and then we keep extra space for new incoming edges.
19:03
So basically, CSR++ is actually a concurrent data structure. So we said that we protect each segment with locks. So in our case, as we were gonna mention later, in evaluation section, we actually evaluate CSR++ in a multithreaded context
19:22
and we allow for concurrent writes into the edge arrays. To be able to synchronize and protect those write operations, we actually use lightweight spin locks,
19:42
and at any time a thread can hold a lock and then perform the updates on a certain edge array. For deletions, in the update protocol, we have implemented logical deletions
20:00
of vertices and edges. So for vertices, as I mentioned before, we have this length field that stands for the degree of the vertex, which is initially minus one. So minus one means an invalid vertex. So the obvious way to,
20:24
the optimal way, to actually implement the logical deletion for vertices is to use that field and set it back to minus one to mark the vertex as deleted. For edges, we have a separate deleted flag,
20:40
and we have to set it to one to mark the edge as deleted. The cost of this is that we'll have to add extra conditional branches in traversals. When deletions become very frequent, meaning that we have a lot of gaps in segment arrays
21:01
and in edge arrays, we can perform compaction. By compaction, we mean that we're gonna reuse the space that was used by the deleted vertices or edges, compact the arrays, and remove the unused space, basically.
21:25
That compaction would actually mean a whole copy of the segments, the vertex arrays, et cetera. It is very expensive, but in real-world use cases, deletions are not that frequent.
21:48
So next, we're gonna talk about the performance analysis breakdown of CSR++ compared to other systems. I'm gonna give the performance analysis configuration first. The graphs that we use are LiveJournal and Twitter.
22:03
So LiveJournal has 68 million edges; Twitter has 1.4 billion edges. For the algorithms, we used PageRank, weakly connected components, breadth-first search, and then weighted PageRank. And for the evaluated data structures, we used the CSR implementation in Green-Marl.
22:22
We used the adjacency list implementation from the Boost Graph Library. And then we used LLAMA, the open-source code of LLAMA. For the machine, we used a two-socket, 36-core machine that has 384 gigabytes of RAM.
22:43
So first, we're gonna start with read-only workloads. This is the example of the two PageRank algorithms that we ran on the Twitter and LiveJournal graphs. As we see here, CSR++ is very close to the best read-only data structure, which is CSR.
23:03
It is also close enough to LLAMA, with only a 50% slowdown compared to CSR. This shows that CSR++ is a very scalable data structure,
23:21
and it scales well with the number of threads as well as with the size of the graphs. Next, we're gonna talk about the in-place updates, meaning performing edge insertions or vertex insertions. In this example, we give some numbers
23:40
on the edge insertion performance of CSR++. We show that even with single-threaded update operations, we reach up to one order of magnitude faster performance than CSR: here we have 363 seconds for CSR compared to 40 seconds with CSR++.
24:02
So in this slide, we talk about the in-place update performance of CSR++ versus LLAMA. What we did is apply 1,000 batches of new edge insertions on CSR++ and LLAMA. And as we see here in this figure, LLAMA's memory usage explodes
24:23
after applying around 380 batches, whereas CSR++ continues to run the in-place updates while consuming an almost stable memory footprint
24:42
along the way. This increase in memory causes the system to run out of memory. So next, we're gonna talk about memory consumption, which is a very big challenge when designing data structures. We evaluated CSR++ against CSR and LLAMA,
25:04
and we found out that CSR++ has a moderate memory overhead of 33% compared to CSR, both for small graphs and large graphs, and that it scales well after updating the data structures. So what we did here is basically
25:24
test CSR++ and LLAMA in two contexts. The first context is where we don't update the graph, meaning we only read the graph. And there we found this 33% average overhead compared to CSR.
25:44
And then we actually tested the case after applying mutations. As we mentioned before, the CSR++ design means we have to allocate double the size of the original edge arrays
26:03
whenever we have to add new edges, whenever we don't have enough space so that we have that extra space for new incoming edges. That's what's causing this memory overhead. Whereas in LLAMA, we actually have to create new snapshots
26:22
every time we have to add new edges. So as a conclusion, we showed that CSR++ is a new scalable, concurrent graph data structure that relies on a segmentation technique. We also showed that CSR++ achieves
26:40
the best of both worlds from the CSR and adjacency list designs. It allows for analytic performance close to CSR, which is the fastest read-only data structure, with only 10% overhead on average. It also allows for fast mutability, meaning we get up to a two-times speedup compared to LLAMA.
27:01
And it has a low memory footprint, with only 33% overhead compared to CSR. So that is all, thank you for watching.