Concurrent Programming Made Simple
Formal Metadata

Title: Concurrent Programming Made Simple
Title of Series: FOSDEM 2014 (talk 112 of 199)
Authors: Torvald Riegel, Nuno Diegues
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/32500 (DOI)
Transcript: English (auto-generated)
00:01
Torvald Riegel and Nuno Diegues. Thank you very much. Hello. Welcome. And my name is Torvald Riegel. I work for Red Hat's toolchain team on various things concurrent and parallel. And I'm Nuno Diegues, a PhD student from Portugal. And we'll tell you something about transactional memory today.
00:23
So before I start to get to the beef, let's actually set the scene here, right? So this talk is about concurrent programming. And concurrent means that things happen at the same time, but they're not independent, right? So this is slightly different from parallel programming, where things also
00:41
happen at the same time, but they're usually independent, you know, like parallel lines. And when you have concurrent actions but one conceptual program state, you need to make sure that the concurrent actions actually synchronize, right? So that you can make sense out of the state transitions.
01:00
And when you look at multicores, for example, we have shared memory, so shared state. And we need to synchronize on that shared memory, so we have this thing called shared memory synchronization. Then you add transactions, and that is where the name transactional memory comes from. Another concept that is very important is atomicity.
01:23
And in the context of shared memory synchronization, atomicity means that something happens as an indivisible step, right? If you're a database person, think about this as atomicity and isolation combined. And this is why we have hardware instructions such as x86 compare-exchange.
01:41
And to illustrate a little bit further why atomicity is important, let us act out a small situation that every one of you will have experienced already, right? So sometimes you have something like a narrow hallway, right? People just need to pass each other.
02:01
So we're walking toward each other, and usually somebody will step to the side, but sometimes the people do like this, you know? And then they bump into each other. We can't let computers do that. So what is the problem here? The problem is that we are facing each other and I see the side and I think it's empty, right? Nuno sees the same side and also sees that it's empty.
02:22
And then we do the same steps, right? We're in the same situation and this is what's happening. So what's going on? The problem here is a lack of atomicity. If these two steps of seeing the side and stepping over were indivisible, we wouldn't have a problem, right?
02:41
I see it, I go to the side, and then Nuno makes his turn, right? Or the other way around. So if those two actions were atomic, then we would not have any problems. So this shows you why atomicity is important, right? What we're doing here is something like a test-and-set that you have on hardware as well. So this is how you can remember why atomicity is important.
03:09
And so coming back to the transactional memory, if you want to remember one thing about transactional memory, then please let it be this, right?
03:20
Transactional memory is a programming abstraction and it allows programmers to declare which sequences of code should be atomic, instead of requiring them to actually implement how this is done. This is the underlying vision, this is what all the different kinds of TM give you.
03:40
And instead of the programmer, we have generic implementations that ensure atomicity. So this is not specific to a particular program or something like that, but it's generic, right? And it can be purely software, it can be new hardware, it can be a mix of hardware and software. And our focus in this talk will be TM for high-level programming languages.
04:05
So what we're going to do is that we have a first part in which I will talk about TM for shared memory on a single machine. I'll give you an overview of the proposed C and C++ language constructs. I'll give a peek into GCC's implementation of that and I'll make some comments on performance.
04:23
Next up, Nuno will talk about TM for distributed shared memory, so multiple machines, but with a shared memory abstraction on them. He will discuss strong consistency and an implementation on top of Infinispan. Afterwards, we have a short Q&A.
04:40
If there's something very short, quick, you know, a clarifying question that you have, scream it. If it's short, I'll answer it; everything else goes at the end, please. And one note that I also want to send out is that TM is still rather new. The first idea was proposed 20 years ago, but real research didn't start until about 10 years ago.
05:06
And it's still ongoing, heavily. Standardization for C and C++ started about five years ago, when people got together informally, and we have had ISO C++ study groups since mid-2012. There's GCC support on the C and C++ sides since version 4.7, and we're finally also getting real hardware TM implementations out there.
05:28
So, Azul has been out there for a long time, but special purpose, arguably. Blue Gene/Q, kind of like it. And the first real mainstream CPU that has transactional memory hardware on it is
05:40
Intel's Haswell, right? If you have a new laptop, you actually might have it, or a new workstation with that chip. So what do the C and C++ language constructs look like? So, we have this transaction_atomic construct there. And this allows a programmer to declare that the following compound statement must execute atomically.
06:02
So in this case, the load of x and the potential load of y and the store to y will all be executed as one indivisible step. I'll explain the specific semantics later on. There are no data annotations necessary, and no special data types necessary, right?
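To make that concrete, here is a minimal sketch in GCC's dialect (the __transaction_atomic keyword is the GCC spelling, enabled with -fgnu-tm; the variables and the condition are just assumptions matching the slide's description):

    int x, y;

    void update(void)
    {
        __transaction_atomic {
            if (x != 0)     // load of x
                y = y + 1;  // potential load of y, and store to y
        }
        // Everything inside the compound statement executes as one
        // indivisible step with respect to other transactions.
    }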
06:25
And in those transactions, you can run existing sequential code, you can do function calls, you can have nested transactions, and so on. However, the code in transactions must be what we call transaction safe.
06:41
And unsafe code, for example, is the use of locks or low-level atomics, assembler instructions, volatile memory accesses, or simply functions not known to be safe. However, the compiler will actually check whether your code is safe, right?
07:00
So you get a compile-time checking guarantee for that. And because functions need to be known to be safe, for cross-compilation-unit calls, or for indirect calls through function pointers, you need to annotate your functions with this transaction_safe attribute. For example, you say here that you make the claim that foo is actually transaction safe.
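In GCC that annotation looks roughly like this (a sketch; foo's signature is an assumption for illustration):

    // Claim that foo is transaction-safe, so it may be called inside
    // atomic transactions even from other compilation units or through
    // function pointers. The compiler checks the claim against foo's body.
    int foo(int arg) __attribute__((transaction_safe));

    void bar(void)
    {
        __transaction_atomic {
            foo(23);        // OK: declared transaction_safe
            // printf(...)  // would be rejected: not known to be safe
        }
    }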
07:26
For further information, have a look at the C++ specification. The best is to just web search for the paper number there, this n thing. Now, this is concurrency, and so there are nasty little details. So here's a slide about the synchronization semantics.
07:44
Transactions extend the C++11 memory model. So the memory model is the thing in the language specification that defines the execution of multi-threaded programs, right? Which behavior are they actually allowed to have? So transactions, or the semantics of transactions, are integrated into this memory model.
08:04
And the basic guarantee is that all transactions are totally ordered. And this order contributes to the happens-before relation. Happens-before is basically the relation that models what happens ordered after what other thing in your program if you have a multi-threaded execution.
08:23
And TM ensures some valid order, so the TM implementation ensures some valid order that is consistent with happens-before for every execution. In other words, this means that one transaction happens after the other, and it makes sense with respect to the other synchronization going on in the program.
08:44
So if you have the intuitive meaning, it's actually pretty easy to understand, right? Note, however, that this does not imply sequential execution, right? This is a correctness guarantee. If the transactions are actually independent, the implementation is also free to execute them in parallel, for example.
09:03
But it allows you to reason about the transactions as if they would be executed strictly one after the other. However, there's a gotcha, and that is that database freedom is still required. That is the case throughout all C and C++, right?
09:21
Whether you're using locks, low-level atomics, whatever. And it means that if your program has a data race, you get undefined behavior. So I have a small example there about publishing data. You have one thread doing initialization of the data, and then it uses a transaction to publish the data by setting a flag.
09:43
And the other thread uses a transaction to use the data if it has been published. This here is correct usage. Doing something like that is not, because if you do the temporary read in front of it, you have a data race in your program. And as a result, you get undefined behavior.
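Reconstructed as a sketch (names are assumptions, not the slide's actual code), the publication example looks roughly like this:

    int data;
    bool published = false;
    void consume(int) __attribute__((transaction_safe));  // assumed helper

    // Thread 1: initialize, then publish inside a transaction.
    void producer(void)
    {
        data = 42;                                  // not shared yet
        __transaction_atomic { published = true; }  // publish
    }

    // Thread 2, correct: the read of data happens inside the transaction.
    void consumer_ok(void)
    {
        __transaction_atomic {
            if (published)
                consume(data);
        }
    }

    // Thread 2, WRONG: the early read of data races with the
    // initialization in producer(), so the program has undefined behavior.
    void consumer_racy(void)
    {
        int tmp = data;     // data race!
        __transaction_atomic {
            if (published)
                consume(tmp);
        }
    }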
10:05
But as I said, this is something that you will have to consider anyway with the C and C++ programs. And the reason why we have that is that this actually allows efficient implementations for the compiler.
10:20
One of the biggest benefits of transactional memory is that it supports modular programming. And that is because programmers don't need to manage the association between shared data and the synchronization metadata. Shared data here is application data, things that you actually need to synchronize on and that define your program state.
10:44
And the synchronization metadata is something like locks. The reason is that we have a generic TM implementation that takes care of ensuring the atomicity. So that's one thing. For example, programmers don't need to have a locking convention, or to follow one.
11:02
The second thing that kind of follows from that is that functions containing only transactional synchronization compose without deadlock. So the nesting order of transactions, for example, does not matter. However, you cannot expect another thread to make progress inside an atomic transaction, or concurrently with an atomic transaction.
11:24
This wouldn't make sense, right? You're requiring the compiler and the environment to make your transaction into one indivisible step. You can't go then, oh, this is one indivisible step, and oh, and by the way, in between, I want this other thread to make another step. So this would be conceptually incompatible.
11:44
And at the bottom, I'm giving you an example of that, right? Comparing transactional memory with locks, using lists. So the example is that we want to move one element from one list to another. And I have this very neat, small generic function here, where I remove from one list, insert into the other.
12:04
Making that thread safe with an atomic transaction is easy. You just put it in an atomic transaction, so you move, say, element 23 from list a to list b. If you try to do this with locks, it becomes difficult. First of all, where do you put the locks, right?
12:21
Are they in the list, or are they external to the list? If they're in the list, you need to add an interface for the move function with which you can query the locks that the lists actually have in them, right? If you have done that, the next problem is: how do you acquire the locks in a consistent order?
12:40
Somebody is calling move(a, b), and another thread is calling move(b, a), right? If you just acquire the lock for list 1 first, and then for list 2, you'll run into deadlocks. You don't have that problem with transactional memory, because the implementation will take care of this. If you try to do something as generic as that with locks, it becomes messy quickly.
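A sketch of that generic move function (the list type and operation names are assumptions, and the list operations would themselves have to be transaction-safe):

    // Move one element from one list to another, as one atomic step.
    template <typename List, typename T>
    void move_elem(List& from, List& to, const T& elem)
    {
        __transaction_atomic {
            from.remove(elem);   // remove from one list...
            to.push_back(elem);  // ...and insert into the other
        }
    }

    // move_elem(a, b, 23) composes with a concurrent move_elem(b, a, 42)
    // without deadlock; with per-list locks you would need a global
    // lock-ordering convention to get the same guarantee.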
13:07
Switching gears a little, how do we actually do it in the implementation? So, here's an overview of, or a peek into, GCC's implementation. And it has two sides to it. One is the compiler side, and the other is the TM runtime library side.
13:24
And there are essentially three steps that we do in the compiler, apart from parsing and stuff like that. So first we ensure the atomicity guarantee. We don't ensure that your program is data race free, but if we can assume that your program is data race free,
13:42
then we can ensure atomicity for your transactions at compile time. And this works because we check all transaction safe code, whether it is indeed safe. And transaction safe code is code that is implicitly transaction safe, because you call it from a place that must be transaction safe, or code whose functions you have annotated accordingly.
14:04
Once we have checked all this code, we create an instrumented clone of this code. So this clone is then the transaction safe functions and so on. And what we do essentially is that we take all the memory loads and stores, and rewrite them into calls into a TM runtime library,
14:21
that will then perform the memory loads and stores. So we hook into the load and store mechanism essentially. Similarly we redirect the function calls to the clones, and as a result we have both an instrumented code path, where the loads and stores are intercepted, and an un-instrumented one where things are just like before.
14:42
And then finally we generate begin and commit code for each transaction. And one thing that's worth noting is that we allow the runtime library to decide on every transaction start, whether un-instrumented or instrumented code should be executed.
15:00
And the whole point of delegating to a runtime library is really that we get implementation flexibility. So we don't need to generate the synchronization code at compile time, in the compiler. We have a little bit more overhead in terms of the function calls, but then we have full flexibility at runtime to do something
15:22
different and potentially better. So on the runtime library side, the runtime library is called libITM in the case of GCC, and this is the thing that then actually enforces the atomicity at runtime. And libITM currently contains different software-only implementations
15:40
called STM. So these are TM implementations that do not need any special hardware. And the default here is one that essentially does write-through to memory with undo logging, and uses a set of internal locks to protect concurrent accesses to the same piece of memory between different transactions.
16:02
So the library will manage a couple of locks, do the automatic mapping from memory addresses to the locks, and then run a somewhat involved algorithm to actually make this tick. So this uses the instrumented code path. In libITM we currently also make use of hardware TM implementations.
16:25
So HTM. And one thing that you might not be aware of is that all the HTM implementations out there, with the exception of a kind of special mode on the PowerPC stuff, are best effort.
16:41
In the sense that they do not guarantee that they can run all transactions. For example, if you try to access more memory in a hardware transaction than there is hardware capacity to track your accesses, it will simply fail, and that's all it can do. So you always need to have a fallback for your hardware transactions.
17:02
And that means that, for example, you need to fall back to locking, or fall back to an STM, and so on. So HTM by itself is not really a solve-it-all thing. And libITM currently uses HTM with a global lock as the fallback. That's very simple, but it's also a very non-scalable fallback, obviously.
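As a sketch of that best-effort pattern, here is roughly what a hardware transaction with a global-lock fallback looks like using Intel's RTM intrinsics (compile with -mrtm; this shows the general shape, not libITM's actual code; a production version would retry a few times and would also have the transaction read the fallback lock so it aborts when the lock is held):

    #include <immintrin.h>
    #include <pthread.h>

    static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

    void run_atomically(void (*body)(void))
    {
        if (_xbegin() == _XBEGIN_STARTED) {
            // Hardware transaction: accesses are tracked by the CPU, and
            // the transaction aborts on conflicts, capacity overflow,
            // syscalls, and so on.
            body();
            _xend();
        } else {
            // Best effort failed: run the same code under the simple
            // but non-scalable global lock.
            pthread_mutex_lock(&fallback_lock);
            body();
            pthread_mutex_unlock(&fallback_lock);
        }
    }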
17:26
The good thing however is that it allows us to use the un-instrumented code path for the hardware transactions. So that the hardware transactions really don't have any overhead because of instrumentation.
17:41
We currently don't have any hybrid STM/HTM implementation yet in libITM, but this is something that we'll hopefully have at some point. The idea there is that we have hardware transactions that are very fast, and then we can fall back to something that is actually still scalable. So, two more slides about performance.
18:08
And performance is a difficult topic for TM because TM has seen quite a bit of hype. And so let me first note that like everything else it's a tool, it's not magic, right?
18:22
And the performance goal of TM, or the performance goal that every sane person implementing TM would give you, is that it tries to provide a useful balance between ease of use and performance. It does not claim to provide the most useful balance or the only useful balance, right?
18:43
But it tries to make concurrency simpler for programmers and still deliver decent performance. The goal is always to make it simpler, except perhaps in some corner cases, and then we see how fast we can get. And currently it looks like performance can be decent.
19:05
And the other point is that right now I don't think it's meaningful to try to draw conclusions from TM performance. And this is also one reason why I don't give you any performance plots as such here because it would mostly be misleading.
19:20
And I'll tell you why. So first of all, the implementations are work in progress. So libITM is a work in progress, has not seen a lot of tuning. For example, the HTMs are first generation implementations or the hardware implementations. I would claim that they are also work in progress. For example, they lack features that would be very useful for hybrid software or hardware TM implementations.
19:44
And second, the performance heavily relies on a lot of factors, many more than might be obvious. So for example, it relies on the hardware, obviously. It relies on the compiler, what kind of codes it generates, the actual TM algorithm used.
20:01
The HTM implementation, for example: how many memory accesses can it track for transactions? What kind of cache or dedicated buffer does it use to actually track accesses? Which other instructions can it handle, right? There was a SPARC hardware TM implementation that was so restricted that you couldn't do function calls.
20:24
That's not useful for general purpose TM, right? It's useful if you use it in a very, very tiny controlled way, but it's not useful for, you know, the high level TM abstractions. The allocator matters because the allocator decides, you know, how you end up moving your data around
20:42
and where your data gets located in memory and there's false sharing issues and all that stuff. Whether you use link time optimizations or not does matter because it affects the overhead. And these are just all the factors that the implementation actually has control over. Only the application or the programmer has control over the actual transaction conflict probability.
21:05
How long the transactions are, what the load-store ratio is, whether there are many read-only transactions and so on. Memory access pattern, data layout, allocation schemes, again. Other code executed as transactions and so on. So there's a lot of stuff that's going on.
21:22
And the third point I want to make is that, as I mentioned, TM is still fairly new. That means, because it's a programming abstraction that people actually have to use actively, that we are still in a kind of chicken and egg situation. So I'd love to tune libITM for real-world workloads, but we currently have very few of them, right?
21:44
So any tuning will be guesswork. Nonetheless, let me try to give you a few rough estimates that hopefully still hold in the future for TM performance. TM performance can be characterized by both what's typically called a single-thread performance, so the single-thread overhead,
22:04
and the multiple-thread performance, so scalability essentially. Regarding the former, we can say that STM is slower than sequential code. You have to synchronize so there will be more stuff, right? On the other hand, how much overhead there will be can vary a lot, from doing just the one critical section to, you know,
22:25
doing something weird with every memory access. STM will probably be slower or equal to coarse-locking. This is because we synchronize so we at least need to have some kind of atomic instruction, at least if we do a change or something like that.
22:41
So that's like a lock. HTM will be, or is, about as fast as uncontended critical sections, or even faster, because you don't need to change memory at all. And depends on how it's implemented and so on, but HTM can be very fast, provided the hardware can actually run your transactions, right?
23:04
If you do a syscall in your transaction, it's not a good probability that, you know, we will actually be able to do it. Depends on the HTM again, but this is just an example. Regarding scalability, we can say that most STM algorithms out there typically scale very well, right?
23:23
Say you have a shared data structure, you move elements, there are no real logical conflicts between your transactions: it scales well. But this is less likely the lower the single-thread overhead is, right? Because if you want to have a lot of scalability, you usually do more per access
23:43
than if you just go with a coarse-grained lock or something like that. HTM also scales well, but again, the fallback is a problem. So if your fallback doesn't scale well, and you have to use the fallback often, that's bad. And so this is why I said that hybrid STM/HTM implementations
24:02
will hopefully help you to get the HTM performance out most of the time, but still have a fallback that does not suck in terms of performance. And one point that is worth noting is that the TM runtime libraries adapt at runtime. They currently do this to a little extent, but there's a lot more that they can do.
24:22
This means that of all the possibilities here, there's a good hope that automatically the implementation will pick the right thing to do, right? Which would be very hard to do for programmers. Nobody wants to maintain, and it's, you know, in a general library, maintain three different or five different synchronization mechanisms,
24:42
just to be able to be high performance when it's used in different ways. And if you're still worried that I didn't give you actual performance data, there's a very easy way to get it. You just need to use it, right? So grab GCC 4.7 or more recent, use -fgnu-tm,
25:05
and then measure performance in your applications, right? Read the C++ TM specification that's out there, report about your findings, blog about it. If you think there's a bug or an inefficiency in GCC's implementation, report a bug, please.
25:22
If you have feedback for the constructs as specified, then it might be a good idea to get involved with the C++ TM specification, so this is study group five. And if you really are more interested in the code, please dive into, just dive into the implementation, see what's there.
25:41
The comments in the libITM code are fairly extensive. So even though this is concurrent code, it should be relatively easy to follow. And there are many interesting things to work on, for example, improving the autotuning and so on. And with that, I'll let Nuno discuss something completely different.
26:05
All right. And so as you can see now, we're going for a different world, but we'll still be addressing the same problem. And until this point we were looking at a single machine where we already understood that we need to make sure that these concurrent accesses to data are regulated.
26:24
But we typically like to build distributed applications, and there we have many, many of these machines. And the point here is that it would be very interesting if we could exploit these independent memories, but look at them as a single shared abstraction.
26:42
And so what I will show you basically is how to do this. And we will use the same abstraction of transactional memory, but in a different environment. And so the approach that we'll refer to is basically distributed transactional memory.
27:01
And similarly to what we saw before is that we are bringing transactions to the top of the software stack. So they are no longer buried down deep in your data store engine, but actually they are up there close to the application. And so this allows for dynamic transactions, meaning you do not have to specify a priori how long they are,
27:22
which positions they will access. And this will be straight embedded into the application logic, which is also favorable for long-lived transactions. But a bit different from what we saw before from Torvald is that now we have to worry about these three things down below. So namely persistence, because people in these recent applications enjoy this a lot.
27:41
And the distribution itself, because this raises other concerns in terms of, for instance, the costs of communication, and of course fault tolerance, because this is the main driver of why you're building a distributed application in most cases in the first place. And so just to give you a quick tour of the concepts,
28:00
imagine you have this set of data up here, and you have your distributed system. What we had until this point from Torvald is a single machine. Now we basically just replicate this data, because a single machine is not fault tolerant. And so this is what we call a fully replicated system. However, if you're going to change a bit of your data,
28:23
you have to communicate this to everyone. And if you're going to scale to a lot of machines, this starts to become highly costly. And so basically what you do is that you only replicate a given data item in a subset of the machines, which is partial replication. And so considering this,
28:42
I'm going to just try to briefly convince you of why you should use strong consistency in these scenarios. And for that, let's consider what has been quite popular until some years ago, and started perhaps 15 years ago, which is eventual consistency. So if you have this client up here,
29:01
which says to the server, I want to change this data item. It's no longer green, it's yellow. And then after a while, it says, I want to read that data item again, and it gets forwarded to a different machine. And actually it sees what was there before, and not the most up-to-date thing. And this is a reflex of what we typically say in popular saying,
29:25
that eventual consistency is no consistency at all, because it's not formally defined. You have to dig into your system and understand what it's providing to you to then realize what you can do useful with it. And this varies from system to system,
29:40
and it's terrible for programmers. And so in this case, the problem is that the replication was still ongoing. And so this is a very similar problem to what you might have witnessed on Amazon, for instance, where you say you want to buy something, and after a while it's no longer in your buying list, and then it pops up again. Or even in your Facebook posts, you write something, and after a while it's not there,
30:02
and other people can see it. And so this has convinced a lot of people, and in most recent years we are now getting a lot of popular systems which actually give you the abstraction of a transaction. But you might still have transactions which are stronger than others. And one example is that you can use snapshot isolation in transactions.
30:23
That is still not perfect, and let me present you an idea of why that is the case. So you have here a game where you have two players, and they can never be in two contiguous slots. So what they do is that they look around, they see what's there, and they choose the next move such that they will not collide with the player.
30:45
And that is what they both do. Then they concurrently move, and they move next to each other, breaking the rules. Basically what happens is that the positions that they wrote did not intersect, and snapshot isolation will only abort a transaction if it writes to something that was written concurrently.
31:03
And this is what is known as a write-skew anomaly. But notice that if they had moved one at a time, as we presented to you earlier, with a notion of sequentiality that comes from the atomicity, this would not have happened. And that's what you get with serializable transactions,
31:21
is that every execution that is allowed is equivalent to a sequential one. And so that's very easy for programmers to reason about. And basically that's the approach that we take: we embrace serializable transactions, which might be costly, but to overcome that we explore several things,
31:42
such as the partial replication that I showed you earlier. And we can build scalable protocols on top of that. And you might have heard of recent proposals such as Google Spanner, which are trying to move in this direction. And so what we are doing here is not trying to solve world hunger.
32:02
We are aiming for a simple use case, but hopefully the common one. And for that we try to give a framework which is easy to bootstrap, has good enough scalability and performance for most use cases, and hides a lot of details from the programmer.
32:21
Of course, this will not give you the ultimate performance if you really need to have something critical. And the approach that we take starts from a very key point, which is that you specify the domain of your application in a domain-specific language, and all of this in an object-oriented way. And through this we will hide from you concurrency control,
32:44
persistence, and data placement in your distributed system. For some things we might still need your help as a developer, and for that we give you APIs in an object-oriented manner, for which you can do distributed execution of code and ensure data locality.
33:01
And then we still have some things only for expert programmers if they want to go there. And so for the rest I will use this guiding example, where we have phonebooks which have contacts, and the contacts may be shared among phonebooks. And this typically maps to some Java classes or whatever your preferred language, and then they will reify these relationships over there via some collections or references,
33:30
which have to be updated bidirectionally and so forth. And if you consider a typical approach in the industry, which is to use, for instance, Hibernate, you will basically write some code resembling this.
33:42
And so you have to put these two collections on the two sides, which you have to manage: every time you change one side, you have to remember to go to the other one and update it accordingly. You have to worry about these annotations and manage the unique IDs of the classes and so forth.
34:01
And actually this approach we believe has a lot of good things underlying it, and so we will be quite close to this. That's why I'm showing this example. But we'll try to clear up a bit the view and focus on what's actually interesting. And so the DSL that I talked to you about, we call it domain modeling language.
34:21
And for the example that I provided to you, basically you only have to specify the classes that you have in your domain. In this case it's very simple. You have a phonebook, a contact, their attributes, and then we say that they are related, and they are related in this many-to-many fashion. But this is written in a separate file, the domain file, in a Java-ish syntax,
34:44
which should be easy for most to understand. And this is all of it. You've already understood it, and you just saw it for the first time. And so as I told you, it's a DSL, and basically the framework that we have will then use this DSL to generate code.
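As a rough sketch, the domain file for the phonebook example might look like this (reconstructed in the style of the Fénix Framework's DML; the exact keywords and names here are assumptions, not the slide's actual code):

    class PhoneBook {
        String owner;
    }

    class Contact {
        String name;
        String phoneNumber;
    }

    // Contacts may be shared among phonebooks: a many-to-many relation,
    // managed bidirectionally by the generated code.
    relation PhoneBookHasContacts {
        PhoneBook playsRole book { multiplicity *; }
        Contact playsRole contact { multiplicity *; }
    }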
35:00
And it will generate code with a well-known interface. And this will basically boil down to mapping the entities to classes, with getters and setters hiding all those things that I talked to you about, and then making the relations automatically managed in a bidirectional manner. And so the framework will start from your DML over there,
35:23
so the application has the domain written by the programmer. Then some code generator will be triggered, generates the classes, and they are placed in the application such that the programmer can extend these domain classes with the behavior, because he only tells us, I have a book with this structure.
35:43
He does not tell the behavior. That's his job to do. The programmer still needs to do something, right? And then in the end, these interact with the rest of the application. And so an example for the classes that I presented to you are basically these, is that you have the common getters, setters for all the things, and all the concurrency and persistence distribution are hidden in there.
36:05
So the interface will always be the same, regardless of the actual code implementation. Pretty much a bit like what we just saw for the GCC. And so then what you can actually do is to choose different transactional managers and backends for persistence.
36:22
And you do not really have to change the application. You only have to specify this in a configuration file. And for the rest of the talk, I will use the guiding example of using Infinispan, which is an open-source data grid from Red Hat. And so what we actually do in this particular example is that we have to map your domain to this persistence,
36:40
and Infinispan is a distributed key-value store. And so we will somehow do this mapping, such that you don't have to worry about it. And then you run your transactions in your application, we will map these to transactions in Infinispan, and we will ensure the serializability that you wish. But what matters is that programmers are not aware of this part.
37:01
They might even be able to see it if they want, but they don't have to, and that's what's important. And this is regardless of the backend that you're choosing, whether it's Infinispan or any other. And so if you were still using the other approach that I told you about, like Hibernate, for instance,
37:21
I told you we wanted to clear up the landscape a bit. And so this is what an application would look like using the domain that I specified. If you go with DML, you can clean this up, and in the end, you only have the atomic blocks here, which are related to what we saw in the first part of the talk.
37:40
But nothing else is related to concurrency or persistence. It's completely hidden, and even distribution. And so if you want to run an application which resembles what I just described, basically you have to specify only a properties file, and then you have to point to the configuration files which are backend dependent.
38:01
In this case, I'm using this example. In this example, we need to specify a JGroups file for communication in the distributed system. Just use a prebuilt one. We have to configure Infinispan. Well, it's rather easy. We just say we want the strongest consistency. If performance sucks, then we might want to delve a bit into that.
38:20
But in most cases, this should be enough; it does not bother the programmer with anomalies, and you can write your code in a safe way. So what you can see here is basically that we are asking for serializability. We use an optimistic concurrency control mechanism. It's distributed using partial replication and uses a multiversion algorithm. All of this is there for free
38:41
and independent of the framework. But, okay, you can also configure it in other ways, for instance. You might want repeatable read because you want to shoot yourself in the foot. But in the end, it's very easy, so you just build your application with a simple Maven command specifying the code generator. You might use another code generator
39:01
besides the Infinispan one, or others that are available. In this case, I put the example of Hibernate OGM. And then you just run it with yet another simple command. And dealing with all the boilerplate of the database, distribution and so forth, is hidden from you. And so for quite a long time,
39:22
so this is a website of my university. And for, I think, over ten years, this has been used there. And this is serving around 1,500 people. And what is actually cool about it is that it started there. It was made open source. So they built a load of stuff on top of this framework,
39:40
which matters to the university itself in terms of issuing diplomas, registering for classes, you know, all that stuff. And actually it started to be used in other places as well. And so it's running in many universities in the end. And one cool thing about it is that at some point they created this map of the object domains,
40:04
and basically what you're seeing is the instances of the domain model of the university, colored by the part of the university they belong to, whether it's students, professors, and so on. And this system, which is about over one million lines of code,
40:21
if I'm correct, with bazillions of functionalities for universities, it's running in several places now. That's the cool part about it. And it's using the approach that I just described to you, basically. And so there are plenty of advanced features that I could talk about, but I'll just briefly go over a couple of them.
40:42
And in this case I'll just talk about indexed relations. If you have this design that I told you about, and you now want to disable a contact from your phone book given an email, you have to write code somewhat like this, where basically you need to go over all your contacts, and then you say, okay, this is the contact I want to disable.
41:03
But this is crappy if you have a lot of contacts. And so what happens here is that you can simply say, oh, I want to index my contacts by a given attribute. And then we'll give you a very efficient index that will solve the problem, and you can just say, okay, I'll get my contacts,
41:22
and this should be efficient, independent of the backend. So of course you get these features there; I was just trying to simplify in the beginning. Another cool thing is about data location. So we don't let the programmer know where things are. And this is important because you might drop servers at runtime and bring new ones in for provisioning,
41:40
so we cannot let the programmer know exactly where things are. But he can try to play a bit with it. And here what he can say is that he wants to point out that this contact should be placed using this attribute. What this allows him to do is that if he wants to run some code,
42:01
he can create a task to run in a distributed system, which will use this contact, and then will use the locality of the contact to be placed. And what this means is that this code, this distributed task will run in a machine where the contact is guaranteed to exist. And so this allows you to have co-location of data and code,
42:20
which is important in distributed systems to avoid issuing remote operations to other servers. Now the cool thing is that you can co-locate data, and that's simply using the same technique where you can place these two contacts in the same machine by using the hints. Of course, ultimately the framework has the choice of not doing that
42:41
because you might just put everything in the same machine, and that defeats the scalability of the whole service. And so in summary, we know that distributed applications are hard to develop, in particular if the programmers are exposed to a lot of low-level mechanisms that they have to tune and configure.
43:00
And so our take here is that the APIs should hide the complexity every time it's possible, but still allow expressiveness whenever it's needed by the programmer. And so that's exactly what we tried to do here. Some references. So this was done in the scope of a European project. The framework I described to you is open source here,
43:21
is the corresponding documentation, and the university stuff that I told you about, over a million lines of code used in several universities, is there with a lot of documentation and nice manuals. And so if you want to get involved in this, there is a research network called EuroTM, sponsored by the European Union.
43:41
And it's basically bringing industry and academia together. It has lots of countries belonging to it. And you can participate in workshops, which are coming soon in April in Amsterdam. EuroTM can fund your participation in these workshops. Also a training school, which will take place in La Plaine in France, also soon,
44:01
where we'll have speakers from Intel, Red Hat, and Oracle as well, and other industry members. And you can also collaborate with other institutions; just check the website for more information. And that's it. Thank you very much.
44:25
Questions? If you've got any questions, just raise your hand so we can run up to you and give you the microphone.
44:43
I have a question regarding the cloud part. Can you speak up please? Or everyone please be quiet so we can understand the question. I have a question regarding the cloud part. We have seen why consistency is needed.
45:00
So what I don't understand is, what is the proposed solution? Do I introduce one central management station, let's say, that is going to handle this? Or how do I do it in a distributed manner? Okay, so the question is basically whether the proposed solution uses centralized components to deal with the distributed data management.
45:24
And the answer is no. But that might depend on the backend that you choose. So we have backends which manage data in a completely decentralized way, which is favorable for scalability. And we have used this with over 100 machines.
45:40
But if you choose a backend (for instance, we have support for MySQL), you can run it with a centralized database, and in that case you're asking for it and you get it: centralized components. So it's orthogonal and independent. Any other questions? Over there, same place.
46:06
Hi, I was just wondering what kind of latency you see on the transaction management when it's introduced like this? Okay, so once again that might depend on the backend that you're using. So the question was what kind of latencies we noticed with this service, right?
46:24
Yeah, do you know what kind of factors are involved? Sure, so one of the main problems is the data placement. So by providing the programmer with this simple abstraction of a domain language where they can specify the domain, and not actually asking the programmer to place the data,
46:42
you might have a transaction which spans a lot of different machines. Kind of like if you have MySQL sharded, and now you're talking with a lot of shards. But hopefully this can be fixed, and actually there is an internal part of the framework which notices and tracks accesses and moves data to be co-located,
47:01
to make sure that if you access X and Y together a lot of the time, then X and Y will be placed together. There are several efforts regarding implementation of transactional memory, as far as I understand. So it's a standard, but there are several efforts, from GCC
47:25
and from Microsoft I assume also. Is there any effort to synchronize those? So one of the efforts is definitely Study Group 5 in ISO C++. So that is standardizing the language constructs that you have.
47:43
So the interface that the programmer sees. And there are people involved from Oracle, Red Hat, Microsoft currently not, IBM is involved, HP. So quite a large group of people, and academia as well. On the ABI level, Intel and Red Hat have been standardizing,
48:03
or have a document out there that specifies, the ABI that's used by the Intel compiler and GCC. Does that answer your question? You mean the language front end, or?
48:24
The transactional memory part. The STM is implemented in C++, yes. The Visual C++ compiler. Is it going to be implemented in the Visual C++ compiler? Visual C++, no, I don't know about Microsoft, sorry. Okay, one last question.
48:43
So in the case of a distributed backend, could you give an example of how this problem that you demonstrated with eventual consistency is solved? Okay, so the main thing is that there is this well-known problem in distributed systems called consensus,
49:04
which is that you have different machines that need to arrive at the same conclusion. So when you have distributed transactions, you basically have to solve consensus. And basically what happens is that if you want serializability, you always have to solve consensus, and if your set of machines is large,
49:22
you'll be paying a high cost every time. So the main trick there is to make sure that every transaction only touches a small number of machines. And that's why splitting up the data in a smart way goes a long way toward making this efficient. So if I had to name the main key point that makes this possible, it's what I presented earlier as partial replication.
49:44
So dividing and making sure that you only replicate your data in a subset of the machines, and that you do it in a smart way, such that you are resolving consensus between a small number of machines. Let's thank our speakers again.