Parallel programming in Ruby3 with Guild
Formal Metadata
Number of Parts | 66 | |
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/46604 (DOI) | |
Ruby Conference 2018, 32 / 66
Transcript: English (auto-generated)
00:18
It is time to talk about parallel programming in Ruby 3 with Guild. As you can see, my
00:28
English is not so good, so you can download this presentation. I posted the link on Twitter, so please check Twitter. Today I want to talk about these items. So,
00:46
first, I want to introduce the Ruby 2.6 performance improvements, not the JIT but some contributions of mine. Then I will introduce the main topic, Guild. I proposed
01:03
the Guild idea at RubyConf 2016, and I want to share the current progress of Guild with you. I'm a programmer. I have changed my job several times, but I haven't changed my mission: to improve
01:27
the performance of the Ruby virtual machine. Right now I'm a member of Cookpad, so please stop by our Cookpad booth at this conference. I'm also the father of the youngest
01:46
attendee of Rails Girls Tokyo. Thank you. Very cute. This is the only photo in this
02:02
presentation. Okay. In Ruby 2.6, there are several performance improvements. Maybe the biggest one is the JIT for Ruby, but there are other optimizations which will affect
02:25
your application. For example, Proc#call is 1.4 times faster, and calling a passed block
02:41
is 2.6 times faster than in older Ruby versions. And the biggest contribution... thank you. This is not such a big feature. I will show you a bigger feature, the transient heap. The transient heap is a new memory management mechanism. I won't show the details of Ruby's
03:05
memory management, but Ruby uses a combination of malloc and free function calls to allocate and free memory. This malloc and free combination has several performance issues
03:22
in speed and space because of fragmentation. To solve these malloc issues, I introduced the transient heap. I won't show the details of the transient heap either, because
03:42
I have only 40 minutes, and this talk is not about the transient heap. I only want to share the concept. The transient heap uses copying
04:01
and generational garbage collection techniques. Generally speaking, we can't use memory-moving techniques because we use conservative garbage collection. But I introduced this moving technique with some limitations and MRI-specific hacks, to keep compatibility
04:27
with existing code. With the transient heap, we can speed up allocation of young memory. Here, young memory means memory that is not long-lived. So if you allocate
04:43
some memory and free it immediately, it is young memory. Currently we support the transient heap for Array, Object, Struct, and Hash objects with a small number of elements.
05:01
String is a big target for the transient heap, but it is too difficult to support, so it is a future task. This shows how fast array creation and garbage collection are with the transient heap compared with the version without it. So you can see
05:30
the speed-up compared with no transient heap. You can see no performance improvement for zero to three elements. This is because the array has only zero to three elements.
05:45
In that case we use another optimization technique, so we don't use the transient heap in this range. But we can see the performance improvement for four or more elements. So you can
06:02
see it is about 50% faster. We can also see that hash allocation gets maybe a 50% or 100% performance improvement for hash objects with a small number of elements.
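The benchmark behind these charts is not part of the transcript. As a rough sketch of that kind of measurement, the script below times the creation of short-lived arrays and hashes for a few element counts around the thresholds mentioned in the talk. The helper names (`measure`, `small_hash`, `allocation_timings`), the element counts, and the iteration count are assumptions made for illustration; to compare, one would run the same script under Ruby 2.5 and Ruby 2.6.

```ruby
# Time `iterations` runs of the given block with the monotonic clock (seconds).
def measure(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Build a short-lived Hash with `size` entries.
def small_hash(size)
  hash = {}
  size.times { |i| hash[i] = i }
  hash
end

# For each element count, time the creation of short-lived arrays and hashes.
# The objects are dropped immediately, so they are "young" in the talk's sense.
def allocation_timings(sizes, iterations: 50_000)
  sizes.each_with_object({}) do |size, timings|
    timings[size] = {
      array: measure(iterations) { Array.new(size, 0) },
      hash:  measure(iterations) { small_hash(size) }
    }
  end
end

allocation_timings([0, 3, 4, 8, 16]).each do |size, t|
  puts format("%2d elements: array %.4fs, hash %.4fs", size, t[:array], t[:hash])
end
```

The absolute numbers depend on the machine and the Ruby version, so only the ratio between two Ruby versions on the same machine is meaningful.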
06:23
Over eight elements, there is no speed-up because the transient heap is not used for such big hash objects. Let's summarize the transient heap in Ruby 2.6. The transient heap is a new memory management hack to improve performance. It is a generational technique with MRI-specific
06:45
hacks to keep compatibility with existing code. With the transient heap, you can improve your application's performance. You can. I don't say your application will improve, but I want to say it can. Micro-benchmarks
07:06
show good performance results. But I can't see any performance improvement on this benchmark, so maybe it depends on the application. Please try it. You can try it now. Okay.
07:22
This talk is not about the transient heap but about Guild, so back to the main topic. This is a short summary of this talk. Guild is a new concurrency abstraction for Ruby 3 that enforces not sharing mutable objects. The Guild specification is not fixed yet. Also,
07:47
the Guild implementation is not finished. We have only a buggy implementation, so your comments and contributions are highly welcome. If you have any interest in concurrent
08:03
programming with Guild or something like that, I'm happy. This is one purpose of this talk. Let's talk about the background of Guild. There are two motivations for Guild. One
08:23
is productivity. So yesterday productivity is the biggest concern. I think so. My opinion is that it is very, very difficult to write a thread-safe program with threads. There are
08:43
many reasons for the difficulty of thread programming, but the biggest problem, I think, is sharing mutable objects between threads. It makes it very difficult to write correct, thread-safe concurrent programs. Ruby is a very productive language, I think.
09:08
And maybe you think so too. I would be happy if we could write concurrent
09:21
programs very easily, and I want to achieve this productivity. The second one is performance through parallel computing. Your computer now has many, many cores, so you need to utilize those cores. But the current MRI does not provide a way to utilize so many cores. So I want
09:52
to introduce a convenient way to utilize multiple cores. To achieve these goals, I proposed Guild, a new concurrency abstraction, at RubyConf 2016, two years
10:09
ago. The idea is very simple. As I said, the difficulty of thread programming is sharing mutable objects
10:27
between threads. So the idea is very simple: Guild prohibits sharing mutable objects between guilds. I want to replace threads with guilds because of
10:40
this productivity. Next is the design. I'll share the current Guild design and the current discussion topics. A Ruby interpreter can manage multiple guilds. A guild has at least one thread. When we start a Ruby interpreter, there is one guild,
11:03
which runs one thread. Threads in one guild cannot run in parallel, because there is a giant lock in each guild. However, threads that belong to different guilds
11:21
can run in parallel. So if we make multiple guilds, we can do parallel programming in Ruby. This is a simple example that makes two guilds. This is very similar,
11:44
I think, to thread programming in Ruby. You can pass a block to the Guild method, and each block runs in parallel in its own guild. So in this case, expression one and expression two run in parallel. Guild prohibits sharing mutable objects. So
12:07
normal strings, arrays, hashes, and so on: many Ruby objects are mutable. Still,
12:20
there are several kinds of objects we can share between guilds. We call this kind of object a shareable object, and we call all other objects non-shareable objects. By separating these two kinds
12:41
of objects, we can enjoy ordinary programming with mutable state, without thread-safety concerns, inside one guild, because we can't share mutable objects between guilds. In other words, you can't write a thread-unsafe program with guilds. A thread-unsafe program or a data race is something
13:09
you can't write. Generally, most objects are thread-local in ordinary concurrent programs.
13:21
Only a few objects are shared, so we only need to concentrate on those shared objects to write concurrent programs, I think. This page is about non-shareable objects. Non-shareable objects are most objects. Ruby programs
13:49
make many, many strings, arrays, hashes, and so on, and they are mutable, non-shareable objects. A non-shareable object is a member of one guild, and other guilds can't access
14:02
its non-shareable objects easily. If you use only one guild, it is very compatible with Ruby 2, the current version of Ruby. So you can make a compatible
14:24
program for Ruby 2 and Ruby 3 very easily. We define four types of shareable objects. In other words, objects other than these four types are non-shareable objects.
14:42
The four types of shareable objects are immutable objects, class and module objects, special mutable objects, and isolated block objects. An important assumption is that shareable objects refer only to shareable objects. If a shareable object could refer to
15:02
(point to) a non-shareable object, we could share that non-shareable object between guilds. To prevent such a danger, we need to keep this assumption. The first shareable type is the immutable object. Sharing immutable objects is no problem, with no data race issue,
15:27
because we can't mutate this kind of object. I think it is easy to understand. But one difficulty is that an immutable object is not the same as a frozen object. For
15:45
example, this array a1 is an immutable object because it is frozen and it also points only to immutable objects. But this array a2 is not
16:07
immutable. The array is frozen, but it refers to a mutable object created by Object.new. We need to take care of that, so maybe we need to introduce some
16:22
deep-freeze syntax or method. Numeric objects, symbol objects, and some literal objects are also immutable, so you can share them easily. Frozen string
16:41
objects are immutable if they don't have instance variables. Class and module objects are also shareable objects. This is harder to understand, and we need to introduce a more carefully considered protocol to share class
17:07
and module objects. But class and module objects probably should be shareable, because all objects refer to their own classes, and classes can refer to module objects. So sharing class and module objects is straightforward,
17:25
and it makes some kinds of programs easy to write. However, this idea has a disadvantage: we need to introduce a new protocol for referring to non-shareable objects from classes and modules. This is because classes and modules can refer to
17:44
mutable objects through class variables, constants, or the instance variables of class and module objects. So we need to introduce some special protocol to prohibit such sharing of mutable objects. I will skip this discussion in this presentation
18:05
because there is no time to discuss it further. The third shareable type is the special mutable object. Sometimes we need to share a data structure, such as a shared array or a shared hash object or something like
18:24
that. To share such data structures, we introduce special mutable objects. "Special" means we need some special protocol to access the contents of these objects, for example, correct locking
18:45
or transactions and so on. I haven't implemented it yet, but it is needed for some kinds of programs. Take Clojure, for example. The Clojure
19:01
language has shared mutable data structures using software transactional memory (STM). So that is one strong option to implement. Compared with normal objects like arrays, hashes, and so on,
19:21
special mutable objects introduce additional overhead because we need some protocol, such as transactions, as I said. But I think this kind of overhead is not a problem because, as I said, we need only a few shareable objects in
19:44
ordinary concurrent programs. The last one is the isolated block. Block objects can refer to outer local variables. For example, this block uses a local variable a, and
20:08
this local variable is defined up here, so it is an outer local variable, and it refers to a mutable object. This means that the block
20:21
object can't be a shareable object. But sometimes we want to pass a block object to another guild. To achieve this, we introduce a block isolate method. The block isolate
20:45
method makes a duplicate of the block, named an isolated block. An isolated block can't access outer local variables. So if we make an isolated block, this access to the outer local variable is prohibited.
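The slide code is not in the transcript, and the block isolate method described here cannot be run on a released Ruby. As a plain-Ruby sketch of the problem it addresses, the first method below shows a block that captures an outer local variable and mutates the object behind it from another thread. The second shows the style an isolated block would force, where the block touches only what it receives as a parameter. The method names `capture_demo` and `isolated_style_demo` are hypothetical.

```ruby
# A block that captures an outer local variable shares the object behind it.
def capture_demo
  a = []                        # outer local variable, refers to a mutable Array
  blk = -> { a << :touched }    # the block captures `a`
  Thread.new { blk.call }.join  # run the block on another thread
  a                             # the creator observes the mutation
end

# The style an isolated block would enforce: no outer locals, only parameters.
def isolated_style_demo
  blk = ->(items) { items + [:touched] }  # builds a new Array, mutates nothing
  original = [].freeze
  result = Thread.new { blk.call(original) }.value
  [original, result]
end

p capture_demo
p isolated_style_demo
```

In the first case the array is silently shared between two threads. In the second, the frozen input stays untouched and the result is a separate object.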
21:05
So this line raises a runtime error. In this way we can introduce a shareable kind of object with block isolate. In the small example I showed before, the block we pass
21:29
is transformed into an isolated block implicitly. In this case, on this line, this local variable g1 is an outer local
21:44
variable. So in this case, it raises a runtime error because the block is transformed into an isolated block. Okay, I have shown the kinds of shareable objects. Here is some additional information.
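The difference between frozen and immutable described for the arrays a1 and a2 can be sketched in plain Ruby. The helper `deeply_frozen?` below is hypothetical and is not the deep-freeze method the talk proposes. It only walks Array elements, Hash keys and values, and instance variables, so it is an approximation of "immutable" under those assumptions.

```ruby
# Hypothetical helper: true when obj and everything reachable from it
# (Array elements, Hash keys/values, instance variables) is frozen.
def deeply_frozen?(obj, seen = {})
  return true if seen[obj.object_id]   # already visited (handles cycles)
  seen[obj.object_id] = true
  return false unless obj.frozen?

  children =
    case obj
    when Array then obj
    when Hash  then obj.keys + obj.values
    else []
    end
  ivars = obj.instance_variables.map { |name| obj.instance_variable_get(name) }
  (children + ivars).all? { |child| deeply_frozen?(child, seen) }
end

A1 = [1, :sym, "str".freeze].freeze   # frozen, and refers only to immutable objects
A2 = [1, Object.new].freeze           # frozen, but refers to a mutable object

puts "A1: frozen=#{A1.frozen?} immutable=#{deeply_frozen?(A1)}"
puts "A2: frozen=#{A2.frozen?} immutable=#{deeply_frozen?(A2)}"
```

Both arrays answer true to `frozen?`, but only A1 passes the deep check, which is the distinction the talk draws.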
22:09
Other languages use similar ideas. There are several languages using ideas similar to Guild. That is, these languages limit shared
22:22
state or use a shared-nothing model: the Racket programming language, Kotlin/Native, shell scripts, JavaScript, and Erlang and Elixir processes. The names and the models are different, but they use a similar idea. So I
22:46
think this approach is not so wrong. Okay, we need to prepare an inter-guild communication API. I
23:10
have designed it based on the actor model at this moment. The
23:23
destination address is represented by the guild object itself, like Erlang or Elixir processes. Sending shareable objects means sending only references to the objects, so it is very
23:42
lightweight. If we send a non-shareable object such as an array, we have two methods: copy and move. This is a very simple server-client program using the inter-guild communication API.
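The slide with this program is not included in the transcript, and the Guild API was not finalized at the time of the talk. The following is a minimal stand-in for the same server-client shape, assuming ordinary threads and `Queue` in place of guilds and the inter-guild API. `FakeGuild`, `send_message`, `receive`, and `ask_fib` are hypothetical names, and the Fibonacci calculation stands in for the unspecified "calculates something" step. Because it uses threads, it shows only the message flow, not parallel execution.

```ruby
# Stand-in for a guild: a thread with its own inbox (NOT the real Guild API).
class FakeGuild
  def initialize(&body)
    @inbox = Queue.new
    @thread = Thread.new { body.call(self) }
  end

  # Deliver an object to this guild's inbox.
  def send_message(obj)
    @inbox << obj
  end

  # Block until a message arrives, then return it.
  def receive
    @inbox.pop
  end

  def join
    @thread.join
  end
end

def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Client side: send a number to a server "guild" and wait for the reply.
def ask_fib(n)
  reply_box = Queue.new
  server = FakeGuild.new do |me|
    reply_to, value = me.receive   # server receives the request
    reply_to << fib(value)         # and sends the result back
  end
  server.send_message([reply_box, n])
  result = reply_box.pop
  server.join
  result
end

puts ask_fib(30)
```

The reply queue is passed along with the request so the server knows where to answer, which mirrors the idea that the destination is an object that travels with the message.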
24:11
This line shows how to send an object to guild one. In this case, the numeric object 30 is sent to guild
24:25
one, and in guild one this line receives the object, the number 30, calculates something, and this line sends back the result of
24:42
the calculation. So this is a very simple example of the server-client model with guilds. I said that we can't share non-shareable objects. So if
25:05
we want to send a non-shareable object, we need to do something. One method is copy. Copy is very easy to understand. If you want to copy
25:21
object one, then object one and its child objects are copied to guild two. The point is that we need to copy everything that can be traversed from object one. The move semantics is
25:44
somewhat more difficult. If we move object one to another guild, we can no longer access it from guild one. In this case, if we move object one, just after
26:01
sending, guild two can access object one and its child objects, two and three, but guild one can't access the sent objects. It is faster than copy semantics because we don't need to copy everything. Move semantics is suitable
26:23
for huge strings or data, or for IO objects. For example, a master guild makes a socket object, an IO object, with the accept method and moves it to a worker guild. Then
26:41
the worker guild receives the request and sends the response in parallel, and after that the master guild doesn't need to access the socket anymore. This is a summary of shareable and non-shareable objects. It shows details, but the important point
27:04
is that we can't share mutable objects. In a threaded program we can share some objects accidentally, and that becomes a bug. But with
27:20
guilds we can write correct, data-race-free concurrent programs. Here is one discussion topic. I said that I designed
27:41
the communication API based on the actor model, but we have another option. Here the actor is a guild, or a process in Erlang and Elixir. But some other languages, like Go, JavaScript, Kotlin/Native, or Racket,
28:03
use a CSP model. CSP means communicating sequential processes. I looked up this long name on Wikipedia. In this case we manipulate channel objects explicitly, and, for example, we can transfer the
28:24
channel object to other guilds and so on. So we have two options, I think, and both have advantages and disadvantages. We need to compare the pros and cons and decide which one we
28:45
need to use. In fact, I first introduced the Guild idea with the CSP model, using channels explicitly, but right now I think the actor model is more suitable
29:01
for Ruby, but we need to consider it more. Another difficult topic is how to receive from multiple channels. Sometimes we need to handle multiple channels because, for example, we want to use a data channel and a control channel, or we want
29:22
to monitor multiple guilds, so we need to communicate over multiple channels. But the API design is very difficult, I think. We can emulate multiple channels with a programming technique on top of one communication
29:40
channel, but that is a bit difficult and tough for Ruby programmers. So I think it would be nice if we could provide a good API. For example, the Go programming language has a select
30:02
syntax, so we can wait on and receive from multiple channels. In the case of Erlang or Elixir, we can handle multiple channels with pattern matching. Also, with
30:25
JavaScript workers, we need to register a handler on the communication message channel, and Racket places also use event handling with special methods, special functions.
30:44
So there are these kinds of designs in other languages. For example, in Ruby, on the actor model, we could introduce a tag to specify the channel and receive with specific tags, like that.
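The tagged-receive idea is only mentioned as a possibility, and no API is given in the talk. One way to picture it, under that caveat, is a single mailbox whose `receive` takes a tag and sets aside messages carrying other tags until someone asks for them. `TaggedMailbox` and its methods are hypothetical, and the sketch assumes a single receiver, since the pending list is not protected by a lock.

```ruby
# Hypothetical sketch: one mailbox, several logical channels selected by tag.
# Assumes exactly one receiving thread (the pending list is not synchronized).
class TaggedMailbox
  def initialize
    @queue = Queue.new
    @pending = []   # messages popped while waiting for a different tag
  end

  # Any thread may send a [tag, payload] pair.
  def send_message(tag, payload)
    @queue << [tag, payload]
    self
  end

  # Return the oldest payload with the given tag, blocking until one arrives.
  def receive(tag)
    index = @pending.index { |t, _| t == tag }
    return @pending.delete_at(index).last if index

    loop do
      t, payload = @queue.pop
      return payload if t == tag
      @pending << [t, payload]   # keep it for a later receive with that tag
    end
  end
end

mailbox = TaggedMailbox.new
mailbox.send_message(:data, 1).send_message(:control, :stop).send_message(:data, 2)
p mailbox.receive(:control)
p mailbox.receive(:data)
p mailbox.receive(:data)
```

This keeps a single communication channel per receiver, as in the actor model, while still letting a data stream and a control stream be read independently.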
31:03
I won't explain this in detail, but I want to share that I'm thinking about it right now. If you have any ideas or comments, that would be very nice for me. So this is the channel case.
31:25
Okay. Now I want to introduce the implementation of Guild. It is only a preliminary implementation, but you can access it here.
31:43
But it has many, many bugs, so if you run some programs you may see a segmentation fault or something like that. We introduced a special context between the VM and threads.
32:02
We also need to introduce many, many fine-grained synchronizations. This means that we need to do race-free thread programming ourselves. And multi-threaded programming
32:22
is very, very difficult. This is why my current implementation has many, many bugs. Garbage collection is also a big issue. In the current implementation, we stop all of the guilds and then run the garbage collection process.
32:41
This means that it has only one object space. So we have a preliminary implementation, but we need to do much more: fix garbage collection bugs and introduce some features.
33:01
That is, prohibit sharing of non-shareable objects, introduce synchronization to protect VM-wide resources such as global variables, and introduce the shareable object protocols. Performance is also not so good, so we need to improve it.
33:22
I have many ideas for improving the performance because I have made some trials before. For example, this paper shows how to run threads in parallel on MRI. I think we can introduce
33:43
some optimization techniques from this paper into the current Guild implementation. Another topic is naming. In the Ruby world, naming is important. The current name, Guild, is a code name,
34:03
and we have some reasons why we chose Guild. But some people say Guild is not such a good name, so we are reconsidering it. I want to share that Guild is a code name
34:23
and we are looking for good new names. Now I want to show a demonstration of the current preliminary implementation. I prepared a machine with 40 virtual CPUs,
34:45
which means CPUs with 10 cores and two hyper-threads per core, and we have two of these CPUs. So in total there are 40 virtual CPUs.
35:05
The workload here is calculating Fibonacci numbers many times. The serial version of the computation is this line. It is very simple, I think. For the guild version, I made this kind of master-worker
35:26
model, so we can increase the number of workers in this case. The program is here. It is very long, but we can introduce a framework for such a common pattern, I think.
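The demo program itself is not in the transcript. The shape of a master-worker model for this workload can be sketched with threads and queues standing in for guilds and inter-guild messages. `parallel_fib` and its `workers:` parameter are hypothetical names. Under MRI's global lock these threads do not run the calculation in parallel, so the sketch shows only the structure, not the speed-up reported in the talk.

```ruby
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Master-worker sketch: the master enqueues tasks, workers compute and reply.
# Threads and Queues stand in for guilds and inter-guild messages.
def parallel_fib(inputs, workers: 4)
  tasks = Queue.new
  results = Queue.new

  inputs.each_with_index { |n, index| tasks << [index, n] }
  workers.times { tasks << :done }   # one stop marker per worker

  pool = workers.times.map do
    Thread.new do
      while (task = tasks.pop) != :done
        index, n = task
        results << [index, fib(n)]
      end
    end
  end
  pool.each(&:join)

  # Collect the results and restore the input order.
  Array.new(inputs.size) { results.pop }.sort_by(&:first).map(&:last)
end

p parallel_fib([20, 21, 22, 23], workers: 2)
```

Tagging each task with its index lets the master reassemble results in input order even though workers finish in an arbitrary order.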
35:44
So, running this example: as we increase the number of guilds, we can see the speed-up compared with sequential, serial execution. The maximum improvement is maybe 15 or 16 times
36:03
faster on 40 virtual CPUs. This performance improvement is good, I think, in this case. The next one fixes the number of guilds at 40
36:20
and changes the workload, the number n. In this case, we cannot see a performance improvement here. Calculating the Fibonacci numbers, the y-axis
36:41
shows execution time. Serial execution requires about three hours, but using guilds, we need only 30 minutes. So it is a much faster example, I think. This is the speed-up ratio.
37:01
If we calculate only a small Fibonacci workload, the overhead of guilds is very high, so there is no performance improvement here. But if we increase the task workload,
37:21
we can see the performance improvement. The last demonstration is a word count example. We made a similar framework, a similar master-worker model with guilds. And the result is very, very slow with 40 guilds.
37:48
The y-axis is execution time, so higher is worse. Serial execution requires only 1.7 seconds, but with 40 guilds, it requires six seconds.
38:06
It's a very bad result. This is because object allocation and GC require naive locking. This is only a limitation of the current implementation, I think, so we need to improve it. We can solve this slowdown, I think.
38:27
OK. Today's talk was about the Ruby 2.6 updates I contributed, and I introduced Guild: the idea of Guild,
38:41
the discussion topics, the implementation, and a demonstration. There is no time for a Q&A session, so if you have any questions or comments, I'm happy to meet with you. Thank you so much.