Optimizing GELI Performance

Video in TIB AV-Portal: Optimizing GELI Performance

Formal Metadata

Optimizing GELI Performance
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Features, like encryption, need to have minimal overhead for them to be widely adopted. If the performance is too slow, few people will use it. The first iteration of AES-XTS using AES-NI in FreeBSD was not much faster than the software version of it. The talk will describe why the AES-XTS algorithm was slow and what was done to improve it. It will cover topics from intrinsics, adding them to GCC, and the advantages of using them over assembly, to how to use the HWPC that are included in most modern processors to evaluate performance and identify performance bottlenecks. Optimizing code first starts with measuring the performance, but it also requires you to understand the parts of the system so that you can decide if increasing performance is possible. It will cover: 1) Cipher modes and their performance impact 2) Processor performance: understanding pipelining, throughput and latency 3) Other SSE instructions used for XTS tweak factor calculations 4) Intrinsics and their use with GCC and Clang 5) Using pmcstat and kcachegrind to understand performance 6) Possible future work to increase the performance beyond what it is today.
So how this first started: with all the recent NSA wiretapping and stuff like that, I decided I would prefer to have my data encrypted on the hard drive. So I decided to enable encryption on my server. I had bought a six-core server and deployed GELI encryption, and the performance was very, very slow: only about 150 megabytes per second. This was on spindles, with six cores, and all six cores were completely saturated. It was a modern, two-point-something gigahertz processor, so I should have been getting better performance than that. Completely unacceptable. My first deployment was using software encryption. After doing some research on what I had purchased, I knew that it had the AES-NI instructions, and I figured that was the way to make it faster. But after some investigation it turned out that enabling AES-NI gave almost no improvement. A company I had previously worked for actually did a review of these instructions, so I knew that the performance I should get from using them should be significantly faster than software. I was not happy. Yes, the file system has overhead, and there are a lot of other things that can impact performance, but 150 megabytes per second was just unacceptable. And obviously, if it is slow, people will not use it: if they need two gigabytes per second into the storage array and you can only give them a hundred megabytes per second, they are going to turn off encryption. It is just unacceptable to be slow. Along the way I also did a little bit of work to try to make the code more maintainable. One of the main encryption algorithms used in GELI is AES, so I'll go over what AES is.
AES is a basic block cipher. It takes in a 16-byte input, permutes the data, and generates another 16 bytes. That map is a one-to-one mapping, and AES is what is called a keyed pseudorandom permutation (PRP). That means there is always a one-to-one mapping from the input to the output, which matters for a cipher: if it were like a PRF, which is what most hash functions are, you would not be able to reverse it. The basic structure is that you first do an AddRoundKey, which is a simple XOR with the first 16 bytes of the key schedule. Each additional round after that is a combination of SubBytes, which is a simple byte-by-byte lookup in a table of 256 entries, then ShiftRows, MixColumns, and AddRoundKey again. Your key size determines the number of rounds. The final round is slightly different in that MixColumns is not done.
There are different things that can affect the performance of AES. One of them is the cipher mode. How many people have seen the ECB penguin pictures? Basically, they took the Linux penguin image and encrypted it in ECB mode. When you view the result, yes, all the colours are wrong, but you can still make out the penguin. That is because the common colours in the picture, like the solid background, all encrypt to the same value, so you can still see the shape. So ECB is not a mode you want to use for data. People ask why crypto libraries include ECB mode at all; well, it is also useful for building other modes, and there are a couple of different other ones. Since we are talking about disk encryption, another thing that becomes important is sector size, because every time you process a sector there is a certain fixed overhead, and by increasing the sector size you spread that overhead over more data and can do better. And then obviously key size: a larger key size means more rounds, and things end up being slower.
One of the things I looked at, since it was slow, was benchmarking OpenSSL. These are the performance numbers I was getting for the various modes, and I'll talk about the reasons for the differences. This is the performance with AES-NI enabled. By default, if you just run openssl speed aes-128-cbc, it uses the built-in assembly, non-AES-NI version, so I needed to pass -evp to actually get the real performance. This demonstrates that different cipher modes can give dramatically different performance. It also relates to the different sector sizes: if you use a really small sector size, the overhead becomes too great and you do not get much performance. GELI has both the CBC and the XTS modes implemented. As we'll see next, XTS is much better suited than CBC mode when using AES-NI.
Next, the reason why there is a big performance degradation in CBC mode is that you cannot pipeline CBC while you encrypt. The reason you cannot pipeline CBC is that after each block encryption you need the output ciphertext to XOR onto the next plaintext in order to derive the next block. Whereas, as you can see, with XTS (thanks to Wikipedia for the diagram) the tweak input is commonly, in this case, the sector number. It runs the sector number through another block cipher encryption, and then this is a simple Galois field multiply by alpha that steps the tweak for each block. That tweak is XORed with the plaintext, the result is encrypted, and it is XORed with the tweak factor again in order to generate the ciphertext. Each of these block cipher encryption or decryption blocks, depending on which way you are going, is now independent, and the Galois field step can be made very cheap. Because of this we can now actually pipeline AES encryption and decryption and achieve some great gains.
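The XTS flow described above can be sketched in C as follows. This is an illustrative sketch, not the GELI or FreeBSD code: the names xts_encrypt_sector, xts_next_tweak and block_encrypt_fn are made up for the example, the little-endian encoding of the sector number into the tweak is an assumption, and toy_encrypt is a deliberately trivial stand-in for AES that exists only so the sketch can run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XTS_BLOCK 16

/* Hypothetical callback: encrypt one 16-byte block in place with a key. */
typedef void (*block_encrypt_fn)(const uint8_t *key, uint8_t block[XTS_BLOCK]);

/* Multiply the tweak by alpha (that is, by 2) in GF(2^128), little-endian. */
static void
xts_next_tweak(uint8_t t[XTS_BLOCK])
{
	uint8_t carry = 0;

	for (int i = 0; i < XTS_BLOCK; i++) {
		uint8_t out = t[i] >> 7;	/* bit shifted out of this byte */
		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = out;
	}
	if (carry)
		t[0] ^= 0x87;			/* reduction step */
}

static void
xor_block(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < XTS_BLOCK; i++)
		dst[i] ^= src[i];
}

/* Encrypt one sector in place; len must be a multiple of 16 in this sketch. */
static void
xts_encrypt_sector(block_encrypt_fn enc, const uint8_t *data_key,
    const uint8_t *tweak_key, uint64_t sector, uint8_t *buf, size_t len)
{
	uint8_t tweak[XTS_BLOCK] = { 0 };

	assert(len % XTS_BLOCK == 0);
	for (int i = 0; i < 8; i++)		/* assumed little-endian layout */
		tweak[i] = (uint8_t)(sector >> (8 * i));
	enc(tweak_key, tweak);			/* initial tweak for the sector */

	for (size_t off = 0; off < len; off += XTS_BLOCK) {
		xor_block(buf + off, tweak);	/* plaintext XOR tweak */
		enc(data_key, buf + off);	/* independent of other blocks */
		xor_block(buf + off, tweak);	/* XOR tweak again: ciphertext */
		xts_next_tweak(tweak);		/* step the tweak by alpha */
	}
}

/* Toy stand-in so the sketch can run: NOT AES and NOT secure. */
static void
toy_encrypt(const uint8_t *key, uint8_t block[XTS_BLOCK])
{
	for (int i = 0; i < XTS_BLOCK; i++)
		block[i] = (uint8_t)((block[i] ^ key[i]) + 1);
}
```

Because every block only needs its own tweak value, the calls to enc inside the loop do not depend on each other's output, which is what makes pipelining possible. In CBC each call would need the previous ciphertext block first.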
The other mode that I mentioned in the numbers, counter mode, is similar to XTS.
As for the tweak: it could theoretically be predicted, but the block cipher used to derive the tweak uses a key that the attacker does not know. The other thing is that even if they have both the plaintext and the ciphertext, they do not know the output of the block encryption in the middle, so they cannot really determine the tweak factor. So even with both the plaintext and the ciphertext in hand, it would be very difficult to get the tweak factor. It also does not matter too much, because the real protection is the AES encryption itself; the tweak factor is just there to prevent the standard ECB-style attack. With CBC, when you decrypt, if you corrupt the ciphertext it only affects the current block and the next block and nothing after that, because once the following ciphertext block is a correct one, decryption of all the blocks after it comes out fine again. So even if the attacker were to corrupt the ciphertext, only two blocks of data would be corrupted.
I have to bring in counter mode because it is commonly used, but you also have to be very careful when using counter mode, because if you use the same counter again, very bad things will happen. An attacker can simply take your two output ciphertexts, XOR them together, and get the XOR difference between your plaintexts. That is not good at all, especially if the attacker controls one of the two plaintexts, because now they can generate the data and say: I know this plaintext, so I take the two encrypted texts, XOR the two together, XOR my known plaintext back out, and now I have the other plaintext. Luckily, GELI does not implement counter mode directly.
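The counter-reuse problem can be shown with a few lines of C. This is a minimal sketch under stated assumptions: the keystream bytes are arbitrary placeholders standing in for what a cipher in counter mode would produce, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_LEN 8

/* out = a XOR b, byte by byte. */
static void
xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = a[i] ^ b[i];
}

/*
 * Returns 1 if an attacker who knows one plaintext recovers the other
 * when both were encrypted with the same keystream.
 */
static int
counter_reuse_leaks_plaintext(void)
{
	/* Placeholder bytes standing in for the counter-mode keystream. */
	static const uint8_t keystream[MSG_LEN] =
	    { 0x3a, 0xc5, 0x91, 0x0e, 0x77, 0xd2, 0x48, 0xb6 };
	static const uint8_t known[MSG_LEN] =
	    { 'a', 't', 't', 'a', 'c', 'k', 'e', 'r' };
	static const uint8_t secret[MSG_LEN] =
	    { 'p', 'a', 's', 's', 'w', 'o', 'r', 'd' };
	uint8_t c1[MSG_LEN], c2[MSG_LEN], diff[MSG_LEN], recovered[MSG_LEN];

	xor_bytes(c1, known, keystream, MSG_LEN);	/* first use of the counter */
	xor_bytes(c2, secret, keystream, MSG_LEN);	/* same counter reused */

	/* Attacker side: only c1, c2 and the known plaintext are used below. */
	xor_bytes(diff, c1, c2, MSG_LEN);		/* keystream cancels out */
	xor_bytes(recovered, diff, known, MSG_LEN);	/* known plaintext removed */

	return memcmp(recovered, secret, MSG_LEN) == 0;
}
```

The attacker never touches the keystream or the key; the two ciphertexts and one known plaintext are enough.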
As I mentioned, instruction pipelining is very important to how AES-NI achieves its performance. I was very puzzled as to why the original FreeBSD AES-NI XTS encryption did not perform well, so I decided to look at it. One of the big problems was that it was only processing one block at a time. Depending on which architecture you are talking about, the latencies on these instructions range from four to about seven cycles, and some of the architectures have an inverse throughput of about one to two cycles per instruction. As this diagram shows, assuming a seven-cycle latency with an instruction dispatched every cycle, we now get the blocks through in far fewer clock cycles than if we had done them serially; we get about three times as much work done with just a very marginal increase in time. The Intel paper actually talks about this, though I'm sure processors have improved significantly since that paper was written. The original Intel paper recommended eight blocks, which is what I ended up implementing for the pipeline. I think on most modern processors four may actually be enough, because almost all of them have around a four-cycle latency with one instruction issued per cycle. So that gives you a huge boost.
I did not measure every intermediate step, but when I finally did all my changes, I pulled the original XTS algorithm out into userland so that I could get rid of all the kernel overhead and not have to recompile kernels and all of that stuff. The original algorithm was performing at about 150 megabytes per second in userland. My original pipelining boosted that to over a gigabyte per second, and then with some other tweaks I got that improved to over 2 gigabytes per second in userland on the same machine. So just the pipelining part showed a significant improvement. I also used pmcstat, which I'll talk about in the second half, to measure performance, but I did not use any of the really crazy counters that they have, like the number of cycles where you are not actually dispatching instructions. It would be interesting to see them.
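The difference between one block at a time and eight interleaved blocks can be sketched with the AES-NI intrinsics. This is an illustration, not the FreeBSD code. It assumes an x86 CPU with AES-NI and a GCC or clang compiler (the target attribute is used here so that -maes is not needed on the command line), the function names are made up, and the round keys in the self-check are arbitrary placeholder values rather than a real expanded AES key. Whether the compiler actually keeps all eight blocks in registers depends on the optimization level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

#define AES128_ROUNDS 10
/* Lets GCC and clang build this without -maes on the command line. */
#define AESNI_FN __attribute__((target("aes,sse2")))

/* One block: every aesenc has to wait for the previous round's result. */
static AESNI_FN __m128i
aes_encrypt_one(const __m128i rk[AES128_ROUNDS + 1], __m128i blk)
{
	blk = _mm_xor_si128(blk, rk[0]);
	for (int r = 1; r < AES128_ROUNDS; r++)
		blk = _mm_aesenc_si128(blk, rk[r]);
	return _mm_aesenclast_si128(blk, rk[AES128_ROUNDS]);
}

/* Eight independent blocks: within a round, no aesenc depends on another. */
static AESNI_FN void
aes_encrypt_eight(const __m128i rk[AES128_ROUNDS + 1], const uint8_t *in,
    uint8_t *out)
{
	__m128i b[8];

	for (int i = 0; i < 8; i++)
		b[i] = _mm_xor_si128(
		    _mm_loadu_si128((const __m128i *)(in + 16 * i)), rk[0]);
	for (int r = 1; r < AES128_ROUNDS; r++)
		for (int i = 0; i < 8; i++)
			b[i] = _mm_aesenc_si128(b[i], rk[r]);
	for (int i = 0; i < 8; i++)
		_mm_storeu_si128((__m128i *)(out + 16 * i),
		    _mm_aesenclast_si128(b[i], rk[AES128_ROUNDS]));
}

/* Self-check: the interleaved version must match one-block-at-a-time. */
static AESNI_FN int
pipelined_matches_serial(void)
{
	__m128i rk[AES128_ROUNDS + 1];
	uint8_t in[128], out[128], one[16];

	/* Arbitrary placeholder round keys, NOT a real expanded AES key. */
	for (int r = 0; r <= AES128_ROUNDS; r++)
		rk[r] = _mm_set1_epi8((char)(17 * r + 3));
	for (int i = 0; i < 128; i++)
		in[i] = (uint8_t)i;

	aes_encrypt_eight(rk, in, out);
	for (int i = 0; i < 8; i++) {
		__m128i c = aes_encrypt_one(rk,
		    _mm_loadu_si128((const __m128i *)(in + 16 * i)));
		_mm_storeu_si128((__m128i *)one, c);
		if (memcmp(one, out + 16 * i, 16) != 0)
			return 0;
	}
	return 1;
}
```

In an idealized model with a seven-cycle latency and one instruction issued per cycle, eight serial aesenc instructions for one round would take 8 x 7 = 56 cycles, while eight interleaved ones would finish after about 7 + 7 = 14 cycles. Real processors differ, so treat these figures as a rough illustration of why interleaving helps.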
The other part, as I mentioned: if we go back, you'll remember that with XTS,
between every block we have to compute the tweak factor. This is alpha, which is the primitive element of the Galois field and is pretty much always 2. In order to compute the next tweak we need to do that multiply, and the original code was not very efficient at doing that:
it was doing it a byte at a time and using a bunch of branches. Well, it works. Then pjd did a little bit of improvement where he switched to using 64-bit types, so now we only have one branch and are using fewer instructions. My friend Mike Hamburg, who I was talking with and who helped me a lot, came up with an interesting new way of computing the tweak factor using SSE instructions; since we are using AES-NI, we also have access to the SSE instructions. With his help, it turns out that we can do a simple Galois field multiply with SSE instructions. This is the code that we ended up using: we load up a constant, we do a shuffle, a shift right arithmetic by 31, we AND that with the mask, then we shift left by 1, and we XOR the results together.
To give this a little bit more of a diagram: what happens is that we shuffle the words around, which turns out to be a simple left rotate by one 32-bit word. Then we do the shift right arithmetic by 31, which basically splats the high bit across the whole word: depending on whether the high bit was a 0 or a 1, that word now contains all 0s or all 1s. Then we AND with the mask. Three of the mask words are ones, so when we AND with the 1 we have now carried the high bit from one word to the next word. The next part is what happens to the bit that is shifted out of the top. With the way Galois fields are implemented, if the high power being shifted out is a one, you need to XOR in the reduction, and the reduction step is XORing the constant 0x87 in. So if we end up shifting a high one out, this word comes out as 0x87; if it was a zero, it will be 0 and have no effect. Then we shift left, XOR these two together, and now we have the next tweak.
With the intrinsics, this compiles down to basically five instructions, as opposed to the tens of instructions before.
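The steps listed above (load a constant, shuffle, shift right arithmetic by 31, AND with the mask, shift left by 1, XOR) can be written with SSE2 intrinsics roughly as follows. This is a sketch that follows the description in the talk rather than a copy of the committed code: the function names are made up, the shuffle immediate 0x93 is chosen here because it rotates the four 32-bit words left by one, and the target attribute assumes GCC or clang on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>

#define XTS_ALPHA 0x87
#define SSE2_FN __attribute__((target("sse2")))

/* Multiply a 128-bit little-endian tweak by 2 in GF(2^128). */
static SSE2_FN __m128i
xts_next_tweak_sse2(__m128i t)
{
	/* Per-word masks: 0x87 for the wraparound, 1 for each inter-word carry. */
	const __m128i mask = _mm_set_epi32(1, 1, 1, XTS_ALPHA);
	__m128i carry;

	carry = _mm_shuffle_epi32(t, 0x93);	/* rotate left by one 32-bit word */
	carry = _mm_srai_epi32(carry, 31);	/* splat each word's high bit */
	carry = _mm_and_si128(carry, mask);	/* keep 0x87 or 1 where a bit fell out */
	t = _mm_slli_epi32(t, 1);		/* shift every word left by one */
	return _mm_xor_si128(t, carry);		/* fold carries and reduction back in */
}

/* Convenience wrapper working on a 16-byte buffer of any alignment. */
static SSE2_FN void
xts_next_tweak_bytes(uint8_t t[16])
{
	assert(t != NULL);
	__m128i v = _mm_loadu_si128((const __m128i *)t);
	_mm_storeu_si128((__m128i *)t, xts_next_tweak_sse2(v));
}
```

There are no branches: the arithmetic shift turns each high bit into an all-ones or all-zeros word, and the AND with the mask selects either the carry bit or the 0x87 reduction constant.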
One of the decisions that I made when I was developing the software was that I was going to use intrinsics. If you look at all the implementations out there, everybody is doing hand-written assembly. OpenSSL has a crazy Perl script to generate its assembly, and frankly I really do not want to deal with that. Importing it into FreeBSD would be a pain: we do not have Perl in base, so we would have to ship pre-generated assembly, keep it regenerated, and all that other stuff. So I went with intrinsics. Using intrinsics is almost like assembly, except that you are only specifying the key parts that you care about and the compiler does the rest. It does turn out that you do not get the same performance using intrinsics as you would with hand-written assembly, but the question is whether that degradation in performance is outweighed by the maintainability. Another interesting thing is that I now use the exact same code on both i386 and amd64 boxes and do not have to have different assembly source files, because the code compiles exactly the same; the data type used to represent the 128-bit values is exactly the same between them. If you look at the current AES-NI XTS code in FreeBSD, the only difference between i386 and amd64 right now is the key schedule. Another advantage of intrinsics is that they work around a limitation of the amd64 calling convention: only a limited number of arguments can be passed in registers, and any additional ones have to be passed on the stack. So even if you are calling a really hot function, like one doing 8 encryptions at a time, you end up spilling and possibly putting some of your arguments onto the stack, and return values have a similar problem.
On the question about inline assembly: the reason I am comparing against standalone assembly is that almost everybody uses it for these routines. Inline assembly would still have the same difficulties, and I have never actually tried to do it that way. A lot of the assembly implementations also do the entire algorithm in one piece and do not build it up from parts, whereas this way I have simple building blocks. With inline functions we are able to bypass that calling-convention limitation. The XTS code also had an interesting thing where it used a common crypt function and passed in whether it was the encrypt or the decrypt function; by using inline intrinsics I was able to have both encrypt and decrypt versions. As I mentioned, with assembly you also have to do the instruction scheduling yourself. One of the big problems that I discovered in my original code was that there were a few places where the buffers would actually be unaligned. Both clang and GCC, when you compile code that references the 128-bit data types, always issue a load that requires the data to be aligned, and it will fault if it is not. I am not quite sure why they choose the aligned instruction, because almost all processors execute the two almost exactly the same, and there is only a slight penalty for unaligned data. Another issue was that the GCC in the tree did not support these intrinsics.
At the time that I did this work, the GCC in FreeBSD was 4.2.1,
and the real problem was actually binutils, which did not support the AES-NI instructions either.
When I did all this work, GCC was still considered a supported compiler, even though i386 and amd64 had switched over to clang, and there were still a number of people who for various reasons chose to keep compiling with GCC. I did not want to be the person to leave them out in the cold. As I mentioned, with clang everything just worked. Our GCC and binutils are very, very old, so I added AES-NI support, and at the same time added PCLMULQDQ, which is the carry-less multiply instruction. Once I had added the AES instructions to binutils and got GCC to provide the appropriate intrinsics, we could use -maes on GCC just like we could on clang. That was not too bad.
Here is the original assembly that was used to do the AES encryption. This is just the simplest routine; the others did have slightly more complicated constructions. As you might notice, the AES instruction is commented out and replaced with a byte stream. This was because binutils did not support the AES-NI instructions, so in order to get the code to compile they had to emit the instruction as a byte stream. Walking through the assembly: we load up the pointers and the data, XOR in the data from the IV, and then start applying the round keys. This applies the first round key, then does an AES round, decrements the round counter, and if we still have rounds to go, does it over again until the last one, which is aesenclast, also emitted as a byte stream. And obviously there is a return at the end.
As I mentioned, I use intrinsics. To introduce them briefly: the intrinsics code provides 128-bit data types. There are a few different ones, but the one I mainly use is the 128-bit integer type. Some of the intrinsics are implemented as builtins, and part of the reason is to properly handle constant propagation, because for some of them an argument to the intrinsic has to be a constant, possibly a computed constant. It turns out that, at least with GCC, if you implement it with inline assembly and the argument is a computed constant, the compiler will not actually fold it into a constant. So I think there is still an issue with PCLMULQDQ under certain circumstances, since I did not go through the work of making PCLMULQDQ a builtin, and there are certain cases where the code will not compile successfully because of the constant propagation. These features must be enabled explicitly: the AES instructions are enabled by -maes, and PCLMULQDQ is enabled by -mpclmul. As I mentioned, there is no easy way to handle unaligned data with intrinsics, and by easy I mean just setting an attribute that says this data pointer is not aligned. That would also be very useful in other code. Even though other architectures also have unaligned loads, for some reason the compilers do not offer this. In the code I actually have a commented-out version of what would make my life easier if both GCC and clang supported it.
Another way to work around this, and I am actually using both of these depending on the use, is telling the compiler that the 128-bit type is inside a structure and that the structure is packed. By declaring it packed, the compiler has no clue what the alignment is and has to do the proper unaligned load.
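The packed-structure workaround can be sketched like this. It is an illustration under assumptions, not the code from the talk: the structure and function names are made up, and the packed, aligned and target attributes are GCC and clang extensions on x86. The explicit unaligned-load intrinsic is shown next to it for comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>

#define SSE2_FN __attribute__((target("sse2")))

/* Packed, so the compiler may not assume 16-byte alignment for the member. */
struct unaligned_block {
	__m128i v;
} __attribute__((packed));

/* Load through the packed structure: the compiler must cope with any alignment. */
static SSE2_FN __m128i
load_via_packed_struct(const void *p)
{
	return ((const struct unaligned_block *)p)->v;
}

/* The same load spelled out with the unaligned-load intrinsic. */
static SSE2_FN __m128i
load_via_loadu(const void *p)
{
	return _mm_loadu_si128((const __m128i *)p);
}

/* Self-check: both ways read the same 16 bytes from a misaligned address. */
static SSE2_FN int
unaligned_loads_agree(void)
{
	uint8_t buf[32] __attribute__((aligned(16)));
	uint8_t x[16], y[16];

	for (int i = 0; i < 32; i++)
		buf[i] = (uint8_t)i;
	/* buf is 16-byte aligned, so buf + 1 is guaranteed to be misaligned. */
	_mm_storeu_si128((__m128i *)x, load_via_packed_struct(buf + 1));
	_mm_storeu_si128((__m128i *)y, load_via_loadu(buf + 1));
	return memcmp(x, y, 16) == 0 && x[0] == 1 && x[15] == 16;
}
```

Dereferencing a plain __m128i pointer at buf + 1 instead would let the compiler emit an aligned load, which is the fault described in the talk.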
That is correct, but the packed attribute is also not part of the C99 standard, so it is a hack. I could have rewritten the code so that, instead of using a packed structure, it used separate unaligned loads, and that would have done the exact same thing; but the structure works and gives much more concise code, and part of the reason I chose to use intrinsics was to get concise code. So this is a single-block AES encryption. I do not actually show everything, because the code would be a little bit longer. But instead of the assembly that I showed earlier, this is now what it looks like. I don't know about you, but I would much rather look at this than at the assembly. So I got all of this compiled and tested in userland. But the whole point of this project was to make it work in the kernel.
not have the standard header files in its include path, and the standard kernel build does not provide the intrinsics headers, because the intrinsics use SSE registers and the kernel does not use floating point except in special cases where you arrange for it. So the question is: now that we have all these intrinsics, how do we make the compiler accept them? The kernel build configuration actually has a separate file that lets us call out special instructions on how to compile a given file. What we do is remove -nostdinc from the kernel CFLAGS; the -nostdinc flag means do not search the standard include directories for header files. The reason I can get away with this is that the API for this one file uses no kernel interfaces and is very limited, so I do not need to depend on any kernel header files that could possibly conflict, and it comes out just right. Also, by default the kernel compiles with -O2, and from my testing -O3 got the best performance here, although when I was recently working on another piece of code I actually found that -O3 was worse. I also have to enable the MMX, SSE and AES instructions. With this in there, we are now able to compile it. So the next
thing is performance testing. How many people know of ministat? ministat is a utility that phk wrote a number of years ago. If you do any basic benchmarking, it is a very simple utility to tell whether the changes you made actually made a difference in performance. A lot of times, when you are timing a buildworld or whatever, the times will vary from run to run, and it may look like things improved, but it turns out that the variation between buildworld times is so significant that the variance is too high to actually say that your change improved the time. ministat is a very basic utility: you provide simple files of numbers, each file being a set of numbers from one configuration, and it computes the averages and standard deviations for each file, sorts them out between the files, and says whether there was an improvement. In this case the first file, the fastest one, is the one marked x, and the variation was small, so you could probably tell by eyeballing it, but now we actually have 95 percent confidence that the other two were significantly worse. FreeBSD now includes this as part of the base distribution; it is no longer in tools, which is very useful. So if you need to do any benchmarking, please do at least three to five runs and feed them to ministat. So that is basically the first part, testing. Now the next question is how do you identify the various hot spots, and this is
where pmcstat comes in to help. A few years ago at my work I wanted to do some performance evaluation, and I knew FreeBSD has PMC counters. With PMC counters the overhead is virtually zero: you can run it and you do not get a slowdown, unlike the standard -pg build, where compiling for gprof cuts the performance by a factor of 5 to 10, which is a problem if you are doing timing and things are timing sensitive. Annoyingly, Linux at the time did not enable it by default, and I think they still do not, but in FreeBSD we have pmcstat available to all users, and it just works and gives you really awesome data. So we can generate pmcstat data with this command against the test program that I was running. As mentioned, there are lots of different counters you can use: you can get cache misses, L2 cache misses, branch stalls, many different events, and it gives you a much better idea of what to do. For this I used simple sampling of where the machine was running. One problem I ran into was that pmcstat has different output formats. It can actually output gprof format, but apparently I ran the tool too long and collected too many samples, and gprof has a 16-bit limit on the counts for the various things, and I overflowed it, so I could not use gprof. Luckily the calltree format, which is used by KCachegrind, came to the rescue. So once you generate your statistics, you convert them into the calltree format. How many people here have seen KCachegrind? And so
it gives you very pretty graphics. This is one of them. This may look very confusing, but this is basically the idle thread, this is my test program, and this is the block encryption. The area shows how much time each of the functions took relative to each other, so a big box means that specific function is consuming a lot of time, and the boxes within it show what the child function calls are. When I was running this, I noticed that there was this other thread, which was the crypto return thread, consuming about 15 percent, and I was like, why would it be running this other thread? I had no idea. Upon further investigation, it turns out that even though the AES-NI crypto driver was doing all its work synchronously, it was not actually telling the framework that it was doing the work synchronously. So when it did the work and returned to the crypto framework, the framework, in order to help avoid locking issues, was scheduling another thread to do the callback. So basically that share of the time was being spent just to switch to another thread, run the callback, and switch back. When I enabled that flag, in my performance testing I saw on the order of a 27 percent increase, just from literally adding a flag to the initialization, which was quite a surprise. I will
admit flame graphs have been around for a while, but I tried them for the first time yesterday, so I generated this one real quickly. Another very nice thing about FreeBSD is that the GEOM framework makes layering work really well, and one of the things that pjd wrote was gzero. This is basically a source of zeros and a sink for anything you want to throw at it, so it takes a lot of the disk and other processing out of the measurement. This is very similar to the call graph I showed you. This is the flame graph; if you have the full SVG version, your browser can mouse over the various components and see what percentage of time each one took. For this test I set the GELI threads to one thread. On the number of threads used: there are benefits to using multiple threads. If you are accessing the same GELI volume multiple times, multiple threads can be beneficial. But at the same time, if you have a setup like my original ZFS file system, GELI launches six threads for each of the disks because it is a six-core box, so if everything is encrypted you suddenly have a machine with a hundred threads doing GELI encryption, and all of them are going to be competing for the CPU. By turning the GELI threads down to one on those machines, you are still getting the multiple threading performance from the multiple disk accesses. In this case I just did some tests on my box [...] and I am able to get about 100 megabytes per second [...]. My original test was actually using [...]; there was no
actual change in size, so all of this speedup was purely due to pipelining [...]. OK, so one of these is the question of going from 64 bits to 128 bits. Well, first of all, it is fewer instructions, because with 64 bits we have to handle two words, so you have to do a few extra shifts and tests and other stuff in order to do that. The other thing is that if we have everything in the XMM registers and we do it in 64 bits, we have to go through the stack in order to transfer it into a 64-bit register and back again. One of the things with using intrinsics is that the data path is entirely in the SSE registers, without touching any 64-bit registers, from the time that we load from memory to the time that we store it back. [...] So the performance: we are now actually getting decent enough performance, and there are a few things left
for improvement. One of these is that only consumers that use opencrypto directly are actually improved; there are various callers that make direct calls into other AES code and will not see the change. Another is handling the FPU context. [...] the global context, which in most cases [...], but it is difficult because you need about a kilobyte of context somewhere. Another issue I noticed is that in certain cases GELI does a large memory allocation, and by large memory allocation I am talking greater than 4 kilobytes. In FreeBSD we do not have any zones for that, which means that we have to do a large page allocation, which means other really big, nasty work. I have seen about 4 to 5 percent consumption in malloc and free; switching to UMA zones or something similar would take that back to basically virtually zero, so that would be another big win. It also looks like the key schedule is actually consuming a decent amount of time, so we could pipeline that too. Somewhat unrelated to this, I am also working on SHA improvements [...]. In my case the last performance limit after GELI encryption is that I chose to use SHA-256 for all my checksums, partly to have strong authentication on my data [...], but SHA-256 actually turns out to be very slow, so now that becomes my limit [...]. So any
questions about the ability of the House the gap I use yes a this change hands like so to the no have not been any Test sorted out early because they all have tried and later in other side some say the effort by the Bolton each I'm back to attract the best and the of via and the that it should and what seems like an age but it yet structures it got better Oh yes the at the time there is an instruction many after that has most latencies the on but but in the dock the and then actually if you do your compiled world compiled with the appropriate ones which that that we picked up the out of date so on the eye ideas seemed those that have not actually really looked frame what poverty side of the 3rd inning and at least for a guest the slump in the for 4 months and yes we could probably do to all of the stock trades at the same time but considering its only 5 instructions to run it now so that OK but we be saving 5 instructions Muslim pipelines after all he added I'm not thought about using loans that way part of the other teams that so 1 major this code runs on this the fastest as possible and that it should be using the white agility 6 3 even the fight will be in registered the coming up in would be interested in that yes Ferguson but yes a book had actually density any of my performance wanted testing by father of all the works by trees but it well so there are lot and the idea that some of the other processors to have a instructions but but have a look at the folly of lack of time in the salt came out because I'm 1 my storage system to actually for 4 at the back the God of it was a result of the past a of ways still so actually open a so has where they actually used basically posted to generate and their actually are getting that performance as if you remember the slide way back guillemot up the
slide this slice the cash
aloof Lucy Cripps and yes we are almost giving half for gigabytes for 2nd so the and that and the other in my Testino my code in user and I'm not external also cited was only reading about to point used to point for gigabyte so as you can see there is actually a significant gap between my ex yesterday and to sell but what when the idea there identified did you some are more as the following up so that just have a good memory Wordsworth way that with the fact that it will do much it after yet by can go imagine closer than any other questions well thank you for coming the
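The tweak update that came up in the questions (only a handful of instructions per block) and the earlier 64-bit versus 128-bit discussion can be made concrete with a small sketch. It assumes the standard XTS convention: the tweak is a 128-bit little-endian value that is multiplied by x in GF(2^128) for each block, with reduction constant 0x87. The function names are hypothetical and this is not the FreeBSD code. The first function works on two 64-bit halves and needs explicit carry handling between them; the second treats the tweak as a single 128-bit value, which is the shape the SSE version gets to use.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1
XTS_POLY = 0x87  # reduction constant for x^128 + x^7 + x^2 + x + 1


def next_tweak_64(lo, hi):
    """Multiply the tweak by x using two 64-bit words (lo, hi)."""
    carry_out = hi >> 63      # bit shifted out of the top word
    carry_mid = lo >> 63      # bit carried from the low word to the high word
    hi = ((hi << 1) | carry_mid) & MASK64
    lo = (lo << 1) & MASK64
    if carry_out:
        lo ^= XTS_POLY        # fold the overflow back into the low bits
    return lo, hi


def next_tweak_128(tweak):
    """Same operation on a single 128-bit integer."""
    carry_out = tweak >> 127
    tweak = (tweak << 1) & MASK128
    if carry_out:
        tweak ^= XTS_POLY
    return tweak


# Walk both variants from the same arbitrary starting tweak and compare.
start = int.from_bytes(bytes(range(0x80, 0x90)), "little")
lo, hi = start & MASK64, start >> 64
t = start
for _ in range(200):
    lo, hi = next_tweak_64(lo, hi)
    t = next_tweak_128(t)
halves_match = ((hi << 64) | lo) == t
```

The two-word variant does the same arithmetic but has to pass a carry bit between the halves, which is the extra shifting and testing the talk refers to.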

