
Hardware/Software Co-Design for Efficient Microkernel Execution


Formal Metadata

Title
Hardware/Software Co-Design for Efficient Microkernel Execution
Number of Parts
561
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
While the performance overhead of IPC in microkernel multiserver operating systems is no longer considered a blocker for their practical deployment (thanks to the many optimization ideas that have been proposed and implemented over the years), it is undeniable that the overhead still exists, and the more fine-grained the architecture of the operating system is (which is desirable from the reliability, dependability, safety and security point of view), the more severe the performance penalties it suffers due to the IPC overhead. A closely related issue is the overhead of handling hardware interrupts in user space device drivers. One reason for the IPC overhead is the fact that current hardware and CPUs were never designed with microkernel multiserver operating systems in mind; they were rather tailored to the traditional monolithic operating systems. This calls for out-of-the-box thinking while designing instruction set architecture (ISA) extensions and other hardware features that would support (a) efficient communication between isolated virtual address spaces using synchronous and asynchronous IPC primitives, and (b) treating object references (e.g. capability references) as first-class entities on the hardware level. A good testbed for evaluating such approaches (with the potential to be eventually adopted as an industry standard) is the still unspecified RV128 ISA (the 128-bit variant of RISC-V). This talk discusses some specific hardware/software co-design ideas to improve the performance of microkernel multiserver operating systems.
Transcript: English (auto-generated)
So, shall we start? I have my own.
Not this year. I already did, at the RISC-V devroom, actually. Yes. Thanks for the introduction, thank you all for coming.
And when I'm mentioning it, I mean: most of the talk will be about the same topic I have already spoken about at the RISC-V devroom. So, if you have seen that, there is no reason for you to see it again, unless you would like to discuss. As usual, this is something I always do, more or less an opinion piece, to spark some discussions.
So, please go ahead and disagree with me. For those who don't know me, I've been working in the operating systems domain for some time. I have been working on the HelenOS project since 2004.
I have been working at Charles University in Prague on the formal verification of HelenOS. And quite recently, I mean, no longer than two years ago, I joined Huawei, where I'm doing the same stuff.
Microkernels and formal verification, stuff like that. So, I would like to tell you that microkernel multi-server systems are better than monolithic systems. That's it, thank you. No, no, really, seriously. I mean, this has been, I would say, an informed opinion of many people for many, many years.
Gradually, we got some, I would say, qualitative evidence that this statement is true. But, I mean, qualitative evidence is still just basically a form of an opinion. Now, we are gradually also getting quantitative evidence that the monolithic operating system design is flawed.
So, you have probably noticed there is this paper co-authored by our friend Gernot Heiser, whom we had the privilege to see today. And this paper basically looked at several critical vulnerabilities in the Linux kernel.
And tried to estimate whether these vulnerabilities would have been mitigated if we considered a state-of-the-art microkernel design such as seL4.
And yeah, I mean, you can read the summary here, you can read the entire paper, and it's pretty convincing. So, this is one piece of evidence that we are going in the right direction with microkernels.
But, and this brings me back to my original talk in the FOSDEM microkernel devroom in 2012, we are paying some price. The price is the performance overhead. And if you were there, if you remember my talk, I said that it's a fair price, in my opinion.
So, I mean, there is no free lunch. But, you know, the safety, security, availability and other guarantees that the microkernel provides are counterbalanced by some performance overhead that we need to pay.
But is it really necessary? Is it really unavoidable? Let's look at this. The microkernel ideas are not particularly new. I mean, the earliest incarnations were already there in 1969.
But, you know, there was, and there is still some disconnect between the software design and the hardware design. Designing hardware used to be complicated, expensive, it usually required a huge company to back it. And, you know, therefore, the operating systems have been written for the CPUs only after the CPUs were out.
So, not before. I mean, even powerful emulation tools like QEMU and stuff like that were not always available. So, you know, somehow the hardware designs got stuck in certain ways.
I mean, really, try to think about something revolutionary in hardware design that hadn't been there already in the IBM System/370. I mean, memory management, it was there.
Virtualization, it was there. IOMMU, it was there. Offloading, you know, computations to dedicated hardware devices. I'm talking about, you know, so-called data channels, what was the exact name, I don't remember. It was there already.
So, there is nothing new under the sun. And the problem is that microkernels suffer. Because, you know, the current hardware is being designed with the monolithic kernels in mind. And therefore, we need to pay the performance penalties due to the fine-grained design, due to
the need of crossing the address space barriers, due to the IPC mechanisms and stuff like that. So, let's try to change it. Let's try to design the hardware in a better way, or in a way that is more suitable for the requirements of microkernel systems.
I really think that there is this vicious cycle that the CPUs currently don't support the microkernels properly. Therefore, microkernels suffer performance penalties when running on them compared to the monolithic systems. Therefore, microkernels are still not considered, you know, the true mainstream.
I mean, yes, we have them in safety-critical and mission-critical devices. We have them maybe in some embedded devices. And, yeah, you have Mach, which is running inside macOS, but just as a single-server microkernel.
But still, you know, we are talking about Linux and we are talking about making Linux more secure and stuff like that. Which is crazy, I mean, the only way to make Linux more secure is throwing it out. And since microkernels are still not in the mainstream,
there is no strong push on the hardware manufacturers to actually provide CPUs with proper microkernel support. And this closes the vicious cycle. Well, there has been a lot of effort spent on this box already. I mean, for the past 25 years, people have been trying to squeeze out every single CPU
cycle from their microkernel code to make the IPC run as smoothly and as quickly as possible. But it was given the limitations of the hardware they had. So, and I mean, we have been trying on this too, right?
I mean, this is the reason why we are meeting here. So, now let's focus on this part. So, let's focus on the requirements on the hardware and let's focus on creating better hardware to support our microkernels to finally get rid of this trade-off between safety and performance, security and performance.
I got some ideas. I have to say these are very rough ideas. And again, this is something where I would like to spark a discussion. I would like to spark, you know, or inspire people thinking about the actual mechanisms that could be done to make this happen.
And my ideas are targeting, you know, the obvious culprits like the IPC and context switching and stuff like that. So, first about the problem with the IPC in microkernel multi-server systems. The finer-grained the architecture is, the better for safety, security, availability and dependability.
But, you know, the more we are paying due to the need to move data between address spaces. So, I mean, I probably don't need to explain the problem in very much detail,
but compared to monolithic kernels, where communication between subsystems is just a function call, in a microkernel multi-server system the same communication is implemented via IPC, which means that we cannot use all the registers for actually passing the arguments, because some registers are reserved for something else.
We need to switch to the kernel level, to the kernel privilege mode and switch the address space and then switch back. We potentially need to do some scheduling in between in case the IPC is asynchronous.
Of course, this is not always necessary. And if we are moving larger amounts of data, we either need to copy them between the address spaces or establish some kind of memory sharing, which again might be a little bit costly. So, what to do about this?
One thing that would probably be quite simple: just implementing richer call or jump instructions that would actually switch the address space by themselves.
So that, you know, we would save at least the single kernel round trip where the only thing the kernel is actually doing is changing the address space. This could be done by the hardware, by the CPU, and it could be as simple as just switching the current address space identifier.
Of course, this still needs to be just a mechanism, just a generic basic mechanism. I'm not proposing, you know, moving some kind of policy from the operating system to the CPU. That would be crazy. So, how to do it?
It could be implemented by having something like a call gate that would be cached in some kind of hardware cache, something like a TLB. So, you know, the first time this call happens, obviously it will trap into the kernel. The kernel will check, you know, the permissions, capabilities and stuff like that, and set up an entry into this hardware cache. And subsequent calls will then be handled just by the CPU. I believe this could be really very simple.
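To make that a little more tangible, here is a very rough sketch in C of what the slow path on the first call might do; the gate entry layout and the helpers capability_allows_call(), ipc_deny() and hw_gate_cache_insert() are all hypothetical and only illustrate the split between policy (kernel) and mechanism (CPU):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a call-gate cache entry; the real format would
 * be defined by the ISA extension. */
struct gate_entry {
    uint64_t src_asid;   /* address space allowed to use this gate */
    uint64_t dst_asid;   /* address space the call switches into   */
    uint64_t entry_pc;   /* the only PC the call may land on       */
};

/* Hypothetical helpers: capability lookup (kernel policy), IPC denial,
 * and the instruction/CSR sequence that fills the hardware gate cache. */
bool capability_allows_call(uint64_t src_asid, uint64_t target_pc,
                            struct gate_entry *out);
void ipc_deny(uint64_t src_asid);
void hw_gate_cache_insert(const struct gate_entry *entry);

/* Kernel slow path: the very first cross-space call through a gate traps
 * here; afterwards the CPU handles the gate on its own. */
void gate_fault_handler(uint64_t src_asid, uint64_t target_pc)
{
    struct gate_entry entry;

    /* Policy stays in the kernel: check permissions, capabilities, ... */
    if (!capability_allows_call(src_asid, target_pc, &entry)) {
        ipc_deny(src_asid);
        return;
    }

    /* Mechanism goes to the CPU: install the entry into the hypothetical
     * hardware gate cache, similar to a software-managed TLB fill. */
    hw_gate_cache_insert(&entry);
}
```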
Regarding the asynchronous IPC, where there is probably some need for buffering of the messages, I also think this could be optimized.
I mean, even nowadays, as somebody already mentioned at my talk yesterday, this message buffering and message passing could be optimized by making sure that you don't trash the messages out of your cache lines.
That's fine. But again, I would imagine that the CPU could do it even more intelligently. So, basically using the cache lines as fixed-size buffers for the messages.
And it's not a problem that it's a fixed size, because in most of the microkernels I have seen that use asynchronous IPC, the kernel buffers are also fixed-size, for the obvious reason that user space must not be able to exhaust the kernel memory. So again, I would see a clear separation between the mechanism, which could
be very efficient, very fast, very lean, and the policy, which will obviously still stay in software. If you remember SPARC V9, this reminds me of the register stack engine they have there. Or stack engine, what was the term? Stack engine or register stack engine?
IA-64, is it? Itanium, Itanium, okay. Well, SPARC has something similar.
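Before moving on, here is a minimal sketch of the cache-line-sized message idea, assuming 64-byte cache lines; the structure layout, the ring size and the async_send() helper are illustrative assumptions, not a concrete proposal:

```c
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 64  /* assumed line size; 128 bytes on some CPUs */

/* One asynchronous IPC message occupies exactly one cache line, so the
 * hardware could treat the line itself as the unit of transfer. */
struct ipc_message {
    uint64_t label;                              /* protocol / method id */
    uint64_t payload[CACHE_LINE_SIZE / 8 - 1];   /* fixed-size arguments */
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Fixed-size per-endpoint ring of message slots: user space cannot
 * exhaust kernel memory because the capacity is bounded up front. */
#define RING_SLOTS 64

struct ipc_ring {
    struct ipc_message slot[RING_SLOTS];
    uint32_t head, tail;
};

/* Software model of the enqueue step; in the co-designed variant this is
 * exactly the part a CPU could perform autonomously. */
int async_send(struct ipc_ring *ring, const struct ipc_message *msg)
{
    uint32_t next = (ring->tail + 1) % RING_SLOTS;
    if (next == ring->head)
        return -1;                     /* ring full, apply back-pressure */
    memcpy(&ring->slot[ring->tail], msg, sizeof(*msg));
    ring->tail = next;
    return 0;
}
```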
So, how about bulk data? Sometimes we really need to move a lot of data between processes or tasks. Currently, the best optimization we have is memory sharing, which actually works quite fine. The only problem is that the memory sharing needs to be established and possibly torn down. And if this happens too often, it causes a performance penalty.
And also, the data needs to be page-aligned. So it's not really very useful for sharing scattered data structures. It's fine when you need to share blocks of data that need to be written to a block device driver or read from a block device driver.
But it's not very useful for graph structures and trees and stuff like that. So again, something that could be done is to have a new,
simple layer of hardware-based memory management that would map virtual addresses to cache lines. Because a cache line is usually something like 64 or 128 bytes, which is a much more reasonable granularity for scattered data structures.
And of course, again, we need to sit down, we need to create a model, we need to evaluate it, we need to implement it in an emulator to be sure how well this will perform, what should be the parameters, what should be the size of this translation buffer, stuff like that.
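Just to illustrate the granularity argument, a hypothetical translation entry for such a cache-line-level mapping layer might look like this; the field widths, the flat table and the lookup loop are assumptions made purely for the sake of the example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BITS 6   /* 64-byte lines, assumed */

/* One entry of the hypothetical line-granularity translation buffer: it
 * remaps a single cache line of the producer's address space into the
 * consumer's address space, so scattered structures (trees, graphs) can
 * be shared without page-aligning or copying them. */
struct line_mapping {
    uint64_t src_line;   /* producer virtual address >> LINE_BITS */
    uint64_t dst_line;   /* consumer virtual address >> LINE_BITS */
    uint16_t src_asid;
    uint16_t dst_asid;
    bool     writable;
    bool     valid;
};

/* Software model of the lookup the hardware would perform on access. */
static bool line_translate(const struct line_mapping *table, size_t entries,
                           uint16_t asid, uint64_t vaddr, uint64_t *out)
{
    uint64_t line = vaddr >> LINE_BITS;
    for (size_t i = 0; i < entries; i++) {
        if (table[i].valid && table[i].dst_asid == asid &&
            table[i].dst_line == line) {
            *out = (table[i].src_line << LINE_BITS) |
                   (vaddr & ((1u << LINE_BITS) - 1));
            return true;
        }
    }
    return false;
}
```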
But I really believe there is some possibility to make this work. Context switching, I mean, we have somehow avoided parts of this context switching in case of the IPC.
But still, the problem is that in a microkernel multi-server system, there are more active processes or more active tasks than in a monolithic system. So there will still be some context switching. And all that our hardware is currently doing is basically masking latency.
And we have very efficient mechanisms for masking nanosecond latency, that's called the caches. We have quite efficient mechanism for masking millisecond scale latencies, that's IO buffers.
But the context switch is precisely in the middle, it's on the order of microseconds, and we have really nothing to mask this latency. So, I mean, I wouldn't say we don't have anything. There is a hardware mechanism quite often used, which is multi-threading.
And this is precisely why we have multi-threading, to be able to somehow make sure that the ALUs or parts of the CPU have always something to do, despite there is some data or dependency or waiting for some data.
But this does not scale to many, many threads. I mean, you usually have just a couple of hardware threads. And we can do context switching in software. So how about combining this and having, again, something like hardware support for an unlimited number of execution contexts?
Some of them, the most frequently used ones, would be cached in some hardware cache. And there would be some dedicated instructions or extensions to efficiently operate with these hardware contexts.
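A very rough software model of such a hardware-cached execution context, assuming a RISC-V-like register file; the descriptor layout and the cache geometry are purely illustrative assumptions:

```c
#include <stdint.h>

/* Hypothetical architectural execution context: the state the CPU would
 * keep in a dedicated context cache so it can switch to another runnable
 * context autonomously when the current one blocks (cache miss, pending
 * IPC reply, ...). Which contexts exist and what their priorities are
 * would still be decided by the kernel scheduler. */
struct hw_context {
    uint64_t pc;
    uint64_t gpr[31];      /* x1..x31, assuming a RISC-V-like register file */
    uint64_t asid;         /* address space activated together with it      */
    uint64_t wait_event;   /* event the context is blocked on, 0 if runnable */
};

/* Hypothetical per-hart context cache: only the hottest contexts live
 * here; the rest are spilled to memory by the kernel, much like a TLB. */
#define CTX_CACHE_WAYS 8

struct hw_context_cache {
    struct hw_context way[CTX_CACHE_WAYS];
    uint8_t           valid;     /* bitmask of occupied ways   */
    uint8_t           current;   /* way of the running context */
};
```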
Again, this would keep the scheduling policies and stuff like that mostly in the operating system. But the physical mechanism of quickly switching to a different workload, when our current workload is blocked
because it is waiting on some data, that could be done autonomously on the hardware level. We could even think about somehow connecting other external event triggers to this, like interrupts or exceptions. And this would allow us to do even more stuff. I believe I have a slide about this.
Here, yeah, that would allow us to do, very simply and very elegantly, purely user-space-based interrupt processing. Currently, the interrupts always trap into the kernel space. And in a microkernel environment, what the kernel does is generate some kind of IPC message that is then forwarded to the user space driver.
If we would be able to do the fast context switching using the hardware, it is just a single step further to extend it to the interrupt delivery to the user space drivers.
This would not only make some things faster and allow us to get rid of polling in case we are dealing with some very latency-sensitive device,
where actually even in a monolithic system the interrupt processing can be so expensive that polling, despite being stupid, is more efficient. But it would also solve the final compromise regarding the elegance of the microkernel design,
and that's the fact that we still need some device drivers in the microkernel, like the timer. With direct delivery of the timer interrupts to user space timer driver, we would not need any timer driver in the microkernel.
And possibly even moving the scheduler out of the microkernel, which again is something like a holy grail for many people. Something would need to be done with the level-triggered interrupts, the usual pain point.
Again, I would say that there is some possibility to have some integration with the platform interrupt controller that would autonomously mask the source of the level-triggered interrupt when it happens, so that there is no issue with this endless reassertion.
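A minimal sketch of what binding an interrupt directly to a user-space driver context might look like from the kernel's side, reusing the hypothetical execution-context idea from above; the binding table, the capability check and the masking flag are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical binding: when `irq` fires, the CPU switches directly to
 * the registered user-space context instead of trapping into the kernel
 * and synthesizing an IPC message. */
struct irq_binding {
    uint32_t irq;
    uint64_t driver_asid;      /* address space of the user-space driver  */
    uint64_t driver_context;   /* execution context to resume             */
    bool     level_triggered;  /* if true, the interrupt controller masks
                                  the source autonomously on delivery     */
};

#define MAX_IRQS 256

static struct irq_binding irq_table[MAX_IRQS];

/* Kernel policy path: validate the driver's right to the interrupt and
 * program the (hypothetical) platform interrupt controller; delivery
 * itself then bypasses the kernel entirely. */
int irq_bind(uint32_t irq, uint64_t asid, uint64_t ctx, bool level)
{
    if (irq >= MAX_IRQS)
        return -1;
    irq_table[irq] = (struct irq_binding){ irq, asid, ctx, level };
    return 0;
}
```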
Capabilities, I mean, this is really just a stretch. I mean, I did not really find many useful ideas in my head about what could be done with capabilities on the hardware level, but at least something. I mean, if we just consider the narrow use case of capabilities as object identifiers,
again, the microkernel would always need to be in charge of making sure that the methods called on the capabilities are permissible for the holder of the capability.
I wasn't able to think of any elegant hardware mechanism for how this could be avoided. But at least for the actual access to the object, the capability ID or the capability reference could be somehow embedded within the pointer itself, and then the hardware would be able to autonomously check whether the access to that given object is allowed in the current context.
If you think about RV128, the 128-bit variant of RISC-V, I believe you might wonder what the actual use of 128-bit-long pointers would be.
I'm not sure that a flat 128-bit pointer is really so useful. Maybe it is. Maybe I'm wrong. But we could easily divide it into 64 bits for the object offset and 64 bits for the capability reference, and this could work quite elegantly.
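As a toy illustration of that split, here is one possible, entirely hypothetical layout of such a 128-bit capability pointer and the check the hardware could perform on dereference; the structure names and the rights encoding are assumptions, not part of any existing ISA:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit "fat" pointer for RV128: the upper half names the
 * capability, the lower half is the offset within the referenced object. */
struct cap_ptr {
    uint64_t offset;    /* byte offset into the object              */
    uint64_t cap_ref;   /* index into the holder's capability space */
};

/* What the kernel knows about each capability; checking which methods
 * may be invoked stays a kernel job, only the access check moves down. */
struct capability {
    uint64_t base;      /* object base address     */
    uint64_t length;    /* object size in bytes    */
    uint32_t rights;    /* read/write/execute bits */
};

/* Software model of the check the CPU could do autonomously on access. */
static bool cap_deref_ok(const struct capability *cap_table, size_t caps,
                         struct cap_ptr p, uint32_t required_rights)
{
    if (p.cap_ref >= caps)
        return false;
    const struct capability *c = &cap_table[p.cap_ref];
    return p.offset < c->length &&
           (c->rights & required_rights) == required_rights;
}
```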
By the way, I mean, this would probably be even more useful for some managed languages like Java, .NET, stuff like that, because they are always dealing with, or the VMs running this managed code are
always dealing with the fact that they need to do a lot of bounds checking on the objects. If this could be offloaded to the hardware, it would probably help them a lot. Okay, some ideas. Do you have something to add to this? Yeah, please. Yes, and I believe I have it here.
So, I mean, you might come to me and say, oh, these are just bad dreams. I mean, this is not going to happen.
So what I'm now going to present, unless there are other comments or objections, are some cases, some prior art, which somehow leads in the same direction, and by which I would like to convince you that it is possible to do something like that.
It's even possible to do something like that with the hardware we currently have. So imagine the possibilities if we can actually change the hardware. Let's think out of the box. So the first reference is just a basic paper, rather old, about actually offloading some of the microkernel functionality to hardware.
This was done by basically modifying a kind of soft-core FPGA CPU. And they were moving, you know, actually complete operations like thread creation and context switching. So, I mean, the context switching is something more or less in line with what I have suggested,
but the thread creation is probably too heavyweight, I would say. But nevertheless, they were able to measure reasonable performance improvement, something like 15 to 27%.
And, you know, just speaking about the ways how the hardware could optimize IPC, this has been also done in practice in the wild on the massive parallel architectures. So, again, having a lean hardware mechanism for efficient message passing that is somehow connected to a reasonable software abstraction for it.
So this could probably work. About, you know, the address space switching, there is an interesting paper from the Barrelfish people about SpaceJMP, which is basically a programming model where a single process uses multiple address spaces at once.
And in that case, that was not targeting some performance improvement. It was just exploring what would be the possibilities
and benefits of such a programming model for, let's say, data-centric applications. So, I mean, this is not so relevant to what I have been talking about, but there are approaches, and they were able to implement this on Barrelfish, obviously, and in DragonFly BSD.
So it did not require a huge modification to the kernel abstractions. And if you are old enough, like me, you might remember that if you are running an x86 CPU in 32-bit mode, or probably even in the 16-bit mode, you can still have the task state segment,
which is basically hardware-based context switching. It does not have a dedicated, you know, hardware cache for that. It just uses, you know, regular memory for caching the context. But still, I mean, performance-wise it's still competitive to the software-based approach. Even the Linux kernel used this mechanism previously
and they stopped using it not because of the performance, but because they just wanted to have a more portable approach, yes. Yeah, but that was probably just a very, you know, artificial limit, because of some index in, you know, the global descriptor table that could not be...
It was basically, you know, based on the selectors and... Yes, yes. So it was something like 16K or something like that, yeah. Well, I mean, that's a technicality. I'm not saying that this is the way we should do it. I'm just mentioning it because it has been done.
So let's have a look at it and let's improve it. Thank you. Okay, about the cross-address-space calls. There has actually been quite a nice paper by some of my colleagues from Huawei
who have used the VMFUNC instruction, the VM functions extension to Intel VT-x, which is something like that. It's a mechanism that allows you to basically do cross-VM calls
by setting up some call gates and then, you know, passing the registers from one VM to the other. So it does the whole switch, the address space switch, I mean switching the extended page tables, on the hardware level.
And actually this paper contains an evaluation where they took a rather complex application, something like a web server that uses the OpenSSL library and they have separated some of the, you know, encryption functions,
so individual function calls, from the rest of the binary into a dedicated VM, and they have used these VMFUNC instructions to, you know, change it from a normal function call to this cross-VM call.
And their performance evaluation was quite interesting, because the VMFUNC call was about as costly as a single system call. So it was more costly than just a jump or just a call, but not hugely more costly, and definitely cheaper than going to the hypervisor,
going to the kernel, going to the hypervisor, making the address space and VM switch in software. So again, I mean, if we would think about this mechanism in more detail, if we would try to improve it, maybe this could really be helpful for the microkernels.
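For reference, the mechanism itself is tiny. Here is a minimal sketch (GCC inline assembly, not taken from the paper) of invoking the EPTP-switching leaf of VMFUNC from inside a guest, assuming the hypervisor has enabled VM functions and populated the EPTP list:

```c
/* Leaf 0 of VMFUNC switches the extended page tables (EPT) to the entry
 * selected by ECX in a hypervisor-provided list, without a VM exit.
 * Requires a CPU with VM functions and a cooperating hypervisor. */
static inline void eptp_switch(unsigned int eptp_index)
{
    asm volatile("vmfunc"
                 :                   /* no outputs */
                 : "a"(0),           /* EAX = 0: EPTP-switching leaf  */
                   "c"(eptp_index)   /* ECX = index into the EPTP list */
                 : "memory");
}
```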
Actually, my colleagues are working on some suggestions in that area and they will publish a paper, or rather, they already have a paper accepted at EuroSys this year. So if you are interested in this, have a look.
Yeah, this is what you have mentioned. This is the CHERI capability model, so an evaluation of how the capabilities could be implemented on a hardware level. Again, this was evaluated on an FPGA,
but the performance evaluation was very positive and it allowed them to have basically byte-granularity memory protection. You know, again, the limitation is that they have used 64-bit MIPS, so they had to somehow squeeze the bounds and starting addresses in somewhere,
so their obvious decision was to have dedicated capability registers as an extension to the MIPS ISA, which they themselves admit is not so flexible. So how about using the 128-bit pointers
and embedding the capability identifiers directly in them? Actually, if you look at Intel MPX, this is a similar idea that has already been implemented in some of the newest Intel CPUs. According to what I have read, the implementation is not so great.
I mean, the performance benefit compared to software-based bounds checking is very minor and the overhead of setting this thing up is not good, but if even Intel does it, why not try harder?
Ok, so to sum up, I really think that we have done, as the microkernel community, a lot of work. First, explaining to people that software dependability
or computer system dependability, safety, security, things like that are important and those goals cannot be achieved by using a poor software architecture
like a monolithic architecture. I mean, this applies not only to operating systems, this obviously applies everywhere, see microservices. But, I mean, we have been always struggling to explain to people that they have to pay some price for these assurances.
It's funny that when there are vulnerabilities such as Spectre or Meltdown, suddenly everybody accepts a 5% to 10% to 15% performance slowdown
just to get the assurances we always thought we had. But when we propose that we can have more assurances, we can have safer systems, we just have to pay a small price for it, I mean, we are suddenly being rejected. So, let's think out of the box and let's design our hardware
in such a way that nobody could complain anymore. And maybe, sorry, I need to mention my colleagues from Huawei who have contributed to the ideas I have presented.
But what I also wanted to say is that if you would really like to do something about this in practice, I'm opening a new R&D lab at Huawei, which will be located in Dresden. Obviously, the location was not chosen randomly. And we would like to have a well-balanced mix between basic research,
so something like I have presented here, something like 40%, and obviously some practical development. We won't be making products, we will be an R&D unit still, but we will obviously try to contribute to our product lines,
which is also good because we should have clear requirements from our products. And our company is producing a lot of hardware. And if you would be interested in working on this, please let me know, please contact me by any means. One side note: we own HiSilicon, which is one of the major ARM chip producers,
so we have the possibility to actually change the hardware. That's it. Thank you.
And if there are any questions, yes please. Okay. A bit of a counterpoint which might support your ideas, nevertheless. So if you look further back in time, there were so many ideas of supporting special features for application runtime systems and whatever, like LISP machines, Intel's iAPX 432,
or even the RISC guys from Berkeley who tried object support, and all of them failed. Most of that stuff had been implemented in microcode, so it was horribly slow, but flexible. Now why did these approaches fail? I think Moore's Law killed them,
because regular general-purpose processors got fast enough, so it killed them. We are no longer in that situation, so this might be the point in time where it's actually worthwhile to consider your ideas. Yes. We were also thinking in research about some of these ideas, like people have been working on cache-only machines which don't have direct access to main memory and stuff like that.
We are thinking along these lines, and putting this together with microkernels might be really, really interesting. Yes. Thank you for the comment. Just to quickly summarize for the stream, basically your idea is that we are at precisely the right moment in time to do something like this, because Moore's Law is no longer applying
and stuff like that, so we need to do something to improve the performance, generally speaking. I would add to your comment my own comment that we have RISC-V now. This is a huge opportunity to actually create a totally new, open, modular hardware architecture that might actually have some industrial traction.
Let's take the opportunity. I mean, they wouldn't be principally against, but on the other hand we have a full ARM license so we could even change ARM if we would like. Thank you.