
Systems Integration: The OpenStack success story


Formal Metadata

Title
Systems Integration: The OpenStack success story
Title of Series
Part Number
107
Number of Parts
119
Author
Flavio Percoco
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place
Berlin

Content Metadata

Subject Area
Genre
Abstract
Flavio Percoco - Systems Integration: The OpenStack success story

OpenStack is a huge, open-source cloud platform. One of the main tenets of OpenStack is the Shared Nothing Architecture (SNA), to which all modules stick very closely. In order to do that, services within OpenStack have adopted different strategies to integrate with each other and share data without sacrificing performance or moving away from SNA. These strategies are not applicable just to OpenStack but to any distributed system. Sharing data, regardless of what that data is, is a must-have requirement of any successful cloud service. This talk will present some of the existing integration strategies that are applicable to cloud infrastructures and enterprise services. The talk will be based on the strategies that have helped OpenStack to be successful and, most importantly, scalable.

Details
======

Along the lines of what I've described in the abstract, the presentation will walk the audience through the state of the art of existing system integration solutions, the ones that have been adopted by OpenStack, and the benefits of those solutions. At the end of the talk, a set of solutions under development, ideas and improvements to the existing ones will be presented. The presentation is oriented to distributed services, fault tolerance and replica determinism. It's based on software completely written in Python and running successfully on several production environments. The presentation will be split into 3 main topics:

Distributed system integration
-----------------------------------
* What is it?
* Why is it essential for cloud infrastructures?
* Existing methods and strategies

OpenStack success story
----------------------------
* Which methods did OpenStack adopt?
* How / Why do they work?
* What else could be done?

Coming next
---------------
* Some issues of existing solutions
* What are we doing to improve that?
* Other solutions coming up
Keywords
Transcript: English (auto-generated)
Okay, welcome to the second talk in the session given by Flavio Percoco on OpenStack.
Thank you. Hello. Hello. Does it work? Yeah. Okay. Cool. Hello, everyone. Today's talk, well, my talk is about system integration. It's not actually 100% related to OpenStack. It's mostly related to integrating systems with each other.
And I'll use OpenStack as an example because most of the methods I will present to you are being used by OpenStack itself to integrate all the services that we're using. So that's me. That's my Twitter handle. Pretty much everything you want to know about me is out there on the internet.
Something I want you to know, I work for Red Hat and I'm part of the RDO community. RDO is a community of a bunch of really great people working together to make OpenStack amazing on RPM-based distributions. Other things about me, I'm not going to go through those.
The one I will mention is I'm a Google Summer of Code and OPW mentor. And I wanted to mention this because I really believe it's very important. And if you have spare time in your day and you want to mentor people, please sign up. We need more mentors there. So let's get to it.
Before we go through the methods that you would use to integrate systems, let's first define a little bit what system integration means. System integration is basically what you do when you have a set of subsystems and you want to make them work together towards a common goal or scope, right?
So all the methods and technologies and strategies that you would use to make those systems work together and to get towards that goal is what we call system integration. This is put in a very simple way. There are a whole bunch of different definitions of system integration. You can integrate systems. Systems are not necessarily software.
You could use hardware as a system. You could use many other things as a system. So system is a very generic term that you would use to say that you have a set of subsystems working together for a single cause, so to speak. There are many different generic strategies to integrate systems.
These are the three that I will present very briefly, and we will dive a little bit more into the last one. So vertical integration basically looks like the small graph up here. It looks like stairs.
Basically you have a set of systems, and the systems that are above will talk to the systems that are below, and this is done based on each subsystem's features and what you need from them. So you will have a web service that integrates with your database, and then you have two systems
that are working together, or you will have your authentication service, and then you will have your other services below it. You will make this service with the real features talk to your authentication service, and you are integrating those two services vertically. The star integration, well, it's called star integration because it's supposed to
be like a star, but it's more like spaghetti integration because all services know what other services do, and they all talk together, and they do that on a case-by-case basis. So you have service A that needs something from service B, but before doing that it has to talk to service C because it needs something from service C before getting to service B. So it's quite a mess.
Use cases for these, there are plenty of them, but it's very risky, and it's very error-prone, and there's for sure a very high risk of not having a contract when those services are talking together. Not having a contract basically means that you don't know what you're going to get back, and you don't know when something is going to go wrong when you get something
from service C to talk to service B, but service B is expecting something different, and it turns out that service C was updated. And the other method that we're going to dive a little bit more today is horizontal integration, and it's based on a service bus, and service bus is, I call it communication
bus. I don't like to use the term messaging bus here because it doesn't matter, I mean it's not about messaging itself, but about making those services communicate together through the same bus. So you have service A, service B, and service C, and they all communicate through this
communication bus, sending either messages or just a data asset that would make the whole feature work through this communication bus. So diving a little bit more on horizontal integration from an application's point of
view. So how would you make all this work? So imagine that you have a set of applications that you want to make work together. You need to have this communication bus. So you have to come up with an idea, with a technology that you would use to make them talk together. What I'm going to do now, I will present like four different methods to make those
applications talk together, and these are not new methods, they've actually been around for a long time. Many people use them without knowing that they are basically integrating systems, or what the whole thing they're doing means.
And each one of these methods is good for very specific cases. Some of them are more generic and others are more specific. And the first one is files. Files are probably the oldest way to integrate different services talking together.
For a long time people used to open a file, get a file descriptor, put something in there, and have another application on the same piece of hardware read the data out of it and use it to do something. So some people would use files as a messaging bus, like you would use RabbitMQ right
now as a messaging broker. It's good for very specific cases, try not to use it. It's good for cases like embedded systems. In an embedded system you won't have RabbitMQ running for sure. So if you have very limited hardware and very limited processor and memory, you probably
want to use something that's really cheap. Files are cheap. Definitely accessing the file system has a cost. It has a high risk in terms of security and reliability. But it works very well for embedded systems.
We used to use files in OpenStack to have some kind of in-server distributed lock for some time. Many things went wrong with that, so don't do it. We're now working on another way to have distributed locks.
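As a rough illustration of that file-based approach, here is a minimal sketch of a per-host lock built on fcntl; the lock directory and lock name are invented, it only works on POSIX systems, and it is not the code OpenStack actually used:

    import fcntl
    import os
    from contextlib import contextmanager

    @contextmanager
    def file_lock(name, lock_dir="/tmp/locks"):   # hypothetical location
        # A tiny file-based lock: cheap and good enough on one host,
        # but NOT a real distributed lock across machines.
        os.makedirs(lock_dir, exist_ok=True)
        path = os.path.join(lock_dir, name)
        with open(path, "w") as handle:
            fcntl.flock(handle, fcntl.LOCK_EX)    # blocks until the lock is free
            try:
                yield
            finally:
                fcntl.flock(handle, fcntl.LOCK_UN)

    with file_lock("spawn-instance"):
        pass  # critical section shared with other processes on this host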
But that's one of the cases where, for example, in OpenStack we used files and we moved away from them because, like I said, they're very good for hardware that has very limited resources. But if you can afford something more expensive, you probably want to do that. Databases. Databases are probably one of the most common.
And by the way, all these statements are based on my own experience. I don't really have actual data that proves that this is the most common or that files are the oldest. So this is all based on my own experience and research. Databases are probably the most common way to integrate services.
They are asynchronous data-wise. What that means is that when you have a message and you want another service to get that message, you would just store it in the database and you are done with it. So the producer stores the message in the database, the producer is done with the message and then the consumer eventually will get that data out of the database and will do
something with it. Databases are really great for storing states. And I'm saying this is probably the most common one because most of the web services out there, like I couldn't think about a web service that does not rely on a database. And if you want to scale your web service, you definitely have or most probably have
a single database for your whole thing and you have several services talking to that database and getting data out of there, right? And they're really great for storing state. And the way we use this in OpenStack is, in OpenStack, most of the services, or probably the biggest services, have been split into several smaller services.
So take Nova, for example. Nova is the compute service. How many of you know OpenStack or have heard of it? Awesome. So Nova is the service that is responsible for spawning new instances. It's roughly what EC2 is for AWS.
And Nova has three sub-services, well, it has many more than that. But the main services that you need from Nova are like three or four services. And you have the API service, you have the compute node, and you have the scheduler, and you have conductor that gets messages and stores everything on the database.
So when a request for a new instance comes in to the Nova API service, a new record will be created in the database. And then a message will be sent to the scheduler that then will talk to the Nova compute node to spawn a new virtual machine. So what Nova compute does is it gets the data of these new instances that was requested
out of the database, it spawns a new virtual machine, and when the virtual machine is running, it will update the state of the virtual machine saying, hey, the virtual machine is running, and it will update the data. So that's system integration on a really small scale. And that's a way you could use databases to integrate systems. So that's why I say that they're probably the most common way to integrate systems,
and probably many people don't know that they're actually integrating systems by using databases. So LibreOffice is... there you go, that's LibreOffice for you.
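To make the database-as-integration-point pattern just described concrete, here is a minimal sketch using sqlite3; the table and column names are invented and this is not OpenStack code, but it follows the same producer/consumer shape:

    import sqlite3

    conn = sqlite3.connect("integration.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS instances
                    (id INTEGER PRIMARY KEY, name TEXT, state TEXT)""")

    # Producer (think "API service"): store the request and move on.
    conn.execute("INSERT INTO instances (name, state) VALUES (?, ?)",
                 ("vm-1", "building"))
    conn.commit()

    # Consumer (think "compute service"): pick up pending work, do it,
    # then write the new state back for everyone else to see.
    row = conn.execute(
        "SELECT id, name FROM instances WHERE state = 'building'").fetchone()
    if row:
        instance_id, name = row
        # ... spawn the virtual machine here ...
        conn.execute("UPDATE instances SET state = 'running' WHERE id = ?",
                     (instance_id,))
        conn.commit()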
So does any of you have any questions so far?
Feel free to interrupt me if you have questions. LibreOffice is stuck, okay. So messaging, what I mean by messaging here
is not a broker, it's not AMQP, and it's not the specific technology that allows you to send a message from point A to point B. What I mean about messaging here is the message itself, like the message as a unit to send data from point A to point B.
Whatever method you use to send that message from point A to point B, the benefits of using messaging is that it's loosely coupled, and it adds way more complexity, because by being loosely coupled, it means that you don't have a contract on the message,
so the service A can send a message to service B, but service B has a hypothetical idea of what it's going to get, and what it wants to do with that message. It adds more complexity, because if you don't know what the message may look like, you probably will have some parsing errors, type errors, or whatever,
depending on the language and what you want to do with that message. Some benefits, though: being loosely coupled, you can say, I will send this message, and whoever gets this message can do whatever it wants with it. And one of the places where we use this kind of messaging or loosely coupled contract is in Ceilometer.
Ceilometer plugs into the notification stream of OpenStack, and it will get all the notifications of what's happening in your infrastructure. If you spawn a new virtual machine, a new notification will be sent, so Ceilometer gets it, parses it, and does something with it, creates new events, creates stats,
and allows you to bill users based on what they've done. And one thing about messaging is that it may depend on message routers and transformations. So, when you use messaging, and you want to send a message from point A to point C, but it has to go first through point B,
you will need in point B some kind of logic or technology that will allow you to route that message to point C. And you will do that based on the message information itself, so you have to know what to do with it, and you have to try to parse it and get information out of it to know where the message has to go.
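A rough sketch of that kind of loosely coupled, content-based routing might look like this; the event type, payload and handlers are invented for illustration:

    # A loosely coupled notification: the producer doesn't know who consumes it.
    notification = {
        "event_type": "compute.instance.create.end",
        "payload": {"instance_id": "abc123", "memory_mb": 2048},
    }

    def meter_instance(payload):
        print("metering instance", payload["instance_id"])

    # The router only looks at the message itself to decide where it goes.
    routes = {"compute.instance.create.end": [meter_instance]}

    for handler in routes.get(notification["event_type"], []):
        handler(notification["payload"])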
And this is something that Nova Scheduler, for example, does. It doesn't get a notification, it gets an RPC message, and we'll go to that later. But it gets a new message, it parses it, and it tries to get a Nova Compute node that will do the work for it, and it will send that message to Nova Compute. And it does that by using some filter logics and availability
and other important information. But let's not dive into that. They're very easy, they're very cheap, but they add complexity to your system. And the last method that I want to present today is RPC.
And RPC, it stands for Remote Procedure Calls. It was probably introduced pretty much by the enterprise world, when system integrators wanted to integrate systems for their customers, and they would go and use RPC calls to do that.
And RPC calls, the way it works is you will send formatted messages, so you have a contract on that message, to point B. Point B will execute it, and it's called Remote Procedure Call because you're basically calling a remote function just by sending a message. You will say, call this function and pass these arguments to that function,
and give me the result back. It's the most used method throughout OpenStack, and I do have numbers for this. The message channel may vary, you can use databases, message brokers. So like I said, I'm not talking about, here I'm not talking about the method you would use to send a message from point A to point B.
In OpenStack's case though, we use message brokers to do RPCs. And one of the drawbacks, but it's actually something required for RPC, is that it's tightly coupled, so you have a protocol, you have to invent something, you have to agree on a contract
when you send a message from point A to point B, because you want to call a function that you know exists in point B, and you have to pass some arguments to that function, and you want to get a result back. So you have to know what you are going to get back, and you have to know what you have to send to point B to call that function. So it's really tightly coupled; you will need to design your own protocol to do this.
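For example, a home-grown RPC protocol of the kind described here often ends up looking roughly like this sketch; the method name, arguments and endpoint class are invented, and this is not the actual OpenStack wire format:

    # The "contract": both sides agree that a message is a dict with these
    # exact keys, and that the named method exists on the receiving side.
    request = {"method": "spawn_instance",
               "args": {"name": "vm-1", "memory_mb": 2048}}

    class ComputeEndpoint:
        def spawn_instance(self, name, memory_mb):
            return {"name": name, "state": "running"}

    def dispatch(endpoint, message):
        func = getattr(endpoint, message["method"])   # fails loudly if absent
        return func(**message["args"])

    reply = dispatch(ComputeEndpoint(), request)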
But it's really common, it's very useful for doing that kind of remote calling function thing. But you have your benefits and your drawbacks taken from this. So in the OpenStack case, this is pretty much a high level overview of how OpenStack works, in terms of system integration.
It's based on shared-nothing architecture. If you don't know what shared-nothing architecture is, it's basically, in a very simple way, services working together, but not sharing anything. And by not sharing anything, I mean they don't share memory space on your box, they don't share processes, PIDs or other resources.
They can live together on the same box, but they won't share the same resources; they each have their own space within that box. So every service knows very few things about other services, and with that we managed to keep all those services very isolated from each other,
which is really good if you want to integrate systems together. You want your services to be independent, you want your services to be isolated from each other. And if something happens to one of your services, you definitely want your other services to still be alive and be able to work on top of other services in your system.
So we use databases for sharing state between services. Like I said, Nova API will store a new instance record with booting state, and then Nova Compute will update that state. And we use RPC for inter-service communication. When Nova API gets a new instance request, it will send an RPC message to the scheduler,
and the scheduler will get that message, and then it will send another RPC message to the compute node that will then boot the virtual machine. And we use messaging for cross-service communication. And I already mentioned this. The way it works is that services, when something happens with OpenStack, services will generate notifications,
then they will send it to some specific topic in the broker that other services can just plug into and get messages out of it, and they can do something with those messages. So since OpenStack relies a lot on brokers,
and it's probably right now one of the common tools to integrate services in many deployments, I would like to say a few things about brokers and the technologies that you could use, or how you could do integration based on protocols like AMQP
or just using technologies like message brokers. So the first thing I want to say is that scaling brokers is really hard. You've maybe read or heard something like, broker scaling is already fixed, and you can scale RabbitMQ. I'm sorry, that's a lie. That doesn't work that way. There's a lot of documentation, yes.
There's some explanations how you can do it, yes. There are some demos that people have done it, yet when you get it to big scales, it doesn't work that way. So scaling brokers is hard because synchronizing messages between different nodes of your broker that is heavily read and heavily written on, it's really hard, and it doesn't work that way.
Another thing is that brokers need a lot of memory. It really depends on your use case. If you don't have many messages traveling around your system, you probably won't use a lot of memory, but if you have a big deployment, your broker is definitely going to use a lot of memory, and it really depends on how fast you write to it
and how fast you read from it. If you read as fast as you write, your broker will probably use less memory. I mean, the memory footprint will be pretty linear and stable, but if you have more writes than reads,
your broker will use a lot of memory. Brokers need a lot of storage. If you want your messages to stick around when something bad happens, you probably will use a durable queue. If you use a durable queue, your broker will have to write everything to disk
because if the broker goes down, it has to start from somewhere, right? So it will read all your messages out of whatever database or storage system it is using, and it will make those messages available again. So, again, if you have a lot of writes and not as many reads as you have writes,
your broker will use a lot of storage. So I was looking at the time, and it says nine minutes because LibreOffice went down. I was like, oh, I'm already done. So, since I've been ranting about brokers for a bit,
I would like to say something about those. If you are going to use brokers or any messaging technology, prefer federation instead of centralization. What I mean by that is if you have a centralized broker and you want to scale that broker, and that broker goes down, you're done,
like your system is off, yeah, you will have HA and all that you want. You want to scale a broker, and you want to have it replicated and all those kind of things. But if you prefer federation instead of centralization, you will have a whole bunch of nodes that are lightweight workers, and if they go down, you will probably set up a new one,
and you won't rely on a single broker that is in the middle of your system, processing all your messages. And one way to do that is relying on AMQP 1.0. I'm pretty sure most of you are familiar with AMQP itself. The current version of the AMQP protocol
being used by RabbitMQ and most of the brokers is AMQP 0.10. And AMQP 0.10 is not a standard, and many brokers have implemented it in different ways. Whereas AMQP 1.0 is actually a standard, and it dictates how messages will go from point A to point B.
So, how you can send messages between two peers. AMQP 1.0 is peer-to-peer on a per-message basis. What I mean by that is, it explains how a message will travel
from one point to another point, but in the specification there's also an explanation of how you would do that with an intermediate broker. So it doesn't say that you have to have a completely federated system; you could also have a broker in the middle that is capable of speaking AMQP 1.0.
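For example, with the python-qpid-proton bindings, sending a single AMQP 1.0 message over a link looks roughly like this; it assumes a peer, broker or Dispatch router listening on localhost:5672, and the address name is arbitrary:

    from proton import Message
    from proton.utils import BlockingConnection

    # Connect to the peer (another service, or an intermediary such as a
    # broker or a Qpid Dispatch router) and open a sending link.
    connection = BlockingConnection("localhost:5672")
    sender = connection.create_sender("examples")
    sender.send(Message(body={"event": "instance.created"}))
    connection.close()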
So AMQP 1.0, it's all about messages, and how those messages will go from point A to point B. And if you want to scale it and have more routing intelligence, so to speak, in your system, you could use something like Qpid Dispatch,
which will allow you to create new rules to send those messages between your services, as you would do using routing keys in AMQP 0.10. So in AMQP 1.0, you don't have exchanges, you don't have queues, you don't have binding rules, and you don't have routing keys. In AMQP 1.0, you just have messages,
and you have links, and every link's basically a connection to one of the peers in your system. So after having said all that about methods to integrate systems and technologies that you could use and protocols and all that stuff,
I would like to give you some tips and tricks about system integrations. This is mostly based on our experience in the OpenStack community. First and foremost, transmission protocol matters. By transmission protocol, I'm not talking about the lowest level. I'm not talking about UDP against TCP. I'm talking about probably a higher level,
like whether you want to use a protocol that sits directly on top of TCP, or you want to use HTTP, or you want to use some other RPC protocol whatsoever. Transmission protocols matter. Depending on the protocol you choose, you have some extra cost on your messages
and transmission of your messages. So be aware of that. Depending on your use case, make sure you choose the best protocol for your use case. Use versions for your wire protocol. If you choose to use RPC to integrate your systems,
you probably will have to agree on a protocol, and you probably have to define that protocol by yourself. Something that has been around in OpenStack for a long time is versioning of those protocols. So when you define your protocol, you probably will say, my protocol is a dictionary that I will send between services,
and that dictionary has a key called function, and the value of that key is actually a function name. And then it will have args and kwargs in the dictionary, and those will be the values to pass to the arguments and keyword arguments of that function. But then you want to update that protocol.
You say, I want to also specify the return type I want from that function. And if you have your system deployed, and you want to make a change to your protocol, you can do that. But if you don't have versioning, you will probably have to tear all your services down
and then up again once you updated the protocol. Because if some service gets a message, an RPC message with an RPC format it doesn't recognize, it will probably fail. Instead, if you have versioning, you can do rolling updates on your system,
by restarting services one at a time and updating those services so you don't have any downtime. Versioning is not just useful for upgrades, it's also useful for backward compatibility. If you do a change and that change turns out to be really bad, you can go back to your previous version and you still have your services that used to work with that version.
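A sketch of what such a versioned envelope and a backward-compatible dispatcher could look like; the field names, methods and versions are invented, not the oslo wire format:

    SUPPORTED_VERSIONS = ("1.0", "1.1")      # 1.1 added the return_state flag

    class ComputeEndpoint:
        def spawn_instance(self, name, return_state=False):
            return "running" if return_state else None

    def dispatch(endpoint, message):
        version = message.get("version", "1.0")
        if version not in SUPPORTED_VERSIONS:
            raise ValueError("unsupported RPC protocol version: %s" % version)
        return getattr(endpoint, message["method"])(**message["args"])

    # An old 1.0 caller and an upgraded 1.1 caller can coexist during a
    # rolling upgrade, because both formats are still accepted.
    old = {"version": "1.0", "method": "spawn_instance",
           "args": {"name": "vm-1"}}
    new = {"version": "1.1", "method": "spawn_instance",
           "args": {"name": "vm-2", "return_state": True}}
    print(dispatch(ComputeEndpoint(), old), dispatch(ComputeEndpoint(), new))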
Keep everything explicit. I have a really nice quote that I got from Jeff Hodges' talk at the RICON conference. He basically said, in a distributed system, having implicit things is the best way to fuck yourself.
That's really true. If you have implicit things happening in your system, you send a message, like an RPC message and you don't agree on the contract for that message, you will probably face some subtle issues that you didn't expect to happen. So keep everything explicit.
Even if it is more verbose, even if you need more code, even if you need more nodes running, that's fine. Just keep everything explicit, because when something bad happens, you will know what it is. You will know how to debug it and how to fix it. Most of the time. Can you step to the microphone, please?
Yeah, I can repeat it. He's asking for an example of an implicit thing. Oh, I can get one of the OpenStack issues. Well, yeah, like, for example, Ceilometer uses messages. It gets messages out of the notification stream in OpenStack, and there were some implicit fields
being sent by some services, and those fields weren't sent by other services. So Ceilometer didn't know about that, and there was a case where it failed when it got those messages. A good thing is that it was before the release,
so it could be fixed. But anyway, something that you want to keep explicit is how you distribute your system, right? How and where your nodes are running, and what nodes can run alongside other nodes. You don't want to have all nodes running on the same server, so if you keep your architecture
and your distribution very explicit, and even in the way you use separate services, it will be easier for you to estimate the scale and how to distribute those. A good example of this is Nova itself. Again, Nova has a Nova API service and it has a Nova scheduler service. So if you're getting a lot of API requests,
you will get a lot of messages going to your schedulers. If your scheduler is under a lot of pressure, you can add more schedulers to it and you can scale them horizontally very easily. So the way you distribute your services in terms of code, like, usually you have an API service,
a scheduler service, a conductor service, and a compute service. It's another way of being explicit about how your distributed system should look. Design by contract, and I've been using the word contract a lot today. If you design by contract, you know what service B is expecting you to send, and service B knows it is expecting you to send something.
like, you know what service B is expecting you to send, and service B is expecting you to send something. So service B can run, let's say, a set of aspirations before running anything and it will be replied back if some of those requirements are not met.
So to put it another way, when you integrate your system and you want two services to talk to each other, you have a contract between them. Pretty much like your accountant and yourself. You have a contract with him and you know what he's expecting you to do: when you pay for something, he wants you to keep all the receipts and give those receipts to him.
And you know when you give the receipts to him, he will do something with them, and you will pay him for that service, and he will expect you to pay for his service, right? So you have a contract with him. The same thing happens with services when you send a request to service B. Service B is expecting something from you. You know what the service is expecting, so you will send that.
If you don't meet those requirements, it will reply back with an error. And if you send all the requirements, you are going to expect something back from it. If you don't get what you're expecting, you can just call back again and say, hey, this was not what I was expecting, so please give me what I want. This design by contract is probably known by most of you.
It was introduced by Eiffel, the programming language, and it's basically part of the coding style of the language itself.
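A very small sketch of checking a request against such a contract before doing any work; the required fields and replies are made up for illustration:

    # The "contract": what service B requires from every incoming request.
    REQUIRED_FIELDS = {"name", "flavor", "image"}

    def handle_request(request):
        missing = REQUIRED_FIELDS - request.keys()
        if missing:
            # Requirements not met: reply with an explicit error instead of
            # guessing what the caller meant.
            return {"error": "missing fields: %s" % ", ".join(sorted(missing))}
        return {"status": "accepted", "name": request["name"]}

    print(handle_request({"name": "vm-1"}))
    print(handle_request({"name": "vm-1", "flavor": "small", "image": "fedora"}))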
Keep services isolated as much as possible. Like I said, shared-nothing architecture is very useful when you want to keep your distributed system safe from failures. It's not completely safe from failures, but if one of your services goes down and it's isolated from all your other services, you can probably just run another one somewhere else and make the other services talk to it. So keep them isolated, keep your services very stupid
if you can, and I'm not talking about microservice architecture and having like thousands and thousands of microservices doing just one little function thing. But keep them isolated and very focused in context on what they have to do.
Avoid dependency cycles between services. I wouldn't recommend using the star integration method. It's really messy, and when something goes wrong, it's very difficult to debug. So avoid having dependency cycles between your services. If you have a service bus and you can send messages through it, make sure both services don't depend on each other to get something done.
both services don't depend on each other to get something done. Mock is not testing. If you have a distributed system, you probably want to test it. If you want to test it, you would say, hey, the easiest way to test it is by mocking what I'm expecting from the other service. Yeah, that works and it probably will succeed every time, but that's not testing. You want to test your distributed system,
get it installed, run everything live, and that's the way to test it. That's how you would know when something is working and it's not working. We have mocks in OpenStack, but we also run everything live for every single patch. This is very important. Many bugs we have found in OpenStack
related to how services are distributed were not tested live; we only had mocks for those tests. So, mock is not testing. And before closing, since this is a Python conference, here are three libraries for doing integrations,
like Kombu for sending messages. Kombu is a library that's actually used by Celery, and it supports transports, and every transport is basically a messaging technology you could use. You could use RabbitMQ, MongoDB, Redis, ZeroMQ, and some other technologies that it supports.
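A minimal Kombu sketch using its in-memory transport (the queue name is arbitrary); swapping the connection URL for amqp://, redis://, mongodb:// and so on changes the underlying technology without touching the rest of the code:

    from kombu import Connection

    with Connection("memory://") as conn:      # or "amqp://guest@localhost//"
        queue = conn.SimpleQueue("integration-demo")
        queue.put({"event": "instance.created", "id": "abc123"})
        message = queue.get(block=True, timeout=1)
        print(message.payload)
        message.ack()
        queue.close()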
Celery is a distributed task manager. Well, there was a presentation before mine about it. Basically, it allows you to have distributed workers doing something based on messages, and Celery itself uses RPC implicitly to tell workers what they have to do.
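The classic Celery sketch looks roughly like this; the broker URL and the task are assumptions, and any transport Kombu supports will do:

    from celery import Celery

    app = Celery("tasks", broker="amqp://guest@localhost//")

    @app.task
    def resize_image(image_id, width, height):
        # The worker that picks this up is effectively handling an RPC call.
        return {"image_id": image_id, "size": (width, height)}

    # On the producer side, this just drops a message on the bus and returns:
    # resize_image.delay("abc123", 800, 600)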
And oslo.messaging is an RPC library, and that's what we use in OpenStack to send RPC messages between services. And it also supports, well, it has the architecture to support many brokers. It just supports RabbitMQ and Qpid for now, and we're working on AMQP 1.0 support for it.
And these are some messaging technologies that you could use. You probably already know them, like ZeroMQ, RabbitMQ, and the Qpid family. You have Qpid, which is the broker, and it supports 0.10 and, well, it actually supports 1.0 as well. You could use Qpid Proton, which is fully AMQP 1.0,
and Qpid Dispatch for routing messages throughout your system. And that's pretty much it. Any questions? Please come to the microphones if you want to ask questions.
Hi, thanks for your talk. I was really, I would be quite curious about how do you do your systems integration testing? Like, do you have some automated system integration testing of, you know, setting up a cluster with all the services and so on?
What tools do you use? So, in OpenStack we use Gerrit for code review. Every time you submit a patch there, there's a tool, which is our testing integration tool, that basically gets a notification from Gerrit and runs a Jenkins job every time it gets a new patch. And those Jenkins jobs will install OpenStack completely on a single node and test it.
We have live tests that call APIs and it will send messages throughout the whole system and like simulate a live environment like spawning new virtual machines, taking it down, creating new volumes, deleting volumes, creating new images and deleting images and all that kind of things. So it's been tested live. We do have automated tools like Jenkins is basically the one that does everything.
And we use DevStack to install pretty much all the tools in those Jenkins jobs. All right, thank you. You're welcome. I have a question. You didn't talk about security. If you run this messaging infrastructure,
how do you secure it? Sure. So, right now in OpenStack, security is pretty much done by binding everything on your private network. In this layer, we have some work going on on signing messages and encrypting messages before sending them through the whole pipe, so to speak.
There was a talk about Marconi that was done yesterday where one of the things, the good things about Marconi that was presented is that it is good when a message broker is not good enough. One case is especially security. We have guest agents running in virtual machines
and we don't want those guest agents to talk to the central broker. So Marconi would be good for that use case where you can just set up a new service that doesn't have to take a high load of messages in your infrastructure and it will isolate everything from your message broker. So the security, how it's done in OpenStack right now is just by binding everything on the private network
and we don't allow anyone to talk to that except for the services running in the OpenStack deployment. And like I said, we have some work going on to sign messages and encrypt messages before sending them through the wire. Go ahead. Yes, I have another question. Do you have a way to make the dependencies
between your services visible? Because when I see this communication bus, it looks very clear and simple. You just put a message on the bus and somebody else will get it. But in the end, that's just a way for the services to communicate with each other. And you can easily build a spaghetti dependency system
by just using a very clean bus. So how do you prevent this? Logically, we don't have any assertion between services that say hey, we can depend on each other.
It's just done logically when the design decisions are being taken. Like, we cannot make service A depend on service B and service B depend on service A, so let's try to figure out a way to do that, which basically means creating a service C, unfortunately. But yeah, it's done logically. Cyclic dependencies, in my opinion, are bad,
but they are not always bad, like everything in software. But we try to avoid them as much as possible. It's all done logically. We have everything explicit. So since we know which services depend on each other, logically speaking or function-wise or feature-wise, we know that we cannot create cycles
in some of the services. Or we try not to as much as possible. Can you use the mic, sorry? And is that explicitly written down somewhere in the code? Yeah, it's somewhere in the code. It is in the code, definitely, but we also have documentation about it written on wiki pages, and in the documentation of each service and the operations book, obviously,
because you have to know how to do the whole thing. If there are no further questions, I'd like to thank the speaker again. Thanks. And thanks for attending. Thanks.