We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Cloud Messaging with Node.js and RabbitMQ

00:00

Formale Metadaten

Titel
Cloud Messaging with Node.js and RabbitMQ
Serientitel
Anzahl der Teile
150
Autor
Lizenz
CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
In this talk I'd like to present CloudStagram an Instagram clone prototype that has been built with "real time" features from the get go. New uploaded images are broadcasted for background processing using RabbitMQ from the node.js frontend to the Clojure backend. From there real time updates are pushed back to the node.js servers and then to the browser via sock.js. All this implemented in such a fashion that allows horizontal scalability of both the frontend app and the workers app with the requirement of deploying the app to a public Cloud. In this talk you will learn about the advantages of a message oriented architecture to be able to mash up together a polyglot system of apps and services.
31
59
Vorschaubild
1:00:41
89
Vorschaubild
1:00:33
90
Vorschaubild
1:00:33
102
KnotenmengePunktwolkeTwitter <Softwareplattform>GruppenoperationApp <Programm>MereologieSpezialrechnerSoziale SoftwareBandmatrixSoftwareentwicklerAppletZeitabhängigkeitCodeImplementierungUmsetzung <Informatik>E-MailDateiformatFormale SpracheMusterspracheBildgebendes VerfahrenRotationsflächeNormalvektorFamilie <Mathematik>MittelwertCloud ComputingWeb-ApplikationGamecontrollerDatenbankParametersystemFunktionalSpezialrechnerArithmetisches MittelPunktMehrwertnetzProzess <Informatik>DatenverwaltungMultiplikationsoperatorBinärcodeWasserdampftafelVirtuelle MaschineWeb ServicesVideokonferenzWeb SiteZentrische StreckungPhysikalisches SystemKlassische PhysikCASE <Informatik>Digitale PhotographieAdressraumMusterspracheUnternehmensarchitekturSummengleichungServerUmsetzung <Informatik>Rechter WinkelHypermediaVorzeichen <Mathematik>EinflussgrößeDatenstrukturSchedulingLie-GruppeMathematikBildschirmmaskeCodeSoftwareentwicklerBenutzerbeteiligungWarteschlangeSystemaufrufKartesische KoordinatenZusammenhängender GraphElektronische PublikationMessage-PassingImplementierungWeb logDickeKlasse <Mathematik>E-MailGoogolLastTwitter <Softwareplattform>GruppenoperationIntegralEreignishorizontURLSkalierbarkeitDateiverwaltungCookie <Internet>AppletFormale SpracheStreaming <Kommunikationstechnik>Produkt <Mathematik>Gebäude <Mathematik>BildschirmfensterSystemverwaltungProjektive EbeneBandmatrixEvoluteApp <Programm>ProgrammbibliothekKnotenmengeBitPunktwolkeXMLUMLComputeranimation
SpezialrechnerImplementierungWeb-SeiteServerProtokoll <Datenverarbeitungssystem>MultiplikationOpen SourcePunktwolkeApp <Programm>SystemplattformComputerarchitekturEinfach zusammenhängender RaumErlang-VerteilungProtokoll <Datenverarbeitungssystem>SoftwareE-MailMusterspracheSchnittmengeFormale SpracheCASE <Informatik>TelekommunikationOffene MengeProzess <Informatik>Virtuelle MaschineRechenzentrumSchreiben <Datenverarbeitung>Kartesische KoordinatenMessage-PassingMaßerweiterungServerMAPPhysikalisches SystemPlug inFestplatteWeb logPunktwolkeProgrammierspracheAppletKnotenmengeDienst <Informatik>Quelle <Physik>Pivot-OperationProdukt <Mathematik>MereologieFreewareGüte der AnpassungWeb SiteSoftwareentwicklerClientAdressraumLastLateinisches QuadratOpen SourceAlgebraisch abgeschlossener KörperMicrosoft dot netAuswahlaxiomServiceorientierte ArchitekturCachingFlächeninhaltKonfiguration <Informatik>WarteschlangePunktStandardabweichungMomentenproblemZweiZentrische StreckungImplementierungDatenverwaltungFamilie <Mathematik>MultiplikationsoperatorFramework <Informatik>DifferenteInternet der DingeSystemaufrufWasserdampftafelGeradeBenutzerfreundlichkeitBus <Informatik>Rechter WinkelGerichteter GraphKlasse <Mathematik>QuellcodeQuantenzustandWeb ServicesPartikelsystemSpeicherabzugRelativitätstheorieInzidenzalgebraElektronisches ForumDatenverarbeitungssystemInteraktives FernsehenSichtenkonzeptGemeinsamer SpeicherGraphfärbungComputeranimation
App <Programm>PunktwolkeDienst <Informatik>Web ServicesDefaultStichprobeKlon <Mathematik>SpezialrechnerInformationsspeicherungHash-AlgorithmusSimulationCOMMailing-ListeBildgebendes VerfahrenPunktwolkeRechenzentrumAggregatzustandApp <Programm>Array <Informatik>KnotenmengeEindringerkennungZählenDefaultFestplatteDateiverwaltungEinfach zusammenhängender RaumSpezialrechnerDifferenteE-MailSchlüsselverwaltungKlon <Mathematik>InstantiierungMereologieSpeicherabzugMessage-PassingPunktDienst <Informatik>EchtzeitsystemHalbleiterspeicherKreisflächeWarteschlangeProfil <Aerodynamik>FunktionalPasswortBitWeb SiteDatenfeldUnrundheitBinärcodeMetadatenSynchronisierungKartesische KoordinatenBus <Informatik>AdressraumVisualisierungTermRechenschieberLoopDatenverarbeitungssystemMAPOpen SourceServerComputersimulationDemo <Programm>Hash-AlgorithmusInformationsspeicherungMultiplikationsoperatorFehlermeldungGeradeGemeinsamer SpeicherMobiles InternetKlasse <Mathematik>Prozess <Informatik>Algebraisch abgeschlossener KörperPhysikalisches SystemStabRechter WinkelFamilie <Mathematik>AssoziativgesetzInterprozesskommunikationHydrostatikMathematikWort <Informatik>Computeranimation
SimulationServerInformationWurm <Informatik>Message-PassingNo-Free-Lunch-TheoremDemo <Programm>SpezialrechnerSchlussregelE-MailMessage-PassingRoutingTypentheorieFestplatteAlgorithmusAbstraktionsebeneSommerzeitBenutzerschnittstellenverwaltungssystemMetadatenWarteschlangeFehlermeldungKonfiguration <Informatik>LoginKartesische KoordinatenMAPWort <Informatik>SkalarproduktMustersprachePunktBimodulInstantiierungHash-AlgorithmusServerMathematikSchlüsselverwaltungMini-DiscSpezialrechnerWeb-SeiteHalbleiterspeicherApp <Programm>Demo <Programm>ComputerarchitekturMatchingPay-TVGruppenoperationErlang-VerteilungSchnelltasteCASE <Informatik>Inverser LimesPunktwolkeDatenbankAbfrageDefaultTabelleKnotenmengeData MiningBeobachtungsstudieGrenzschichtablösungTextur-MappingZentrische StreckungFlächeninhaltBitrateMetropolitan area networkDatenverarbeitungssystemRechter WinkelNP-hartes ProblemEndliche ModelltheorieGeradeDatensatzTermKette <Mathematik>Computeranimation
BroadcastingverfahrenSpezialrechnerEreignishorizontProzess <Informatik>CodeE-MailRechenwerkSinusfunktionLokales MinimumMenütechnikKonvexe HülleTermEinfach zusammenhängender RaumPunktInformationServerApp <Programm>Objekt <Kategorie>Hash-AlgorithmusProgrammbibliothekLastteilungThreadServiceorientierte ArchitekturCodeMusterspracheE-MailDatenreplikationKomplex <Algebra>SpezialrechnerFunktionalCASE <Informatik>Prozess <Informatik>SchlüsselverwaltungWarteschlangeMessage-PassingInstantiierungAlgorithmusInhalt <Mathematik>Elastische DeformationMailing-ListeAggregatzustandZahlenbereichEreignishorizontDatenverarbeitungssystemRelativitätstheorieQuaderKnotenmengeTypentheorieElektronische PublikationDatenstrukturSystemaufrufRoutingKartesische KoordinatenArithmetischer AusdruckBitAlgebraisch abgeschlossener KörperSchaltnetzBildgebendes VerfahrenWort <Informatik>InformationsspeicherungMereologieEntscheidungstheorieMultiplikationsoperatorProjektive EbeneLoopFormale SpracheComputerarchitekturQuantenzustandSummengleichungWeb ServicesEinflussgrößeSondierungGemeinsamer SpeicherData MiningBitrateMathematikFormation <Mathematik>Web SiteCoprozessorGravitationComputeranimation
DualitätstheorieNormierter RaumRechenwerkBenutzeroberflächeMathematikGammafunktionMaßstabMultiplikationApp <Programm>PunktwolkeWeb ServicesVerfügbarkeitDienst <Informatik>Message-PassingPunktwolkeKnotenmengePlug inSoftwareCASE <Informatik>WarteschlangeKonfiguration <Informatik>Protokoll <Datenverarbeitungssystem>MathematikCodeDatenreplikationStrategisches SpielServerFlächeninhaltBroadcastingverfahrenFunktionalEinfach zusammenhängender RaumFormale SprachePunktObjekt <Kategorie>TaskServiceorientierte ArchitekturRelationentheorieMultiplikationEchtzeitsystemWeb-SeiteWeb ServicesSpezialrechnerSoftwarewartungSchnelltasteDatenbankAggregatzustandVirtuelle MaschineGüte der AnpassungFacebookMailing-ListeChatten <Kommunikation>AbstraktionsebeneEin-AusgabeElektronische PublikationDifferenteSystemzusammenbruchElastische DeformationBrowserSocketBenutzerbeteiligungReelle ZahlClientAlgebraisch abgeschlossener KörperMereologieProjektive EbeneRechenzentrumMultiplikationsoperatorSchlüsselverwaltungMAPTopologieLesen <Datenverarbeitung>Physikalischer EffektSystemaufrufDatenstrukturRuhmasseFamilie <Mathematik>Klasse <Mathematik>Kartesische KoordinatenGemeinsamer SpeicherWort <Informatik>ProgrammbibliothekZweiProzess <Informatik>EreignishorizontAlgebraisches ModellSpieltheorieKlassische PhysikFächer <Mathematik>ZahlenbereichComputeranimation
XMLUML
Transkript: Englisch(automatisch erzeugt)
So welcome everyone, my name is Alvaro Videla and today I'm going to present about cloud messaging with RabbitMQ and Node.js. So a bit about myself, I'm a developer advocate for Cloud Foundry and RabbitMQ as well.
That's my blog and that's my Twitter and I'm the co-author of this book, RabbitMQ in Action. So if you want to get a copy, they have it there at the mini library or mini bookshop they have.
So the goal of this talk is to show how can we use RabbitMQ and Node and so on to build an application and how we can decouple the components using RabbitMQ and messaging. So the first question is like, why do we need messaging?
Okay, I can kind of know the answer for that but maybe everybody can say but I just have a database or I don't know, why should I care about this kind of technology? And I always like to illustrate this question with a very simple example that is very easy to understand and to follow.
So when we build a classic web application for example, we can be asked to implement a photo gallery. And when we build this app, we have like an upload picture form and the image gallery which we can answer to the product owner like this is pretty simple to implement
and we can even have a nicely set up schedule let's say. But at some point we start to get new requirements. And I mean as we know, no project ends as it was designed from the beginning, right?
So we get the product owner, he comes to us and says, can we also notify the friends of this user whenever there is a new image upload and they want to deploy this tomorrow as usual? Every feature is urgent for some reason. Then we get a social media guru in the company and he says that they want to give badges to users
for each picture upload, similar to for example what Foursquare does. And also send everything to Twitter, so to spam every follower basically.
And I don't know but two years ago everybody was blocking the Foursquare tag on Twitter, if you remember. Then we have the sysadmin or as I like to say in Switzerland, we have Swiss admins and these guys will come to you and say that you are delivering the image at full size. And of course the bandwidth build has tripled.
And the need to get this fixed for yesterday because we are throwing money out of the window. This may sound a bit stupid but in a company I was working in China, we had at some point the bandwidth build from a video service we built.
So in this dating site people could film themselves, you can imagine what they were doing and they will stream all this video everywhere. And at some point all these terabytes came back to us and they were not so happy about the whole feature. So then we have developers in other teams that we usually talk to them, not all the time but sometimes.
And let's say we implemented the first thing in PHP and they need to call that from Python or even Java. Then there is the user, we always forget that there is somebody that will actually use our feature.
Usually we just like our machines shipping code there, we don't even care what the feature is about. We just know that some product owner asks us to implement it. So it's like a water fallish disguised as crumb usually. And there is a user there that doesn't really care that the application needs to resize images,
that the application needs to tweet about it or whatever. If I'm a user of your app I just want to click upload and see the image ready. I don't really care what your app needs to do in the background. So don't make me wait for that. And then there is us, after the whole story we started with a very simple design and now we want to probably quit, do something else.
So, let's see the evolution of the code. If we have a normal web app, let's say with modules, controllers and so on. I will be using save.code here, these are comments, that's the function name or method if you want.
Those are the arguments and that's the function body and that's the return value. If this sounds familiar to you then you probably know ELAN, that's just ELAN syntax. Because I think it's very uncluttered syntax to just show what you want to show.
So, in the first implementation we were asked to implement an image controller. There was a put method where we get the image and then we called the image handler, did the upload, probably inserting something on the database
and moving the file from the temporary file system to the actual final location. Finito. But at some point we had to add the method to resize the picture. Of course this required that we redeploy all this code.
Then we were asked to notify the friends, again redeploy all the code, add points to the user, redeploy the code and so on. And to it for a new image. So the question that this code has is can the code scale to new requirements?
What happens if we need to speed up image conversion? In that kind of code we need to probably add more web servers where we just want to scale the resizing, we are scaling for everything even if we don't need it. What happens if we need to send notifications by email? So we need to go and deploy again.
What happens if, let's say, Google decides to create his own social network and then we need to send stuff there and not to Twitter? What if we need to resize in different formats? What if we need to swap the language technology without any downtime? So resizing in PHP is too slow, maybe we can do it in Java or C++ or whatever.
And we want to implement all that. Because usually when we speak about scaling we just think about maybe horizontal scaling or scaling up. But we don't think that maybe at some point we need to scale down actually. We need to, at night, stop having so many consumer resizing images.
If you are deployed into a cloud service like EC2 or something you probably want to pay less money at night or when your website is not so much used. In this dating site we built in China we knew when people before going to work they were using the website a lot at lunch and when they left work.
So those were the big times where we had to have more workers, for example. So is there a way to do better? Of course. I'm here to sell messaging, so that's what I'm saying. And in messaging, if you know this book, Enterprise Integration Patterns
this image is from that book and that's just a very simple publish-subscribe pattern. In this case, the example the book has is for an address change event that is sent over a channel and there will be three consumers doing whatever they want to do with that event.
That's basically what we could have implemented before. So the first controller we can implement will do the image upload as usual then it will create a structure with the user data and the image data
I mean metadata, not the actual binary of the image that will be very inefficient to move around an image all the time. And then we just publish a message saying new image that will be the tag of the message. Then somewhere we can start friends notify our consumer or subscriber
and this one will listen on the new image event and it will notify the friends. From the message it will get the user data and the image and will do something with that. Then we have a points manager that will add points to the user and a resizer. The point here is that any of these processes can run on their own
and we can fire many of them or just one or none of them by using messaging. And actually let's say we only want to give points to users at night when there are no users on the website so we have the load is low so we can run more background workers for example.
A messaging system that implements queuing will queue all this messaging while the other end of the network is offline in this case the points manager and we can put it back online at night and it will process all that queue of messages. So there are many advantages if we decouple the architecture using messaging
like that being one of them, scaling and so on. Also any of these consumers could be implemented in any language things that don't need to be in PHP as was the example or Node.js or whatever. And the second implementation for that, there is no second implementation
we just deploy the first part of the code, that's it. We don't really care about new requirements, of course we care but the thing is we can add them on the go. So that was the example I would like to use later in the talk
it's what I implemented actually to demo this concept. But now maybe you don't know what RabbitMQ is. So RabbitMQ is a multi-protocol messaging server this means RabbitMQ at the moment supports three protocols
AMQP which is the standard and the most supported one but also it supports MQTT and Stomp. Depending on the protocol is what you can do with them but there are many different use cases you can do with each of them.
It's open source under the Mozilla public license. It's polyglot in the sense that you can connect to RabbitMQ from many languages. If there is a client for any of these protocols in your language of choice then you can use or interact with RabbitMQ.
And also now since I don't know how many of you follow the Erlang community but RabbitMQ is written in Erlang and recently not recently but in the last year let's say there is a new programming language for the Erlang virtual machine. Erlang works similar to Java like you have the JVM or the CLR in .NET
that you have this virtual machine and then all the languages on top like in Java you have Java itself or Closure or Scala or whatever. In the case of Erlang now there is a new one called Elixir which is very similar to Ruby in syntax. And you can also write plugins and stuff for RabbitMQ using Elixir
so you can even mix and match languages at that level if you wanted to extend Rabbit for example. Then yes as I said it's written in Erlang OTP what do you care or what should I care that it's written in Erlang because Erlang is a language made specifically for high concurrent applications
and has message passing embedded as the main way of coding in Erlang basically. So it's very easy to write servers in Erlang. Then OTP is the open telecom platform. What does that even mean? I don't know. But the thing is the open telecom platform has a set of patterns
that you want to have if you create a distributed system. For example in RabbitMQ whenever Rabbit is reading from the network it has an Erlang process to read from that particular TCP connection. There may be many of those processes listening on the network
and then there is a supervisor which is an Erlang process that supervises all those small processes. If you send wrong data over the network then that particular process will crash but you don't want to crash the whole server. And you don't want to care about restarting the worker that worker that is reading stuff from the network.
The OTP framework will provide you with this supervisor pattern that knows how to restart a worker knows how many times to restart a worker what will happen if the worker keeps dying and dying and dying maybe shut down the whole application maybe shut down this family of workers and so on.
The thing is this stuff doesn't need to be implemented by the RabbitMQ developers so less code, less bugs. When we adopted RabbitMQ at this date in the company back in China that was one of the things we liked about it. Beside that we had already deployed Erlang and knew it worked pretty well under high load
we knew all the advantages from the language that they don't need to be written by the Rabbit guys. Then the multi-protocol I said already ANQP is called Advanced Messaging Queuing Protocol and in the advanced part it actually has a lot of options.
So when you think about messaging you probably want to send a message and be confirmed that the message arrived to the other end or maybe you don't care. Maybe you want the message to be written to the hard drive or you don't care, you don't want that because you want a faster messaging. As a consumer of messages you may want to tell the server
whenever you send a message I will acknowledge back that I processed the message so please don't delete the message from the queue until I confirm that I processed the message. Or on the other end you don't care you can say send me the message don't care anymore about this message. All those options are in ANQP.
Because of that it could be a quite heavy protocol for a small device like in the Internet of Things area. IBM created a protocol MQTT for all these small devices with low battery and so on and RabbitMQ supports MQTT.
And it also supports Stomp. Stomp is a text-based protocol very similar if you know the Redis protocol or main cache protocol. It's a very simple Stomp text protocol. And it's the one that CERN uses when they talk to Rabbit for example. So they prefer to use Stomp
because also with Stomp you can integrate with other brokers that's another story. If you want to know more one of my colleagues wrote that blog post on the VMware website. Then I said Polyglot before.
You can use it with PHP, Node.js, Erlang, Java, Ruby, .NET, Haskell and many more. Like Closure, a guy wrote recently how to use it from Korba, Delphi, whatever. There are all these protocols and each of them has many clients. The one with the more clients
I guess is probably AMQP and Stomp because Stomp is very very simple. Also sometimes I get asked by people is there anybody actually using this RabbitMQ thing because I don't know some people think it's a research toy or something.
Anyway, Instagram is using RabbitMQ and they are using it clustered across many data centers inside Amazon. And actually one of the developers from Instagram tweeted that some months ago when there was a data center going down for Amazon,
RabbitMQ and Cassandra were the only two services that survived this whole data center blowing up. So that was pretty cool to know basically. Indeed.com is like a show of search website. They are using Rabbit. Mailbox app, this application for reading email that got acquired recently.
It was everywhere on the press. And MercadoLibre. Maybe you've never heard about MercadoLibre but this is the 8th biggest online retailer. They are the eBay of Latin America. I think Latin America has 600 million people. Mexico, Brazil, Argentina and Uruguay where I'm from.
I'm pretty sure most of the load comes from Uruguay but that's another story. They are all using MercadoLibre. RabbitMQ. And if you want to get it that's the address to get RabbitMQ. Now the current release is 3.1.1 actually, not 3.0.4.
So the next question is how we can start using messaging today? And now is where I talk about Cloud Foundry. So Cloud Foundry is another product that now is part of Pivotal, before it was part of VMware, the same as RabbitMQ, the same as Redis and many other products.
For those of you that don't know, EMC and VMware, they put many of their products together into a new company called Pivotal. RabbitMQ and Cloud Foundry, Spring Source, Redis and so on all belong to that new company basically. So why is Cloud Foundry good for messaging?
Apart from that I work there basically. So a key aspect that Cloud Foundry has. Cloud Foundry supports many applications per account. So Cloud Foundry, I don't like to say that but I have to say think about a Heroku that you can deploy on your own data centers for free basically.
So Cloud Foundry is on the path level and it's open source so everything that I'm mentioning here you can download it, build it and deploy it to your own data center and that's what many companies are actually doing instead of using .com offering basically. Anyway, when you create an account on Cloud Foundry
this account can have many applications. So what? Right? Then it supports many services per account. You can have many RabbitMQs, many MongoDB, Redis and so on. So what again? But the cool part is that you can share all these services across your apps.
So you can have many applications or talking to each other by using RabbitMQ as a message bus or synchronizing data over MongoDB or Redis for example. And RabbitMQ is supported by default.
So, for this example I created an application called Clusteram which is just a clone of Instagram basically or clone wannabe I would say. It's a bit going too far to say it's a clone of Instagram. So yeah, calendar. It has real-time updates
so the idea of this app is that you have your profile there you upload images you have friends that follow you you follow them, some of them, whatever and whenever you are on the website and you upload a new image then all your friends will see a real-time update that you got this new image
for some definition of real-time of course like real-time embedded and I don't know how many of these terms don't mean anything to the even cloud what does it mean? But anyway, there are many image fields. For example, you have the latest images you can see all what's going on on the website
there is a feed for logged out users so they can actually get to see something they don't need to register basically and then they log in user images so in your own timeline you will see what you and the people you follow have posted. So let me just show you a bit.
This is Clusteram you can upload the images you can use it if you want please don't upload anything inappropriate if you go home, let me see if this works you can register if you want create username, password I won't do anything with this username and password
it's just so you can have a profile basically and then you have the profile you can see that I have four pictures no followers, nobody following and so on that's basically it. So, what's behind this Clusteram?
So it's deployed in Cloud Foundry it's using RabbitMQ for all the synchronization of data and moving data around it uses Redis to store all the metadata related to images and users and MongoDB for the grid file system
to share the actual binary of images and the real-time stuff on Node.js is implemented using sock.js So, in Cloud Foundry you can have all these apps or many instances of one app running there
so to share data around, as I said I'm using Rabbit then at some point I decided to separate the workers from the core app and they are also running as a separate app so that's the frontend, that's the resizer in Node and then just to see if this actually works
I rewrote them in Clojure, deployed that and the resizers are in Clojure so the cool part is I could just do this swap of technology without actually having to shut down something or anything just deploy the Clojure resizers and that's that
So, the MongoDB as I said is used for the GridFS image storage that I share across all the app instances then Redis has many keys there with different functionality
or purpose for example, with the user colon username so, Alvaro or whoever you will have your profile stored as a hash then with your user ID and images we keep a list of all your images with all the IDs of the images
not the actual binary then there is the image count of that particular user then there is the timeline which has the IDs of all your images plus the IDs of the people you follow and then there is this latest images list
so all those lists are kept up to date whenever there is a new image stuff will get pushed there basically then SOCK.js is used for it has two arrays let's say
to keep state in the first one I just have an array a JavaScript array where I keep all the connections the WebSocket connection from all the anonymous users and the hash is used because each user by its user ID has an entry there whenever you are connected to the website
so, if I need to broadcast that there is a new image to all the unknown users then it's just a loop over this array on the top and if I need to tell a follower that I have a new image then I will pick that particular follower
and send the image there via SOCK.js and the same for my own user so whenever I do the upload this should appear right away so, to get everything together I'm using RabbitMQ so, how does RabbitMQ work?
for that I want to give you a small demo I should have shown this slide actually and the demo is on the RabbitMQ simulator so this is a tool I created also to explain
how RabbitMQ works because I think showing static images is bad visualization or it didn't happen basically so, in RabbitMQ you want to have a producer of data and a consumer or in any messaging app and you want to get your data from here to this point
to do that you need an address the address is the exchange so, what you do in RabbitMQ you usually send messages to the exchange you send the messages, nothing happens why? because actually you want to keep the messages in a queue
soon we will see what the point of having this exchange in between so if I click send now I will get three messages there still nothing happens on the consumer side because I haven't subscribed the consumer to the queue
once I subscribe it you can see they start getting the messages also at that point the consumer was offline let's say and the messages still got queued because that's the whole point of having a queue and if I add a new consumer for example
RabbitMQ will do the round robin for us it's not exactly round robin because if there is a consumer that finished before another one that consumer will be ready so actually RabbitMQ will round robin across consumers ready
because there is no point of waiting for someone when it's actually processing stuff I made this clear because we had this exact question like two weeks ago on the mailing list so that's just how to get the data from one place to the other
and then if you add a new queue here for example and click send you see the messages go to both queues so what the exchange is doing is basically getting all the queues that are bound to the exchange all these blue circles
and then sending the message to each of them inside the server there is no message copy also that's a very common question and very important question because you want to know if RabbitMQ is just using your hard drive or memory for for replicating data but Rabbit will keep only one copy of the message
but the metadata will say it lives in this queue, in that queue and so on but think about the exchange as when you have an inbox in email and you have rules and I don't know, you get a message from your boss goes to the trash or somewhere there closed then some other message from NTC Oslo you will put it on the top and whatnot
so you the messages will go directly to the inbox but they will end up in a separate folder that's the first abstraction you get from the exchange besides that there are three types of exchange direct, fan-out and topic and each of them implement a different routing algorithm
so basically the exchange depending on the type will say this is how I want to route messages in this case we have a direct exchange so if I come here and set a routing key like Oslo now I send a message the messages don't get there
the default routing key is the empty one I'm showing here binding key because if I show something empty we cannot see anything actually but in fact there is an empty routing key there now if I say Oslo and send the message only goes there
so basically what the direct exchange is doing this one is to check the routing key or basically saying select queues from the queue table where routing key equals binding key something like that it's not like that because routing queue doesn't use an SQL database
below basically but that's the idea of the direct exchange if the routing key doesn't match it won't get the message then there is the fan-out exchange which doesn't take into account any routing key
so this is like a direct exchange where the routing key always matches basically there is actually no query on the routing key so it just gives me all the bound queues to this exchange and finally we have the topic exchange
which is the most advanced one and the one where you can implement cooler things I would say so if I send now with the Oslo routing key you can see that only this queue here got the messages so that's just a direct exchange with a fancy name maybe I don't know not really
when I have a binding let's say you have let's say a login example you have a server one then application one module one and info so
let me maybe start again so it's easier to see it in action so I have a queue here I will change the binding key to that and then I have a producer and I send this nothing fancy
what we have here is a word separated by a dot that's the whole thing what happens if instead of sending messages from server one let's say we are logging all the logs going centralized via Rabbit I start sending messages from my server two
it goes nowhere of course we can create a queue bind it come here put server two but that's what we saw so far I mean there is nothing special so here is where the patterns start to take place so we can add like a star
or asterisk sorry I changed the queue name not the binding key this here star change what did I do wrong?
I don't know what happened so the idea of the just to move on the idea of the topic exchange is that you can have these patterns where you
decide okay I don't want to match by this particular word I want to have a star to match that word or not then if you have like a hash for example you should match everything that's there I don't know what's going on anyway live demos they never work
and the hash will match more one word or more so we have these patterns which were separated by dots if we use the star we can swap them for one word or if we have the hash for one word or more sorry for the broken demo anyway so
those are the three types of exchange we have if I don't change the exchange type it's difficult that this will work that's the whole thing there it goes so it was me doing the wrong demo basically so let me show you one more time
change that and yeah that queue will get the message the other one gets the message of course you can mix and match all that you can put the you don't care about the log level so if I send here
error then it will go there and so on and finally yes if you want to get every single log you can set hash here or even like hash error so you don't care about the modules the application name
or the server there are all these things you can do with the topic exchange so for login for example you can have all these queues that you can see they have like an auto-generated name here those are options that ANQP will provide you like if you don't really want to think about a new queue name because you just want to listen
for the logs coming there you can do that and so on there are way too many options but you can say like all errors for example and have this queue like that anyway the whole message is you have producers you have exchanges which are different routing algorithms
and at the end you have consumers that will take the messages from the queues in this particular case RabbitMQ is very extensible you can add your own exchange types but you need to do that in Erlang your neighbors from Sweden they know a lot about Erlang I don't know in Norway because Erlang comes from Sweden
anyway enough talking any questions so far? sorry? so the question is if there is no subscription what is going to happen to the message right?
so by subscription you mean having the consumer or having the queue? so if you don't have a consumer what happens is the the messages start to get queued here your limit is the hard drive basically
it's not memory bound RabbitMQ has an algorithm that will try to keep as many messages as possible in memory but at some point you will raise a memory alarm and then it will start paging that to disk but the idea is to try to not touch the hard drive
but at some point you will so if there is no subscription you will get all the messages queued and you can reject messages from the consumer you can acknowledge them and there are many things you can do there
so what's the architecture of this cluster RAM? for the image upload we have many users they can even come from be served from many app instances so we can have many Node.js servers running there they all will send messages to the cluster RAM upload exchange
so whenever there is a new image upload it will be a new message there then there is a resize queue and this queue will have 1, 2, 3 or whatever amount of resizers you want to run the point is cloud messaging
did you put this on the title just to get us all in this room? or is there any cloud messaging here? in cloud one important concept is to have elasticity and competing consumers is the name of this messaging pattern it's just that you can have
one box with 10 consumers it will make sense to have probably some relation of your computer course with the consumer name or amount but then you can have more boxes and all of them have this consumed from that particular queue also with RabbitMQ you can do
high availability, you can mirror queues or replicate the message content let's say across many servers and this state will be known for all these queues so what happened there? there is a new image event so the resizer finished doing this resize and it sent a message
cluster RAM, new image so now we have the actual resized image so one consumer will grab the message and will add this image to the user to the user data in Redis so that add image to user queue we just have consumers
getting this data, put it into Redis new image queue will send this image to this offline, everybody sees what's going on list in Redis and image to followers queue will grab the message and will send it to the followers to the list of
images for the follower remember that there was a list in Redis that kept all my images plus those of the people I followed, so there we do this other update another point that to take into account is that we have producers and consumers they can live in the same process sometimes people think that ok
I have one process that is only producing another is only consuming that's not necessarily a requirement here for example the resizer one is actually the same resizer that was there in yellow, so we have the same process, getting message, sending message just to make that clear
and finally we have the add image to user consumer will broadcast a new message knowing ok now the user has an image, ok now we can broadcast to everyone that we have this new image so so JS will grab this event and will send it via WebSocket to the uploader
also to the anonymous users and to the followers so everybody will get this new image, so also here it might seem like very simple but imagine you have a WebSocket server and you have all the sessions connected to that particular Node.js
server but then you add the elasticity to the app and you fire more Node.js servers so if I was connected on the first one and then you come later and load balance to let's say the server number 2 you are not actually in this array and hash of the first server
I mean the first server has no clue that you are there so I can implement some complex replication algorithm I don't want to know about or I can put a queue in between with and grab it and then all these consumers will get it if they know about this particular user in sock.js I send the message else I don't care
so it's like duplicated effort but it's a very simple way to broadcast this across all the app instances so you need to decide what pattern you want to implement basically. So for Node.js I created the most basic
and ugly DSL to send the messages so whenever you want to create a consumer you define a variable you specify the exchange name where you want to get the messages you give a queue name a routing key in case you need one
and the callback function so this callback will get a message the headers of the message and some delivery information and there you will put the callback code like resizing the image sending putting something into Redis whatever the point here is that
whenever you use RabbitMQ you will need to create a connection to the broker you will need to open a channel RabbitMQ has an architecture or MQP of having many channels per connection if you have a language that supports threads or like L lang or C sharp or Java whatever
you want to have one channel per thread for example in PHP this doesn't make any sense because there is only one thread anyway you will need to open the connection, open the channel, declare the exchange, declare the queue and do the binding all the time probably. So if you don't want to do that you probably want
to implement a library not like that one because that's probably the most ugly hack there you maybe want to have a proper object oriented thing that you pass all these parameters and give you back this thing but you want to do something like that whenever there is a message you call a callback that's what you care, you don't care about the exchange
that's only a one time decision the same with the queue name and the routing key what you care is just your callback code and the cool part here is that if this callback code is much easier to test for example and to decouple because it's only that what you need to call so you have functions that will get messages and produce something, it's very
simple at the end of the day the concept of messaging and if you do Node.js you already know about all events and callbacks and whatnot so it's not new and then this library is called Samper and it has a method called startConsumers and it will loop
over all of them of these consumers you pass and will create the queue, create the exchange do the binding and so on so if you want to see the code it's there BitBucket and my username and the name of the project is in BitBucket because there is where people put stuff when they know
they don't want anyone to see if you want to make your code public you send it to GitHub not necessarily because BitBucket is wrong because in case of GitHub I have many followers and then people as soon as I put something they will probably follow it or whatever and I don't want that and yeah, that's why it's there
and if you see my user there there is also the resizer for closure so you can see both bits of the code by the way you ask the first question what's your name? Sorry? What's your name? Ali, you want the book?
Get it afterwards? I just forgot usually whenever I have a book on hand I give it to the first guy asking a question oh girl, not necessarily a guy, anyway so maybe you want to see code before I get to the coda of the talk
so let me get this here so the app is a very basic express.js application in Node it has all the beautiful callbacks and all the stuff people love in Node
and what I'm doing basically is to get setting the MongoDB connection if this works, RabbitMQ connection if this works, I start the consumers for the user modifiers and I start the resize consumers and then set up the server, that's all we need to actually care
so whenever we get a new image where is this? TextMate has
a word combination of keys for hiding the thing on the side anyway, so I do the image upload, this will be the route call whenever there is a new upload some sanitizing, whatever stuff there and at some point I do the storage on MongoDB
and get the callback and if this succeeds, I just create a JSON structure with the username, file name comment if the user added a comment and the MIME type and then Samper this mini DSL will publish a message to the cluster and upload exchange
with the JSON stuff we sent and no routing key and make JSON image is just that it just returns a user ID and so on it's very simple, so I'm just passing JSON around basically so the next question then is to see who is actually
bound to this exchange so if I search we have this resize consumer has a callback we'll get the message, headers and delivery info and it will read the image back from MongoDB based on the message file name it will do the resize
and once it's finished the resizing will send a new message called cluster and new image and this thing is there the add image to user consumer just has a callback that calls a library, add image to user, user ID and so on
and this is just an abstraction over Redis it just appends something to a list basically that's all there is to it and if this works then we can finally broadcast that we have a new image
that's all then there is also the new image consumer that will add the latest image to the user then another consumer will add the image to the followers and so on if you are paying attention you can see that I'm actually cheating
here because I have all this code in the same Node.js process, you could take them apart and create one consumer for each task and then just start as many of them as you need that will be the right thing to do and then at the end of the down there I have
this thing starting all the consumers, then we have this broadcast let's see where this is called so this will interact with the sock.js server
which is in this object called broadcast and in this case will send to the actual uploader as the image that there is a new image basically then to all the unknown users it will also do a broadcast and the same for my own followers
and at some point I start all those consumers so that's basically what I'm doing there if you want to see what's inside Sampler, I don't remember but yeah basically I get the RabbitMQ connection
I set up what will be the consumer function for them, yeah this is the meat of everything, whenever we install a callback we basically get an exchange, then we get a queue then we bind the queue and then when there is the queue is actually bound we subscribe
for whatever callback function we pass there, so all this code just because we want this function called whenever there is a new message to avoid all that I suggest that in every language that you use Rabbit you want to have your own DSL basically and to send a message
we have the exchange name, message and routing key here and the same, we get the connection we get the exchange and we send the message there just to make this clear you don't need to create the exchange all the time or the queue or the binding you can have one cron shop that just runs once and sets
up the whole topology for example, or you can do it all the time just to make sure that the stuff is there depending the paranoia level basically, and that's that I mean then somewhere I have the shock.js code which is this
it will keep a connection object and whenever there is a broadcast it will send it to the unknown client or to the user there is a message to send to the user and another one to send to the unknown user
it's very basic code I could try to see somebody magical uploaded an image and if I login, let me see desktop
that's the real time, so that actual image it can be a shock.js and web socket to the browser and if anybody was actually following me and were on this page they should get also this image popping up there
which if you have a lot of followers will be probably the worst user experience I imagine so, anyway and so that's Clustering if you want to have any comments or whatever you can send them to in this big bucket, probably open an issue
I don't know, at some point I will take care of cleaning the code and putting it in GitHub but anyway, coden messaging with messaging we can scale not necessarily scale up or horizontally we can also scale to new requirements, we can have one code deployed there and then add new stuff without
redeploying everything, we can change the technology, swap the language and if we don't need to have so many workers we can also scale down, that's very important if you want to save money then you have all this decoupling yeah, in my example I have all the code in the same node.js project but you can
split it apart and if you see on Bitbucket you will see the part that is actually now in Closure and you have all this polyglot stuff that you don't need to care if you implement your own thing you can talk from different protocols for example in the iPhone the iOS chat from Facebook is
using MQTT, that's a very good use case for this tiny protocol and so on, it really depends where you are what protocol you want to use and RabbitMQ can offer all of those and in the case of Cloud Foundry you already have all this stuff there, so
if you need to have elasticity for example in your cloud thing you don't really need to think about how you will do all that, I remember back when I was working in Uruguay we had an in-house made queue on top of MySQL I mean you don't want to know how to debug that and
what happens when one consumer crashes and then did they consume the message yes or no and whatever, it's always polling the database every one second every one minute or every whatever but it's always polling it will not get the data in real time for some definition of real time as when RabbitMQ
will push that and so on and in the case of Cloud Foundry we'll do all the heavy lifting for you we'll maintain MongoDB, Redis RabbitMQ, whatever service for you it supports all these multi-applications per account multiple services and yeah if you want we can say we can do cloud messaging there
so thank you very much questions? there are no more books yes, ok
let me see if I understand how you make this change redundant that's the question
so RabbitMQ supports three ways of doing high availability the most basic one is using L and clustering, that is you can have many RabbitMQ brokers living even in separate machines where the state is replicated all over, so this change will live in all
these servers that's the most basic one if you care about the messages and the data then you can use mirror queues so when you declare the queue you can tell Rabbit is a mirror queue and there are many strategies on how many mirrors there are and how to choose the master, what happens if the master
goes away, like new master election, all this stuff but yeah you can do this kind of replication then there is a plugin called Shovel that can also replicate messages across one area network and then there is a federation plugin which also can do federation of
queues and so on, so there are many options depending on what you want to do, for what I know Instagram is using the mirror queues across many data centers in Amazon, that's what they do but depends on the use case questions? Okay, thanks
and you have the book