Modernizing Legacy Messaging System with Apache Pulsar
Formal Metadata
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/61625 (DOI)
Transcript: English (auto-generated)
00:05
All right, we should start right now. Oh, no, no, okay. So, hello everyone, welcome to our talk, and thank you so much for staying this long. This is the second-to-last session of the day, so we really appreciate you being here.
00:20
So today we're going to be talking about modernizing legacy messaging systems with Apache Pulsar. Here we have Enrico and myself; we're both from DataStax. Before we start, if you'd like a copy of our slide deck, here is the QR code and also the short link. I'll let you take a moment.
00:50
Okay, even if you missed it, don't worry: we'll be sharing our contact info, so you can connect with us and we can answer your questions. So with that, let me start.
01:01
First, just a quick introduction: who is Mary? I'm a streaming developer advocate at DataStax. DataStax is a company based in California, specializing in Apache Cassandra as a managed cloud service, and now we also have a managed cloud for streaming, which is Apache Pulsar. I was also a developer advocate
01:21
before joining DataStax last year. I'm based in Chicago, I'm the president of the Chicago Java Users Group, and I'm also a Java Champion. Before this, I spent over 20 years or so as a developer myself. So that's me, and this is Enrico. Oh yes, sure.
01:41
I'm Enrico, I work with Mary. I really enjoy working with open source communities, so I'm involved in a few Apache projects like Pulsar, but also BookKeeper and ZooKeeper, and I also collaborate with Maven and Curator.
02:00
I also participate in some CNCF projects like Pravega, which is also about messaging and distributed streaming, and I contribute to HerdDB, which is a distributed embeddable Java database. Okay, great, thanks Enrico. I'm really happy to be here today with Enrico
02:20
because we have only worked together remotely and finally get to meet here in Belgium; he lives in Italy and I'm in Chicago. Okay, without further ado, this is the agenda for our 20 minutes, so it's going to be a little bit quick, but we'll end with Enrico doing a quick demo as well. So first let's give an introduction to what JMS is,
02:41
assuming not everybody is familiar with it. Then we'll talk about Apache Pulsar and why Pulsar, quickly describe the Pulsar architecture and how you map between JMS and Pulsar, and then how you use the JMS API with Pulsar, which Enrico will show. That's the plan.
03:03
So first of all, some core concepts of JMS. JMS is all about messaging, but it's very much a Java-centric technology. As you can see here, it's a publish/subscribe kind of model, making use of destinations, and it supports queues and topics.
03:24
So messages, producers, consumers: this is the typical pub/sub producer/consumer pattern, but JMS has its own implementation of it. It makes use of the JMS context, which helps you with connections and sessions.
03:41
Okay, so about destinations: it supports both queues and topics. It acts as a broker in the topic case, but with a queue, you drop the message there, it gets picked up by a consumer, and then it's done.
04:03
The queue is browsable and follows a first-in, first-out approach. A topic allows for multiple subscriptions, with messages dispatched according to the subscription type. As far as consumer styles go, you can have blocking,
04:20
which means the blocking receive methods, and that's all application-driven. There's also the message listener method, which is driven by the JMS driver. As far as producer styles go, blocking is the send method,
04:41
or there's also an async send, which works with a completion listener. That was real quick. As far as administrative operations go, JMS does not cover them: how you manage the destinations, connection properties,
05:00
defining security models, resource limits, and configuring all of these things: JMS itself doesn't do any of it. So how do you manage it? It usually relies on your vendor, who provides management tools in some vendor-specific way.
05:21
The API also lets you work with administered objects, which are supposed to be provided by the system as well. As far as destinations go, there are queue and topic references, and the connection factory
05:41
is essentially the client that allows you to connect to the system. And then the JMS API also allows you to interact with Java EE, which is now Jakarta EE but back then was Java EE, and in that case you can
06:02
make use of EJB components. There are stateful and stateless EJBs, used from web servlets or JAX-RS and JAX-WS endpoints, and you can also do background work, like scheduling.
06:20
And then there are also message-driven beans, which are JMS-specific beans that handle messages, and they're managed by the Java EE container. When a message arrives, the container activates the bean.
06:42
So the Java EE container provides lifecycle management, pooling, contexts and dependency injection, transaction support, security, and standard APIs. You basically rely on the container to do all of that for you.
07:02
And then, what about external resources? That's where resource adapters come in: they allow you to extend the Java EE container. Some key points: to use one you need the resource archive file, a .rar file that contains the code,
07:23
and you then have to configure the resource adapter. It allows you to create administered objects. These objects conform to the standard API and are implemented by the code inside the resource adapter. So these are the different packages,
07:42
like javax.jms; in the new version it will be jakarta, I think, but we're still talking about the older JMS here, with ConnectionFactory, Queue, and Topic. Usually these objects are bound to a JNDI (Java Naming and Directory Interface) registry,
08:02
provided by the container. How you do the deployment is specific to the container, and that's how it usually works. So far we've talked about JMS, which is a bit more legacy. So what are some of the options to take advantage of today's more modern world
08:23
that allows you to work in a cloud native environment? We want to introduce you to Apache Pulsar. It's an open source platform, it's cloud native, and it supports distributed messaging and streaming. This is the link where you can find out more information,
08:40
or actually this is the GitHub repo. I want to highlight a few things, because we may not have much time. It's very cloud native in nature; it was born with cloud native DNA. So why do you want Pulsar? I think one of the key points is that
09:02
it separates compute from storage. So Pulsar can focus on message delivery: dealing with all the messages coming in and delivering them. Then you have a whole laundry basket of log messages; what do you do with it? Rather than dealing with it itself, Pulsar says, let me get BookKeeper to handle it for me.
09:22
That way Pulsar can focus on just the messaging part and coordinate with BookKeeper. It also supports multi-tenancy, which is a very nice way of helping you organize all of your messages, as well as some features that are more enterprise-ready; geo-replication is also a major one.
09:43
It also has what is called tiered offload. If your messages get cold in BookKeeper, you don't want them to take up too much room; or rather, they sit in warm storage and you want to move them off to cold storage. Pulsar has all of this built in.
10:01
Then there is native Kubernetes support, Pulsar schema, and connectors: you can use the Pulsar IO framework to build different connectors, and currently almost 100 different connectors are supported. For message processing, you can use the Pulsar Functions framework, so you don't need to use anything outside
10:22
to do message transformation as you build your data pipeline. The nice thing is that it doesn't restrict you to Java as your client. You can use other languages like C++, Python, and Go, plus community-contributed clients: there's also Node.js and a .NET (C#) client. So that's really flexible
10:42
and works really well in Pulsar. So let's quickly take a look; I already mentioned some of it. There's blazing performance, which is what we all want: it provides true real-time processing. Millions of JMS messages can be handled if you have JMS running on such a platform.
11:03
Then there's horizontal scalability. If you expand your infrastructure by adding more servers and nodes, Pulsar will handle that for you. You don't need to rebalance all of your topics, and you don't need to deal with offsets as you might in Kafka. It has its own way,
11:20
so as a developer you don't have to worry about all of these infrastructure things. All of this is listed here. I know there are a lot of words on this slide, but it lets you get into a bit more detail, and we can share it with you. So let me pass this on to, actually let me see. Oh, let me quickly, I thought this was on.
11:41
Okay, so just a really quick look at the basic architecture. This shows pictorially what I just talked about, since we have so little time. Producers and consumers can be written in many different languages, not just Java. And BookKeeper deals with the whole storage side of things.
12:01
It's very dynamic, as you can see. This quickly summarizes in a picture what Pulsar can do for you. Okay, and here is a quick summary of Apache Pulsar. Again, it's a pub/sub type of architecture. It supports multi-tenancy, namespaces, and different subscription modes.
12:21
You can take advantage of that to give Pulsar queuing capability if you use an exclusive type of subscription mode. What else? Yeah, so there are different modes. It's highly flexible, is what we're trying to tell you. So here, we have Enrico. Actually, a little bit, sorry about that.
12:42
We can talk more about it later. Yeah, so I just want to map Pulsar concepts to JMS. JMS is pretty straightforward. The model is quite flexible because it deals with queuing, but also pub/sub. And in Pulsar, the mapping is really natural
13:04
because you can map a JMS topic to a Pulsar topic, whatever it is: a standard Pulsar topic, a partitioned topic, or a virtual topic. A JMS queue is like a Pulsar shared subscription. And a JMS message is like a Pulsar message
13:20
with an envelope and a body. In JMS, we have several consumer types. I'm not going to go into the details, but there is a Pulsar subscription type that matches each JMS requirement. One important thing is that if you want to use JMS with Pulsar, you don't need to install any additional plugin
13:42
because the JMS API is built on top of the standard native Java client; the Pulsar features are a superset of JMS. So it's only about implementing an API. As in JDBC, you have an API
14:00
that allows you to connect to every database. In JMS, you just have to implement the API and follow the specs. If you want, you can deploy a server-side component to push some of the computation to the broker. For instance, in JMS you have filters. You can filter the messages. So if you want, you can filter them on the broker.
14:22
Otherwise, you can simply filter them on the client side. I'm going to show some examples of how to use Pulsar with JMS. If you are already familiar with JMS, it's pretty simple.
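The mapping described above (a JMS queue behaving like a shared subscription, a JMS topic delivering to every subscription) can be pictured with a small plain-Java model. This is only a sketch of the delivery semantics: it uses no JMS or Pulsar classes, the names `DeliveryModel`, `dispatchAsQueue`, and `dispatchAsTopic` are made up for illustration, and round-robin dispatch is an assumption rather than what a broker necessarily does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only: no JMS or Pulsar classes are used here.
class DeliveryModel {

    // Queue semantics (like a shared subscription): consumers compete,
    // so each message reaches exactly one of them. Round-robin is an
    // assumption of this sketch; a real broker may dispatch differently.
    static Map<String, List<String>> dispatchAsQueue(List<String> messages, List<String> consumers) {
        Map<String, List<String>> received = new LinkedHashMap<>();
        for (String consumer : consumers) {
            received.put(consumer, new ArrayList<>());
        }
        for (int i = 0; i < messages.size(); i++) {
            String target = consumers.get(i % consumers.size());
            received.get(target).add(messages.get(i));
        }
        return received;
    }

    // Topic semantics: every subscription gets its own copy of every message.
    static Map<String, List<String>> dispatchAsTopic(List<String> messages, List<String> subscriptions) {
        Map<String, List<String>> received = new LinkedHashMap<>();
        for (String subscription : subscriptions) {
            received.put(subscription, new ArrayList<>(messages));
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> messages = List.of("m1", "m2", "m3", "m4");
        System.out.println("queue: " + dispatchAsQueue(messages, List.of("c1", "c2")));
        System.out.println("topic: " + dispatchAsTopic(messages, List.of("s1", "s2")));
    }
}
```

In the queue case the four messages are split between the two consumers; in the topic case each subscription sees all four.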
14:41
In JMS, you start with a connection factory, so we have a PulsarConnectionFactory. This is JMS 2.0, so you can get a JMSContext. You get a reference to a destination with createQueue. createQueue does not create a queue; it creates a reference to a queue, because JMS doesn't deal with
15:01
that administrative operation, as Mary said. You create a producer, and you can send as many messages as you want. If you want to consume, you create a consumer, and you can use receive or setMessageListener. This is for standard Java. If you're using Jakarta or Java Enterprise,
15:21
actually, yes, I've been helping a few companies migrate from Java Enterprise to Pulsar, so I know many more cases with Java Enterprise than with Jakarta, but that's it.
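The two consumption styles mentioned in the talk, a blocking receive driven by the application and a listener callback driven by the driver, can be contrasted with a plain-Java sketch. It deliberately avoids the JMS API: a `BlockingQueue` stands in for the destination, and `ConsumerStyles`, `blockingReceive`, and `startListener` are hypothetical names used only here.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Plain-Java stand-in for the two consumer styles; not the JMS API itself.
class ConsumerStyles {

    // Application-driven: the caller blocks until a message arrives
    // or the timeout expires (in which case null is returned).
    static String blockingReceive(BlockingQueue<String> queue, long timeoutMillis)
            throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Driver-driven: a background thread takes messages and calls the listener.
    static Thread startListener(BlockingQueue<String> queue, Consumer<String> listener) {
        Thread thread = new Thread(() -> {
            try {
                while (true) {
                    listener.accept(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop when interrupted
            }
        });
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    // Consumes one message with each style and reports what was seen.
    static List<String> demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> seen = new CopyOnWriteArrayList<>();
        try {
            queue.put("first");
            seen.add("receive:" + blockingReceive(queue, 1000));

            CountDownLatch delivered = new CountDownLatch(1);
            Thread listenerThread = startListener(queue, message -> {
                seen.add("listener:" + message);
                delivered.countDown();
            });
            queue.put("second");
            delivered.await(5, TimeUnit.SECONDS);
            listenerThread.interrupt();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.copyOf(seen);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the blocking style the application decides when to ask for a message; with the listener style the code only reacts when something is delivered, which is the shape a message-driven bean takes.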
15:40
So for instance, if you want to write messages and you have an Enterprise Java Bean, you can ask the container to inject the connection to Pulsar. This is standard Java Enterprise code, so it runs with ActiveMQ, with TIBCO, with whatever you want,
16:01
whatever you are running. The container injects the connection factory and the destination. As in the standard Java code, you get a reference to the JMSContext, and then you send. We will see later how the administrator connects all the parts, for instance with Apache TomEE.
16:23
For the consumer, in Java Enterprise you usually use message-driven beans to consume from destinations. So, yes, this is a simple message-driven bean. You configure all the relevant things that you want.
16:42
Usually you configure the destination, which is still a logical name, and the subscription type, the parallelism, or that kind of thing. In many containers, you can configure these things in other descriptors, usually XML files.
17:01
You implement a callback, onMessage. Every time a message is dispatched to the application, the code runs, and if everything goes well, the message is acknowledged to the Pulsar broker and won't be delivered again. If any exception is thrown, Pulsar will deliver the message again.
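The acknowledge-or-redeliver behavior just described can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not how the broker or the resource adapter is implemented: `RedeliveryModel` and `dispatch` are illustrative names, and the `maxAttempts` cap exists only so the sketch always terminates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Plain-Java model of "acknowledge on success, redeliver on exception".
class RedeliveryModel {

    // Hands each pending message to the callback. A normal return counts as an
    // acknowledgement; an exception puts the message back for another delivery.
    // maxAttempts is an assumption of this sketch so the loop cannot run forever.
    static List<String> dispatch(List<String> messages, Consumer<String> onMessage, int maxAttempts) {
        Deque<String> pending = new ArrayDeque<>(messages);
        Map<String, Integer> attempts = new HashMap<>();
        List<String> log = new ArrayList<>();
        while (!pending.isEmpty()) {
            String message = pending.poll();
            int attempt = attempts.merge(message, 1, Integer::sum);
            try {
                onMessage.accept(message);
                log.add("ack:" + message);
            } catch (RuntimeException e) {
                if (attempt < maxAttempts) {
                    log.add("redeliver:" + message);
                    pending.addLast(message);
                } else {
                    log.add("give-up:" + message);
                }
            }
        }
        return log;
    }

    // A callback that fails the first delivery of "b" and succeeds afterwards.
    static List<String> demo() {
        Set<String> failedOnce = new HashSet<>();
        return dispatch(List.of("a", "b", "c"), message -> {
            if (message.equals("b") && failedOnce.add(message)) {
                throw new IllegalStateException("transient failure");
            }
        }, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One consequence worth noting: because a failed message can come back, the callback may see the same message more than once, so the processing should tolerate duplicates.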
17:24
In TomEE, there is a very simple way to deploy the resource adapter. I'm deploying the resource adapter for Pulsar, so Pulsar RA, and you configure the connection to Pulsar. In the demo, I'm using localhost.
17:40
And this is the most interesting part. I create a logical queue, FooQueue. This is a queue, and I bind it to a physical destination. So the container will create a PulsarConnectionFactory and also the PulsarQueue.
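The indirection between a logical queue name and a physical destination can be sketched with a map standing in for the container's JNDI registry. Everything here is illustrative: `BindingModel`, the logical name `jms/ExampleQueue`, and the physical topic name are hypothetical, and a real container does this through its own configuration files rather than through code like this.

```java
import java.util.HashMap;
import java.util.Map;

// A map standing in for the container's JNDI registry (illustration only).
class BindingModel {
    private final Map<String, String> registry = new HashMap<>();

    // What the administrator does in the container configuration:
    // associate a logical name with a physical destination.
    void bind(String logicalName, String physicalDestination) {
        registry.put(logicalName, physicalDestination);
    }

    // What the container does when application code asks for a logical name.
    String lookup(String logicalName) {
        String physical = registry.get(logicalName);
        if (physical == null) {
            throw new IllegalArgumentException("No destination bound to " + logicalName);
        }
        return physical;
    }

    static String demo() {
        BindingModel container = new BindingModel();
        // Both names below are hypothetical examples.
        container.bind("jms/ExampleQueue", "persistent://public/default/example-queue");
        return container.lookup("jms/ExampleQueue");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The application code only ever mentions the logical name, which is why the same code can be pointed at a different broker by changing the container configuration.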
18:01
The demo is on my GitHub space, so yes, you can run it yourself. I'm going to use Apache TomEE 8 and Starlight for JMS, which I'll talk about later; that is basically the JMS implementation. I create the administered objects
18:21
with the same file that we saw, and Apache Pulsar 2.11. So we have one application that consumes, one that produces, and Pulsar will run locally. So let me switch to the console. Oh, no, yes, the code. The code is really simple. This is on GitHub, so you can check it out later.
18:45
So this is the producer. I'm not writing the code that instantiates or assigns some value to the factory or to the queue. I'm scheduling the execution of this method every two seconds, and that's it, very easy.
19:03
On the JMS listener, these are two separate applications. Usually in a real-world application, you have some application that produces the data, then you have a pipeline that transforms your data and something else that consumes the data. This is pretty common. So here, in onMessage, depending on the type of message,
19:24
I'm printing the content of the message. Here I'm just declaring the reference to the logical queue that I want, and in this case OpenEJB, which is still TomEE, will resolve the binding with the physical queue via JNDI.
19:49
We are running out of time, so I have a script to run the whole demo. The script simply installs two instances of TomEE,
20:00
and Pulsar, copies the configuration file, deploys the resource archives, changes some ports because I'm running multiple services on my machine and there would otherwise be conflicts, copies the consumer application to TomEE one and the producer application to TomEE two, then starts Pulsar standalone.
20:21
That is a quick way to start Pulsar locally with all the services, but in only one JVM process. TomEE one, TomEE two, and then we will see the logs. There is some noise initially, because it is installing everything.
20:41
This is Pulsar, this is starting. These are the two TomEE instances. Actually, we don't see. Oh yes, this is good. So TomEE two is sending the messages, TomEE one is receiving the messages. So it works. It's a very straightforward setup
21:00
and a very common way to develop with Java Enterprise. Let's wrap up, two minutes probably. Yes, okay, good. So JMS is very useful and it allows you to switch very easily to another vendor. Usually with JMS, you don't use very specific features.
21:25
Usually, in my experience with JMS, maybe you're using TIBCO, you're using ActiveMQ. You configure some special flags on the container, but the code usually is pretty standard, yes? So switching to Pulsar is usually easy.
21:42
Pulsar is cloud native. It's scalable horizontally. So like Mary said, really, it looks like a promise, but this is real. You can add machines, remove machines, and the service automatically adapts. Actually, at DataStax, we are running it
22:01
as a service on the cloud. And so this is very powerful, because you can automatically adapt the resource consumption. And also, you can move the data that is not actively consumed to tiered storage, and this allows you to really lower the cost.
22:21
It's open source, it's a vibrant community. If you want, you can reach out to me on the community, and there are many people that are very enthusiastic. Pulsar is young, it is only five years old, something like that. But in the past two years, it grew very fast. Because it is really the next generation.
22:44
Maybe some of you are working with ActiveMQ; I did that in my previous jobs: ActiveMQ, then Kafka, and then Pulsar. Now it's time for Pulsar. If you want to use Pulsar, you can use Starlight for JMS. I'm the initial author and main maintainer
23:01
for Starlight for JMS, so yes, feel free to ask me any questions. It's open source, it's on GitHub. There is PulsarConnectionFactory if you're using standard Java, and there is a resource adapter that works well with many containers; it's already tested, and it is running in production.
23:21
Okay, just real quick: if you'd like, get a copy of the slide deck; there are resources in here, community info, and references to all the Pulsar information on GitHub and on the Pulsar site. As additional information, if you're interested, at DataStax we offer a $25 credit per month for personal projects,
23:44
which I wanted to share with you. I know it's not true open source in that sense, but we do have astra.datastax.com, and Astra Streaming is our company's supported offering in our cloud. Oops, where did it go? Sorry. Oh, I'm sorry. You tried to subscribe to us.
24:02
Oh, that's right, okay. So how do you contact us, right? This is the slide containing our Twitter handles, LinkedIn, all of these things. So please do consider staying in touch with us. We'll be very happy to answer more questions that you may have, and if you want to share your project ideas with us, we'll be happy to help you. And also on Slack.
24:21
Yes, that's right, that's right. So thank you, thank you so much, and I think then that's all, right? Yes, that's all. So we'll make sure. Okay, thank you. Thank you. Any questions? Yes.
24:43
Sure. First one. All right. The functions in the messaging part, doesn't that make it an enterprise service bus? What, the Pulsar Functions? Oh, Pulsar Functions is a lightweight
25:02
processing framework that is usually used to enrich the data that you have on your topics. So it's for very lightweight processing. If you have to do more complicated processing, you usually move to something like Flink or other things.
25:25
But Pulsar Functions is very useful when you have to process your data, and it is also the base for Pulsar IO, which is the connector framework. So basically in Pulsar you can deploy on the Pulsar cluster your code
25:41
that transforms your data on your topics. Yes, it starts from a message on Pulsar, and usually it ends with another message on Pulsar. So it's really useful for transforming the data that is on Pulsar or to push your data outside of Pulsar.
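To make the "one message in, one message out" idea concrete, here is a minimal sketch of an enrichment function. Pulsar documents a language-native style in which a Java function simply implements `java.util.function.Function`, so the class below needs no Pulsar dependency to compile; the class name `EnrichFunction`, the `sensor-demo` label, and the output shape are assumptions made up for this example, and packaging and deployment to a cluster are not shown.

```java
import java.util.function.Function;

// Language-native style: one message in, one message out.
// The enrichment (wrapping the payload with a "source" field) is a made-up
// example, and the input is not JSON-escaped, to keep the sketch short.
class EnrichFunction implements Function<String, String> {

    @Override
    public String apply(String input) {
        return "{\"source\":\"sensor-demo\",\"payload\":\"" + input.trim() + "\"}";
    }

    public static void main(String[] args) {
        // Local check without any broker: call the function directly.
        System.out.println(new EnrichFunction().apply(" hello "));
    }
}
```

Because the function is an ordinary Java class, it can be unit-tested locally before being deployed; heavier processing such as joins or windowing is where the talk suggests moving to something like Flink.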
26:03
I don't know if this answers the question. We need to continue. Oh, yes. There is a question over here. If you want to have a discussion, you can also have discussions with people on Slack, but really at the top there, Mary.