
Formal Metadata

Title
RedisJSON
Subtitle
A document DB in Rust
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Over the last decade, Redis has become one of the most popular NoSQL DBs delivering on the promise of high throughput and low latency. What started as a pure C code base is gradually being augmented with Rust due to the trifecta of safety, concurrency, and speed. A primary example is the RedisJSON module which turns Redis into a document DB. The talk outlines the principal architecture of the re-implementation of RedisJSON, the challenges encountered and the solutions for these. The focus is on the practical aspects rather than conveying theoretical knowledge. A comparison with other open source document DBs concludes this presentation, concentrating on latency and throughput aspects.
Transcript: English (auto-generated)
Great. After these technical hiccups, let's get started. My name is Chris Zimmerman. I'm here to talk about RedisJSON, a document DB in Rust. I have about 40 minutes, 45 minutes, right? So questions, please, at the end. I'm available in the
hallway afterwards, so feel free to grab me if you have any topics or discussion points. First of all, I'm going to dedicate this talk to my son Luca, who cannot be here. If you're watching the stream, Luca, this is for you. The second dedication is actually to some of the CTO team at Redis Labs, who are the
people behind the codebase. Who am I? I won't go through all of the details. Suffice it to say, I help a lot in Frankfurt. I'm an Arch package maintainer, blatant plug. I'm going to run the Linux Beer Wanderung this year in Kreunberg, so if you have a week at the end of August, early
September, to join about 20, 30 people for discussions, hackathons, presentations, hiking, and of course beer, check out the website. It's actually Linuxbeerbandung.com or monochrome C forward slash LBW, monochrome C
com forward slash LBW 2020. Second plug, and then I'm going to finish with the plugs: I run a podcast. We just posted the first episode. It's called Linux In-Laws. The domain went live yesterday. The first episode will be aired on Hacker Public Radio, HPR, as your go-to source, on the 13th of February.
It's open source with a dark side humor twist thing. Plug's over. Hobbies include the software development lifecycle. That's what I've been dealing with for the last 20, 25 years. I've been using open source for the last 30
years, 30 plus years. I'm also dabbling in IT security, and if I still have the time, I work for a company called Redis Labs, full disclosure, as a solution architect and liaison. A couple of things, basically, what this talk is all about. First of all, how many of you have used Redis or
don't know what Redis is? Wow, okay. There are a couple of intro slides. I'm going to go through them fairly quickly, because if the majority is already familiar with Redis, there's no point in repeating these details. Then I'm going to talk a little bit about the architecture, with a special focus on how applications perceive this document DB written in Rust.
And of course, summary and outlook will conclude this talk. As I said, I'm going to go through this fairly quickly. Redis is, so to say, one of the leading in-memory databases. It was founded about 10 years ago. We have more than 25,000
GitHub stars. There are more than 162 clients written in more than 57 programming languages. I reckon that makes Redis one of the most loved databases when it comes down to client connectivity.
How many of you actually have programmed against Redis? And I reckon most of you did this in a caching use case, right, or as part of a caching use case. This is basically where Redis comes from. About 10 years ago, the project initiator, somebody called
Salvatore Sanfilippo, was looking for a performant reporting database for one of his web projects. He checked out Memcached, he checked out other solutions; they didn't work out. So he wrote his own database, as a key-value store initially. This is basically where Redis comes from.
Does anybody know what Redis stands for? Remote dictionary server. This is what this is. But over the years, Redis has evolved into much, much more. In about 2015, Salvatore introduced something called the module SDK to the code base that allows the initial Redis server
implementation to be extended with so-called modules, and Redis JSON is one of them. The idea is to take your native Redis implementation and to plug any gaps that you may perceive from an application perspective with functionality implemented in the modules.
So over the years, there have been modules in the area of graph. Redis Graph turns Redis into a very performant graph database, with a fully Cypher-compliant interface similar to Neo4j. There's something called Time Series that turns Redis into a time series
database, comparable in functionality to something called InfluxDB. And so forth. The beauty is that all these extensions are on GitHub. So if you're looking for a performant graph database, simply clone the Redis server side, clone the graph module, compile the whole thing, load the module
when you start up the server, and then you have a fully Cypher-compliant graph database in memory at your disposal. The idea was to turn Redis, with these modules, into something very application specific, while maintaining the advantages that
native Redis brings with it, namely low latency and high throughput. This is the kind of idea. And the modules that you see on the screen are just the ones provided by Redis Labs. There are many more modules out there in the world living on GitHub. So if you think Redis doesn't offer something that
you need, simply deploy your favorite search engine and Google for a module extension. Chances are somebody has written something that you can either clone or deploy natively, as in right away. Okay, again, this is native Redis. I won't
spend too much time on it, but this is what native Redis offers out of the box. And the majority of the modules would fall back on these generic data types, including strings, sets, sorted sets, HyperLogLog, which is a probabilistic data structure, and so forth. As I said, this comes with native Redis.
This has been in the code base since pretty much day one. Most of the client-side implementations would reflect these data structures, either as part of their local ecosystem as a client library side, or written by an extension. And what is Redis JSON all about? You have your native Redis, and then
you have something called Redis JSON, which is essentially an ECMA-404-compliant module that turns Redis into a document-oriented DB, in functionality comparable to other document-oriented databases.
If you have seen Mongo, if you have seen Couchbase, you know what I mean. Okay, you have your typical JSON commands that allow you to insert documents. That would be JSON.SET. Others allow you to retrieve documents from a database. That would be JSON.GET. But also, because JSON offers arrays and all the rest of it, you have JSON.ARRAPPEND and JSON.ARRINSERT,
which allow you essentially to insert elements into an array or append them at the end. The navigation is done via JSONPath. Who of you has used JSONPath in the past? Okay, not that many.
Okay, so I'm going to spend a little bit more time on this. Again, just deploy your favorite search engine of choice if you're looking for the specification. JSONPath essentially is a standardized way, similar to a document object model, if you know what that means, as in a DOM, that allows you to access data in a JSON document.
Essentially, all of these levels are separated by simple dots. Or you can use array notation to access these levels. The following is equivalent to .foo.bar.
By the way, an initial period also always reflects the root of the DOM, of the document object model. That's something very important to keep in mind. So, .foo.bar is equivalent to the first array in this level, which comes after foo, which is essentially bar.
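To make the equivalence of dot notation and array notation concrete, here is a minimal Python sketch. It is not how Redis JSON resolves paths internally; `parse_path` and `resolve` are hypothetical helpers written for this illustration, and they handle only plain keys and integer indexes, with a lone `.` standing for the root.

```python
import re

# Hypothetical tokenizer for a tiny subset of path syntax:
#   .name        dot notation
#   ["name"]     array (bracket) notation, double or single quotes
#   [0]          integer index into a JSON array
_TOKEN = re.compile(r"""\.(\w+)|\[(?:"([^"]*)"|'([^']*)'|(\d+))\]""")


def parse_path(path):
    """Split a path into a list of keys; a lone '.' means the document root."""
    if path == ".":
        return []
    keys, pos = [], 0
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}: {path!r}")
        name, double_quoted, single_quoted, index = match.groups()
        if index is not None:
            keys.append(int(index))
        elif name is not None:
            keys.append(name)
        else:
            keys.append(double_quoted if double_quoted is not None else single_quoted)
        pos = match.end()
    return keys


def resolve(doc, path):
    """Walk the document level by level, starting at the root."""
    node = doc
    for key in parse_path(path):
        node = node[key]
    return node


doc = {"foo": {"bar": [10, 20, 30]}}
same_target = resolve(doc, ".foo.bar") == resolve(doc, '["foo"]["bar"]')
```

Both spellings walk the same two levels below the root, so `same_target` is true for this document.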
Or you can write it all in an array notation. Simple as that. And something very important, you'll also find this in DOM query languages, is the support for wildcards. So, if you are unsure about a specific selector, just insert a star, for example,
and you will get back the corresponding array, the corresponding document that reflects that path. What does it look like from the server side? You have the native server, and this is generic, this is not Redis JSON specific. You have essentially the Redis server that is running on,
by the way, what port? Can anybody tell me? 6379, right? Something very important. If you want to access Redis from the outside, make sure that your firewall is open for that port or fortified correctly. So, all communication goes through a wire protocol called RESP
between the server and the implementation in the client-side libraries. And the client side pretty much looks similar to what is on the screen now in pretty much any programming language for which there is a Redis client. Essentially, you have a small wrapper around a socket interface, which is called hiredis.
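As a rough illustration of what travels over that wire protocol, the following Python sketch encodes a command the way RESP frames client requests: an array header followed by one length-prefixed bulk string per argument. `encode_command` is a hypothetical helper written for this example, not part of hiredis or any Redis client, and the key name `doc` is made up.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    # Array header: '*' followed by the number of arguments.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Bulk string: '$' + byte length, then the payload, each CRLF-terminated.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Illustrative request: store a small document at the root of key "doc".
wire = encode_command("JSON.SET", "doc", ".", '{"foo":"bar"}')
```

A client library does little more than write bytes like these to a TCP socket on the server port and parse the reply, which is why a thin socket wrapper is enough as the bottom layer.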
Written in C, highly performant, does little more than just wrapping socket access. On top of this, you have language specific bindings. As I referred to earlier, Redis is supported by more than 57 programming languages. So, each and every programming language has at least one,
if not more, clients. The problem or the issue, the challenge with these programming languages is that they're all different. You have Go, which is compiler-based. You have Python, sorry, you have Python 3, very important these days, which is interpreter-based. You have native C or you have C++.
For all of these programming languages, there are client-side library implementations, but they're all different. So, there's an abstraction layer, essentially implementing the interface layer to hiredis, but one that can understand the specific semantics, the specific interfaces towards the programming language of choice
and including the runtime environment that this programming language uses. On top of this, you also have module-specific bindings. If you go to oss.redislabs.com, you'll see a list of modules, as I said, that Redis Labs put out there on GitHub.
And most of them would have a Python, a Go, and maybe a JavaScript implementation right from day one. So, these are the kind of the basic interface libraries you find for the modules. Needless to say, these module-specific client-side implementations would use the language-specific bindings
in order to talk to the server, which, of course, then requires the module to be loaded if the client-specific bindings should work. Let's take a look at how this is done for Redis JSON. Essentially, this architecture depicts a performance benchmark that I'm going to go into
as part of a later part of the presentation. So, I'm going to just spend some time on describing the architecture. You have the module-specific bindings, language-specific bindings, and then you have hiredis, as explained before. And on top of that, you have a small layer called JRedisJSON.
This is then used by a database performance benchmark framework called YCSB, probably known by most of you who are looking for performant databases. It stands for Yahoo Cloud Serving Benchmark.
It's a standard benchmarking framework for databases, similar to TPC, if that rings a bell, only concentrating on NoSQL databases, because Redis is a NoSQL database, right? So, the idea is to extend YCSB
with a thin driver that talks to the document-oriented database, and it does so through the ordinary architecture-specific stack. On the server side, this is reflected. You have the native Redis server written in C,
as in you can pull it down from GitHub, it's all written in C, it's there. I used version 5.0.5 for this, but more on this later. Then you have something called the module SDK, which, written in C, has been there for the last couple of years, as explained. The trouble with that, of course,
is that you cannot use this really from Rust right away. This is the reason why Redis Labs created something called a module crate that essentially wraps the module SDK in Rust bindings. As you probably know, to call C from Rust,
you have to tweak it a little bit. Essentially, you have to tell the Rust compiler: now look, this API is not safe, because the usual memory management techniques, as in the whole concept of ownership and all the rest, and you are Rust experts, you know this,
do not apply to C code. The idea behind the module crate is essentially to wrap the module SDK in something that Rust can understand. Then you have the remaining codebase written in Rust, talking to this module crate, which then in turn talks to the module SDK,
which in turn talks to the server internals. By the way, this is Redis JSON 2. The first implementation called Redis JSON was written in C, as we can see in a couple of minutes when I'm going to go through the performance aspects between these two code bases.
Okay. The original implementation, some figures, the original implementation had about 5.2 kilo lines of code. The new implementation is about 3.2 kilo lines of code. What were the main decision points for re-implementing the already existing
Redis JSON extension in Rust? First of all, yes, the native codebase of Redis is written in C, but as probably most of you know, aging C code is sometimes or somewhat hard to maintain, especially if it's been around for a while.
So C code bases bring, especially for new members of a team, a learning curve with them, and also do not necessarily help with the overall total cost of ownership or something called technical debt. So a decision was made that, going forward, the implementation language for any new modules or re-implementation
of existing ones like Redis JSON would be done in Rust. The idea was to have a more compact code base, and I think the figures kind of reflect this, with a lower technical debt and very important with lower QA effort because there are fewer lines of code that need to be tested
and of course that leads to a lower overall total cost of ownership when it comes down to maintaining and extending the code base. And of course, something important, when you're working for a company that sells support around this and other services, time to market comes down to something
that may be important for the business side of things. This is the reason why it was a conscious decision about two years ago to go forward with Rust rather than C when it comes down to native implementation of modules. A little bit of experience when we engaged
with this re-implementation of the new code base. The team had quite a diverse background. Some of them were coming, many of them were coming from C. Some had some Java backgrounds and also Golang was present, but so the background was pretty diverse
when it comes down to programming language. And the reason why up to then the main implementation language of choice was C because A, the module SDK being present already was written in C and of course the remaining server code base has been written in C. That's the reason why natively it was pretty much a no-brainer
in the beginning to use C as the implementation language for any modules being developed inside Redis Labs. But going forward, as I said, for the reasons explained, Rust was chosen as the new technology to implement modules. So some lessons learned from the team
that engaged in re-implementing Redis JSON 2. Yes, Rust does have a steep learning curve. Just hands up, how many have used Rust for more than two or three years? So you can probably reflect on this, especially if you haven't put up your hands.
That means you're kind of still learning Rust, I reckon. So you know what this is all about with regards to memory management. It can be tricky at times. I'm talking about the borrow checker. Somebody about a year ago told me: if you have convinced the borrow checker, if you have convinced the Rust compiler to generate code, you're halfway there.
In contrast to C where this is slightly different. Okay, but now the plus sides. A pretty comprehensive ecosystem. If you take a look at what's out there on crates.io, that's a lot. You have at least five web frameworks to choose from. You have crates for socket access.
Rust itself is self-hosted, so you actually have crates for ASTs, as in abstract syntax trees, and all the rest of it. So chances are, like with Python, you take a look at what's out there if you want to program a new code base and simply reuse the stuff that has already been written.
So this is a major advantage when it comes down to implementing new systems because essentially, as with any other open source projects, you are standing on the shoulders of giants. Responsive community. If you take a look at rust-lang.org, especially at the forums, it's amazing.
When I started to learn Rust, there was always somebody out there helping me. Never mind how stupid the question was, if there are any stupid questions, you will get support. So in contrast to other communities, Rust, and I think this is one of the big advantages of the community and the language,
has been pretty responsive and pretty supportive. And this is reflected by what the development team experienced when they first started on this enterprise of basically programming something in Rust. And of course the toolchain support is awesome.
Not only do you have different toolchains at your disposal, beta, stable, and of course nightly. If you want to check out new features, you simply switch to a new toolchain version and off you go. But something pretty important, cargo. Best example, right?
It's not only a build system, it's a package management system all wrapped into one. I've yet, maybe apart from Golang, I've yet to find a programming language that does it all in one go. Maven for Java came later, and it took Java people a long time to get it right. My opinion.
Before any Java people kill me after this presentation. No, jokes aside, Mozilla decided about 13 years ago that they needed a new programming language because C and C++, as in the code base, then in place for the rendering engine, didn't cut it anymore.
So Rust was developed, first commit, I think, about 11 years ago, if I'm not completely mistaken. But they did it with intelligence this time around. And you'll see this if you take a look at the toolchain support and at the ecosystem, and people picked up on it pretty quickly. It took Python about, I reckon, 15 to 20 years
to get to where Rust is now within the short span of 10 years. And that's pretty amazing, I think, for a programming language. More info on our beloved Yahoo Cloud Serving Benchmark. As I already said, it's written in Java. It's a standard framework. So the idea is that you have quite a few
DB integration layers as part of the native code base when you clone it from GitHub right away. Redis is, of course, supported out of the box. So is Hadoop, Mongo, Couchbase, even some graph database and all the rest of it.
So the idea is, basically, if you want to take a look at how your implementation, how your ecosystem, when it comes down to NoSQL, is performing, you simply clone the code base, you compile it, and then you can start testing. If, and that was the case when we started off on this performance benchmark exercise,
there's no integration with your new NoSQL database yet, it's not that difficult, because you simply implement four, actually five, methods in Java that talk to the respective client-side library implementation, and then you're good to go.
So the idea is, basically, you have inserts, you have updates, you have deletes, and then you have reads and you have scans, plus maybe initialization and finalization of the database connection, but that's about it. So the implementation of the YCSB extension that talks to Redis JSON is about 200 lines of Java code.
It's as simple as that. So not a big deal. Don't fret if your database is not on the list of the already supported NoSQL databases. Writing that interface there is not that hard, apart from using Java, of course, but that's a different story.
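To give a feeling for how small such a binding is, here is a sketch of the five operations as a Python interface with a toy in-memory implementation. This is an analogue for illustration only: the real YCSB binding is a Java class, and the names `DocumentStore` and `InMemoryStore` as well as the method signatures are assumptions made for this example rather than the actual YCSB API.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Hypothetical Python analogue of the five operations a binding provides."""

    @abstractmethod
    def insert(self, key, fields): ...

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def update(self, key, fields): ...

    @abstractmethod
    def delete(self, key): ...

    @abstractmethod
    def scan(self, start_key, count): ...


class InMemoryStore(DocumentStore):
    """Toy backend: a dict stands in for the database client library."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, fields):
        self._docs[key] = dict(fields)

    def read(self, key):
        return self._docs.get(key)

    def update(self, key, fields):
        self._docs[key].update(fields)

    def delete(self, key):
        self._docs.pop(key, None)

    def scan(self, start_key, count):
        # Return up to `count` documents in key order, starting at start_key.
        keys = sorted(k for k in self._docs if k >= start_key)[:count]
        return [self._docs[k] for k in keys]


store = InMemoryStore()
store.insert("user1", {"name": "a"})
store.insert("user2", {"name": "b"})
store.update("user1", {"name": "c"})
```

A real binding would replace the dict with calls into the database's client library; the benchmark driver only ever sees the five operations.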
Okay. YCSB has the concept of workloads. There are six core workloads that all reflect different use cases. For the purpose of this benchmark, I used three of them. They range from A to F, it goes without saying. They all reflect different kinds of access patterns
on the client side, if you will, as I said, different use cases. So workload A would reflect your vanilla kind of cycle in terms of you have 50% of writes, you have 50% of reads that hammer onto the database.
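A workload in this sense is essentially a probability mix over operations. The Python sketch below generates such a mix for the 50/50 case just described; `generate_operations` is a hypothetical helper for illustration and does not reproduce how YCSB itself picks keys or operations.

```python
import random
from collections import Counter


def generate_operations(n, read_proportion, seed=0):
    """Draw n operations; each one is a read with probability read_proportion."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    return [
        "read" if rng.random() < read_proportion else "write"
        for _ in range(n)
    ]


# The 50% reads / 50% writes mix.
ops = generate_operations(10_000, read_proportion=0.5, seed=42)
mix = Counter(ops)
```

Other access patterns are a matter of changing `read_proportion`, for example a value close to 1.0 for a read-heavy pattern.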
Then you have workload B that has a more caching-oriented access pattern. Namely, about 95% of the database accesses would be reads, and only 5% would be writes. And then you have your bread and butter workload, which is F, that reflects a typical CRUD cycle,
as in you read a data record, you modify it, and then you write it back. So for the purpose of this benchmark, this is basically what I focused on. I was especially interested in workload B, because this is where this caching thing comes from originally,
about 10 years ago. How much time do I have left? 20 minutes. 20 minutes, okay. Plenty of time, okay, cool. So we can spend some time on the analysis of this. First of all, some specs. I used a stock Eoan, as in the latest Ubuntu release,
and the machine I ran it on is actually a Dell XPS 13, that has a mobile i7 with about 16 gigabytes of RAM, and it's 512 gigabytes of SSD. And the number of records that I used is actually around 1 million.
Bear in mind that these figures, expressed in seconds, of course, can be scaled if you move to a different server specification. Goes without saying. I travel a lot, so I use the machine that I'm presenting from as my go-to server, in inverted commas. I use Redis 5.0.5 as a kind of reference architecture.
You see that in the first line, basically to see how Redis JSON scales up against the native Redis implementation. And the native Redis implementation is already, as I said, part of the YCSB code base. And then I used Redis JSON as in the C implementation,
and I cloned this on the 2nd of January 2020. And one day later, I cloned the Rust version. The reason for the one-day delay was that my internet essentially broke down after I cloned the C code base.
And then I was traveling, and then I had internet access back again the next day. So that's the reason for that one-day delay. But I took a look at the commits, and there were no commits in between. So that's the reason why. Pretty important: I'm measuring only Redis and Redis JSON. I'm not measuring Mongo.
I'm not measuring Couchbase. I'm not measuring any other NoSQL DB as part of this benchmark. So I left it at in-memory operation only. There was no persistence configured. If many of you have used Redis, you know that Redis has two types of persistence,
namely append-only files as well as snapshots. I didn't use any of them because I want to keep it straight, and I just want to confine it to Redis. So I simply said, now look, Redis, do your thing in memory, as you have been doing for the last 10 years. Let's take a look at the native implementation first to get some sort, some feeling of how Redis is measuring up.
The number of threads, you can configure that when you run your YCSB invocation. The number of threads essentially reflects, or kind of models, the number of applications hammering onto that database instance. As you can see, and this is implemented by pure Java threads.
Thank you. Pure Java threads, on the client side, because this is what YCSB uses when it comes down to simulating access to the server-side implementation. So one thread, four threads, and eight threads. Bear in mind that this is only a mobile quad core.
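The client-thread model described here can be sketched in a few lines. This is only an illustration in Python, not YCSB itself (which uses Java threads): `InMemoryStore`, `client`, and `run_workload` are hypothetical names, and a locked dictionary stands in for the single database instance that all clients hammer on.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for one database instance shared by all clients."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def __len__(self):
        with self._lock:
            return len(self._data)


def client(store, client_id, operations):
    """One simulated application: write a record, then read it back."""
    for i in range(operations):
        key = f"user:{client_id}:{i}"
        store.set(key, {"field0": i})
        store.get(key)
    return operations


def run_workload(thread_count, total_operations):
    """Split the operations across client threads and time the whole run."""
    store = InMemoryStore()
    per_client = total_operations // thread_count
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        completed = sum(
            pool.map(lambda cid: client(store, cid, per_client), range(thread_count))
        )
    elapsed = time.perf_counter() - start
    return completed, elapsed, len(store)


if __name__ == "__main__":
    # Same thread counts as in the talk: one, four, and eight client threads.
    for threads in (1, 4, 8):
        done, seconds, stored = run_workload(threads, 8000)
        print(f"{threads} thread(s): {done} operations in {seconds:.3f} s")
```

The timings from such a toy run say nothing about Redis; the point is only the shape of the setup, with several client threads sharing one server-side instance.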
So you're looking at essentially a dual core with hyperthreading. That's something important to keep in mind. Already at the native Redis level, you see a spike in performance when you move from one to four threads. This slightly goes up if you move to eight threads, because that's what you see
when you hammer with multiple clients on a single database instance. And that's actually workload A. And this is reflected also in the remaining workloads.
What I'm a little bit surprised about, and I haven't done a thorough root cause analysis yet, is why the performance for eight threads is actually higher at the native Redis level than for four threads. So I reckon it's down to something called the Jedis interface that is used
when accessing native Redis in YCSB. Jedis is one of the standard Java clients, apart from Lettuce and something called Redisson. And as I said, I'm suspecting that this is basically down to the Jedis implementation. I used 3.2. The tagging of the release was fairly recent.
So I reckon there's something fishy, maybe, in that Jedis version. Switching over to RedisJSON, as in the native C implementation of the document DB, you will see the price you pay when switching from native Redis to RedisJSON,
i.e., the overhead, the performance penalty that you pay in a word of commerce when you actually use that module giving you document-oriented functionality. And as you can see, it's not that much at the end of the day, because essentially you're looking at 30% when multi-threaded
and even less when it comes down to a single thread. And now it gets really interesting, because the comparison between RedisJSON and RedisJSON 2 essentially tells you the performance penalty when using a Rust codebase
in comparison to a C codebase. So this is the impact that you have when you go from C to Rust. Slightly simplifying, but you get the drift. Let's take a look at the numbers. Let's pick a random workload. Let's pick workload A.
The performance penalty that you pay is quite minimal, because essentially you're looking at 44 versus 49 seconds. That's a five-second difference. But on the other hand, you gain all the benefits that come with Rust. This is why this comparison for this particular use case
is quite revealing, I think, when you are facing the decision of which implementation language to use for your next project. Would that be Rust? Would that be C or C++? I'm going to leave it here, because we have about 10 minutes left, something like that, so there's still some more slides.
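To put the workload A figures into proportion, the five-second gap can be expressed as a relative overhead. The sketch below assumes that 44 seconds is the C implementation and 49 seconds is the Rust one, which is how the penalty is framed in the talk; `relative_overhead` is a hypothetical helper name.

```python
def relative_overhead(baseline_seconds, candidate_seconds):
    """Extra runtime of the candidate, as a fraction of the baseline runtime."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds


# Workload A runtimes quoted in the talk (assumed mapping: C = 44 s, Rust = 49 s).
c_runtime = 44.0
rust_runtime = 49.0

overhead = relative_overhead(c_runtime, rust_runtime)
print(f"absolute difference: {rust_runtime - c_runtime:.0f} s")
print(f"relative overhead: {overhead:.1%}")
```

Under that assumption the overhead works out to 5/44, or roughly 11 percent, for this one workload on this one machine.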
The slides, of course, will be on the website. So feel free to take a second look and do some further analysis. The code will be on GitHub very soon. More on that later. And feel free to get in touch if you have further questions on this. Okay, short recap.
RedisJSON, of course, is an extension, basically, of Redis. And the primary use case is in-memory processing. So the focus is on what Redis comes with natively, high performance and low latency, as in maximum throughput.
And of course, because it's based on Redis, you have the full CAP triangle at your disposal. Sorry, CAP? Does anybody not know what CAP is? Okay, great. Sorry, yes. Brewer's theorem, it's a triangle between consistency, availability and partition tolerance. Essentially, from ACID right
up to maybe Coherence, something like this. Of course, ACID should ring a bell. ACID is your typical SQL-based use case, as in you have a transaction that either fails or succeeds, nothing in between. Many, as it turns out,
many applications do not have to use that strong consistency notion. So that's the reason why you especially see this in the NoSQL space, that more and more applications move away from this strong ACID compliance. And Redis, with the different types of persistence, with the in-memory focus, and, thank you,
the other benefits, of course, allows you to move within that triangle pretty freely, based on your use case. So this is basically the advantage of this NoSQL approach. The outlook is, we're going to integrate this, and you'll see this when you take a look
at the code base on GitHub already. There's a module called RediSearch, which essentially is a full-text search engine, also based on Redis, of course, that allows you to have the functionality of a real-time index in memory at your disposal. Many people use it,
as far as I've seen, basically to implement something like an in-memory search engine, because that's what it is for. So you take a document, you let Redis index the whole thing, and then you can basically search for your search terms on that index. So the idea is basically
to combine RediSearch with RedisJSON, which up to now has only supported JSONPath as a query language. And of course, other improvements involve extending the functionality currently implemented in RedisJSON 2, plus, of course, API extensions.
So at the end of the day, you have a document DB that is fully indexable in real time. So that's the overall idea moving forward. Before we come to the questions, a couple of links. On redis.io, you'll find the full Redis documentation.
No need to tell you that, because you know this already. redisjson.io holds the reference documentation that is out there for RedisJSON and RedisJSON 2. By the way, the API is the same. So whether you are talking to RedisJSON or RedisJSON 2, it doesn't matter.
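As a rough illustration of what that shared API looks like from a client, the sketch below builds the argument lists for the module's `JSON.SET` and `JSON.GET` commands. The helper names `json_set_args` and `json_get_args`, the key `user:1`, and the sample document are made up for this example; no server connection is opened.

```python
import json


def json_set_args(key, path, value):
    """Arguments for JSON.SET: store `value` (serialized as JSON) at `path` in `key`."""
    return ["JSON.SET", key, path, json.dumps(value)]


def json_get_args(key, path="."):
    """Arguments for JSON.GET: read the value at `path` in `key` (root by default)."""
    return ["JSON.GET", key, path]


# Create a whole document at the root path.
create = json_set_args("user:1", ".", {"name": "Ada", "visits": 1})

# Update a single field inside the existing document.
update = json_set_args("user:1", ".visits", 2)

# Read one field back.
read = json_get_args("user:1", ".name")

for command in (create, update, read):
    print(" ".join(command))
```

With a Redis client that can send raw commands, these lists could be passed through unchanged, and per the talk the same commands should work whether the server loads the C module or the Rust one.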
And it will always be the same. So if we extend RedisJSON 2, these changes will also be reflected in RedisJSON for the time being. You will have the code base on GitHub. There's, of course, the code base for Redis, maintained by Salvatore Sanfilippo
on GitHub as well, under his handle, antirez. You'll find YCSB, the Yahoo! Cloud Serving Benchmark, also on GitHub. As soon as I get around to it, I'm going to issue a pull request for the RedisJSON interface layer. So if Brian accepts this,
you'll have a native YCSB integration out of the box on GitHub. And of course, there's also something called the Redis Labs University. Just a sentence on that, because this is not a commercial presentation. It's essentially an online university where you can get to know more about Redis,
from both a developer and an administrative perspective. It's free of charge. Simply register, create an account, enroll in a course, pass the exams, pass the homework, and you get a certificate. Any questions before we close it off? Yes. Can you make a module that uses another module
so you can move part of your application in the server? Yes, of course. Thank you. The question was, can you nest modules essentially, as in can you move business logic onto the server side? Yes, you can.
There's something called Lua, which is a scripting extension for Redis, but there's also an upcoming module. It's in technical preview at the moment. It's called RedisGears. RedisGears allows you to take your business logic code, at the moment it's only Python, and move it off to the server. So your Python script is then executed
as part of a Python implementation running on the server. You can also do this with module functionality, although I haven't seen this yet. I was thinking specifically of something like Rust. Rust.
So the question is about Rust. That hasn't been done yet. From a functional perspective, I don't see any reason why not, as long as you stick to the module crate, goes without saying, because essentially this is what you use when you access the module SDK. So I don't see any reason why you couldn't do this. But as I said, the POC still remains to be done.
Any other questions? You had a question, right? How production-ready is RedisJSON 2? Is it just functionally complete? Is it already optimized? Yes, thank you. The question was, how ready is RedisJSON 2?
We, as in Redis Labs, released this last year. The commits have been mostly bug fixes. And I know quite a few projects that use this already in production. So the answer is, it's ready for prime time now. Needless to say, PRs are still accepted, I think.
So if you have further improvements, let us know. Other questions before we close it off? Yes.
The question was, is the performance equal between creating a new JSON document and just updating one? Essentially, the idea is, if you are updating one, you basically have to modify the contents. That, of course, includes a query
in terms of where you want to stick in the content. So I don't have exact performance benchmarks, but based on the performance of the native implementation, I wouldn't expect that overhead to be really significant. But as I said, having not done the performance benchmark
for this particular use case, let's see. Another question? Okay. In that case, I would like to thank you for your time and enjoy the code.