IMDB showdown - comparing OrigoDB, Redis and SQL Server Hekaton
Formal Metadata
Title: IMDB showdown - comparing OrigoDB, Redis and SQL Server Hekaton
Number of Parts: 163
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/50246 (DOI)
Transcript: English (auto-generated)
00:02
Welcome. Really happy to be here. I'm Robert, from Devrex Labs. It's a small company in Sweden, and I'm a developer. I've been developing for about 20 years, mainly on .NET and a lot of SQL Server, and a lot of other things. At Devrex, we're building
00:20
OrigoDB. It's an in-memory database, and we're trying to make a living from it. We're not really there yet, so we support ourselves doing consulting and training, mainly in Sweden. Today I'm going to show you a little bit about OrigoDB, a little bit about Redis, and a little bit about SQL Server's in-memory OLTP feature. One hour
00:44
is not a lot of time, so it's going to be pretty brief, except that we'll spend a lot of time with OrigoDB, of course. Let's get started. What's this in-memory thing all about? Why is in-memory so special? Why is there so much buzz about it?
01:06
Well, it's speed, obviously, right? In-memory is really fast. Here's a little table with some numbers. In the middle, you can see RAM there. We measure the time, access time
01:20
to RAM in nanoseconds, right? You buy a module of RAM, and the access time is about 60 nanoseconds. If we compare that to a rotating disk, the access time is about 10 milliseconds. You probably all have SSDs, but we can use a rotating disk as a reference. 10 milliseconds. That sounds pretty fast, but if you compare
01:46
that on a scale, that's 167,000 times slower than reading from memory. That's kind of a head start right there. You don't have to really do perfect stuff in memory.
02:01
If you just stick your data in memory, you can cheat with everything; you don't need perfect algorithms at that speed. So speed is the primary driver. Just to get some perspective, the last column out there says one second for
02:23
RAM. One second, that's a pretty short period of time. If I'm reading some value from somewhere and it takes me one second to get the reply, one second is one second. But if we compare that to rotating disk, that's 46 hours. I hope I did the math right.
02:46
I did it on a napkin during geek beers yesterday; you might want to check it. Now, distance. Say a RAM access corresponds to a distance of 240 meters, just down the street,
03:02
buy a beer, and come back. Scaled up by that same 167,000 factor, a disk access is one trip around the earth: 40,000 kilometers. So in the time it takes me to walk to the store, buy a beer, and walk back, the disk equivalent is a trip around the world. Memory is really fast.
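The napkin math checks out; a quick sketch using the talk's ballpark figures (these are round numbers for illustration, not measurements of any specific hardware):

```python
# Rough scale of the numbers from the talk.
RAM_ACCESS = 60e-9        # ~60 nanoseconds per RAM access
DISK_ACCESS = 10e-3       # ~10 milliseconds per rotating-disk access

ratio = DISK_ACCESS / RAM_ACCESS          # how much slower disk is
hours_if_ram_is_1s = ratio / 3600         # scale 1 second of RAM time up
km_if_ram_is_240m = 240 * ratio / 1000    # scale a 240 m walk up

print(f"disk is ~{ratio:,.0f}x slower than RAM")
print(f"1 s of RAM time is ~{hours_if_ram_is_1s:.0f} h of disk time")
print(f"a 240 m walk becomes ~{km_if_ram_is_240m:,.0f} km, once around the earth")
```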
03:25
Memory has a different property: random access. You can read at any position in memory and pay the same price, about 60 nanoseconds. When you access disk, you need to organize your data so you read it sequentially.
03:43
If you try to do random accesses, a transaction in a relational database, it might do 20 or 30 page modifications. If we had to read and write all those pages, we could have 100 reads and writes of a page. If those were random,
04:02
then each transaction would take a second. Then you can do a maximum of one transaction per second. But they're not really like that. We're going to look at the architecture a bit, the current architecture of the relational database systems, SQL Server in particular.
04:21
We'll look at that soon. Here's another factor. Memory is fast, right? That's great, but if I can only fit a tiny bit of my data in RAM, then I still have a problem: the rest has to stay on disk. If you look at how prices have developed over the last 30 years or more,
04:42
one gigabyte of RAM cost six and a half million dollars back around 1980. If you look at the price today, it's about eight US dollars for a gigabyte. At the same time, the top RAM capacity in servers has also grown.
05:03
I remember my first PC. It was a 386SX. Remember those? When I got it, I went for double the usual amount of RAM, so I got two megabytes of RAM in that machine. That was a lot at the time. This laptop's got 16 gigs in it.
05:24
On Azure, the largest VM you can purchase has 112 gigs of RAM. On Amazon, it's twice as big, 244. They are expensive, but you can get servers with that much RAM.
05:43
You can buy commodity servers from Dell with about two terabytes of RAM. That's a lot of data, right? How big is your relational database? How big is your OLTP database with your working set of data? If I look back at my experience, the projects I've worked with,
06:01
healthcare and that kind of thing in Sweden, a big database is always less than a terabyte, around 100 gigs or 200 gigs. Then again, there's a lot of extra data in those gigs because there's a lot of index and there's a lot of overhead with the data pages. If I represent that data in memory with an optimal structure for memory,
06:26
then it will just shrink away. We don't need all the indexes because we can just scan and seek the data. We don't need indexes for every type of query. Your data probably fits in RAM.
06:41
If it does, it could be a good idea to put it in memory, and not just for speed. We'll get to that. That's not really why we built OrigoDB. We didn't build it for speed, because most of the time a relational database, unless you really mess things up, is fast enough. If you know what you're doing, you can make it work well enough for what you need to do.
07:04
We built it to save time. There are a lot of in-memory stores out there, a lot of products claiming they're in memory, kind of riding on the buzz and trying to get a market share just by claiming they're in memory.
07:22
Here are just some of the products. I erased some of them. I had a list of about 40 or 50, but just to show you some of them, you've probably seen them. There's one group of in-memory databases called NewSQL. You've probably heard of that, the NewSQL. I'm going to show you the old SQL today with the relational architecture,
07:42
the B-tree structure. We're going to look at that soon. Some of these new vendors, they're putting all the data in memory, and they redesigned the relational architecture from scratch. VoltDB is one of them, and another one is MemSQL, and there are probably others as well if you've heard of NewSQL. That's pretty smart because they're utilizing the existing knowledge that we have.
08:06
We all know how to use a relational database. We all know how to type SQL and so on. They're kind of giving us something that we can get working with quickly, a model we're familiar with.
08:21
Then we have all these key value cache thingies, and they're not really databases because it's just a key value store. You push in a piece of data and you associate it with a key. They're not a database any more than a dictionary is in .NET, perhaps.
08:40
You've heard of these? Aerospike, Hazelcast. They're used mainly for caching, kind of like distributed caches. Does anyone use the cache thing in AppFabric? Maybe? All right. I think AppFabric is on its way out. They're retiring it, right?
09:01
I heard that the other day. You'll probably be choosing something like this, Hazelcast, Aerospike, or Redis in Azure. We're going to be talking about Redis as well. Those down there to the left are key value stores. On the right, we see these hybrid solutions, the existing, the dinosaurs, the ones that are taking all your money.
09:24
They want to keep taking your money, so they have to offer some kind of in-memory thing before they go and die. Hopefully, Oracle will soon. We'll be looking specifically at SQL Server's in-memory OLTP.
09:44
Well, a bit. We'll be looking at it. I call it a hybrid because you have all the same behavior. All the regular stuff is there. They've just added new features that can take advantage of the in-memory features. SAP, when I see that in the project, I run.
10:03
They have some cool in-memory stuff. Have you heard of it? SAP HANA? The Germans? Okay. It's pretty impressive stuff they built there. Then there's OrigoDB. I don't really know where it fits on this map. It's not NewSQL. It's not hybrid. It's not a key-value store, unless you want it to be.
10:23
We'll have a look at that later. Okay, so let's start by having a look at SQL Server. I'm sure you're all familiar with it, use it, right? Or any other relational database. Most of the things I'm talking about here apply to most relational databases with traditional architecture.
10:44
SQL is old, right? It's from the 70s. What else from the 70s do you use? Languages or libraries and stuff. Nothing, right? Okay. The design of the B-tree, the underlying architecture, the data structure behind relational databases,
11:06
was conceived during the 70s when we had tiny amounts of RAM and pretty small amounts of disk as well, but our data had to be on disk. Before that we read from punch cards or tapes like streaming media.
11:23
Then when disks came, we had to utilize them in some way, just to make reading and writing data go reasonably fast. It was slow at that time. The relational database was a solution to those constraints, which were really extreme.
11:45
None of that is really true today, like I showed you a few slides ago. We have a lot of memory, and it's cheap. It's kind of old. It's impressive, but it's obsolete.
12:01
SQL Server, the team, they've added some in-memory features to stay alive. Let's have a look at how B-trees work. This is the technical part of the talk. A B-tree is a logical tree. You see the tree on the left. It's a logical structure, and all of the squares there, the green and yellow ones,
12:24
or whatever color they are, are data pages. Each data page maps to a block on disk with a size of 8 kilobytes. It's a logical structure. On disk, it lives in the data files. We can have one or more data files in SQL Server.
12:42
On the right, it says data, 64K blocks. Those are called extents. Data on disk is consecutive blocks of bytes of data pages. In between, we have a buffer manager. The buffer manager is the component that writes the dirty pages to disk,
13:02
and it reads the pages into memory, into the buffer pool. That's on the left. The tree is in the buffer pool, some of it. The buffer pool is like a cache. If you have a lot of memory for SQL Server, all the memory on the server, it will take as much memory as possible,
13:22
and it will use it mainly for buffer pool, so it will keep the data pages in memory. That's a good thing. If you have a lot of data pages in memory, you don't have to move them back and forth between memory and disk. That's the buffer manager's job. That's a component of SQL Server.
13:41
The buffer manager takes dirty pages from the tree and writes them during checkpoints. That can happen every second or every five seconds or sooner. It's pretty complicated, but between checkpoints, you have dirty pages in the tree. If the computer crashes, the server crashes, then you lose those pages.
14:03
That's why you have a transaction log to keep the durability. If you crash, you save everything in the transaction log. Every time a transaction commits, all the information from the transaction gets written to the log. That's a transaction log. SQL Server uses effect logging or binary logging
14:23
because it logs the data pages. We're going to look at other types of logging a bit later. OrigoDB uses a different type of logging. VoltDB uses a different type of logging. Redis uses logging, also a different type. It uses logical logging. It remembers the information that caused the change.
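To make the distinction concrete, here's a toy sketch of the two logging styles. The structures are invented for illustration; real engines also deal with log sequence numbers, checkpoints, torn pages, and much more.

```python
# Toy contrast of the two logging styles from the talk.
effect_log = []    # page images: what effect (SQL Server-style) logging records
command_log = []   # operations:  what logical (Redis/OrigoDB-style) logging records

pages = {1: b"old page contents"}   # pretend on-disk pages

def update_page(page_id, new_bytes, command):
    # Effect logging: keep before/after page images so we can undo and redo.
    effect_log.append(("before", page_id, pages.get(page_id)))
    effect_log.append(("after", page_id, new_bytes))
    # Logical logging: remember only the command that caused the change.
    command_log.append(command)
    pages[page_id] = new_bytes

update_page(1, b"new page contents", "UPDATE Users SET Name='Bob' WHERE Id=1")

# Recovery with effect logging re-applies the page images;
# recovery with logical logging re-executes the commands against the model.
```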
14:43
This is effect logging. We don't remember what the insert, update, delete things were. We remember which pages were new, which pages were modified, and which pages were deleted. All those get written to the log. We have to write deleted and modified pages to the log
15:01
or the original pages before they were modified, so we can roll back. If something goes wrong, you have to roll back. Then we read the pages back from the log and replace them in the tree. Got it? That's logging. For every single transaction in SQL Server,
15:21
it has to flush one or more pages to disk. Hopefully, the heads on the spinning disk won't be doing other things. You have a dedicated disk for your log if you have spinning disks. Then the head will remain in the same position, and it can write pretty fast. It won't be doing random access.
15:40
It will be appending to the log all the time. But if you put your log and your data files on the same disk, then the disk will have to jump around. The log defines how many write transactions per second you can get out of SQL Server.
16:07
Got it? That's a B-tree. That's the old structure. It's designed in the 70s or 80s. So that's pretty old. Yeah, but nobody does that anyway, right? That's not what we're doing, says Alan Kay, right?
16:26
This is kind of an implementation or architecture, but not really an idea. But if you take this idea and build it specifically for in-memory,
16:40
then you'll get totally different performance and other properties. We'll be looking at that soon. OK, so let's talk a little bit about SQL Server's OLTP. The previous slide was the traditional architecture. This is kind of like, OK, so what do they do about it? How are they going to speed things up?
17:02
Here's a pyramid, and at the bottom, that's the most... the factor with the largest impact, the buffer manager I.O. Writing and reading to disk, buffer manager, at random places in the data file, that consumes a lot of time.
17:23
And so on the right, we see what they did. So they kind of... Oops, I'm pressing too much there, OK? Yeah, so they added features. They redesigned the B-tree and just created an entirely different and new representation of data in memory.
17:40
So they keep the entire table in memory. You can choose to make one table in-memory; it works on a per-table basis, not on the entire database or portions of it. You take one table at a time and migrate it to memory, so a table is either in memory or it's not. And the in-memory architecture is totally new, redesigned and optimized to work in memory.
18:07
OK, so the B-tree we saw on the previous page, it's gone. Right, and then we have locks. Locks are during a transaction. Because relational databases are so slow,
18:21
transactions need to be concurrent. Because if you have to wait for the previous transaction, things will be chaos, right? So locks are used by the transactions to remember what kind of data... I think I have a timer on that one. The locks are used to remember which data rows
18:41
and data pages have been modified by a transaction, so to isolate transactions, so they don't step on each other's data. More on isolation later. So what did they do about that? They added a multi-version concurrency control. It's kind of like snapshot isolation. But it means that if one transaction needs to read the data,
19:02
it will get a copy of it, so transactions don't have to block on the same data row. You just copy the row and both use it. At the end of the transaction, it's kind of optimistic: if nobody changed the same rows, there's no conflict, and both transactions can commit.
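That optimistic, copy-then-validate scheme can be sketched like this. It's a toy model with invented names; Hekaton's actual multi-version concurrency control uses timestamped row versions and a proper validation phase.

```python
class Row:
    """A single data row with a version counter (toy model)."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class OptimisticTxn:
    """Read a private copy without locking; validate at commit time."""
    def __init__(self, row):
        self.row = row
        self.read_version = row.version   # remember the version we saw
        self.local_value = row.value      # private copy; no locks taken

    def commit(self, new_value):
        # Optimistic validation: fail only if someone else committed first.
        if self.row.version != self.read_version:
            raise RuntimeError("write conflict")
        self.row.value = new_value
        self.row.version += 1

row = Row("balance=100")
t1 = OptimisticTxn(row)   # both transactions read the row...
t2 = OptimisticTxn(row)   # ...concurrently, without blocking each other
t1.commit("balance=90")   # first committer wins
try:
    t2.commit("balance=80")
except RuntimeError:
    print("t2 rolled back: write conflict")
```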
19:22
So they've eliminated locks from in-memory OLTP, and that saves a significant amount of lock-processing time. Latches protect the data structures themselves, the B-trees and other structures. And they've redesigned those as well,
19:41
using lock-free data structures. And logging: we saw that with effect logging, all the modified data pages get written to the log. Not so with in-memory OLTP. It just logs the actual data rows being written.
20:03
That's the only thing that goes in the log. And it's very small. So it's not the entire data page, it's just a tiny bit. And then the log, it doesn't use checkpointing, because the data is only in-memory. And there's a background process that reads the log and updates the data.
20:24
And the data isn't stored in regular data files; it uses the FILESTREAM mechanism. This is all kind of technical, but if you want to use SQL Server in-memory OLTP, you don't have to know these things. You just start the wizard,
20:40
and you say, okay, next, next, next, and then you're ready to use it. Still, it's good to understand how things work and why it's fast. The main benefits come from the bottom of the pyramid, with diminishing returns the higher up you get. At the top, we have stored procedure execution and query planning and that type of thing.
21:04
Today, with in-memory OLTP, you can create natively compiled stored procedures. They're transcompiled to C when you create them, and then compiled to native code. So your procedures are native code working directly on these in-memory structures.
21:21
So everything's pretty fast. All right? You guys heard of Oracle TimesTen? They bought some startup, and the name TimesTen is a boast: the goal was to make things 10 times faster. And the code name for SQL Server's in-memory OLTP was Hekaton.
21:41
You remember that? Hekaton is Greek for 100. So that was the goal, to make everything 100 times faster. But it's kind of silly. You can't just make everything faster. It's difficult to measure. You have to compare different workloads, and it's pretty complicated, but you probably won't get anywhere near that.
22:01
So it's really transparent for the application. That's one of the strengths, because if you have SQL Server Enterprise Edition, yeah, lots of money, then you can just convert your tables to in-memory, and you'll benefit from the speed. It's supposedly entirely transparent to the application, but not really, because there are limitations.
22:22
You can't have foreign keys. You can't have certain types of indexes. Certain types of columns are not supported. That can be a bit disappointing, because you have to do a bit of work to really get there. And then you need an Enterprise license to get that performance. All right.
22:43
Let me show you there, SQL Server somewhere. I have a 2014 Developer Edition here, and AdventureWorks, you've probably seen it before.
23:00
So on the context menu, there's a memory optimization advisor, and I'm just picking a table at random here in AdventureWorks, and it runs a checklist and shows me all the reasons I can't convert this to in-memory.
23:21
So it's a checklist, and there's information. There's a link to the help, but: user-defined data types not supported. Foreign key not supported. Unsupported indexes defined on this table. Default constraint not supported. Okay, not supported. Let's try another one. DatabaseLog.
23:41
Oh, XML type not supported. Large object binary not supported. Okay, try another one. Okay, almost. Foreign keys. Yeah, there are a lot of limitations. So you have to work with that. I didn't manage to find a single table in AdventureWorks that could pass the test.
24:03
So I did this one. Fairly simple table, right? Just to show you how it works. We don't have a lot of time, so we're not going into the details, but I think that table's right there. Okay, and we'll run that one.
24:21
And it passes all the tests. Amazing. All right. And some warnings. We have some procedures that access that table, so we could probably convert those also. We don't have to, but we can compile them, pre-compile them. All right.
24:41
So make some choices here. We can choose between a hash index. It uses a specialized hash structure. It's like a dictionary or a hash table in memory, and it's really, really fast for lookups. Or we can use a traditional tree-based index. Those are the two we can choose from. We can also create secondary indexes later,
25:01
but those are never saved to disk. They're always recreated on startup. All right. So I won't do that. I'll just script it so you can see what happens. It kind of looks like this. You create a table, and you say WITH (MEMORY_OPTIMIZED = ON).
25:22
And DURABILITY = SCHEMA_AND_DATA. If we said SCHEMA_ONLY instead, the table would be empty on startup, which is useful if we want to do some kind of caching. Okay. Enough of that. And back to the slides.
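The index choice in the demo, hash versus tree, boils down to point lookups versus range scans. A toy sketch in Python (Hekaton's real structures are lock-free hash tables and Bw-trees, far more sophisticated than this):

```python
import bisect

# Hash index: constant-time point lookups, but no ordering, so no range scans.
hash_index = {1001: "Alice", 1007: "Bob", 1042: "Carol"}
print(hash_index[1007])   # fast equality seek

# Tree-style (ordered) index: point lookups cost O(log n),
# but range queries become cheap ordered scans.
ordered_keys = sorted(hash_index)
lo = bisect.bisect_left(ordered_keys, 1000)
hi = bisect.bisect_right(ordered_keys, 1010)
in_range = ordered_keys[lo:hi]   # keys between 1000 and 1010
print(in_range)
```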
25:50
Okay. Okay, so let's talk about Redis. Redis is a key value store, but the values are very special.
26:00
It's actually an abbreviation: remote dictionary server. So it's a dictionary server: you have keys and values. You know dictionaries, keys, values. But the values are very special. Let's have a look at that.
26:21
The values can be strings. It can also be lists. So it's kind of like a data structure server. So one key can point at a set or a hash or a sorted set. So it's a limited set of collections you can use. And then there are a lot of predefined commands. And you can talk to it with Telnet. You just start up...
26:41
Yeah, you just talk over TCP, and you send text, and you get text back. So that's the interface, and there are a lot of drivers for a lot of languages. Redis is extremely popular, used by Twitter and Facebook and all those big players, for keeping data in cache, caching data, keeping it near applications for speed.
27:03
And Redis also has persistence. So you can do... There's an append-only file. So every time you change something, you send a command, it will write the command to the append-only file. So when you start it up again, it can reread the commands in the file and rebuild the data in memory.
27:21
But most people use it like this: they start up an empty node, and then just start using it like a cache from the application. Okay, written by a guy in Italy. Highly optimized C, a lot of fast algorithms. And every single command has documentation on how it performs. So it's great performance, written in C,
27:40
no garbage collection stalls or that type of stuff. Widespread, open source. Okay, so if we do something crazy like build Twitter using Redis only, what would we do? It could look like this. So here's a command, INCR. It means increment the key with the name next_user_id.
28:03
And if it doesn't exist, it will create one. And it returns the incremented value. And it's an atomic operation. So it's kind of like a key with a number behind it, but we're incrementing it or creating it. So we're adding a user to Twitter. So we need a user ID, a unique user ID.
28:22
So that's what we do first. HMSET: H means hash and M means multiple. So we're creating a hash and we're setting multiple values on it. And the key is user:1000. That's the user ID we just created. So the application sends that ID back, right?
28:43
And then we're setting the fields: name is bar, password is the hash. And you could fairly easily map that to a JSON object, right? Key value, key value. So the key points at a hash, and the hash in itself has keys and values.
29:01
All right? HSET: the key users is a hash, and we're setting the field bar to 1000 in that one, right? This kind of looks like assembly language, right? It's pretty cool. Yeah.
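To make the semantics of those three commands concrete, here is a tiny Python emulation of them using plain dicts. It's a sketch of the behavior just described, not real Redis:

```python
# Minimal in-memory emulation of INCR, HMSET and HSET using plain dicts.
# A sketch of the described behavior, not real Redis.
store = {}

def incr(key):
    # INCR: create the key at 0 if missing, add 1, return the new value
    # (in real Redis this is a single atomic operation)
    store[key] = store.get(key, 0) + 1
    return store[key]

def hmset(key, mapping):
    # HMSET: set multiple fields on the hash stored at key
    store.setdefault(key, {}).update(mapping)

def hset(key, field, value):
    # HSET: set a single field on the hash stored at key
    store.setdefault(key, {})[field] = value

# Registering a user, as in the talk:
user_id = incr("next_user_id")                      # first call returns 1
hmset("user:%d" % user_id, {"name": "bar", "password": "hash"})
hset("users", "bar", user_id)                       # username -> id lookup
```

The last line is what lets the application find a user's ID from the username later, which is why the example keeps both directions.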
29:20
But I wouldn't want to build a big, large application where people's lives depend on this, right? Because your application has to translate this, right? Create these and send to Redis. Of course, the library will do that, but this isn't really a data model, right?
29:41
Yeah. So the title of the talk was IMDB in memory databases. Redis isn't really a database, is it? It's kind of like a key value store on steroids. Okay. Yeah, I don't think I'm going to talk you through each command here,
30:03
but you get the feel of it. If you go to Redis.io, the home page, they have a complete example built with PHP with all this thing. It's a Twitter clone, and you can download it and you can play with it. But this is what the application does. It sends these commands and gets stuff back. So it's kind of like an advanced key value store.
30:23
All right. So that was Redis. Very short. Okay, let's talk about OrigoDB. Just checking my time here.
30:41
Okay, so here's our slogan. One of them, build faster systems faster. And that's not really a good slogan because it doesn't really say what we built it for. It's kind of catchy because it uses the same word twice, faster and faster. Okay, so you can build fast systems or faster systems,
31:00
and you can build them faster, and that's what the emphasis should be. Okay, building stuff faster. I'm extremely impatient. I don't like wasting time. I don't like dealing with object relational mappers or relational databases. Well, I don't have to. I just want my objects, and I want to perform some operations on them. And then I have to do a bunch of modeling on the database,
31:22
and I have to pull the database into memory. I have to modify it, and I have to keep track of concurrency, and then push the data back. Feels like there should be some simpler way, right? And that's kind of the idea behind OrigoDB, to make it easier. Build faster systems faster, yeah.
31:42
And they are faster, but that's kind of for free. So I didn't work really hard to make this thousands of times faster than SQL Server. The engineers on the SQL Server team, they're like, yeah, every single one of them is 10 times smarter than me. But this is just so simple.
32:00
It's so easy. So what's the problem? I kind of mentioned it. It's this thing. Relational databases, they were kind of meant to do domain modeling when they were invented, right? SQL was intended as a ubiquitous language, right?
32:22
End users were supposed to type insert and delete and select and stuff. A select statement, that's pretty close to plain English. That was kind of the intent during the design. So why do that twice? Once with the old design from the 70s
32:41
and all the work that comes with that, that's the thing all the way to the right. And then we have all this mapping because we really don't want to deal with the database. So we use tools in between that kind of hide the database and those tools mess things up even more. Performance, right? The mapping.
33:01
All right. And databases, they're so broken, they're so slow, so we need a cache. The point of a database is I can put my information there, I can do some transactions and give me some information so I can show it to the user. But they're so slow, so we need a cache. Isn't that stupid?
33:23
Yeah, so if the database was fast enough, we wouldn't need a cache. And that's kind of like the performance and what happens when you go pure in memory. And this is kind of the idea we use, our philosophy. So it's not really the performance,
33:41
it's kind of like we don't have to do those three things and we save a lot of time. Okay. Yeah, so we stick all the objects in the domain layer and just keep them there. And then we run the transactions in memory and then we log the commands. We write them to a file like a log.
34:01
And then when the system starts, you reread the log and then you've rebuilt the system. And the only catch is it has to fit in RAM. But every single database I've worked with the past 20 years, they easily fit in the RAM we have today. So it's not really a problem. Okay.
34:20
So it's a pretty simple idea. This is some systems or transaction theory. We have a state, an initial state, right? And then we apply an operation to the state and then we land in a new state. And each state is consistent, you know, acid transactions. They're atomic because the operation either fails or it doesn't.
34:42
We don't get stuck between the states. And yeah, we just apply the operations and get new states. So that's really the idea. And if you keep the state in memory and just persist the operations, then you don't have to worry about how you map your data or represent it on disk.
35:02
And then you can keep it random access because it doesn't have to have a specific structure. So you can use collections, you can use lists and dictionaries and queues, all the .NET collections and types. And there's no mapping to different types. So that's a really simple idea. But I think it's pretty powerful.
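As a sketch of that idea, keep the state in memory and persist only the operations, here is a minimal Python version. The class and method names are made up for illustration; this is not the OrigoDB API:

```python
class PrevalentSystem:
    """In-memory state plus an operation log; replaying the log rebuilds the state."""

    def __init__(self):
        self.state = {}      # the in-memory model: plain collections, no mapping
        self.journal = []    # in-memory stand-in for the append-only log file

    def execute(self, op, *args):
        self.journal.append((op, args))   # 1. persist the operation
        getattr(self, op)(*args)          # 2. apply it to the state

    def put(self, key, value):
        self.state[key] = value

    @classmethod
    def recover(cls, journal):
        # On startup: reread the logged operations and rebuild the state
        system = cls()
        for op, args in journal:
            getattr(system, op)(*args)
        return system
```

Each operation takes the system from one consistent state to the next, and because only the operations are persisted, the in-memory model can be any random-access structure you like.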
35:22
It's so simple, it's deceptive. People don't understand the power because you like shiny tools with lots of buttons and stuff, right? Most developers do. Okay, this isn't really a new concept. System prevalence. There's a project called Prevayler if anyone's working with Java.
35:40
It's a pretty old project from 2000, I think. So they're doing exactly the same thing as OrigoDB is. Yeah. And the guy who started it, he thought he solved all the problems of databases year 2000. And he showed it to the world and no one understood.
36:01
And now there's a small group, a couple of hundred or a couple of thousand just using this and just benefiting from it. Okay. MongoDB uses op logging. You know what that is? The operations you do in MongoDB, it logs those. It also uses the operations for replication.
36:21
So it's pretty easy to set up a slave server and you just send the operations over there and wait for them to complete. And then you have a synchronous replica. All right. Redis uses append-only file. If you read Martin Fowler's blog, he has a post there on memory image. If you read that carefully,
36:41
you'll see it's the same pattern. That's the name of it. VoltDB uses logical logging. So they have a log and they log the operations and they keep all the data in memory. Okay. Akka, pretty cool right now. You know, there are talks about Akka at NDC. That's strange.
37:01
Akka.NET, the actor model. They have persistence for the actors and it's the same principle. So you log all the operations, the events, the messages that the actor has received, and then you can rebuild the state of the actor later. Event sourcing, of course, you've all heard of it. It's the same principle, right? Those of you familiar with event sourcing?
37:22
CQRS? But mostly event sourcing. It's the same thing. You save the events and not the data. Okay. So let's look at how it works. Let's look at how it's built. This is also pretty simple. The blue things here are the components or the parts of the framework that you interact with
37:42
and the yellow things, if that's the color, are the things that you create. Right? So you have application code and it talks to the engine if it's in the same process. You can run OrigoDB in the same process. You can host it directly in your IIS web application
38:01
so it runs in the same process. You should turn off recycling in IIS if you do that, because otherwise you have to reload the model into memory on every recycle. So your application speaks with either the engine or a server, and the server is a standalone OrigoDB server
38:23
with a lot of features like replication. I mentioned MongoDB replication. OrigoDB Server has it as well. So you can do single-master, multiple-slave replication. You can do ad hoc queries in the web UI on the server.
38:40
I'll show it to you if we don't run out of time. Okay. And we have the engine. So the server embeds an engine. And the engine has a storage module, and it's pluggable. By default it writes to files. So if you just start up OrigoDB, it will start writing the events to files in the current directory.
39:01
You can also write snapshots. But you can use SQL as a backing store as well. So you can take the events, the commands, and write them to SQL, or you can use Event Store, or you can write your own. Okay, you can choose serialization. Default is the .NET binary formatter, but you probably shouldn't use that in production. I would go for protobuf because it's flexible and fast and small.
39:24
Pretty optimal. Interoperable. You can use JSON. That's pretty smart. So you can pass commands and queries from your application to the server in JSON and that opens up for HTTP and REST APIs and stuff and communicating from other platforms.
39:41
Okay, so the kernel is the thing that guards the model. The model, that's the most important thing. That's your data and that's in memory. So you have an instance of the model and it lives in memory and you don't have access to it. You don't have a reference to it. All communication goes through the engine. And what do you do? There are two things you can do with it. You can ask it something or you can tell it something.
40:01
And asking something, that's a query. So you create a query and that's just an object that you've written yourself probably using link or something else and you pass the query to the engine. The engine executes the query and takes care of concurrency and that kind of thing and returns the results to your application code. If you pass a command, the command gets written to the journal.
40:24
It gets journaled. And then it gets executed. So the kernel executes a single command at a time. So it's single threaded. So there's no concurrency issues between the commands because it's fast enough. I can run 20,000 transactions per second on this laptop.
40:42
Commands, sustained commands per second on my laptop. And then there's plenty of room for queries as well. Because in between those commands, while the engine is waiting for them to flush to disk, we can run hundreds of thousands of queries, depending on their nature, of course. So it's really, really fast. But we didn't work hard to make it fast.
41:01
We got that for free because it's in memory. Okay, so you write commands and queries and a model, and then you just go. Okay, this is so important, I made a specific slide for it. When you have a complete history of events, everything that's happened, and something goes wrong at the client, I can grab their log, the command journal,
41:25
and I can load it up in the debugger. And I can go to the exact point where they wondered about something or where something failed, and I can start stepping through the code. Just replay the events, and that's extremely powerful.
41:40
I can fix a bug. If I have a bug in the command execution code, so the thing in memory isn't really what it should be, I can fix that bug, and then replay the events, and the fix applies to the old data, not just data going forward. So if you screw up some data, you just re-project it.
42:01
All right, that's pretty cool. And you can restore to a specific point in time. I want to back up to yesterday. You just tell it what time you want to start from. So that's kind of the benefits from event sourcing. And then you have an audit trail. You know who changed what and when. Every single command, when it was executed and by whom.
42:21
You get that for free. OK, I wanted to say that. OK, so let's look at some code. Here's an example model. It could look like this. OK, so we write some code. We write a model, and it derives from Model. It's one of the framework types. And it should be serializable if it's supposed to support snapshots.
42:44
If we try to take a snapshot with binary formatter, and it's not serializable, the snapshot will fail. So that's why it's serializable. But you don't have to do snapshots. You can just stay with the journal. OK, so here are three data sets, three sorted dictionaries. So the keys are GUIDs,
43:00
and they're pointing to specific objects, classes, types, customer or product, you see. And we're newing that up in the constructor. And you see it's internal, so we can manipulate these data structures from commands and queries. This is a kind of anemic model. It's just a container for structures. You don't have to do it this way. You can put behavior here as well, and you can test it.
43:23
But this example is anemic. So the command. Here's one, AddCustomer. And it just derives from, you see, Command. And we're passing the generic type, CommerceModel. And that was the model on the previous page. And it's immutable. So we have read-only properties. They should be immutable.
43:40
You shouldn't be changing these things after they've been logged. That's kind of asking for trouble. All right, and then you have an Execute method. And it's the kernel that calls Execute. You give the engine the command, and the engine passes the command to the kernel. And the kernel calls Execute, passing in the model,
44:03
running only a single command at a time. It's pretty simple. It's calling Abort here, so it's throwing an exception if there's a duplicate key. And Abort signals to the engine that, okay, something went wrong, but we didn't touch the model, I promise. We didn't screw things up. So just ignore, just continue.
44:22
That's what abort means there. If we throw an exception that's unexpected, then we can tell the system, uh-oh, shut down, because this is not good. This command failed, and we don't know what happened to the model. Because some systems, that's really important. If you work with healthcare or you work with medicine,
44:40
how much medicine did that patient get the last time they got medicine? We don't know. That's not a good answer, right? So some systems are really, really important. And if you need that consistency, then you can get that here, okay? So queries, same principle. We derive from query, pass in the model,
45:02
and then the return type, you see? Customer view, that's what we're returning. And customer view is not a customer, because if we return a customer, and the customer has orders, and each order has a reference to a product, and each product has a reference to every order,
45:20
you get the picture, right? It's strongly related, highly connected. If we grab a customer, we'll pull that entire graph and return the entire graph back. That's not what you want. So you have to kind of break the graph, and that's why we use views, though. So return customer view, okay?
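The command and query pattern just described can be sketched in Python like this. The class names are illustrative; this is not the OrigoDB API, just the shape of it:

```python
class CommandAborted(Exception):
    """Validation failed BEFORE the model was touched; the engine can keep running."""

class CommerceModel:
    def __init__(self):
        self.customers = {}   # key -> name; any in-memory collection in the real thing

class AddCustomer:
    def __init__(self, key, name):
        self.key, self.name = key, name           # immutable once journaled
    def execute(self, model):
        if self.key in model.customers:
            raise CommandAborted("duplicate key")  # abort: model untouched, continue
        model.customers[self.key] = self.name      # mutate only after validation

class CustomerNamesQuery:
    def execute(self, model):
        # Return a view (a copy), never a reference into the live object graph
        return sorted(model.customers.values())
```

The important detail is that the command validates before it mutates, so an abort really does leave the model untouched, and the query returns a detached view instead of the connected graph.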
45:40
It's kind of a principle. You could use a lot of advanced LINQ here as well. I'll show you that soon. Okay, so now we have a model, we have a command, and we have queries. So then we start our engine. Engine.For is a convenience method for creating or loading an existing database. It'll look in the current directory.
46:02
We can create a configuration object and pass it to Engine.For, pointing it at a remote server or pointing it at a specific directory or using a specific storage provider and so on. Yeah, so you see what's happening? It's pretty simple. We're creating a new AddCustomer command, and we're passing the command to the engine, right?
46:21
Engine execute. So that gets logged, and the model gets transformed. And then we're asking a question also by passing a query to execute. And that's all there is to it. I build applications using this, and I've been doing that for the past ten years. Okay, and it works great. Okay, so let's have a look.
46:44
A little demo. I'm going to connect this. Let's do that.
47:02
There. Okay, this is an example. We built it just for fun, just to show how fast things are. So there's no cache here. This is a kind of search engine. It's like Google, but really, really small. So I have a little process that goes, fetches a lot of blog articles,
47:23
and just pushes them into an in-memory model. So when we're searching, if I do a search here... Oh, that was great. Something better.
47:47
I think the JavaScript is wedged. Oh, server not found. I'm not connected. Good. That kind of explains it, right? Good. There, that's better.
48:02
Okay, so 48 results in the in-memory database, and it took 0.07 seconds, and there's no caching, so it did a round trip to the server and searched all the data. There's not a terrible amount of data: there are 8,153,000 connections, links from keywords
48:23
to the articles that contain the keywords, and that's what we're getting in the list. You can check this out if you like. It's on GitHub. You see the address, just Google for GeekStream. It's built with OrigoDB and OrigoDB Server. The source here is on GitHub, and it's useful as well.
48:41
Okay, so let's show you a bit of code. Complicated, right? So the entire model is 260 lines of code. That's the model. And then there are some commands over here.
49:01
So we have a domain with one, two, three entities, feeds and feed items, and then we have the model, the GeekStream model. And then there are query objects over here, getFeedByID, getFeedsQuery, and so on, searchQuery. Look at the search query.
49:25
Yeah, so it's pretty simple LINQ extension methods. Get some items. We're searching the model over here, so the search logic is a part of the model, and then we're just skipping and taking and transforming the result in the query. Pretty simple stuff.
49:41
Great. And here's the core. It's a sorted dictionary of string. So the string keys are the words that we're searching for, and they point at sorted sets of feed items, the feed items that contain that word.
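That core is an inverted index. Here is a small Python sketch of the same structure; the names are illustrative, not the actual GeekStream code:

```python
from collections import defaultdict

class SearchIndex:
    """Sketch of the index described above: each word maps to the set of
    items that contain it (an inverted index)."""

    def __init__(self):
        self.index = defaultdict(set)   # word -> set of item ids

    def add(self, item_id, text):
        # Index every word of the item
        for word in text.lower().split():
            self.index[word].add(item_id)

    def search(self, query):
        # Intersect the item sets for every word in the query
        sets = [self.index[word] for word in query.lower().split()]
        return set.intersection(*sets) if sets else set()
```

Lookups are dictionary lookups plus set intersections, which is why a search over millions of word-to-article links can come back in hundredths of a second with no cache in front.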
50:02
Okay. That was GeekStream. Okay. I'm not going to show you this one, but I actually built a Redis model, because I don't like that. Redis kind of feels like assembler. So I wrote a Redis clone using OrigoDB.
50:21
So it's a Redis model. And here's a dictionary of string to objects, and those are just regular types. So I implemented every single basic command for Redis using OrigoDB, and it took an evening or two. Pretty fun. It's not as fast, of course. Redis is a great piece of work,
50:42
but I kind of did this for fun. But you can use this, and you can do it embedded. You can't run Redis embedded. You have to run it as a remote server. But now you can have the same model if you want it embedded. So that was that.
51:04
The end. Great. The demo. And we've got nine minutes left. Okay, great. So, conclusion. Yeah, it's not a lot of time to look at three products, but I hope you got a little hint at what they all can do.
51:21
Really different. The only thing they have in common is it's in memory. We're keeping data in memory. That's kind of the commonality here. So, yeah. You like that text? The old, the new, and the ugly? Can you guess which one? Which one is which one? Yeah. Well, and I was really being nice
51:41
because I thought you could say, I was thinking about saying the disease and the cure. Right? So SQL, it's great, but it's old. It's not fast enough for us, so we add a cache. And so the relational database and the cache are kind of like partners in crime. Yeah.
52:02
So I say use OrigoDB when you can, but I'm crazy, so I use it for everything. I want you to try it out. It fits really well when your data fits in RAM, when you don't have extreme amounts of data. It fits really well when you have performance problems with SQL, but if you have an existing SQL database,
52:20
it's pretty hard to integrate, so you kind of have to go all in. It's good for greenfield development, I would say. But you can integrate because you can write the log to a SQL server, so if someone says, oh, everything has to be SQL server, okay, fine. So I'll create one table with events in SQL server, and then everyone's happy because the operations,
52:41
they have SQL, and they can back it up, and that's a strength. Okay? So you can combine them. Use OrigoDB when you can. Use SQL Server when you have to. Use Redis when you have to. Redis, it's really cool and fast, but I wouldn't use it to build applications. All right? Caching, it's good for caching. One thing about Redis is you can't really do queries
53:03
across all the values or keys. You operate on one key at a time, so that's a limitation. You can't do analysis. You can do analysis with SQL Server. You can do analysis with OrigoDB as well. All right. SQL is great if you have a lot of data, more than fits in RAM, right?
53:25
With OrigoDB, when you run out of RAM, you have to pay for a lot of RAM. If you're not really benefiting from the performance, maybe it's not really a good fit. Yeah, and SQL Server, okay, I'll skip that slide.
53:43
There. I have a demo that shows that SQL Server is not really atomic unless you really, really know what you're doing. If you're interested, you can ask me later. It's not consistent, and it's not isolated either, if you're into ACID and all those kinds of things. You know, it's read committed isolation level by default,
54:03
and there are a lot of anomalies you can get with that. Okay, we can talk about that later. So, yeah, licensing. SQL costs money. It's enterprise licensed. Redis is free. OrigoDB is free, but we're trying to sell some enterprise stuff, some replication, but I think we're going to change that.
54:24
Change the model, so we're working on that. But the embedded engine is free at the moment. Okay, so SQL, that's great writing, right? T-SQL, you like writing application logic with SQL? That's great. Testable and great.
54:43
No. So, OrigoDB: C#, LINQ, nothing but. No mapping, nothing. Redis, you use the commands, but you can also use Lua scripting. It feels kind of glued on, but you can write transactions and do them with Lua scripts. I haven't really played with it. So OLTP, yes, for all of them. OLAP, analysis, you can't do with Redis.
55:06
OrigoDB, you can run it in process. If we were to rank performance, I'd say Redis is first. It would be silly to claim some specific number of transactions per second and stuff, but Redis is great performance, and OrigoDB is great performance as well,
55:22
but not really near Redis. But we can do more with OrigoDB. Yeah, the modeling is pretty powerful. I think you understood that, right? You create a model with C#, and then we do commands and queries, everything in C#. So the modeling is really flexible, really powerful.
55:41
Relational is powerful as well, but it's kind of old. But the modeling capabilities of Redis are kind of limited and kind of fixed. But then again, SQL is extremely mature, right? Lots of connectivity, lots of operations, lots of knowledge. We know how to use it, lots of tools.
56:01
Yeah, really powerful. It's hard to beat that, but I think it will go away in 50 years, maybe. Okay. Yeah, size. In-memory OLTP for SQL has a limit of 250 gigabytes per database. But then again, you don't have to put all the data in memory with SQL, right?
56:22
So you can choose specific tables where you get the largest benefits: figure out what those are and put those in memory. So it's not really a limitation. But for both Redis and OrigoDB, the limit is the amount of available RAM on the server, because everything has to stay in memory.
56:41
So, a little comparison. Great, I'm finished. And if you have any questions, just shoot, we have three minutes left. Any questions? What happens if the database really exhausts the memory?
57:01
Exhausts the memory? Well, you have to check that. If you have a relational database, you heard the question? What happens if you run out of memory? Well, then you get an out of memory exception and commands will fail, okay? That's a simple answer. But what happens if you run out of disk? We've all done that before, right? The log is full or the disk is full when we're doing SQL.
57:21
We have to monitor it and just be aware of it. So you have to know when it happens or you're not really doing your job. So it's not really a difference, I wouldn't say. Yeah. Well, you can do JavaScript at the moment. At the server, you can run stuff.
57:41
There's an API. If you run the server, there's a JSON. You can pass messages as JSON over HTTP. Exactly. And you can also write a client.
58:00
If you use the protobuf formatter and describe the format, then you can write a binary client in any language as well, theoretically. This is just basic request response. Okay. More questions? Yeah?
58:24
Well, that kind of depends on the model. That's a good question. Okay, you heard the question. Can we partition? Can we scale it, scale out? Well, if the model is partitionable, then yes. The GeekStream model we looked at, it's partitionable. We kind of designed it for that. So each feed has a bunch of feed items.
58:43
And we can partition those based on feed ID, because there are no relationships between feeds. And then we just have duplicate keywords. So we just keep the keywords that are on each server and let them point at the items on that server. And it's built into the client, so you can send a query to multiple servers
59:05
and then you gather the results in parallel. So it's not MapReduce, but it's something similar. And it's built into the client. But it's not automatically scalable. You have to design the model to work that way. Okay?
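The scatter-gather shape just described can be sketched like this in Python. The partition functions here are hypothetical stand-ins for remote nodes:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(nodes, query):
    """Send the same query to every partition in parallel and merge the
    results. A sketch of the shape described, not the real client code."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda node: node(query), nodes)
    merged = []
    for partial in partials:
        merged.extend(partial)   # gather: concatenate each node's results
    return merged

# Two hypothetical partitions; in GeekStream terms, feeds sharded by feed id:
node_a = lambda q: [item for item in ["a1", "a2"] if q in item]
node_b = lambda q: [item for item in ["b1"] if q in item]
```

As the talk says, this is not MapReduce, but it is the same scatter and gather idea, and it only works because the model was designed so that partitions don't reference each other.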
59:26
That's a really great question because what is a graph? A graph is nodes and edges. And what is an object with references? It's a graph, right?
59:40
So the model you saw here, it's a strongly typed graph. The other day I created a generic graph model. It took a few hours. It's not complete, but it's just a model with a dictionary of nodes and a dictionary of edges. And it's finished. And then just some...
01:00:00
some operations, all that. You can do a generic model, so you can do the type of stuff you can do with Neo4j, or you can do a specific model like we did here, and you can query it with a strongly typed language. I found a project on GitHub where you can take LINQ expressions, expression trees,
01:00:21
right? You know what that is? Okay. Expression trees, and serialize them. You can write queries for the graph model and then serialize them and pass them to the server. I experimented with it, and it's working. Yeah? Great. Took a few hours. Okay? Next. If you want to persist it, what happens if the binary log starts to grow way too
01:00:43
large? You can't really delete anything because you have to play back from the beginning. Yeah, good question. Okay. What happens when the log grows? This will be the last question because it's ten to, so we can continue down here later. Okay, so what happens when the log grows too big?
01:01:02
Well, it takes a long time to load, yeah, and depends kind of on the size and the serialization format. The binary format is pretty slow to deserialize because it has to reattach all the references and it's kind of complex, right? But if you use a protobuf and you compress it, it's pretty fast, so you can load hundreds
01:01:24
of gigabytes of log in, like, 10 minutes. So if you have two nodes, if you have a master and a slave, and you're patching a server, and then you just take down the slave, you patch the server, you bring it up, and then you sync it, yeah, and then you take the other one down.
01:01:41
So you always have one node up, yeah. I try to avoid that. I keep the data as long as possible, but you can take snapshots, yeah. So when it loads, it loads the most recent snapshot and then the log from there. But I don't like to throw away data, so, okay.
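Startup with a snapshot works roughly like this. A minimal Python sketch, with made-up operation names:

```python
def recover(snapshot, journal, snapshot_position):
    """Sketch of startup with snapshots: restore the most recent snapshot,
    then replay only the journal entries written after it."""
    state = dict(snapshot)                       # restore the snapshotted state
    for op, key, value in journal[snapshot_position:]:
        if op == "put":                          # replay the tail of the log
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state
```

The snapshot just shortens the replay; the journal itself stays intact, which is why no history is thrown away.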
01:02:01
Great, thanks for coming. Thanks for having me. I'm good. Yeah. Yeah. And then some button on the way out, right? Okay, thanks.