The ZODB
Formal metadata

Title: The ZODB
Series: Plone Conference 2016, part 50 of 66
License: CC Attribution 3.0 Germany. You may use, modify, reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided the author's/rights holder's name is credited in the manner they specify.
Identifier: 10.5446/55292 (DOI)
Production year: 2016
Language: English
Transcript: English (auto-generated)
00:05
I asked Jim if I could give an introduction for him because he will never remotely do a good job of introducing himself. Twenty years ago, Jim started working on the software that became Zope. I'll tell a funny story about that during the live chat.
00:23
But it's amazing to go back and think about that. We had marketing brochures at the time, with three big points. One of them was a database that feels like a file system. Still true today, still very different from everybody else in the market, still feels revolutionary.
00:41
URLs you can read to your grandmother over the phone. Back then it was Vignette, and this was your URL. Object publishing, still revolutionary today. And don't let your customers shoot you in the foot, which was hierarchical security and hierarchical objects.
01:00
It's amazing to think about what Zope, and after that Plone, based on these ideas, have done. Hundreds of companies around the world bet their business on Zope back in the day. Hundreds and hundreds and hundreds of companies built their businesses on top of Plone. Based on ideas that still haven't been caught up to.
01:22
Phillip Eby once wrote a foreword to a book about Zope and about Python saying that the rest of the Python world doesn't even know what they don't know when they talk about Zope and about Python. So I'm going to use this as an opportunity to make fun of Jim.
01:41
Tonight, if you buy me a beer, I'll tell you the story about what Jim said when the venture capitalist said that everyone should work 70 hours a week. So instead I'll tell you a different story, because he's talking about the ZODB. Unlike my other stories, this one's true.
02:01
He walks into my office, and my office is approximately the size of his chair. He sits across the other side, because I've been bugging him: can we have transactions? Can we have persistence? Can we have transactions? Can we have multiple app-server processes? And he comes in and he says, what do you think we are, a database company? Ladies and gentlemen, this is Jim.
02:23
I should point out that when I said that, it wasn't a knock on databases, especially object-oriented databases, because back then they were still a thing.
02:44
There were companies doing interesting work with object-oriented databases, and I thought it was pretty exciting. It's a shame, I mean it's understandable, thank you Java, that the industry sort of went a different direction. But there was a lot of cool research going on back then.
03:02
Okay, so I'd like to talk to you about, as I'll mention, what I've been working on a lot lately: ZODB. I've been doing some interesting things and moved forward on some things, and I plan to continue doing that for a while at least. And so I'd like to get input about directions that we might go.
03:27
So I really want this talk to be kind of a conversation. Whenever I give a talk, I always start by saying that the only bad question is the one you don't ask. And I prefer to be interrupted rather than have you have a question and then miss out, although this isn't going to be super technical.
03:43
But I encourage you to ask questions. If it gets out of hand, we'll move on, but there's a lot of time here so that we can actually do that. So anyway, I'm working 100% on ZODB and have been for several months.
04:02
I've wanted to do this for a long time. There were times at Zope Corporation when I could focus on it, but I was really focused on the problems that we were trying to solve, and we solved some pretty interesting problems. And then Zope Corporation is gone, if you didn't know that.
04:21
Zope Corporation is like the parrot in Monty Python. So I hadn't been able to work on it. I was doing interesting work for a great company, but I really wanted to work on ZODB again.
04:41
And I really wanted to provide some focus to really give it a chance to succeed. Succeed in the niche that it belongs in, because it's not a solution to every problem. No database is, really. It always frustrates me when people say, well, this is a good database.
05:01
And that statement makes no sense out of the context of whatever problem you want to solve. But I think ZODB is an excellent database for certain kinds of problems, and I'm excited to be focusing on it again. This was made possible because a company called ZeroDB was building a product on top of ZODB.
05:23
They took advantage of the fact that most of the logic is on the client, which meant that the data could be encrypted at rest on the server, and that was sort of an opportunity. Unfortunately, their customers weren't really Python developers. It wasn't really a good fit for the kind of customers that they had.
05:45
So we did a lot of interesting work together, but they're focusing on their Hadoop effort, and I'm continuing to focus on ZODB and hoping that there'll be some opportunities for me to help out on projects, provide training, consulting, etc.
06:01
Zope Corporation used to offer support contracts for both Zope and ZODB, or more or less whatever you wanted to buy them for, but they were structured in an interesting way, because a support contract basically gave you a certain number of hours; it didn't give you a solution.
06:20
I think that's actually, especially for open source, a pretty good model, because if you have to give somebody a solution, then you have to limit their freedom to be able to hack on the software. Anyway, I'd like to do that when I figure out how. But moving along, before I get into bringing you up to date on some of the happenings,
06:40
I'd like to get some feedback from the audience on a couple of things. Unless you're doing an embedded system, and there have been some interesting embedded systems with ZODB, and I occasionally hear of interesting things where something like FileStorage makes a lot of sense, but if you're doing a typical web app or a typical database application,
07:04
you're going to be using ZODB with RelStorage, NEO, or ZEO. So I want to get a feeling for what people are using these days. So how many people are using NEO? Nobody. I'm not surprised.
07:21
Does anybody know what NEO is? How many people know what NEO is? That's kind of cool. So NEO has a lot of potential. Nexedi is doing projects for people where performance and reliability are really important.
07:40
So they're doing some interesting things with NEO in terms of highly durable storage. I think that's a worthy thing to investigate. It's a little bit more effort to set up, but I think it's a worthy alternative. How many people are using RelStorage? And how many people are using ZEO?
08:04
Cool. Thank you. So of those people who are using ZEO, how many people are using ZRS? Well, I encourage you, if you're using ZEO, I encourage you to try ZRS. It provides real-time backup. It's not quite as durable as NEO.
08:25
With ZRS, typically replication happens very quickly, but theoretically you could have a system crash between committing a transaction and before it's gotten replicated to another system. And so there's a little bit of a chance of losing data, whereas with NEO, NEO doesn't commit the transaction,
08:43
doesn't consider the transaction committed until it's been committed on a majority of the replicas. But ZRS works really well, especially since ZRS 2. ZRS 1 was kind of a nightmare. I forget what it was using, but it was kind of a mess.
09:01
But ZRS 2 is extremely simple, and I'll say a little bit more about some of the opportunities with ZRS later. So I encourage you, if you're using ZEO, to use ZRS to back up your data. At Zope Corporation, we never actually did backups, so we never used repozo.
09:20
We never backed up our databases, we just replicated them. And so we knew that they were essentially backed up, and they were backed up in real time. Okay, so in terms of providing search in your applications, how many people are using a catalog or something like that?
09:40
How many people are using an external index? Okay, fair number. And how many people don't use an index at all? Just maybe use a BTree here and there. I guess I'm not surprised at that. At Zope Corporation, for what it's worth, partly because of the nature of the applications, towards the end we were doing some interesting mobile applications, where it wasn't really content management.
10:05
And so where we needed to search, we pretty much just used a few BTrees. Okay, so when I was writing this slide, I was afraid that I'd miss some pain point, because I feel no pain.
10:21
But how many people feel that database performance is a pain point for them? Okay, interesting. Conflicts? Okay, probably a lot of the same people. Indexing? Wow, you all are very kind. Either that, or I've forgotten all the pain points.
10:44
The rules of persistence, or the sort of programming model? Okay. Anybody want to share any pain points that I've missed? Okay, well, you can tell me later too. Okay, so I want to say a little bit about some of the stuff that I did with ZeroDB.
11:04
So ZeroDB had two products, and they're both about storing data encrypted at rest. The first was a database built on ZODB, and the second was something similar with Hadoop, where the idea is that you'd decrypt your data as it entered a pipeline and encrypt it at the other end.
11:23
At least I assume that's what it was. I never really got into it myself. And that's what they're focusing on now. So one of the first things I did, because I felt it was, you know, the ZEO implementation is very old. It's the ZEO 4 implementation, and it was a little bit over-engineered and kind of complex in some places.
11:46
And the library that it was using, the asynchronous library asyncore, has got to be by far the oldest async library in Python. And it's the same library, not quite coincidentally, that ZServer was built on.
12:03
asyncore is really sort of deprecated. It has some issues. And there was some suspicion that maybe it was contributing to ZEO performance problems. Over the years, when I've talked to people, I've heard a lot about performance issues with ZEO. There was the whole zodbshootout thing. I think ZEO is a little bit closer to parity now.
12:24
But that's been something that's bothered me for a long time. And there was a suspicion that maybe asyncore was to blame. And maybe it is a little bit. So before doing other things like SSL (and of course asyncio also makes SSL easier because it's got support built in),
12:44
I re-implemented it on asyncio. That would have been less effort than I put into it, except that I also used the opportunity to clean up the code base quite a bit. And so that was a good thing. And in fact, there were some performance improvements, especially for writes.
13:04
And there were a couple of places. I published some performance results; I've added a link there to a spreadsheet that has the results and some description of how I did them. It also is interesting because it touches on some configuration choices you have,
13:22
like whether you use SSL or not, or whether you use server-sync, which I'll talk about in a minute, or not. But anyway, ZEO 5, especially with Python 3.5 and uvloop, which is an alternate implementation of the asyncio event loop,
13:45
is significantly faster for reads, if by significant you consider maybe 20 or 30% significant. For writes, in most cases, and especially under high concurrency, it's an order of magnitude faster.
14:02
So it's quite a bit faster. Now, asyncio introduced sort of a flurry of interest in asynchronous programming in Zope and ZODB. And I've used asynchronous libraries for I/O applications pretty much
14:33
since the beginnings of Digital Creations, since, say, '96. So I've been a big fan of asynchronous I/O.
14:43
I'm very biased. I happen to hate asynchronous programming interfaces. And ZODB is an inherently synchronous API. For better or worse, I think for better, but people can legitimately say worse,
15:00
ZODB is an object-oriented database, and it wants to provide the illusion that you're just working with objects in Python more or less like you work with any other data. So that's what it's really about. That's its value proposition. And so there's really no good way to fit an asynchronous programming model into that, at least that I can see.
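To make that value proposition concrete, here is a minimal sketch of the programming model he's describing; the Account class and the in-memory demo storage are illustrative, not from the talk:

    import persistent
    import transaction
    import ZODB

    class Account(persistent.Persistent):
        """A hypothetical application object; persistence is inherited."""
        def __init__(self, balance=0):
            self.balance = balance

    db = ZODB.DB(None)          # None gives an in-memory storage, handy for demos
    conn = db.open()
    root = conn.root()

    root['account'] = Account(100)   # work with objects like ordinary Python data
    root['account'].balance += 50    # Persistent notices the attribute change
    transaction.commit()             # and the transaction makes it durable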
15:23
Although there's been some interesting work that I'm going to learn a little bit more about later tonight, so maybe my mind will be changed. So, ZEO is using an asynchronous library, but that's only an implementation detail, and in fact, that could change.
15:42
There's an issue, which I mentioned here, and I realized this when doing profiling and performance analysis while working on ZEO 5, and Shane actually sort of figured it out a while ago, which is that, although I'm not sure if he figured out exactly these terms,
16:03
but when you're combining an asyncio library with thread pools: when the thread's done doing its work, it has to notify the async library that it should do something with the data, and it turns out that that interface seems to be expensive relative to, say, a thread queue or a lock of some kind.
16:30
And in fact, for Zope and ZServer, years ago, Shane introduced this hack into ZServer so that, rather than waking up the event loop,
16:40
he just, when a request is done, he just writes directly to the output socket, and there's a lock that protects that so that the event loop and the thread pool don't write to it at the same time. And that turns out to be a big performance win. So there is a little bit of a dark side to the architecture that I recommend in terms of async server and thread pools.
17:03
But this also has an impact on ZEO, so I might actually, in the future, go to a less asynchronous model in the implementation of ZEO because of that. So this led me into, I didn't really have any good place to put these slides, and I'm disturbing the flow a little bit,
17:20
but there are some things I wanted to point out relative to this, in terms of thinking about developing with ZODB. If you're developing an application that only has one client, some of these things aren't important, but if you're anticipating an application that has lots and lots of clients, then some of these things become very important.
17:42
So the first is that when trying to service lots of clients, you want to keep transactions very short, and there are a few reasons for this. One is, long transactions have a much higher chance of a conflict. Because basically, ZODB uses a timestamp-based protocol, which many modern databases use,
18:09
where basically, at the start of a transaction, it sees a snapshot of the database as of the start of the transaction. And so any changes made after that potentially conflict, so you want to reduce that window.
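As a hedged sketch of what that means in practice: keep each transaction's window small and be prepared to retry on conflict (the increment helper is hypothetical):

    import transaction
    from ZODB.POSException import ConflictError

    def increment(root, key, retries=3):
        """Hypothetical example: a small, short transaction with retry."""
        for attempt in range(retries):
            try:
                root[key] = root.get(key, 0) + 1  # do only quick work in the txn
                transaction.commit()              # short window, fewer conflicts
                return
            except ConflictError:
                transaction.abort()  # drop changes; retry on a fresh snapshot
        raise RuntimeError("gave up after repeated conflicts")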
18:21
Also, connections are expensive resources. One of the big wins of ZODB is its object caching. Please don't call this a pickle cache. There's a module that suggests you should call it a pickle cache. It's an object cache. It doesn't cache pickles, it caches objects. But anyway, there's this object cache, and you want it to be big enough to hold your working set, ideally.
18:42
But that means it's a very expensive resource, because you don't want to have a lot of connections unless you have a lot of memory, because connections can use up a lot of memory if your working set is of any significant size.
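In configuration terms, that trade-off shows up as the per-connection cache size and the connection pool size; a minimal sketch, with illustrative numbers:

    import ZODB

    db = ZODB.DB(
        None,               # in-memory storage here; stands in for ZEO/NEO/RelStorage
        cache_size=100000,  # objects kept live per connection; size to your working set
        pool_size=4,        # few connections, since each one carries its own cache
    )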
19:02
And if you've got any long-running tasks that you need to do, consider trying to do them asynchronously, typically using some sort of queuing system like Celery or SQS or what have you. But, unfortunately, there's a gotcha with that, in that you want to find some way to do that hand-off reliably.
19:23
And most of the solutions don't help: Celery doesn't really provide a good way to do that, and SQS doesn't provide a good way to do that. So we came up with something at Zope Corporation that used a very short-term transactional queue. And then we would move data from the transactional queue into, in our case, SQS.
19:45
And so it would be ideal if you could somehow hand off to something like Celery transactionally, so that if the transaction committed, you knew that Celery had it. But with us, we were sending data to SQS, and it hardly ever failed, like really, really rarely failed.
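One way to sketch that kind of hand-off with the transaction package is an after-commit hook, so the external queue is only notified once the ZODB transaction has actually committed. Here send_to_queue is a hypothetical stand-in for Celery or SQS, and note this still isn't fully transactional, since the hook itself can fail after the commit:

    import transaction

    def send_to_queue(message):
        ...  # hypothetical: enqueue to Celery, SQS, etc.

    def notify_after_commit(success, message):
        if success:                 # only hand off if the commit actually succeeded
            send_to_queue(message)

    def do_work(root):
        root['pending'] = 'some change'
        transaction.get().addAfterCommitHook(notify_after_commit, args=('job-123',))
        transaction.commit()        # hook runs right after a successful commit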
20:03
But any failure of that hand-off is really hard to reason about. So we really didn't want to tolerate any failure, because we didn't know what the heck was going to happen if it failed. So, anyway. Also, again, this is a little bit off the path I was on, but these are some ideas that I wanted to share.
20:24
If you're building a large application on top of ZODB, or possibly any database that has an effective cache, a common problem is that your working set doesn't fit in memory. I've talked to people who said, well, I've got something that runs a whole bunch of sites within a given instance,
20:45
so this is not an uncommon problem. At Zope Corporation, we had one application that hosted 400 newspaper sites. And the data that we typically needed to use was too large to really fit in the amount of RAM that we had available to us.
21:02
We allocated roughly 4 gigs, really 3 gigs, per process. And so because of that, we were constantly churning data in the cache, and then having to make requests to the server, and the server was getting beat up pretty badly because we were constantly hitting it.
21:20
So we wrote a content-aware load balancer, and the one that we wrote happened to be dynamic, so that it would sort of learn and sort things out over time. But there are a number of content-aware load balancers available. And with the content-aware load balancer, you could sort of say, okay, well, if there's any correlation between the content that you need and something in the request,
21:43
then you can say, okay, all of the requests of this particular class that need this particular content, I'm going to send over here. So you could segregate by class, and essentially split up the working set. And that was a huge win for us.
22:03
And also, just to give you an idea of the scale feasible with ZODB, we were running 40 or 50 clients, and then after adding the content-aware load balancer, we were able to reduce that to about 20 clients. And also, when we had to restart a client, it started a lot quicker.
22:21
Just generally, things moved a lot better. So if you've got sort of large applications where that sort of content can be segregated, you should consider that. So, when I think about growing ZODB, among the things I think about are maybe moving beyond Python, and the thing that's most interesting, from a market point of view, is JavaScript.
22:44
But again, remember I said earlier that ZODB is inherently a synchronous API, and that, of course, is at odds with JavaScript. So for what it's worth, if I were to do this, and I would love to do this, I wouldn't do it speculatively, but if somebody wanted to pay me to do it, I'd love to work on it,
23:01
but if I were to do something like this, I would probably run ZODB client-side applications in Web Workers, and then have them provide an asynchronous API to the UI, so that your browser UI would still use an asynchronous interface, but it would be an application-level interface rather than a low-level ZODB interface.
23:24
And if I were to do this, I would actually rewrite it in JavaScript. ZODB actually isn't that big, if you're familiar with it. After working on it a few months, I'm pretty familiar with it. I'll probably forget it if I stop, but right now I'm pretty familiar with it.
23:43
If you're interested in this whole issue of asynchronous APIs and performance, there was a really interesting article posted last year, where somebody actually measured the blockiness of different database APIs on the client, and they found that localStorage, which is synchronous, was less blocky than IndexedDB.
24:07
No silver bullets, I'm afraid. Okay, so back to new things in ZODB that are interesting. A lot of these are interesting from a performance point of view, and most of you said you weren't interested in performance, so sorry.
24:24
But a challenge for some applications is that, especially at application startup, but even in other situations, you need to make a bunch of requests. And in ZEO 4, only one request could be outstanding at a time,
24:42
and there was no real way to say, I want these five objects. And so my first answer to that, at least for the startup problem, is use a persistent cache. Persistent caches have got a bad name because for a while they were kind of unstable,
25:01
but we finally solved those problems several years ago; I just don't think the word has gotten out. There can also be some operational challenges with them, but if this is something of concern to you, if you have a working set that could fit in a ZEO cache, persistent caches are stable at this point.
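A hedged sketch of turning one on; the keyword names here follow ClientStorage's classic persistent-cache options (client names the cache, var says where it lives), so check them against your ZEO version:

    import ZEO

    client = ZEO.client(
        ('localhost', 8100),
        client='app1',            # naming the cache makes it persist across restarts
        var='/var/cache/zeo',     # directory for the cache file (path is illustrative)
        cache_size=2 * 1024**3,   # bytes; ideally big enough for the working set
    )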
25:23
But you might have a situation where, for example, maybe you did some sort of index query and you have a bunch of objects that you know you're going to want to load, if you're a genius. And so there's now an API that lets you do that.
25:40
And so the way it works is you call prefetch, and you can pass OIDs or objects or sequences of OIDs, or actually iterables of OIDs, or as I like to say it, irritables of objects. And what it does is it sends the request to the server, but it returns right away.
26:02
And so when you go to fetch one of the objects, ZEO will say, oh, okay, well, I'm already fetching this object, so I won't make a new request. I'll just wait until that request I made before comes back. And then by the time you get the first object, chances are the next object is going to be right behind it. So it basically addresses the sort of round-trip latency
26:24
of requesting objects one at a time. The challenge is figuring out how to actually leverage it. Among the ideas I thought of is: you load a BTree bucket that contains persistent objects. Maybe you have some policy that you're going to load all those objects in that bucket, but not, obviously, the whole BTree.
26:49
Or sometimes you might have an object that has a sub-object that's persistent, but whenever you use the parent, you're always going to use the child. So maybe you want to say, okay, when I load these kinds of objects, I'm going to load the children.
27:01
Or maybe looking at it the other way, maybe you have certain kinds of objects that should always be loaded when the referencing object is loaded. So we could build some of that, possibly as pluggable policies, into ZODB to actually automate some of those things. That sort of pattern is what I kind of think of as sub-objects.
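A hedged sketch of using the prefetch API as described; warm_and_render, results, and render are hypothetical application code, and with storages that can't prefetch, the call should just be a harmless no-op:

    def warm_and_render(conn, results):
        """conn is a ZODB connection; results is an iterable of persistent objects."""
        conn.prefetch(results)   # one batched request; returns immediately
        for obj in results:
            render(obj)          # individual loads find responses already in flight

    def render(obj):
        ...  # hypothetical application code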
27:24
So anyway, something to think about. You all have been very quiet. Okay, good. I don't believe it, but good. Okay, so one of the big things was SSL.
27:40
Obviously, this was a big thing for ZeroDB. So it provides encryption of the connection, of course, but it also provides alternate authentication models. And ZEO had an authentication model. It complicated the code quite a bit. It was kind of a specialized thing,
28:01
and it didn't actually encrypt the channel, so I think SSL is a much cleaner option. So the old thing is gone. That sort of disappeared. Was anybody using the old ZEO authentication mechanism? Good, okay. So I think this is kind of interesting.
28:21
ZeroDB was considering doing hosted ZODB databases, where the actual clients were outside their controlled clusters, and so this is pretty interesting for that. So you can basically, when you set up a ZEO server,
28:41
you can give it a collection of self-signed certs that it will then use to authenticate the clients, primarily to allow access to the ZEO server itself, but you could obviously do more than that, and they did, and I'll talk about that in a second. But anyway, among the things you could do,
29:02
and that they did, was they played with a model where each user would upload a certificate of their own and then use that to authenticate, and then they later came to, I think, a saner approach of just simply using usernames and passwords, but that were sent over the SSL connection.
29:24
Another interesting change is that ZEO now supports client-side conflict resolution. NEO and RelStorage already did this. For ZeroDB, since the data were encrypted on the server,
29:42
there was no real way, since the server didn't have the keys to unlock the data by design, for the server to do conflict resolution. So in order to be able to do conflict resolution, and for some applications that's important, we needed to move it to the client, and this has been something that I've been wanting to do for a while. In fact, I'd like to take it a step further.
30:08
So, with conflict resolution on the client, there's potential to do a lot more. For example, you could have conflict resolution logic that looked at more than one object at a time.
30:21
The current machinery also really only sees state, and it tries to deal with common situations where it doesn't have the classes around. So, for example, if you've got a BTree that contains references to persistent objects, it doesn't know what the persistent objects are, but it knows what their IDs are, and so it takes that into account
30:41
in terms of deciding what's a conflict and what's not a conflict. But if you did this on the client, of course, everything is there. You also have operational advantages, but the big potential win here is to actually possibly get to the point where common situations can always be resolved.
31:01
So, you know, what I'd like is to have sort of non-conflicting data structures, which is really a misnomer. What I mean is data structures where we can always resolve the conflicts. And I think that's within reach if we do it on the client. What I'd like to do eventually is actually move conflict resolution up into ZODB itself,
31:24
and then sort of start exploring some of the different ways that we might make conflict resolution always work for some interesting cases. The one that comes to mind the most to me is implementing a queue. You should be able to implement a queue in such a way that it doesn't conflict.
31:43
But, you know, a lot of the common use cases, like adding or updating separate keys in a BTree, we could probably arrange that that never conflicts. Right now, those conflict when a BTree bucket splits. And then you get into all these strategies
32:01
to try to prevent that from happening. And it kind of ends up going in a lot of different directions, like: are the buckets big enough? How do you allocate the keys? If you allocate keys sequentially, then lots of different threads are going to conflict at the same time when it splits.
32:20
So, anyway. So getting back to client-side conflict resolution: of course it works with encrypted data. The biggest operational win is that you no longer need custom classes on the server. So if you've tried to write your own classes that implement conflict resolution,
32:40
then in order for them to work, they have to be on the server, which is a deployment headache. You can't simply use a generic ZEO, or in our case ZRS, RPM or Docker image or what have you. You've got to have these other classes. But if you do the conflict resolution on the client, then this is not an issue.
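The hook involved is ZODB's _p_resolveConflict method; here is a hedged sketch of a counter whose concurrent increments merge instead of conflicting, with the states handled as the plain state dicts the hook receives:

    import persistent

    class Counter(persistent.Persistent):
        """Hypothetical example of application-level conflict resolution."""

        def __init__(self):
            self.value = 0

        def inc(self, n=1):
            self.value += n

        def _p_resolveConflict(self, old, committed, new):
            # old is the common ancestor state; committed and new are the two
            # concurrent writes. Merge both increments relative to the ancestor.
            resolved = dict(committed)
            resolved['value'] = committed['value'] + new['value'] - old['value']
            return resolved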
33:00
Also, it potentially reduces server load because the server's not doing this computation of the conflict resolution, and it opens the door for non-Python servers. The cons are that you increase the number of round trips to the server when there's a conflict. Basically, the way it works is that when,
33:21
during the transition from the first phase of two-phase commit to the second, there's a vote step. So before, a vote would always return yea or nay, but now it can still return yea or nay, and it also can return a list of conflicts.
33:41
And then the client, if it can resolve all the conflicts, rewrites them to the server, and then if there are no conflicts at that point, it can commit. And client-side conflict resolution doesn't support undo. So undo can sometimes undo transactions
34:00
that would otherwise not be undoable, by using conflict resolution. So another feature that was added, that I worked on for ZeroDB, which didn't turn out well, is object-level locks.
34:22
So currently, when ZEO locks a database for the second phase of two-phase commit, it locks the entire database. Now, I think some people have a misconception that it locks during the entire commit process, but it only gets the database-wide lock in the second phase.
34:43
but it only gets the database-wide lock in the second phase. And that's a problem because in the second phase, there's a round trip to a client, and sometimes clients aren't... The trips are expensive to begin with, and sometimes you can have misbehaving clients,
35:02
possibly because they're talking to another transaction manager, that don't respond in a timely way, and you're basically prevented from committing new data while that's going on. So it's kind of a problem. So a way to mitigate that is to use object-level locks,
35:21
so that if a transaction modifies a certain set of objects, transactions that don't touch those objects can still commit. And in fact, this is what NEO does, so another reason to investigate NEO. I should have investigated NEO more myself,
35:40
but I'm really lazy, and it's a little bit involved to set up, so I'm like, ah... Anyway, so I did some work on object-level locks for ZEO, and I got it working, but it didn't actually provide a performance win, except when clients were connected over very slow links,
36:03
which I think was potentially a useful use case for ZeroDB since that was one of the things... They wanted to be able to have outside clients be able to talk to the database, but for sort of a normal configuration, it really didn't provide a win.
36:21
In fact, it might have even been slower. And I think a big part of the problem is that it's really easy to put the server under heavy load, like we do in a benchmark. But at Zope Corporation, especially before we did the content-aware load balancer, we definitely beat up our servers pretty badly, and ZEO 4 uses multiple threads,
36:44
and so you could actually... I mean, I've seen our ZEO servers go to 200% CPU, which means they're actually using more than one CPU. Mainly that's because they're doing I/O outside the GIL. A lot of the computation is not done in Python.
37:00
It's done in C, because you're doing I/O in C. So it's not uncommon to get the ZEO server CPU-bound, and so even though theoretically there should have been a win from letting multiple transactions happen in parallel, the win was sort of swamped by the database being just slow.
37:24
Part of that was that there was extra computation involved in actually managing the locks, but I don't think it was that significant. Yep. Yeah.
37:41
And, well, Jason Madden has, and Jason Madden is awesome. He does a lot of interesting things with ZODB, and he kind of pushes its limits. By the way, for people interested in async: he uses gevent for all of his servers. It's still a threaded computing model, but it's on an async library.
38:00
But anyway, he did some analysis, and it was quite a bit faster using PyPy, especially on the server, not so much on the client, but on the server it was a pretty significant win. So, you know, that's definitely something to consider if you're deploying ZEO servers.
38:22
If I was deploying ZEO servers today, I would definitely consider that. So, some other interesting things: up to now I've been mostly talking about the improvements to ZODB that I did on behalf of ZeroDB,
38:40
but they did some interesting experiments. They actually did most of these experiments before I got involved, and I just sort of enhanced them a bit. But because they were sort of thinking about trying to provide hosted ZeroDB, they came up with a model for multi-tenant databases, and so what that really meant was that somebody could walk up to
39:04
a UI theoretically and say, I want to buy a database, and what they would really get is a sub-database of an existing database. And so they had a mechanism for splitting a single database into virtual databases, where each database was owned by a user.
39:23
Each user's records were encrypted separately, so even if the users saw each other's records, they wouldn't be able to decrypt them. Plus they had an access control model that prevented access to other users' data, and also, interestingly, that affected invalidations.
39:42
So invalidations for a user would only be sent to that user. They wouldn't be sent to other users. And of course they had a user database and an authentication model. So it was pretty interesting. I think it's worth, particularly if you want to sort of support multiple,
40:01
if you have a need to support multiple databases within a ZEO server, I think this is a potentially good way to go about it.
40:25
So I actually spent a little bit of time in the spring looking harder at NEO. And as part of that, I realized that, I mean, NEO has done a lot of interesting things where they patched ZODB to work a little bit differently
40:42
and actually a little bit better. And I'm sure they'd given me those patches before, but at the time I was fighting some other fire and never really bothered with them, and it's a shame, because there are some pretty good patches. And so most, if not all, of those patches got applied,
41:01
and one of them really simplified the way ZODB implemented multi-version concurrency control, which both simplified the logic quite a bit and, as part of the same cleanup, allowed me to get rid of some silly cases
41:21
of locking things and preventing concurrency when it wasn't really necessary. And as sort of part of this, and part of making sure RelStorage would work with this new way of doing things, I realized that the way we were moving and the way NEO was doing things was already in some ways closer to the way RelStorage worked.
41:43
And in the course of this, I also realized something that I never understood before about RelStorage, because it was always kind of weird: RelStorage had this interface called IMVCCStorage, and I was like, wait, ZODB is already MVCC, what? Why are we doing this? And the key is that RelStorage uses the MVCC implementation
42:04
of the underlying database. And so it leans on the underlying database to do that, and that's why. And so really most of what it's doing is just sort of bypassing ZODB's MVCC. So the end result of all of this is that the RelStorage API is now the dominant API,
42:21
and the older storages like ZEO and NEO and cloud storage, et cetera, are really adapted to the API that RelStorage provides. And the adapter is where the MVCC logic is. It actually changed the storage API a bit, for those people who have ever dealt with it,
42:42
which you probably haven't unless you're a ZODB hacker: the load method, which is sort of a core method, is now gone, or effectively gone, and now everything uses loadBefore. Another happy outcome of all this discussion is that Shane Hathaway has handed the baton, well, he sort of dropped the baton a couple years ago, but he picked it up and handed it to Jason Madden, and so now RelStorage has a maintainer, which is a really good thing.
43:03
but he picked it up and he handed it to Jason Madden, and so now rel storage has a maintainer, which is a really good thing. So a common problem, and this was brought up on the list a few months ago, is sort of inconsistency between Zio clients.
43:22
And a typical scenario, one that we ran into at Zope quite a bit, was you'd have a request that caused an object to be added, and then the browser on the next request would try to do something with that object, and it would happen to hit another ZEO client very quickly,
43:41
and that ZEO client hadn't gotten the news of the new object. And the reason this happens is that each client is consistent, but it's consistent as of a particular point in time. And because network communication isn't instantaneous,
44:01
while all clients are consistent, they may not be consistent with each other in terms of what view of time they have. And this was a problem for ZEO. It could potentially have been a problem with RelStorage as well, due to the way it polled.
44:21
If you set the poll interval to zero, I think it polled at the beginning of every transaction. But if you set it to non-zero, then potentially you could have the same problem. So NEO has always, at the beginning of a transaction,
44:44
made a round trip to the server. It didn't really matter what that round trip did. It could have effectively been a ping. But what that does is, by waiting for that round trip, any invalidations that were in flight, it sees before it gets the answer to its ping. And so that means it may be at a different time
45:02
as the client that added the data, but it's at least up to the time at which that object was added. So NEO has always done this. ZEO now has an option to do it, server-sync. And RelStorage has gotten rid of the poll interval, and so now it effectively makes this round trip every time as well.
45:23
The reason it's an option in ZEO is that it's kind of expensive, in the sense that you're making a round trip. If all your data are in memory, you've changed what would have made no server round trips into something that's making a server round trip. So if this is a problem for your applications,
45:40
this is an easy way to solve it. But if it's not a problem for your applications, then I would consider not doing it. Maybe it should be the default, and maybe turning it off should be the option.
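A hedged sketch of opting in; in ZEO 5 this is the server-sync option, which makes each transaction boundary do the round trip described above:

    import ZEO

    client = ZEO.client(
        ('localhost', 8100),
        server_sync=True,   # ping the server at transaction boundaries for
                            # read-your-writes consistency across clients
    )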
46:18
Well, I mean, you could always conceivably...
46:49
It should be doable on it. I mean, there are a bunch of application strategies that you can use for this, but it's kind of a bother. It's potentially a win because it would mean that you wouldn't need to sync
47:01
unless you knew that you needed to sync. So it's potentially a performance win. But this actually provides an easy way to say, okay, I'm going to just make the problem go away. So what have I... So ZeroDB sort of started going in a different direction, or decided to go in a different direction,
47:23
at the end of the summer. So what have I been doing since then? Well, I decided after a while that I just absolutely had to unscrew the documentation situation. It's been a sore spot for a long time. I'm a bad person.
47:41
I didn't make it a priority a long time ago. But for ZODB to succeed, it's got to have decent documentation. So now, when you go to zodb.org,
48:01
you have a sort of "why use ZODB" statement and then links off to both non-reference and reference documentation. It's far more extensive than everything we had before. It could be improved a lot. And you can help me improve it by bitching at me about things that really should be documented. You can help me even more by writing documentation,
48:22
of course. But I don't mind writing documentation if you help me by telling me what you think needs to be explained better or needs to be explained more. I don't think this documentation would ever necessarily be a replacement for something like the ZODB book, but it should be fairly complete
48:42
and give concise documentation of pretty much everything people need to know, including touching on topics like how you do more scalable ZODB applications. So there's more work that needs to be done, but I feel like I actually got it to a point where we could finally retire that guide that had been written 20 years ago and was woefully out of date
49:01
and written as a blog post. And the documentation is executable, thanks to Manuel. How many people know about Manuel? Manuel? I should have said Manuel. It's a very cool tool. If you write documentation for, like, software libraries,
49:21
it makes it really easy. So when I fell in love with doctests, I really liked the idea of executable documentation, but what I learned way too late, just look at the buildout docs, is that tests don't make good documentation. But good documentation can help with the tests.
49:43
And so what I did for Bobo, zc.ngi, and the ZODB docs is I wrote the documentation, and then I made sure that all the examples were executable.
50:04
So, a couple of infrastructure projects that I've been thinking about for a while, in terms of, again, performance. You know, I've operated big ZODB databases for multiple Zope Corporation customers for several years.
50:23
And in addition to the overall performance, packing was kind of a big deal. So FileStorage, first of all: the implementation hasn't changed
50:40
in, again, probably close to 20 years. And it works, but it's pretty slow. And, of course, it's particularly problematic because of the GIL, because while you're packing, you're sort of starving other things, even though it runs in a separate thread. So one of the first things I did was
51:02
zc.FileStorage, which does most of the packing off in a separate process. It actually creates a sub-process to do most of the work. And I also wrote zc.zodbdgc, which was primarily to deal with the problem of
51:21
garbage-collecting multi-databases, but it turns out that garbage collecting as a separate process from packing is also a big win, because you can do it in a separate process, at your leisure, and I think it's a much better model. And so zc.FileStorage actually doesn't even support garbage collection. But even with all of that,
51:40
and all of that could use a lot of polishing up, a big problem was that when you pack a database, at the end of the pack process you're copying packed records at the same time you're committing, and there's a lot of contention there at the end. And we got pretty good at Zope Corporation
52:00
about having lots of metrics, and you could always see a pack, because all the metrics would go awful in the last, depending on the database, 10 or 20 minutes of a pack. And so we always tried to time things so that happened in the middle of the night, but it was pretty bloody. It was pretty awful.
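For reference, the pack step itself is just a method on the database, so a hedged sketch of a separate off-hours maintenance process might look like this (address and retention period are illustrative):

    import time
    import ZEO

    db = ZEO.DB(('localhost', 8100))   # connect to the running ZEO server
    db.pack(time.time() - 86400)       # pack away history older than one day
    db.close()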
52:20
So FileStorage 2 is designed largely to solve that problem. It learned some lessons from FileStorage. When I wrote FileStorage, I never imagined that it would work as well as it does. It's pretty efficient, it's kind to SSDs, it's a pretty simple model.
52:43
But anyway, FileStorage 2 sort of still keeps the FileStorage model, but it removes some cruft, like back pointers and versions. Back pointers, you can argue whether they're actual cruft, but since people hardly ever use...
53:01
So, I'm glad you asked that question. A question! So, today, primarily for undo: when you undo transactions, it doesn't actually write any new data, it just writes back pointers
53:20
to the data. So it doesn't actually copy the data record; it just says, okay, the data record is back here, which is elegant, and it was actually a big part of the version feature, so that when you committed a version, it would do a similar sort of trick. But it adds a lot of complexity to the implementation, and since people don't really use undo that much anymore, I don't think it was
53:41
really worth keeping. So FileStorage 2 uses multiple files. The idea is that you have an active file and then zero or more previous files, and when you want to pack, the first thing you do is split: you create a new active file, which is a very cheap operation,
54:00
and then at your leisure you can pack the previous files, which doesn't affect the active file. When it creates a previous file, it also writes the index in a form that can be used as a memory-mapped file, so it still uses memory, but it uses memory a little more efficiently for the old indexes.
54:21
But more importantly, if you've got a GIL to deal with, you can pack the previous files in a separate process, and at the end of that there's just a fairly inexpensive handoff to get the database to use the indexes of the new files instead of the indexes it was using before.
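To make that flow concrete, here is a toy model of the split/pack/handoff idea. Real FileStorage 2 operates on files and on-disk indexes; none of these names are its actual API.

    # Toy "files" are lists of (oid, data) records, newest last.

    def split(active):
        # Cheaply freeze the active file; commits continue in a new one.
        return [], active                  # (new active, frozen previous)

    def pack_previous(previous):
        # At leisure, possibly in another process (dodging the GIL):
        # keep only the current record for each oid.
        latest = {}
        for oid, data in previous:         # later records win
            latest[oid] = data
        return sorted(latest.items())

    active = [(1, 'a1'), (2, 'b1'), (1, 'a2')]
    active, previous = split(active)       # commits now go to `active`
    packed = pack_previous(previous)       # [(1, 'a2'), (2, 'b1')]
    # The "handoff" is then just pointing the storage at `packed`.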
54:44
So that's really the big win, I think, from an operational point of view. I didn't document packing as part of the documentation, partly because nothing else documented it,
55:01
and also because it made me cry. So the other thing I've been wanting to explore is: could I get the ZEO server to be a lot faster if I didn't write it in Python? I've been dating different languages over the last few years, looking for
55:21
a possible choice. If you're running in AWS and you use a lot of blobs, you can save money by putting those blobs in S3, and so I wrote essentially
55:40
a sort of S3 blob cache, and I wrote that in Scala. Scala is really a lot of fun, I enjoy it quite a bit, but the result wasn't very fast. It was probably my fault, but it failed the test for this, so I decided this go-around to use Rust.
56:00
Rust is interesting in a number of ways; it's very fast. I mainly started looking at it because a friend of mine suggested it, and because I found some things that suggested it was faster than Go. The reason I think it's faster than Go is that it has no runtime, and instead of using a garbage collector it uses stack-based memory management,
56:21
so basically the standard way of doing things in Rust is that everything is either on the stack, or, if it's on the heap, there's a pointer to the heap from something that's on the stack. Data is garbage when it goes out of scope, and so the memory management decisions are all made at compile time, which is pretty intriguing
56:42
and could provide some performance wins. It's a little more complicated than that: you end up using some reference counting, but it's a relatively small subset of your data that's reference-counted. And of course it has no GIL. So I started working on this
57:01
a few weeks ago. Some people saw a blog post that I posted a while ago; that was about when I decided to start working on it. It includes a FileStorage 2 implementation. It's very early in development; I've been trying to get it just to the point where I can run benchmarks, so lots of things aren't implemented yet.
57:23
The internal API is very different from the way it is in ZEO; the pluggable storage API isn't really a thing here. It implements object-level locks, and it should be as easy to set up as ZEO or ZRS,
57:41
probably easier. I imagine that replication will just be built in. ZRS is pretty cool because ZODB has this pluggable storage architecture, and we've gotten used to this pattern of just layering things, which I think has worked really well; ZRS is just another layer,
58:02
and so you can run ZRS replication without running a ZEO server; it's just another storage. But here nothing is pluggable, everything is about trying to go as fast as possible, and so I expect ZRS to just be built in.
58:23
So I've been scrambling to get to the point where I could do some performance testing before this talk, and I finally got everything running for the benchmark this morning, and
58:40
the initial results on my Mac (a four-core Mac, so not a terrible machine for initial tests) are pretty encouraging: it's twice as fast as ZEO for writes, and maybe 50% faster for reads,
59:02
but I think I can take it a lot further, so I'm hopeful. There's lots of work remaining, including some Python work. This whole issue of waking up an event
59:22
loop, I think, is a significant performance hit, and it's been part of the ZEO design forever, so it's kind of embarrassing. But I think if I address that, I can actually make ZEO go quite a bit faster just in Python. I have some
59:40
tests, but I'll need a lot more, and lots of features aren't implemented yet. On an earlier slide that I skipped past, I threatened once again to implement
01:00:00
a transaction-run decorator for running transactions with retries. I threatened it in anger a couple of weeks ago, and I still haven't done it, so that's going to be one of the next things.
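A minimal sketch of what such a decorator might look like, using the transaction package's public API; the decorator name and retry policy here are mine, not a published interface:

    import functools
    import transaction
    from ZODB.POSException import ConflictError

    def run_transaction(retries=3):
        # Retry the decorated function when a commit hits a write conflict.
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kw):
                for attempt in range(retries):
                    transaction.begin()
                    try:
                        result = func(*args, **kw)
                        transaction.commit()
                        return result
                    except ConflictError:
                        transaction.abort()
                        if attempt == retries - 1:
                            raise
            return wrapper
        return decorator

    @run_transaction(retries=5)
    def add_item(root, key, value):
        root[key] = value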
01:00:21
As for pain points that I've felt myself: since I've worked on fairly large projects, a lot of the things that matter to me are about performance. So one thing I want from ZODB is more speed. I don't want people to choose ZODB because it's fast; it's never going to be a NoSQL database, believe it or not.
01:00:41
But I would like speed not to be a disqualifier, at least for many kinds of applications. More documentation. I think object-oriented conflict resolution would be interesting: if we could get to a point where, for a few critical data structures, conflicts could always be resolved,
01:01:02
I think that would be a big thing. A really tiny feature that would be a big win for certain kinds of applications is the ability to subscribe to object updates. This would be interesting for GUI applications. I've got an application that I should be finishing,
01:01:22
but I'm not. It's called a two-tiered kanban board. Basically, it's meant to be like the kanban boards most people use, except that whenever somebody makes a change, everybody's view gets updated automatically
01:01:41
using long polling, because WebSockets have burned me in the past (I've heard they've gotten better). But in order to do that, something needs to know that there have been changes. It's an easy, quick hack to hack the DB class to do that, but it should really be a built-in feature.
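The quick hack is roughly this; the invalidate() hook below matches older ZODB APIs, and the invalidation plumbing differs in recent ZODB versions, so treat it purely as an illustration of the idea:

    import ZODB

    class NotifyingDB(ZODB.DB):
        # A DB that tells interested parties when objects change.
        def __init__(self, storage, **kw):
            super().__init__(storage, **kw)
            self.subscribers = []          # callables taking (tid, oids)

        def invalidate(self, tid, oids, *args, **kw):
            super().invalidate(tid, oids, *args, **kw)
            for subscriber in self.subscribers:
                subscriber(tid, oids)      # e.g. wake long-polling clients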
01:02:00
That would really be beneficial and open up some interesting possibilities. As an aside, at Zope Corporation, towards the end, we really achieved quite a lot in terms of automation.
01:02:20
They say that really lazy people are good at automating things, and I'm really lazy. We leveraged ZooKeeper pretty heavily, and ZooKeeper is pretty cool; there are other similar tools, like etcd. The idea is that it provides a service registry.
01:02:40
And it's a service registry that not only knows when services appear, but also knows when they disappear. So you can get notifications that, oh, the service fell over; I need to adjust my load balancer, or I need to start a new one, or what have you. This idea of being notified of things is pretty important. And conceivably, if ZODB had this,
01:03:02
it might have been an alternative to ZooKeeper that would be a lot easier to operate, because ZooKeeper was a little bit of a pain to keep running at times. Let's see. Another project that I think would be really interesting for somebody to do, and I feel like maybe at some point
01:03:20
I should at least enable this a little: in some of our applications, we use Solr as an external index. Keeping Solr up to date was kind of tricky, especially since Solr itself was replicated. What we ended up doing was having the update process
01:03:42
keep track of what data Solr had seen last; we kept track of an index number for a data set. A much more straightforward way to do this would be to leverage ZRS replication. ZRS's replication protocol is extremely simple:
01:04:02
a client connects to the ZEO server and says, I've got this TID, and the ZEO server says, OK, I'm going to send you all the data after that TID, forever, until you disconnect. And that's it. The data they send is very similar to what you get if you've ever run one of the database
01:04:22
iterators, like the FileStorage iterator. So it's a really simple protocol. I'd like to see people write applications where, instead of replicating to another ZODB database, they look at that stream of data and update Solr or Elasticsearch or a relational database.
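A minimal sketch of the consumer side, using the standard storage iterator API; send_to_solr here is a placeholder for whatever external system you are feeding:

    from ZODB.FileStorage import FileStorage
    from ZODB.utils import u64

    def send_to_solr(oid, data):
        # Placeholder: update your external index/replica here.
        print('would index oid %s (%d bytes)' % (oid, len(data or b'')))

    def update_external_index(path, last_tid=None):
        # Replay all transactions after last_tid into an external index.
        storage = FileStorage(path, read_only=True)
        try:
            for txn in storage.iterator(start=last_tid):
                for record in txn:             # data records in this txn
                    oid = u64(record.oid)      # 8-byte oid -> integer
                    send_to_solr(oid, record.data)
                last_tid = txn.tid
        finally:
            storage.close()
        return last_tid                        # remember for the next run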
01:04:42
Any situation where you have an external index you want to update, or maybe an external replica that might be easier to write reports against, this would be a really interesting way to approach it. One of the things I'd like to do, if I don't have any reason to do this myself, would be, at some point, to
01:05:01
write a little module that just provides you the iterator and does most of that for you. Sure. Thank you.
01:05:55
Potentially. Yeah. Well, I think it does. I mean, that's how OIDs are dealt with now.
01:06:02
Although I'd sometimes like not to do it that way. And what you're talking about is not so much a serial; it's just a way of generating IDs that aren't necessarily ascending.
01:06:52
So one of the challenges in my evil plan
01:07:00
to get ZODB more alive as a project is to get the wider Python community to know about it again. One idea I've had: at my last job, I did quite a bit with pandas. As an aside, I found myself saying something that I never thought I'd say, which is that I found it much easier to do data manipulation
01:07:22
and data wrangling in PostgreSQL than I did in Python and pandas. But anyway, I used pandas quite a bit, and sharing the data sets was kind of awkward within the team. So I think it would be really interesting to have
01:07:44
persistent pandas data sets built on top of the ZODB blob mechanism. That would be a fun project to do, and I might do it soonish if somebody doesn't give me better things to do.
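One possible shape for this, using only the existing Blob API; the DataSet class itself is hypothetical, and pickle stands in for a saner on-disk format like Parquet:

    import pickle
    import pandas as pd
    import persistent
    from ZODB.blob import Blob

    class DataSet(persistent.Persistent):
        # A persistent wrapper that keeps a DataFrame in a ZODB blob,
        # so the (potentially large) data rides the blob machinery.
        def __init__(self, frame):
            self.blob = Blob()
            self.save(frame)

        def save(self, frame):
            with self.blob.open('w') as f:
                pickle.dump(frame, f)

        def load(self):
            with self.blob.open('r') as f:
                return pickle.load(f)

    # usage: root['prices'] = DataSet(pd.DataFrame({'x': [1, 2]}))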
01:08:02
Something that we've talked about for quite a long time is something I call a JSONic API, because I like to make up words. The idea is that when looking at a database, you should be able to look at it without having any classes around. We've had several ZODB browsers like that,
01:08:21
but I think it would be nice for that to be a more widely available API for accessing a database as JSON rather than objects. When I say JSON, I really mean dictionaries and lists and tuples. Having that mechanism readily available
01:08:43
might be interesting. Again, if you're listening to a ZRS stream, getting JSON rather than pickles might be very much more convenient for some applications. So Carlos, where's Carlos? He's threatened to sprint on this during the sprint.
01:09:02
So maybe somebody is interested in helping him with that. I'm going to try to help remotely. So ZRS failover is manual right now, and I'd like to make it automatic using some sort of leader election protocol.
01:09:27
We were pretty lucky at Zope Corporation: we hardly ever needed to fail over. I think there were only one or two times we had to fail over unexpectedly. And AWS is pretty awesome; they would often tell us in advance that a machine was going away, so we could plan.
01:09:43
I don't know if anybody else has noticed this, but at Zope Corporation, when we were thinking about servers, we classified them into precious and despicable. The despicable servers were always
01:10:01
run in autoscaling groups, even if there was only one of them, so that if they fell over they would be replaced, whereas the precious servers required a lot more care. But it seemed like the despicable servers tended to get wiped out a lot more often than the precious ones, and I have a suspicion that that's part of AWS's policy:
01:10:21
if it's in an autoscaling group, it's despicable. Docker images: maybe official Docker images would be good. There are actually several Docker images. I looked on Docker Hub this afternoon and was surprised. Well, I wasn't actually really surprised.
01:10:41
But there are several Docker images, which is good, although the most popular was the Plone Docker image, which I guess implies Python 2. Unfortunately, because ZEO currently uses pickle as the basis of its networking
01:11:02
protocol, you can't have a Python 2 client talking to a Python 3 server, or the other way around, which is really unfortunate. The byteserver uses MessagePack, in part to escape pickle, sadly.
01:11:24
So until that problem is solved, a Docker image should really identify which Python it needs to be used with. I think a ZEO authorization model would be really interesting, especially if people ever start using ZODB for non-traditional applications like client-server applications.
01:11:44
It occurred to me that just stealing the traditional Unix file system security model would be pretty easy to implement and could be pretty useful.
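For illustration, the traditional owner/group/other model with read/write bits is only a few lines; this is entirely hypothetical, since ZEO has no such feature today:

    R, W = 4, 2                            # permission bits, as in Unix

    def allowed(user, groups, owner, group, mode, want):
        # Pick the owner, group, or other bits, then check `want`.
        if user == owner:
            bits = (mode >> 6) & 7
        elif group in groups:
            bits = (mode >> 3) & 7
        else:
            bits = mode & 7
        return bits & want == want

    # Owner may write; everyone may read (mode 0o644):
    allowed('alice', (), 'alice', 'staff', 0o644, W)   # True
    allowed('bob', (), 'alice', 'staff', 0o644, W)     # False
    allowed('bob', (), 'alice', 'staff', 0o644, R)     # True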
01:12:07
Persistent classes, I still think, are potentially pretty interesting, for all the trouble that ZClasses had (mostly, I think, because I left them to wither), if we could figure out how to do them someday. At Zope Corporation, whenever software was in the database, I could deploy it transactionally, which is really, really cool.
01:12:23
Other languages: I'd like to increase ZODB's audience. JavaScript, unfortunately, is the most obvious choice, even though it's not very compatible. Ruby would probably be pretty compatible. Scala would be really interesting, just because I really enjoy Scala, and it's got a macro system that would probably allow the automatic persistence thing to happen.
01:12:43
And that's it. So you've been pretty quiet. Any parting questions? Yep?
01:13:01
Right. I think that's a great idea. Oh, you mean the client cache? Oh, the client cache, right.
01:13:22
The funny thing about the client cache is that the better you configure your systems, the worse it does, because both the client cache and the object cache try to use what I prefer to call a most
01:13:43
recently used model. And the thing is, if the object cache is really successful at keeping the most used objects, then it's hardly ever going to request any objects from the client cache. And so the client caches don't really have a signal of what's actually good.
01:14:15
Except that the placebo effect actually works.
01:14:26
I would love to do that. I'm a big fan of instrumentation. I'm really proud of a lot of the DevOps-y things we did towards the end at Zope Corporation, and we did a really good job of having lots of metrics and lots of graphs of metrics.
01:14:42
So I'm a big fan of that. It's kind of challenging, because the tuning is complicated, but I definitely think it's a good idea. Things like that should be instrumented, and the storage server should be instrumented better as well, in terms of knowing what kinds of requests
01:15:01
you're getting, how many reads. Something we did a lot, all in a very hacky way, was getting a better handle on conflicts and what was conflicting, so we could try to figure out why. It's sadly easy to make a mistake, like allocating keys sequentially, that causes lots of conflicts you don't expect.
01:15:23
Another example is a pattern we came up with in Zope 3 that I regret: the int-id service, which kept a table of object IDs to integers
01:15:41
and a table going the other way around. Basically, you think you've solved the problem in one direction, not realizing that the problem is still there in the other direction, and so you'd still end up getting conflicts due to bucket splits. If we had object-oriented conflict resolution, we could possibly make that whole problem go away.
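For example, allocating random rather than sequential keys spreads concurrent inserts across BTree buckets, so writers stop colliding on the same bucket split. A small sketch:

    import random
    from BTrees.LOBTree import LOBTree

    ids = LOBTree()                        # 64-bit integer keys -> objects

    def new_id(obj):
        # Random keys land in different buckets; sequential keys would
        # all hit the rightmost bucket and conflict on every split.
        while True:
            key = random.randrange(1 << 62)
            if ids.insert(key, obj):       # insert() is a no-op if taken
                return key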
01:16:01
But yep, yes.
01:16:30
Well, it's actually pretty easy to arrange that the storage server only accepts certain globals.
01:16:43
Well, OK, if we get away from pickles, then it actually gets harder. It used to be possible. What we used to do was, actually, did we do this on the server or on the client? I don't know.
01:17:00
It should be possible to do some sort of whitelist. On the potential victim clients, it would be pretty easy to have a whitelist, and we had a storage wrapper on the server, trivially implemented, that provided a whitelist.
01:17:20
We just, by accident, never open-sourced it, and Zope Corporation is gone, so it's kind of gone. But it would be really easy to implement; it was trivial, and it could just be done as a storage layer. Actually, it could be done more easily if we hooked it into the ZODB machinery itself.
01:17:41
If, when you created a database, you could provide a whitelist, that might even be a better way to do it, because then it would be part of the regular deserialization mechanism.
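The client-side half of that whitelist is plain pickle machinery; here is a sketch. Wiring it into ZODB's actual record loading, which also involves persistent references, is the part that was never open-sourced:

    import io
    import pickle

    ALLOWED = {
        ('persistent.mapping', 'PersistentMapping'),
        ('BTrees.OOBTree', 'OOBTree'),
        # ... whatever your application actually stores
    }

    class WhitelistUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Refuse to load any global not explicitly whitelisted.
            if (module, name) not in ALLOWED:
                raise pickle.UnpicklingError(
                    'refusing to load %s.%s' % (module, name))
            return super().find_class(module, name)

    def safe_loads(data):
        return WhitelistUnpickler(io.BytesIO(data)).load()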
01:18:13
I mean, some things that we've taken for granted in the past get harder when you have parent pointers. For example, object export, exporting a part of your object tree: I can't remember
01:18:25
if we solved that in Zope 3. We may have, but we had to work a lot harder. Something that I've thought about doing over the years, and have started to implement at various points, is reference-counting garbage collection in the storage server, which you could still do.
01:18:44
You could do a lot of garbage collection earlier and more easily, without having to open up the records. For example, if the data format that we sent to the server had the external references outside of the actual data payload,
01:19:02
then, without looking at the payload, you could still do reference-counting garbage collection, or any kind of garbage collection. But you could do reference-counting garbage collection in real time, potentially, yeah.
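A toy of that idea: if each stored record carried its outgoing references outside the pickled payload, the server could keep reference counts, and collect garbage in real time, without ever unpickling anything. The format and names here are made up:

    refcounts = {}                         # oid -> incoming reference count

    def collect(oid):
        print('oid %r is now garbage' % oid)

    def store(oid, refs, old_refs=()):
        # refs/old_refs list outgoing oids only; the payload stays opaque.
        for ref in refs:
            refcounts[ref] = refcounts.get(ref, 0) + 1
        for ref in old_refs:
            refcounts[ref] -= 1
            if not refcounts[ref]:
                collect(ref)               # reclaim without reading data

    store('a', refs=['b'])                 # record a references b
    store('a', refs=[], old_refs=['b'])    # a dropped its reference; b dies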
01:19:22
Back there?
01:19:54
That's a good question. I wish I could say that I've run its tests
01:20:01
and verified that it works with everything else, but I haven't. Somebody should run its tests. Transactional undo doesn't have to go away: because undo is rare, you could just get rid of the optimization and say, when I undo records,
01:20:23
I just copy the older records forward. The history is still there, yeah. Time travel? I'm a huge fan of time travel. I'm especially a huge fan of a certain time-travel error.
01:20:44
Anything else? So, I would love it if you would chip in on the ZODB list, which is a Google Groups list:
01:21:01
when you leave this room and you remember your pain points, ones that I haven't mentioned, I encourage you to bring them up there. Thank you very much.