The inner guts of Bitbucket
Formal Metadata

Title | The inner guts of Bitbucket
Title of Series | EuroPython 2014
Part Number | 99
Number of Parts | 119
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/19968 (DOI)
Production Place | Berlin
Transcript: English (auto-generated)
00:15
Okay, Eric is going to tell us all about Bitbucket, with a focus on Git. Eric, is it? Judging by your shirt?
00:24
Just with Git? Yeah, with a focus on Git. Okay. All right. Well, that's a lot of faces. More than I think I've ever seen in one room staring at me. We'll see how this goes.
00:42
So I'm Eric, and I'm with Atlassian, and I work on Bitbucket. I'm one of the more back-end-y developers on Bitbucket. And I'm going to tell you all about Bitbucket's architecture and infrastructure, or at least as much as I can in 30 minutes.
01:00
Before I do that, though, I want to share with you this photo. For those who don't instantly recognize the rocket here, this is a Saturn V rocket. It's the rocket from the Apollo program, the moon rocket. So that's the one, or maybe not that particular one, that got Armstrong to the moon and back.
01:22
And I want to show it to you because the whole Apollo program is, I find, a fascinating piece of history. And I'm sure I'm not alone here. This rocket, when they built it, and I guess the program around it, was really sort of the pinnacle of innovation and engineering at the time.
01:42
And a goal that they set out to achieve was so ridiculously ambitious in the 60s, sending a man to the moon and bringing him back, when I guess the state-of-the-art was, you know, the Russians who had just flung a chunk of metal into orbit. It was quite something. An enormous undertaking. I think at some point, like, 500,000 people were working on it.
02:02
Ridiculously large. Billions of dollars. But it worked. And so you must sort of assume that, you know, only really the smartest people worked on that and were able to pull this off. So quite literally, rocket science. I'm a bit of a nerd, and earlier this year I actually went to Florida and visited the Kennedy Space Center in Cape Canaveral.
02:28
And they've got one of these things on permanent display. So there's a, here it is, an actual remaining Saturn V rocket that they've taken apart into the separate rocket stages.
02:42
So you can see up close. You can see sort of what's inside, right? And what struck me when I was there and I looked at this, the first time I'd ever seen this stuff, is that it looked sort of, I don't know, simple. Maybe that's not the right word, but rudimentary perhaps. As in, it was very functional.
03:02
Look at this thing. It's like a sheet of rolled up metal around a, I guess, a massive gas tank. I mean, it's really not much more there. I mean, there's some plumbing, but even that is limited. You know, I guess I never really considered what would be inside of a rocket like that, to be able to do the things that it did.
03:23
But I guess I sort of expected something more complex, you know, more ingenious. I don't know. It's a similar story at the back or the bottom. It sort of, you know, it ends here. There's a flat surface and, you know, we'll just bolt some engines at the bottom.
03:42
If you're there, you can actually see the engine mounts, like the screws and everything. It's not really polished. You see bolts protruding everywhere. Now, I don't mean to disrespect the Apollo program, by the way. I mean, it's still as amazing as I thought it was. But it's sort of seeing this stuff up close, sort of, I don't know, made it more approachable.
04:02
It brought it down to earth, if you will. And I think that is representative of how we tend to perceive technology that we have in high regard, but we don't really know much about. We tend to assume that things are more complicated than they really are and that the people working on it are, by definition, much smarter than we are.
04:25
You know, the whole grass is greener thing. And it is that potential perception that I want to debunk today by laying out the architecture behind Bitbucket. And also, at the same time, share some anecdotes and I guess, you know, some of the instances where we screwed up.
04:46
So, if you are a little bit like me and you sort of, you tend to assume that other people are smarter than you, then you'll be glad to hear that there's really no rocket science behind Bitbucket. And, you know, everything that is running now is sort of built around the same tools that you will use yourself.
05:04
So, let me sort of try to break it down a little bit. This is roughly the architecture of Bitbucket. I've separated it into three logical areas. So, there's the web layer, which is responsible for load balancing, high availability, that kind of stuff.
05:21
Then there's the application layer. That's where, you know, our code is. That's where all the Python stuff is. Bitbucket is almost exclusively written in Python. And then, lastly, the storage layer where we, you know, we keep our repository data and all that. So, we'll talk about each layer individually and time permitting, I'll share some anecdotes.
05:42
So, the first layer, the web layer, really consists of two machines only. There's no virtualization in Bitbucket; we run real hardware, we manage it ourselves, and we have a data center in the U.S. We have two load balancer machines, and they own the two IP addresses that you see when you resolve Bitbucket.org.
06:05
These machines basically run NGINX and HAProxy. Web traffic that comes into the load balancer first hits NGINX. NGINX, for those who don't know it, is an open-source web server. It's pretty good at SSL.
06:22
It can also be used really well for reverse proxying. And that's what we do here on this layer. So, when a request comes in, it is encrypted. Everything on Bitbucket is always encrypted. So, the first thing we do is strip off the encryption. And that's done using NGINX. And then once it's decrypted, we forward it on to HAProxy, which runs on the same machine.
06:44
HAProxy, also an open source reverse proxy server. But it's really good at doing load balancing and failover when you have a whole bunch of backend servers. And so, that HAProxy inspects the request and, based on some properties, decides how to forward it on. Ultimately, it will forward it on to one of our many actual application servers.
07:05
And on there, there is another NGINX instance. So, this NGINX instance is also just a reverse proxy server. It's not our actual web server. It takes care of things like request logging, response compression, and asynchronous request and response buffering.
07:22
And that's why, logically, it's part of the web layer, because it doesn't actually process the request. And then, ultimately, that forwards it on to the real Python web server on the application server. Now, so that's HTTPS. We also do SSH. SSH takes a bit of a different path.
07:41
SSH is a different protocol. We can't easily decrypt it first, but we do need to load balance it. So, it does go through HAProxy, just as a TCP connection. And HAProxy now forwards it on to the least loaded backend server. So, that path is a lot simpler. But make no mistake, it's not necessarily easier to run that reliably.
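To make the TCP pass-through idea concrete, here is a toy sketch, purely illustrative and nothing like HAProxy's actual implementation: no decryption happens, the proxy just picks a backend and shovels bytes in both directions. The host names and the backend-selection stub are made up.

```python
# Toy illustration of TCP-level pass-through load balancing for SSH:
# the proxy never decrypts anything, it only relays bytes.
import socket
import threading

BACKENDS = [("app1", 22), ("app2", 22)]  # made-up backend hosts

def least_loaded() -> tuple:
    """Stub: HAProxy tracks real connection counts per backend."""
    return BACKENDS[0]

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Relay bytes from src to dst until the connection closes."""
    while data := src.recv(4096):
        dst.sendall(data)

def handle(client: socket.socket) -> None:
    backend = socket.create_connection(least_loaded())
    threading.Thread(target=pump, args=(client, backend), daemon=True).start()
    pump(backend, client)
```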
08:03
As we found out really just recently when users started to complain about SSH connections dropping out sometimes. Like, users would say that they'd get hung up on. And looking at the error messages, it seemed like that was indicative of a capacity problem.
08:21
Like, we had not enough capacity on the server side, basically, to handle the request rate. But our monitoring tools told us a different story: that we had plenty of capacity. And so we were stumped for a little while, until we started analyzing the network traffic on the load balancers. In particular, we looked at the frequency of SYN packets that were arriving.
08:45
A SYN packet is part of TCP, and it marks the start of a new TCP connection. And so time-stamping every single one of them gives you a really accurate view of the incoming traffic. You see that here. So what you see here is an interval of 16 minutes over which we captured every SYN packet.
09:05
And you can see right away that it is ridiculously spiky. And these spikes are, aside from being very high and very thin, are also very evenly spaced. If you count them, you'll see that there are 16 spikes in an interval of 16 minutes.
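This kind of analysis needs very little code. A minimal sketch of the idea, not the tooling actually used in the talk: assuming you have one epoch timestamp per captured SYN packet, binning them into one-second buckets is enough to expose the spikes.

```python
# Sketch only: bin SYN-packet timestamps into one-second buckets.
# `timestamps` is assumed to hold one epoch timestamp (float) per SYN
# packet, e.g. extracted from a tcpdump capture.
from collections import Counter

def syn_counts_per_second(timestamps):
    """Map each whole second to the number of SYN packets seen in it."""
    return Counter(int(ts) for ts in timestamps)

# Plotting these counts over a 16-minute window produces a graph like
# the one described here.
```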
09:20
And that's no coincidence. These spikes occur at the start of every minute, like precisely at the start of every minute. They last about one to two seconds only. But you can see that the rate at that point is ridiculously high. It's like three to four times higher than our average load. Our working theory behind this is that this is the result of thousands of continuous integration servers
09:43
all around the world that are configured to periodically poll their Bitbucket repos. And that, in combination with NTP, and everybody I guess these days uses NTP, so clocks are really accurate, and this is what you get. And that was a bit of a problem because even though we have enough capacity for the average rate,
10:04
during these spikes we actually don't have enough capacity. Now, solving this, we can't really quadruple our SSH infrastructure to be able to deal with the large spikes. And so what we did instead is we went back into the web layer, into HAProxy,
10:20
where we basically have a hook into that traffic that comes in. And we configured HAProxy to never forward traffic at a rate higher than what we knew our capacity could take, but without making any changes to the ingress side. And so during these spikes, HAProxy will happily accept all the incoming traffic, but it won't actually connect or forward all of the connections at once.
10:42
And so it sort of spreads it out over a few seconds. And now this graph on the application server is a lot smoother than it is on the load balancer side. You can probably see this yourself: if you have a cron job that starts at the very start of the minute, it'll probably see a few seconds more lag than one that starts at any other second.
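Conceptually, what HAProxy was asked to do resembles the following sketch. This is purely illustrative, HAProxy implements this internally and the rate number is made up: connections are accepted immediately, but drained to the backends at a capped rate.

```python
# Illustration only, not HAProxy code: unthrottled ingress, paced egress.
import queue
import time

MAX_FORWARD_RATE = 200  # conns/sec the backends can absorb (made-up number)
backlog: queue.Queue = queue.Queue()

def forward_to_backend(conn) -> None:
    """Stub standing in for handing the connection to a backend server."""
    print("forwarded", conn)

def accept(conn) -> None:
    backlog.put(conn)  # the ingress side is never throttled

def forward_loop() -> None:
    while True:
        conn = backlog.get()
        forward_to_backend(conn)
        time.sleep(1.0 / MAX_FORWARD_RATE)  # spikes get spread over seconds
```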
11:04
So it's a bit of a funny problem. Never really considered it until it crept up. Probably won't have it really with websites that have humans click on links, but if you operate like a public API that is very popular and people script against it, you might see similar issues.
11:21
So the application layer then. This is where all the magic happens, sort of. This is where the website runs. And this layer is distributed across many tens of servers, real servers. So they all run a whole bunch of stuff. They run the website. The website is a fairly standard Django app, really.
11:43
Bitbucket started out as a pretty much 100% Django app and it's still very important. We run that in Gunicorn, a relatively simple Python web server, in perhaps the most basic configuration: we use a sync worker, meaning that each of our processes handles one request at a time.
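In Gunicorn terms, that kind of basic setup looks roughly like the sketch below; the values are assumptions for illustration, not Bitbucket's production config.

```python
# gunicorn.conf.py — illustrative values, not Bitbucket's real settings.
import multiprocessing

worker_class = "sync"    # each worker process handles one request at a time
workers = multiprocessing.cpu_count() * 2 + 1  # a common rule of thumb
bind = "127.0.0.1:8000"  # the local nginx reverse proxy forwards to this
```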
12:05
And so we have a whole bunch of processes, and multiprocessing to get concurrency. And then SSH. We handle SSH using really just the standard OpenSSH server daemon. The same one that you all run on your Linux machines and laptops, with one difference.
12:25
So we made a small change to it, a small patch that allows us to do lookups in the database for public keys. OpenSSH is hardwired to look at the file system to find public keys. That's not practical for us, and so we have a little change to make that happen.
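For illustration, here's how one could achieve the same effect today without patching the daemon, using OpenSSH's AuthorizedKeysCommand hook (added in OpenSSH 6.2), a different technique than the patch described in the talk. The database schema and connection details below are made up.

```python
#!/usr/bin/env python3
# Illustrative only — not Atlassian's patch. sshd_config would point
# AuthorizedKeysCommand at this script, passing the login name (%u).
import sys

import psycopg2  # assumes the keys live in PostgreSQL

def main() -> None:
    username = sys.argv[1]
    conn = psycopg2.connect("dbname=sshkeys")  # made-up database name
    with conn, conn.cursor() as cur:
        cur.execute("SELECT key FROM public_keys WHERE username = %s",
                    (username,))
        for (key,) in cur:
            print(key)  # sshd expects one authorized key per line on stdout

if __name__ == "__main__":
    main()
```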
12:42
Other than that, it is the standard OpenSSH server, so we don't need to maintain that. We also do background processing. So any sort of job or process that we can't guarantee will respond in a few milliseconds, we dispatch off to our background system.
13:01
That's comprised of a cluster of highly available RabbitMQ servers. RabbitMQ is an open-source Erlang implementation of an AMQP broker. And to consume jobs we use Celery. So we have a whole farm of Celery workers distributed across all these machines to process these jobs.
13:22
An example: if you fork a repo, there's actual copying of files involved, so that might not complete immediately, and it gets dispatched. So it looks very basic. At the end of the day it's the same components that you all run, distributed statelessly across multiple servers.
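A hedged sketch of that dispatch pattern: the task and broker names here are illustrative, not Bitbucket's real code.

```python
# Illustrative Celery task: slow work (copying a repo's files for a fork)
# runs on a worker machine, fed through RabbitMQ, instead of in the
# request/response cycle.
from celery import Celery

app = Celery("bitbucket", broker="amqp://rabbitmq-host")  # made-up host

@app.task
def fork_repository(source_repo_id: int, owner_id: int) -> None:
    """Copy the repository's files; may take far longer than a request can."""
    ...  # actual file copying would go here

# In the web request handler, dispatch and return immediately:
# fork_repository.delay(source_repo_id=42, owner_id=7)
```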
13:40
There's nothing really special about it. Simple is usually good. We've had this setup for years. Bitbucket is now, I think, over 35 times bigger than it was when we started, or when we acquired it, I should say. And this held up really well. However, you can still screw up, as we do from time to time.
14:04
One of those examples was when we decided to upgrade our password hashes. Up until that time we never stored passwords in our database; we stored salted SHA-1 hashes, which is very common.
14:20
So that means that if somebody for some reason gets a hold of our database, they only have hash values, and so they still don't have your password. However, SHA-1 hashes for passwords are slowly being phased out and replaced by stronger, more secure hashing algorithms. And the reason for that is that even though you can't decrypt a SHA-1 hash to get a password.
14:44
What you can do is think of a word that might be the password, compute the SHA-1 and then compare it with what's in the database. And if you just think of enough words and try enough combinations you might brute force the password. Now if you have a strong password chances of anybody brute forcing that through
15:01
SHA-1 are, and I'll be careful because there may be some cryptographers here in the room, let's call it negligible. However, we have millions of users on Bitbucket and not everybody has a strong password. And if you have a password that is a word in the dictionary then, well, I don't have to tell you I guess, it's a whole different story. Because there really aren't many words in the dictionary.
15:21
Certainly not when it comes to a computer computing SHA-1 values for them. So you're really at risk. Now, short of forcing people not to use simple passwords, another thing you can do is upgrade to a stronger hash. These things are nothing special really; they're hash algorithms that are more expensive,
15:45
deliberately more expensive by rehashing their hash value over and over again thousands of times, deliberately spending more CPU cycles. And that's what we wanted to upgrade to. Let me show you just how big that difference is. So we wanted to upgrade to bcrypt hashes. bcrypt is one of the sort of more modern iteration style cryptographic hash algorithms.
16:07
And we compared it with SHA-1. So this script measures how many hash values you can generate in one second. This uses Django's code, Django's hashing algorithms, with the optional C extensions to make it as fast as you can. On my laptop that amounts to three hashes per second for bcrypt versus 160,000 for SHA-1. So it's nearly five orders of magnitude more expensive, just in CPU cycles. And so that is absolutely huge.
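The script itself isn't shown in the talk, but a minimal version of the same measurement might look like this, using the third-party bcrypt package directly rather than Django's wrappers; the password and duration are arbitrary.

```python
# Rough benchmark sketch: hashes per second for salted SHA-1 vs bcrypt.
import hashlib
import os
import time

import bcrypt  # third-party package

def rate(fn, duration=1.0):
    """Count how many times fn() completes within `duration` seconds."""
    count, start = 0, time.perf_counter()
    while time.perf_counter() - start < duration:
        fn()
        count += 1
    return count

password = b"hunter2"
salt = os.urandom(16)
bcrypt_salt = bcrypt.gensalt()  # default work factor

print("SHA-1  per second:", rate(lambda: hashlib.sha1(salt + password).digest()))
print("bcrypt per second:", rate(lambda: bcrypt.hashpw(password, bcrypt_salt)))
```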
16:41
And it's great because it means that even your weak password may stand a chance. But you have to realize that, as a server, you have to incur the cost of that massively expensive calculation every single time somebody uses a password for authentication. And we run a really popular high-volume API, and a lot of people use basic auth for authentication.
17:07
It's all SSL, right, so it's not like it's plain text passwords. But it means that we have to compute bcrypts for every single request. And our API requests are relatively quick, like on average it's in the tens of milliseconds.
17:21
So you can imagine that if you add a 300 millisecond password check to every single one of these requests, you have a problem. And we did, because we naively rolled this out, and instantaneously the website went down. All cores on all the machines went to 100% CPU, calculating bcrypts.
17:42
And we realized our mistake fairly quickly, obviously not quickly enough, but fairly quickly. So we rolled it back and we were able to keep the downtime minimal. But then we had a bit of a problem because we still wanted to move away from SHA-1. Now you can't really make bcrypts cheaper. Actually you can, but you don't want to, because the whole point is to have an expensive algorithm.
18:03
And so what we could do however is do less of it. You know when people use the API and write a client they typically do more than one request in quick succession. And so they'd be using the same password over and over again and we'd be computing the same bcrypt over and over again.
18:22
And so we decided to implement a sort of two-stage hashing system. When a request comes in, instead of computing the expensive bcrypt, we now compute an old-fashioned salted SHA-1 value. And then we use that to look up the bcrypt value in an in-memory dictionary.
18:42
Now, if that's empty in the beginning, we compute the bcrypt value ourselves, check it against the database to see if your password is correct, and then store that mapping, SHA-1 to bcrypt, in that in-memory table. Then the next request that you make with the same password is able to look up the bcrypt value from the in-memory cache.
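A minimal sketch of that two-stage scheme, not Bitbucket's actual code, and with cache eviction left out for brevity:

```python
# Sketch of the two-stage check: a salted SHA-1 digest acts purely as an
# in-memory cache key so the expensive bcrypt verification runs once per
# password, not once per request. Entries must be expunged aggressively.
import hashlib
import hmac

import bcrypt

_cache: dict[str, bytes] = {}  # SHA-1 cache key -> verified bcrypt hash

def check_password(password: bytes, salt: bytes, stored_bcrypt: bytes) -> bool:
    key = hashlib.sha1(salt + password).hexdigest()
    cached = _cache.get(key)
    if cached is not None:
        # Cache hit: bcrypt was already verified for this exact password.
        return hmac.compare_digest(cached, stored_bcrypt)
    # Cache miss: pay the full bcrypt cost once, then remember the result.
    if bcrypt.checkpw(password, stored_bcrypt):
        _cache[key] = stored_bcrypt
        return True
    return False
```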
19:03
That way we were able to cut out, I'm guessing, 99% of all the bcrypt calculations. But it's important to understand the ramifications of this system. Because you might be tempted to think that, well, you've now sort of weakened your bcrypt authentication back down to SHA-1 strength.
19:23
That's not the whole story. The important thing, or the main thing I guess, is that SHA-1 values never hit cold storage anymore. So the database is all bcrypt. So if you get a hold of the database, you still only have bcrypts. And even if you were able to somehow tap into our servers and copy memory,
19:42
then you'd get some SHA-1s. But you'd only get SHA-1s from the users that are active at that very moment. Because these cache entries, that's essentially what they are, are expunged very, very quickly. So we're able to get this thing running and upgrade to bcrypt.
20:02
But even then, the remaining 1% of the time we spent on bcrypt is still very significant. Very significant. Just look at that ratio, 160,000 versus 3. And right now, today, if you look at one of our servers, and you run like a perf top or something,
20:23
you can see that the bcrypt cipher method is the most expensive method that runs on that machine. At any point in time, I think it eats like 12% CPU. So it's still hugely expensive. So in the future, I guess, we should probably be looking at migrating or offering something like an alternative to basic auth.
20:43
Maybe standard, relatively standard, HTTP auth tokens, which are revocable and have a limited privilege set. And then let's move on to the storage layer. So here we keep track of all your data, obviously.
21:04
The biggest amount of data that we store, of course, is the contents of your repositories. And there are millions and millions of repositories, and we decided to keep the storage of that as simple as we could, sort of in line with everything else that you've seen so far.
21:22
And we decided to just store that stuff on file systems, just like you do on your local machines, right? Git and Mercurial were designed for file systems. They work really well. As opposed, for instance, to modifying Git and Mercurial to be able to talk to a distributed cloud-based object store system of some kind,
21:45
for instance, what Google Code does, we decided to keep it simple. The file systems that live on specialized appliances by NetApp, a commercial company, are accessible from the application servers simply using NFS.
22:01
And then, aside from that, we have sort of NoSQL storage, like distributed map systems. We have Redis and Memcached. We use Redis for your newsfeed, the repository activity feed that you see. And we use Memcached for basically everything that is transient, that we can afford to lose.
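As a hedged illustration of that split, with made-up key names and hosts:

```python
# Illustrative only: Redis for feed data we want to keep, Memcached for
# transient data that we can afford to lose at any time.
import memcache  # python-memcached
import redis

r = redis.StrictRedis(host="redis-host")        # made-up host names
mc = memcache.Client(["memcached-host:11211"])

def push_activity(repo_id: int, event: str) -> None:
    r.lpush("feed:%d" % repo_id, event)  # repository activity feed

def cache_fragment(key: str, html: str) -> None:
    mc.set(key, html, time=300)          # transient; losing it is harmless
```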
22:25
And then the data for the website is all stored, traditionally, just in SQL. So we use PostgreSQL, and the data is manipulated and accessed basically exclusively through the Django ORM,
22:44
and that works pretty well. The only thing is that SQL databases, Postgres is no exception, are generally kind of hard to scale beyond a single machine. Transparently, I should say. So, unless you go implement application level sharding to separate your data across multiple databases,
23:05
transparently scaling an SQL database across multiple machines isn't entirely trivial. So far, we've kept things simple. We are running a single Postgres database. It's a very, very big machine, and it has no trouble with the load at this point. And then, for high availability, we have several real-time replicated hot slaves on standby.
23:28
But yeah, in the future, should that thing ever sort of become a bottleneck, which hopefully it will, because that means the service is popular, I guess we'll have to look into sharding.
23:41
I could talk about this stuff all day long, and I wouldn't mind doing so either, but there's only a few minutes left at this point. So, I want to leave it at this. If you have any questions, I'm happy to take some now; we don't have a lot of time, but I'll take a few. Otherwise, come chat to me afterwards.
24:00
We also have a booth on the lower level, and so you can just find us there. And otherwise, I'd like to invite you for a drink tonight. We are hosting a drink-up in a bar nearby, starting at 7, at Mein Haus am See. And so, I'd like to invite you over.
24:20
Come have a drink on us, and you can talk all about this stuff. There's two of my colleagues there, too. We're also hiring, so if you want to talk about that, that is also possible. And with that, I want to thank you very much for listening, and I hope to see you all tonight. Thank you.
24:44
Thank you, Eric. Would the next speaker like to come up and set his slides up? If you'd like to take any questions over there, Eric. Any questions? Hi. A question about HAProxy and Nginx at the beginning of the request path.
25:07
So, HAProxy actually has SSL support. Have you tried that? Yeah, it does. So, our setup on the web layer is a little convoluted, maybe. There are a lot of components, as you saw, and that's not strictly necessary.
25:23
Part of that is sort of organic growth and historical. So, HAProxy hasn't always been very good at SSL, at least not in our experience. We've experimented with a ton of different SSL terminators. We've used Stunnel and a bunch of others, and at some point we found that Nginx was, at least for us,
25:42
the most reliable. So, we've left it there, and I know the situation has definitely changed in HAProxy, so it is something that we intend to revisit at some point in the future. So, yes. Hi. So, I noticed that Bitbucket uses quite a lot of JavaScript on the website,
26:03
and I wonder if you use WebSockets, and if you do, what do you use for them on the server side? So, do we use WebSockets? No, we don't currently use WebSockets. We've experimented with WebSockets quite a bit for things like real-time notifications and pull requests, for instance, those kind of things.
26:22
But, no, we're not currently using WebSockets. Okay, thank you. Anyone else? Yep. What's PgBouncer? What do you use it for? PgBouncer, you said? Yeah.
26:40
So, I had a whole spiel about PgBouncer, but there wasn't enough time to go into it, and there's not enough time to go into all of that right now either, so I invite you to chat later on, but PgBouncer is a Postgres connection pooling daemon. So, we use Django. Django, by default, doesn't come with any connection pooling,
27:02
and so getting Django to talk to a database efficiently is a bit of a challenge. It's not a challenge, but you need something else. So, PgBouncer is part of the Postgres project, and basically what it does is it makes stateful connections, long-lived connections to the database, a limited amount, and then you configure Django to talk directly to PgBouncer.
27:21
It acts like a database. And so then, if Django opens and closes connections at a very high rate because you're serving a lot of connections, that is a lot cheaper than opening and closing actual database connections. And so it bridges between the two to make that more efficient, and also to be able to limit the total number of connections that you end up having on your database.
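In Django terms, the wiring looks roughly like this sketch; the values are illustrative, and the point is simply that Django believes PgBouncer is the database.

```python
# settings.py sketch (illustrative values): point Django at the local
# PgBouncer, which holds the long-lived connections to the real Postgres.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "bitbucket",
        "HOST": "127.0.0.1",  # PgBouncer, not Postgres itself
        "PORT": "6432",       # PgBouncer's conventional port
    }
}
```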
27:41
There's a lot more to it, by the way. So, if you notice, we have two layers of PgBouncer, and there's a good reason for that, but as I said, I'll have to talk about that afterwards because there's no time. Cool. Thanks. Yeah, no worries. Hi. You talked about the machine with the database, like a large machine.
28:00
As far as I could understand, that's a physical machine? Yes. So, what happens if that goes down? So, if the physical machine goes down, we have several real-time replicated hot slaves. So, we use streaming replication for Postgres to have a bunch of slaves. So, if the machine goes down entirely, then we steal the IP, basically,
28:23
and almost instantly, hopefully, move over to the other. Yes. It's never happened, by the way, but yes, it's configured that way. All right. Thanks a lot.