
How we scaled GitLab for a 30k-employee company


Formal Metadata

Title
How we scaled GitLab for a 30k-employee company
Title of Series
RailsConf 2016
Part Number
20
Number of Parts
89
Author
Pan, Minqi
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Language
English

Content Metadata

Abstract
GitLab, the open source alternative to GitHub written in Rails, does not scale automatically out of the box, as it stores its git repositories on a single filesystem, making storage capabilities hard to expand. Rather than attaching a NAS server, we decided to use a cloud-based object storage (such as S3) to replace the FS. This introduced changes to both the Ruby layer and the deeper C layers. In this talk, we will show the audience how we did the change and overcame the performance loss introduced by network I/O. We will also show how we achieved high-availability after the changes.
Transcript: English (auto-generated)
When I submitted this talk to RailsConf, it was in the track of "we're living in a distributed world". But I was surprised to find that mine is the only talk in that track. It seems there are no other talks about scaling Rails applications and distributed systems. I think the reason might be that, as Rails developers, we follow certain best practices, so making our apps distributed, or scaling them, doesn't seem that hard or that problematic. But this one, this GitLab thing, is a bad boy, I would say. It really has some problems, and I'm mainly going to talk about how we fixed those problems. So thank you very much for coming to my talk.
My name is Minqi Pan. I come from China and I work for Alibaba Group. Those are my GitHub account and my Twitter handle; you're welcome to follow me. So what is GitLab? Well, let me say it secretly: it is just a GitHub clone, an open-source clone of GitHub. But nobody likes to say that, so a better way of putting it is: a Git box that you can deploy on your own machine. It is installed on premises.
Just a quick survey: how many of you use GitLab in your organization? Oh, two of you, thanks. So GitLab, if you see it as a black box, actually exposes two ports: one is HTTP, the other is SSH. HTTP is used for two purposes: you can clone a repository via HTTP, and you can push content to a repository via HTTP. Also, more importantly, as a Rails application it provides rich user interactions on the web pages. The SSH port, on the other hand, only allows those two Git operations. In the backend, from a very simplistic point of view, it stores this content on disk, and that is what makes this thing a monster to scale; it's very problematic on that part. If you look closer, it also uses some other stores on the backend.
One is MySQL; actually they also support PostgreSQL, because they use Active Record, which abstracts the actual implementation of the DB, so it's swappable. Another is Redis, used as a queue for delayed tasks and also as a cache. And the other is the filesystem: they use the filesystem to store the Git repositories. So that's the black box. If we open it up to see what's inside, you can see it's basically structured like this.
It's all open source, so you can also download the source code and see for yourself. When you deploy it, on the front end there are two parts: Nginx and an OpenSSH server. The reason those components are inside GitLab is that GitLab has an omnibus package you can install, and it actually depends on those two other packages. Nginx is for HTTP, and the OpenSSH server, as we mentioned, is for the SSH port it opens. When requests come in, HTTP requests go to the second layer: Unicorn handles the ordinary Rails requests, but requests for Git, like clone and push, go to GitLab Workhorse, another service, written in Go to make it fast. If a request comes in over SSH, it goes to the third part of the second level, namely GitLab Shell. The third level is called by the second-level components: Rails is mainly responsible for operations on the web pages, GitLab Git is a wrapper around Rugged, and Rugged is a wrapper around libgit2 on the fourth level. Sidekiq handles background tasks. And on the lowest level sit Git and libgit2; GitLab utilizes both implementations of Git. libgit2, if you don't know about it, is a rewrite of Git in a way that is portable and embeddable, and works as a library — ergo the name: they see it as the second generation of Git, "git2", with "lib" as a prefix because it's a library.
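Since GitLab reaches libgit2 from Ruby through Rugged, here is a minimal taste of that embeddability — a hedged sketch; the repository path is hypothetical:

```ruby
require "rugged"  # Ruby bindings for libgit2

# Open a repository and resolve HEAD to a commit, entirely
# in-process -- no `git` binary is spawned.
repo   = Rugged::Repository.new("/path/to/repo.git")  # hypothetical path
commit = repo.lookup(repo.head.target_id)
puts commit.message
```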
So this structure works really well for small teams, but the company I work for has 30,000 employees. That figure is from last year's fiscal report; they just published a new one days ago, the day before yesterday, and the stock price went up. It looks good; it's a public company. So let's scale it. How do we do this? Well, first consider the problem on the front end.
When a request comes in, it's either HTTP or SSH. As Rails developers we are most familiar with HTTP, and on the server it actually runs as Unicorn instances, which is also something we are very familiar with: we just put Nginx in front of them, set an upstream in the configuration, point it to the Unicorn servers in the back, and we're done. But how to deal with SSH is a problem. So I started a project called SSH2HTTP; it's open source on my GitHub account. It basically eliminates all those SSH requests, because the way Git interacts with the server is very similar between HTTP and SSH, so an SSH request can easily be delegated to a Git request over HTTP. And as we will see from the slides later, SSH is actually such a pain in the ass — there are more complications to it. I guess that is the reason why GitHub nowadays sets HTTP as the default: when you go to a public repo on GitHub, the clone URL, as far as I remember, defaults to an HTTP URL instead of an SSH one. There are complications in the architecture that make SSH access a little bit slower than HTTP. But actually, at Alibaba, we did not use my approach. My approach was this slide, but actually we used this one: we are not using Nginx as the front end. We used something called LVS, a feature of the Linux kernel,
and the specific part of it that we're using is called IPVS, which expands to IP Virtual Server; LVS stands for Linux Virtual Server. It is a layer-4 switching service, unlike Nginx, which operates on layer 7 of the TCP/IP stack. It does load balancing on the transport layer, so it supports all communication as long as it is TCP/IP; the differences between HTTP and SSH are eliminated. But that comes at a cost as well, because when you go down to layer 4, you lose the ability to do health checking based on the status code returned by a request: on layer 7 you can see what the status codes of your HTTP requests are and mark a server as healthy or not, but on layer 4 you cannot see those. You can only see packets; you can only see the data. You lose URL rewriting as well, because that also lives on layer 7.
And like I said, SSH comes with complications, because the SSH protocol involves security mechanisms that check host keys, and if you have more than one machine in the backend, their host keys are not the same by default. So when you deploy the application, you first have to copy the host keys across the whole cluster to make them identical; otherwise, when you connect to more than one server, the client will complain, saying: the SSH key is different, this is a security vulnerability, you'd better check it out — and it will not connect. And secondly, if you remember, you can add SSH keys via the web pages so that you can clone a repository, like on GitHub, and the same thing happens in GitLab. So when you add your SSH key to the server, it has to dispatch, or copy, that key across the entire cluster to make every machine accept it. Specifically, a line is added to ~/.ssh/authorized_keys, and it has to be done on every machine. You cannot do that via Sidekiq, because with Sidekiq only one machine in the cluster fetches the job and the others ignore it, so you have to do it in a way that broadcasts the key across the whole cluster — and we did that via Redis Pub/Sub.
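A minimal sketch of that broadcast with the redis gem; the channel name and file handling here are illustrative, not GitLab's actual code:

```ruby
require "redis"

CHANNEL = "ssh-key-added"  # hypothetical channel name

# Publisher -- run once, wherever the web UI accepted the new key:
Redis.new.publish(CHANNEL, "ssh-rsa AAAA... user@example.com")

# Subscriber -- run on every machine in the cluster; each one appends
# the broadcast key to its own authorized_keys file. Note that
# `subscribe` blocks, so this would live in its own process.
Redis.new.subscribe(CHANNEL) do |on|
  on.message do |_channel, key|
    File.open(File.expand_path("~/.ssh/authorized_keys"), "a") do |f|
      f.puts(key)
    end
  end
end
```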
And now for the back end. The real trouble begins with the fact that GitLab stores its repositories on the filesystem, and I want to pause a moment to remind you of the twelve-factor app. The reason GitLab is such a bad boy, unlike other Rails applications, is that it violates the fourth rule of the twelve-factor app — the set of principles advocated by Heroku. The fourth rule says backing services should be treated as attached resources: a Twitter service, an Amazon service, a MySQL service, they should all be configured as a URL that can be easily attached and detached. But GitLab stores some content on the filesystem, and that is the source of all evils. The content it stores is, firstly, the Git repositories, and secondly, user-generated attachments and avatars. We are going to move them to the cloud to make it scale. Actually, standing at this point, you have a lot of choices, and the one I'm going to elaborate on might not be the best.
I want to analyze the options we had, so that when you run into a Rails application with a similar problem, you can evaluate these options as well. The first option is a feature provided by GitLab Enterprise Edition called GitLab Geo, and it doesn't really solve the problem. The way GitLab Geo does things is to make full replicas of your GitLab instance across servers. It assumes that each machine in your cluster has enough filesystem storage to hold all the content of your Git repositories, and it makes 100% copies across them. It's officially supported, but it didn't solve our problem at Alibaba, because the overall size of all our repositories is big: we don't want to store them on one single machine, and there isn't enough disk space to hold them. From a distributed-systems point of view, GitLab Geo is a one-master, many-slaves, full-replication design. The CAP theorem says consistency, availability, and partition tolerance cannot all be achieved at the same time — you can only achieve two of them — so GitLab Geo achieves the A and P of those three. There is no disaster recovery supported, and absolutely no sharding, because it's fully replicated. The other option we could use seems like a perfect way to solve the problem. First of all, we eliminate SSH with that gem of mine, SSH2HTTP, so that we can forget about the SSH problem and focus solely on HTTP. And seemingly there is something we can take advantage of: every repository stored on GitLab
can be addressed using namespace/repo-name, and that part appears in almost every URL of every request. When you view a repository's commit history on a page, the route contains that part; when you clone, when you push, they all contain that part. So why not use it as a routing key and put some routing logic into Nginx to make a sharded GitLab?
By doing that, every request, after reaching Nginx, would be sharded. For example, for a cluster of size three, we can invent some hash algorithm that distributes namespace/repo-name onto any one of those three machines — a minimal sketch follows. So seemingly it's perfect. But can you spot the problems with this?
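To make the idea concrete, here is a minimal sketch of such a routing hash, assuming a three-node cluster; the host names are made up:

```ruby
require "digest"

SHARDS = %w[gitlab-1 gitlab-2 gitlab-3]  # hypothetical backend hosts

# Deterministically map "namespace/repo" to one backend, so every
# request for the same repository lands on the same shard.
def shard_for(namespace, repo)
  digest = Digest::SHA1.hexdigest("#{namespace}/#{repo}")
  SHARDS[digest.to_i(16) % SHARDS.size]
end

shard_for("rails", "rails")  # always the same host for this repo
```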
Actually, one problem is that Sidekiq does not have sharding. Or maybe it does, but you have to dig into it and see how you could do that. Each of those three GitLab shards spawns Sidekiq tasks, which need to be consumed by the corresponding Sidekiq shard as well, so when you start the Sidekiq shards, you have to start them with special queue names. That's one complication, and there are others. Changes have to be made at the application level as well, because not every page on GitLab falls into a single shard. For example, on the admin page you can see a list of all the repos with their sizes; if that request goes down to only one shard, you will not get that information, because some repos reside on other shards. So major changes would be introduced at the application level too. And you also need superuser authentication, because the requests are not designed to access all repos; there are user-authentication layers in front of them. That's another application-level, logic-level change that would have to be introduced. So this is actually not ideal; every way of solving this comes with a cost. So let's now think about how to deal
with the filesystem storage. We have a lot of options. First, we could make it a twelve-factor app by making the filesystem attachable: some vendors provide such solutions, like hardware network-attached storage, usually called NAS, and there is software NAS as well — Google has GFS, for example. Second, we could use remote procedure calls, to shard only at the FS level instead of at the application level of the entire GitLab. And we might also consider killing the filesystem altogether: we could use Amazon S3 to replace the FS as the backend for Git storage. We evaluated all those options. It turned out that NAS is not for Alibaba. Hardware NAS — well, Alibaba does not buy those things, because of its de-IOE policy (no IBM minicomputers, Oracle databases, or EMC storage). And soft NAS, Alibaba does not have yet.
Google has GFS, but Alibaba does not have an "AFS". I have to remind you, though, that those two options might be good ones for your organization if you want to scale GitLab. They are really good means of solving the problem, because they introduce very little change at your application level: all the change is confined to the lower-level service that gets attached to GitLab. But I did not try them, and they surely come with a cost as well, because software NAS tends to be very complicated. As far as I know, there are some good solutions, like CephFS, which became stable just a month or so ago. If something goes wrong on that layer, you need some very talented operations or DevOps engineers to solve the problem. Also, by attaching a soft NAS, you lose performance, because each IO to the FS is now networked, and there is added latency on each network IO. And you are replacing things at a very low level, so the added cost is large.
So those are two options you could dig into if you have the chance. And RPC — that is a good solution. I looked up how GitHub solved this problem, and it seems they are doing RPCs: they dispatch Git access as RPC calls into Git shards rather than GitLab shards, sharding on a different level. It surely looks like a good solution. What we did at Alibaba was the fourth option: we killed the FS and used the cloud. Which cloud? It's called Alibaba OSS. It's not that well known, but you can think of it as the same thing as Amazon S3: object storage in the cloud. And how did we do it? Well, the rest of this talk gets a little more technical. It turned out that GitLab has three ways
to access Git repositories, namely libgit2, git, and Grit. Grit is a very old gem, written in Ruby. We found that it could be eliminated, making the whole problem easier, because it's only used in the wiki part of GitLab, through a gem called Gollum. Gollum was designed with this Git-access part pluggable, so we unplug Grit and plug in Rugged, which uses libgit2. That leaves only Git and libgit2.
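The swap itself is roughly a dependency change. A hedged sketch of the Gemfile side — the adapter gem names come from the Gollum adapter ecosystem and should be verified against your Gollum version:

```ruby
# Gemfile (sketch) -- Gollum's Git access is pluggable via adapter gems:
# drop the Grit-based adapter and use the Rugged (libgit2) one instead.
# gem "gollum-grit_adapter"
gem "gollum-rugged_adapter"
```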
Comparing those two projects, Git and libgit2: Git is pretty old — it was started by Linus Torvalds — and it did not consider the problem of plugging and unplugging backends, so its backend is hard to replace; all of the code is written to access content from the filesystem. But libgit2 is very modern. I don't know how its creators thought about the problem, but they designed the backend to be replaceable: you can write your own backends. So the basic idea is that we write our own backends — backends that store the content on cloud storage. And, with Grit already eliminated, we also have to implement Git on top of libgit2, because Git cannot easily replace its backend storage, but libgit2 can.
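Conceptually, a libgit2 ODB backend is just an object that answers exists/read/write for SHA-1-addressed content. The real interface is a C struct of function pointers; this Ruby rendering of its shape is purely illustrative:

```ruby
# Illustrative only: mirrors the shape of libgit2's ODB backend,
# which is really a C struct of function pointers.
class OdbBackend
  def exists?(sha1)
    raise NotImplementedError
  end

  def read(sha1)  # -> [object_type, raw_bytes]
    raise NotImplementedError
  end

  def write(sha1, type, data)
    raise NotImplementedError
  end
end

class CloudOdbBackend < OdbBackend
  # ...answer each call with HTTP against the object store
  # (see the sketches further down)...
end
```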
So, the cloud-based backend: what does that backend look like? That involves some details about Git. Git has two parts for storing content: one is called the ODB, the other the RefDB. The ODB is for the chunks of data that you put inside the repositories, and the RefDB is for the branches and tags. For the ODB, there are also two kinds of storage. The first is loose storage. Git is fundamentally a content-addressable filesystem, the address being the SHA-1 value of the object you are trying to fetch, so loose storage stores each object under its SHA-1 value.
Let me open up an example and walk you through it. Here's a Git repository. If you go into the .git directory and run tree, you can see some files like these — those are the loose-stored files. And there are also pack-stored files; those over there are the pack-stored files.
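For reference, a loose object's path is derived from its SHA-1: the first two hex characters name a directory, the remaining 38 name the file. A sketch:

```ruby
# e.g. SHA-1 "5d2a0d..." lives at .git/objects/5d/2a0d...
def loose_object_path(git_dir, sha1)
  File.join(git_dir, "objects", sha1[0, 2], sha1[2..])
end
```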
That's what I meant. So we wrote a cloud-based backend to store both types of files. The basic idea for the loose files is pretty straightforward: when you read, you make an HTTP request to read it from the cloud. Oh, I forgot to explain the refdb. It's very similar to the loose files — you can see it under the refs directory. All of your branches are inside, refs/heads/master for instance, and master will tell you a SHA-1 value. So it's basically a key-value store, and that translates to HTTP requests pretty straightforwardly: each refdb read, we made an HTTP read, and each refdb write, an HTTP write — sketched below.
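Here is a hedged Ruby sketch of that mapping, for both refs and loose objects; the bucket URL and key layout are illustrative, not OSS's real API:

```ruby
require "net/http"
require "uri"

BASE = URI("https://bucket.example.com/myrepo/")  # hypothetical bucket layout

def http_get(key)
  Net::HTTP.get_response(BASE + key).body
end

def http_put(key, body)
  uri = BASE + key
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.send_request("PUT", uri.path, body)
  end
end

# refdb: a ref name is just a key whose value is a SHA-1.
def ref_read(name)
  http_get("refs/#{name}").strip
end

def ref_write(name, sha1)
  http_put("refs/#{name}", sha1)
end

# loose ODB: content addressed by SHA-1.
def odb_read(sha1)
  http_get("objects/#{sha1}")
end

def odb_write(sha1, data)
  http_put("objects/#{sha1}", data)
end
```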
Likewise, each loose ODB write we make an HTTP PUT, and each loose ODB read an HTTP GET. So that's the simple part. The complicated part is the packed content, because if you only store loose content, it will be as slow as SVN. The very reason Git is so fast is that it has a very good design for packs. Pack files are used both as the way to transfer content between server and client, and as a way to store the content of your repository on disk.
It's both a transfer file format and a storage file format. The way we write those packs is easy — we just translate them into PUT requests over HTTP — but the way we read them is complicated. You see, every pack comes with an index file, and that index file tells you, if you are looking for some object in the pack, where to start. So each read is translated into a number of HTTP range requests: first it reads the .idx file to find the next range to read in the pack,
and then it reads only that small portion of the file from the object store, using the Range header. As an example, if Git needs to read this content, it first binary-searches the index file to get an offset into the pack file; then, in the pack file, it sees whether the stored content is a delta or not. If it is a delta, it has to continue looking for the base of that delta, and the chain goes on and on until you find the root; by combining all the deltas with the base, you get the object you're reading.
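In code, the read path is a loop that keeps resolving deltas until it reaches a non-delta base. A conceptual sketch — `read_pack_entry`, `apply_delta`, and the idx object's `offset_of` are hypothetical stand-ins for real pack parsing:

```ruby
# Conceptual sketch only; real pack parsing is considerably more involved.
def read_object(pack, idx, sha1)
  entry  = read_pack_entry(pack, idx.offset_of(sha1))  # binary search in .idx
  deltas = []
  while entry.delta?                   # walk the chain down to its base
    deltas.unshift(entry.delta)
    entry = read_pack_entry(pack, entry.base_offset)
  end
  # replay the deltas on top of the base to reconstruct the object
  deltas.reduce(entry.data) { |base, delta| apply_delta(base, delta) }
end
```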
And here's a real-world example. The chain is quite long: you have to keep jumping around inside the pack file to actually get the thing you want to read, because each time you read, you only get a delta. That was a real problem for us, because if the IO pattern inside the pack file is not good enough, you end up making a lot of HTTP range requests, and that makes the whole thing awfully slow. But the good news is that the inventors of Git built some very good heuristics into how pack files are generated, so those IO patterns are not that bad. So when we make a range request, we can make the range bigger than we need; we then fetch more content with each range request, and that content is usually sufficient to get all the way to the root of the object.
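A hedged sketch of such a padded range read over HTTP; the 64 KB pad is an arbitrary illustrative number, to be tuned against real IO patterns:

```ruby
require "net/http"

PAD = 64 * 1024  # illustrative over-fetch; tune against real pack IO patterns

# Fetch pack bytes [offset, offset + length + PAD) in one request, hoping
# the extra bytes already contain the delta bases we will need next.
def ranged_read(pack_uri, offset, length)
  request = Net::HTTP::Get.new(pack_uri)
  request["Range"] = "bytes=#{offset}-#{offset + length + PAD - 1}"
  Net::HTTP.start(pack_uri.host, pack_uri.port,
                  use_ssl: pack_uri.scheme == "https") do |http|
    http.request(request).body
  end
end
```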
Thanks to this good characteristic, we cut out many HTTP requests and made the whole solution not that slow. That's one part of it. The other part, as I said, is that you have to make Git talk to libgit2, because Git does not have a replaceable backend. It turned out this is pretty easy. The inventors of Git are pretty smart folks: they wrote Git in a very Unix way, where all the commands call each other. Take git fetch and git clone: on the server side, the first thing that gets called is git upload-pack, and git upload-pack then calls another command, git pack-objects. The commands that deal with the transmission protocols are complicated, and we do not touch them; we only touch the part that does IO from the disk, so we only need to replace git pack-objects. In the git push scenario, we only need to replace, or re-implement, git unpack-objects, and implementing that on top of libgit2 is very easy — no big task. And there are other scenarios too.
There are two scenarios when doing a git push: small data gets unpacked right away and written to loose storage, while big data doesn't get unpacked, because unpacking consumes time — instead, Git directly creates an index for it and writes the pack as-is. For that case we need to re-implement git index-pack, which turned out to be a pretty easy task. All right, so after all of those changes, let's see what the performance looks like. It's definitely going to be slower, because you're still exchanging fast filesystem IO
for slow HTTP IO. Let's see. The test fixture we use is a repository called GitLab CE; it has more than 200,000 objects and, when packed, weighs more than 100 megabytes.
For git push, performance is about the same: on the filesystem we write directly to the FS, and on the cloud we write directly over HTTP, and not many new operations are created — it just adds a small amount of time to each of those two operations. For a git push of deltas, like I said, there are two scenarios: when you push large content, it only stores a pack (that's the large-content scenario), and if you push only a little content, it gets unpacked and stored loosely. That is the delta case, and again not too much time is added. git clone, though, is actually 100% slower, because when you clone, the range operations happen, and that's what makes it slow. And git fetch got way slower still, because it is a delta fetch — this usually happens when you do a git pull after your coworkers have updated the repo — and it also has to go through the whole range-request process I mentioned. So it really is slower. But the good news is it's not that slow: the user has to wait longer, but not longer than they can bear.
The web pages also got way slower. All of the Rails operations were affected, because we are operating at a deeper level: Rails calls Rugged, Rugged calls libgit2, libgit2 is slow, so Rails is slow. On this page, for example, we're listing files, and the show action now takes five seconds to run. Mind you, all of these benchmarks are without cache, so the real-world scenario is better, because we have caches. And here is another Rails operation: before the change it was 50 milliseconds, and after, about five seconds. That's why we had to add a lot of caching, on multiple layers — including those Rails layers.
I'm not going to elaborate on all the caches we added, but here is one interesting aspect. libgit2 was designed so that it can have more than one ODB backend, and you can even set a priority on each. So we basically made a hamburger structure out of the backends: we added two new backends, the cache backends. The servers we deployed to still have a filesystem, and we use that as an on-disk cache: once we've read some content, we store it on the filesystem, so the next request that hits can read the content from the filesystem instead of making a remote HTTP call. And the good news is that the ODB of Git never changes — you can only put data into it, never modify it — so we are free from the problem of cache expiry. The refdb can also be cached, via Redis, but that's way more complicated and might not be worth the effort.
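A minimal read-through sketch of that on-disk cache; the cache directory and the `remote_read` helper are illustrative:

```ruby
require "fileutils"

CACHE_DIR = "/var/cache/git-odb"  # hypothetical local path on the app server

# Read-through cache: ODB objects are immutable, so anything cached
# once is valid forever -- no expiry logic needed.
def cached_odb_read(sha1)
  path = File.join(CACHE_DIR, sha1[0, 2], sha1[2..])
  return File.binread(path) if File.exist?(path)

  data = remote_read(sha1)  # hypothetical HTTP fetch from object storage
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, data)
  data
end
```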
I might remove that one in the future, because you have to expire its cache: the refdb gets updated all the time — when you add a new commit to master, say, refs/heads/master gets updated — and then you have to expire the cache. So I'm not going to go into the details of cache invalidation. Lastly, I want to say something about future work. Right now, this idea seems to work more or less acceptably,
and if you guys love it, I will try to do an AWS S3 version, because it currently works on OSS, which is not so widely used. And there is some need for this: GitLab cannot be deployed to Heroku at the moment, and if we make this backend work with AWS S3, GitLab users would have a chance to deploy it to Heroku. Also, GitLab still has many direct calls to Git — for the commit-history page of a repository, for example, it actually spawns another Git process to fetch the result — so we could eliminate some of those direct calls. And once we develop the backend for AWS S3, we could add settings so the user can choose which backend to use, either the filesystem or AWS S3. That would be perfect. For Gollum, we could do some work to make Rugged the default. And libgit2 itself we found less performant — slower in many scenarios compared to Git — so we could improve its performance in the future. I will be actively working on all of this on my GitHub account, so if you're interested, you can look at my account and see how it goes. Thank you very much.