In PostgreSQL's current MVCC implementation, ProcArrayLock is the major bottleneck on many-core machines (120+ hardware threads): in the TPC-C test, throughput scales only up to about 30 connections. We experimented with a lock-free MVCC solution, and it scales up to 120 cores. We took the CSN-based solution proposed in the PostgreSQL community and implemented a lock-free version of it. Because many-core machines have large memory and other resources, locks are avoided on all performance-critical paths and are used only on some rare paths.
Our team has worked in this technology area for about nine years, and for the last five or six years on an in-house database based on PostgreSQL, making various optimizations so that it scales up. Our testing was done on a many-core machine with 60 physical cores (120 hyperthreads), and this talk presents the results of that research.
What we are going to discuss: first, the MVCC scalability issue on the many-core machine, and then the solution. We introduce a lock-free CSN solution to the MVCC scalability problem, go through the performance results, and conclude the presentation.
The performance test on the TPC-C benchmark: the machine has 60 physical cores with hyperthreading (120 threads in total). Throughput peaks at around 25-30 connections at about 150 thousand transactions, then plateaus and declines; it does not scale beyond that point. CPU utilization also shows that the system is not scaling linearly. Analysis shows that ProcArrayLock consumes significant time, and contention on it grows as the number of connections increases.
Analysis found that a high percentage of threads block on ProcArrayLock. This lock is used when taking snapshots and when transactions end.
To understand the problem, we first briefly introduce the concept of MVCC.
MVCC (Multi-Version Concurrency Control) allows concurrent transactions without blocking reads and writes. Snapshot isolation is used to ensure consistency - each transaction takes a snapshot and follows that snapshot throughout execution. This is achieved using snapshot metadata.
The snapshot records xmin (the lowest transaction ID still running when the snapshot was taken), xmax (one past the highest completed transaction ID), and the list of transaction IDs in progress between them. Visibility is evaluated against these boundaries: a transaction ID at or above xmax had not finished when the snapshot was taken, so its changes are invisible; an ID in the in-progress list is likewise invisible; any other ID, including every ID below xmin, had already finished, so its changes are visible if it committed.
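The visibility rules above can be sketched in Python as follows. This is a minimal, single-threaded illustration, not PostgreSQL's code: the names `XidSnapshot`, `take_xid_snapshot`, and `xid_visible` are hypothetical, commit status is modeled as a plain set, and xmax is taken as the next XID to be assigned. That simplification gives the same answers, because any running XID above the last completed one is also in the in-progress list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class XidSnapshot:
    xmin: int  # lowest XID still running when the snapshot was taken
    xmax: int  # next XID to be assigned (simplified xmax)
    xip: frozenset = field(default_factory=frozenset)  # XIDs still running


def take_xid_snapshot(running_xids, next_xid):
    """Build a snapshot from the running XIDs (the ProcArray scan in PostgreSQL)."""
    return XidSnapshot(
        xmin=min(running_xids, default=next_xid),
        xmax=next_xid,
        xip=frozenset(running_xids),
    )


def xid_visible(snap, xid, committed):
    """True if the changes made by xid are visible to snap."""
    if xid >= snap.xmax:      # not finished (or not started) at snapshot time
        return False
    if xid in snap.xip:       # still running at snapshot time
        return False
    return xid in committed   # finished before the snapshot: visible only if committed
```

Building the snapshot requires looking at every running transaction, which is exactly the step that needs ProcArrayLock.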
The problem with taking snapshots: we need to scan the entries of all backends to find out which transactions are still running, which requires holding ProcArrayLock in shared mode. Even though snapshot takers share the lock, every transaction that ends must take it in exclusive mode to remove itself from the list, so snapshot takers and committing transactions serialize on the same lock, and the scan gets longer as the number of connections grows.
The solution: Instead of calculating the complete transaction list, assign each transaction a commit sequence number (CSN) indicating the order in which transactions commit. We can directly compare values using this identifier.
With CSNs, a snapshot no longer lists the running transactions. Each transaction receives the next value of an incrementing CSN counter when it commits, and a snapshot simply records the current value of that counter. Visibility detection then compares the two numbers directly: a transaction is visible to a snapshot if it committed before the snapshot was taken, that is, if its CSN is lower than the snapshot's CSN.
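A sketch of CSN assignment and the comparison-based visibility check, assuming the convention that a snapshot stores the next CSN to be handed out, so a transaction is visible when its CSN is strictly lower. `CsnState` and `csn_visible` are hypothetical names; the dictionary stands in for the XID-to-CSN mapping, and the counter would be a shared atomic integer advanced with fetch-and-add in a real server.

```python
class CsnState:
    """Global CSN counter plus an XID -> CSN map (single-threaded sketch)."""

    def __init__(self):
        self.next_csn = 1  # next CSN to hand out; a shared atomic counter in practice
        self.csn_of = {}   # XID -> CSN, filled in at commit

    def commit(self, xid):
        csn = self.next_csn  # in C: atomic fetch-and-add on the counter
        self.next_csn += 1
        self.csn_of[xid] = csn
        return csn

    def take_snapshot(self):
        # A CSN snapshot is just the counter's current value: no scan, no lock.
        return self.next_csn


def csn_visible(state, snapshot_csn, xid):
    """Visible iff xid committed with a CSN lower than the snapshot's CSN."""
    csn = state.csn_of.get(xid)
    return csn is not None and csn < snapshot_csn
```

Aborted transactions never receive a CSN and so are never visible. The sketch ignores concurrency entirely; a real implementation also has to handle the window between obtaining a CSN and publishing it in the map.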
The challenge is generating and comparing CSNs efficiently. We need a mapping between transaction IDs and CSNs. The mapping must be very efficient and all operations (reading/writing) should be lock-free.
We use a circular map where we get a slot by the formula: XID mod map_size. When a transaction commits, it gets a CSN and stores it in the circular map.
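The slot formula can be sketched as below, assuming (as the paragraph describes) that an entry is written when the transaction commits, and that each slot also stores its owner XID so a lookup can tell whether the slot still belongs to the XID being asked about. `CircularCsnMap` and its methods are hypothetical; a lock-free C version would write the fields with atomic stores instead of list assignments.

```python
class CircularCsnMap:
    """Fixed-size XID -> CSN map; an XID's slot is xid % map_size (sketch)."""

    def __init__(self, map_size):
        self.map_size = map_size
        self.slot_xid = [None] * map_size  # owner XID of each slot
        self.slot_csn = [None] * map_size  # CSN of that owner

    def slot_for(self, xid):
        return xid % self.map_size

    def commit(self, xid, csn):
        """Store the committing transaction's CSN in its slot."""
        i = self.slot_for(xid)
        self.slot_xid[i] = xid
        self.slot_csn[i] = csn

    def lookup(self, xid):
        """Return xid's CSN, or None if its slot belongs to another XID or is empty."""
        i = self.slot_for(xid)
        if self.slot_xid[i] != xid:
            return None
        return self.slot_csn[i]
```

With `map_size = 8`, XIDs 21 and 29 both land in slot 5, so committing 29 overwrites the entry for 21; this is the conflict the sparse map has to handle.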
Every transaction gets its slot this way and is assigned a CSN when it commits, and all of these operations are lock-free. Next, we discuss how we managed to make the operations lock-free.
The circular map must handle conflicts, where two transaction IDs map to the same slot. If the slot is still occupied by an older transaction whose CSN may still be needed by some snapshot, that entry is moved to a sparse map before the slot is reused. The sparse map handles these overflow cases.
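The conflict handling can be sketched by extending the circular map with a sparse overflow dictionary. The rule for when an evicted entry is still needed is an assumption: here an entry is kept if its CSN is not lower than the oldest active snapshot's CSN, because at least one snapshot must then treat it as invisible, whereas an older entry is visible to every snapshot. `CsnMap`, `store`, and `oldest_snapshot_csn` are hypothetical names, and only committed entries are modeled.

```python
class CsnMap:
    """Circular XID -> CSN map with a sparse overflow map (single-threaded sketch)."""

    def __init__(self, map_size):
        self.map_size = map_size
        self.slots = [None] * map_size  # slot i holds (xid, csn) or None
        self.sparse = {}                # evicted entries that may still be needed

    def store(self, xid, csn, oldest_snapshot_csn):
        """Store xid's CSN, first moving a conflicting entry to the sparse map if needed."""
        i = xid % self.map_size
        old = self.slots[i]
        if old is not None and old[0] != xid:
            old_xid, old_csn = old
            # Assumed rule: a CSN at or above the oldest snapshot's CSN is
            # invisible to at least one snapshot, so it must stay findable.
            if old_csn >= oldest_snapshot_csn:
                self.sparse[old_xid] = old_csn
        self.slots[i] = (xid, csn)

    def lookup(self, xid):
        """Return xid's CSN from its slot or the sparse map, or None."""
        entry = self.slots[xid % self.map_size]
        if entry is not None and entry[0] == xid:
            return entry[1]
        return self.sparse.get(xid)
```

Lookups try the circular slot first and fall back to the sparse map, so the common case touches a single array slot. When neither map holds an XID, a real system would need another source, such as the commit log, to tell an old committed transaction from an aborted one; the sketch simply returns None.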
[partly inaudible] A slot is claimed when a transaction is assigned an XID, and the CSN is written into it when the transaction commits. The difficult case is reuse: a slot may still hold an older transaction's entry when a new transaction needs it, while another backend is reading that old entry. Because the new transaction overwrites the slot and the old entry moves to the sparse map, a reader could pick up a value that belongs to the new transaction. To prevent this, the order of operations matters: after reading the CSN, the reader checks that the slot still holds the XID it was looking for. If it does, the CSN is correct; if not, the slot has been reused, and the reader looks up the XID in the sparse map instead. That is how we manage concurrent reads during slot reuse.
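The reader-side check can be sketched as an optimistic read that is validated afterwards, similar in spirit to a seqlock. The writer protocol is an assumption made so that the check is sound: the writer copies the old entry to the sparse map, marks the slot as being rewritten, writes the new CSN, and publishes the new owner XID last. `Slot`, `INVALID_XID`, `reuse_slot_steps`, and `read_csn` are hypothetical; the writer is a generator only so its steps can be interleaved with a reader, and `between` is a test hook marking where a concurrent writer could run. In C the field accesses would be atomic loads and stores separated by memory barriers, and only one writer may reuse a given slot at a time.

```python
INVALID_XID = 0  # hypothetical marker for "slot is being rewritten"


class Slot:
    """One circular-map slot: owner XID and its CSN."""
    __slots__ = ("xid", "csn")

    def __init__(self):
        self.xid = INVALID_XID
        self.csn = 0


def reuse_slot_steps(slot, sparse, new_xid, new_csn):
    """Writer: hand a slot to new_xid, one step per resumption."""
    if slot.xid != INVALID_XID:
        sparse[slot.xid] = slot.csn  # 1. preserve the old entry first
    yield
    slot.xid = INVALID_XID           # 2. mark the slot as being rewritten
    yield
    slot.csn = new_csn               # 3. write the new CSN
    yield
    slot.xid = new_xid               # 4. publish the new owner last


def read_csn(slot, sparse, xid, between=lambda: None):
    """Reader: read the CSN optimistically, then confirm the slot owner."""
    if slot.xid == xid:
        between()                    # test hook: a concurrent writer may run here
        csn = slot.csn
        if slot.xid == xid:          # owner unchanged, so csn belongs to xid
            return csn
    return sparse.get(xid)           # slot reused or being reused: use the sparse map
```

If a writer runs its first three steps between the reader's owner check and its CSN read, the reader sees the new CSN, but the re-check then fails and it returns the old CSN from the sparse map. Publishing the owner last is what makes the check sufficient: a reader that sees the new owner also sees the new CSN.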
To summarize the conflict handling: when a new transaction needs a slot that another backend may still be reading, the old entry is moved to the sparse map. Next, the XID snapshot option.
We provide options: users can choose between CSN snapshots (fast) and XID snapshots (for long-running queries). When the fast map is 80% full, we can convert CSN snapshots to XID snapshots for long-running transactions. This prevents the need to maintain too many entries in the sparse map.
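Converting a CSN snapshot into an XID snapshot can be sketched as follows: every XID in the range still covered by the map that the CSN snapshot would not see goes into the in-progress list, so the resulting XID snapshot answers visibility questions without further CSN lookups. The function and parameter names, including `oldest_xid` as the lowest XID the map still covers, are hypothetical, and the 80% check on entry counts is an assumed reading of "80% full", done in integer arithmetic.

```python
from dataclasses import dataclass

MAP_FULL_PERCENT = 80  # threshold mentioned in the talk


@dataclass(frozen=True)
class XidSnapshot:
    xmin: int       # lowest XID the snapshot must treat as not visible
    xmax: int       # XIDs at or above this are invisible
    xip: frozenset  # invisible XIDs in [xmin, xmax)


def should_convert(map_entries, map_capacity):
    """True once the map is at least MAP_FULL_PERCENT full."""
    return map_entries * 100 >= MAP_FULL_PERCENT * map_capacity


def csn_to_xid_snapshot(snapshot_csn, next_xid, oldest_xid, csn_of):
    """Build an XID snapshot that gives the same answers as a CSN snapshot.

    csn_of maps committed XIDs to CSNs; an XID missing from it is treated as
    not committed (running or aborted). XIDs below oldest_xid are assumed to
    be resolved already and visible if committed.
    """
    invisible = frozenset(
        xid for xid in range(oldest_xid, next_xid)
        if not (xid in csn_of and csn_of[xid] < snapshot_csn)
    )
    xmin = min(invisible, default=next_xid)
    return XidSnapshot(xmin=xmin, xmax=next_xid, xip=invisible)
```

Aborted transactions also land in the in-progress list here, which is harmless because they are never visible; PostgreSQL's own XID snapshots instead rely on the commit log to rule them out.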
Another problem can arise when a transaction starts. XIDs are assigned under XidGenLock, so at that point the new transaction looks at its slot in the map. If the slot is free, or its entry is no longer needed, the transaction simply takes it. If the slot is still needed, for example because a long-running transaction still occupies it after the map has wrapped around, the old entry has to be moved to the sparse map without disturbing concurrent readers. To handle this, we keep a flag: a transaction that finds the slot in this state sets the flag, and only then moves the entry to the sparse map.
That covers transaction start. For visibility detection, as discussed earlier, we directly get the slot for the XID, and the read is lock-free.
Another commonly encountered problem is computing the global xmin without a lock. In the existing lock-free proposals we looked at, the global xmin is computed without taking any lock; the value keeps moving forward but may lag behind the true value, which is safe but delays cleanup. In our implementation, the shared lock is taken only when calculating xmin, which is an uncommon path. In addition to xmin, we maintain a global minimum CSN across all active snapshots. Vacuum still depends on the global xmin, but the sparse map depends on the global minimum CSN: an entry in the sparse map whose CSN is smaller than the global minimum CSN is visible to every snapshot, so it can be removed from the sparse map.
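The pruning rule can be sketched as below. The horizon is the smallest CSN held by any active snapshot (or the next CSN to be assigned when there are none), and any sparse-map entry below it is visible to every snapshot, so its exact CSN is no longer needed. `global_min_csn` and `prune_sparse_map` are hypothetical names, and the list of active snapshot CSNs stands in for whatever per-backend state the implementation publishes.

```python
def global_min_csn(active_snapshot_csns, next_csn):
    """Smallest CSN held by any active snapshot; next_csn if there are none."""
    return min(active_snapshot_csns, default=next_csn)


def prune_sparse_map(sparse, active_snapshot_csns, next_csn):
    """Drop sparse-map entries (xid -> csn) that every active snapshot sees as committed."""
    horizon = global_min_csn(active_snapshot_csns, next_csn)
    for xid in [x for x, csn in sparse.items() if csn < horizon]:
        del sparse[xid]  # visible to all snapshots: exact CSN no longer needed
    return horizon
```

If the active snapshots are read without a lock, a snapshot being taken concurrently can be missed, so a lock-free version must guarantee that the computed horizon may lag behind the true value but is never too new.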
Performance results: on the same 60-physical-core system, we achieved almost a 2.5x performance improvement. After tuning configuration parameters (the number of slots and shared buffers), performance improved to almost 5x the baseline.
Testing showed approximately 40% improvement in distributed scenarios compared to the baseline.
All of this is bounded by the machine itself, which has 60 physical cores. Once the ProcArrayLock contention is removed, the system becomes CPU-bound, and we are now looking at where the next bottlenecks are.
Performance scales well up to 70-100 cores, and the experiment shows that the ProcArrayLock bottleneck is resolved. We are now exploring the new bottlenecks and configuration tuning for even better scaling.
