Migration of a 5 Node Oracle RAC to PostgreSQL
Formal Metadata
Title | Migration of a 5 Node Oracle RAC to PostgreSQL
Number of Parts | 37
License | CC Attribution 3.0 Unported: You may use, adapt, copy, distribute and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided the author/rights holder is credited in the manner they specify.
Identifiers | 10.5446/49121 (DOI)
PGCon 2018 | 13 / 37
Transcript: English (automatically generated)
00:15
All right, so good afternoon, and I'm Avinash Valarapu.
00:20
As you can see here, you may call me Avi. That's simpler, just the first three letters of my first name. I'm currently working for Percona as a support tech lead, and I think most of us have seen the news that Percona started supporting Postgres as of July 1st.
00:40
So yeah, that's why you see Percona and myself here now. OK, so I'm going to talk about one of the POCs that we did some time back, which of course got implemented. It was a migration of a five-node Oracle RAC to PostgreSQL.
01:00
I think there have been several talks in the past where people covered how to migrate Oracle to Postgres, which data type to choose for what, how to migrate your stored procedures, and what tools are available. There is a lot of documentation available.
01:21
There are several tools like Ora2pg, the Schema Conversion Tool (SCT), or DMS; there are several options in the market, and everybody has been watching this space for quite a long time. So I won't repeat most of that, but I'm going to talk about an experience
01:41
that we had in the past, the problem statement at that point of time, and how we resolved it. What did we do in Postgres to make it Oracle-like, right? And also, it's just to give the trust and confidence
02:02
to everybody that if you think about migration, it's not "oh, an Oracle to Postgres migration, that's huge, I don't think everything that's possible with Oracle is possible in Postgres." It's to get away from all those thoughts. I was thinking to talk about only a few
02:23
of the things which are really important for all of us. Yeah, this is one of the diagrams that we have been looking at for quite a long time, numerous times: the DB-Engines ranking. We have touched that 400 score, and Postgres, along with MongoDB, has been the only database that has always
02:42
had that positive trend, which actually indicates that there are a lot of migrations happening to Postgres. And I'm not sure if everybody has seen the Postgres 11 release notes and the roadmaps for the next releases, but it should be quite interesting for all of us
03:00
to see where Postgres is going. I mean, if you think it takes time to get up to Oracle's features, that's actually not the right way of thinking, because it has already got to that stage, and there are actually much better things happening in Postgres, right? So yeah, I'm going to talk about the Oracle five-node RAC
03:26
to Postgres for sure, and what tools we used for the migration. And we'll also talk about backups: with Oracle Enterprise software we have RMAN and all the other backup tools, so what kind of backup tools can I use
03:41
for Postgres for that equivalent backup strategy that I can set up? And at the same time, what are the parameters that I need to really concentrate on once I want to migrate to Postgres or get a better performance with PostgreSQL? Will I get the same TPS as Oracle,
04:01
or will I get the same performance as Oracle if I have business logic, or if I have complex queries running? And apart from that, I have a lot of business logic. How can I tune my stored procedures? What can I do to see where the problem is? How can I modify my stored procedures
04:23
to make them fit well for Postgres? And I know that with the community or vanilla Postgres that I get, I may not get everything I need to make it like Oracle. Are there any other extensions, supported by the community or available
04:44
as open source extensions, of course, that I can use to achieve Oracle-like features? So we are going to talk about most of all this today. First, I'll get to the diagram of what we saw when I was talking to the customer.
05:03
So the customer said: OK, we have a five-node Oracle RAC, and we have 10 application servers connecting and doing a lot of writes. And the reason why I highlighted the storage is because we know it's a shared-storage architecture; I mean, a high-availability RAC with shared storage,
05:22
multi-master, you call it, or whatever, with the interconnect. So this was the diagram presented to us. Now, what was happening was that they had a five-node Oracle RAC and 10 application servers always writing, and even used for reporting.
05:41
You can see the specifications: each of the servers is a 48-CPU machine with 120 GB RAM, Oracle 11gR2, which they had recently upgraded to, OEL 7, and a database size of 800 GB. When I say database size 800 GB, I know most of us could be from an Oracle background.
06:00
It's a logical database size: 800 GB that is accessible from any of the nodes, not 800 GB in each RAC node, right? I mean, I know most of us are aware of Oracle RAC. So, 10 application servers, and what the customer said is: now, after new acquisitions, we have increased the product base.
06:22
Definitely, the transactions per second are expected to grow more, several thousands of vendors are being added, extensive reporting is happening, and there is license cost involved. So should we scale vertically or add more RAC nodes? What should we do?
06:41
Or can we think about Postgres at this point of time? Do you think you can save our license cost and get us the same kind of performance with Postgres, and help us achieve very good performance the same way? Now, OK, this is the point where most of the database administrators
07:01
or the developers or the program managers start thinking: OK, this kind of strategy may work, may not work, it's very complex. But in order to proceed further, what we did first was understand the history, where
07:22
they started from, how they got to five nodes, and what's happening on the machines right now. And by learning from the application, the way the application connects to the database, and the way transactions are happening in the database, can we see some pattern with which we can say, OK, we
07:43
can rather change the design this way, or migrate to Postgres using a similar design? We just wanted to see the history. What we understood was that when they started, they were a two-node RAC. They just had two schemas and 50 application modules.
08:03
So what happens was, among those two schemas, one was kind of a global schema that has got all the SKUs as well as the vendor details. So some kind of schema which is always static, and one schema which was always taking writes.
08:23
And then this slowly grew to a five-node RAC, with six schemas and 750 application modules. When I say six schemas, what was happening was that, leaving the common schema aside, there were five schemas which were actually kind of independent of each other. They're not dependent on each other, of course.
08:41
That's why there are five separate schemas, and each of the schemas corresponds to certain application logic. So there are certain types of orders that write to one schema, and certain types of orders that write to a second schema. That way, they had multiple schemas.
09:02
And depending on the type of order chosen on the web, they are redirected to the appropriate application server. And in Oracle, we know, whenever we are using RAC, most of the Oracle RAC environments we have observed have got dedicated services created on the database RAC.
09:22
I mean, each of the services has got a preferred instance and an available instance. So, for example, you have a five-node RAC in this example. You have service one, service two, service three, service four, and service five. You can see in this diagram that any application that's actually writing to schema one
09:43
will actually connect through service name one, which has got node one as preferred. And apart from that, in case node one goes down, it's redirected to the available node through the service name. So that is how high availability is achieved, right?
10:02
So what's happening is that these two application servers, for example, are configured or have got the logic to accept a certain type of order request, and have got application modules in place that write to a specific schema. And now, it connects to the service that redirects the connection to node one
10:22
all the time for schema one. And now, if node one goes down, it connects to the available instance node two, right? The reason why they have actually started writing to individual nodes is because they understood that over a period of time,
10:40
after five years, when they were trying to write to the same schema from multiple nodes, because of the interconnect or the global Cache Fusion, right, they were getting into a lot of contention issues, and so they started writing to the preferred nodes.
11:00
And now, OK, this history was kind of a good indication for us. Apart from the five schemas that actually take a lot of writes, there was one global schema which is common for all the schemas. So when I say one global schema,
11:22
it has got the SKUs and the vendor details and all that is common for all the schemas. So all the transactions use that global schema and depend on it. Now, they definitely need to write more. So will they vertically scale the server?
11:43
Can we create one PostgreSQL node that actually can take care of all the writes and reads and have multiple slaves that can help a lot of reporting for read queries because you know that a PostgreSQL slave can take reads, right? Just like the Active Data Guard or, I mean,
12:02
just like the option that you have with Oracle, a read-only standby with redo apply, right? So yeah, we were thinking to do that. And then we understood that over a period of time they'll be writing a lot more, and we thought, OK, instead of thinking about vertically scaling it,
12:22
let's see if we can horizontally scale it, right? And then what we actually came up with was this design, right? So we have one database, one database cluster,
12:41
with two sets of slaves. I'll explain a little better why we got to this structure. We have one database per individual schema: schema 1, schema 2, schema 3, schema 4, and schema 5. And for the common schema, we actually got the application logic modified in such a way
13:01
that, because it is just a 250-megabyte schema which hardly takes writes, I would say once a month or once every quarter when a new vendor list or a new set of SKUs is added, right, we modified the application logic. I mean, we suggested that they modify the application logic
13:22
to write the common schema, just the common schema, to all the databases, which is actually not a huge load for them. So now we got away from that RAC to a setup where every schema has got an independent database cluster with two slaves,
13:45
right? So before going there and explaining that to you in more detail: these are the steps that I feel are actually involved in a migration. First is the design. We definitely need to understand what the existing architecture is and then see how we can actually
14:05
go ahead and think about the current application architecture and application layer, and see how we can modify that to improve it. It's definitely the existing architecture and the history that are very important in order to go ahead
14:21
and proceed with the design, as we discussed. And then business logic conversion, there are several tools available to do that. We're going to talk about that as well. Data migration and post-migration. This is what I feel is involved in migration, but we could definitely get to a lot more stages, a lot more steps.
14:40
So we also started analyzing the current Oracle production database: what's running in the Oracle production database at this point of time, starting from CPU, memory, IO, and contention, what amount of bloat there is in the Oracle database, whether partitioning is enabled, and whether there are archiving and purging jobs.
15:01
And what is the working data set? You have several schemas, five schemas, that are taking writes on individual nodes, though it's an Oracle RAC, right? But what is the working data set? What we understood was that their reporting queries were running anywhere. For example, schema 1,
15:24
I mean, the node 1 that has got all the cache of schema 1 is actually taking reporting queries of schema 4 or schema 5. So whichever node has got the lowest load, they start going ahead and running their reporting jobs
15:40
there. So that way, the active data set that could actually sit in the cache was kind of fluctuating. We understood that in the beginning. And we also found out that their tables were not being maintained properly, so the database size that they currently have could actually shrink to a much lower size, and even the tables could.
16:05
So we analyzed that first. So we went into the existing Oracle database architecture. We saw how it is performing and what's happening in each of the nodes. What are the current issues? How many transactions are running? What are the tables that are actively
16:23
involved in transactions, right? What is the amount of the data set that's actually involved in transactions? Because that helps us design the PostgreSQL database server. So this is something that we were discussing some time back: each schema has 150 individual application modules
16:41
writing to it, and one common schema that is about 250 MB in size. Reporting queries run on all the five nodes, and for them, high availability is of course a concern; that is why they're using preferred and available nodes. So this is what we designed: one PostgreSQL high
17:03
availability cluster, one master, and two slaves. Now, why did we choose one master and two slaves? It is because, for them, license is a concern. Getting another server is not at all a concern. To enable high availability, most of us, what we do is,
17:22
we have our master, which is a primary database server in one region. And we have our disaster recovery environment, like the slave, or standby in another region. Now, we wanted to set up two slaves, one in the same region as a master, and another one, which is in a different region,
17:44
to enable high availability. And for them, one important thing was: as we are from an Oracle background, we might have heard about something called Flashback, where I could go and implement some DDLs or whatever changes
18:02
I could do, I could create a restore point, and I could flash back my database to that restore point. Now, is that possible with Postgres, was their question. And for that, this was the solution that we gave. We could set up a delayed standby, which
18:20
could be delayed by four hours, and their concern is gone. If they need to go ahead and roll back the changes that they have done, they can switch to the delayed standby and promote it. And that's how they can go ahead and kind of flash back.
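A minimal sketch of that delayed-standby setup, assuming streaming replication and PostgreSQL 12 or later, where the delay is an ordinary server parameter (on the 9.6/10-era releases discussed here it would go into recovery.conf, and promotion would be done with pg_ctl promote):

```sql
-- On the delayed standby only: hold replay back by four hours.
ALTER SYSTEM SET recovery_min_apply_delay = '4h';
SELECT pg_reload_conf();

-- Check how far behind the standby is replaying.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- If a bad change has to be rolled back: pause replay before the change
-- reaches this standby, then promote it.
SELECT pg_wal_replay_pause();
SELECT pg_promote();
```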
18:43
And if you do not want to do that in your environments, then, just like AWS says you can use their snapshots to do point-in-time recovery, you need to have good backup strategies with which you can achieve point-in-time recovery if you need to roll back your changes.
19:03
And I'm going to talk about one more experience that I had with one of the recent Oracle migrations, when I was working with Jim here, and how we achieved one more strategy for versioning of changes. So I'll talk about that.
19:21
And for databases like PostgreSQL, make sure that you read about or know about connection pooling, if you're not already aware of it. So we implemented the well-known connection pooler pgBouncer, and HAProxy, for connection pooling and load balancing. So HAProxy especially for load balancing,
19:43
and pgBouncer specifically as the connection pooler. Why a connection pooler is needed is just to ensure that we have persistent connections to the database that can be reused. Opening a connection and closing a connection is an expensive operation on the server. And as Postgres is totally process-based,
20:03
the postmaster forks a new process every time a connection is received, right? If it's not a persistent connection, that opening and closing of the connection is always an expensive operation, which can be resource intensive. And that is why you need to have a connection pooler in place if you do not have a good native application
20:24
connection pooler, and that was the case here. So we went ahead and suggested pgBouncer, and it was installed on all the application servers. And you have two good solutions for high availability for Postgres at this point of time.
20:42
There could be many more coming up, but: repmgr (Replication Manager) and Patroni. You may want to look into those, which can also allow you to enable automatic failover. In repmgr, you'll have a configuration file where you can go ahead and script your promote command.
21:01
When I say script your promote command: for the promote command you can use a bash or a Python script, whatever you want. You can code the steps that you would manually perform in the event of a failover. What is a failover? The master is not available, and I want my slave to be promoted to a master.
21:22
When I do that, I need to ensure that all my applications connect to the slave. And in order to let it connect to the slave, I may need to go ahead and modify the application connection strings. So the set of commands involved to do that and the set of commands involved to promote the slave
21:40
can also be coded. All that sequence of steps can be written as a script and passed to repmgr. When the daemon detects that the master is not reachable, it can go ahead and run that script as well. So depending on the environments that we are working in,
22:00
we can modify our scripts accordingly. And likewise, we also have Patroni, and that's really getting popular with Postgres. I would ask you all to look into that. It depends on a distributed consensus store; you can use etcd with it, which is more of a consensus-based system
22:21
that decides who should be the leader, and Patroni will actually take it from there. It has got good APIs, which can be used by HAProxy to detect which node is the master and which is a slave. So everything, the failover, can be automatic.
22:41
So yeah, I was just talking about the connection pooler and the load balancing thing. Why load balancing? If you are getting read requests, you can load balance them to both of your slaves as well, if you know that there are some reporting queries that could run on your delayed standby too.
23:01
If it's looking for some data from a day ago or more than a day ago, then it could actually be directed even to the delayed standby. And if you have any other reporting jobs that could connect to the slave, they can still connect. And I still remember the issue in Oracle, which is the "snapshot too old" error,
23:20
because in Oracle we maintain a global undo image, the undo segments. But in Postgres, every table has got its own undo, I would say, right? Older row versions are kept in the table itself, and only if Postgres understands that there is no query that's actually
23:43
depending on that older version can it actually get those versions off the table. I mean, there is a lot of process involved in that, like autovacuum and the other things that we can read about. Yeah, and talking about the business logic migration,
24:01
we actually saw about 150 stored procedures there, but they were not as complex as we had thought. We used Ora2pg. You could also use Ora2pg, or, depending on the instance that you have selected or are migrating to, you may want to use SCT or DMS.
24:21
In Ora2pg, you have very good configurable options where you can actually tell it what your Oracle database is and what your Postgres is, whether you want to directly apply the changes after converting your Oracle schema to Postgres, or whether you want to create script files. For example, you have 100 tables in each of these schemas.
24:45
When Ora2pg converts the schemas, it will create individual .sql files for all the schemas. And at the same time, it can also create individual .sql files for all the tables, procedures, functions, views.
25:00
It can do that for all types of database objects. And at the same time, you can also migrate all your data using Ora2pg. It uses COPY commands behind the scenes, and you can even use parallelism, parallel processes, in order to make that data migration faster.
25:21
And you can also say that for every chunk of records, like one million records, a parallel process should be created in order to migrate the data. And because we had modified the table structure, there were a few changes implemented
25:41
where the schema got changed. So we even used foreign data wrappers in that case, where PostgreSQL can talk to the foreign server, which is the Oracle server, select the data from there, and insert it into Postgres. So Postgres can talk to another heterogeneous database using a foreign data wrapper, right?
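A minimal sketch of that foreign data wrapper approach, assuming the oracle_fdw extension is installed; the connection string, credentials, and the app.orders / ORDERS tables are purely illustrative:

```sql
CREATE EXTENSION oracle_fdw;

CREATE SERVER legacy_oracle
    FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver '//oracle-host:1521/ORCL');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER legacy_oracle
    OPTIONS (user 'app_reader', password 'secret');

-- Expose one Oracle table locally, then pull its rows across,
-- reshaping columns for the new Postgres structure as needed.
CREATE FOREIGN TABLE ora_orders (
    order_id   numeric,
    sku        text,
    created_at timestamp
) SERVER legacy_oracle OPTIONS (schema 'APP', table 'ORDERS');

INSERT INTO app.orders (order_id, sku, created_at)
SELECT order_id, sku, created_at FROM ora_orders;
```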
26:01
So that's, again, something that you can use. And you can all look into it; there have been several presentations in the past on Ora2pg and so on, so I didn't want to go deeper into that and repeat it for everybody if they've already seen it.
26:20
And then, once we have done the migration, let's say all the stored procedures got converted, it's not that the migration is complete and the entire database is actually ready for transactions. You need to tune it.
26:41
And in order to tune your database, you need to have good resources that can actually help you do that. So if we're talking about stored procedures: how can I tune my stored procedures, right? You can explore plprofiler. That is a tool that can actually help you
27:01
go ahead and look into the stored procedure call stack. When I say call stack, it's basically every line of code and the time taken at every line of code or SQL statement. You can actually get an HTML report out of it and understand on which line most of the execution time is spent.
27:23
So plprofiler can help us tune the stored procedures in PostgreSQL. You just need to run plprofiler for the function, and when you run that, it generates an HTML report for you for the entire function code.
27:40
And that report tells you, okay, this FOR loop is taking more time, or there's a SQL statement running inside this FOR loop that's taking more microseconds, so let me see if I can tune this SQL, right? At the same time, if you want to go ahead and look at the time-consuming SQLs:
28:00
in Oracle, we may have the option of querying the DBA_HIST tables or all the other tables that can actually help us see which SQLs are taking more time and which SQLs we may need to improve, depending on the
28:21
execution times, et cetera. But in Postgres, everything is actually written to the log file, right? So you can go ahead and tune parameters like log_min_duration_statement, which helps you capture all those statements that run for longer than log_min_duration_statement. You can set it to one microsecond or one millisecond or one second or one hour.
28:44
Depending on the time that you set it to, those statements are logged. And auto_explain is something that actually helps you even capture the execution plans. One thing that you need to remember is that in PostgreSQL, unlike Oracle, you don't have soft parsing all the time. So if you use prepared statements,
29:02
the execution plan can be reused, but if you're not using prepared statements, it may have to go through hard parsing. And in that case, if you would like to know whether any of your SQLs has changed its execution plan, or what the runtime execution plan of a specific SQL statement is, and you need to see that, right?
29:21
Enable auto_explain. That will actually write an EXPLAIN ANALYZE output for you into the log file. So let's say you want to run a function and you want to see what SQLs are running inside it and what the execution plans of the individual SQLs are.
29:41
Enable session-level auto_explain and run it, then end the session. Only for that session, whatever you run within that session, the results are captured in the log file. Open the log file and look at it. At the same time, tune log_statement and log_line_prefix so that you can configure the way
30:01
the log should be printed in the log files. And on top of all this, you can actually go ahead and run your pgBadger report. So if you are new to Postgres, I think pgBadger is something that you need to look into; it is a log analyzer and it can generate an HTML report out of the log files
30:20
and get you a viewable report. You can just click on it and see, say, the time-consuming queries: okay, these are the time-consuming queries. Or, you know, things like the number of log entries, right? Or a lot of information related to vacuum;
30:42
all that can be visible in pgBadger reports. So all this actually helps you analyze the performance and concentrate on the queries that are taking more time, or helps you identify the problematic queries, or helps you identify the methods to tune your stored procedures, like running plprofiler.
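As a rough sketch of the logging workflow just described (the threshold and the function name are only illustrations):

```sql
-- Log every statement slower than 500 ms, server-wide.
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();

-- In one session only: capture execution plans for everything,
-- including the SQL that runs inside functions.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log every plan in this session
SET auto_explain.log_analyze = on;            -- include actual timings and row counts
SET auto_explain.log_nested_statements = on;  -- statements inside PL/pgSQL too

SELECT my_report_function();  -- hypothetical function under investigation;
                              -- its plans land in the server log for pgBadger to pick up
```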
31:02
And there are a few important parameters that you always need to tune for your PostgreSQL, right? shared_buffers: that is something like the database buffer cache in Oracle. When I say the database buffer cache in Oracle, that is the database's memory
31:20
which caches, you know, using an LRU algorithm, whatever blocks are frequently used, so that they can be reused. However, as PostgreSQL does not use direct IO, we have always been looking at magic numbers like 8 GB for shared_buffers,
31:40
or the documented 25% of RAM for shared_buffers, right? But that may not always be true. For one of the POCs, and especially this one too, we observed that when we got more RAM, enough to hold the whole active data set, setting shared_buffers to 75% of RAM
32:02
actually got more TPS than setting a smaller shared_buffers. So it's not always true that a smaller shared_buffers would help us; it always depends on the active data set. And if your entire database can sit in memory, then there's no point setting,
32:21
you know, 8 GB or 2 GB of shared_buffers, or 25% of RAM for shared_buffers. So you need to ensure that you run proper benchmarking before setting shared_buffers; without doing that, let's not just go by the documented formula. work_mem: look at work_mem carefully,
32:42
because I've seen environments where they set something like 512 MB or 200 MB of work_mem. It is the sort memory, so every sort can take that much memory from RAM, and your Postgres server could go out of memory. So you may be running a database
33:02
that only requires 8 MB or 16 MB of work_mem. Just check pg_stat_database and see if a lot of temp files are being generated, or look at the EXPLAIN ANALYZE output and see whether temp files are being created after a sort.
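For example, a quick way to check for spills to disk, using the standard pg_stat_database view:

```sql
-- Lots of temp files / temp bytes suggest sorts and hashes spilling to disk,
-- i.e. work_mem may be too small for the workload.
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
ORDER BY temp_bytes DESC;
```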
33:21
If yes, you may need to tune your work_mem; if not, stick with the work_mem that actually works for you. autovacuum always needs to be on; ensure that you do not turn off autovacuum in Postgres when you migrate. I mean, we see many discussions saying that we have to go ahead and perform vacuuming manually, and it's not something
33:42
that we can simply switch off on the Postgres server. Yes, it still needs to be on, and you should always tune the table-level vacuum settings like autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold, or, you know, autovacuum_analyze_scale_factor and autovacuum_analyze_threshold.
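A small sketch of such table-level settings; the table name and thresholds are illustrative, not values from the migration itself:

```sql
-- Make autovacuum and auto-analyze kick in earlier on a hot, heavily updated table.
ALTER TABLE app.orders SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```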
34:03
But always have some kind of manual vacuuming in place as well, which could be a weekly, nightly, or weekend job that runs a vacuum on the entire database or a set of selected tables. maintenance_work_mem: if you're loading a lot of data,
34:22
which could even involve creating indexes, maintenance_work_mem is something that can actually help you increase performance; especially when you're creating indexes or performing a vacuum, having more maintenance_work_mem can help you do that. And I've seen database environments where the default random_page_cost is set to 4
34:44
and seq_page_cost is set to 1, but recently, in one of the benchmarks, I've seen that when I was using an SSD, and when I knew that I had created efficient indexes, a random_page_cost of 1 was much better,
35:00
because what the default means is that fetching a random (index) page is costed as more expensive than fetching a sequential page from disk, so there could be a chance that Postgres chooses a sequential scan rather than using an index. So, since nowadays everybody uses SSDs, check whether you need to tune your random_page_cost.
35:24
And never try to turn off full_page_writes and fsync unless you know why you're doing that, because you are actually compromising the ACID properties of Postgres, right? That could lead to corruption, or, if Postgres needs to perform crash recovery,
35:45
it could fail doing that, so be careful. If you think you can disable full_page_writes because it improves performance, that's not always safe; you need a file system that supports you doing that, like a copy-on-write (COW) file system with journaling in place,
36:02
and again, it is not certain that it will help you. So it is on by default, and if you turn it off you may see very good TPS, but it's not always recommended, right? WAL compression: PostgreSQL writes all its WAL to the pg_xlog directory,
36:24
which is by default in the data directory, and which is like the redo log files in Oracle, right? All those transaction logs get generated into pg_xlog until 9.6, or pg_wal from Postgres 10. But I've seen OLTP environments
36:42
where, if it's actually a write-intensive machine, it can generate a lot of WAL, and that actually takes a lot of IOPS on the machine, and I did see great improvements when we enabled wal_compression. That means you are actually writing less to the disk and not using up all the IOPS,
37:00
so you can go ahead and enable wal_compression if you need to; it is something that writes less WAL. For example, in the recent benchmarks that I was doing, I could see that without wal_compression about 4,000 WAL files were being generated, 4,000 GB of WAL, and with wal_compression about 1,900 WAL files, right?
37:23
So that's fewer writes to the file system. And at the same time, always try to distribute your WAL and the logs across multiple file systems; do not write your data, WAL, and everything else to the same file system, right?
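A minimal sketch of enabling WAL compression and roughly measuring WAL volume; the LSN literal is just a placeholder for a value captured before a test run:

```sql
-- Compress full-page images written to WAL; a reload is enough, no restart needed.
ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();

-- Amount of WAL generated since an LSN noted earlier (placeholder value).
SELECT pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000000'::pg_lsn)
) AS wal_generated;
```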
37:44
So, log_min_duration_statement and all of that we already discussed, and finally the checkpoints, right? Frequent checkpointing can actually be a concern too, since you are writing a lot to the disk, so try to tune your checkpoint settings by tuning max_wal_size,
38:00
that is, the maximum amount of WAL that can be generated before a checkpoint is forced, right? We can set that to several tens of gigabytes, depending on the amount of transactions happening. And at the same time, checkpoint_timeout is something that forces a checkpoint
38:22
after that time. And checkpoint_completion_target: I've seen environments where they set it to 0.9. What does that mean? If checkpoint_timeout is set to one hour and checkpoint_completion_target is set to 0.9,
38:43
Postgres has 90% of the time before the next checkpoint to perform the writes, right? Ninety percent in the sense of 54 more minutes in order to, you know, scatter the writes after the checkpoint,
39:01
so that means you are not strictly flushing all the writes to disk immediately, which could get you into performance issues; it's distributing the writes. And hot_standby_feedback, right? Sometimes we have seen customers complaining
39:22
that reporting queries running on a Postgres slave are getting terminated because of some vacuum or a DDL running on the master, because the master does not know that your slave actually still needs those rows. In order to avoid that, turn on hot_standby_feedback, and also explore max_standby_streaming_delay and max_standby_archive_delay, which can help your slaves avoid killing long-running queries when there is a change happening on the master.
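A consolidated sketch of the checkpoint and standby settings just discussed; all values are illustrative and should come from benchmarking your own WAL volume and reporting workload:

```sql
-- Checkpoint spreading (on the primary).
ALTER SYSTEM SET max_wal_size = '20GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;

-- On standbys that serve reporting queries.
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET max_standby_streaming_delay = '5min';
ALTER SYSTEM SET max_standby_archive_delay = '5min';

SELECT pg_reload_conf();
```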
39:42
All these are very important. And now the question will be: is everything that's possible in Oracle available with Postgres?
40:02
Yes, I would say it is. It may not be readily available, but we can create it for sure. When I say we can create it: for example, I'm not sure if everybody has heard about edition-based redefinition in Oracle, right? There was one such feature called EBR in Oracle,
40:22
which helps you when, for example, you have 10 application servers and you want to deploy some code changes onto the application servers, and those code changes also have related database changes. Now, to avoid taking downtime for all the applications at once,
40:41
you may want to first run your code changes on the first set of five servers and restart them, and when you do that, you need to also run some associated database changes, but the other five application servers, which do not have the code changes yet, should still be able to see the previous image of the database.
41:01
That means it is versioning in the database, just like GitHub: I should be able to see one branch, and another app can see another branch. That is something called edition-based redefinition that's available with Oracle. So when we were working on another project, with Jim here,
41:21
who actually suggested, at that point of time while I was working with him, that this is the way we can achieve it, what we did was use event triggers and the search_path, and got all that magic done. So what we did is, on an event of an ALTER TABLE, right,
41:42
like an ADD COLUMN, DROP COLUMN, CREATE INDEX, CREATE SEQUENCE, or DROP INDEX, we found all such DDL statements that could really trigger a change to the database, and when that event happens, views are created.
42:01
What we did, and this could be a very huge topic, is that for every schema we have, we created some kind of edition schema, because in Oracle it's called editioning, right? The edition schema is a parent schema that has got views created on top of the actual schema,
42:22
and also, for every branch or every version you create, for every change or DDL that you implement, the table or object for which you implemented the change will have another view created in the new child schema,
42:40
and your application, when it connects, needs to run a function in order to know which view it should connect to. So it's all based on views, event triggers, and the search_path. It's just to tell you that everything you want is possible; we were able to implement versioning, even versioning, in PostgreSQL.
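A very small sketch of that mechanism, not the actual implementation from the project; all object names here (edition_admin, ddl_log, edition_v2, base_schema) are invented for illustration:

```sql
CREATE SCHEMA IF NOT EXISTS edition_admin;
CREATE TABLE IF NOT EXISTS edition_admin.ddl_log (
    logged_at       timestamptz DEFAULT now(),
    command_tag     text,
    object_identity text
);

-- Record which objects a deployment's DDL touched, so matching views
-- can be rebuilt in the new edition schema afterwards.
CREATE OR REPLACE FUNCTION edition_admin.track_ddl()
RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO edition_admin.ddl_log (command_tag, object_identity)
        VALUES (obj.command_tag, obj.object_identity);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER edition_ddl_tracker
    ON ddl_command_end
    WHEN TAG IN ('ALTER TABLE', 'CREATE INDEX', 'DROP INDEX', 'CREATE SEQUENCE')
    EXECUTE PROCEDURE edition_admin.track_ddl();  -- spelled EXECUTE FUNCTION on PG 11+

-- Each application session then picks its "edition" (a schema of views)
-- simply by setting the search_path.
SET search_path = edition_v2, base_schema, public;
```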
43:01
Right. History: can I look at what has been happening in Postgres? I have pg_stat_statements for sure, which is actually a counter; it's an extension, and when you create it, you can view or query pg_stat_statements and see all the SQLs that have run so far,
43:23
how many times they have run, the total number of executions, the total amount of time, minimum, average, shared blocks hit, local buffers; all the details are there. So you can snapshot that, have snap IDs created for every snapshot you collect, and compare the snapshots; that way you actually know more about the history.
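For example, a basic pg_stat_statements query that such snapshots would be built on (column names as of the 9.6/10-era releases; newer versions rename total_time/mean_time to total_exec_time/mean_exec_time):

```sql
-- Requires pg_stat_statements in shared_preload_libraries, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, total_time, mean_time, shared_blks_hit
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```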
43:42
So, flashback: I already discussed the delayed standby and point-in-time recovery. And a few other extensions: pg_buffercache, to see what's inside the PostgreSQL shared buffers, which tables are actively cached and used, and how much of each table is already cached.
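A small sketch of using pg_buffercache (and pg_prewarm, mentioned next) for that; the table name passed to pg_prewarm is illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Which relations occupy the most of shared_buffers right now?
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

-- Pull a hot table back into the cache, e.g. after a restart.
SELECT pg_prewarm('app.orders');
```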
44:01
pg_prewarm, to warm up your cache in Postgres, right; plprofiler, an important tool to tune your stored procedures; pldebugger; pg_partman for partitioning until 9.6;
44:21
and declarative partitioning since Postgres 10, and even Postgres 11, not yet released, but once it's released you will see that it also supports hash partitioning. So a lot of advanced features have already
44:41
been developed. It's just that everything we think of, we can make possible; we just need to have a requirement and think about the solution that we could put in place to achieve it. And finally, the backups that we proposed: pg_basebackup, and you can also use backup solutions
45:02
like pgBackRest or Barman, right? It may not be parallel, but you have something like RMAN by default, which is pg_basebackup, and continuous archiving plus pg_basebackup can actually be a good backup strategy for you.
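A minimal sketch of the continuous-archiving side of that strategy; the archive destination is illustrative, and the base backup itself would be taken with the pg_basebackup command-line tool:

```sql
-- wal_level and archive_mode need a server restart; archive_command only a reload.
ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET archive_mode = on;
ALTER SYSTEM SET archive_command = 'test ! -f /wal_archive/%f && cp %p /wal_archive/%f';
```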
45:20
Also have a weekly pg_dump and pg_dumpall in place to ensure that you are able to select or query your database, and to ensure that your database is not in a corrupted state, right? If your pg_dump is successful, it means that your database is not corrupted, most of the time.
45:40
And for something like DBMS_REDEFINITION in Oracle, which rebuilds a table online, you can use pg_repack, which can actually help you rebuild a table online, with some limitations of course. And routine manual vacuuming, partition maintenance, reindexing, and archiving and purging jobs:
46:02
these are a few of the maintenance operations that we have observed can help you keep your PostgreSQL healthy. So what we need to understand from all these exercises, and I may not have actually shown you how to use each specific tool, is just that you have a few extensions that actually help you achieve
46:23
features similar to Oracle's in Postgres, and also, in order to keep your PostgreSQL up and running, or to tune your PostgreSQL, you do have a few extensions that allow you to do that, and you have backup solutions for Postgres similar to Oracle's, right? So, even for warming up the cache,
46:41
or tuning your SQLs, or profiling your functions, or anything, you always have the tools around it in place, and even to migrate your business logic, you have tools. So, yep, just to help you understand: if you're new to Postgres,
47:00
and if you want to get away from Oracle, it's not that it's complex; it's just that we need to think and start doing it. That's everything from my side.
47:20
Any questions? Very extensive migration that you've done. So, is it fair to say that the main motivation was the saving in licensing costs? Because there wasn't a performance issue that you had to deal with. Like, if I'm not mistaken,
47:41
because there's 800 GB, and there are very few writes, so the majority of the workload would be in reporting, I would assume, or maybe even more. Actually, they were taking 400 TPS. It's huge writes, as well. However, their problem statement, as I discussed, was after a few acquisitions, they started adding new vendors,
48:01
and now, they are actually increasing their TPS more. They start hitting more orders. Now, they want to scale up. So, they wanted to see, I mean, when we have analyzed the existing Oracle nodes, right, they were already hitting their 90% of load averages, talking about, you know, starting from CPU or everything.
48:22
So, it's a must-have. They need to upgrade their existing architecture, existing servers, physically, right? So, now, they were kind of, should we vertically scale it up, or should we add more nodes, and scatter some kind of transactions to more nodes? But, when we start writing to two new,
48:42
I mean, one schema from two or three nodes, we had some contention issues in the past, so we don't know what to do. And, at this point of time, do you think we can also think about migrations to Postgres, and try to tune Postgres to take this kind of load? So, it was not just the license cost. It was also scale up.
49:02
So, a lot of things involved. So, at this point of time, they wanted to see if we can give a try to Postgres, and achieve the same features with Postgres. We know we can achieve this with Oracle. We can improve the hardware. We can add more nodes, or do whatever. But, can we migrate to Postgres, and reduce the cost? At the same time, achieve the same features, you know?
49:21
So, that's the main reason why this was in progress at that point of time. Just a follow-up question: did you do the migration in parallel, meaning the existing hardware and the OS were there, and then you had a...? Oh, yes, yes. We got new servers, switched over, and then decommissioned the old servers, yes.
49:44
So, the hardware and the OS were the same? There are changes, actually. There are changes. I mean, not much change; it belongs to the same family, like Red Hat, OEL, CentOS, so we used CentOS.
50:10
Kind of manufacturing, and e-commerce, especially e-commerce, yeah. Yeah. Yes.
50:28
Yes. So, to enable flashback-like features, they were interested in also getting a delayed standby, right? So, when I say delayed standby, it could be a standby that could be delayed
50:42
by four hours, you know, while applying. So if you look at the master and the first slave, right, the data that they have got may not be on the third slave, because it's a delayed standby, but that actually helps with faster point-in-time recovery,
51:01
or a flashback mechanism, like Oracle. Yes. Oh, the switchover, as I mentioned, it was like six hours, I think, yeah. Yes.
51:24
Yes, exactly. Yeah, but the cutover, usually, every time, when it's a huge migration, right? It is a big cutover, for sure, but there are, I mean, in the past, my friends and I have experience with even,
51:40
like, you know, Jim, and there are too many people who have used, like, you know, external, even open source, or the other tools, like, you know, even the Pentaho, or other ETL tools that actually help them create jobs that could migrate the data while the Oracle database is running,
52:01
and stay in sync, and you can even use Oracle GoldenGate, too, right? That can be a replication kind of thing. So, it depends on the client. If they are okay with the downtime, we can make it well planned. If they are not okay with the downtime, yes, there are solutions: people have used GoldenGate,
52:22
people have used Pentaho, and other ETL tools, you know, that could help you create some jobs to replicate data. Yeah, sure.
52:43
Yes.
53:08
Yes. Yeah, I mean, I know, like DBMS_JOB or DBMS_SCHEDULER; I am only aware of pgAgent,
53:21
but I may need to explore more on that, yeah. Yeah, there are a few extensions that could actually help you doing that.
53:40
Yeah, yeah, yeah, yeah. I hope there are no more questions, okay?
54:03
Thank you all.