PostgreSQL Distributed & Secure Database Ecosystem Building
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 287 | |
Author | ||
Contributors | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/57075 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
FOSDEM 2022259 / 287
2
4
6
8
12
17
21
23
31
35
37
41
44
45
46
47
50
62
65
66
67
68
71
73
81
84
85
86
90
92
94
100
102
105
111
114
115
116
117
118
121
122
124
127
131
133
135
137
139
140
141
142
145
149
150
156
164
165
167
169
170
171
172
174
176
178
180
183
184
189
190
192
194
198
205
206
207
208
210
218
220
224
225
229
230
232
235
236
238
239
240
242
243
244
245
246
249
250
253
260
262
264
267
273
274
277
282
283
287
00:00
Process (computing)Kernel (computing)DatabaseEncryptionRelational databaseWordOpen sourceArchitectureCustomer relationship managementCodeLine (geometry)Level (video gaming)SoftwareTwitterPoint cloudDatabaseAuthenticationProjective planeProxy serverCartesian coordinate systemMereologyCASE <Informatik>Computer programmingInformationQuery languageJava appletSoftware frameworkClient (computing)Shared memoryPhysical systemInterface (computing)RootHand fanFunctional (mathematics)High availabilityMobile appCustomer relationship managementElasticity (physics)Database transactionAcoustic shadowSoftware developerSet (mathematics)TwitterGroup actionConfiguration spaceRule of inferenceDefault (computer science)Expected valuePoint (geometry)Information securityServer (computing)Gene clusterInstance (computer science)Product (business)2 (number)CryptographySkewnessRegulator geneOpen sourceState observerDiagramComputer animation
08:37
AuthenticationEncryptionParsingProxy serverHeegaard splittingKernel (computing)Message passingLimit (category theory)LengthServer (computing)Client (computing)Data typeDatabaseStandard deviationRule of inferenceSystem administratorUniqueness quantificationPlanningQuery languageArithmetic meanTable (information)Java appletDifferent (Kate Ryan album)Proxy serverSkewnessDistribution (mathematics)Gene clusterCartesian coordinate systemVideoconferencingEncryptionMereologySphereServer (computing)Customer relationship managementInstance (computer science)Configuration spaceProcess (computing)Formal languageMathematicsClient (computing)Order (biology)DatabaseMultiplication signInteractive televisionPhysical systemCommon Language InfrastructureVirtual machineProjective planeService (economics)Point (geometry)Kernel (computing)Message passingDemo (music)CryptographyComputer animation
17:08
Proxy serverStandard deviationUniqueness quantificationSystem administratorPlanningRule of inferenceDatabaseDemo (music)PasswordExpressionTable (information)BenutzerhandbuchPlastikkarteContent (media)EncryptionProcess (computing)Table (information)Standard deviationProjective planeProxy serverView (database)Instance (computer science)Server (computing)DatabaseLogicAlgorithmKey (cryptography)CASE <Informatik>Selectivity (electronic)outputMereologyInformation securityInformationUniqueness quantificationMobile WebProduct (business)Computer animation
25:38
Element (mathematics)Multiplication signMechanism designDatabase transactionComputer animation
26:36
Multiplication signDatabase transactionPlanningPhysical systemAsynchronous Transfer ModeACIDProjective planeMeeting/Interview
28:03
Multiplication signData recoveryMassMeeting/Interview
28:56
Computer animation
29:32
Computer animation
Transcript: English(auto-generated)
00:05
Hello everyone, I'm Trista, the CFIES co-founder and CTO. All right, let's get started. So today I will give a talk around the following items. Most of you come to this track, you're interested in PostgreSQL.
00:22
You have more needs about PostgreSQL. Maybe you're a big fan of them. Then at this background, how do we make the PostgreSQL cluster become a distributed database system? That's the one issue. Another one is about cluster management.
00:41
As you know, you have a lot of PostgreSQL instance databases, clusters, but we cannot watch them every minute, every second. We want to have a system to help us to manage the database clusters. And yeah, that's another need, right? The next one is about the same observability,
01:03
because as we said, cluster management absolutely include the observability. Apart from that, we also have another needs or requirements about management, like skew out, skew up, or elastic skew,
01:21
or like make it become more robust, like your high availability. Something about that. And also the last one is about the data encryption, because as you know, there are more and more regulations or rules about user's privacy. So how to protect users more robust or more seriously.
01:46
That's another issue I will address today. Yeah, so because we have more requirements, we have more expectations based on PostgreSQL. So the next topic I will introduce
02:01
that will become interested in you, because I know you want to solve these issues, right? So Apache Sharing is a fear ecosystem. Why you want to use it? Before I answer the question, I just want to give you a overview about this project. First, a very important point is that it's open source.
02:22
It's the Apache Top Lab project. I guess most of the developers will know Apache Foundation, right? They have a lot of wonderful projects, like Hadoop, Kafka, right? They're very useful for our developers, for our DBA, and Apache Sharing Sophia,
02:41
it's one of Apache Foundation Top Lab project. And it's very active. I mean, it has a very diverse community. So the next part is about global development, because it's Apache Foundation project. So I know, I guess most of the users or come from the global overseas or everywhere,
03:05
as in that's our Twitter account. So if you're interested in this project or you know the recent activities from this community, I guess you can follow its Twitter account. The next one is about Cloud Native Computing Foundation,
03:20
CNCF, that's very, I mean, famous in called the native industry. And also Apache Sharing Sophia, it's one of the project of this CNCF landscape. That's the basic information about this community, about this project. The next part about its value or feature.
03:41
This project or this program located in during the application, among application or any users and the databases. So every queries will go through Sharing Sophia Proxy. And actually Sharing Sophia Proxy,
04:02
this ecosystem have two projects or clients for user to pick up. First is Sharing Sophia GDBC. Sharing Sophia GDBC, it's a lightweight Java framework or like because it implements notes of the GDBC interface.
04:21
So if your program is developed by Java, I guess that will be the best one for you to solve some like data sharing, data encryption, authentication, talent, such issues. And the next one is Sharing Sophia Proxy. Sharing Sophia Proxy, it's like, you can regard it as like PostgreSQL distributed server
04:46
or MySQL distributed server. I mean, you need to independently deploy this, I mean, projects. And then when you visit your like PostgreSQL,
05:01
you need to, inside of a visit in your PostgreSQL and send your requirements or, you know, requirements, requires into Sharing Sophia, this server. And then this server, this proxy will rewrite your SQL or to find which the PostgreSQL is the target
05:24
for this query, yeah. Another one is to do, is the sidecar. It's for the database called native. I guess we will have updates recently. So give us a watch or look. The next one is about a storage layer.
05:42
I mean, apart from PostgreSQL, it also support another databases because I want to say that, what's the definition of Apache Sharing Sophia? It's a distributed system, a database system above the traditional DBMS,
06:02
providing wonderful features, more than sharing. Yeah, that's the definition. So it will include more and more DBMS. PostgreSQL is one of them. And also about MySQL, that's two projects or that's for databases and will be supported by Sharing Sophia proxy.
06:23
But for JDBC, Sharing Sophia JDBC, this client, they have more because the root rhythm that this project just a lightweight JDBC interface or framework. So no matter what the database is, your app is always face the JDBC interface, right?
06:43
Or Sharing Sophia JDBC interface, yeah. So Sharing Sophia JDBC provides MySQL, PostgreSQL, SQL Server, Oracle, yeah. Okay, so the next part about the function layer, I mean, no matter you use a sharding JDBC or sharding proxy,
07:03
both share the same functions, like sharding, re-res-fleeting, elastic-skill-out, distributed transaction and data encryption, shadow database and distributed authentication, yeah. All right, and next is about a solution layer
07:22
because as you say, that this ecosystem, this project have a lot of, have two clients, have many features, support many databases. Then for the beginner, for the newcomers, it's a little complicated. So we just provide some like template configuration
07:45
or default settings for our beginners. If you want to have the PostgreSQL distributed database solution, then you can choose this feature groups
08:01
or this product or this template configuration. If you just focus on the data security, you can just pick like authentication and data encryption features and with or sharing Sophia proxy, you will make your PostgreSQL become a secure database with data encryption
08:25
and distributed authentication feature. Yeah, that's simple, right? So solution layer, just about the frequent, common user cases from our community. Actually, if you really dive into this ecosystem
08:41
that you can find that you can group different features, different story, just different clients and make it become a user-defined yourself solution. It's very cool, right? Actually, you can use data sharing and data encryption together and just to pick up the like MySQL or PostgreSQL
09:04
and choose like sharing Sophia proxy. You can make your database cluster become a secure distributed database system. What's the key point of this project? Actually, you can see it have a lot of project.
09:22
It can have a lot of features, but the basic kernel precise is simple, isn't concise because all of the queries will go through proxy and the proxy will know each queries. What's the meaning of each queries
09:41
because Prost can pass SQL, optimize your SQL. Yeah, so it knows what the meaning of people's requires and then it also know the database instance distributed. I mean, the local of each database. Therefore, it can rewrite your SQL
10:01
and reroute this SQL to a targeted database. It can help a user manage your database large clusters. Yeah, for the user, you just send your query. You just send your queries to proxy. That's all. You make the proxy as a database server.
10:26
It can skew out, skew up, skew down, and it can help you to manage your database clusters. But for the user, there is only one entrance. That is proxy. Yeah, so the sharding sphere proxy have the parser,
10:43
optimizer, router, rewriter, manager. It will merge the part of readouts and make it become the final readout and return the readout to our end users. That's all. So based on this kernel process, the sharding sphere could have, could include more and more like a feature plug-in,
11:03
SQL audit, sharding, high availability, rewrite splitting, authentication, encryption. No matter the feature, the proxy will do the same kernel process for each of them, basically, yeah.
11:20
But you can have the different configuration to use the different configuration to tell the proxy that your expectation, like use the SQL audit configuration and tell the proxy that you just want to make or this proxy or this distributed system
11:41
just want to do SQL audit action, or just want to have sharding action. Yeah, I see there is a very simple CLI querying, but actually there are many, many times the interaction between clients and your servers. And for a proxy, the proxy needs to cut down
12:04
each of the interaction and do some changes and then repackage the response and send to clients or to the server. Yeah, so when you use the where shot, you can see a lot of like the message in each of the packages.
12:20
That's an example. Next part, I will introduce two very common solution for you to use Apache Sharding Sophia. First one, it's about how to make your PostgreSQL become a distributed PostgreSQL database solution. No matter you use Sharding Sophia proxy
12:41
or Sharding Sophia GDBC, the basic idea or concept is the same. For the applications, imagine you only have a table named T order. Yeah, so you can write your SQLs with T order, like a select all of the columns from T order,
13:01
this table and no where condition, right? But for the sub tables or for the shards, actually in this case, there are two shards. I mean, two sub tables like T order zero, T order one from demo DS zero.
13:23
Same tables from demo DS one. Yeah, that's the table distribution in this distributed system and also all of the PostgreSQL tables, yeah. So, but you can see if you use Sharding Sophia proxy,
13:41
you need to independently deploy this project in a server or a machine and your application will first visit this machine or this service and then the Sharding Sophia will forward your queries. I mean, the queries need to do some modification
14:00
in Sharding Sophia proxy, within Sharding Sophia proxy and resend this modified query to the target one, maybe demo DS zero, maybe demo DS one or both of them, yeah. But if you use a Sharding Sophia GDBC, you can see you don't need to independently deploy
14:23
this project. It's like a framework, right? So you just need to make your like the video methods of the database using Apache Sharding Sophia GDBC,
14:41
this method to help your application to video database. Therefore, in the local, in your application local, you can just send the requires from the local, from Sharding Sophia GDBC and it will do some modification in each of the queries and forward these modified queries
15:03
to different PostgreSQL tables or instance, yeah. So you can see if your application, it's developed by Java and you want to have a higher TPS or QPS, Sharding Sophia GDBC is recommended.
15:21
But for Go language or Rust, I guess, or just for DBA, you have no experience about Java. Yeah, Sharding Sophia proxy will be better. Okay, so suppose you have deployed Sharding Sophia proxy
15:42
or use Sharding Sophia GDBC, you have your original PostgreSQL instance. The next one is about how to create a Sharding table. Yeah, so that will become interesting because we create a SQL-like language for you to create a distributed database
16:04
or distributed table in Apache Sharding Sophia. So the first question is about how to understand the distributed SQL. Here, I gave its definition. It's a built-in SQL language unique to Sharding Sophia
16:22
because as you know, your application or DBA, what's the first language comes to your mind when you want to interact with your database? Definitely, it's a SQL, no matter if it's a MySQL instance or a PostgreSQL instance, SQL, right? That's like the common for us, common practice.
16:43
But for Apache Sharding Sophia, also you can regard it as a distributed database like a solver, but at this moment, you want to interact with a solver and to make it or light it,
17:01
create some like a Sharding table or encrypted table. This feature, it's out of the like traditional DBMS, but with SQL, how to operate this wonderful feature, then it distributes the SQL stage.
17:21
It's their capability. Yeah, let me give you an example here. Like you just logging your Sharding Sophia proxy here, and then you create a database, demoDS, but I have to say demoDS is not just a standard database.
17:41
It has become the distributed database through Sharding Sophia proxy. And then you run the first distributed SQL of Sharding Sophia. It's at resource demoDS zero. How to understand this distributed SQL?
18:00
Because from your PostgreSQL prospect view, you think, it's something new for me. I haven't encountered this SQL before with my PostgreSQL. So, but it looks like a SQL. That's the distributed SQL. That's unique for Apache Sharding Sophia.
18:22
You just to run at a resource, make your proxy sense your first PostgreSQL instance. The next one, add a resource demoDS one. Yeah, then we'll make the proxy know that there is a demoDS one PostgreSQL instance.
18:45
Okay, so next one, you need to create a sharding table view. I mean, sharding table algorithm. What do you want? How do you shard a table? You need to split this logic table to make it become a four sub table or only two table.
19:04
It depends, it's up to you. But you need to tell proxy. Here, another distributed SQL is that you need to create sharding table view. Here, it define that this torder table will be located in demoDS one
19:22
and demoDS zero. And it will become four tables, four sub tables stored in two PostgreSQL instance. And the sharding algorithm, it's, oh, where?
19:43
Oh, sorry, here. It's like the sharding key is a user ID and the sharding algorithm is mod. User ID mod two. And by this sharding algorithm, you defined how to distribute your table torder.
20:05
Yeah, next one, you just ran the standards, the common create table SQL here. And then with this predefined table view, sharding algorithm,
20:21
this table will become actual four sub tables distributed in two PostgreSQL instance. Yeah, so next one, you can just run the standard SQL, like insert, select. And also here, another wonderful feature
20:42
and a wonderful distributed SQL. Preview your SQL. It's like the plain SQL in MySQL, because you want to know that if I run the SQL, this SQL will be executed in which part? Which PostgreSQL instance?
21:00
So when you tap preview this one, this SQL, and the readout site, oh, when you run this logical SQL, sharding SQL proxy will rewrite this logical SQL into the following actual SQLs. And which one will be run in which part, which instance?
21:22
It will show all of the information about this SQL here. Yeah, very useful, right? And that's why it's about when you log in the actual PostgreSQL, you will see the data distributed case here, like in the table, the first table,
21:43
the second one, third one, and the fourth one, yeah. So the next part is about secure post database solution. The same, I mean, there is the same products for you to choose, sharding SQL proxy
22:00
or sharding SQL JDBC. But here, the table name is being changed. Here, it's a table named t-detail, because at this case, you want to make your mobile phone of t-order be encrypted and decrypted automatically by sharding SQL proxy.
22:21
So here, you imagine there are sharding SQL proxy server or their database server. It has only one table, t-order. This table only has one column, mobile, right? But for the databases, actually, that's your actual database, actual PostgreSQL instance, there are t-detail table,
22:42
but it has another column named mobile cipher. Therefore, for users, because they visit the sharding SQL proxy, this server, they think this server contains all of the plain text of t-detail mobile column data.
23:03
But actually, when you visit the actual PostgreSQL instance, you found it's not the plain text. It become the cipher text of this column of t-detail. Yeah, so sharding SQL proxy or sharding SQL JDBC,
23:21
they will automatically encrypt user's plain text input, input it, and then decrypt the data from the database and return the decrapped data to our end users. So for users, they don't know,
23:41
they aren't aware of this whole decryption or encryption process. That's the magic of the sharding SQL proxy or JDBC. Same here, you need to use distribute the SQL to define your database, your table,
24:03
your encrypted view, you're just logging the sharding SQL proxy and light your sharding SQL proxy. Beware of this PostgreSQL instance, and then just create an encrypted table view
24:22
for this table t-detail. And the column, I mean here, mobile, it will use AES, encrypted algorithm, to encrypt your data and decrypt your data. Yeah, so the next one, you create this table
24:42
and then you'll insert. When you get the readout from sharding SQL proxy, you will see all of the plain text of your mobile phone. But if you log in your actual PostgreSQL, you will find all of the mobile phone
25:00
has been encrypted here. So even though some bad person logging your PostgreSQL instance, they can get nothing from your mobile phone. Yeah, that's all, yeah. So that's a very common cases from sharding SQL proxy.
25:20
Actually sharding SQL proxy can do more than secure solution or sharding solution. If you want to learn more, you can just give a look about this Twitter, about this project on the GitHub. All right, so thanks for your listening. If you have any questions, just contact me.
25:40
Yeah, thanks for your time.
26:03
All right, thank you very much for your amazing talk. Very, very interesting. So we are ready for going for some questions. We already have something. The first one is one I asked by myself.
26:21
So the first question is, how it manages the system, if it manages a global transaction within the shards, if there's any global transaction mechanism. Okay, all right. So it's my time to answer my question, right? Yes. Okay, so I think it's a very great question
26:43
because when we think about the digital system, the transaction global transaction in this digital system is very important. And about Apache Sharding Sophia, currently we support XA ACID transaction. Yeah, so if you use Apache Sharding Sophia project,
27:02
and if you want to use global transaction, I guess you can open the siting and let the Sharding Sophia know that you want to use the XA transaction mode. But I also think it's not very perfect in this digital system. So now we have open another issue
27:21
or another plan to create like the MBCC very new, I mean, a very new plan projects to make Sharding Sophia has the perfect digital system. Yeah, that's my answer. But I also think that XA transaction, it's great,
27:44
but it's not just for to handle with the back of the sharding system. So if we really want to support it a while, we also need another solution that is an outside MBCC plan next year.
28:00
Yeah, this year, 2022. Okay, understood. Thank you very much for your answer. I think we have time for just one question. After the, then we will move into a chat only. So if anybody wants to ask, so if one of the Sharding nodes is lost, can you disaster recovery by just restoring the node from the backup?
28:22
All right, so currently Sharding Sophia, it's created above the traditional DBMS. So we don't want to handle with the HA of your databases, not about your mass QL or PostgreSQL,
28:40
but when you found that one like a replica is crushed down and you can't use, distribute the SQL to tell Apache Sharding Sophia, then Apache Sharding Sophia will found, oh, there it's one.