How to solve data minimization in your SQL database
Formal metadata
Number of parts | 62
License | CC Attribution 4.0 International: You may use and modify the work or content for any lawful purpose, and reproduce, distribute and make it publicly available in unmodified or modified form, provided that you credit the author/rights holder in the manner specified by them.
Identifiers | 10.5446/59714 (DOI)
FrOSCon 2022 (47 / 62)
Transcript: English (automatically generated)
00:06
Welcome to my Sunday afternoon talk. After three years, we're finally in person again. My first talk today is "How to solve data minimization in your SQL database".
00:21
I have to do this talk in English because it's announced in English. Even if all of us here in the room might be able to speak German, there might be people on the stream who can't. So it doesn't matter whether we all speak German in this room or not: I have to do it in English because it's announced in English.
00:42
Or rather, I first submitted the English version of this talk, not the German version, and I think there was a reason for that. How to solve data minimization in your SQL database. I have some questions first, because this talk is announced in the law track.
01:04
But it's very technical. Do you know SQL? Who knows SQL? OK. Good. So what we will do here is this: we
01:20
look a little bit into the GDPR, and then we collect ideas on how we can solve this in the SQL database. Who is a developer? Who is an administrator? OK. Is anybody here an administrator
01:41
who is not a developer? No? OK. Good. I'm just asking because I want to know who you are. So let's start. I have some rules at the beginning. Hello. Yeah. I'm not doing individual consulting in this talk.
02:04
I think that's pretty clear. My slides don't have any standalone meaning. I once had problems with a huge IT company I worked for. I gave a talk, I don't know if it was here or in Brussels, and after that I got into really big trouble
02:20
because there was a sentence on the slides, and my company thought I was speaking badly about this company. I had really big trouble with this. So my slides, without the spoken word, have no standalone meaning. That's very important. Yeah. You know, copyright.
02:42
And yeah, I'm Mira-C. I was born Susanne Ebrecht, and through marriage I was Susanne Huerta and Susanne Holzgrefer. So, I don't know, I've started using Susanne Huerta again; it's easier. Mira-C is my official nick.
03:02
Yeah, I have to do this with a handheld microphone because there's no way to attach the headset to my clothes, so I have a little bit of trouble with these two devices in my hands. Let's start. Yeah, who am I? And now I hope that up there the colours are good
03:24
because here on my screen they are bad again. The rock team was here and checked the colours, because the colours were a little bit crazy. There is a problem with the wires here in the switch. So yeah, I have a diploma in computer science
03:42
and in computer and media science. I'm working as a senior consultant and trainer for PostgreSQL, MySQL and MariaDB, and also for data privacy and data security. I'm an author and speaker.
04:00
I write lots of articles, so you will find lots of articles of mine, and you will find tons of my videos on media.ccc.de. I've been doing this since the beginning of the century, so you will find very, very much stuff from me. I am a former member of ISO/IEC JTC 1/SC 32.
04:23
Do you have a rough idea what this could be? The ISO standard, the SQL standard? No, it's the group for data management and data storage. But that's the group that writes the SQL standard.
04:41
I was a member of it for many years. First I represented the USA, and then I represented Germany as a member of DIN, the Deutsches Institut für Normung. Yeah.
05:02
And I was there representing MySQL; that was my part in this group. Yeah. As I said, I'm a former developer of PostgreSQL and MySQL. It's very funny: I started with PostgreSQL as a student, and then MySQL hired me.
05:22
So, very funny. And when MySQL went to Oracle, I concentrated on being just a senior consultant and trainer. Yeah, and I have a past: last century I was also a construction and audio engineer. And that's just me.
05:42
So we start with the GDPR. The first question is: what do you think personal data are? The GDPR always talks about personal data. What do you think? What is it?
06:00
Any ideas? Any data that can be connected to you? Yes, that's good. Any other ideas?
06:37
Either data in a one-to-one relation, or data that are,
06:41
what did you say? Conclusions from it. OK, I will just show you the original GDPR law text. It says: personal data means any information relating to an identified or identifiable natural person,
07:02
which is called the data subject in the GDPR. An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data,
07:21
an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. That means if you have a set of data,
07:44
it doesn't matter what's in it: as soon as you can identify a single person by this set of data. Or rather, our law says not just a single person but a group of three persons; the minimum is a group of three persons. So if you have a set of data and you
08:03
can narrow it down to three persons in this group, then it's personal data. If you can only say "the people in cell one at 15:30", that's lots of people,
08:22
so you can't. You have the cell and lots of people, so you can't pick out five of them or anything. That's anonymous; you can't identify a single person. But our law says the group has to be bigger than three persons;
08:40
only then is it not personal data anymore. And it's always the whole set. It's not just the name or something like this; it's all the information you have. You see this very often with feedback forms, for example.
09:05
Feedback where you answer multiple choice, mark something, say where you come from, how old you are, and so on. Very often this feedback isn't really anonymous, because there are persons you can identify just
09:23
by the combination of answers they gave. So that's a very, very interesting job for privacy specialists, because they have to analyze whether these selections or these questions are really anonymous,
09:44
or whether there are combinations that relate to a group of only three persons. So, very interesting. It's not just the name or the passport ID; it's really the set of data that makes it personal.
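The privacy specialist's question, whether some combination of answers narrows down to too few people, can at least be screened mechanically: group the rows by the combination of answers and flag every combination that fewer people share than a chosen minimum. Below is a minimal sketch using Python's built-in sqlite3 module. The table `feedback`, its columns and the function `small_groups` are hypothetical, and the default minimum of three is taken from the talk; whether the limit is "three" or "more than three" is for your privacy specialist to decide.

```python
import sqlite3

def small_groups(conn, columns, min_group_size=3):
    """Return (values..., count) for every answer combination shared by
    fewer than min_group_size rows; such combinations may single out a person.
    `columns` is interpolated into the SQL, so it must be trusted input."""
    cols = ", ".join(columns)
    sql = (f"SELECT {cols}, COUNT(*) FROM feedback "
           f"GROUP BY {cols} HAVING COUNT(*) < ?")
    return conn.execute(sql, (min_group_size,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (region TEXT, age_band TEXT, rating INTEGER)")
conn.executemany("INSERT INTO feedback VALUES (?, ?, ?)", [
    ("north", "30-39", 4),
    ("north", "30-39", 5),
    ("north", "30-39", 3),
    ("south", "60-69", 2),  # the only respondent with this combination
])
print(small_groups(conn, ["region", "age_band"]))  # [('south', '60-69', 1)]
```

If the check reports anything, typical remedies are coarser answer options (wider age bands, larger regions) or dropping the question altogether. A check like this only covers the columns you pass in; it says nothing about what other data could be joined with the answers.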
10:05
And for discussing what we do in our database: how can we design our tables and our relational databases so that they are GDPR-compliant?
10:22
The talk covers just this one part, minimization, but we should know what the principles of the GDPR are. The GDPR has some principles, and they start with: personal data shall be.
10:43
Then there are lots of them. I put first what we are talking about: adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed. That's data minimization.
11:04
That means you should only put data into the tables of your SQL database (we are always talking about personal data; whatever else you put in doesn't matter) that are adequate, relevant and limited
11:21
to what is necessary in relation to the purpose. Very often, as a software developer, you have a problem with your customers: they want more. Usually they want more. And that's a big problem, because today you
11:44
could say, as a software developer: I don't mind, my customer wanted it, he's responsible. When they want more, I will do more. I am not responsible, my customer is responsible, so he will go to jail or get the big, huge penalty.
12:02
But there are laws coming the other way: the European Union and also the German government are discussing shifting this responsibility to the vendors
12:23
and to the developers, because they say the developers should be responsible too. The software vendors, especially some big companies, should be responsible too.
12:41
Not just the small company, the small, I don't know, electrician or whatever else, or the small restaurant owner should be responsible; the vendors and the software developers should be responsible too. That is a huge discussion.
13:02
And it looks like they will create some laws for that. So it's not just: OK, my customer wants me to store the whole life of these persons, whatever they do in bed or whatever, in my database, and I don't mind because it's not my responsibility,
13:21
it's the customer's or whoever uses this software. So it's recommended that you think about it, because the day might come when you are responsible. And yeah, for each piece of information you store, you should consider:
13:44
Is it adequate? Is it relevant? Do I have a purpose, and is it necessary for that purpose? That's the first important thing. The other is that you need to delete it
14:05
when it's not needed anymore. So data minimization also means you need to clean up your tables and your database very often, and you have to check: is it really needed,
14:21
or can I finally remove it? As you see, that was GDPR Article 5, paragraph 1, point (c): data minimization. And as I said before, it starts with: personal data shall be.
14:44
And of course those principles start with (a), not with (c). To solve minimization, we should look into all the other principles too, to be sure what we should do in the database. Of course we have
15:00
lots of other articles in the GDPR. But the principles are the most important ones for this talk, because we are talking about minimization and not about whether the user agrees of their own free will, whether I'm allowed to do it, and so on. So we are just thinking about how to solve minimization,
15:22
and not... there's something, "mini, mini", there's a typo in it, I just noticed. Minimization, anyway. So (a), the letter a, is lawfulness, fairness and transparency: personal data shall be processed lawfully, fairly
15:45
and in a transparent manner in relation to the data subject. The data subject is the person the data belong to. And the German word for fairness,
16:01
that's very funny in the GDPR. Very funny in the GDPR are the translations into German. Our authorities have a real problem with them, especially with the word fairness. I think you all have an idea what fairness could be, should be, and so on.
16:23
In German, it's translated as "Treu und Glauben" (good faith). Nobody knows why our authorities say that means fairness. So processing should be fair: it should be fair what you are collecting, and there are already court decisions on what's fair.
16:45
Especially against big companies that collected way more than they are allowed to. So lawfulness, fairness and transparency are very, very important, because when you collect data, even a single name or a birth date,
17:07
the person the data belong to has a right to know why, for what, and especially how long you will store it. Not from you, the developer,
17:20
but from the responsible person or the responsible company. The person has a right to know: hey, what do you need to store my birthday for? When I want something for which you need to be an adult, yeah, then I show my passport and you see I'm an adult.
17:41
Do I really need to store this information? Or is that enough? What happens when I'm not an adult? You won't store me anyway. So you can assume that all persons you stored are adults, and you don't need to store the birth date. So you really have to think about each datum
18:00
you store: do I really need it? Is it fair to store it? And transparency: how do I explain why I store it? Because, believe me, your customer will try to make you handle these transparency rules.
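The birth date example translates directly into schema design: if the only purpose is checking that a customer is an adult, the application can check the date once at registration and persist nothing about it, because every stored customer is an adult by construction. A minimal sketch with sqlite3; the table `customer`, the functions `age_on` and `register_customer`, and the age limit of 18 are assumptions for illustration.

```python
import sqlite3
from datetime import date

ADULT_AGE = 18  # assumption: the age limit relevant to your purpose

def age_on(birth_date, today):
    """Completed years between birth_date and today."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def register_customer(conn, name, birth_date, today=None):
    """Check adulthood once at registration; never persist the birth date."""
    today = today or date.today()
    if age_on(birth_date, today) < ADULT_AGE:
        raise ValueError("customer must be an adult")
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")
# No birth_date column: the schema itself keeps the birth date out of the database.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
register_customer(conn, "Alice", date(1990, 5, 17), today=date(2022, 8, 21))
```

Leaving the column out of the table is the point of the design: nobody can later start filling a birth date field "just in case" without a visible schema change.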
18:24
To say: hey, tell me why you store it. Because the customer is too lazy to follow these rules, they very often want you to write the document explaining why you store it, how long you store it and how you will delete it. So you have to think about each datum you are storing
18:44
that could be personal data: about this transparency, and that also means about how long it will be stored. Then we have purpose limitation. OK, it's very long and it has exceptions
19:00
for government and so on. So data shall be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes. So for each datum you are collecting, you need a purpose.
19:20
As I said, with the age, you could also say: I want to have the birth date because I want to write you a birthday card or give you a birthday present each year. Lots of companies do this. Another reason for the birth date: I think there's a law in the banking area, the credit card area,
19:42
where you have to store the age, the birth date. Yeah, but the birthday card is very funny, because you can only store the birth date for it with the consent of this person; there is no law that allows you to store it just for a birthday present.
20:04
So the person needs to agree to it. And consent is always a little bit tricky, because it depends on free will, and being sure that it's really free will and not social pressure or whatever else is very, very difficult.
20:21
So these agreements are always very, very tricky, and often they won't be legal because free will isn't given. But now let's say you have this agreement.
20:44
What do you think: how long are you allowed to store it? I just took this birthday card and present example. Any ideas?
21:02
That's it, yeah? Ah, yeah. I think as long as you have the purpose, or until the person revokes their consent. Yes, of course. When the person says you are not allowed to do it anymore, then of course you immediately have to
21:22
stop doing it and delete it. Yes, but you can't store it forever. There's... yeah, you have an idea? Six years is the idea here.
21:40
Maybe it's Saxony or Bavaria. Could be. I'm not sure, but I think Schleswig-Holstein or Hamburg says one year, and then you need to ask again. And another authority says three years, so a maximum of three years. But I also said Saxony was three years,
22:02
so that was just for fun, yeah? Also, if the purpose continues... I'll give you an example. We give long-lasting tickets, on a monthly basis, to our customers, and they keep them for years.
22:22
How many? Four years? And we have the purpose for many years, a purpose to store the birth date. Yeah? Yeah. I didn't know we are forced to delete it just after some time.
22:46
There are laws for storing the birth date. As I said, for credit cards and so on, there are laws for some purposes. I just took the birthday case, where the only reason you store the birth date is that you want to send birthday presents.
23:01
That was my example at the beginning. But yes, you have a purpose. When you look at Amazon or Bahn... oh, I tried it, I was so annoyed in the end. You have this credit card, this payment. Amazon and also Bahn ask you, if you buy a Bahn ticket
23:24
or buy something from Amazon, whether they are allowed to store your credit card data, your banking data. And you say no. I gave up after they asked me 20 times or so; I was so annoyed that I said yes.
23:41
But when you watch it, they ask you every three years. When you have a new credit card (I think credit cards are valid for two or three years. Or four years? Yeah, yeah, that's OK), when you fill in a new credit card, they ask you again: shall we store it for next time?
24:06
Bahn, exactly, Bahn: when you get a new BahnCard, they ask you. It's very funny. So there is this limitation, and I think four years is also fine. But you can't just store it and say: OK, I'll still have it in 10 or 20 years. There are companies that have this data
24:22
for 20 years and more, and people don't know about it anymore. So you have to find a process to ask again. As I said, Amazon and Bahn just check when the credit card is no longer valid, and then they ask again. It's just an example.
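A process to ask again can be driven by a date stored with each consent: a periodic job lists the consents older than the chosen period, and the application asks those people again. A minimal sketch with sqlite3; the table `consent`, its columns, the helpers `years_before` and `consents_due_for_renewal`, and the three-year period (one of the figures mentioned in the talk; which period applies is a legal question) are all assumptions.

```python
import sqlite3
from datetime import date

RENEWAL_YEARS = 3  # assumption: one of the periods mentioned in the talk

def years_before(day, years):
    """Same calendar day `years` earlier (29 February falls back to 28 February)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def consents_due_for_renewal(conn, today):
    """Consents granted RENEWAL_YEARS or more ago: these people must be asked again.
    Revoked consents are assumed to be deleted immediately, so they never appear here."""
    cutoff = years_before(today, RENEWAL_YEARS).isoformat()
    return conn.execute(
        "SELECT person_id, purpose FROM consent "
        "WHERE granted_on <= ? ORDER BY person_id", (cutoff,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consent (person_id INTEGER, purpose TEXT, "
             "granted_on TEXT NOT NULL)")  # ISO dates compare correctly as text
conn.executemany("INSERT INTO consent VALUES (?, ?, ?)", [
    (1, "birthday_card", "2018-03-01"),
    (2, "birthday_card", "2021-06-15"),
])
print(consents_due_for_renewal(conn, date(2022, 8, 21)))  # [(1, 'birthday_card')]
```

In this assumed workflow, a renewed consent updates `granted_on`, and a consent that is not renewed is deleted together with the data it covered.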
24:41
But you should be careful when you rely on agreements. Without an agreement, it's different when there is another lawful purpose, or when you have archiving laws; then of course you need to store it, that's a purpose. And that's also... I didn't read it here.
25:00
Is there a sentence about archiving here? No, it's not in purpose limitation. But there are also special rules for archiving, because you have laws for archiving, and then you have a purpose: you store this data for archiving.
25:21
But in your transparency documents you need to say that you do it for archiving reasons. Ah, here it is, archiving purposes; it's given here in the third line. Then we have accuracy.
25:43
Accurate and, where necessary, kept up to date. So personal data shall be accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed,
26:03
are erased or rectified without delay. That means: when you don't know whether the data are correct, delete them. So, are they right? Are they correct?
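One way to apply "when you don't know whether the data are correct, delete them" is to treat data that were never confirmed as not known to be accurate; for example, an account whose e-mail address was not confirmed within a grace period gets deleted. A minimal sketch with sqlite3; the table `account`, its columns, the function `delete_unconfirmed` and the seven-day window are assumptions for illustration, not something prescribed in the talk.

```python
import sqlite3
from datetime import datetime, timedelta

CONFIRMATION_WINDOW = timedelta(days=7)  # assumption: choose your own grace period

def delete_unconfirmed(conn, now):
    """Delete accounts whose e-mail address was not confirmed within the window."""
    cutoff = (now - CONFIRMATION_WINDOW).isoformat(sep=" ", timespec="seconds")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM account WHERE email_confirmed = 0 AND created_at < ?",
            (cutoff,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT NOT NULL, "
             "email_confirmed INTEGER NOT NULL DEFAULT 0, created_at TEXT NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?, ?)", [
    (1, "a@example.org", 1, "2022-08-01 10:00:00"),  # confirmed: kept
    (2, "b@example.org", 0, "2022-08-01 10:00:00"),  # unconfirmed and too old: deleted
    (3, "c@example.org", 0, "2022-08-20 10:00:00"),  # still inside the window: kept
])
print(delete_unconfirmed(conn, datetime(2022, 8, 21, 12, 0, 0)))  # 1
```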
26:21
There was a company that does this very often: as soon as they don't know, or aren't sure about the person, they immediately delete the whole account. It was one of the social networks; I can't remember if it was Facebook.
26:42
I think it was Facebook. Facebook or Google, I'm not really sure which one. So, next: personal data shall be... That's storage limitation. That's also, again, related to minimization.
27:00
Kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; we discussed this already, we also had it before. Personal data may be stored for longer periods insofar as the personal data will be processed
27:23
solely for archiving purposes: that's the archiving, in the public interest, scientific and so on. Yeah. So that's storage limitation. Again, very clear.
27:41
You are not allowed to store what you don't need or what you don't have an agreement for. The only exception is archiving. And public interest is very, very rare when you have personal data.
28:01
In Germany it's rare; in Austria it's not so rare, because in Austria politicians are persons of public interest. In Germany most are not. Mostly they are not; that's tricky.
28:20
Some artists are. You have that too, but usually public interest is very, very rare. Do you remember the first person who had corona in Germany? There was very interesting information
28:40
going through the press. First of all, the press said the person was male. OK, 50% of Germans are male, so 50% fewer. Then, it was an adult. OK, there were 8.5 million kids,
29:04
so we have 8.5 million people fewer. And he came from Bavaria. Then there was, I think, the village he came from.
29:21
So there were two or three more pieces of information, and every journalist knew which hospital and which room this person was in. It was a really, really big press error, I think, because it would have been enough
29:40
to say it's a male in Bavaria. You can imagine what happens in front of the hospital with journalists and so on. So be careful here. Did you want to say something? Be careful here with what you are storing
30:02
and how. It's just an example of how fast you really get to an identified person. So, yeah: was it in the public interest who it was, or was it just in the public interest that we have one case? In my opinion it was just: we have one person;
30:23
it doesn't even matter if male or female. One person has corona. And whether he comes from Bavaria or Hamburg doesn't matter either; it was in Germany. Yeah, to protect this single person,
30:45
you shouldn't have too much information. Yeah, we have storage limitation; I think that's pretty clear. And then: personal data shall be processed
31:01
in a manner that ensures appropriate security of the personal data, including protection against unauthorized or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organizational measures.
31:20
That's integrity and confidentiality. For us it's pretty clear; for your customers maybe not. And that's the protection I already mentioned.
31:43
You need to protect the data. Or rather, your customer needs to, but customers usually try to say: oh no, we are not responsible. I have already seen contracts for software developers where the customers tried to give
32:01
the responsibility to the developer. At the moment the courts protect developers and say: no, the law says you are responsible, and you can't give the responsibility to someone else, because, yeah,
32:25
when you really do something against the law, you can't say: no, it's not me, I won't go to jail, this other person will go to jail. No, because there is always jail behind it, or a big penalty, or both.
32:44
It depends on what you are doing with the data, but the option of jail is there. So now I want to discuss with you, and I see a lot of you have a laptop, and there aren't so many of us, so it should work.
33:05
How to implement it in your SQL database: how can we solve this, so that we have some technical solutions for this law?
33:23
That's the end of my slides. Just wait, I have to do it this way. So, I have this pad.
33:42
It would be nice if we collected ideas together. The pad has a very, very simple URL, you can see it: it's pads.ccc.de, and then it's, come on here,
34:01
it's FrOSCon minus MiraC. Or you can just say what ideas you have: how can you solve minimization in your SQL database? Do you have any ideas? Something like how you can solve automatic deletion
34:24
or so. Ideas? There's a microphone behind you.
34:40
One important problem, I think, is the database design and the constraints you put on your data. For example, sometimes you have data sets that contain both personal data and non-personal data. In these cases, deleting the personal data
35:02
might result in incomplete data sets. So in these cases you need to think about your database design in order to even allow deleting personal data. That is... How would you redesign this?
35:22
So there are two possibilities. One possibility is creating a separate table for data that has no relation to any person. After the three years run out,
35:42
you copy your data set over into this second, archive-like table that does not contain any personal data. The other way would be to allow certain data fields to be NULL, but then your application has to deal with that.
36:01
You have to go through all your source code and check that it can deal with nullable fields. Another possibility, if you have strings or something, might be easier: you could just replace them with dummy data. You could. So, dummy data to overwrite the original data.
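Both variants from this discussion, clearing nullable personal columns and overwriting NOT NULL personal columns with an anonymous dummy value, fit into a single UPDATE that leaves the business data in place. A minimal sketch with sqlite3; the table `orders`, its columns, the placeholder text and the function `anonymize_order` are hypothetical.

```python
import sqlite3

def anonymize_order(conn, order_id):
    """Overwrite the personal fields of one order; keep the business data."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE orders SET customer_name = 'anonymized', email = NULL "
            "WHERE id = ?", (order_id,))

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,   -- NOT NULL, so overwrite with a dummy value
    email TEXT,                    -- nullable, so it can simply be cleared
    amount_cents INTEGER NOT NULL  -- business data, kept
)""")
conn.execute("INSERT INTO orders VALUES (1, 'Alice Example', 'alice@example.org', 4999)")
anonymize_order(conn, 1)
print(conn.execute("SELECT * FROM orders").fetchall())  # [(1, 'anonymized', None, 4999)]
```

The dummy is a constant on purpose: a value derived from the original, such as a hash of the name, could be recomputed from a known name and would link the row back to the person. Old values may also still exist in backups, which need their own deletion process.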
36:25
Yes. This is something. Dummy, yes. But what about overwriting the original data with dummies? What about a column "last updated"
36:41
or something, so that you know how old the data are? This is important too, of course. OK, yeah. What did I write? Overwrite the original data with anonymous dummies. Yes. Then we have, of course, a date.
37:03
A date column, last updated; or what I usually do in databases: I have a date column "delete on".
37:23
Date to delete. So I can run a cron job, a daily cron job, that looks for the date to delete and deletes those rows. Wait. Or date. Hello, this is Patrick.
37:41
There aren't so many of us, so. Or date to delete. Deletion date. Deletion date, or date for deletion, or deleting, so.
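The "date to delete" approach can be sketched as a column that is filled when the row is written, plus a function that a daily job (for example cron, as mentioned) calls to delete everything that is due. A minimal sketch with sqlite3; the table `contact`, the column `delete_on`, the index name and the function `purge_expired` are hypothetical.

```python
import sqlite3
from datetime import date

def purge_expired(conn, today=None):
    """Delete every row whose delete_on date has been reached; return how many."""
    today = today or date.today()
    with conn:  # commit on success, roll back on error
        cur = conn.execute("DELETE FROM contact WHERE delete_on <= ?",
                           (today.isoformat(),))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE contact (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    delete_on TEXT NOT NULL  -- ISO date, decided when the row is written
)""")
# Index so the daily job does not have to scan the whole table.
conn.execute("CREATE INDEX contact_delete_on ON contact (delete_on)")
conn.executemany("INSERT INTO contact (name, delete_on) VALUES (?, ?)",
                 [("Alice", "2022-08-01"), ("Bob", "2025-08-21")])
print(purge_expired(conn, today=date(2022, 8, 21)))  # 1
```

Making `delete_on` NOT NULL is a design choice: every code path that stores a row is forced to decide when the row will be deleted.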
38:09
Any other ideas how to solve minimization? When you store data, how could you help yourself in the process
38:22
so that you can easily remove the data from the database? Here are some ideas.
38:42
Or have you seen databases where it was done that way? What do others do? When you looked into a database schema, what did you see them do in their schemas
39:01
or in processing for data minimization? You said something about archiving, just archiving the data you need? Yeah, using a separate structure,
39:22
a separate structure that allows saving the data without the personal bits. This basically prevents you from trying to process this data with code that expects the personal data to be present. So if you don't want to overwrite it with dummy data,
39:43
that might be a solution. You might have purposes for which you need the other data that's not person-related. Then you can use this separate table for those applications and use different parts of the program with different data structures.
40:03
But then you still have the data in your database? If you delete it from the structure that contains personal data and transfer it into a different table that contains no personal data, because the structure doesn't require it, then you don't have the personal data anymore.
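The "separate structure" idea can be sketched as a move in one transaction: copy the non-personal columns of old rows into an archive table that has no personal columns, then delete the originals. A minimal sketch with sqlite3; the tables `orders` and `order_archive`, their columns, the cutoff date and the function `archive_old_orders` are hypothetical. As discussed at the start of the talk, the remaining combination of columns can still single someone out in a small data set, so the archive deserves the same scrutiny.

```python
import sqlite3

def archive_old_orders(conn, cutoff_iso):
    """Move orders older than cutoff_iso into order_archive, which has no
    personal columns, and delete them from orders. Runs as one transaction."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO order_archive (order_date, product, amount_cents) "
            "SELECT order_date, product, amount_cents FROM orders "
            "WHERE order_date < ?", (cutoff_iso,))
        conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_iso,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    order_date TEXT NOT NULL,
    product TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
-- No order id and no customer column: no direct link back to a person.
CREATE TABLE order_archive (
    order_date TEXT NOT NULL,
    product TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
""")
conn.executemany(
    "INSERT INTO orders (customer_name, order_date, product, amount_cents) "
    "VALUES (?, ?, ?, ?)",
    [("Alice", "2018-05-02", "ticket", 4999),
     ("Bob", "2019-01-15", "ticket", 4999),
     ("Carol", "2022-07-30", "ticket", 5999)])
archive_old_orders(conn, "2019-08-21")
```

Code that works on `order_archive` cannot accidentally expect a customer name, which is the benefit mentioned in the discussion.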
40:23
But you can still use the entity for some purpose. If you have an entity that consists of person-related data and non-person-related data, that's the problem I was talking about. You also have other problems, because sometimes you have processes
40:42
where you have to store some specific data but not all. For example... yeah, that's not personal data, it's company data. But there are cases... Now, you are in construction.
41:01
In construction you have this with construction work, because in construction the tax law says that even when you are a private person, the company has to write, in Germany, a bill to you with your name on it. So a personal bill.
41:21
For everything else, they say not for private persons, but in construction, when you build a house or have construction work done on your house, you will get a personalized bill, a receipt. And this has to be stored for 10 years, archived for 10 years.
41:40
So that means, OK, you have this name stored with it. But there might be... OK, in construction I don't really have an idea what could be so personal. Yeah, but the question is: there might be some other data about you stored as well
42:02
in this whole process. Let's say, yeah, I don't know. Birth date, maybe. Birth date, yeah, a birth date for whatever you needed it, something not related to this receipt. Where you don't have any law for archiving it,
42:25
you should delete it. So you have parts you should delete and parts you should not delete, and it depends on your design. When you design a database, you should keep in mind that you have parts that need to be deleted,
42:41
so maybe you don't want to store them in the same table, or not with the same constraints. Separate them so that you can delete them, overwrite them, or make them anonymous. And yeah, another table would be an idea,
43:09
and that's very tricky, because for each piece of information you store you have to think: wait, what does the GDPR say? What do I want to do with it?
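The construction example suggests splitting one business record by retention rule: the invoice data that tax law requires you to archive (10 years, according to the talk) in one table, and extra personal data without any archiving obligation in a separate table that can be deleted on its own. A minimal sketch with sqlite3; the tables `invoice` and `invoice_extra`, the helpers `add_years` and `add_invoice`, and the 10-year constant are assumptions to be checked against your actual legal obligations.

```python
import sqlite3
from datetime import date

RETENTION_YEARS = 10  # per the talk's construction example; verify for your case

def add_years(day, years):
    """Same calendar day `years` later (29 February falls back to 28 February)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)

SCHEMA = """
CREATE TABLE invoice (          -- must be archived: tax law
    id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    issued_on TEXT NOT NULL,
    retain_until TEXT NOT NULL
);
CREATE TABLE invoice_extra (    -- no archiving obligation: deletable on its own
    invoice_id INTEGER PRIMARY KEY REFERENCES invoice(id) ON DELETE CASCADE,
    birth_date TEXT
);
"""

def add_invoice(conn, name, issued_on, birth_date=None):
    """Store an invoice with its retention date; extra data goes to its own table."""
    with conn:
        cur = conn.execute(
            "INSERT INTO invoice (customer_name, issued_on, retain_until) "
            "VALUES (?, ?, ?)",
            (name, issued_on.isoformat(),
             add_years(issued_on, RETENTION_YEARS).isoformat()))
        if birth_date is not None:
            conn.execute("INSERT INTO invoice_extra VALUES (?, ?)",
                         (cur.lastrowid, birth_date.isoformat()))
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
inv = add_invoice(conn, "Alice Example", date(2022, 8, 21), birth_date=date(1980, 1, 1))
# The purpose for the birth date has ended: delete it, the invoice stays.
with conn:
    conn.execute("DELETE FROM invoice_extra WHERE invoice_id = ?", (inv,))
```

`ON DELETE CASCADE` also removes any remaining extra data once the invoice itself is deleted after `retain_until`, for example by a daily deletion job.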
43:21
And very often in modern databases you see "updated", "last updated", "date to delete", so that you already know how old a data set is. But there's more. And when you are able to design your database, because you are creating new software,
43:41
then of course you should think about it. It's very hard to redesign an existing database. But that's exactly where our authorities today say:
44:00
it's been six years already. The GDPR has existed since 2016, and now they say you had six years to redesign your database. And if you didn't, well, they don't say it to you, they say it to your customers,
44:21
because they are the ones responsible. They say: you had six years to redesign your software, or to switch to other software, or to redesign your business. Unfortunately, they do this with smaller companies and not with huge ones.
44:42
But the authorities are getting stronger, way stronger. Look at how they go after huge software products that aren't GDPR compliant. Even with small software, it's very easy for an authority to say:
45:02
you're not allowed to use this software anymore. So you can't just say, okay, I don't care, it's not my responsibility, it's the customer's. Because then the customer won't be able to use your software anymore.
45:23
So in the end it comes back to you. Do you have any other ideas how we can do this minimization? Or, to ask you: do you have any ideas what to tell the customer when they want to store lots of things they aren't allowed to?
45:47
For example, when they want to issue a bill or receipt to private persons with name and address, and that's not allowed? Any ideas?
46:15
And that's the tricky thing for software developers today.
46:21
It's sad, but very, very often you need to know the law too. Often your customer says: I want every person to get a receipt with their full name and address, because of the guarantee.
46:43
Because of the guarantee. Okay. Because tax law doesn't work as a reason: tax law says that for private persons you just need a receipt without a name, not a personalized one, just what we in Germany simply call a receipt.
47:04
No name on it. The exception is construction work. So what do you do? When you go into an electronics store today, you really do get a receipt without a name.
47:23
Very interesting. And what do they do about the guarantee? You have the receipt. Show the receipt and you get the guarantee. You don't need the name for the guarantee. And that's very tricky, because your customers
47:40
try to get these names, and very often the real reason isn't that the name should be on the receipt. The real reason is that they want to use it for marketing. And marketing in turn brings in spam, spam law, and so on.
48:01
So you can put an identifier on your receipt instead of the name, just for the guarantee, or in case something is wrong with the product. Of course you need the name and address when you want to deliver something.
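A minimal sketch of the "identifier instead of name" idea, again with `sqlite3`. The `receipt` table and both helper functions are hypothetical; the code is a random, hard-to-guess token from Python's `secrets` module that gets printed on the receipt and is enough to look up the purchase for a guarantee claim, without storing who bought it.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE receipt (
        receipt_code TEXT PRIMARY KEY,  -- printed on the receipt; no customer name stored
        product      TEXT NOT NULL,
        sold_on      TEXT NOT NULL
    )
""")

def issue_receipt(product: str, sold_on: str) -> str:
    """Store the sale under a random code and return the code to print."""
    code = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    conn.execute("INSERT INTO receipt VALUES (?, ?, ?)", (code, product, sold_on))
    return code

def lookup_for_warranty(code: str):
    """Whoever holds the receipt can claim the guarantee; returns None for unknown codes."""
    return conn.execute(
        "SELECT product, sold_on FROM receipt WHERE receipt_code = ?", (code,)
    ).fetchone()
```

For example, `issue_receipt("washing machine", "2022-08-20")` returns a code, and passing that code to `lookup_for_warranty` gives back the product and sale date.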
48:22
And it's also interesting to look at what these huge companies do with delivery. Same as I said with credit cards: Amazon. Believe me, it's very, very annoying when you type in your address 20 times or more
48:42
because you said don't store my delivery address. There will come a day when you say: just store my delivery address. And on Amazon I also had them ask me whether they may keep storing my delivery address, and whether it's still correct, still valid.
49:03
I can't remember whether it was after three, four, or five years, but they ask after a while. And that's something you can easily solve in SQL with a simple date column.
49:21
How long has it been stored already? Then you can check with a cron job. Or when the customer opens the page or whatever, you can check: okay, it's been stored for three years already, do I need to ask the question or not?
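Here is a minimal sketch of that check, assuming a hypothetical `delivery_address` table with a `last_updated` column holding ISO dates (`YYYY-MM-DD`), which compare correctly as plain strings. The three-year limit comes from the example above and is approximated as 3 × 365 days. A cron job, or the page load, would call `addresses_to_reconfirm` and then ask those customers whether the address may still be stored.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE delivery_address (
        customer_id  INTEGER PRIMARY KEY,
        address      TEXT NOT NULL,
        last_updated TEXT NOT NULL  -- ISO date, set on insert and on every update
    )
""")

def addresses_to_reconfirm(today: date, max_age_days: int = 3 * 365) -> list[int]:
    """Customer ids whose stored address is older than max_age_days."""
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    rows = conn.execute(
        "SELECT customer_id FROM delivery_address "
        "WHERE last_updated < ? ORDER BY customer_id",
        (cutoff,),
    )
    return [row[0] for row in rows]
```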
49:41
That's one idea. So, in summary, the most important thing in SQL is that your data sets get dates: that you add a column like last updated,
50:01
deleted, or whatever else. That you have plenty of date columns in your tables that you can run queries against. And you need to anonymize if you have this constraint problem
50:21
and you want to keep using the constraints. Very often constraints get removed for performance reasons; they're one of the performance killers. So out there you will very often see databases without implemented constraints.
50:45
But when you really have implemented constraints and you say, I can't delete it, then you need to pseudonymize it, or, if you really don't need it anymore, anonymize it.
51:02
And for that process there are options. You can set the columns to NULL, but if you have a NOT NULL constraint, that doesn't work. Or you can overwrite the data with a dummy value. I once saw the database of a bank's software.
51:22
There the dummy name was Boris Becker. That's not a clever idea. That was 10 or 12, no, it was 2005, so many years ago. But it's not a really good idea. Sometimes you see Mr. Mustermann or so,
51:41
or Mrs. Mustermann. In the social and health sector, you very often see that they just put "man" or "woman". So they overwrite the name with the word "man" or "woman". That's how you could solve it.
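A minimal sketch of placeholder-based anonymization, assuming hypothetical `customer` and `orders` tables. With foreign keys enforced, deleting the customer would fail because an order still references it, and setting the name to NULL would violate NOT NULL, so the personal columns are overwritten with a neutral placeholder (`"anonymized"`, a stand-in for "man"/"woman" or similar) while the row and its references stay valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total_cents INTEGER NOT NULL
    );
""")

PLACEHOLDER = "anonymized"  # hypothetical neutral value; avoid real-looking names

def anonymize_customer(customer_id: int) -> None:
    """Overwrite personal columns in place; the row and its references stay valid."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE customer SET name = ?, email = ? WHERE id = ?",
            (PLACEHOLDER, PLACEHOLDER, customer_id),
        )
```

Overwriting two columns only anonymizes the record if nothing else in the row, or in linked rows, still identifies the person, so every personal column needs the same treatment.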
52:04
It would be better to have a solution where you can really delete the data instead of just anonymizing it. And of course you have to look at security too:
52:22
who else can look into it. So you need to do it very, very soon after the purpose is over. The question then is: when is the purpose over? The answer is usually not what people expect.
52:48
The purpose is over when the person can't complain anymore. So in the usual sales business, it's not over once the payment has arrived in the account.
53:05
No, it's over after the guarantee, for example, or once the buyer can't complain anymore. So only after all the limitation periods within which someone can take action against it have run out.
53:26
Usually it's three months or a year. Very often the actual purpose is over quickly, and then, because of legal requirements, there are another three or six months or one year.
53:42
And it's always at least one year; the minimum is one year. Because the courts say you need to do the deletions once per year, after you've cleaned up. After all your tax stuff is done,
54:02
you should do it once per year. So it's always after the end of the business year. Now I have the word: after the end of the business year, you need to clean up. And then you have to check: can the person still complain? Is there still a possibility to complain?
54:26
With credit cards, you can usually get your money back up to six months after buying, or something like that. So that's where the six months come from. So it's very, very tricky, because you have to look into it. It's not just what we think the purpose is,
54:43
"I bought this stuff and I'm done." No, there are legal rules, payment rules, and whatever else that extend how long the purpose lasts. That's something you need to think about. And as I said, try to ask your customer.
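The rule described above ("the purpose ends when the last complaint period runs out; delete after the end of that business year") can be computed. This is a minimal sketch under stated assumptions: the periods in `COMPLAINT_PERIODS_MONTHS` are hypothetical placeholders taken from the examples in the talk (six months for a card chargeback, one year for complaints), not legal advice, and the business year is assumed to end on December 31.

```python
import calendar
from datetime import date

# Hypothetical complaint periods in months; the real values depend on law and contract.
COMPLAINT_PERIODS_MONTHS = {
    "card_chargeback": 6,
    "complaint": 12,
}

def add_months(d: date, months: int) -> date:
    """Shift by whole months, clamping the day (e.g. Aug 31 + 6 months -> Feb 28)."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])  # [1] = days in that month
    return date(year, month, day)

def deletion_date(
    sold_on: date,
    periods: dict[str, int] = COMPLAINT_PERIODS_MONTHS,
    fiscal_year_end: tuple[int, int] = (12, 31),
) -> date:
    """End of the business year in which the last complaint period expires."""
    purpose_end = max(add_months(sold_on, m) for m in periods.values())
    month, day = fiscal_year_end
    candidate = date(purpose_end.year, month, day)
    if candidate >= purpose_end:
        return candidate
    return date(purpose_end.year + 1, month, day)
```

For a sale on 2022-08-20, the longest period ends on 2023-08-20, so the data becomes due for deletion at the end of business year 2023.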
55:04
It's very funny what answers you get. Customers have told me: I don't know, that's not my business. And you think, okay, you're the CEO, why don't you know it?

How do you handle data stored in backups?
55:23
If you do backups, and you should, then you probably have data that goes into a backup now but should be deleted tomorrow, because tomorrow it's two years old and the limit is two years, while other data in the same backup can stay for another 15 months.
55:41
How do you handle this with various generations of backups?

So, we distinguish between backups and archiving. A backup is something I do daily, for example, or weekly. When I have a terabyte-sized database, I keep a maximum of two or three versions
56:03
and then I overwrite them. So my oldest backup is at most one week old. Let's say one month, not more. So there is no problem with backups, because at the latest when the backup is overwritten, the data is gone
56:23
and that happens very fast. In legal terms it's very fast, because a month is less time than a lawyer needs to file a complaint or write something. So it's overwritten very fast. And archiving, of course: when you archive,
56:42
you need a law that allows you to archive. Otherwise, you are not allowed to archive. For tax law, for example, you need to archive for 10 years, and that's archiving. And that is also covered in the GDPR.
57:02
But it's very funny: it was also in the former German privacy laws. When you archive, you should pseudonymize the data.
57:20
I know nobody does it. You should not store the information in the clear; you should pseudonymize it for archiving, but nobody does that. But you really need to think about
57:42
am I allowed to archive this or not? And when I'm not allowed to archive it, then I don't archive it. Simple as that. But of course you have to think about what you really need to archive.
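One way to pseudonymize data before archiving is a keyed hash. The talk does not name a technique; HMAC-SHA256 is my choice here, and the field names and key handling are hypothetical. The same person always maps to the same pseudonym, so archived records stay linkable to each other, while recomputing a pseudonym from a guessed name requires the key, which would be stored separately from the archive. Note that pseudonymized data still counts as personal data under the GDPR.

```python
import hashlib
import hmac

def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of an identifier, as a 64-character hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def archive_row(row: dict, key: bytes, personal_fields=("customer_name", "email")) -> dict:
    """Return a copy of the row for the archive, with personal fields pseudonymized."""
    archived = dict(row)  # never modify the caller's row
    for field in personal_fields:
        if archived.get(field) is not None:
            archived[field] = pseudonymize(archived[field], key)
    return archived
```

For example, `archive_row({"invoice_id": 7, "customer_name": "Erika Mustermann", "amount_cents": 125000}, key)` keeps the id and amount readable and replaces only the name.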
58:02
And then we are back in SQL, and that's something you will see sometimes: you should think about partitioning, time-based partitioning. So let's say you have one table per year,
58:22
a partition for each year. You have 2020, 2021, 2022 each in its own partition, and you say, okay, I need this data for three years or five years or so. Then you can drop the whole expired partition from your table.
58:41
That would be an idea too, partitioning. I know a health system, not a German one but a European health system, that has six-month partitions for health data, and they really do it:
59:01
every six months they overwrite a partition. So the January partition is overwritten in July, and they always have only half a year of this health data. Health data is a special topic of its own, and this is just an example of how you could solve it.
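SQLite has no declarative partitioning, so this minimal sketch emulates time-based partitioning with one table per year (hypothetical names `sales_2020`, `sales_2021`, ...) and drops whole expired tables. In PostgreSQL you would instead declare the table with `PARTITION BY RANGE` and drop or detach an expired partition; MySQL/MariaDB offer `ALTER TABLE ... DROP PARTITION`. Here `keep_years=3` with `current_year=2022` keeps 2020 to 2022.

```python
import sqlite3

# One table per year emulates a yearly partition.
conn = sqlite3.connect(":memory:")

def partition_name(year: int) -> str:
    # Built from an int only, so interpolating it into SQL below is safe.
    return f"sales_{int(year)}"

def insert_sale(year: int, amount_cents: int) -> None:
    table = partition_name(year)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    conn.execute(f"INSERT INTO {table} (amount_cents) VALUES (?)", (amount_cents,))

def drop_expired_partitions(current_year: int, keep_years: int) -> list[str]:
    """Drop whole yearly tables outside the retention window; return their names."""
    oldest_kept = current_year - keep_years + 1
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name GLOB 'sales_[0-9]*'"
        )
    ]
    expired = sorted(t for t in tables if int(t.split("_")[1]) < oldest_kept)
    for table in expired:
        conn.execute(f"DROP TABLE {table}")  # removes all rows of that year at once
    return expired
```

Dropping a whole table or partition avoids scanning and deleting rows one by one, which is why this approach suits large, time-ordered data.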
59:23
As for backups: I know this backup argument, but usually a backup isn't old. A backup only lives for a very short time, because then it's overwritten; you don't keep backups for two years or so.
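The backup rotation described earlier (keep two or three generations, nothing older than about a month) also bounds how long deleted data can survive in backups. This is a minimal sketch of that retention rule; `keep_max=3` and `max_age_days=30` come from the numbers in the talk, and the function itself is hypothetical. It assumes one backup per date.

```python
from datetime import date, timedelta

def backups_to_delete(
    backup_dates: list[date], today: date, keep_max: int = 3, max_age_days: int = 30
) -> list[date]:
    """Keep at most keep_max newest backups, none older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    newest_first = sorted(backup_dates, reverse=True)
    keep = {d for d in newest_first[:keep_max] if d >= cutoff}
    return sorted(d for d in backup_dates if d not in keep)
```

With weekly backups, anything deleted from the live database is gone from every backup within about a month at most.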
59:40
Usually you don't have the capacity to keep daily backups of a terabyte database going back two years, one for each day. I doubt it. I'm pretty sure there's nothing like that out there. You have never
01:00:00
seen that, or so. Yes, you have a question? Oh, my time. My time is over. Oh, wow. Thank you. I hope you got some ideas. OK, the pet. Yeah, but we already had discussions, and the room would be full, about how this pet would work
01:00:22
or not. I hope I could get you thinking about the GDPR. It's not so difficult to make your database at least a little bit GDPR compliant.
01:00:44
My experience is that the biggest problem is the customer, your customer, because they often want you to store way more than you are allowed to.
01:01:02
Thanks for listening. I have another talk later on at 5 about fancy SQL. That's just a talk, no pets. I will explain window functions, CTEs, and joins.
01:01:20
And I wish you an enjoyable FrOSCon and a nice day. And I'm wondering why Max isn't here for the next talk. OK. And my address is open. Open. Further, maybe some film.
01:01:41
And that's my hand.