Merken

Building a multi-purpose platform for bulk data using sqlalchemy

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Sprachtranskript
the loss and I'm working in the team what we process the data we make that well before the machine learning and what we therefore API so and that's the main topic of my my talk it's maybe 3 things you can see here in the title it building a mighty purpose platform meaning we want to use that uh for different business domains so I will explain that later on we have by data so much as a a few datasets but lots of data which need to be
processed in the fast lane and to reduce the sequel coming to the 3 points you will see throughout the presentation yeah introducing a way of building data processing applications that can be used in many this this estimate that yeah sequel argument we are building on signal up to me I mean most of you guys you know that it's just a statement from the website which are copied here and have to say what I really like about sequel comedians that gives so much flexibility in how to build your application you really have both ends of the spectrum doing and much database oriented for high performance and you can be also have away from where you can program in a much more abstract way and within our application we use both of these flavors depending on in what area we are and what we do exactly the API processing part of show here so let's build a much domain platform for what do we have to do what's to customers expect from us what we have to go to the bar data why it see we find this is what it was shown my example all in real life we also use XML files we get into the system why are aware at the interface where post repressed we have to worry about that data we have had several talks already regarding being that big data which is not always the which can be quite messy and so that our machine learning can work on that we need to have a clean data that as well preferences and that's 1 important part of all applications what we do yet and we use it for different this is main so what we currently are doing our company is for retail for tourism and for areas but yeah what it was shown that the war will expand on the next slide and so there's still a lot of
technical to do so what we have to do we have to create a database schema based on the system a in red parts of the years we save that policies we data in the database lift Willie data data what additions there can be multiple things we have to check that the required fields of fields with to check that the data is correct and that in the state fields there is no time for example all no other descriptions like today or tomorrow and we have to check that references between the data records correct we want to give the feedback we want to give feedback to the customer about the press
processing status of the state of what was accepted as to whether we were able to process it this time with and is important for us that we can separate the data that we received from the customer from the clean and will dated Beethoven will use for machine learning so we want to be always be able to trade what was sent to ask and what we make from the having thought about that
let's have a look at all 1st customer the 1st customer support and what could have happened 1 from a machine learning algorithm it wants to predict whole many drinks also it's in the next unit so that they can plan accordingly all much to buy a much wider it's to so to
do that they want to send us the trends that have a little they want to send us the whole in principal although for even name for the last half year and in harmony was a task that on each evening so that we can do a how could a data model
for this looks like it's quite simple we have on the 1 hand the drink the although as reference the brings and the was just that's just another table that we have for information I told you that we need to seperate the data we got from the customer to the data that we wanted to do that we have 2 sets of tables on the 1 hand because of the state's table these the data we got from the customer just as we get it from the customer so maybe he sent several updates and we have several lines and then maybe there are some public because he said the same pipeline system we
have several lines in there and maybe there are some rules then we also have them all in the states what you want to get out of that process this caller and in the course we also have to drink so all and with but there who have 1 a unique ID for each data record and 1 we have updates to the data we will update that data record and not uh staring at several times so the machine learning algorithms can use that and can be confident that it would be sensible to how could such C is the delivery look like well let's take a simple problem we have some here we have some additional information here alcoholic content we have was the let's see maybe the parts in Scotland was a Scottish whiskey without the need and we have some hope for the people of wanting some your was and they have these or on that day so it's 10 year and a Coke on the 11th of July 15th year and 2 whiskey and on 12 off July so that India and 1 year well we got a new way from Ireland there are other was the with but it would be bad for us that's the only know the Scotch whiskey but things can happen we get its delivery now what we want to do with that ideally what our coach should be able to do it should find references between object so we have that stage table Europe these are the orders we get and you can see that sticks to other coaches song that's the drinks that's the column you saw on all this table and then there's this new column the trains reference this is nothing the customer-centric this there's the reference to their unique ID of the strings we want to find they are well over a year in the cost you can see the court table is seen yeah we have you know I've been and we want to write that in in just 1 implementation the telomeres at the last slide here we don't have a foreign key relationship between this column and this column you can be anything and so at the moment it's empty but in the call we find so far in the
relationship between the table and that table so that all the database and shows that really the sensible data and here so whenever we want to copy that data to that table we really need to make sure that we find the correct preferences 1st this we do 2 steps we have the reference findings that which writes them in here and then when they're in a drives the qualitative data to the call and copies and then you can see on just this information but keeps the reference information with the foreign key to that drinks take and you also see in the last line is submitted this whiskey could not process the copy it in there we have to decide no application where we throw an exception then whether to write some more predictive information to the customer in some way properties it should not come to the core so our task as you will is how can we write the code that that's the steps how do we do that we
have several possibilities we have played SQL works fine
and if we want to start playing around with that that's always a good choice on just playing around with the database to be able but that's really sensible way of doing that we can do that in the course so sequel can be quite model which is closely resembles the sequel Archimedes and what we have here all the last stage this is the sequel Comunità that object which contains the information about the stage play before the we should update statement and we say what are the well use the warriors are that the strings reference column should the filled by a select and you want to select the idea is the table brings where the external coach of the court table equals to print the name of the all the states and let me just go back to show it to you again here we want to make sure that really that this idea gets into that problem and that for exactly where these this name of Prince matches here here's the external code of the call so therefore obtain this works fine that's a nice idea and I would say maybe would be the best for implementation we have a slightly in all in the back of all I had to that you might get different customers different models and we are thinking about what maybe it would be a good idea to look in the 0 and so there's a lot more flexible there we have here the the tables as objects and we have each row is an object and it's much nicer to implement stuff here we can look over the order we can create really that we can't really read that table with the part filters and update the table it works fine also but as we do here the single database access that might be not a good idea from a performance point of view when you really have to be customers but these other things the tools we have at hand at the moment so let's assume you know what you we use that statement and we're really happy and everything works fine you have great data customers have been awarded to scientists at the things now what this it's good or bad uh the customers had the the the park and she tells the summary about that so they are talking when they get in the liver and the story is quite excited because they say well that machine learning stuff for you right about that in the newspaper so we are thinking about all read we have some machine we we have the boilers and the 4 main tasks and you have some sense cells and then we we measure some stuff like temperature and pressure and then must be some way to to find out I mean proving and storing view is quite a long process will want to know in the beginning what will be the quality of all of you in the end could you help us with and all data scientists are quite happy with that's interesting new task and we just need to get the data to the system this can't be that complicated what looking at that statement here it might be we have to rewrite all that because that different references between the between the categories we know and she's not have sensors enough measurements all named
differently and to make that customer happy we would have to read write that complete statement so it would work but when we look into the future and maybe they're more interested more aggressive and more interesting business domains then we might have really lots of work to do so what would be the solution we thought more and
we said well we could describe these links in more abstract way we can say we have 1 business domain which is
the part 1st and the pub consists of categories with the trains with the all some of the was task they consist of the elements well that taller coat with reference to trends have some types from each of the database and what this some of these elements of special we looked at the reference finding task and we have seen they need special processing and it can be good if you just could have a way to determine that these special elements that we we can inherit from the elements and we also have an image alignment has a name which we seen this years we find and the chess and name on the database which while most cases to see is the name in capital letters but for this orders brings references you had seen this an additional fees this this name reference you remember in our reference findings that we want to fill that column in this stage so we asked the students are sub-classes and we say that belongs to the category so what
does that help us if we can do that I mean we also want that
somehow resembles the a sequence of the model also I mean quality the model also has some table that has some elements it has some types it has foreign keys but is the goal of the moment for a database description is not so much from an algorithmic processing of that stuff so therefore I will discuss that at the end we said it really makes sense to have this and wants to
hello now does that look like a musical alchemy also has a positive urine all and for example with the generic and Parsons and we have here in the sequel alchemy model we all this domain and that the business domain that is also described in the application so in both here we need to have that this system and what we wanted to do we wanted to factor all some knowledge of the so that this application doesn't need to know the business domain that you really can't set this year and the domain model and that we we can have specific task from the or for reference finding will have fewer 1 renderer which use that information to generate sequel statement and we can have 1 for other tasks
look like I will give you use your code sample for the top so we have here in the domain the category the elements from owning a package we have that problem which is the main and the top consists of 3 categories the drinks the visitors of the order so that's quite easy to write down I mean there's there's nothing more than you need and it's easily understandable the interesting stuff this year that reference which you see links directly to the drinks having that how does
that task-specific renderer looks like so this is Python code which at 1st checks for each category what other references in there and we can just check that with an instance and you over the references with few just 1 but there might be other categories where we have multiple once we're getting the stage and can't tables from this equality metadata on this interview we can find that by the name of the State stable we can find the reference card table and we should update statement so this up statement here is the update stage with what 1 these will use are constructed dynamically so because we cannot give you the key what arguments dynamically we constructed peaked at 1st and this is year the updated with the keyword will use and you see here that's the column which should be updated and this was a sequel alchemy cost statement hold talks with so repressed that I will show that the more and then we can execute so now let's switch during that you
can see that this really works whether prepared and the moment simplest let like database and I have prepared some scripts you know it's quite and is that right right
so at the moment we do not have to see polite databases you these tables command exist at all and what we want to do we want to create an we do that by calling on Python scripts for each database and these tables are there you can see here by the way level configuration file was he what's a database and what does all this is so now we need to get data so we conduct patterns we do this yes we import we do that 1st of the trans category and you see here that's the sequel statement right from that and it's here in the category so now we do that for the all antecedent in you now we want to the reference finding and 4 so before we find references the all of the princess need to get into the core otherwise exam so what do we do is we are calling the quote Scripture and reviewing the 1st for the drinks you see here that was the generated sequel statement and this is the core and you see that the screens graph column is and now for the most interesting you can see these as prior statements are issued and here you see that filter and also that all the scores the Suffolk completely on the screen but you can see that it's all in there so that works fine that's great but that was a step we were already 5 let's thank you know let's go back
and let's say that we have another domain you have to read and the primary with said has machines that has samples and its measurements and the story here you can see the growth the sentence reference genes and the measurements references the senses I would like to show that the more so but unfortunately the time is not sufficient for that but it's really as simple as we have seen here you just need to change the comfort Jason you create the database and the new tables will be there and you can import the differences we find this will be the so what does that mean when we say
that what does this domain model model helps us what is optimized for high throughput as we you see these as well as segments of the issues that can be processed directly on the database so we put all things in the database into the stage and now we have some processing of all domain knowledge and would generate a sequel statement and that things will be processed and that's that's really good fit for analytical models when I when thought about it the more you for the talk i found it might not that fit that willful transactional models for you have more complex and and relations but for analytical models this is really great and helps a lot when a compound that to sequel alchemy model which is also some kind of make a model and we see that as the glycan is focused on the database description the domain model in contrast can contain more information in all of our team we had also the task that we have from that have time time-dependent stuff so some strings of only a winnable at several days or maybe they were when they were last week but another above this week and we need to check his cross time dependencies also this can be done in the domain model also we can note that and can generate a sequel model all of the domain model so in that case we have what is an additional
bonus s we can use the domain model for much further stuff you can see here we can generate this equality model we can generate a sequel for all tasks we have to see if the local configuration but also what we do we generate documentation artifacts the whole filets is we tables and we can generate the more data that most of but that's just to show you some ideas so that you are able also to have some questions I will close you and that's what
I wanted to show you but any questions as is the single
orbitals in the no this is something we will look into a million I mean what it did you for that type of prepared the small the modification and also multiple providing that but I've seen this takes uh much stuff around to make that example some sensible and yet before making the open source me I would have to take also internally and our company and before going into that direction I just wanted to see whether this process at all and that so if have some questions to that I want to get some for about this which is interesting after the what questions I think you should
Arithmetisches Mittel
Einfügungsdämpfung
Attributwertebereich
Subtraktion
Vorlesung/Konferenz
Algorithmische Lerntheorie
Systemplattform
Rückkopplung
Web Site
Multiplikation
Punkt
Prozess <Physik>
Ablöseblase
Parser
Kartesische Koordinaten
Fortsetzung <Mathematik>
Kombinatorische Gruppentheorie
Hecke-Operator
Systemplattform
Punktspektrum
Computeranimation
Attributwertebereich
Deskriptive Statistik
Last
Attributwertebereich
Mapping <Computergraphik>
Datenverarbeitung
Algorithmische Lerntheorie
Schnittstelle
Schätzwert
Addition
Videospiel
Parametersystem
Befehl <Informatik>
Datenlogger
Datenhaltung
Web Site
Physikalisches System
Elektronische Publikation
Rechenschieber
Datenfeld
Rückkopplung
Flächeninhalt
Ein-Ausgabe
Mereologie
Leistung <Physik>
Aggregatzustand
Attributwertebereich
Metropolitan area network
Virtuelle Maschine
Prinzip der gleichmäßigen Beschränktheit
Multiplikation
Algorithmus
Rückkopplung
Ein-Ausgabe
Ablöseblase
Parser
Computeranimation
Aggregatzustand
Prinzip der gleichmäßigen Beschränktheit
Datenmodell
Datenmodell
Physikalisches System
Computeranimation
Task
Twitter <Softwareplattform>
Menge
Speicherabzug
Information
Ordnung <Mathematik>
Gerade
Tabelle <Informatik>
Aggregatzustand
Prozess <Physik>
Wellenpaket
Momentenproblem
Implementierung
Kartesische Koordinaten
Code
Computeranimation
Task
Metropolitan area network
Virtuelle Maschine
Task
Algorithmus
Zählen
Ordnung <Mathematik>
Gerade
Prinzip der gleichmäßigen Beschränktheit
Datenlogger
Kategorie <Mathematik>
Datenhaltung
Datenmodell
Systemaufruf
Schlussregel
Ausnahmebehandlung
Objekt <Kategorie>
Rechenschieber
Mereologie
Speicherabzug
Information
Ordnung <Mathematik>
Schlüsselverwaltung
Ext-Funktor
Aggregatzustand
Tabelle <Informatik>
Zeichenkette
Subtraktion
Punkt
Prozess <Physik>
Sechsecknetz
Momentenproblem
Zellularer Automat
Implementierung
Fortsetzung <Mathematik>
Code
Computeranimation
Task
Metropolitan area network
Virtuelle Maschine
Datensatz
Informationsmodellierung
Task
Speicherabzug
Ordnung <Mathematik>
Auswahlaxiom
Einflussgröße
Implementierung
Filter <Stochastik>
Befehl <Informatik>
Sichtenkonzept
Kategorie <Mathematik>
Datenhaltung
Einfache Genauigkeit
Systemaufruf
Physikalisches System
Menge
Objekt <Kategorie>
Bildschirmmaske
Druckverlauf
Mereologie
Information
Ordnung <Mathematik>
PRINCE2
Tabelle <Informatik>
Zeichenkette
Attributwertebereich
Befehl <Informatik>
Attributwertebereich
Datentyp
Element <Mathematik>
Datenmodell
Binder <Informatik>
Ext-Funktor
Computeranimation
Prinzip der gleichmäßigen Beschränktheit
Datentyp
Wellenpaket
Prozess <Physik>
Element <Mathematik>
Kategorie <Mathematik>
Datenhaltung
Machsches Prinzip
Datenmodell
t-Test
Magnetooptischer Speicher
Winkel
Element <Mathematik>
Computeranimation
Attributwertebereich
Task
Metropolitan area network
Bildschirmmaske
Computerschach
Task
Twitter <Softwareplattform>
Mereologie
Datentyp
Ordnung <Mathematik>
Bildgebendes Verfahren
Befehl <Informatik>
Folge <Mathematik>
Datentyp
Prozess <Physik>
Momentenproblem
Element <Mathematik>
Datenhaltung
Machsches Prinzip
Datenmodell
Magnetooptischer Speicher
Kartesische Koordinaten
Fortsetzung <Mathematik>
Physikalisches System
Element <Mathematik>
Computeranimation
Attributwertebereich
Task
Deskriptive Statistik
Attributwertebereich
Informationsmodellierung
Generator <Informatik>
Task
Datentyp
Volumenvisualisierung
Information
Schlüsselverwaltung
Tabelle <Informatik>
Stabilitätstheorie <Logik>
Element <Mathematik>
Befehl <Informatik>
Fortsetzung <Mathematik>
Element <Mathematik>
Code
Computeranimation
Attributwertebereich
Metadaten
Attributwertebereich
Code
Stichprobenumfang
Volumenvisualisierung
Zählen
Ordnung <Mathematik>
Parametersystem
Befehl <Informatik>
Prinzip der gleichmäßigen Beschränktheit
Kategorie <Mathematik>
Element <Gruppentheorie>
Stichprobe
Binder <Informatik>
Chipkarte
Zeichenkette
Differenzkern
Ordnung <Mathematik>
Cloud Computing
Ext-Funktor
Tabelle <Informatik>
Instantiierung
Aggregatzustand
Tabusuche
Prinzip der gleichmäßigen Beschränktheit
Befehl <Informatik>
Momentenproblem
Graph
Kategorie <Mathematik>
Datenhaltung
Fortsetzung <Mathematik>
Information
Elektronische Publikation
Computeranimation
Übergang
Metropolitan area network
Diskrete-Elemente-Methode
Code
Mustersprache
Zählen
Skript <Programm>
Speicherabzug
Ordnung <Mathematik>
Konfigurationsraum
Tabelle <Informatik>
Touchscreen
Subtraktion
Prozess <Physik>
Fortsetzung <Mathematik>
Analytische Menge
Information
Computeranimation
Task
Attributwertebereich
Virtuelle Maschine
Deskriptive Statistik
Informationsmodellierung
Attributwertebereich
Task
Code
Stichprobenumfang
Kontrast <Statistik>
Einflussgröße
Befehl <Informatik>
Datenhaltung
Relativitätstheorie
Güte der Anpassung
Datenmodell
Ausgleichsrechnung
Portscanner
Arithmetisches Mittel
Information
Modelltheorie
Ext-Funktor
Tabelle <Informatik>
Zeichenkette
Fitnessfunktion
Datenmodell
Fortsetzung <Mathematik>
Computeranimation
Task
Attributwertebereich
Metropolitan area network
Attributwertebereich
Informationsmodellierung
Generator <Informatik>
Differenzkern
Konfigurationsraum
Tabelle <Informatik>
Demo <Programm>
Prozess <Physik>
Open Source
Orbit <Mathematik>
Vorlesung/Konferenz
Richtung

Metadaten

Formale Metadaten

Titel Building a multi-purpose platform for bulk data using sqlalchemy
Serientitel EuroPython 2015
Teil 40
Anzahl der Teile 173
Autor Trebing, Christian
Lizenz CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben
DOI 10.5446/20095
Herausgeber EuroPython
Erscheinungsjahr 2015
Sprache Englisch
Produktionsort Bilbao, Euskadi, Spain

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract Christian Trebing - Building a multi-purpose platform for bulk data using sqlalchemy At Blue Yonder, we've built a platform that can accept and process bulk amounts of data for multiple business domains (e.g. handling retail store location and sales data) using SQLAlchemy as a database abstraction layer. We wanted to use as much of SQLAlchemy as possible, but we quickly found that the ORM (Object Relational Mapper) is not suitable for handling large amounts of data at once. At the same time, we did not want each team of developers working on individual business domains to have to handcraft their own SQL statements. To solve this problem, we built an application configuration that closely resembles an SQLAlchemy model, but also contains application-specific logic settings. In this talk I will demonstrate: - an application architecture for multiple business domains - the structure of the domain configuration utilized in the generation of the SQLAlchemy model, SQLAlchemy core statements, and other application functionality - how the domain configuration is used throughout the application (consuming and parsing incoming data, storing it in a database and ensuring data quality)
Schlagwörter EuroPython Conference
EP 2015
EuroPython 2015

Ähnliche Filme

Loading...