Logo TIB AV-Portal Logo TIB AV-Portal

Building a multi-purpose platform for bulk data using sqlalchemy

Video in TIB AV-Portal: Building a multi-purpose platform for bulk data using sqlalchemy

Formal Metadata

Building a multi-purpose platform for bulk data using sqlalchemy
Title of Series
Part Number
Number of Parts
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Subject Area
Christian Trebing - Building a multi-purpose platform for bulk data using sqlalchemy At Blue Yonder, we've built a platform that can accept and process bulk amounts of data for multiple business domains (e.g. handling retail store location and sales data) using SQLAlchemy as a database abstraction layer. We wanted to use as much of SQLAlchemy as possible, but we quickly found that the ORM (Object Relational Mapper) is not suitable for handling large amounts of data at once. At the same time, we did not want each team of developers working on individual business domains to have to handcraft their own SQL statements. To solve this problem, we built an application configuration that closely resembles an SQLAlchemy model, but also contains application-specific logic settings. In this talk I will demonstrate: - an application architecture for multiple business domains - the structure of the domain configuration utilized in the generation of the SQLAlchemy model, SQLAlchemy core statements, and other application functionality - how the domain configuration is used throughout the application (consuming and parsing incoming data, storing it in a database and ensuring data quality)
Keywords EuroPython Conference EP 2015 EuroPython 2015
means machine learning Lecture/Conference different Domain loss platforms
point Slides presentation Sequel files Development load Domain states feedback time argument part fields estimates machine learning several website input Multiple platforms descriptions systems area data processing addition parsing interfaces feedback Databases applications powerful processes Computer animation different statements website life Domain log files spectral
parsing algorithm several Computer animation Domain states feedback machine input pub Multiple
modes information states Core sets lines Twitter structured data orders structured data Computer animation table pub tasks systems
Slides implementation states code time machine counting part rules training structured data string DAS core pub tasks exception man algorithm information key moment Ext Databases lines applications system call category processes Computer animation orders des log files objects table tasks
point choice filters implementation Sequel code views machine part orders different string single model implementation tasks systems man information cellular Core moment Databases system call measures category processes Computer animation orders statements Prince objects table pressure form tasks record
structured data Computer animation link Domain statements Ext Domain data types
chess Domain angles student part elements Twitter training MACH structured data image tasks data types Magneto-optical drive Databases elements pub Types category processes Computer animation case orders form tasks
Sequel Domain angles elements structured data model descriptions tasks systems data types man generate key information moment Databases applications sequence elements Types processes Computer animation statements Render Domain table tasks
link Sequel Domain code states equal cloud platforms argument counting metadata elements orders statements pub stable sampling code smart Ext instance elements category sample Computer animation string orders statements Render Domain table
script man screens Graph files Sequel demo moment states code Databases counting dem Sans category orders Computer animation core statements level configuration pattern table pub
Super Sequel Domain time machine structured data goodness different string information model contrast descriptions tasks relation information sampling fit code analytics Ext Databases scans measures means processes Computer animation case statements classes model Domain table fitness tasks
man generate Sequel Domain equal structured data Computer animation configuration Domain model Gamma table tasks
processes Open Source directions orbit
the loss and I'm working in the team what we process the data we make that well before the machine learning and what we therefore API so and that's the main topic of my my talk it's maybe 3 things you can see here in the title it building a mighty purpose platform meaning we want to use that uh for different business domains so I will explain that later on we have by data so much as a a few datasets but lots of data which need to be
processed in the fast lane and to reduce the sequel coming to the 3 points you will see throughout the presentation yeah introducing a way of building data processing applications that can be used in many this this estimate that yeah sequel argument we are building on signal up to me I mean most of you guys you know that it's just a statement from the website which are copied here and have to say what I really like about sequel comedians that gives so much flexibility in how to build your application you really have both ends of the spectrum doing and much database oriented for high performance and you can be also have away from where you can program in a much more abstract way and within our application we use both of these flavors depending on in what area we are and what we do exactly the API processing part of show here so let's build a much domain platform for what do we have to do what's to customers expect from us what we have to go to the bar data why it see we find this is what it was shown my example all in real life we also use XML files we get into the system why are aware at the interface where post repressed we have to worry about that data we have had several talks already regarding being that big data which is not always the which can be quite messy and so that our machine learning can work on that we need to have a clean data that as well preferences and that's 1 important part of all applications what we do yet and we use it for different this is main so what we currently are doing our company is for retail for tourism and for areas but yeah what it was shown that the war will expand on the next slide and so there's still a lot of
technical to do so what we have to do we have to create a database schema based on the system a in red parts of the years we save that policies we data in the database lift Willie data data what additions there can be multiple things we have to check that the required fields of fields with to check that the data is correct and that in the state fields there is no time for example all no other descriptions like today or tomorrow and we have to check that references between the data records correct we want to give the feedback we want to give feedback to the customer about the press
processing status of the state of what was accepted as to whether we were able to process it this time with and is important for us that we can separate the data that we received from the customer from the clean and will dated Beethoven will use for machine learning so we want to be always be able to trade what was sent to ask and what we make from the having thought about that
let's have a look at all 1st customer the 1st customer support and what could have happened 1 from a machine learning algorithm it wants to predict whole many drinks also it's in the next unit so that they can plan accordingly all much to buy a much wider it's to so to
do that they want to send us the trends that have a little they want to send us the whole in principal although for even name for the last half year and in harmony was a task that on each evening so that we can do a how could a data model
for this looks like it's quite simple we have on the 1 hand the drink the although as reference the brings and the was just that's just another table that we have for information I told you that we need to seperate the data we got from the customer to the data that we wanted to do that we have 2 sets of tables on the 1 hand because of the state's table these the data we got from the customer just as we get it from the customer so maybe he sent several updates and we have several lines and then maybe there are some public because he said the same pipeline system we
have several lines in there and maybe there are some rules then we also have them all in the states what you want to get out of that process this caller and in the course we also have to drink so all and with but there who have 1 a unique ID for each data record and 1 we have updates to the data we will update that data record and not uh staring at several times so the machine learning algorithms can use that and can be confident that it would be sensible to how could such C is the delivery look like well let's take a simple problem we have some here we have some additional information here alcoholic content we have was the let's see maybe the parts in Scotland was a Scottish whiskey without the need and we have some hope for the people of wanting some your was and they have these or on that day so it's 10 year and a Coke on the 11th of July 15th year and 2 whiskey and on 12 off July so that India and 1 year well we got a new way from Ireland there are other was the with but it would be bad for us that's the only know the Scotch whiskey but things can happen we get its delivery now what we want to do with that ideally what our coach should be able to do it should find references between object so we have that stage table Europe these are the orders we get and you can see that sticks to other coaches song that's the drinks that's the column you saw on all this table and then there's this new column the trains reference this is nothing the customer-centric this there's the reference to their unique ID of the strings we want to find they are well over a year in the cost you can see the court table is seen yeah we have you know I've been and we want to write that in in just 1 implementation the telomeres at the last slide here we don't have a foreign key relationship between this column and this column you can be anything and so at the moment it's empty but in the call we find so far in the
relationship between the table and that table so that all the database and shows that really the sensible data and here so whenever we want to copy that data to that table we really need to make sure that we find the correct preferences 1st this we do 2 steps we have the reference findings that which writes them in here and then when they're in a drives the qualitative data to the call and copies and then you can see on just this information but keeps the reference information with the foreign key to that drinks take and you also see in the last line is submitted this whiskey could not process the copy it in there we have to decide no application where we throw an exception then whether to write some more predictive information to the customer in some way properties it should not come to the core so our task as you will is how can we write the code that that's the steps how do we do that we
have several possibilities we have played SQL works fine
and if we want to start playing around with that that's always a good choice on just playing around with the database to be able but that's really sensible way of doing that we can do that in the course so sequel can be quite model which is closely resembles the sequel Archimedes and what we have here all the last stage this is the sequel Comunità that object which contains the information about the stage play before the we should update statement and we say what are the well use the warriors are that the strings reference column should the filled by a select and you want to select the idea is the table brings where the external coach of the court table equals to print the name of the all the states and let me just go back to show it to you again here we want to make sure that really that this idea gets into that problem and that for exactly where these this name of Prince matches here here's the external code of the call so therefore obtain this works fine that's a nice idea and I would say maybe would be the best for implementation we have a slightly in all in the back of all I had to that you might get different customers different models and we are thinking about what maybe it would be a good idea to look in the 0 and so there's a lot more flexible there we have here the the tables as objects and we have each row is an object and it's much nicer to implement stuff here we can look over the order we can create really that we can't really read that table with the part filters and update the table it works fine also but as we do here the single database access that might be not a good idea from a performance point of view when you really have to be customers but these other things the tools we have at hand at the moment so let's assume you know what you we use that statement and we're really happy and everything works fine you have great data customers have been awarded to scientists at the things now what this it's good or bad uh the customers had the the the park and she tells the summary about that so they are talking when they get in the liver and the story is quite excited because they say well that machine learning stuff for you right about that in the newspaper so we are thinking about all read we have some machine we we have the boilers and the 4 main tasks and you have some sense cells and then we we measure some stuff like temperature and pressure and then must be some way to to find out I mean proving and storing view is quite a long process will want to know in the beginning what will be the quality of all of you in the end could you help us with and all data scientists are quite happy with that's interesting new task and we just need to get the data to the system this can't be that complicated what looking at that statement here it might be we have to rewrite all that because that different references between the between the categories we know and she's not have sensors enough measurements all named
differently and to make that customer happy we would have to read write that complete statement so it would work but when we look into the future and maybe they're more interested more aggressive and more interesting business domains then we might have really lots of work to do so what would be the solution we thought more and
we said well we could describe these links in more abstract way we can say we have 1 business domain which is
the part 1st and the pub consists of categories with the trains with the all some of the was task they consist of the elements well that taller coat with reference to trends have some types from each of the database and what this some of these elements of special we looked at the reference finding task and we have seen they need special processing and it can be good if you just could have a way to determine that these special elements that we we can inherit from the elements and we also have an image alignment has a name which we seen this years we find and the chess and name on the database which while most cases to see is the name in capital letters but for this orders brings references you had seen this an additional fees this this name reference you remember in our reference findings that we want to fill that column in this stage so we asked the students are sub-classes and we say that belongs to the category so what
does that help us if we can do that I mean we also want that
somehow resembles the a sequence of the model also I mean quality the model also has some table that has some elements it has some types it has foreign keys but is the goal of the moment for a database description is not so much from an algorithmic processing of that stuff so therefore I will discuss that at the end we said it really makes sense to have this and wants to
hello now does that look like a musical alchemy also has a positive urine all and for example with the generic and Parsons and we have here in the sequel alchemy model we all this domain and that the business domain that is also described in the application so in both here we need to have that this system and what we wanted to do we wanted to factor all some knowledge of the so that this application doesn't need to know the business domain that you really can't set this year and the domain model and that we we can have specific task from the or for reference finding will have fewer 1 renderer which use that information to generate sequel statement and we can have 1 for other tasks
look like I will give you use your code sample for the top so we have here in the domain the category the elements from owning a package we have that problem which is the main and the top consists of 3 categories the drinks the visitors of the order so that's quite easy to write down I mean there's there's nothing more than you need and it's easily understandable the interesting stuff this year that reference which you see links directly to the drinks having that how does
that task-specific renderer looks like so this is Python code which at 1st checks for each category what other references in there and we can just check that with an instance and you over the references with few just 1 but there might be other categories where we have multiple once we're getting the stage and can't tables from this equality metadata on this interview we can find that by the name of the State stable we can find the reference card table and we should update statement so this up statement here is the update stage with what 1 these will use are constructed dynamically so because we cannot give you the key what arguments dynamically we constructed peaked at 1st and this is year the updated with the keyword will use and you see here that's the column which should be updated and this was a sequel alchemy cost statement hold talks with so repressed that I will show that the more and then we can execute so now let's switch during that you
can see that this really works whether prepared and the moment simplest let like database and I have prepared some scripts you know it's quite and is that right right
so at the moment we do not have to see polite databases you these tables command exist at all and what we want to do we want to create an we do that by calling on Python scripts for each database and these tables are there you can see here by the way level configuration file was he what's a database and what does all this is so now we need to get data so we conduct patterns we do this yes we import we do that 1st of the trans category and you see here that's the sequel statement right from that and it's here in the category so now we do that for the all antecedent in you now we want to the reference finding and 4 so before we find references the all of the princess need to get into the core otherwise exam so what do we do is we are calling the quote Scripture and reviewing the 1st for the drinks you see here that was the generated sequel statement and this is the core and you see that the screens graph column is and now for the most interesting you can see these as prior statements are issued and here you see that filter and also that all the scores the Suffolk completely on the screen but you can see that it's all in there so that works fine that's great but that was a step we were already 5 let's thank you know let's go back
and let's say that we have another domain you have to read and the primary with said has machines that has samples and its measurements and the story here you can see the growth the sentence reference genes and the measurements references the senses I would like to show that the more so but unfortunately the time is not sufficient for that but it's really as simple as we have seen here you just need to change the comfort Jason you create the database and the new tables will be there and you can import the differences we find this will be the so what does that mean when we say
that what does this domain model model helps us what is optimized for high throughput as we you see these as well as segments of the issues that can be processed directly on the database so we put all things in the database into the stage and now we have some processing of all domain knowledge and would generate a sequel statement and that things will be processed and that's that's really good fit for analytical models when I when thought about it the more you for the talk i found it might not that fit that willful transactional models for you have more complex and and relations but for analytical models this is really great and helps a lot when a compound that to sequel alchemy model which is also some kind of make a model and we see that as the glycan is focused on the database description the domain model in contrast can contain more information in all of our team we had also the task that we have from that have time time-dependent stuff so some strings of only a winnable at several days or maybe they were when they were last week but another above this week and we need to check his cross time dependencies also this can be done in the domain model also we can note that and can generate a sequel model all of the domain model so in that case we have what is an additional
bonus s we can use the domain model for much further stuff you can see here we can generate this equality model we can generate a sequel for all tasks we have to see if the local configuration but also what we do we generate documentation artifacts the whole filets is we tables and we can generate the more data that most of but that's just to show you some ideas so that you are able also to have some questions I will close you and that's what
I wanted to show you but any questions as is the single
orbitals in the no this is something we will look into a million I mean what it did you for that type of prepared the small the modification and also multiple providing that but I've seen this takes uh much stuff around to make that example some sensible and yet before making the open source me I would have to take also internally and our company and before going into that direction I just wanted to see whether this process at all and that so if have some questions to that I want to get some for about this which is interesting after the what questions I think you should