SQLAlchemy as the backbone of a Data Science company

Speech Transcript
Thank you for the introduction. I'm a senior software developer at Blue Yonder, the leading provider of cloud-based predictive applications for retail. We develop machine learning algorithms to deliver the best decisions daily to our customers. These decisions could be, for example, setting optimal prices in an online store, or providing customers with replenishment orders so that stores don't go out of stock and don't have too much waste in fresh products. In times of NoSQL databases and MapReduce algorithms, it is surprising how far we can scale the relational data model with the right database. At Blue Yonder we use SQLAlchemy in all stages of our data science workflows and handle billions of records to feed into our predictive algorithms. In this talk I will dive into SQLAlchemy beyond the object-relational mapping parts and concentrate on SQLAlchemy Core and the expression language, as well as database migrations.

Before I start digging into SQLAlchemy, here is a brief and simplified overview of Blue Yonder; I want to show the data flow in our data science company. The most important input for machine learning algorithms is historical data with features and the target: in our case master data like products and location information, as well as transactional data like sales and stock, combined with additional features like promotion data and weather. The features are delivered daily by the customers to us. A typical day starts with the customer sending us XML files with flat, record-based sales information for the last day, which we load into our data warehouse. The data warehouse part is an EXASolution, an in-memory, column-oriented database system from Germany. Once the data loading is done, reporting tools and interactive data analysis by our data scientists start. If the data for the day is correct, our machine learning algorithms start to read the data and generate predictions for the next days. The machine learning algorithms mostly work on a two-dimensional feature matrix of these features and the target, and most of the work in our workflows happens in a standard batch-oriented way. The data size is not really big data in terms of terabytes, but too large to fit into memory on a single machine. At the end of the day, when all the predictions are calculated, the customer retrieves the order proposals.

This presentation is split into two parts: the first will give a brief overview of SQLAlchemy, and the second will show some usage patterns, how we use it in our company. So why do we love SQLAlchemy? Because our whole business depends on data, and all the data wrangling, querying and processing ends up as SQL. We use a snowflake schema to store the master and transactional data of our customers in the EXASolution database. Each table in the snowflake schema corresponds to a delivery category which our customers can deliver through our REST API. The XML schema and parsing code is directly derived from the SQLAlchemy table definitions, and all ETL processes use the information from the tables and relations to generate the queries on the fly. The features for the machine learning algorithms are also collected with an automatic query builder based on the
specific snowflake schema of the customer. New features can dynamically be added to the delivery categories and are available on the fly, and data scientists can easily create their queries through a dynamically generated REST API. All of this is only possible because we have an abstract model of the database schema in SQLAlchemy.
So let's start with a functional overview of SQLAlchemy. SQLAlchemy is the Pythonic way of working with relational databases; it is able to represent common SQL statements through the Core and the expression language. The diagram on the right side is the famous pancake diagram, which shows the different layers of SQLAlchemy. At the bottom there is always a DBAPI-compatible interface for a specific database: the Python database API is specified in PEP 249 and defines a common set of operations to represent different databases, providing an abstraction layer over different databases with different driver implementations. The top reason to use SQLAlchemy is that it abstracts your code away from the underlying database and its associated SQL peculiarities. SQLAlchemy supports common statements and types and ensures that SQL statements are generated efficiently and properly for each database type, without you having to think about it. The SQLAlchemy ORM is similar to many other object-relational mappers that you might know from other languages: it is focused on the domain model of an application, leverages the unit-of-work pattern to maintain object state, and provides a high-level abstraction on top of the SQL expression language that enables the user to work in a domain-centric way. This talk will concentrate on SQLAlchemy Core and the expression language, because it is much closer to the actual database schema and allows for better performance. If you want to read more about the design decisions behind SQLAlchemy, I can only recommend the chapter by Mike Bayer, the creator of SQLAlchemy, in the book "The Architecture of Open Source Applications", which is available online as a free download. Let's follow the definition
of SQLAlchemy Core with a code example. SQLAlchemy Core provides a Pythonic way of representing elements of both SQL commands and data structures, called the SQL expression language. We can define database schemas — types, relations, tables — in pure Python syntax. The same applies to data manipulation like insert, update and delete statements, and to querying with select statements. SQLAlchemy can connect to lots of different databases through database-specific dialects, from single-file databases like SQLite to massively parallel in-memory databases like EXASolution in our case, or Redshift. Of course you can also use the well-known open source database systems MySQL and PostgreSQL, and their support is very mature. The code at the bottom shows the definition of a typical table: I define different columns with different properties such as primary keys and foreign keys, and each column has a column type. At the bottom I am using a query to calculate the stock for a certain product. What you can already see here is how well SQLAlchemy's query syntax fits into Python code: you can, for example, generate a list of columns dynamically in Python and feed it into the query if you want.
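(A minimal sketch of this style of Core usage; the engine URL, table and column names are illustrative, not the original slide code.)

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select

# The dialect is encoded in the connection URL -- "sqlite://" here, but
# e.g. "postgresql://user:pw@host/db" or an EXASolution URL works the same.
engine = create_engine("sqlite://")

metadata = MetaData()
product = Table(
    "product", metadata,
    Column("product_id", Integer, primary_key=True),
    Column("name", String(50)),
    Column("category", String(50)),
)
metadata.create_all(engine)

# Queries are plain Python objects, so a dynamically generated list of
# columns can be fed straight into select().
wanted = ["product_id", "name"]
query = select([product.c[name] for name in wanted])
print(engine.execute(query).fetchall())
```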
To get started, the first thing you have to do is to create a database schema. SQLAlchemy provides the MetaData object, which is used to tie together the database structure so it can be quickly accessed inside the library. You can think of MetaData as a kind of catalog of Table objects, with optional information about the engine and the connection. Table objects are initialized in SQLAlchemy Core by calling the Table constructor with the name of the table and the MetaData object; any additional arguments are assumed to be column objects. Column objects each represent a field in the table. Columns define the fields that exist in our tables, and they provide the primary means by which we define constraints, through their keyword arguments. Different types of columns have different primary arguments: for example, string types can have the column length as the primary argument, while numeric types take precision and scale arguments.
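(A sketch of schema definition with MetaData; the tables are hypothetical stand-ins for a snowflake schema.)

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, Numeric, ForeignKey

metadata = MetaData()

location = Table(
    "location", metadata,
    Column("location_id", Integer, primary_key=True),
    Column("name", String(100), nullable=False),  # length as the primary argument
)

stock = Table(
    "stock", metadata,
    Column("location_id", Integer, ForeignKey("location.location_id"),
           primary_key=True),
    Column("product_id", Integer, primary_key=True),
    Column("quantity", Numeric(12, 3)),  # precision and scale as arguments
)

# MetaData acts as a catalog of all Table objects defined against it.
print(metadata.tables.keys())
```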
SQLAlchemy defines a large number of generic types that are abstracted away from the actual SQL types supported by each database. For example, the Boolean generic type usually uses the SQL BOOLEAN type and, on the Python side, deals with True and False; however, it uses a SMALLINT on backend databases that do not support a boolean type — and you as a developer don't have to deal with that in your Python code, you can just use True and False. The same applies to dates and datetimes: when available, SQLAlchemy uses the corresponding types of the database and creates Python datetime objects in the results. Vendor-specific types are available in the same way as the generic SQL types; however, they are only available with specific dialects. For example, you can see here the powerful JSON field that was introduced in PostgreSQL — if that's your thing, it is available through the dialect implementation and you can use it in the normal SQLAlchemy query syntax.
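(A sketch contrasting generic and dialect-specific types; the table is made up.)

```python
from sqlalchemy import MetaData, Table, Column, Integer, Boolean, DateTime
from sqlalchemy.dialects.postgresql import JSONB  # dialect-specific type

metadata = MetaData()
event = Table(
    "event", metadata,
    Column("event_id", Integer, primary_key=True),
    Column("processed", Boolean),    # BOOLEAN, or e.g. SMALLINT on backends without it
    Column("created_at", DateTime),  # returned as datetime.datetime objects
    Column("payload", JSONB),        # only available on the PostgreSQL dialect
)
```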
You can use the SQL expression language to insert data into the tables. To do this you just call the insert method of a Table object to create an insert statement, and then add the values as keyword arguments for each column you want to set. The values you supply will replace the column placeholders in the SQL statement as bind parameters — instead of the string formatting you might be used to — which ensures that the data is properly escaped and prevents security issues such as SQL injection. We can also insert multiple records at once by using a list of dictionaries with the data we are going to submit. The example at the bottom shows how to build a delete statement. SQLAlchemy heavily relies on the pattern of method chaining to add additional clauses or restrictions to a statement. All of this is done directly in Python syntax and integrates nicely with the Python language.
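(A sketch of insert and delete statements with bound parameters, restating the illustrative product table from above.)

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite://")
metadata = MetaData()
product = Table("product", metadata,
                Column("product_id", Integer, primary_key=True),
                Column("name", String(50)),
                Column("category", String(50)))
metadata.create_all(engine)

# Values become bind parameters, never string-interpolated SQL.
engine.execute(product.insert().values(product_id=1, name="apple",
                                       category="fruit"))

# Multiple records at once: a list of dictionaries.
engine.execute(product.insert(), [
    {"product_id": 2, "name": "pear", "category": "fruit"},
    {"product_id": 3, "name": "milk", "category": "dairy"},
])

# Method chaining adds clauses step by step.
engine.execute(product.delete().where(product.c.category == "dairy"))
```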
To query data you can use the select function, which is analogous to the standard SQL SELECT statement. As you can see in the example code, it is very easy to build a query in different steps and use the full dynamic power of Python to generate queries on the fly with dynamic values, additional join conditions or specialized restrictions. The SQL expression language supports most of the features you can use in standard, plain SQL: different joins like inner, left, right and outer join — depending on the database and the driver implementation — and also more sophisticated statements like window functions. The return value of a query is a ResultProxy, which wraps the DBAPI cursor object. Its main goal is to make it easier to use and manipulate the results of a statement; it allows you to access the rows using the index, a column object or the name of the field. If you want to see the columns that are available in a result, you can use the keys method on the result set to get a list of column names. You should be careful with fetchall, because it loads all the data from the result set into memory. The ResultProxy implements the iterator protocol, so you can easily walk over the results one by one, or pass the rows into another function. In addition to the fetch methods you can use the first method, which returns just the first record of the statement and closes the connection afterwards; fetchone, which returns one row and leaves the cursor open for additional fetch calls; and the scalar method, which returns a single value from a query with a single record and one column.
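(A sketch of querying and of the ResultProxy methods mentioned here, reusing the product table and engine from the previous sketch.)

```python
from sqlalchemy import select, func

query = select([product.c.name, product.c.category])
query = query.where(product.c.category == "fruit")  # built up step by step
query = query.order_by(product.c.name)

result = engine.execute(query)
print(result.keys())                 # the column names available in the result

for row in engine.execute(query):    # iterate instead of fetchall()
    print(row["name"], row[1])       # access by field name or by index

# scalar(): a single value from a one-row, one-column query.
count = engine.execute(select([func.count()]).select_from(product)).scalar()
```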
Because we have a rather complicated snowflake schema, we have implemented a query builder which allows data scientists to just list the features they want to use in their models, and the query builder does the rest: it performs the right joins and applies the right restrictions, providing an abstraction over our database schema. Instead of writing queries by hand, data scientists can just specify the feature columns they want. Using this model provided a real boost in terms of developer productivity. Our snowflake schema is really rather complex — it also includes temporal tables, which are a little more difficult to join and always led to bugs in our code. We could build this query builder because we have a structured definition of the whole database schema through the SQLAlchemy MetaData object, and this provided much of the benefit. To be able to fully describe all the different delivery categories and tables we had to add some more metadata to the schema, table and column definitions, but this was possible without much hassle. What you see here is just a
rather simplified version of our query builder — as I said, the actual implementation is a little more complex, but it uses the same principles. In principle, you just specify the column names that you want to query, then you loop over the tables in your metadata description and take the fields and tables the user has selected. After that you just build the needed join expressions with the tables required by the query and add all the columns the user has chosen.
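(A toy version of such a query builder, looping over the MetaData catalog; the join logic is deliberately simplified and relies on the foreign keys declared in the schema, e.g. the illustrative location/stock tables sketched earlier.)

```python
from sqlalchemy import select

def build_feature_query(metadata, feature_names):
    columns, tables = [], []
    for table in metadata.tables.values():
        selected = [c for c in table.columns if c.name in feature_names]
        if selected:
            columns.extend(selected)
            tables.append(table)
    # Without an explicit ON clause, join() derives the condition
    # from the foreign-key relations declared in the schema.
    joined = tables[0]
    for table in tables[1:]:
        joined = joined.join(table)
    return select(columns).select_from(joined)

query = build_feature_query(metadata, ["name", "quantity"])
```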
A great feature of SQLAlchemy is that you can reuse parts of queries as selectables in other queries, so you can build common building blocks and reuse them elsewhere. Selectables behave the same as tables, so they can be used in other statements just like normal tables. This is really a great feature for composing larger SQL statements by defining the building blocks of queries and putting them together. Selectables also support the same metadata as normal table definitions, so you can for example query all the fields and all the types — all of this is inferred from the select statement you defined earlier.
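(A sketch of reusing a selectable, with the illustrative stock table from earlier: an aggregate query is given a name with alias() and then used like a table in a second query.)

```python
from sqlalchemy import select, func

stock_per_product = select([
    stock.c.product_id,
    func.sum(stock.c.quantity).label("total_quantity"),
]).group_by(stock.c.product_id).alias("stock_per_product")

# The selectable carries its own column metadata, so composition continues:
low_stock = select([stock_per_product.c.product_id]).where(
    stock_per_product.c.total_quantity < 10)
```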
The merge statement is the workhorse in our internal extract-and-transform processes that get the customer data, in a normalized form, into the snowflake schema of the data warehouse. Our customers send us record-based files for every table and delivery category defined in the snowflake schema. The first step is a validation process for syntactical errors; then a simple data upload into the staging area of the database starts. Each table in the snowflake schema has a corresponding table in the staging area with additional meta-information. After all the data has been loaded into the staging area, we run a couple of SQL statements to dereference foreign keys and check the data validity. All records which pass the validation are marked with a status flag. Once the data checking is done, we use the merge statement to update existing records in the core tables, or to insert new ones if they do not exist yet. The merge statement is, in EXASolution, an internal, highly efficient copy-update operation and allows us to update 40 million stock records in a stock table with more than 40 billion records in less than 20 minutes.
The merge statement is not supported by all databases — also known as "upsert", it only recently landed in PostgreSQL — but it is supported by the rather big databases like Oracle, EXASolution and MS SQL. As you can see here, this is the basic shape of a merge statement: you always have a source and a target table, and then a condition on how to match the two tables together, along with an update and an insert clause that tell the merge statement what to do in case the record already exists in the target table and how to insert it if it does not exist. The merge statement is not part of the SQLAlchemy core but of the EXASolution dialect, and it just
generates the plain SQL on the backend. That's how SQLAlchemy always works: you define the query or statement in plain Python code and it generates the SQL code on the database side. This is the same statement as before, but in my opinion it is much easier to build this kind of query in pure Python syntax than to do string manipulation to assemble the SQL by hand.
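(The EXASolution merge construct lives in the dialect and its exact API is not reproduced here; as an illustration of the same upsert idea in Core syntax, this is the PostgreSQL INSERT ... ON CONFLICT support that SQLAlchemy 1.1 exposes, applied to the illustrative stock table.)

```python
from sqlalchemy.dialects.postgresql import insert

stmt = insert(stock).values(location_id=1, product_id=42, quantity=7)
stmt = stmt.on_conflict_do_update(
    index_elements=["location_id", "product_id"],  # the match condition
    set_={"quantity": stmt.excluded.quantity},     # update when the row exists
)
# engine.execute(stmt)  # inserts when missing, updates when present
```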
SQLAlchemy also allows you to run functional tests against a database, or to mock out queries and connections. Most of our applications are covered by lots of different unit tests. Doing integration tests against an actual database is important, but it is often very costly, so a common pattern is to use a data access layer which acts as a proxy to the database queries; then you can easily replace the database access in unit tests if you just want to test the application logic. On the other hand, the isolation of database queries behind the data access layer helps with testing as well: you can test the queries against the real database without the rest of your application.
Testing against real databases is rather time consuming, so we test against an in-memory SQLite database in cases where we only exercise features that are not database specific. A common pattern in such tests: if you want to reuse the fixtures between tests, you can use the setUpClass method instead of setUp, so the fixture is available for the whole test case.
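(A sketch of this pattern: one in-memory SQLite fixture, built once per test case via setUpClass; metadata and product are the illustrative objects from the earlier sketches.)

```python
import unittest
from sqlalchemy import create_engine, select

class ProductQueryTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.engine = create_engine("sqlite://")  # in-memory database
        metadata.create_all(cls.engine)
        cls.engine.execute(product.insert(), [
            {"product_id": 1, "name": "apple", "category": "fruit"},
        ])

    def test_select_by_category(self):
        query = select([product.c.name]).where(product.c.category == "fruit")
        rows = self.engine.execute(query).fetchall()
        self.assertEqual([row.name for row in rows], ["apple"])
```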
In cases where we rely on database-specific features like the merge statement, we have to set up fixtures in an actual database, and we use a transaction rollback mechanism in our tests: all code that needs to modify the database is wrapped into a transaction, which is rolled back once the test has passed. Testing all our code, including the merge statements, takes up to an hour, which is really long, but it is not possible any other way, because this is a feature only the EXASolution database provides.
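(A sketch of the transaction-rollback pattern, assuming engine points at the real test database and stock is the illustrative table from earlier.)

```python
import unittest
from sqlalchemy import select

class MergeStatementTest(unittest.TestCase):
    def setUp(self):
        self.connection = engine.connect()
        self.transaction = self.connection.begin()

    def tearDown(self):
        self.transaction.rollback()  # undo everything the test wrote
        self.connection.close()

    def test_update_existing_record(self):
        self.connection.execute(
            stock.update().where(stock.c.product_id == 42).values(quantity=0))
        quantity = self.connection.execute(
            select([stock.c.quantity]).where(stock.c.product_id == 42)).scalar()
        self.assertEqual(quantity, 0)
```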
Another nice feature of SQLAlchemy is database reflection and introspection. As I showed you earlier, you can define the schema of the database in Python syntax with the normal Table definitions, but you can also reflect the whole SQLAlchemy table model from an existing database. Another way to introspect a database is SQLAlchemy's inspect module: for example, you can list all the table names or get the foreign keys for a specific table.
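(A sketch of both approaches against a hypothetical existing database with an "orders" table.)

```python
from sqlalchemy import create_engine, MetaData, inspect

engine = create_engine("sqlite:///example.db")  # placeholder database

# Reflect the whole table model from the live database...
metadata = MetaData()
metadata.reflect(bind=engine)
orders = metadata.tables["orders"]

# ...or introspect it piece by piece with the inspector.
inspector = inspect(engine)
print(inspector.get_table_names())
print(inspector.get_foreign_keys("orders"))
```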
I think most of you know the "little Bobby Tables" comic. One thing we learned is that you should never trust user input and always use SQLAlchemy to generate the expressions. I don't know if you can read the code, but the point is: never do string interpolation of user-provided text or input. In our case the users are mostly in-house data analysts who are not interested in doing harm, but we learned that even in this case it is possible to harm our production databases. We went through a security certification with a security company, and these guys are much more clever than you think: they managed to inject data through one of our automated services that was used to build queries. So never trust the user. The best protection is to always use the SQLAlchemy expression language — then the parameter binding and escaping will be handled for you.
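(A sketch of the difference, with the illustrative product table from earlier; the commented-out line is the dangerous variant.)

```python
from sqlalchemy import select, text

user_input = "'; DROP TABLE product; --"

# Never: string interpolation hands control of the SQL to the user.
# engine.execute("SELECT * FROM product WHERE name = '%s'" % user_input)

# The expression language turns the value into a bind parameter:
engine.execute(select([product.c.product_id])
               .where(product.c.name == user_input))

# The same holds when falling back to textual SQL with named parameters:
engine.execute(text("SELECT product_id FROM product WHERE name = :name"),
               name=user_input)
```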
If you build large applications that evolve over time, an often forgotten aspect is that, alongside the application, the persistence layer also needs to be updated and changed. You might need to modify the structure of the data — for example, add new fields or remove existing ones, add a constraint to one of your columns, or update the data inside your tables because the semantics have changed. Alembic is a great tool to help you with database migrations: it adds a version identifier to the database schema and provides a convenient way to define upgrades and downgrades in Python scripts. You can run Alembic against a database, check the current version of the schema, and apply a series of updates to bring it up to the newest version. Once you get used to this, you will never want to go back.
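(A sketch of what an Alembic revision script looks like; the revision identifiers are placeholders that `alembic revision` normally generates.)

```python
"""add a status column to the stock table"""
from alembic import op
import sqlalchemy as sa

revision = "1a2b3c4d5e6f"       # placeholder
down_revision = "0f9e8d7c6b5a"  # placeholder

def upgrade():
    op.add_column("stock", sa.Column("status", sa.String(20)))

def downgrade():
    op.drop_column("stock", "status")
```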
Because we are heavy users of SQLAlchemy, we have developed some tools around it and open sourced them. The first is the SQLAlchemy EXASolution dialect. As I said earlier, EXASolution is an in-memory, column-oriented relational database system — we use it in clusters of 8 to 16 nodes and it is a really powerful database. So we developed a SQLAlchemy dialect for the EXASolution database and open sourced it on GitHub. If you want to try it out and do not have access to an EXASolution database, you can get EXASolo, a single-node instance, for free.
Another thing we are working on is turbodbc, which is a DBAPI 2.0-compliant ODBC interface. At the moment we are using pyodbc, but we are rather unhappy with its performance — that's why we started turbodbc. Why should you use it? Because it is faster, and we are also planning a direct export into NumPy arrays, which will speed up applications and lower the memory requirements.
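(Since turbodbc is DBAPI 2.0 compliant, the standard cursor protocol applies; the DSN name and query below are placeholders.)

```python
import turbodbc

connection = turbodbc.connect(dsn="exasolution")  # placeholder DSN
cursor = connection.cursor()
cursor.execute("SELECT product_id, quantity FROM stock")
for product_id, quantity in cursor.fetchall():
    print(product_id, quantity)
cursor.close()
connection.close()
```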
Another thing a colleague of mine is currently working on is getting turbodbc to deliver database results as NumPy arrays or Apache Arrow without an intermediate transformation. Apache Arrow is an in-memory data structure specification for use by engineers building data systems; it is mostly driven by the pandas and Hadoop communities. It is a language-agnostic columnar memory layout permitting O(1) random access, and the layout is highly cache-efficient for analytic workloads. Developers can create fast algorithms that process these data structures, and it also enables efficient and fast data interchange between systems without the serialization costs associated with other systems like Thrift or Protocol Buffers. If you want to learn more about the ongoing effort to get Apache Arrow into Python and data out of databases, I can recommend the Software Engineering Daily podcast episode on Apache Arrow.
That's the last slide — just some links and references; my slides will be available online. If you want to learn more about SQLAlchemy, I can recommend "The Architecture of Open Source Applications" and of course the book "Essential SQLAlchemy", which is available in its second edition; the SQLAlchemy project page also provides lots of great documentation. Thank you — any questions?

[Moderator] We have more than five minutes for questions. Please step to the microphone so the questions are recorded.

[Question] You talked about updating records with the merge statement. What happens if two processes try to update the same rows at the same time — is that protected?

[Answer] It is protected by the transaction mechanism of the database.

[Question] I wanted to ask if you measured the performance penalty implied by using the ORM and not just the Core — in other words, if you used the full power of SQLAlchemy and not just the Core, would it be much slower?

[Answer] This really depends on the type of the query. Normally, with the ORM you have lazy loading, and if you access nested data structures from other tables, the ORM will issue another query for each field — that will be much slower than generating one select statement and just fetching the results. For data science and analytical workloads the ORM is not an option because of the data access patterns it implements; it is much more useful in regular web applications. For getting large amounts of data out of the database it is not the right tool.

[Question] In Django there is prefetch_related and select_related, which resolve the foreign keys. Is there anything similar?

[Answer] You can define the foreign keys, and if you join two tables that have a foreign key relation, SQLAlchemy will automatically choose the right join condition and bring the foreign keys together.

[Question] So the query builder you showed would handle everything?

[Answer] Yes, but it is also true that it was a rather simplified version — our join conditions are a little more complex and cannot all be expressed automatically like this, so we had to put some more logic into the query builder.

[Question] I see that you are declaring the tables directly. Is there a specific reason for not using the declarative approach?

[Answer] I am not using the declarative approach because that is part of the ORM; I am not using the ORM, I am using the metadata and table definitions from the Core.

[Moderator] Any more questions? No — then thank you for your attention.

[Speaker] Thank you.

Metadata

Formal Metadata

Title SQLAlchemy as the backbone of a Data Science company
Series Title EuroPython 2016
Part 126
Number of Parts 169
Author Hoffmann, Peter
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, copy, distribute and make the work or content publicly accessible, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in modified form, only under the terms of this license.
DOI 10.5446/21205
Publisher EuroPython
Publication Year 2016
Language English

Content Metadata

Subject Computer Science
Abstract Peter Hoffmann - SQLAlchemy as the backbone of a Data Science company. In times of NoSQL databases and MapReduce algorithms it's surprising how far you can scale the relational data model. At [Blue Yonder] we use SQLAlchemy in all stages of our data science workflows and handle tens of billions of records to feed our predictive algorithms. This talk will dive into SQLAlchemy beyond the Object Relational Mapping (ORM) parts and concentrate on the SQLAlchemy Core API, the Expression Language and Database Migrations with Alembic.
-----
This talk will dive into SQLAlchemy beyond the Object Relational Mapping (ORM) parts and concentrate on the SQLAlchemy Core API and the Expression Language:
- **Database Abstraction**: Statements are generated properly for different database vendors and types without you having to think about it.
- **Security**: Database input is escaped and sanitized prior to being committed to the database. This protects against common SQL injection attacks.
- **Composability and Reuse**: Common building blocks of queries are expressed as SQLAlchemy selectables and can be reused in other queries.
- **Testability**: SQLAlchemy allows you to perform functional tests against a database or mock out queries and connections.
- **Reflection**: Reflection is a technique that allows you to generate a SQLAlchemy representation from an existing database. You can reflect tables, views, indexes, and foreign keys.
As a result of the usage of SQLAlchemy at Blue Yonder, we have implemented and open sourced a SQLAlchemy dialect for the in-memory, column-oriented database system [EXASolution].
