
Pydantic Configuration Management With ConfZ

Formal Metadata

Title
Pydantic Configuration Management With ConfZ
Title of Series
Number of Parts
9
Author
Contributors
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
"Config management in larger applications can become quite complex: Information needs to be loaded from different sources like environment variables, config files and command line arguments, the loaded data needs to be validated to ensure all expected information is present and in the correct format and then distributed to different locations in the codebase. This holds true especially in data science projects, having rich model and training configurations. To simplify this process, we developed and open-sourced ConfZ, a config management library for Python based on pydantic. It easily allows to load config values from heterogeneous sources, validates them and makes them accessible as Python dataclass-like objects with full IDE support. It furthermore supports in common use cases like having multiple environments, lazy loading and unit testing. Within two months, we already reached more than 100 stars on GitHub. In this talk, we show how ConfZ is used and how it compares to other config management solutions. We then dig into the pythonic details and see how meta classes drive the internals of the library." About the speaker: Silvan is a data scientist who really likes to get his hands on the full stack from cloud data pipeline setup over ML processing to frontend web development and data display. Since 2019, he works for Zühlke, where he focuses on computer vision and medical data science applications.
Transcript: English (auto-generated)
The next one is also something that I suffer from. I am very biased: every time I have to do configuration, I usually opt for text protobufs, so I would really like to see a good configuration management engine.

Thank you. So I have the awkward position of being between you and your lunch; I'm sorry for that. This talk is on configuration management. My name is Silvan, I'm a data scientist at Zühlke, and as data scientists we have a lot of challenges with this configuration management. Basically, when you write a program, you get configuration from different sources. We have seen before that you can have paths hardcoded in the script; that's what you for sure do not want, and that's why we get this configuration from outside. This can be configuration files, it can be environment variables, especially for secrets, and it can be command line arguments, as you've seen before. All of this is configuration that tells our program how to behave.

In basically every program we write the same few steps: we somehow load this config, we parse it, we validate it, we persist it, and then we distribute it to our different software components. Usually that's not a big deal. As we've seen before, with argparse it's three to four lines, perfect, you're fine.
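The hand-written steps listed above (load, parse, validate, persist) can be sketched with the standard library alone. This is only an illustration of the pattern the talk describes, not code from the talk: the names `AppConfig`, `load_config`, `APP_HOST` and `APP_PORT` are made up for this sketch.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical config object that the rest of the program receives."""
    host: str
    port: int


def load_config(argv: list[str], env: dict[str, str]) -> AppConfig:
    # Load: read raw values from the command line, falling back to the environment.
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default=env.get("APP_HOST", "localhost"))
    parser.add_argument("--port", default=env.get("APP_PORT", "8080"))
    args = parser.parse_args(argv)

    # Parse: convert the raw string into the type we expect.
    port = int(args.port)

    # Validate: reject values that cannot be correct.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")

    # Persist: freeze the result in one immutable object to pass around.
    return AppConfig(host=args.host, port=port)


config = load_config(["--port", "9000"], {"APP_HOST": "example.internal"})
print(config)
```

Even this small version already mixes two sources, a type conversion and a validation rule, which hints at why the approach stops scaling once configs grow.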
But as the software grows, and especially in data science projects, configs tend to get really large. You have the model config, the training config, maybe some cloud connectors, maybe even secret vaults for your keys somewhere in the cloud, and so on. So this can become quite complex, and at some point it's actually worth it to invent a library, and that's what we actually did.

There are, again, a lot of libraries already for configuration management, and we also evaluated them, but we did not find the sweet spot. There was always something missing. Either they had only very specific sources, for example argparse, as you have seen, is really just meant for arguments and does not want to solve all the problems you have seen here, or they were missing validation, multi-environment or IDE support. That's why we thought: okay, there are 10 competing standards, let's invent an 11th. That's what we did, and we wrote ConfZ.

ConfZ is a configuration management library we developed at Zühlke, and it's open source under the MIT license, so anyone can use it. We developed it about a year ago and continuously improve it. It's on GitHub; you see the world has not exactly waited for this library, but it's also not completely unpopular, and maybe if everyone gives it a star it's already twice as large. It has, of course, documentation on Read the Docs, and you can install it with pip or also with conda-forge.

So what's it about? We will first go into the basic principles and then also look a bit under the hood, because I think it's quite interesting. Some of you might know pydantic. For those who don't, it's basically data classes, but with validation and rich types. In pydantic, as we'll see afterwards, you can define a data class, but you have way more types than the standard types you know from Python, and you get validation, so you cannot just put any data in these data classes; it has to be correct.

Basically, ConfZ is just a layer around pydantic. It loads the data from the sources, which is what pydantic cannot do, and then passes it to pydantic, and pydantic then does all the validation and persisting on its own. We also do caching; you'll see later, with lazy loading, how this works. It allows for a heterogeneous set of sources, which we'll see afterwards, like environment variables, command-line arguments, constants and so on, and it's easily extendable: as I said, if you have keys in the cloud somewhere, say in Azure Key Vault, just write an extension
and you can load them from there. It has special support for multiple environments and testing, because we have seen before how important testing is. So instead of going into a programmatic approach and how everything is set up, we just look at how you can use it, by example, and see how it behaves; then we look a bit under the hood and see why it can behave the way it does.

This is the basic usage, and I guess whenever you want to use ConfZ, this is all you need to know, in theory. We have here an example with a database configuration that consists of a user and a password. It inherits from ConfZ, and that's what you write: you define what a config looks like, because you as a programmer of course know what your config looks like. You can use standard Python types like string, or also slightly more special types like SecretStr from pydantic. It's basically a string, but it will not just print, so if you accidentally dump your DB config onto the console, or even into a log file, it will not be there in plain sight. It's a bit hidden, which is a good thing here.

Then we have the API config. In this example we code an API, and we say: our API runs at some host, it has some port, and it connects to some database, and the database is configured by the database config. So far this was actually plain pydantic, so nothing special. But now we add this class variable CONFIG_SOURCES, where we define where this config is stored. This is just one example, we'll see many more later, where we say: all this config is actually stored in this YAML file at the given path. And that's actually all you need.

From now on, whenever you need some of these config variables, wherever you are in your system, you can just import this config, directly instantiate it and access the values. Under the hood, when you access one of these values, it gets loaded from the file, and afterwards it is persisted, so it only loads it once. It also loads it only at the first access; it will not load it already at import time, because that's bad practice. That's all you need to do to use ConfZ in a very basic sense.

As I said, it has some caching and lazy loading. When you instantiate the API config twice, for example, the is operator, which checks for identity in the sense of whether it's really at the same memory location and not just whether the values are equal, returns true in this case. Also, if you try to overwrite a value, it will fail, because the config should be immutable. And you can use the full power of pydantic: whatever you want to do with this config, maybe dump it as JSON somewhere, you can use all the methods pydantic provides.
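The three behaviors described here (load on first access only, return the identical object afterwards, refuse modification) can be imitated with the standard library. The sketch below is a stand-in under stated assumptions, not ConfZ itself: it uses a cached function instead of class instantiation, and `get_api_config`, `load_count` and the field values are hypothetical.

```python
import functools
from dataclasses import FrozenInstanceError, dataclass

load_count = 0  # counts how often the (pretend) config file is read


@dataclass(frozen=True)  # frozen: assigning to a field raises an error
class APIConfig:
    host: str
    port: int


@functools.lru_cache(maxsize=None)
def get_api_config() -> APIConfig:
    """Load the config on the first call only; later calls return the cached object."""
    global load_count
    load_count += 1
    # Stand-in for reading a YAML file; the values are made up.
    return APIConfig(host="localhost", port=8080)


first = get_api_config()   # triggers the one and only load
second = get_api_config()  # served from the cache
print(first is second, load_count)

try:
    first.port = 9000
except FrozenInstanceError:
    print("config is immutable")
```

Nothing is loaded when the definitions are imported; the load happens at the first call, which is the lazy part of the behavior.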
For example, here we have the JSON serialization.

As I said before, for certain scenarios we have a bit more functionality, for example multiple environments. Maybe we have our local dev environment, a staging environment and a production environment; we can have a lot of environments, and we might, for example, have different config files for these different environments. That's no problem in ConfZ. The only thing you need to do is, instead of defining a file, define a folder and then tell the source where it should find out which environment to use. In this example it will read the environment variable ENVIRONMENT, look at its value, and then read the config file in the folder whose name matches that value. That's all you need, and now we went from a single-environment setup to a multi-environment setup.

We've already seen that CONFIG_SOURCES is quite flexible. It actually can also be a list; then it goes through the different sources, and later sources can overwrite earlier ones.
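The rule that later sources overwrite earlier ones can be illustrated with plain dictionaries. This is a minimal sketch of the layering idea only; the helper names, the `APP_` prefix and the `--conf_` prefix are assumptions made for the example and are not taken from the library.

```python
def from_env(env: dict[str, str], prefix: str = "APP_") -> dict[str, str]:
    """Pick up environment variables that start with a (hypothetical) prefix."""
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }


def from_cli(argv: list[str], prefix: str = "--conf_") -> dict[str, str]:
    """Pick up command line arguments of the form --conf_<name> <value>."""
    values = {}
    for flag, value in zip(argv[::2], argv[1::2]):
        if flag.startswith(prefix):
            values[flag[len(prefix):]] = value
    return values


def merge_sources(*sources: dict[str, str]) -> dict[str, str]:
    """Merge left to right, so later sources overwrite earlier ones."""
    merged: dict[str, str] = {}
    for source in sources:
        merged.update(source)
    return merged


defaults = {"host": "localhost", "port": "8080"}
env = {"APP_PORT": "9000", "HOME": "/home/user"}
argv = ["--conf_port", "9999"]

config = merge_sources(defaults, from_env(env), from_cli(argv))
print(config)
```

Here the port from the command line wins over the one from the environment, which in turn wins over the default, while the untouched host survives from the first source.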
For example, here we also have an environment source. We say: you can use environment variables, and you can use all of them, so allow_all is true. Instead of using the real environment variables, you can also use a .env file, if you again like files, and if a value is not in there, it uses the real environment variables. You can also say: afterwards, look at the command line arguments, at all the command line arguments which start with conf, and if some of these are defined, overwrite the earlier values. With these two lines of code we can configure our application with environment variables but then overwrite them with command line arguments. For example, if you're on AWS SageMaker and you do some hyperparameter testing, you will get the hyperparameters as command line arguments; the only thing you need to add is this one source, and they will be correctly validated and parsed.

We have so far always seen this global config, and this is not always what you want. Especially if the application gets very large, it has this singleton thing, which is not always very nice. So what you can also do, instead of having this CONFIG_SOURCES class variable, is just define standard configs like here. This is now really like pydantic, but instead of inheriting from pydantic's BaseSettings you inherit from ConfZ. Then, when instantiating this class, you can provide it with config sources. This has the same power as the CONFIG_SOURCES class variable before, but it's now an instantiation argument. You can also combine this with traditional keyword arguments, or even just use it as if it were a plain data class or a pydantic base model, so you can do all the combinations here.

Now we have some configuration, maybe with a small local database, but for testing we very often want to override certain configuration, and that's also supported. We have here an example of a configuration which consists of a number that is usually read from a file. It's maybe not the most interesting configuration, but in this example you can just read this number and it will be printed; this is a small script as an example. Now, if you go to testing, you might want to override this number, and that's actually possible. You can just define new sources. We say: we now have a data source, so the source of the configuration is a constant, defined in this dictionary here. Every ConfZ config class allows you to override the config with a context manager: you just call change_config_sources on my config and give it a new source. Within this context it will load and use the new source, so here we have 42, and once you leave the context you get back to the old value and have the number from the config file again. This is very helpful if you need to override certain things for testing, and with the context manager you make sure that you do this very locally and do not affect other unit tests, for example.

Yes, this was very fast.
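The test override described above, where a source is swapped inside a context and restored on exit, follows a general pattern that can be sketched with contextlib. The class below is a hypothetical stand-in and not the ConfZ implementation; `MyConfig`, `override_source` and the stored numbers are invented for the illustration.

```python
from contextlib import contextmanager


class MyConfig:
    """Hypothetical stand-in for a config class; not the ConfZ implementation."""

    _source = {"number": 1}  # pretend this came from a config file

    @classmethod
    def load(cls) -> int:
        return cls._source["number"]

    @classmethod
    @contextmanager
    def override_source(cls, new_source: dict):
        """Temporarily swap the source and restore it when the block exits."""
        old_source = cls._source
        cls._source = new_source
        try:
            yield
        finally:
            cls._source = old_source  # restored even if the test body raises


print(MyConfig.load())
with MyConfig.override_source({"number": 42}):
    print(MyConfig.load())
print(MyConfig.load())
```

This prints 1, then 42 inside the context, then 1 again. The `finally` clause is what keeps the override local: a failing test cannot leak its temporary source into the next one.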
We now go under the hood. We have already heard a teaser about metaclasses and that we should not use them in Python. Actually, we had to use them in this project. But what is a Python metaclass? To me that was very new, and I thought it might be interesting to show this here.

So what is metaprogramming, first? Metaprogramming is a concept that sounds a bit like general AI and whatever: it's the potential for a program to have knowledge of, or manipulate, itself. It sounds very advanced, but actually metaclasses are everywhere in Python. They're always there, you just do not see them. According to Tim Peters, one of the original CPython implementers, you do not need to care about them in most cases. He writes here that 99% of users should not care about them, that if you ever wonder whether you need them, you don't, and that only if you really know that you need them, you do. So keep this in mind: you will probably not need them in your daily work. But I think understanding them really helps to understand the internals of Python, what is actually going on when I write a Python program.

There's an intuitive description, and we will see this afterwards: a metaclass is to a class the same as a class is to an instance. Keep this in mind. What does this mean? You might have heard that in Python everything is an object, and here we have an example of what this means. This is now completely detached from ConfZ; you'll see afterwards how it glues together. We define some class Foo and then we create an instance x. Then we look at the type of x: no surprise, the type of x is the class Foo. Then we might ask ourselves: okay, and what is the type of Foo, or the type of the type of x? The type is type. And what is then the type of type, or the type of the type of the type of x? Well, it's again type.
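The chain of type lookups walked through here can be reproduced in a few lines. The names Foo and x follow the transcript; the slide itself is not part of this page, so the exact layout is an assumption.

```python
class Foo:
    pass


x = Foo()

print(type(x))              # the class Foo
print(type(type(x)))        # the built-in type
print(type(type(type(x))))  # type again: type is its own type
```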
So we see this pattern: x is of type Foo, Foo is of type type, and type is actually of type itself, so we end up with an infinite loop. This is the usual case, and not surprising. But now metaclasses come in. Metaclasses allow you to define your own type. We have here an example: we define a metaclass called Meta, which has nothing to do with the Facebook company, and it inherits from type, meaning that it's a metaclass. Then we define the same class Foo again, but now we say: the metaclass is Meta. If you don't define it, the metaclass is type, but now we specifically define it. Then we again create an instance, and now something interesting happens. What is the type of x? It's Foo, not surprising. But what is now the type of Foo, so the type of the type of x? This is now actually Meta, so it's no longer type. And the type of Meta is then again type.
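The same lookup with a custom metaclass, as just described, might look like the following sketch. Meta, Foo and x are the names used in the talk; the empty class bodies are an assumption.

```python
class Meta(type):
    """Inheriting from type makes this class a metaclass."""


class Foo(metaclass=Meta):
    pass


x = Foo()

print(type(x))              # Foo, as before
print(type(type(x)))        # Meta, no longer type
print(type(type(type(x))))  # type
```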
So we have this new chain: x is of type Foo, Foo is of type Meta, and Meta is of type type. That's a nice thing we can do, but we did not gain a lot by doing so at the moment. Now it's really interesting to see, first of all, what happens if we create an instance in Python. That's maybe not so surprising, depending on how much you already know about Python. We have this script here, with class definition start and class definition end, I hope it's readable at this size, and then these two magic methods, new and init, which you can always define. Now, if I create an instance x, what happens? That's basic Python.

The Python interpreter goes through it. Even if we don't like to call it a script, it's always a script: we go through it, class definition start, class definition, class definition end. Then, since we create an instance x, first new is called. New forces us to create an instance of the class, so we have to return an instance of this class if we define the function new ourselves; if we don't, this happens under the hood. Then init is called: we get this new instance, which is passed to us here, and we have to initialize it. So we have init start, init end. Then we also call x, and calling x calls the magic method call, which gives us "class called" here.

That's not new, but it's very nice to compare this to what happens with metaclasses. Let's say we have exactly the same class Foo, but we have our metaclass here. We have the class Meta, of type type, and it also has a new, an init and a call, again with prints: meta new start, meta new end, meta init start, meta init end, and the same for meta call. Then we have our Foo, where we have class definition start, then we define Foo as of type Meta, then the things from before, and then class definition end.

What happens now if we run this program? It's actually very interesting. We have class definition start, then the function definitions, and now we see that stuff actually happens under the hood. Python really is an interpreter: it goes through the program line by line. It sees: okay, class definition start, I have a class Foo with metaclass Meta. This means I have to define a new class, which means I have to create an instance of the type of this class, which is Meta. And we see exactly the same behavior as before: meta new start, meta new end, meta init start, meta init end, where our metaclass forces us to create a new class and then initialize it. Then the class definition has ended. So by just defining a class, things happened under the hood: Python created this class by instantiating the type of this class, which is Meta.

Then the same thing happens as before, but first the call of Meta runs. Remember, here we create an instance of Foo, and if you look at it, what we actually do is call the class. So the class is called, which means the call function of its type is called, and that gives back the instance. We have created the class from the type, and then we create the instance from the class. Then things happen as before, class new start and class init start, and then the meta call ends here. So all of this instantiation happens within the call function of the metaclass, and then we're done. Finally we have "class called" again, because here we called x.

So we see it really is true: a metaclass is to a class the same as a class is to an instance. A class is really just an instance, and by defining a class we actually instantiate the metaclass, the same as by creating an instance of a class we instantiate the class. What can we do with this?
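The order of events narrated above can be checked with a small script. This is a reconstruction from the spoken description, since the slide is not included here: the message texts, the `log` helper and the `events` list are assumptions added so that the order can be inspected afterwards.

```python
events = []


def log(message):
    """Print the message and remember it, so the order can be inspected."""
    events.append(message)
    print(message)


class Meta(type):
    def __new__(mcs, name, bases, namespace):
        log("meta new start")
        cls = super().__new__(mcs, name, bases, namespace)
        log("meta new end")
        return cls

    def __init__(cls, name, bases, namespace):
        log("meta init start")
        super().__init__(name, bases, namespace)
        log("meta init end")

    def __call__(cls, *args, **kwargs):
        log("meta call start")
        # type.__call__ runs the class's own __new__ and __init__.
        instance = super().__call__(*args, **kwargs)
        log("meta call end")
        return instance


log("class definition start")


class Foo(metaclass=Meta):
    def __new__(cls):
        log("class new start")
        instance = super().__new__(cls)
        log("class new end")
        return instance

    def __init__(self):
        log("class init start")
        log("class init end")

    def __call__(self):
        log("class called")


log("class definition end")

x = Foo()  # calling the class invokes Meta.__call__
x()        # calling the instance invokes Foo.__call__
```

The metaclass messages for new and init appear between the two class definition markers, before any instance exists, and the class-level new and init messages are nested inside the meta call pair.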
So far it was nice to see that Python works this way, but we wouldn't really need to know; why would we care? Well, there are some things we could do with this. Here is a dummy example: we could just add an attribute. It's not something you would really want, but it shows nicely that you have full freedom to manipulate these classes as you want. This doesn't mean that you should do so, because it gets very opaque what happens. But we see it here: we have this metaclass and we say, whenever a class is created which has Meta as its metaclass, it should add a class attribute called new_attribute. Then we have Foo here. It's an empty class, but still, just by saying its metaclass is Meta, it now has a class variable new_attribute, which is also available as an instance variable new_attribute. So we can manipulate these classes.

I would say the most popular example for this is singletons. When you search for how to do singletons in Python, as you might have done, you end up on Stack Overflow at a post with very many upvotes, and there is an example there. Most people look at it and say: okay, that's not for me, that's very weird, I will not do this. But actually, in this context, it's not that bad. What we can say is: we have a metaclass called Singleton, and this metaclass keeps track of all the instances that have already been created, across all the classes that use it. Whenever an instance of an underlying class is created, so the call function is called, we only create this instance if it does not already exist, and if it already exists we just return the previous instance. With these, what is it, six or seven lines of code, we can define tons of classes and just assign them the metaclass Singleton. Whenever we want to instantiate such a class, it is only instantiated the first time, and every other time you just get back the cached instance, which is stored in the metaclass. That's the Stack Overflow topic on singleton metaclasses in Python.
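Both metaclass examples from this passage, the dummy attribute and the singleton, can be sketched as follows. The singleton follows the widely shared pattern the speaker refers to; the names AddAttribute, Database and Cache and the value 42 are made up for this illustration (the talk calls the first metaclass Meta).

```python
class AddAttribute(type):
    """Metaclass that adds a class attribute to every class it creates."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.new_attribute = 42  # the value is arbitrary


class Foo(metaclass=AddAttribute):
    pass  # empty body, yet Foo.new_attribute exists


class Singleton(type):
    """Metaclass that hands out one cached instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Only create the instance if this class has none yet.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Database(metaclass=Singleton):
    pass


class Cache(metaclass=Singleton):
    pass


print(Foo.new_attribute, Foo().new_attribute)
print(Database() is Database())  # same cached instance
print(Database() is Cache())     # different classes, different instances
```

The cache lives on the metaclass, so any number of classes can opt in just by naming Singleton as their metaclass, without repeating the bookkeeping.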
Where else are metaclasses used? They're actually mostly used in libraries where you need to somehow define APIs and you do this with classes. For example, in pydantic you define a base model, and this is a class with these class variables and their types and so on. Or in an object-relational mapper like the Django ORM, you also define your database tables as classes. This is where metaclasses come in, because you use the class syntax to define something, and then the library needs an overview of all the classes you have defined and even needs to modify these classes on the spot. The Django ORM will then add functionality so that you can save these models, load them, search for them and so on. So in these libraries they are heavily used, and that's where they should be used; that's the best use case for them. In most other cases they should not be used, and as I said, we mostly do not need them.

But in ConfZ we actually had to use them, so that's where the two ends of the talk come together: pydantic needs metaclasses, we wanted to wrap around pydantic, and to do so we had to wrap around the metaclass of pydantic. So what does this mean? I would say one of the most important parts of ConfZ is this ConfZ metaclass, which inherits from the metaclass of the pydantic base model. Since the metaclass of the pydantic base model is not publicly exposed, I mean, we could import it, but it could break, we just take the type of the pydantic base model. It looks very weird: we inherit from the type of the base model, and by doing so we create the metaclass. PyCharm has no chance of type checking this; it does not understand what's going on here at all.

What we have here is this singleton pattern again, in the call function. Singleton is, let's say, a negatively connoted name; it's actually more, I would say, lazy loading and caching. What happens here is that whenever you have already instantiated a config class, it will just return the same instance again, and only if it was never loaded before will it actually load the config. In this load config step, all the loaders we have seen come into play, like the file loader, the environment variable loader, the command line arguments loader and so on. It also uses this to load the sources if you do not provide the class variable and instead provide them as keyword arguments at instantiation.

So with this ConfZ metaclass we can wrap around the metaclass of pydantic, and then the ConfZ class wraps around the pydantic base model class and sets the ConfZ metaclass as its metaclass. Then we have the other stuff here which is necessary, like the context manager for change_config_sources and so on. But basically, what's at the heart of ConfZ is this metaclass, which wraps around the metaclass of pydantic. This was it, I think.
Oh no, I'm late, sorry. Oh, I also did not check the sheets. Yes, thank you.