Split Up! Fighting the Monolith
Formal Metadata

Title: Split Up! Fighting the Monolith
Part Number: 153
Number of Parts: 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
DOI: 10.5446/21203
Transcript: English (auto-generated)
00:02
So, this next talk is by Patrick. He will explain how they split up a monolithic application that lived in a single repository. Okay? So, good luck. Thanks. Yeah, hi, welcome to my talk. I'm Patrick.
00:20
I'm working as a software developer at Blue Yonder. And if you came here because you only read the title without the abstract, like I sometimes do, and are now expecting that I will show you how to split a monolithic application into a microservice architecture, then I have to disappoint you.
00:40
That's not what I'm talking about. We want to talk about what we did with our application, which consists of various Python packages, and we had all of those Python packages in one repository. And at one point we decided: yeah, we want to have one repository for every package we
01:05
developed. And yeah, so let's start with this easy example of what a Python package usually looks like. You have, for example, this structure: you have the actual package, my_super_library,
01:24
with this standard __init__ module in it; maybe you have a requirements file, you probably have a setup.py file, and of course you will have tests. And for us it looked something like this, as we had lots of these packages in
01:44
one repository. What we also had was this one requirements file for the whole application. This was probably the worst design decision at the beginning: even for the unit tests
02:04
of every one of these libraries, we just installed everything and used that environment to run the tests. We actually had a lot more than four of these packages in it.
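As a sketch, the layout described above might look like this (package names beyond my_super_library are illustrative, not from the talk):

```
repository/
├── my_super_library/
│   ├── my_super_library/
│   │   └── __init__.py
│   ├── tests/
│   └── setup.py
├── library_two/
│   └── ...
├── library_three/
│   └── ...
└── requirements.txt   # one pinned file for the whole application
```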
02:24
Yes, so why did we even want to split it? There were various reasons. One of the reasons was that other teams in our company started using some of the libraries in their own projects, and they always complained about:
02:45
okay, we want to contribute, but every time I have to look into this big repository it's such a pain to get it just running, whatever.
03:00
So yeah, this was one reason. Another reason was, okay, I called it spaghetti code here; what I mean is cross dependencies: for example, in library one you import from library two, even if you don't actually want that in the end, because it's not well structured.
03:28
And yeah, as other teams started to use our libraries we also had to release them; that's also easier if they have their own Git repository. And we also wanted to
03:47
use something like setuptools_scm to get automatic versioning. If you don't know what setuptools_scm is: it creates versions for your Python package out of your Git versions
04:04
or Mercurial versions. So usually a setup.py file looks something like this: you have this version keyword argument, where you have to manually adapt the version string every time you want to do a new release, and setuptools_scm does this for you automatically, so
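Neither slide survives in the transcript, but the before/after being described might look roughly like this (package name and version are illustrative):

```python
from setuptools import setup

# Before: the version string is maintained by hand and has to be
# bumped manually for every release.
setup(
    name="my_super_library",
    version="0.0.1",
)

# After: setuptools_scm derives the version from the Git (or
# Mercurial) metadata instead, so no manual bumping is needed.
setup(
    name="my_super_library",
    setup_requires=["setuptools_scm"],
    use_scm_version=True,
)
```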
04:28
here you would just say: yeah, my setup requires setuptools_scm to be available, and I want to use this SCM version (use_scm_version), that's here. And what such a version then looks
04:44
like is something like that: here we have one commit after the latest tag. So before, I set a tag 0.0.1, and this dev1 then says: okay, you are one commit after the latest
05:05
one, and at the end, after the plus and the ng, you have the start of the current Git commit hash. Okay, so talking about Git commit hashes: we now decided we want to split our monolithic
05:26
Git repository, but how do we do it? If we just moved certain subpackages somewhere else and then initialized the repository from scratch, we would lose all the history we
05:43
already have. But if you use Git (I don't know if there's something similar in Mercurial or any other SCM), you have subtree, and subtree has another subcommand, split, and split creates a new history of commits for a specified prefix.
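A hedged sketch of the commands being described (repository names and the library3 prefix are illustrative):

```shell
# Inside the monolithic repository: build a new commit history
# containing only the commits that touched library3/, with the
# contents of library3/ moved to the root, on a branch of its own.
cd monolith
git subtree split --prefix=library3 -b split-library3

# Initialize the new stand-alone repository and pull in that branch,
# keeping all the history that affected library3.
cd ..
git init library3-repo
cd library3-repo
git pull ../monolith split-library3
```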
06:13
So here I have the prefix library3, and if you also specify a branch name, it will
06:21
create a new branch which has exactly this newly generated history. What you can then do is create your new package, initialize it with Git, and then pull this branch you created before from the monolithic repository. Now you have a new repository
06:46
for your new library3, and with all the history which affected library3. Okay, so this has
07:00
to be done for all packages in our monolithic Git repository. What then changed for us is how we do our continuous integration workflow. Before, when we had just this one repository, it didn't matter in which package we made changes for our next feature: as we used
07:24
Jenkins, Jenkins just checked out the latest commit, and every change in every one of our packages was available. And yeah, that was easy; that's one of the advantages if you have monolithic applications, architectures, whatever. What we then did: we saw, okay, now we have lots of
07:47
Git repositories, so let's just check out every Git repository at the beginning and create our application artifact out of this. This ended up in a really messy Jenkins job. So if you
08:03
don't know Jenkins: Jenkins has lots of plugins, and one of those plugins is the Multiple SCMs plugin. There you can specify multiple repositories which should be checked out at
08:21
the beginning of a job. And yeah, we did this for all of our extracted libraries and had this huge list, and this was horrible. It was really horrible when we had to do bugfix releases. I mean, can you imagine how hard it is if you have to configure
08:47
all this? Okay, so in Jenkins you have to specify which tag, branch or whatever has to be checked out; so if we wanted to do a bugfix release, we had to specify the tags or commit hashes
09:05
which were used back then for our release, so that we only changed the repository where we had to do the bugfix release. And I think you can imagine that this was really horrible,
09:21
so don't ever do that. What you actually want to do, of course, is just use your libraries in your application like any other library. And the problem with that was: if you
09:42
add your libraries to your application's requirements.txt, every time you change something you would have to bump the version in your requirements file, because of course you pin your requirements. And yeah, that's also not a good workflow, because it happened quite often that we
10:12
implemented new features in our library packages, then the unit tests for this library passed,
10:20
then we thought: okay, we can do a release. But when we actually used the new version in our application, we saw: oh no, that's not working at all. And you want to have this feedback a lot faster. So what can we do here? We came up with this workflow: we run the unit tests of our
10:47
libraries; if they pass, we let Jenkins upload the wheel to our internal devpi server, and at the beginning of our application job we just install those with the minus minus
11:09
pre option of pip install. With this option you can install pre-releases: beta releases, alpha releases, or those dev releases you saw earlier, created by setuptools_scm,
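The reason --pre is needed here: those setuptools_scm dev versions count as pre-releases, which pip skips by default. A small check with the packaging library (the version string is illustrative):

```python
from packaging.version import Version

# A setuptools_scm-style dev version: one commit after tag 0.0.1,
# with a local segment carrying the start of the Git commit hash.
v = Version("0.0.2.dev1+ng1234abc")

print(v.is_prerelease)   # True: pip only picks this up with --pre
print(v.is_devrelease)   # True
print(v.public)          # 0.0.2.dev1
```

In the application's CI job this would then be something like `pip install --pre my-super-library` against the internal index, while the release job drops --pre and only sees tagged, final versions.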
11:25
so we always had the newest versions of all libraries in our continuous integration pipeline, where we knew at least the unit tests pass. Yeah, and then we created another job
11:46
for actually doing application releases. So if we want to do a release of our application, we now have to do releases of all the libraries where we know: okay, with this version
12:02
it works, and can then run this extra release job where this --pre option is not used. Okay, I mentioned devpi; who of you knows devpi or uses devpi?
12:25
Okay, not many hands. So devpi-server is a PyPI server; we use it at Blue Yonder for our internal packages, but you can also use it on your laptop, so it's also just a mirror for
12:46
PyPI, and for example if you are on the train and want to hack but don't have an internet connection and have to install a package, you could do it offline if it's already cached.
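A rough sketch of getting a local devpi mirror going (these commands are from the devpi 6.x series and are an assumption, not from the talk; older releases used slightly different invocations):

```shell
pip install devpi-server devpi-client

# Initialize the server state, then start it (port 3141 by default).
devpi-init
devpi-server

# Point the client at it and install through the caching root/pypi
# mirror index; once a package is cached, later installs can work
# offline.
devpi use http://localhost:3141/root/pypi
pip install --index-url http://localhost:3141/root/pypi/+simple/ requests
```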
13:00
Yeah, you can whitelist, blacklist and do lots of other things. Another topic is requirements pinning: who of you has ever had this VersionConflict exception?
13:20
Okay, my colleagues and a few others. So this is what we agreed on: we said we don't pin the requirements in the setup.py file, in this install_requires. Because if, for example, library A requires
13:47
requests greater than 1.0 and smaller than 2.0, and another library wants to use a feature which came with requests 2.0, then you get this annoying VersionConflict error,
14:09
and most often it does not even make sense to say: okay, I want to have smaller than 2.0,
14:22
and yeah, the application is then responsible for using the correct requirements, and this avoids a lot of these exceptions. One other comment: we don't pin the requirements in the setup.py file, but we do pin them to have
14:44
a special set of requirements for running the tests. So another developer can then check out tag 5.0, for example, and see with which requirements the tests actually passed back then.
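A hedged sketch of that policy (names, file names and version bounds are illustrative):

```python
# setup.py of a library: only loose, minimal constraints in
# install_requires -- no upper pins that would provoke a
# VersionConflict in the application.
from setuptools import setup

setup(
    name="library_a",
    install_requires=[
        "requests>=1.0",  # deliberately no "<2.0" upper bound
    ],
)
```

The exact versions the tests ran against would then live in a separate, fully pinned file committed alongside, e.g. a requirements-test.txt containing requests==1.7.0, so checking out an old tag also recovers the requirement set that passed back then.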
15:03
Okay, so now we have all those repositories; what did we actually gain? We can use setuptools_scm for every repository, and don't we have happy library contributors?
15:23
But on the other hand, for us application developers it got a lot more complex: if you develop a new feature, sometimes you have to do changes in three, four, five repositories.
15:42
You have to keep them updated all the time, and that can be really annoying sometimes. But the quality and the structure of our code really improved: so now we have
16:01
defined the requirements which every library needs, just this minimal set, so it does not happen as easily that we get these ugly cross imports which you don't actually want, and so on. And that's actually a point: now that you have a cleaner structure,
16:25
it might be easier to see, okay, which components change at the same speed, or something like that, and whether there might be an extra service,
16:46
where it might be good to introduce a new service with only those library packages. So I think what we did up till now is a step before actually getting to microservices. So,
17:06
yeah, and I think that's all for now, if you have any questions. Okay, so thank you for your
17:24
talk. Do we have some questions from the audience? So, thank you for your talk, it was a nice experience; we have pretty much the same problems in our project. So, you mentioned that your
17:45
Jenkins deployment is a simple pip install, and it always installs the latest versions, right? I mean, the latest development versions for
18:01
CI and the latest stable versions for the production release. What if you deployed some broken package to production and you need to roll back to the previous version? How do you deal with that, since I don't think you pin versions during your deployment? So,
18:21
how do you roll back to the previous version? Um, no, that does not happen. So we don't pin what the libraries want, but we pin in our application, and we also then install
18:41
with the --no-deps option, so that we don't get any recursive requirements. I'm not sure if I understood your question correctly. So, do you pin versions of your
19:01
packages somewhere during the production deployment? Yes, of course. So okay, like I said: in the libraries we don't pin in the install_requires section of setup.py, but we have an extra requirements file where we pin the versions, and we use these versions to run the
19:23
tests for the library. And in our application we then pin: we want library A at 5.0, for example, so it cannot happen that something else comes in. Okay, we can talk later maybe. Thank you.
19:44
You mentioned that you have dependencies between the libraries, so library A imports library B
20:01
and each other. So how did you resolve that? It depends. Sometimes code duplication is better than having these cross dependencies.
20:28
Yeah, that's basically it. It depends on what this cross reference looks like; I don't have a specific example right now, sorry. How about making one library depend on the other? So,
20:47
for library A you require library B. Okay, sometimes that's not a problem at all, but we noticed that in some cases we had libraries which depended on another library, and
21:08
this one again required lots of others, and we had this big unstructured thing again. So,
21:23
yeah, I don't have... Did you consider using Git submodules for your problem, like for pinpointing versions and then being able to have different kinds of
21:45
releases? We said from the beginning that we don't want to use Git submodules. Did you try it? No, we didn't; we just said: nah, that's too dangerous, too easy to get fucked up. Okay, yeah, because we are
22:04
doing that; okay, that's the way we can pinpoint different versions and even do, yeah, hotfix branches on a super repository. Okay, so maybe we can talk if it works out for you. Great.
22:24
Hi. You mentioned something about the code quality: that by following this refactored code structure you had some improvement in code quality. I'm just curious how you measured that. Do you measure some kind of cyclomatic complexity or something else? What were your metrics to say that your code quality improved? Okay, I have to admit that's
22:46
a feeling I had, and no, it's difficult to explain; maybe we can talk after
23:02
that. Is there another one? No? Okay, thank you so much, Patrick, for your talk.