GitLab pipelines for every need: testing, documentation, and writing a paper
Formal Metadata
Title: GitLab pipelines for every need: testing, documentation, and writing a paper
Number of Parts: 60
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/42490 (DOI)
Transcript: English (auto-generated)
00:01
Today, already, we heard quite a lot of really nice talks on testing, continuous integration, and better software development practices in general. And I'd just like to share my bit. So GitLab pipelines.
00:21
What are they? Ah, before I proceed, can I have a show of hands, please? How many people here are actually using GitLab pipelines? Oh, OK. That's great. But maybe I'll be preaching to the choir. Yeah, sorry about that. Just ignore the uninteresting bits. Or maybe tell me what I'm doing wrong.
00:42
OK. So the core idea behind CI/CD pipelines. As far as I understand it, the idea is to apply well-defined operations to your code base automatically. Now, how does it work? On each commit, or according to some more fine-tuned criteria, a predefined environment is created
01:03
typically using Docker. And then a predefined set of operations are run on the code base. And these operations can be linked together in a fairly complex and flexible manner. One operation may depend on a previous operation, or they can be independent, and you can create
01:27
a complex graph of these operations if you want. So use case one, automated testing. Let's say your code base already has a comprehensive test suite. So a big step already done.
01:40
But your new team member cannot run it because some tests depend on some library being compiled with some option. So as a result, nobody knows whether the code base really is deployable. This is a problem which I faced, and I guess it is common.
02:01
So then the question that comes is, what if we could run our test suite in an exactly defined environment on each commit? And we could see which commit made our test suite fail. Then we can have a really nice blame game, right? Point fingers. Why did you break the test suite?
02:20
Don't take me seriously. Don't do that. So we can have exactly that. So here we have, this is just the commits page on vanilla GitLab, which you normally have. And there are these really nice crosses and ticks as a new column, which tells which
02:40
commits actually passed the test suite and which ones failed. So here you see, I myself introduced a problem in the test suite and broke the suite, which is not a problem, because if you know, then you can begin to fix that stuff, right? Okay, so we can have more cool stuff.
03:01
We can have a badge on the homepage. So when you open, go to your repository, you can have a badge screaming loudly that your code base is broken. So here you see this pipeline-failed badge. What could be better? Well, the pipeline having passed would have been better. So, how to do this?
03:22
Use a CI/CD pipeline to run the tests. And the steps are rather straightforward. First you choose a Docker image which already includes the base dependencies of your software, like your needed Python version or whatever. And then, if necessary, you can install additional dependencies using a package manager.
03:45
And then you can run the test suite using whatever test runner you want, so long as the shell gets a zero return code on a successful run of your test suite and a non-zero return code on failure. That's all you need.
04:01
So how to set this up? This is a very basic CI/CD configuration. So first we define a Docker image with a Debian base. And then we specify certain steps to execute before our test suite is run.
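As a sketch, the basic configuration being described might look roughly like this; the package names and project layout here are assumptions, not the speaker's exact file:

```yaml
# .gitlab-ci.yml -- minimal sketch of the basic test pipeline
# (package names and project layout are assumptions)
image: debian:stable

before_script:
  - apt-get update && apt-get install -y python3-pip
  - pip3 install .          # the project is Python-based, installed with pip
  - pip3 install pytest

unit-tests:
  stage: test
  script:
    - pytest                # a non-zero exit code marks the pipeline as failed
```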
04:22
In our case, we're using apt to install certain system packages. And then we install ourselves. It's a Python-based project, so use pip to install ourselves. And then we define this so-called stage called unit tests,
04:40
where we just use pytest to run our test suite. It's really relatively few lines of code, and that already works. Well, it was all good for a bit, but then our code base grew and we realized more things that we wanted to have. So we started writing tests which are more integration tests,
05:04
or I don't know the proper name for it. So we're basically simulating our deployed system. So it involved gluing together all the different components and then firing up our server, an Apache Thrift server, but that doesn't really matter here, and creating a few clients and then firing a few hundred requests at the server
05:22
and see if our server came up with proper responses according to certain predefined boundaries. So running these tests didn't really fit into our CI/CD paradigm because they just take too long to run.
05:43
And we also didn't really need them run on each commit on every branch. All we wanted was basically a nightly run of these things on master. So we managed to have this thing. It looks like this.
06:01
We now have two badges. First one says unit test results and second one integration test results. In this case, one of them passes, but the more complicated tests fail. So how to do this?
06:21
As usual, we specify an environment using Docker, and then in addition to the unit test, we define a new stage in the CI pipeline that is to be triggered on a schedule and not each commit, which is the default behavior. It's really just these two lines.
06:41
You use this only keyword of GitLab CI system, and you say that the first line is saying that this should be run only on a schedule. And this should only be run on the branch master. And then the schedule, one configures it using the web interface.
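In GitLab CI syntax, those two lines sit under the `only` keyword of the job; a sketch, with the test path assumed:

```yaml
integration-tests:
  stage: test
  script:
    - pytest tests/integration   # test location is an assumption
  only:
    - schedules                  # run only when triggered by a pipeline schedule
    - master                     # restricted to the master branch
```

The schedule itself (e.g. nightly) is then configured in the web interface, as described. Depending on the GitLab version, strictly combining a branch name with `schedules` may need the more explicit `only: refs:` form.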
07:05
One can use cron syntax to fine-tune this schedule. And then there's one step still left, which is probably not of such high importance, but generating these custom badges so that on the homepage
07:23
we can see whether our test suite is failing for the integration test. For this, we use the so-called artifact system of the GitLab CI infrastructure. The same thing is actually also available for GitHub,
07:41
so it doesn't really matter which one you're using, but we'll come to that later. This artifact stuff is just, again, just these four lines of code where we basically say everything on this public folder should be stored. All right.
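The artifact block referred to is roughly these four lines of `artifacts:` configuration; the job and badge-generating script here are hypothetical:

```yaml
badges:
  stage: deploy
  script:
    - ./make_badges.sh        # hypothetical script that writes badges to public/
  artifacts:
    paths:
      - public                # keep everything in the public folder
```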
08:01
Use case three. I originally wanted to talk about writing a paper, but then I realized, well, I could just also as well give this talk using GitLab CI. So yeah, this is how it works. That's the URL. I think I managed to make the repository public, so it should be accessible for all of you.
08:21
And in the README, there's this nice badge showing that the PDF compiles, and you can download the precious PDF. That's how I got this on that machine there. Okay. So how does this work? Again, fairly simple. Get a Docker image with TeX Live in there and define a job that invokes latexmk,
08:44
which automatically takes care of running BibTeX and whatever: run BibTeX once, LaTeX twice, and then BibTeX five times, whichever is necessary. And the whole thing is basically literally five lines of code.
09:02
And use this job artifacts thing to actually save that PDF. And then the only thing left, well, this is the syntax, and this basically says that everything matching this shell wildcard, *.pdf, is not to be chucked away but stored.
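Putting the paper job together, a sketch; the image tag and the .tex file name are assumptions:

```yaml
# sketch: compile the paper with latexmk and keep the PDF
image: texlive/texlive          # any image with TeX Live works; tag is an assumption

compile-pdf:
  script:
    - latexmk -pdf paper.tex    # latexmk reruns latex/bibtex as often as needed
  artifacts:
    paths:
      - "*.pdf"                 # shell wildcard: keep every generated PDF
```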
09:20
And then we can have access to this PDF under this predefined URL, all documented very nicely in the GitLab CI/CD pipelines documentation. Yeah, that's about it. Some outlook. Some more stuff which one could do.
09:46
Basically just throwing some ideas at you. Multi-project pipelines. This is also possible. I haven't done this yet. If you have a stack distributed across multiple repositories, maybe one backend, some web frontend stuff,
10:03
it's actually possible to have a pipeline which triggers the pipelines on all of them. You can automatically run your test suite on merge requests. Many open source projects actually do this. You can just contribute to NumPy.
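A sketch of both of these ideas; the downstream project path is hypothetical, and the `trigger` keyword and `only: merge_requests` require reasonably recent GitLab versions:

```yaml
# multi-project pipelines: trigger the pipeline of a downstream repository
trigger-frontend:
  stage: deploy
  trigger:
    project: mygroup/frontend    # hypothetical downstream project

# run the test suite on merge requests
merge-request-tests:
  stage: test
  script:
    - pytest
  only:
    - merge_requests
```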
10:20
If you submit a pull request to NumPy and your test suite is not passing, they will not even look at it. And if your pipeline needs to download large amounts of data so that your pipeline takes forever to run, then you can use this cache mechanism of pipelines,
10:45
which basically means that a specific directory will be reused between test runs so that only one time, all these time-consuming steps will be run, and from the next run onwards, it will keep reusing that data, saving quite a lot of time.
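A sketch of such a cache; the data directory and download script are hypothetical:

```yaml
data-tests:
  stage: test
  cache:
    key: dataset                 # cache key name is an assumption
    paths:
      - data/                    # this directory is reused between pipeline runs
  script:
    - ./download_data.sh         # hypothetical script; should skip files
                                 # already present in data/
    - pytest
```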
11:02
I'm not sure. Maybe the Docker experts can tell. Docker volumes also sound like a good idea for this kind of thing. I'm not sure. Never tried this. Another trick which I find interesting is skipping the pipeline for certain commits. For example, if you just change your README file,
11:23
it doesn't make sense to run the pipeline. But this is also possible. You can use the except keyword in the CI configuration, and it accepts fairly complicated regexes to skip the pipeline if your commit message matches some criteria. You can use the pipelines to deploy your packages
11:43
to open source repositories or deploy to your staging or even production. I don't know how to do that myself, though. And last of all, maybe reproducible papers. I don't know. Never done this thing. Maybe some of you did. I mean, since pipelines are nothing but a set of operations,
12:00
in principle, you could just download your dataset from some open data repository like IMAG or, I don't know, what's it called, Edmund or some other thing and make all your plots and then compile your paper using a pipeline.
12:21
That would be great. That would be, at least in my opinion, as reproducible as it gets. But I don't know if it works. Anyway, with that, I'd like to thank you for your attention and questions. Thanks a lot, Topchanka, for this really interesting talk
12:44
full of ideas, I would say. Are there any questions or any ideas maybe as well?
13:03
I think there is one interesting thing you have forgotten. In GitLab, it's possible to connect with your private Docker registry and then you can do Docker and Docker in continuous delivery and not only integration.
13:22
So, for your outlook maybe. It's quite interesting. Yeah, that's an interesting idea. Thank you. Any other ideas or questions? Questions?
13:54
Okay, the question was the artifact, so the PDF that is created in this case,
14:02
is it also under version control? And the answer is, well, not really version controlled per se, meaning it's not part of the repository, but you can access the PDF from each commit. So, I showed you this kind of a screen there.
14:27
For each commit, you see whether the pipeline ran or didn't. So, if you have these artifacts enabled, then there will be another column somewhere in here which actually allows you to download the artifact for that job.
14:41
And you can configure to delete artifacts for older ones because if you have a paper or thesis there, hundreds of megabytes, you don't want to store all of them. Yeah, a question, I guess not only for the speaker, but for everyone who put their hand up and said that he's using GitLab. I'm just curious to know if anyone has competing experiences
15:01
with different CI servers and whether there's any particular reason to switch to GitLab. Does it offer anything over any of the many other options available? Well, I don't really have much to say. I use both GitHub and GitLab and find all of them quite fine. Oh, one thing.
15:22
Since I have the mic, I will take the opportunity to take advantage. In GitHub, I found this really easy to test against multiple platforms. So, a Python package, for example, you can test against Python 2.7, 3.7, and PyPy. I couldn't find an easy way to do that in GitLab yet.
15:41
Maybe somebody else knows. Yeah, exactly, in GitHub and Travis. I don't know how to do that in GitLab. So, I've used Travis, I've used GitLab, and what was it? Circle CI. Jenkins, I haven't used. The reason I've switched away from Travis is basically the time limit for free jobs.
16:07
Azure I really like because it's very easy to share pipelines across projects. I'm not sure if GitLab has this already. So, basically, I have one repository that has templates for pipelines
16:22
and every job can then fill this with some variables. Yeah. Thanks for sharing. Yeah, I basically also wanted to mention Travis. And app buyer is also quite nice on GitHub for Windows jobs.
16:42
But, yeah. I think Travis can do all platforms now. And it looks a lot nicer than Jenkins, at least. And Jenkins is just ugly but does the job.