
openSUSE testing - an overview

Formal Metadata

Title: openSUSE testing - an overview
Number of Parts: 40
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract:
How is software within the openSUSE ecosystem tested? What kind of tests exist? Who is doing what? This talk will try to present an overview of how "testing" is done for software developed in the openSUSE ecosystem. The workflow of software contributions to the openSUSE distributions will be shown from testing perspective from upstream source code repos to feedback from users in the released products. Used tools will be mentioned, the testing approaches as well as the people involved. The relation to SLE testing will be described. As this "overview" will not be able to cover all approaches used by the community feedback by the audience in the Q&A part of the talk will be appreciated. Of course, openQA will be included but it is certainly not the only solution to be mentioned ;)
Transcript: English (auto-generated)
Good afternoon, everyone. My presentation will be about openSUSE testing, and I'm trying to give an overview. And for this, let me check one thing first; I have to test something.
The scope should be to answer the questions: how is software within the openSUSE ecosystem tested? What kinds of tests exist? Who is doing what?
How is it done? And then I would like to come to some challenges at the end. I will try to use the illustration of the so-called test automation pyramid to guide us through this, so we will go through it from the bottom to the top.
It starts with upstream source repo tests. This is where all the software that we are talking about lives, and this is where we start. Then we come to package and project tests; this means that for all the tens of thousands of packages that we have in the distributions, we can also conduct tests. Next are system level tests, and this is where the pyramid gets narrower: by definition we have fewer tests, tests where we combine all of that, but they have a broader impact on the whole system, the whole operating system. Then there are GUI acceptance tests. And at the top of the automation pyramid, which is about automated tests, we reach the cloudy area of exploratory beta testing, something that by definition cannot be run as automated tests.
But let's start from the bottom, with the upstream source repo tests. This is, well, something that everyone should do, right? Everyone that is doing software has some source code repository, and there, what we commonly see, for example on GitHub projects, are these nice badges, green or red, showing whether the unit tests are passing. Top right, that's a screenshot from Travis, where we see checks that we can conduct even on pull requests before merging something. All these upstream source repo tests provide a baseline for all the downstream tests, all the tests that come later. This is where we start. Commonly, it's hard to cover distribution integration when we are talking about source code repos. This is something on GitHub or other version control systems, where we do not yet talk mainly about a Linux distribution, but rather about how my software works in whatever environment I'm using for development. Now, who is doing that? Well, this is the upstream communities: that could be single persons,
that could be bigger communities that have maybe selected some kind of operating system as their main target. It could also be SUSE or openSUSE developers that develop software at that stage. And how it is done is very much ecosystem dependent. So, for example, when your program is mainly in Python, then you're using something like Python's unittest or pytest. If it's in Ruby, then there are certain frameworks that are the state of the industry, which you select. And this mainly defines how you run these tests on that level.
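As a minimal illustration of this level (a hypothetical example, not taken from any specific openSUSE package), a pytest-style unit test can be as small as this:

```python
# test_greeting.py: run with `pytest` from the repository checkout
def build_greeting(name):
    # Stand-in for the code under test; in a real repository this would be
    # imported from the project's own package instead of defined here.
    return "Hello, %s!" % name


def test_build_greeting_uses_name():
    # Fast feedback on the developer's own machine, no packaging involved yet.
    assert build_greeting("Geeko") == "Hello, Geeko!"
```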
Well, why is this done? This is the way to get the fastest feedback, because we are talking about developers that sit at their machine, run some code, and want to test: does what I want to do here actually work? So it's something that should be available to the developers with the fastest feedback possible, meaning not too many steps in between. It's independent of the distribution; well, I would say kind of, because commonly you need to select something as the operating system on which you develop, and based on that you're running certain tests. For example, if I run openSUSE Leap 15.1, then I develop on that. The question is: does it still work on openSUSE Leap 42.3, which is still supported? So there is a question that is yet to be answered, and probably not on that level. It can be simple, it could be Python, pytest, and tox,
or it can be more sophisticated. This shows an example from the Travis test results of openQA itself, the openQA software, where we have some unit tests and some web-based UI tests, so we are checking the UI, which is mainly the web interface for openQA. For that we are using containers and virtual machines, something which you might have heard about in other talks, and it is also used for automated administration. What you can see there in the bottom left is that there are multiple check marks. Each check mark stands for a certain set of tests: there are some unit tests, there are some integration tests, there are some UI tests, but there are also jobs which, for example, publish the documentation that is generated from the source code repositories. So that is something you can also use Travis, or other CI systems, for.
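For illustration, a multi-job CI configuration of this kind could look roughly like the following Travis-style YAML (a hedged sketch with invented job and script names, not openQA's actual configuration):

```yaml
# .travis.yml: illustrative only; the targets behind the scripts are made up
language: perl
services:
  - docker
jobs:
  include:
    - name: unit tests
      script: make test-unit
    - name: web UI tests
      script: make test-ui
    - name: publish documentation
      script: make docs && ./scripts/publish-docs.sh
```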
Okay, so this is the level of the source code repositories. The next level, and now we are talking about openSUSE as a distribution, is where we come to the packages. That mainly means that in OBS, the Open Build Service, we have packages, where we get the source code from the upstream source code repositories, and I would call that the foundation of distribution building, because we want to have a package for everything that ends up in the distribution. Of course, there are also other possibilities. It doesn't necessarily have to be a standard RPM-built package where you end up with a binary package; it could also be a container itself, or a Flatpak image, or just an archive of something which you don't even need to build further. Now, I would call the building process itself a test as well. Being on OBS makes that pretty easy: you build against multiple projects, against multiple products, in various versions and also various architectures, and by that you're testing: can I build that package? It might fail because of missing dependencies, which is not necessarily something that needs to change in your own source code. It might also be that you're relying on certain features which are only available in certain versions of dependencies or base layers, which are maybe not provided in an older version, or maybe a more recent version already behaves differently.
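Locally, this kind of build testing can be reproduced with osc, the OBS command-line client. A rough sketch (the repository names are examples and depend on the project's configuration):

```sh
# Build the package locally against several targets to see where it breaks.
# `osc repositories` lists the repositories actually configured for the project.
osc build openSUSE_Tumbleweed x86_64
osc build openSUSE_Leap_15.1 x86_64

# After committing, check the server-side build results across all targets.
osc results
```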
And if we are talking about packaging, then commonly this is done using RPM, based on spec files. In a spec file there is a %build section, there is a %prep section for preparation, and there is also a section you can use which is called %check. If you use that, then you can run the tests that you may have already run on the source code repository level. You can also do that within OBS, and the advantage is that you then do it for all the different combinations which I mentioned earlier.
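As a minimal sketch of what such a %check section can look like (the package and the make targets here are invented):

```
# my-tool.spec (excerpt, illustrative only)
%build
%make_build

%check
# Re-run the upstream test suite inside the OBS build environment, once per
# target distribution, version and architecture that the package builds for.
%make_build check

%install
%make_install
```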
Now I would like to present a slightly alternative approach on top of that, which is what I would call the multi-build package self-test. The question is: what if the upstream tests are passing, but your package is broken? Or what if (openQA has already been mentioned multiple times in different talks, and I will come to what it is) the openQA system tests are too late or too broad? All the other tests come later and take longer, so it takes longer for us to get the information. As an example there is a project, which for now I will not show on OBS, and what it does is use two files within an OBS project.
So what you need is two files to define that, in addition to the build step of building a package, you want to run some tests in another environment: not the build environment, but a dedicated, independent environment where you can test: does my package actually install? Does it work if I call, as a very trivial example, my-script --help or something? There are two files. One is _multibuild, where you can define a variant, so next to the normal variant of your package you define the test variant. The other one is the spec file itself, where you see the Name: line and then the definitions, which is what you commonly do when you build a package. The special part is this block with the %if test condition.
So let's talk about this block here. What you do here is that, in the test environment, the test package requires the build package. By doing that, when you build this test package, you're trying to resolve all the dependencies, in particular the runtime dependencies which you would need. If you conducted a test only within the build environment, you would not check for the runtime dependencies, only for the build-time dependencies. This single line can already show you what you might be missing regarding the runtime dependencies that were forgotten in the spec file. And the second part, further down (this is an example from a server application), is that you're calling commands, the commands which would be installed by the package you're building.
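A hedged sketch of how these two files can fit together (the package name, the commands and the exact section layout are invented; the real project shown in the talk may differ):

```xml
<!-- _multibuild: declare a "test" flavor next to the normal package build -->
<multibuild>
  <flavor>test</flavor>
</multibuild>
```

```
# my-server.spec (excerpt, illustrative only)
%define flavor @BUILD_FLAVOR@%{nil}

%if "%{flavor}" == "test"
Name:           my-server-test
# Pull in the freshly built main package: this resolves its *runtime*
# dependencies, which a plain build would never check.
BuildRequires:  my-server
%else
Name:           my-server
%endif

# ... (%description, %files and the normal build sections omitted)

%if "%{flavor}" == "test"
%check
# Self-test in a dedicated environment: does the package install and run?
my-server --help
my-server --listen 127.0.0.1:8080 &
sleep 2
my-client --register http://127.0.0.1:8080
%endif
```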
And by doing that, of course within the limits of the environment (in the case of OBS, for example, there is no external network access), you can still run a localhost server and try to register against that server locally. That all happens within OBS, and if you have seen OBS build results before, this resembles the same thing; it's just that there is a second half to it. So normally you have a package; I assume many people have seen a view similar to this: you have multiple repositories, you have multiple variants for the different architectures, and you see that all the packages succeeded to build. That means we have a package for all of these projects. However, the test package, which is the multi-build variant I showed you before, shows that there are some problems. For example, on Leap 42.3 it shows "unresolvable".
Now what does that mean? It means that we could build the package, we could find all the dependencies that we need to build the package, which is why it succeeded to build. But afterwards, when checking for the runtime dependencies, we can see that Leap 42.3 doesn't offer all the dependencies which we would need. This is why I've put in another variant of the repository, which is the last line, where I'm adding an additional repository. I'm saying: okay, if that dependency is not there, let's try to add another development project which should provide the dependent package. And then you see that it can install the package, but then it fails. So in the later step, when the test was trying to register against my own server, I can see: aha, this doesn't work. So it's something about the versions of the dependencies, which are now different. Now, that adds quite some boilerplate code to the spec file. It's not actually nice or tidy, because you're kind of abusing this instrument for building packages to do testing. The one suggestion I can give is to use a separate spec file, which you can also do. So if you don't want to intermingle it all, you can have multiple spec files per package, and you would have a test package definition next to the build package definition. Now, we are still on this packaging level, the project level, and there is more on this level. There are repository install checks, which check whether the repositories which are generated are installable.
More or less like what I showed before on the level of a single package, you can check: does it all install? This is done, for example, when a new snapshot of Tumbleweed is created, and the same for Leap: it is checked whether we can install the packages within it. There are review bots; one famous example is the legal bot, which checks whether the licenses of all the source files are correct. There are further policy checks, for example regarding the inheritance of a package: is a Leap package coming from a SLE source or from a Factory source, so that we don't have dangling packages which exist only in Leap but not in the other distributions.
And there are development project tests: some development projects, which are a bit bigger, already have more and finer tests. For example, the KDE as well as the GNOME development projects run some tests on the development project level, so we are not waiting until the end, until we try to create a snapshot of Tumbleweed, to say: does the latest git snapshot of KDE work? And there are the so-called staging projects. When you create a submit request to have something included, for example in Factory, these staging projects check for multiple things on that level
before a package is accepted further into the whole distribution. Okay, so who does that? I would say the maintainers of the packages or of the build projects. How? One way is just by building it; I would say that is a test in itself. Then there is the %check section, as well as the approach which I showed with the multi-build package self-test, there are the OBS bots, but also CI systems and containers, for example using an additional Jenkins instance or a container registry or other tools, where we can just check out the latest state from some development project and feed the result back in a loop before creating a submit request. And why? Well, integration is crucial, especially for distribution building, when we are talking about multiple versions, architectures, and all that variance.
So the goal is to identify the impact of a package before accepting it into the whole system. Okay, and then we are on the next level: system level tests. System level testing means that we can test the whole operating system end to end. This is testing the distribution as a whole. We can, or rather we should, rely here on all the pre-integration test results. Knowing what was tested before, we can ask: okay, what else could go wrong after I accept this package now? What is different now on the level of the system? A good example is booting the system or conducting an installation. This is something which we cannot do when we are talking about a single package, but there are a lot of things which can go wrong: it could be GRUB, the kernel, some config files which rely on a different config format, and all those things. And this level directly feeds into the product release decision process, whether we are talking about the rolling release Tumbleweed, the same for Leap, and, on a similar level, the same for SLE. System tests are conducted and, based on that, a build or a snapshot of Tumbleweed is accepted or discarded.
I would say the main workhorse here, regarding the classical distribution, is openQA. If you haven't seen it, this is one view of how it looks. What you see there is Tumbleweed being tested, and each of these numbers to the right is one single virtual or physical machine test that is conducted. You can see that there is one build for every snapshot of Tumbleweed that is tested, and based on the test results in there, this feeds back into the decision: should we release that build of Tumbleweed or not?
If you're interested, I would be happy to give you an introduction to openQA later. If you know it already, you might not know what has been included recently, so I would like to present some recent new features which can make life easier. There is an openQA bootstrap tool for easy installation. So if you think, yeah, openQA is cool, but I don't know how to install the thing, it's too complicated: I would say this is something like a one-click solution now, so you can just run that. And even that is not necessary if you just want to try out openQA.
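Roughly, using the bootstrap tool looks like this (a sketch based on the openQA-bootstrap package; the exact package and script names should be checked against the current openQA documentation):

```sh
# Set up a complete single-machine openQA instance with one script.
zypper in openQA-bootstrap
/usr/share/openqa/script/openqa-bootstrap
# Afterwards the web UI should be reachable on http://localhost
```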
As was already mentioned in a talk this morning, you can run custom test code on production instances. Because we rely on virtual machines or physical machines to conduct the tests, it is not necessary that tests or experiments you are trying out are accepted into the main branch before they can be executed. So there is a way to point to your own git repository, where you are trying out something, changing an existing test or adding a new test, and have that executed on a production instance.
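One way this is typically done is by cloning an existing production job while pointing it at your own test code (a hedged sketch, assuming the openqa-clone-job helper and a hypothetical fork and job ID):

```sh
# Clone job 12345 within the production instance, but take the test code from
# my own fork and branch instead of the accepted main test repository.
openqa-clone-job --within-instance https://openqa.opensuse.org 12345 \
    CASEDIR=https://github.com/example/os-autoinst-distri-opensuse.git#my-experiment
```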
There is also YAML-based declarative schedule support now. Previously, some of the schedule was definable only in the web UI by selecting some fields, which is really easy to do, and it's also pretty obvious what is going on there. However, if you want to go a bit further and be more professional, then it's good to have the schedule definition itself in a well-defined, text-based format.
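A rough idea of what such a declarative job group schedule can look like (a hedged sketch; the exact keys and names should be taken from the openQA documentation rather than from here):

```yaml
# Job group schedule edited as YAML instead of clicking through the web UI
defaults:
  x86_64:
    machine: 64bit
    priority: 50
products:
  opensuse-Tumbleweed-DVD-x86_64:
    distri: opensuse
    flavor: DVD
    version: Tumbleweed
scenarios:
  x86_64:
    opensuse-Tumbleweed-DVD-x86_64:
      - textmode
      - gnome:
          machine: uefi
```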
So this is what has been done recently, based on a YAML text format. And also, there is a reworked interactive developer mode. If you know the older interactive mode: that one is way more stable now, and it's really fun to work with. What you can do is, if you're running a test, you can interact with the VM while it is running.
Of course, that can impact the test result, which is why that individual job is then no longer counted for the test results. But you can actually interact with the machine, for debugging purposes, for example. Okay, so system level tests: who is doing that? Well, this is where release management comes into play, and also quality assurance. This is also where I participate, as a SUSE employee, as a QA engineer. We focus on, or rather start from, system level tests. So we don't start from the package level; we rely on what is done on the package level. Mainly we try to look at the product as a whole.
How is that done? Mainly using VMs, because VMs are easy to scale and they are really isolated and separated, but also using containers. And then there are also different benchmarks executed, as well as other testing frameworks, which are run within openQA but also in other contexts. And why is this done? I would say, well, this is what the user cares about when we are talking about openSUSE as a distribution or as an operating system: you use the system, and this is also what openQA tries to do.
It uses the system as a user would do. But we are not finished with openQA yet; there is the next level: GUI acceptance tests. This, I would say, is where openQA shines. Before, I was presenting system level tests, which don't necessarily have anything to do with the UI. When we are talking about GUI acceptance tests, well, it needs to look correct, right? And we want to look at applications; they need to look awesome. So we want to preserve that, and for that we can use openQA.
Which, actually, I would say is pretty fun to develop tests for, because you can take a look at the screen and do what you would also do as a user, click somewhere, and then you ask openQA to do the same, and you have that running in a test.
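In openQA's test API that looks roughly like the following (a hedged sketch of a hypothetical test module; the needle names and the exact keys pressed are invented):

```perl
# firefox_start.pm: hypothetical os-autoinst test module
use strict;
use warnings;
use base 'basetest';
use testapi;

sub run {
    # Do what a user would do: open the launcher, start Firefox and assert
    # that the expected screen (a "needle") shows up within 90 seconds.
    send_key 'super';
    type_string 'firefox';
    send_key 'ret';
    assert_screen 'firefox-started', 90;
    send_key 'alt-f4';
}

1;
```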
Now, finally, something visual: this is a video recording, which is created automatically by openQA for every single job that it executes. What you can see here is how openQA instructs the installer to install an operating system. After the installation, it logs into the system and then clicks around, starts applications and tries them.
What you can also see is that this is a bit faster than real time. Of course, when we are doing an installation, it is actually conducted in real time; it relies on the performance of the network, because we're really downloading and installing the packages as a user would do. It is only afterwards that it is shown a bit faster here. Okay, this is the full video recording. Later on it will boot into the system, open Firefox, go to another web page, then open the mail client and all those things. I think we do not necessarily need to wait for the whole thing.
However, what I would like to show is the boot process, because this is something which is also pretty hard for other test automation tools to automate. So we are in the installer; we just stopped right before rebooting, collect some logs, then we boot. You saw KDE for a split second. Then we log into a text terminal and call zypper. We conduct some console based tests, and later on, after we did that, we log into the graphical session here. We disable the screen saver and we test something about the network and all different kinds of applications that we normally ship on the different distributions. We trigger them here and at least try to poke at them a little bit. We cannot have an in-depth test of all the applications and packages on that level, because that would mean we would need to run something like 20,000 packages on every run,
but that's a bit too much, so we are doing less than that. If you want to see the full video, I invite you over to openqa.opensuse.org: select any job and just enjoy the show there. Now, GUI acceptance tests: who does that? Again, I would say release management and quality assurance, when we take a look at the results and check whether things still render correctly; otherwise the openQA tests would also fail. How is that done? Using openQA, at least for the distribution experience itself. And why? Because, well, compared to system level tests, this is what the desktop user cares about. There can be much more in-depth tests in the system level area,
but in the end it matters how it looks. Okay, then there is the last level, which is this cloudy area of exploratory and beta testing. This is, well, manual by definition, I would say, because this is everything that you cannot automate, and it catches what was missed by automation and provides feedback on where to extend tests. This section in particular is very much dependent on all of you, on everyone, because, well, we don't know what was missed; we are relying on feedback. And this is also where it can scale out: as I mentioned before, openQA mainly relies on virtual machines for scalability reasons. If you want to go broader and further, then it's very much dependent on specific hardware, drivers and all those things, and this is really hard to automate, even though maybe not impossible.
So this is what I would call exploratory beta testing. How is that done? Well, mainly by using it. So, you know, when Ludwig Nussel writes an announcement saying, hey, there's a new version of Leap 15.1, please try it, then we are also relying on the feedback from there. So I hope you're also providing that feedback by creating proper bug reports, or at least asking on the mailing list: hey, do you also have that problem, or is that problem important? Now, why? Well, no automation can be complete. This brings me to the points to take away. Important for me is that testing is not a phase. This is something which you might have heard 20 years ago, very traditional: that you would have some development phase, some integration phase, some testing phase, and I would say that is hardly the case. We within SUSE QA test on a periodic, regular basis.
Everybody that builds packages is doing that all the time. You as users are doing the testing. So it happens all the time, and everyone is involved. It's also important to select the right tool for the purpose. I presented some tools; I don't have the answer for every case, and in the end it is about what you can do for your individual jobs in particular. Now, as an optional part, there is something for you to explore further on your own: you can click around.
I provided some links for all the individual steps, for one example project, where you can follow along with all the steps of the pyramid which I have shown before, with individual examples pointing to GitHub, OBS, and further. Now, the challenges regarding testing. Well, more tests are good, but knowing what is already tested, that is hard. If you have something tested on the package level, you should know about it on the system level, so that you know what you need to add and what not to repeat again. Some projects and packages are good at this, but how does that scale? Not everybody has to write the same level of tests, and tests may fail at any step.
But who can keep an overview? Speaking for myself as a QA engineer, it's hard to have an overview and really see what is tested where. I just know that it is, so I can trust that. With this, I'm at the end. Thank you.
Okay. So, any question, correction, or note? A single one for now, or meet me outside later.
Hello. I have a question about package maintenance. For example, recently we got a new version of certbot in Tumbleweed, but it cannot renew the SSL certificate anymore because of some segmentation fault,
but as a package maintainer, they may not test the package under certain conditions. Can openQA provide a test for some kinds of important packages, for servers? Because some people, maybe like me, just update their Tumbleweed system every night without checking whether every package is working, and then something happens, and, yeah. Right. So, I'm not sure I got this right:
Is this about package dependencies, where you might need a certain version of something for it to work, and you want to test a combination of packages? Maybe. Maybe. It's still not clear why this package doesn't work anymore.
Yeah, but if the package doesn't work with others, or the new version itself has some problem, I think this package shouldn't be... Right. So, in the end, what one is doing there is building kind of your own distribution, because you have your package, which you're interested in, relying on certain versions. I would say normally you should rely on OBS to provide you that, maybe in a custom repository where you include all the other repositories, and then that can be tested. That could also be tested within openQA. Think about how you as a user would do that: you would add some repositories and then try whether it works. We can instruct openQA, no problem, to do the same, by saying: just use the latest Tumbleweed snapshot plus this repo plus that repo, and then see whether everything works, including upgrades. So, yeah, this is possible with the right combination. Okay, so thank you all.