How SAP is using Python to test its database SAP HANA
Formal Metadata

Title: How SAP is using Python to test its database SAP HANA
Series: EuroPython 2017 (talk 129 of 160)
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported
Identifiers: 10.5446/33693 (DOI)
Transcript: English(auto-generated)
00:04
Hello, everyone. So, first, about me: my name is Christoph. I'm working in the QA department for our in-memory database, SAP HANA, and this talk will show you
00:21
a very, very short introduction to HANA itself, because it's important for the rest of the talk, and then how we are using Python to test SAP HANA itself. So, let's get started with SAP HANA. It's a relatively new product of SAP.
00:41
It's an in-memory database, which means we have some storage engines, a column store and a row store, which are really optimized to run in the main memory of a system, of a server. And with these optimized algorithms, it's much faster to find things,
01:01
to perform queries, and so on. And it fits very well for online analytic processing, so you can do normal analytic queries on it, or you can also use it for transactional processing. A typical database, you install it
01:22
maybe on just one system, but we can also build up a scale-out system, so you have multiple nodes which connect to each other to build one big database, and then you can distribute your tables across the systems, and so on. For you as a Python developer, it's actually not so interesting, because your interface is most of the time just SQL,
01:43
and then you use the database. So, the very detailed internals are basically not so interesting, but who knows? HANA itself is written in C++, so not so interesting for us,
02:03
but we deliver a lot of management tools and commands which are totally written in Python, and one part of the HANA distribution itself is also a Python interpreter. So, as I said, SAP HANA is an in-memory database,
02:21
which means you need a lot of memory, but you can start small. We have, for example, a small express edition; you can start with it on your local notebook, and you just need 16 gigabytes of memory. But you can also scale out and have a system
02:41
with something like 48 terabytes of memory, and there are real customers which are running such systems, which is still very impressive. So, if you take a look at how you can use SAP HANA, or how you can connect your Python application to SAP HANA,
03:01
it's very straightforward and very simple. We have had a Python client since basically the beginning. With the next service package of HANA itself, the Python client will be fully supported and also supports Python 2.7 and Python 3,
03:23
and then you interact with it over the typical DB-API interface. So, you open a connection, open a cursor, run some SQL, and you can fetch results. But, as most Python developers don't write SQL anymore, and personally I can totally understand this,
03:41
there are also some open source projects to interact with the database. So, there's a dialect for SQLAlchemy, you can use SQLAlchemy very easily with HANA, and it's a lot more fun than writing SQL, and there's also another open source project
04:02
to use SAP HANA as a data backend for Django itself. So, let's talk a little bit about testing a database. Testing a database is not so different from testing any other software, because a database is just software. So, we have this very typical pyramid.
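The client usage just described follows the standard Python DB-API 2.0 pattern: connect, open a cursor, execute SQL, fetch results. As a runnable sketch, the example below uses the stdlib `sqlite3` module as a stand-in driver; with the real HANA client you would import its DB-API module instead (the connection parameters are an assumption, not taken from the talk).

```python
import sqlite3

def fetch_all(conn, sql, params=()):
    """Open a cursor, run some SQL, and fetch the results (DB-API pattern)."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()

# sqlite3 as a stand-in; the HANA client exposes the same connect/cursor/
# execute/fetch interface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'hana')")
print(fetch_all(conn, "SELECT id, name FROM t"))  # [(1, 'hana')]
```

Because the interface is standardized, the same helper works unchanged against any DB-API-compliant driver.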
04:22
On the bottom, we have unit and component tests written in C++, because it's the main language of HANA, it's the main language of all developers, but if you start writing some kind of integration tests, or very complex end-to-end tests, or so, then C++ doesn't fit very well anymore,
04:44
and most of the tests are then written in Python. One disadvantage is that if you write more complex tests, more integration tests, they will be a little bit slower, and will be more expensive. So, unit tests are much faster,
05:02
and are much, much cheaper, typically. So, let's take a look at our development process, and how we integrate testing and quality assurance into this development process. A developer pushes a change into our Gerrit. Gerrit is a very famous Git code review system,
05:24
and the whole HANA source code lives in one big Git repository. After the push from the developer, we trigger some quality assurance processes before the commit even reaches any kind of branch. So, there will be no commit
05:42
which reaches a branch without testing, without building. And after the build and test processes are complete, and also other quality assurance steps, like code analysis, style checkers, or sanitizers in the C++ code,
06:02
there will be a review from a very dedicated team, which reviews your test results, and at the end, they are voting, okay, it's good enough, or please try again, and please fix some failures. And after the review, your change will get merged into the repository.
06:21
So, to build something like this, I mean, it's a very straightforward continuous integration landscape, and it's also very common to do this. So, in 2010, it was a very common landscape.
06:42
So, the developer is pushing to Gerrit. Gerrit will notify our Jenkins CI server about a new change. Jenkins will look into its configuration. Maybe there's a job configuration for it, and then it will trigger this job, place it basically in a queue, and if there are some nodes with available resources,
07:05
one node will grab the job from the queue and execute it. Very straightforward. Let's take a deeper look into what such a job looks like. Such a job is basically divided into four parts.
07:22
So, you have to check out the latest source code. You have to build the database from the source code itself. We set up a complete database, and then we run the tests. Very straightforward, and until now, it looks like what everyone basically does
07:41
in continuous integration. And one special thing is already included in the 2010 version. We have a central database, because we are a database department, and we store all of our test results in this central database. And the developer can afterwards take a look
08:03
at this data via a web UI, and we can still access this old data, and I can still review test results from 2010. It's actually not so interesting anymore, but we are still able to do it.
08:20
So, if you read the description of this talk, this talk is about scaling, and how we scaled our test infrastructure. So, the main question is, so why should I talk now about scaling? So, because until now, it looks like a very typical continuous integration system. So, let me try to prove it,
08:40
that maybe we have some experience with scaling. So, right now, our system is working for 600 developers, totally distributed across the globe. These developers are pushing around 700 commits every day into the system, into the repository.
09:01
We have above 30 million lines of Python test code in our repository, and we are actually performing 36,000 hours of testing every day. So, this is basically around four years.
09:22
So, every day, we are running four years' worth of tests. We do this on a landscape of around 1,300 Jenkins nodes, and these Jenkins nodes are actually not so small, so we're not talking about AWS small instances.
09:43
We are talking about bare-metal servers, in most of the cases, and we are using around 408 terabytes of memory for testing. That's just for testing. So, we have much bigger systems with memory, but we are still using 408 terabytes for testing.
10:04
So, let's talk a little bit about how we did the scaling, and I just picked four topics for scaling. So, one interesting part is test run time, how we optimized that.
10:20
Test scheduling is also quite an important thing. Artifacts are also a very interesting area, because you cannot move so much data around, and then you have to provide a very healthy test environment, especially if you test on bare-metal systems. Let's talk first about test run time itself.
10:42
So, this slide shows the run time of a Jenkins job of around eight hours, and you can see the job doesn't fit anymore on the slide. So, you're pushing and you're waiting more than eight hours until your test result is available,
11:00
until you know if everything works well. It's not so great, actually, from a developer perspective. And we started to optimize it by applying a very common pattern from computer science, divide and conquer, or in our area, it's basically divide and test.
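The "divide and test" idea can be sketched as a greedy partition of a long suite into blocks whose estimated runtime stays under a target, so the blocks can run in parallel and be rescheduled individually. The function, test names, and runtimes below are illustrative assumptions, not SAP's actual tooling.

```python
def split_into_blocks(tests, max_minutes):
    """First-fit-decreasing partition of (name, minutes) pairs into blocks
    whose summed runtime stays at or under max_minutes."""
    blocks = []  # each block: {"minutes": total runtime, "tests": [names]}
    for name, minutes in sorted(tests, key=lambda t: t[1], reverse=True):
        for block in blocks:
            if block["minutes"] + minutes <= max_minutes:
                block["minutes"] += minutes
                block["tests"].append(name)
                break
        else:
            # No existing block has room: open a new one.
            blocks.append({"minutes": minutes, "tests": [name]})
    return blocks

suite = [("end_to_end", 90), ("integration_a", 60),
         ("integration_b", 45), ("unit", 10)]
for block in split_into_blocks(suite, max_minutes=100):
    print(block["minutes"], block["tests"])
```

With a 100-minute cap, each resulting block can fail and be rerun on its own without repeating the whole eight-hour suite.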
11:21
So, the first thing we did was we separated the test job from the build job itself, which means we can now run the build on a different machine, on maybe a machine which is optimized for building our product, and at the end, we run the test on a machine
11:43
which is optimized for testing. So, a very common example is that build machines typically have more CPUs, and test machines have more memory. But actually, this split now increased the time, because you now also have this communication time,
12:01
you have to transfer the artifacts across the network to the different hosts, but this was a prerequisite. Now, we can also split the test block into smaller test blocks, which has also the benefit that in case one test block fails,
12:22
then we can reschedule it, and we can still keep the time for review at around seven, eight hours, which is actually good enough right now. So, let's talk a little bit more about test failures
12:40
in our case. Tests can fail, and actually, that's the intention of tests. So, you wrote something new, you didn't think about some maybe unrelated component, and now the test fails. Something is broken. In case a test is broken, we currently follow the strategy
13:05
to rerun this test to verify that this is a real regression, that you really broke something, and it's not caused by some sporadic failure, some kind of network latency issue, or some kind of general infrastructure problem, and so on.
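The rerun strategy described here can be sketched in a few lines: rerun a failed test to separate real regressions from sporadic infrastructure failures. The classification labels are our own naming for the sketch, not SAP's.

```python
def classify_failure(run_test, reruns=1):
    """run_test() returns True on pass, False on fail.
    Returns 'passed', 'sporadic', or 'regression'."""
    if run_test():
        return "passed"
    for _ in range(reruns):
        if run_test():
            # Passed on rerun: likely network latency or infrastructure noise.
            return "sporadic"
    # Still failing after all reruns: a real regression to investigate.
    return "regression"

# Example: a test that fails once, then passes on the rerun (a flaky test).
results = iter([False, True])
print(classify_failure(lambda: next(results)))  # sporadic
```

Only "regression" results then need a human to look at the traces, which is what makes the automatic rerun worthwhile at this scale.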
13:25
So, after the rerun is complete, and in case it is still in a failed state, then we know, okay, it's a real regression, and someone has to take a look at the results, take a look at the traces, and decide,
13:42
okay, what is the reason for this? So, the main question is, who restarts failed tests? So, you can imagine, in case a developer pushed something and then goes home, for example, he actually doesn't want to get into the office
14:02
in the morning and say, okay, the test failed, I have to restart it, and have to wait one hour or eight hours until the test result is available. So, this is the reason why we started thinking about more intelligent test scheduling. And we thought more about test scheduling,
14:21
and we found out that test scheduling is basically about two parts. The first part is about configuration. Which tests should run now? Which tests are interesting for this change? So, for example, we have different configurations for our multiple hundred Git branches with different components inside,
14:43
so that if you push to your topic branch, which is something like a feature branch, then we will run tests basically for this particular feature. If you push against one of our integration branches,
15:01
then we will run a huge suite of tests to avoid regressions in other components and so on. And we also added features like layered testing, which means we first do some kind of unit testing, so we run the unit tests first in our infrastructure,
15:23
and only after these unit tests are successful do we run the more expensive integration tests, and then the really expensive end-to-end tests. And as we have a large developer base and a large code base, it can happen that someone breaks a test,
15:41
and in case you have a broken test, you cannot stop integrating new changes into your integration branches. It wouldn't work at such a scale anymore, and this is the reason why we also have some features and ways to handle such broken tests. So you can, for example, move a test into quarantine,
16:02
and say, okay, we know this test is currently unstable, there are some bugs inside, and we exclude this test from the execution to save runtime. The other big thing about test scheduling is observing the whole test run,
16:21
so in case there's a failed test, it reschedules it, or in case the test run is complete, it can automatically perform a review of the tests. And the most important thing is basically that, after someone pushed something,
16:41
you actually want to know, okay, now it's complete, because the tests run for eight hours. For this, we started to implement a more intelligent test scheduler, obviously in Python, because we really love Python, which means, after a build,
17:01
the build will trigger our more advanced test scheduler, we call it the waiter, and the waiter will then ask different systems about configuration, about the states of certain things. So, for example, in case you push something and you reference a bug inside of it,
17:21
then we will take a look at the state of this bug, and only in case this bug is in a defined state in the process will we start the test execution. After the waiter has decided which tests should run, it will schedule them in Jenkins,
17:42
and it will keep monitoring them; in case there's a failure, or some test block is missing or something, it will reschedule it. At such a scale, we also have to talk about queuing and scheduling of tests. So we have certain requirements,
18:02
so for example, our nightly tests should be completed in the morning, or bug fixes are a little bit more important than new features, so their tests should run with a higher priority, and also that we should maybe

18:20

prioritize finishing the testing of commits in case they are not fully tested yet and we have some reruns. Jenkins currently only provides a first-in, first-out queue, and with first-in, first-out, it's really hard to implement such requirements,
18:41
and our solution for this was also to implement it in Python again, because we love Python. So we built a prioritized test queue, which means the waiter now puts the tasks to run a certain test into this queue, and this queue will sort them
19:02
based on priority and on the content of the test task, and then a processor will fetch items from the prioritized queue and distribute them across Jenkins hosts, Jenkins masters, which is actually also a required feature,
19:23
to distribute across Jenkins masters, because we just learned that Jenkins doesn't scale well with more than 350 servers attached. So let's talk a little bit about artifacts. One thing is that our installer
19:42
is a little bit bigger than typical software products, than a typical Python package. Our installer is still 15 gigabytes and lives on an NFS share. After the build is complete, we place the installer on the share,
20:04
and we install from the share. And we also have test data in various sizes, from something like four megabytes up to 800 gigabytes. And we are doing, per week, actually,
20:21
nine petabytes of data transfer, just for transferring the installer and test data to the real hosts which are running the tests. To optimize this, we introduced some kind of caching. We just placed a very simple Python script, it's around 800 lines of code,
20:40
in front of the installer call, and it will check whether the installer is already locally available. In case there's a cache miss, it will fetch the installer from the NFS share, place it locally in the cache, and then we can run the installer right from there. And we can do the same thing for test data.
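The core of such a cache script can be sketched in a few lines: check a local cache directory first, and copy the artifact over from the share only on a miss. The function name and paths here are illustrative, not SAP's actual 800-line script.

```python
import shutil
from pathlib import Path

def fetch_artifact(name, share_dir, cache_dir):
    """Return a local path for artifact `name`, copying it from the
    (NFS) share directory only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / name
    if not cached.exists():
        # Cache miss: transfer once across the network, then reuse locally.
        shutil.copy2(Path(share_dir) / name, cached)
    return cached
```

Every later test run on the same host then hits the local copy instead of pulling 15 gigabytes over the network again, which is where the traffic savings come from.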
21:04
So in case a test is requesting some specific test data artifact, we intercept this call, fetch the artifact from the central shares to the local disk, and then we can import it. And with this, we actually saved a lot of traffic,
21:24
and these implementations are very straightforward and very easy. We saved 66% of the traffic, and what we now transfer is still a lot, but it's better than nine petabytes,
21:41
so we are still transferring three petabytes of test artifacts every week across our network. The next very important thing, especially in such a test environment, is that you have a very healthy test environment. So you have to make sure that all your external dependencies are available.
22:01
So as I mentioned, we have these NFS shares with artifacts, with test data, but we are also testing distributed systems, so your local host is not the only host which is interesting for your test run. And as we know, external dependencies will always fail. But we also have to make sure
22:21
that your local system is in a healthy state. As we are performing parallel testing on the same host, we have to make sure that there's no noisy neighbor on the same host, a noisier test basically running on the same host which, for example, consumes all the available memory,
22:41
and then it's very logical that your test will fail, because there's no memory available anymore. To solve this, we implemented a health check, which runs before and after a certain test run.
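A pre/post-run health check like this can be sketched as a list of named checks, each a callable returning True when healthy. The concrete checks and thresholds below are illustrative assumptions, not SAP's actual ones.

```python
import shutil
import socket

def run_health_checks(checks):
    """Return the names of all failing checks (empty list means healthy)."""
    return [name for name, check in checks if not check()]

def disk_ok(path="/", min_free_bytes=1 << 30):
    # Enough free disk space for artifacts and the installer cache?
    return shutil.disk_usage(path).free >= min_free_bytes

def service_ok(host, port, timeout=2.0):
    # Is an external dependency (e.g. an NFS or results server) reachable?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

checks = [
    ("disk", disk_ok),
    # ("nfs_share", lambda: service_ok("nfs.example.com", 2049)),  # hypothetical host
]
failures = run_health_checks(checks)
print("healthy" if not failures else f"unhealthy: {failures}")
```

Running the same checks before and after a test run also helps attribute a failure: if the post-run check fails, the test likely suffered from (or caused) an unhealthy host rather than a product bug.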
23:00
And it checks all these dependencies: availability of external services, local memory usage, CPU usage, and so on. And it's all implemented in Python. So what does it look like today? Today, we still have Gerrit,
23:22
but Gerrit doesn't trigger Jenkins directly anymore; there's no Gerrit trigger available anymore. We now have dedicated infrastructure for building the database itself. And after the build is complete,
23:41
we get a notification, and we start our waiter, which will then schedule the test tasks, and then some test processors will distribute them across our Jenkins landscape. And the nice part is that if we now take a look at what is currently running in our infrastructure,
24:02
we will see that a lot of things are now totally written in Python. So every blue box is now in Python. Also, the build infrastructure is heavily implemented in Python. And we still have our central database with all the results, and the web UI to review these results.
24:22
And we are still heavy Python fans. It's still very great that the learning curve is so gentle, that it's so easy to get started that non-developers can write tests for us, for the database. We are very big fans of the community,
24:40
so we are heavily relying on open source tools like virtualenv, Flake8, and pip. We are heavy users of Sentry, so we are storing a lot of exceptions in Sentry. We are big fans of the development velocity.
25:00
Performance is nice, but bringing a feature which you had in mind in the morning into production in the afternoon is actually much better. And the fact that Python is platform independent also allowed us to scale across multiple architectures. So currently we are running tests on three different CPU architectures
25:22
and over 10 different operating system versions. So let me just give you a very short outlook on what we are currently doing and what we are currently trying to achieve. We are currently thinking about how we scale across 3,000 nodes.
25:41
We have some POCs running with Apache Mesos for a more resource-based scheduling approach. We are playing around a lot with Linux containers, currently Docker, but maybe some other container engines could also be interesting, to limit resources and to ensure that your test run
26:01
has enough resources locally available. And we are currently taking steps to migrate to Python 3, so a lot of the code base is still in Python 2, but some projects are already migrated to Python 3. So thank you very much. We are still hiring, so in case you want to play around
26:22
with a lot of memory, you should definitely talk with me. So I think we have enough memory for everyone. Thank you. Thank you, Christoph. Are there questions? Hello, thank you for the presentation.
26:45
I'm curious about how you test the failure scenarios. I know it has a distributed structure. Do you test that on the Jenkins side, or do you test it with Python? So, we're testing it in Python itself.
27:02
So in case you have a distributed system like HANA, it's possible for the developer to write test cases to say, okay, now I would like to intercept a certain network communication, for example, and then you can test the behavior, how it now works without network between two nodes,
27:23
for example. Thank you. Hello, thank you for your talk. One question you mentioned, you run relevant tests first for topic branches. How do you do that?
27:41
Like, does a developer have to define them, or are you able to determine that automatically? Currently it's configured, so the developer says, okay, I would like to run this test in my branch, but we're thinking about ways to map, for example, source code to testing,
28:01
to test code, and then we can decide, okay, what should run. And we also have some proofs of concept running to use our coverage data, for example, for such things. Hello, how do you split up your test suites so that they can run in under eight hours?
28:20
Do you do it by hand, or do you have some interesting scheduling methods or something? We are currently doing it mostly by hand, so one of these test blocks contains multiple Python test scripts, actually, and then we are trying to pack them together
28:42
to reach a runtime which is acceptable for us. We have some scripts which can generate these collections, but it's not so sophisticated. More questions?
29:01
We have a question: are performance tests also part of the suite? Yes, we are also doing performance tests on actually the same landscape, so many of the performance tests are written in Python as well, and we also do a lot of reporting and evaluation
29:20
of the results of performance testing with Python. Cool, more questions? That's not the case, so let's thank Christoph again.