
Correlation analysis in automated testing


Formal Metadata

Title
Correlation analysis in automated testing
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Correlation analysis is a statistical method used to discover whether there is a relationship between two variables and how strong that relationship might be. A correlation coefficient is a numerical measure of such correlation. According to the Cauchy–Schwarz inequality it has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. One of the axioms of automated testing is that tests are independent, so the correlation coefficient should be equal to 0. But often it isn't. In this work, we present a method for evaluating the quality of test suites based on the correlation coefficient and for finding their weak points. Using PC Engines open-source firmware regression test results, based on over 140 automated tests run with 2 flavors of software on 4 different platforms, we will show how their quality can be described numerically and how those results can be used to optimize test criteria. As far as automated testing is concerned, every test can have only two expected output values: pass or fail. Pearson's correlation coefficient is defined as the covariance of two variables divided by the product of their standard deviations, so the first question was how to compute it for Boolean variables. We assumed that the only value that matters is a failure of a test. During the lecture, we will present how mathematical analysis can reveal potential flaws in test criteria by targeting cases that have a large chance to fail simultaneously.
Transcript: English (auto-generated)
Thank you very much for coming. My name is Lukas Fuchisław, and I am here to talk about correlation analysis in automated testing, as you can read. At the beginning I want to introduce myself, tell you something about who I am and how the history of my education led me to the topic of this presentation. Originally I had something slightly different in mind, but it evolved into this. Then I will tell you what the problem was and what the purpose of this research was, and I will explain some of the math that stands behind all of it. After that I will go through the use cases and some conclusions, maybe an anecdote. So first of all, what was my main purpose? I am a former computational biologist, a bioinformatician.
I used very powerful supercomputers to simulate biological processes, protein folding for example. The most problematic issue with such research is that it is very sensitive to any performance problems, because there are billions of particles that can be connected to each other in many ways and can influence one another, so performance was crucial. When I moved to IT, I became a tester at 3mdeb in Poland and started to work with test suites that were not huge, but big enough that the speed of the tests, the time they take to run, and the resources they consume matter. So I started thinking about how to do some optimization of these tests.
I thought: nowadays machine learning is a big theme, so why shouldn't the tests learn from each other? From the history, one test would know that whenever some other test run at almost the same time failed, this test also failed. Or if another test ran for several iterations and could not reach its expected outcome, this test could be shortened or skipped. Then I realized it is better to fix this at the beginning, because if tests are correlated in such a way, the test suite is badly designed: it rests on some wrong assumptions and the test conditions are badly formulated. So this is my assumption, I don't want to read it again: I want to make test suites more elegant and save some time and resources, because for a perfect test suite there should be no correlation between the tests. If there are correlations, some tests can be dropped because they do not need to be run. Then there might even be a measurable value that can benchmark test suites: the correlation coefficient.
Let's look at an example. There are tests in rows and software versions in columns; green is passed and red is failed, which seems obvious. I considered the tests to be Boolean functions, which are either true or false. Normally, when you compute a correlation coefficient, you would do a regression first, but for a Boolean function that only goes from zero to one that would not make any sense. So what I took instead was the historical probability of passing each test, and the value I measured was whether the test result fell below or above that probability. Next, it is obvious that a test which passes 100% of the time does not really matter for us, just the same as a software version that fails all the tests: if we had a power issue at the moment of testing, that tells us nothing about the tests themselves. So I kept only the meaningful tests, the ones that have failed at least once.
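As a rough sketch of that preprocessing, assuming the outcomes have already been collected into a results matrix with tests in rows and software versions in columns (the array and variable names below are purely illustrative, not the real PC Engines data):

```python
import numpy as np

# Illustrative results matrix: tests in rows, software versions in columns,
# True = pass, False = fail.
results = np.array([
    [True,  True,  False, True ],   # test A
    [True,  True,  True,  True ],   # test B: always passes -> not meaningful
    [False, True,  False, True ],   # test C
    [True,  False, True,  True ],   # test D
])

# Drop software versions where everything failed (e.g. a power issue during testing).
versions_ok = results.any(axis=0)
results = results[:, versions_ok]

# Keep only "meaningful" tests: those that failed at least once
# but did not fail every single time.
meaningful = (~results.all(axis=1)) & results.any(axis=1)
results = results[meaningful]

# Historical probability of passing each meaningful test.
pass_probability = results.mean(axis=1)
print(pass_probability)
```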
From those meaningful tests I created a covariance matrix. This may look suspicious, but it is really very easy math; it can even be done in Google Spreadsheets. In this covariance matrix the diagonal holds the variance of each meaningful test, and the other fields hold the covariances between the tests.
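A minimal sketch of that step, assuming the meaningful tests have been turned into 0/1 failure indicators (1 = fail, matching the assumption that only failures matter); the numbers are illustrative:

```python
import numpy as np

# 0/1 failure indicators: one row per meaningful test, one column per software version.
failures = np.array([
    [0., 0., 1., 0.],
    [1., 0., 1., 0.],
    [0., 1., 0., 0.],
])

# np.cov treats each row as one variable observed across the columns:
# the diagonal holds each test's variance, the off-diagonal entries
# hold the covariances between pairs of tests.
cov_matrix = np.cov(failures)
print(np.round(cov_matrix, 3))
```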
Next, I computed the Pearson correlation coefficient. It may look scary, but it is not, trust me. From it I created a correlation matrix. What you can see above is just the shape of the test outcomes that were our input from the beginning, and below is the matrix I calculated. It is only half of the matrix; for clarity the diagonal is left out, because it would just contain ones: the correlation between a test and the same test is one. The Pearson coefficient is always between minus one and one. Minus one is a negative correlation: if one test passes, the other fails. One is a positive correlation: whatever one test does, the other always does the same. Zero means no correlation, and in a big test suite, or many test suites aggregated in a regression, the real values are near zero. Plus or minus, but near zero.
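Sketched the same way, `np.corrcoef` divides each covariance by the product of the standard deviations, which is exactly the Pearson coefficient described in the abstract; the data is again illustrative:

```python
import numpy as np

failures = np.array([
    [0., 0., 1., 0.],
    [1., 0., 1., 0.],
    [0., 1., 0., 0.],
])

# Pearson correlation matrix: cov(x, y) / (std(x) * std(y)) for every pair of tests.
corr = np.corrcoef(failures)

# The diagonal is always 1 (a test is perfectly correlated with itself);
# off-diagonal values lie between -1 and +1 and, in a healthy suite,
# should stay near 0.
print(np.round(corr, 3))
```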
If a value in this matrix is, say, 0.5 or so, it is significant; I do not consider a value of one in this example, because that would obviously be an error. We could use this just to validate the test suite once, but that alone would not have much value for us. What can have value? With each regression in continuous integration and continuous delivery, we can test each version and check the dynamics of changes of this coefficient. If we see that some values that should not be connected seem to be correlated, correlated since some version in the past, and the correlation is getting bigger with each iteration, we should see a yellow light and check whether the conditions of the tests are formulated correctly.
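One possible way to watch that dynamic, sketched under the assumption that every regression run appends one more column of failure indicators; the function name and the 0.5 "yellow light" threshold are illustrative choices, not something prescribed in the talk:

```python
import numpy as np

YELLOW_LIGHT = 0.5  # illustrative threshold, tune per test suite


def rising_correlations(failures: np.ndarray, window: int = 10):
    """Report test pairs whose correlation over the last `window` CI runs
    exceeds the threshold (failures: tests in rows, CI runs in columns)."""
    recent = failures[:, -window:]
    # Skip tests that never change within the window; their correlation is undefined.
    active = recent.std(axis=1) > 0
    idx = np.flatnonzero(active)
    corr = np.corrcoef(recent[active])
    flagged = []
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            if abs(corr[a, b]) >= YELLOW_LIGHT:
                flagged.append((idx[a], idx[b], corr[a, b]))
    return flagged


# Illustrative history: 4 tests observed over 12 CI runs.
rng = np.random.default_rng(0)
history = rng.integers(0, 2, size=(4, 12)).astype(float)
for t1, t2, c in rising_correlations(history):
    print(f"tests {t1} and {t2} look correlated: r = {c:.2f}")
```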
This is a real test outcome: at 3mdeb we maintain the firmware of the PC Engines apu boards, the Swiss routers that are often used as firewalls, maybe some of you have used them, and this is the test result for the firmware of one of those router lines. We have been maintaining them for over two and a half years, but these tests are not a good example, because our tests are evolving too: some tests were created and only started to be used half a year ago. So we do not yet have a large enough base of tests with a long enough history to make this really useful. But for large silicon vendors or hardware producers I suppose it could be useful, because very often one person or one team is responsible for one suite in a regression, and all in all they might not be aware of correlations between the regression suites. So in conclusion, the dynamics can be useful in large test sets with a long history, and this is a proof of concept.
Do not consider this to be a white paper or anything like that; the idea is quite new, and we have started to work on it more seriously with our firmware. Here is an anecdote, Anscombe's quartet; some of you may know it or have heard of it. These four sets of data have the same summary statistics listed below, the mean of x and y, the variance of x and y, the correlation and so on, yet they are obviously different, so we have to actually look at the data. I do not consider the statistics to be a red light, an automatic fail, if something rapidly changes or rapidly grows, because there may be some other reason, but a yellow light should be raised when something like this occurs. This is the bibliography, and thank you for your attention. I did not need anything special for the computations; all of them were done in Google Spreadsheets. Thank you.

Yes, I have heard about it, but this is a new concept; we are just looking for ways that could improve our test suites, and I think it should be considered. Thank you. I beg your pardon? I can't hear you.
People who are watching the stream have not heard that, so you should probably mention that there is a thing called mutation testing. Yes: have I heard about mutation testing, because it could simplify all the computations with statistics that we used here? Yes, we have heard about it, but that work is in progress. We are looking for new ways to do this, and I think it should be considered. Thank you. Thank you very much for your attention.