Simple visual regression testing
Formal Metadata
Title: Simple visual regression testing
Series: DjangoCon Europe 2019
Number of Parts: 32
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/45447 (DOI)
Transcript: English (auto-generated)
00:01
Hi, everyone. So today I'm going to talk about simple visual regression testing. The emphasis here really is on simple, specifically because this project, this talk, really grew out of an issue I was having at work. I'm a full stack engineer, and so I was having issues with
00:22
knowing that my CSS changes were rendered correctly, and I needed a simple way to test. So a little bit about me. I work for a company called Wellthy. It's a New York-based company, and what I've realized, now that I've been traveling in Europe for about a month or two, is that we solve a problem that maybe is a bit unique to the United States. So we work with families who are caregivers
00:43
for chronically ill and aging loved ones, and we provide them care coordinators to really help them manage that care. Unique to the U.S. healthcare system, we have a whole massive insurance market, and negotiating that can be challenging, especially if you're trying to manage a full-time job. So we work with families on that.
01:00
So what is a full stack engineer? I decided to go ahead and start with this little comic here. I think it gives a good little summary of what full stack is. There are different types of full stack engineers. There are those who are more DevOps, and they sometimes do some other things on the side. There are backend engineers who occasionally do some CSS.
01:20
There are those who are front-end and are occasionally forced to go into the backend, and the comic goes on from there. It's from CommitStrip. It's a good little comic. I recommend it. It's super nerdy, super fun. And so looking at this, I would say that I'm probably the top right. I'm a full stack developer. I feel extremely comfortable in the backend.
01:41
I feel comfortable with Django. I can work my way around HTML. JavaScript is an old friend. My first job was in Java, so these are old friends. But CSS is a bit of a mythical art, and I respect all of you front-end engineers who are handling this well. I respect you. I value you. We're hiring a front-end engineer. Just going to throw that out there.
02:01
At which time, maybe I'll go back into the backend. And so, considering that for now I am the front-end engineer, I still have to work with CSS. Work needs to get done. And so the problem I was having was I was trying to answer this question. How do you know your CSS changes don't have any unexpected side effects?
02:21
Meaning, if I change this class, how do I know that there aren't unexpected changes on another page that shares the same class? Perhaps one way is, you know, to not reuse CSS at all, so everything is unique. But then you end up with a massive code base that's unmanageable, that nobody wants to use, and people like me would just avoid CSS even more.
02:41
Alternatively, you send it over to QA. QA magically do whatever they do, and potentially they come back to you saying that this is off by a pixel. But sometimes, even for a human eye, it's challenging to catch the differences between two web pages, especially as they click through. So for example, if you have something like this,
03:03
who can tell me what's the difference between these two images? So for this demo, I went ahead and just spun up a sample site. I was feeling artsy the other day, so I took a picture of some flowers at a park and threw it on here. And so it's subtle, but it's visible, you can see. Any guesses? Yes?
03:24
Exactly. Yeah. So he's correct. The padding here on the images is off. And here, it's not high contrast, but you can still see it. It's a border, it's there. And so he's right, the padding's off a little bit. This is running just off my localhost.
03:42
I spun up two instances. One's on a subdomain, which I'm calling staging, and then I have my dev environment to mimic dev. So when I have this problem, I sometimes will catch it, but to be honest, sometimes I won't. And so rather than relying on QA or the person doing my code reviews to catch these changes,
04:01
I decided to come up with a solution to really solve this problem for me. And it should be super simple. The code's pretty straightforward. It definitely could be architected into something larger. I was working on making part of this open source but didn't quite get finished before this talk, so it's coming soon. But it's just about being able to integrate this
04:21
into your Python environment. So to do this, the requirements I needed: I needed something that would compare the same page in two environments. So whether it's your dev environment versus staging, or staging versus prod, I really just wanted to look at two images programmatically and have it let me know if they're different.
04:43
Second, I really wanted to know exactly what's different. I don't need an error message that says, your CSS is off. Well, what does that mean? This color's wrong, the spacing's wrong, an image that should be there isn't there anymore because you broke everything. There's like a fire. Like, what does off mean? So I wanted a way to know that things were different,
05:02
and that's why I decided to go with something visual, so that I could see if there was an error, something that would show me so clearly what I've done wrong that I can easily fix it. For me, the third requirement was that I wanted to be able to run it in my existing Django environment. So that means being able to run the test locally
05:21
before I commit, and being able to run the test in our continuous integration tool. I personally didn't feel it was necessary to run it between stage and prod, but that was just my choice. It definitely can be done. And so these were the requirements that I had when I started this project. And so I really started with something super simple.
05:41
It's just a basic setup function, and I'm using PhantomJS. I think they recently announced they're not developing it anymore, so we'll see what that does long term. But the real rationale behind using it is that I wanted something headless. I wanted something that would be fast to execute. And as you can see, this is super simple.
06:02
In the real world, I have these broken up into functions, but I just put all the lines there because it's so simple. First we start with, I called it pic viewer, because why not? And so the dev one is running on dev, so this is based on my localhost right now. And then I set a fixed width when I get the window.
06:21
Then I go to the page, and I save a screenshot. Really, when you're starting up Selenium, it depends on the page; these demo pages have no validation. But if your pages do require validation, before you can really access them you enter your username or whatever you need to do to actually access those pages. And so I'm doing the exact same thing,
06:40
which is why I pasted it here for both, for both staging and dev. So in the end, I should have two images. To make my life easier, I made them clearly different this time, rather than just a few pixels off like the last screenshot. And so I have just a basic page. One's in dev, one's in staging. And it's an image viewer, same image, different borders.
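(As a rough sketch of the kind of setup being described: the example below uses Selenium with headless Chrome rather than the now-unmaintained PhantomJS, and the URLs, window size, and file names are made-up placeholders, not the speaker's actual values.)

```python
from selenium import webdriver

# Hypothetical URLs for the two environments being compared.
ENVIRONMENTS = {
    "dev": "http://localhost:8000/gallery/",
    "staging": "http://staging.localhost:8000/gallery/",
}

def capture_screenshots(width=1280, height=1024):
    """Load the same page in each environment and save one screenshot per environment."""
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    paths = {}
    for name, url in ENVIRONMENTS.items():
        driver = webdriver.Chrome(options=options)
        try:
            # Fix the window size so both screenshots have the same dimensions.
            driver.set_window_size(width, height)
            driver.get(url)
            # If the page required a login, you would authenticate here first.
            path = f"screenshot_{name}.png"
            driver.save_screenshot(path)
            paths[name] = path
        finally:
            driver.quit()
    return paths
```

The key point is simply that both environments are captured at the same fixed window size, so the resulting images can be compared against each other afterwards.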
07:03
So one is the one running in dev. I made a CSS change to clearly just change the color. So it should be blue, because that's what it should be. Because I said so, I don't know. And the other one is red. But the real challenge that you're going to run into fairly quickly with this implementation is the date.
07:21
So I just arbitrarily stuck 2019 on the top. What happens in 2020? What if this is changing, if it's a daily thing like "happy Monday", or the full date? That's a field that you know is going to change. And so whether it's showing seconds or minutes, when you're trying to process these images,
07:40
they will always be different. And so you don't want a failure based on a time, because you know the time should change, at least in a regular fashion. There's also the same issue with RSS feeds: if you have a feed that's going into your site, or things that update fairly regularly, you know they can be different, and having different content isn't wrong.
08:01
So to address that, I simply wrote a script that's executed once the DOM is rendered, before I take the screenshot. I literally just go in. I have this class on there, just for the demo, called date. And within our platform, we do have fairly consistent CSS classes, so it's generally fairly predictable which elements are going to change,
08:21
and I block it out. So what I'll end up with is two screenshots like this. So now we can see that the date has been just completely removed from it. It's really more of a practical, functional decision. I think that there are better ways to handle dates if you do want to validate the content in the date.
08:40
Perhaps you actually parse the DOM and check that field when you're running the Selenium test. But for me, I don't care if it's 2019. I don't care if it's 1999. For this case, I really just care about the CSS and whether the changes got rendered, because I have confidence that the date is rendered from a view that has its own test already.
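(A minimal sketch of the "block out the volatile element" idea described above, assuming the Selenium driver from the previous sketch; the .date selector is just the demo class mentioned in the talk, and the list of selectors is hypothetical.)

```python
# Selectors for elements that legitimately change between runs (dates, feeds, ...).
VOLATILE_SELECTORS = [".date"]

def mask_volatile_elements(driver, selectors=VOLATILE_SELECTORS):
    """Hide elements whose content is expected to differ, before taking the screenshot."""
    for selector in selectors:
        driver.execute_script(
            """
            document.querySelectorAll(arguments[0]).forEach(function (el) {
                el.style.visibility = 'hidden';
            });
            """,
            selector,
        )
```

In the capture step, this would be called after the page has loaded and before the screenshot is saved, so the masked elements never show up in either image.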
09:00
So I'm not so much worried about that field. And so what I've done here is I got rid of the date. So really, what's happening from a functional perspective is that I just use the Python Imaging Library and we compare pixels. I kind of broke it out for this talk just so it's a little bit more visual as to what's happening.
09:22
But basically I take the two images, break them up into a grid, and I just compare the brightness of each grid cell. So the size of each image that we're comparing, the dev image and the staging image, that can vary, yes. But here, this is just a simple pass of going through and comparing, and we create a grid.
09:41
And so just for visual purposes, this is kind of what that grid ends up looking like. The cells at the top right, and so on, all get compared against each other. Generally, when I'm running this in dev, I don't print out this image, because I find that there's no need for it. Really, the next step here is comparing the two images.
10:00
So it's top corner, top corner. And generally, by comparing the brightness, the images theoretically should be the same. I'm not brave enough to run live code on stage, so I have it all set up and it all works. But that's just really how it's working. It's super simple, super straightforward in terms of comparing the two images.
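(A sketch of the kind of grid-based brightness comparison being described, using Pillow, the maintained fork of the Python Imaging Library; the grid size and brightness tolerance are arbitrary choices, not the speaker's actual values.)

```python
from PIL import Image, ImageStat

def compare_images(path_a, path_b, rows=20, cols=20, tolerance=5):
    """Return the (row, col) grid cells whose mean brightness differs between the two images."""
    img_a = Image.open(path_a).convert("L")  # greyscale, so "brightness" is just the pixel value
    img_b = Image.open(path_b).convert("L")
    width, height = img_a.size  # both screenshots were taken at the same fixed size
    cell_w, cell_h = width // cols, height // rows

    differing = []
    for row in range(rows):
        for col in range(cols):
            box = (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
            mean_a = ImageStat.Stat(img_a.crop(box)).mean[0]
            mean_b = ImageStat.Stat(img_b.crop(box)).mean[0]
            if abs(mean_a - mean_b) > tolerance:
                differing.append((row, col))
    return differing
```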
10:22
And so it's really just building out this grid function. We have the images, and generally in the tests I know the URLs and I already have the images, so I just pass them to the function. And in this function, the rows and columns can be smaller or larger. Yet again, this is more of a functional tool
10:42
and I'm just trying to build something that will work, that will tell me what's different. And so we just really go through, we make the columns, and then we get the dimensions of the page, or of each image, which were fixed when we originally took the screenshots. And I really just go in and compare the images. And so for the case where the images are different,
11:04
that's the real case where I go ahead and draw the grid. The real purpose of the grid is for me to visually see what's different. So the image is the same, the basic header is the same, and all the white space around is the same. But this allows me to see what's different.
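(A sketch of drawing that grid onto the screenshot so the failing cells are visible; it reuses the cell coordinates from the comparison sketch above, and the red outline and output file name are arbitrary choices.)

```python
from PIL import Image, ImageDraw

def draw_diff_grid(path, differing_cells, rows=20, cols=20, out_path="diff.png"):
    """Outline each differing grid cell in red on a copy of the screenshot."""
    img = Image.open(path).convert("RGB")
    width, height = img.size
    cell_w, cell_h = width // cols, height // rows

    draw = ImageDraw.Draw(img)
    for row, col in differing_cells:
        box = (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
        draw.rectangle(box, outline="red", width=3)
    img.save(out_path)
    return out_path
```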
11:22
And so what happens when I'm running my test cases is I run my Selenium tests, generally locally first, and things run, and then in the cleanup, the tear down, which I don't think I put on the slide. In the tear down, there should just be a function. I don't think I included it here, but there should be a function to get rid of these images
11:41
that have just been generated. The reason behind that is that if the test succeeds, I don't need all these spare images just sitting on my local disk or in whatever storage I'm using. So I'll generally go ahead and delete those. But for the case where the tests fail, aka it generates this result, for me it's really simple to see exactly what's gone wrong
12:04
and that's really what I need, especially when I'm dealing with CSS because my big challenge is maybe I'm not observant enough, maybe I don't catch these things, but it's a way to still do my job and execute it well without having to spend time clicking around on a site.
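(A sketch of how the cleanup described above might look in a test case: the raw screenshots are deleted in tearDown, while the diff image is only produced, and therefore kept, when the test fails. The test class and helper names come from the earlier sketches and are assumptions, not the speaker's actual code.)

```python
import os
import unittest

class GalleryVisualRegressionTest(unittest.TestCase):
    def setUp(self):
        self.generated_files = []

    def test_gallery_page_matches_staging(self):
        paths = capture_screenshots()  # from the earlier sketch
        self.generated_files.extend(paths.values())
        differing = compare_images(paths["dev"], paths["staging"])
        if differing:
            # The diff image is written only on failure, so it survives the cleanup below.
            diff_path = draw_diff_grid(paths["dev"], differing)
            self.fail(f"{len(differing)} grid cells differ, see {diff_path}")

    def tearDown(self):
        # If the test passed, there is no need to keep the raw screenshots around.
        for path in self.generated_files:
            if os.path.exists(path):
                os.remove(path)
```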
12:20
So once we have that, the last and most challenging part of this entire implementation is the continuous integration. All these things work so nicely in your local environment. I'm running Selenium, it's headless, life is good, it's giving me this output document. But what do you do when it's time to go to continuous integration
12:43
and someone else needs to validate that your code actually works and that the CSS changes you've made are fine? So for me, this was probably the biggest challenge of maybe this entire little project. Originally I was like, oh, we're using CircleCI, we'll just throw it up there and maybe we'll have a Docker instance that's reading that URL,
13:04
but the real challenge here is running this instance separately, without running it in the Docker instance. So for us, we're using a cross-browser testing service, and that allows us to take the screenshots of the different pages and process them.
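(A sketch of the general idea of switching to a hosted browser when running in CI: the environment variable names and hub URL here are placeholders that would depend on the particular cross-browser testing service and its CircleCI integration.)

```python
import os
from selenium import webdriver

def make_driver():
    """Use a remote browser grid in CI, a local headless browser otherwise."""
    options = webdriver.ChromeOptions()
    if os.environ.get("CI"):
        # SELENIUM_HUB_URL is a hypothetical variable the CI config would point
        # at the cross-browser testing service's Selenium endpoint.
        return webdriver.Remote(
            command_executor=os.environ["SELENIUM_HUB_URL"],
            options=options,
        )
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```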
13:22
And what that really means is I'm just running my Selenium tests in the cross-browser testing service, and they have a built-in integration with CircleCI, which is really what I'm using. All right, I'm done, and I think I spoke too quickly. Bit nervous, sorry.
13:45
No, that was great. So now we have some time for questions, if you would like to ask them either online using the hashtag Django QA, in the IRC channel, or in person.
14:11
Hi, that was brilliant, thanks. You said you were going to open source that, so... Yes. Are you going to? Yes, I am, I am. No, yesterday we learned even if it's not perfect,
14:20
we should put it out there. I wasn't bold enough last night, but yes, I will definitely. Hi, really, really cool. I was wondering how you handle stuff like very iterative pages, so you're working on something that hasn't been finished yet, but you just keep updating. How do you go about it? Did you just leave it out? That's actually a great question, and I realize now that I omitted that.
14:42
So there are times when CSS changes are valid, right? I made an update and I do want every page to change. And for those cases, we have a process where we approve the change. So it's a warning, which is what I have running in CircleCI, and you acknowledge that warning,
15:01
and if that warning is accepted by the code reviewer, we go ahead and squash and merge, and that way the next time it's run, it will take a new image. And so that's why in CircleCI we try to merge integration, or merge branches, first and then run the test runner. That way any new updates, especially if you're in between,
15:22
like it's in staging but it's not in prod, we'll be able to get those changes as well. Okay, cool, thanks. Hi, great talk. Thank you. Just a very quick question. What wasn't very clear to me, so you normally have lots of Selenium tests
15:42
where you test the behaviors of your pages, and then all of them become screenshots automatically, and then you compare those? So we have them running separately. The image comparison ones run separately, on only certain pages. We don't have them running on every page,
16:01
because there are just parts of our site we don't use quite as often. But we have broken them out as separate, because the Selenium tests for functionality I think of as a little bit different: you're testing functionality, and there is some correlation, as in, if I type this, does this render? There's a correlation there, but generally we choose to run them separately.
16:21
Perhaps over time this is going to become time consuming, but for now that's what we're doing. Nice, thanks. Hello, thank you so much for this talk. Is there anything you've learned from having something automated like this? I'm guessing a lot of changes you maybe didn't catch before
16:40
are now caught by the machine. Are there things about CSS and those differences between integration and production, or your dev environment and your integration environment, that you've learned through having all this feedback? I think we've really learned. We have a standard now. I work at a startup and we're very lean and we're trying our best to build an amazing product,
17:02
but we now have a CSS standard that's really helped me personally, and it also helps us as a team: what classes should we be using? We've been following that, and that's really been the most valuable thing to come out of these changes, so that we're not just saying, okay, today it's a float, and now it's a block.
17:21
Whatever it is, we're using the same classes consistently, and so it's really cut back on all the issues. Okay, thank you. Hi. I was wondering, to turn that into, or to use it for, regression testing, you need to have some baseline
17:41
against which you're comparing your change. How do you... Because in your demonstration, you were comparing dev and staging. But I guess for regression testing, you would compare the current state with the suggested change. So do you track the current state in the GitHub repository, or how do you do that?
18:01
So the current state is really tracked based on prod. Our releases to prod are a little bit more stable and come as a chunk of work. And I mentioned that we're not currently doing the stage versus prod regression test, but I think that's where it should be done. The issue that we were having was with continuous integration, as we're constantly deploying into integration.
18:22
And so comparing integration against prod, I think, is really the next step of what we need to do to actually have proper regression testing. But that's a work in progress. Thanks.