Simple visual regression testing
Formal Metadata
Title: Simple visual regression testing
Series: DjangoCon Europe 2019
Number of Parts: 32
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/45447 (DOI)
Transcript: English (auto-generated)
00:01
Hi, everyone. So today I'm going to talk about simple visual regression testing. The emphasis here really is on simple, specifically because this project, this talk, really grew out of an issue I was having at work. I'm a full stack engineer, and so I was having issues with
00:22
knowing that my CSS changes were rendered correctly, and I needed a simple way to test. So a little bit about me. I work for a company called Wellthy. It's a New York-based company, and what I've realized, now that I've been traveling in Europe for about a month or two, is that we solve a problem that maybe is a bit unique to the United States. So we work with families who are caregivers
00:43
for chronically ill and aging loved ones, and we provide them care coordinators to really help them manage that care. Unique to the U.S. healthcare system, we have a whole massive insurance market, and negotiating that can be challenging, especially if you're trying to manage a full-time job. So we work with families on that.
01:00
So what is a full stack engineer? I decided to go ahead and start with this little comic here. I think it gives a good little summary of what full stack is. There are different types of full stack engineers. There are those who are more DevOps, and they sometimes do some other things on the side. There are backend engineers who occasionally do some CSS.
01:20
There are those who are front-end and are occasionally forced to go into the backend, and the comic goes on from there. It's from CommitStrip. It's a good little comic. I recommend it. It's super nerdy, super fun. And so looking at this, I would say that I'm probably the top right. I'm a full stack developer. I feel extremely comfortable in the backend.
01:41
I feel comfortable with Django. I can work my way around HTML. JavaScript is an old friend. My first job was in Java, so these are old friends. But CSS is a bit of a mythical art, and I respect all of you front-end engineers who are handling this well. I respect you. I value you. We're hiring a front-end engineer. Just going to throw that out there.
02:01
At which time, maybe I'll go back into the backend. And so, considering that for now I am the front-end engineer, I still have to work with CSS. Work needs to get done. And so the problem I was having was I was trying to answer this question. How do you know your CSS changes don't have any unexpected side effects?
02:21
Meaning, if I change this class, how do I know that there aren't unexpected changes on another page that shares the same class? Perhaps one way is, you know, to not reuse CSS at all, so everything is unique. But then you end up with a massive code base that's unmanageable, that nobody wants to use, and people like me would just avoid CSS even more.
02:41
Alternatively, you send it over to QA. QA magically do whatever they do, and potentially they come back to you saying that this is off by a pixel. But sometimes, even for a human eye, it's challenging to catch the differences between two web pages, especially as they click through. So for example, if you have something like this,
03:03
who can tell me what's the difference between these two images? So for this demo, I went ahead and just spun up a sample site. I was feeling artsy the other day, so I took a picture of some flowers at a park and threw it on here. And so it's subtle, but it's visible, you can see. Any guesses? Yes?
03:24
Exactly. Yeah. So he's correct. The padding here on the images is off. And here, it's not high contrast, but you can still see it. It's a border, it's there. And so he's right, the padding's off a little bit. This is running just off my localhost.
03:42
I spun up two instances. One's on a subdomain, which I'm calling staging, and then I have my dev environment to mimic dev. So when I have this problem, I sometimes will catch it, but to be honest, sometimes I won't. And so rather than relying on QA or the person doing my code reviews to catch these changes,
04:01
I decided to come up with a solution to really solve this problem for me. And it should be super simple. The code's pretty straightforward. It definitely could be architected into something larger. I was working on making part of this open source but didn't quite get finished before this talk, so it's coming soon. But it's just about being able to integrate this
04:21
into your Python environment. So to do this, the requirements I needed: I needed something that would compare the same page in two environments. So whether it's your dev environment versus staging, or staging versus prod, I really just wanted to look at two images programmatically and have it let me know if they're different.
04:43
Second, I really wanted to know exactly what's different. I don't need an error message that says, your CSS is off. Well, what does that mean? This color's wrong, the spacing's wrong, an image that should be there isn't there anymore because you broke everything. There's like a fire. Like, what does off mean? So I wanted a way to know that things were different,
05:02
and that's why I decided to go with something visual, so that I could see if there was an error, something that would show me so clearly what I've done wrong that I can easily fix it. For me, the third requirement was that I wanted to be able to run it in my existing Django environment. So that means being able to run the test locally
05:21
before I commit, and being able to run the test in our continuous integration tool. I personally didn't feel it was necessary to run it between stage and prod, but that was just my choice. It definitely can be done. And so these were the requirements that I had when I started this project. And so I really started with something super simple.
05:41
It's just a basic setup function, and I'm using PhantomJS. I think they recently announced they're not developing it anymore, so we'll see what that does long term. But the real rationale behind using it is that I wanted something headless. I wanted something that would be fast to execute. And as you can see, this is super simple.
06:02
In the real world, I have these broken up into functions, but I just put all the lines there because it's so simple. First we start with, I called it pic viewer, because why not? And so the dev one is running on dev, so this is based on my localhost right now. And then I set a fixed width when I get the window.
06:21
Then I go to the page, and I save a screenshot. Really, when you're starting up Selenium, it depends on the page; these demo pages have no validation. But if your pages do require validation, before you can really access them you enter your username or whatever you need to do to actually access those pages. And so I'm doing the exact same thing,
06:40
which is why I pasted it here for both, for both staging and dev. So in the end, I should have two images. To make my life easier, I made them clearly different this time, rather than just a few pixels off like the last screenshot. And so I have just a basic page. One's in dev, one's in staging. And it's an image viewer, same image, different borders.
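(As a rough sketch of the kind of setup being described: the example below uses Selenium with headless Chrome rather than the now-unmaintained PhantomJS, and the URLs, window size, and file names are made-up placeholders, not the speaker's actual values.)

```python
from selenium import webdriver

# Hypothetical URLs for the two environments being compared.
ENVIRONMENTS = {
    "dev": "http://localhost:8000/gallery/",
    "staging": "http://staging.localhost:8000/gallery/",
}

def capture_screenshots(width=1280, height=1024):
    """Load the same page in each environment and save one screenshot per environment."""
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    paths = {}
    for name, url in ENVIRONMENTS.items():
        driver = webdriver.Chrome(options=options)
        try:
            # Fix the window size so both screenshots have the same dimensions.
            driver.set_window_size(width, height)
            driver.get(url)
            # If the page required a login, you would authenticate here first.
            path = f"screenshot_{name}.png"
            driver.save_screenshot(path)
            paths[name] = path
        finally:
            driver.quit()
    return paths
```

The key point is simply that both environments are captured at the same fixed window size, so the resulting images can be compared against each other afterwards.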
07:03
So one is the one running in dev. I made a CSS change to clearly just change the color. So it should be blue, because that's what it should be. Because I said so, I don't know. And the other one is red. But the real challenge that you're going to run into fairly quickly with this implementation is the date.
07:21
So I just arbitrarily stuck 2019 on the top. What happens in 2020? What if this is changing, if it's a daily thing like "happy Monday", or the full date? That's a field that you know is going to change. And so whether it's showing seconds or minutes, when you're trying to process these images,
07:40
they will always be different. And so you don't want a failure based on a time, because you know the time should change, at least in a regular fashion. There's also the same issue with RSS feeds: if you have a feed that's going into your site, or things that update fairly regularly, you know they can be different, and having different content isn't wrong.
08:01
So to address that, I simply wrote a script that's executed once the DOM is rendered, before I take the screenshot. I literally just go in. I have this class on there, just for the demo, called date. And within our platform, we do have fairly consistent CSS classes, so it's generally fairly predictable which elements are going to change,
08:21
and I block it out. So what I'll end up with is two screenshots like this. So now we can see that the date has been just completely removed from it. It's really more of a practical, functional decision. I think that there are better ways to handle dates if you do want to validate the content in the date.
08:40
Perhaps you actually parse the DOM and check that field when you're running the Selenium test. But for me, I don't care if it's 2019. I don't care if it's 1999. For this case, I really just care about the CSS and whether the changes got rendered, because I have confidence that the date is rendered from a view that has its own test already.
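(A minimal sketch of the "block out the volatile element" idea described above, assuming the Selenium driver from the previous sketch; the .date selector is just the demo class mentioned in the talk, and the list of selectors is hypothetical.)

```python
# Selectors for elements that legitimately change between runs (dates, feeds, ...).
VOLATILE_SELECTORS = [".date"]

def mask_volatile_elements(driver, selectors=VOLATILE_SELECTORS):
    """Hide elements whose content is expected to differ, before taking the screenshot."""
    for selector in selectors:
        driver.execute_script(
            """
            document.querySelectorAll(arguments[0]).forEach(function (el) {
                el.style.visibility = 'hidden';
            });
            """,
            selector,
        )
```

In the capture step, this would be called after the page has loaded and before the screenshot is saved, so the masked elements never show up in either image.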
09:00
So I'm not so much worried about that field. And so what I've done here is I got rid of the date. So really, what's happening from a functional perspective is that I just use the Python Imaging Library and we compare pixels. I kind of broke it out for this talk just so it's a little bit more visual as to what's happening.
09:22
But basically I take the two images, break them up into a grid, and I just compare the brightness of each grid cell. So the size of each image that we're comparing, the dev image and the staging image, that can vary, yes. But here, this is just a simple pass of going through and comparing, and we create a grid.
09:41
And so just for visual purposes, this is kind of what that grid ends up looking like. The cells at the top right, and so on, all get compared against each other. Generally, when I'm running this in dev, I don't print out this image, because I find that there's no need for it. Really, the next step here is comparing the two images.
10:00
So it's top corner, top corner. And generally, by comparing the brightness, the images theoretically should be the same. I'm not brave enough to run live code on stage, so I have it all set up and it all works. But that's just really how it's working. It's super simple, super straightforward in terms of comparing the two images.
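(A sketch of the kind of grid-based brightness comparison being described, using Pillow, the maintained fork of the Python Imaging Library; the grid size and brightness tolerance are arbitrary choices, not the speaker's actual values.)

```python
from PIL import Image, ImageStat

def compare_images(path_a, path_b, rows=20, cols=20, tolerance=5):
    """Return the (row, col) grid cells whose mean brightness differs between the two images."""
    img_a = Image.open(path_a).convert("L")  # greyscale, so "brightness" is just the pixel value
    img_b = Image.open(path_b).convert("L")
    width, height = img_a.size  # both screenshots were taken at the same fixed size
    cell_w, cell_h = width // cols, height // rows

    differing = []
    for row in range(rows):
        for col in range(cols):
            box = (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
            mean_a = ImageStat.Stat(img_a.crop(box)).mean[0]
            mean_b = ImageStat.Stat(img_b.crop(box)).mean[0]
            if abs(mean_a - mean_b) > tolerance:
                differing.append((row, col))
    return differing
```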
10:22
And so it's really just building out this grid function. We have the images, and generally in the tests I know the URLs and I already have the images, so I just pass them to the function. And in this function, the rows and columns can be smaller or larger. Yet again, this is more of a functional tool
10:42
and I'm just trying to build something that will work, that will tell me what's different. And so we just really go through, we make the columns, and then we get the dimensions of the page, or of each image, which were fixed when we originally took the screenshots. And I really just go in and compare the images. And so for the case where the images are different,
11:04
that's the real case where I go ahead and draw the grid. The real purpose of the grid is for me to visually see what's different. So the image is the same, the basic header is the same, and all the white space around is the same. But this allows me to see what's different.
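(A sketch of drawing that grid onto the screenshot so the failing cells are visible; it reuses the cell coordinates from the comparison sketch above, and the red outline and output file name are arbitrary choices.)

```python
from PIL import Image, ImageDraw

def draw_diff_grid(path, differing_cells, rows=20, cols=20, out_path="diff.png"):
    """Outline each differing grid cell in red on a copy of the screenshot."""
    img = Image.open(path).convert("RGB")
    width, height = img.size
    cell_w, cell_h = width // cols, height // rows

    draw = ImageDraw.Draw(img)
    for row, col in differing_cells:
        box = (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
        draw.rectangle(box, outline="red", width=3)
    img.save(out_path)
    return out_path
```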
11:22
And so what happens when I'm running my test cases is I run my Selenium tests, generally locally first, and things run, and then in the cleanup, the tear down, which I don't think I put on the slide. In the tear down, there should just be a function. I don't think I included it here, but there should be a function to get rid of these images
11:41
that have just been generated. The reason behind that is that if the test succeeds, I don't need all these spare images just sitting on my local disk or in whatever storage I'm using. So I'll generally go ahead and delete those. But for the case where the tests fail, aka it generates this result, for me it's really simple to see exactly what's gone wrong
12:04
and that's really what I need, especially when I'm dealing with CSS because my big challenge is maybe I'm not observant enough, maybe I don't catch these things, but it's a way to still do my job and execute it well without having to spend time clicking around on a site.
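(A sketch of how the cleanup described above might look in a test case: the raw screenshots are deleted in tearDown, while the diff image is only produced, and therefore kept, when the test fails. The test class and helper names come from the earlier sketches and are assumptions, not the speaker's actual code.)

```python
import os
import unittest

class GalleryVisualRegressionTest(unittest.TestCase):
    def setUp(self):
        self.generated_files = []

    def test_gallery_page_matches_staging(self):
        paths = capture_screenshots()  # from the earlier sketch
        self.generated_files.extend(paths.values())
        differing = compare_images(paths["dev"], paths["staging"])
        if differing:
            # The diff image is written only on failure, so it survives the cleanup below.
            diff_path = draw_diff_grid(paths["dev"], differing)
            self.fail(f"{len(differing)} grid cells differ, see {diff_path}")

    def tearDown(self):
        # If the test passed, there is no need to keep the raw screenshots around.
        for path in self.generated_files:
            if os.path.exists(path):
                os.remove(path)
```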
12:20
So once we have that, the last and most challenging part of this entire implementation is the continuous integration. All these things work so nicely in your local environment. I'm running Selenium, it's headless, life is good, it's giving me this output document. But what do you do when it's time to go to continuous integration
12:43
and someone else needs to validate that your code actually works and that the CSS changes you've made are fine? So for me, this was probably the biggest challenge of maybe this entire little project. Originally I was like, oh, we're using CircleCI, we'll just throw it up there and maybe we'll have a Docker instance that's reading that URL,
13:04
but the real challenge here is running this instance separately, without running it in the Docker instance. So for us, we're using a cross-browser testing service, and that allows us to take the screenshots of the different pages and process them.
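(A sketch of the general idea of switching to a hosted browser when running in CI: the environment variable names and hub URL here are placeholders that would depend on the particular cross-browser testing service and its CircleCI integration.)

```python
import os
from selenium import webdriver

def make_driver():
    """Use a remote browser grid in CI, a local headless browser otherwise."""
    options = webdriver.ChromeOptions()
    if os.environ.get("CI"):
        # SELENIUM_HUB_URL is a hypothetical variable the CI config would point
        # at the cross-browser testing service's Selenium endpoint.
        return webdriver.Remote(
            command_executor=os.environ["SELENIUM_HUB_URL"],
            options=options,
        )
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```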
13:22
And what that really means is I'm just running my Selenium tests in the cross-browser testing service, and they have a built-in integration with CircleCI, which is really what I'm using. All right, I'm done, and I think I spoke too quickly. Bit nervous, sorry.
13:45
No, that was great. So now we have some time for questions, if you would like to ask them either online using the hashtag Django QA, in the IRC channel, or in person.
14:11
Hi, that was brilliant, thanks. You said you were going to open source that, so... Yes. Are you going to? Yes, I am, I am. No, yesterday we learned even if it's not perfect,
14:20
we should put it out there. I wasn't bold enough last night, but yes, I will definitely. Hi, really, really cool. I was wondering how you handle stuff like very iterative pages, so you're working on something that hasn't been finished yet, but you just keep updating. How do you go about it? Did you just leave it out? That's actually a great question, and I realize now that I omitted that.
14:42
So there are times when CSS changes are valid, right? I made an update and I do want every page to change. And for those cases, we have a process where we approve the change. So it's a warning, which is what I have running in CircleCI, and you acknowledge that warning,
15:01
and if that warning is accepted by the code reviewer, we go ahead and squash and merge, and that way the next time it's run, it will take a new image. And so that's why in CircleCI we try to merge integration, or merge branches, first and then run the test runner. That way any new updates, especially if you're in between,
15:22
like it's in staging but it's not in prod, we'll be able to get those changes as well. Okay, cool, thanks. Hi, great talk. Thank you. Just a very quick question. What wasn't very clear to me, so you normally have lots of Selenium tests
15:42
where you test the behaviors of your pages, and then all of them become screenshots automatically, and then you compare those? So we have them running separately. The image comparison ones run separately, on only certain pages. We don't have them running on every page,
16:01
because there are just parts of our site we don't use quite as often. But we have broken them out as separate, because the Selenium tests for functionality I think of as a little bit different: you're testing functionality, and there is some correlation, as in, if I type this, does this render? There's a correlation there, but generally we choose to run them separately.
16:21
Perhaps over time this is going to become time consuming, but for now that's what we're doing. Nice, thanks. Hello, thank you so much for this talk. Is there anything you've learned from having something automated like this? I'm guessing a lot of changes you maybe didn't catch before
16:40
are now caught by the machine. Are there things about CSS and those differences between integration and production, or your dev environment and your integration environment, that you've learned through having all this feedback? I think we've really learned. We have a standard now. I work at a startup and we're very lean and we're trying our best to build an amazing product,
17:02
but we now have a CSS standard that's really helped me personally, and it also helps us as a team: what classes should we be using? We've been following that, and that's really been the most valuable thing to come out of these changes, so that we're not just saying, okay, today it's a float, and now it's a block.
17:21
Whatever it is, we're using the same classes consistently, and so it's really cut back on all the issues. Okay, thank you. Hi. I was wondering, to turn that into, or to use it for, regression testing, you need to have some baseline
17:41
against which you're comparing your change. How do you... Because in your demonstration, you were comparing dev and staging. But I guess for regression testing, you would compare the current state with the suggested change. So do you track the current state in the GitHub repository, or how do you do that?
18:01
So the current state is really tracked based on prod. Our releases to prod are a little bit more stable and come as a chunk of work. And I mentioned that we're not currently doing the stage versus prod regression test, but I think that's where it should be done. The issue that we were having was with continuous integration, as we're constantly deploying into integration.
18:22
And so comparing integration against prod, I think, is really the next step of what we need to do to actually have proper regression testing. But that's a work in progress. Thanks.