
Continuous Visual Integration for Rails


Formal Metadata

Title
Continuous Visual Integration for Rails
Part Number
66
Number of Parts
89
Author
Mike Fotinakis
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
Unit testing is mostly a solved problem, but how do you write tests for the visual side of your app—the part that your users actually see and interact with? How do you stop visual bugs from reaching your users? We will dive deep into visual regression testing, a fast-growing technique for testing apps pixel-by-pixel. We will integrate perceptual diffs in Rails feature specs, and learn how to visually test even complex UI states. We will show tools and techniques for continuous visual integration on every commit, and learn how to introduce team visual reviews right alongside code reviews.
Transcript: English (auto-generated)
This is a visual integration talk. Thank you for being here. We got to go, go, go. We have a lot of things to get through. So just a little bit about me to start. My name is Mike Fotinakis.
I'm currently the founder of Percy, percy.io, which is a tool for visual testing. So I'm really excited to share with you some of the things I've learned over the last year about how to test apps pixel by pixel. I'm also the author of two Ruby gems, jsonapi-serializers and swagger-blocks; if you use either of those, or if you have any questions, I'd love to talk to you after.
Okay, so let's jump right in. So this will come in like three parts, the problem, the general solution, and kind of how it works and architectures and methodologies and all the problems that come along with that. So let's start with the problem. So the problem is basically that unit testing itself is kind of a solved problem. We have a lot of different strategies
and techniques and technologies for testing the data of our systems, for the behavior of our systems, for the functionality of our systems and the integration of our systems with other systems and end-to-end testing our systems and smoke testing our deployments and we have a lot of tools and technologies for this. But how do you test something like this? So I guess the color of the text
has become the color of the button, or the text now has zero opacity; something's happened, and this was filed as an issue. Or another example: here's the 404 page of an app I used to work on. This is just what it's supposed to look like. It's pretty simple, pretty straightforward.
We launched a feature and then four weeks later we were told that our 404 page looked like this. Right, you've all seen this, right? And of course nobody caught this in QA because no one QAs the 404 page. And this was a simple change. Somebody had just moved a CSS file, everything else worked, but the 404 page was the one that was broken.
So then it got fixed, and the fix looked like this. The CTA is totally covered up: nobody QA'd the fix on mobile, so you're still continuing to fix. And then I went and pulled slides for this a while back, looked at the 404 page, and it was broken again. So I reported this to my old team.
So this, in the business, is what we call a regression, and specifically a type of visual regression. So how do product teams fix this today? Shout out the answers. Hire more people, okay. How do you fix these kinds of problems? Interns, okay.
What are the interns doing? They're what? They're clicking around a lot, right? What's that called? Behavioral? Exploring? I'm looking for a specific word here. QA, thank you, QA. So QA's the big one, right? And this can be developer QA,
this can be you doing QA on your apps, this can be you having QA engineers, right? QA can mean many things, but part of the job is to find these kinds of things before they hit production, right? Or you get issues from your customers and fix them after. QA is very necessary, but it's also very slow and manual and complicated, right? And it's also pretty impossible to catch everything, right?
Even in like a medium sized app with just tens of models, you can have hundreds of flows and thousands of possible page states and permutations and constant feature churn, right? There's a lot of development that's happening in these apps and you can't catch all of this stuff all the time. So it's also very expensive, right?
With QA, you're spending manual human hours, often engineering hours, paying to find and fix these kinds of visual regressions. So let's go back to this button problem. My standard fix would be: can I write a regression test for this? Right, I'm a big TDD person, I love testing, I write tests for basically everything, so let's go try to write a test for this, right?
So here's an RSpec feature test that tests this part of the app, right? It does simple things: it visits the homepage, fills in some text box with a title, clicks a button, and then expects that the page has the new content.
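A hedged sketch of that kind of spec (the field names and button labels here are illustrative, not from the actual app):

```ruby
require 'rails_helper'

RSpec.feature 'Creating a post', type: :feature do
  scenario 'shows the new post' do
    visit '/'
    fill_in 'Title', with: 'Hello world'
    click_button 'Create'
    # Passes even if the button's text is invisible or the wrong color:
    expect(page).to have_content('Hello world')
  end
end
```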
So there's a problem here, right? This test didn't fail. The button still technically works; it's just visually wrong. And this manifests in tons of different ways. So what am I supposed to do here? Am I supposed to assert on some computed CSS style for the color of the thing, or maybe that it has a CSS class applied? But that's not really testing the right thing,
so I'm just not gonna do this, right? And no one's gonna do this, because no one wants to write a test that's this fragile and inflexible, right? Especially in a developing product. So my normal approach is pretty useless here. The problem, fundamentally, is that pixels are changing,
right, but we're often only testing what's underneath. We're testing all of our abstractions on top of those pixels. But this is an important problem because the pixels are the things that your users are actually touching and seeing and interacting with all the time. And to go further than that, even with all of our current testing strategies and methodologies, we still lack confidence in deploys, right?
You can have a million unit tests for all the different data changes in the world, but if you move a CSS file or change your CSS, you're gonna have to go look at it, right? You're gonna have to go check that and test it. So let's move on to the solution to this problem. And I don't like to say that this is the solution; I like to frame this as a solution. This is not the be-all and end-all
of all testing strategies that will make your life perfect. But it's sort of a new tool in the toolbox. So the question I like framing is what if we could see every pixel changed in any UI state in every PR that we make, right?
So that basically asks: what could we do if we could test our apps pixel by pixel? In order to do that, I'm gonna introduce a new concept. You may or may not be familiar with these: they're called perceptual diffs, or P diffs, or visual diffs. This has been pioneered many times. Brett Slatkin at Google has done quite a bit of work on this on the Google Consumer Surveys team.
You should watch his talk. It's about how he accidentally launched a pink dancing pony to production and then they ended up having to do this style of testing in order to prevent that from happening. So what is a perceptual diff? A perceptual diff is relatively straightforward, right? Given two images, what's the difference between these two images, right?
Like compute the delta between these two images. And that can be this, right? All the red pixels are the pixels that have changed between these two images, without any context about what the image is about. So you can compute this basically for any kind of image. So how do we compute these, right?
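As a sketch of what "compute the delta" means, here's a naive per-pixel comparison using the chunky_png gem; this is my illustration, not the tooling from the talk, and it assumes both images have the same dimensions:

```ruby
require 'chunky_png'  # gem install chunky_png

old_img = ChunkyPNG::Image.from_file('old.png')
new_img = ChunkyPNG::Image.from_file('new.png')
diff    = ChunkyPNG::Image.new(old_img.width, old_img.height, ChunkyPNG::Color::WHITE)

changed = 0
old_img.height.times do |y|
  old_img.width.times do |x|
    next if old_img[x, y] == new_img[x, y]      # identical pixel, nothing to mark
    diff[x, y] = ChunkyPNG::Color.rgb(255, 0, 0) # mark changed pixels in red
    changed += 1
  end
end

diff.save('diff.png')
puts "#{changed} pixels changed"
```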
Let's try another example. Shout out the differences in these two, side by side, and then we'll show the P diff and see if you're right. Background color on the top. Lost the link. Capital N in thumbnails. Danger button's gone. Right, you got all of them. So this is the P diff, right? And you can immediately see all of the changes in that image without having to sift through it. All these pixels that have changed: these are the things that have changed on this page.
So, P diffs in 30 seconds. Let's go do a P diff. P diffs are pretty straightforward, right? Okay, so I have these two images, just new and old, right? So let's open new and old. Okay, so here are the two images, right? And this is just from the Skeleton demo site.
So you can see there's some differences in them, but let's go make a P diff and see what that actually is. So I have ImageMagick, the library, installed, and I can just use the ImageMagick compare tool to compare old and new, and I'll store the result in diff.png. And now let's open diff.png. So cool, we have our first P diff, right? Those are all the pixels that have changed. By default it overlays the original image underneath, made translucent; you can turn that off, and you can pass a bunch of different flags to this command, like a fuzz factor if you don't care about pixels that have changed within a certain color distance. So computing P diffs themselves is actually relatively straightforward.
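For reference, the demo's commands look roughly like this, shelled out from Ruby (the 5% fuzz value is just an example):

```ruby
# ImageMagick's `compare` writes a visual diff image; it exits 0 when the
# images are similar and 1 when they differ, so the status doubles as a signal.
system('compare', 'old.png', 'new.png', 'diff.png')

# Ignore pixels whose colors differ by less than 5% (an example fuzz factor):
system('compare', '-fuzz', '5%', 'old.png', 'new.png', 'diff.png')
```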
So here's a couple of P diffs in real life, right? If you try to figure out the difference between these two, it might take you a second, but with the P diff you can immediately see that the "do you agree to the terms of use" section of this page is gone, right? It just no longer exists. And I kind of love this one, because it's a test for an error condition, but it's basically a back-end change manifesting as a front-end failure, right?
This is a Rails form object that somehow has gotten into a weird state that is manifesting as this sort of front-end failure. And you might have a test for this, but this form probably doesn't submit now, right? You probably can't actually submit this form. So here's another example. Here's like a normal visual change, just a visual change you actually might want.
Like a new person got added to this page, so the visual diff is: okay, a bunch of things shifted around and got reflowed, and you can go back and forth and say, okay, I understand that this page has a new thing added to it. So you do have to learn how to read P diffs, because they can be a little bit noisy, right? For example, these two look the same, but in the footer (you probably can't read it) it says something like "if typeof jQuery not equal undefined" and some script junk.
This one was somebody adding a gem which happens to inject some scripts into the page, and the gem was in a broken state, right? So all of their tests are probably passing, everything else is passing, but their footer has some junk in it, right? And you often can't catch these kinds of things
without visual tests or looking at it. Here's just P diff art. Like I found this in some diffs that I've done, and an image got shifted over just perfectly to create this like nice P diff art. Totally useless, but kind of cool. And also a pretty strong signal in P diffs is if there are zero pixels changed,
that's really important for you, right? Like in a classic refactor of your app, in a pure refactor, you're not changing anything that somebody's interacting with. All the plumbing's shifting around, you're changing architectures, you're upgrading something, but the actual thing that people are touching or the API that you're touching is not actually changing, right? So having a zero pixel change P diff can be a really strong signal
because you get visibility into knowing that nothing has changed in this page, right? I can safely upgrade this thing because everything is remaining the same. And as your app gets bigger and bigger, you wanna be able to do those kinds of refactors for your code health, right? So let's go write a visual regression testing system in two minutes. Ready, go.
Okay, so I have this app. This is Gifindor, which, if you went to Brandon Hayes' talk at RailsConf two years ago, is his app. And Brandon, I don't know if you're in the room, but you probably didn't expect that anybody was gonna go back and write tests for your demo app from two years ago. But we're gonna do it. So here we go. So here are some feature specs I've written for this app.
And they do simple things like you visit a page, you expect that it has some content, you click a dialog, you expect that the new thing is up. This app has just basic behaviors. It's just a stream of posts, right? And you can upload GIFs. And you can do simple things like you click Submit a GIF, and it does a jQuery animation
that pulls that down, and you can type stuff, and there's a validation state: a bunch of things that we all do all the time, right? So the tests for this are relatively straightforward. So let's just go save a screenshot at the end of this. All I'm doing here is using Capybara's save_screenshot capability,
and this works with basically every web driver you might have, except for rack-test. But most web drivers support this. So let's save that, and let's go run the tests. r is just my bash alias for bundle exec rspec, so don't let that throw you. And you should all have that, by the way, because you type it all the time.
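The screenshot step amounts to something like this; save_screenshot is Capybara's real API, though the spec body is illustrative and assumes a JavaScript-capable driver:

```ruby
require 'rails_helper'

RSpec.feature 'Submitting a GIF', type: :feature, js: true do
  scenario 'opens the submit dialog' do
    visit '/'
    click_link 'Submit a GIF'        # fires the jQuery slide-down animation
    page.save_screenshot('old.png')  # writes a screenshot of the current UI state
  end
end
```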
So great, we've run the tests. Let's see, there's a change here, right? We can open old.png. Great, so we have a screenshot of what our app looked like in that test state, right? And this is what I call a complex UI state: you've clicked a button, and some jQuery animation has fired in order to open up that top dialog.
This is not just like a static page that you visited, right? But you'll also notice, it doesn't quite look exactly like the page we were looking at, right? This border image is all messed up, and there's some other things going on here, so we'll talk a little bit about that later, like why that's actually not totally the same. Okay, so great, so we've saved our old image.
Let's change it to new for the new one, and let's go change the background color of this app. So here's the CSS; let's just change the background color by one shade, right? And we'll make sure that the file is saved. We'll go run our tests, great.
So we have an old and a new, great. Let's compare them and store the result in diff.png. Open up diff.png. Cool, here's a P diff, right? All the background pixels of this page have changed. And you might think of this as just noise, right?
But why would anybody care about a background color that you can't see? But I guarantee you that there's a designer in this room who actually would probably want to know if this changes, right? And they want to guarantee that there's a consistent color palette being used, and that we developers aren't sort of arbitrarily changing the background shades when we think that that's a new color that we should use, right?
There needs to be some consistency there. So I don't discount these kinds of changes: just because you can't see them by eye doesn't mean they're not important. So great, right? Awesome, let's all do this. So the simple use here is catching visual regressions, right?
That's the kind of obvious one. But then if you start thinking about this more, there's a lot of advanced uses for this kind of stuff. Like CSS refactors and deletions is a big one, right? You're all terrified to delete CSS, yes or no, right? Because it's scary, you don't know where that CSS is used, you don't know what legacy parts of your app are using that CSS. So what if you go add a visual diff test,
a visual regression test to your top 50 pages, then go delete your CSS and see what happens, right? And if you've deleted it and nothing changes on the pages you care about, great, you can probably delete that CSS.
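As a hypothetical sketch of that safety net (the paths and file naming are made up; it just screenshots each page so a CSS-deletion PR shows its blast radius):

```ruby
require 'rails_helper'

# Example list of your most important pages; in real life, your top 50.
TOP_PAGES = %w[/ /pricing /about /blog]

RSpec.feature 'Visual coverage', type: :feature, js: true do
  TOP_PAGES.each do |path|
    scenario "captures #{path}" do
      visit path
      page.save_screenshot("screenshot#{path.tr('/', '_')}.png")
    end
  end
end
```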
Testing style guides, especially testing living style guides, is a pretty cool use of this. Safe dependency upgrades is another: often your libraries are backwards compatible but adding new features, so you wanna be able to upgrade them. But upgrading libraries and dependencies is also kind of scary sometimes, especially if those libraries provide front-end dependencies of any sort: JavaScript behaviors, or a style guide that lives in its own gem that you import and whose versions you upgrade. Having these kinds of visual checks makes upgrading dependencies safely much easier. Visual regression testing for emails is an awesome advanced use case I've seen. Testing D3 visualizations is something I've started experimenting with recently, because testing D3 is actually kind of hard, right? I'm not a D3 expert by any means, but you can test the data transformations you're making, how your inputs transform to your outputs, and how you expect D3 to visualize that. But wouldn't it be nice to just be able to say: this is what it looks like, I know that that's right. Oh look, it's changed. Is it still right? That's what you really wanna test with those visualizations. And then going further, what I really want here is a visual review process
alongside code review. And we're gonna talk a little bit more about that. So if this was all so easy, why aren't we all doing this right now? And somebody has surely said: if it were easy, it wouldn't be hard. It gets really complicated, right? There's a bunch of problems,
and I'm gonna hand-wave over a bunch of them, but I bucket them into three categories: tooling and workflows, performance, and non-deterministic rendering. So on the tooling front, it's kind of hard, right? There are some open source projects that do this right now; PhantomCSS is a great example of one, right?
But it presents all of your visual changes as a ton of individual test failures, right? That's a lot of information and a lot of failures, and it blurs the line between something being flaky, a change that you actually wanted, and a real test failure, right? Or, for example, you probably shouldn't have to require
that you're manually storing these baseline images in your git repo, right? That's a big workflow tooling process that most of us are probably just not going to do. That's a lot of work, right? The performance one, I think this is the big one across the spectrum of all the open source tools, all the proprietary tools, all the everything. This is the big one that probably prevents us
from doing this right now. The examples I showed are somewhat contrived, right? They're pretty simple pages. But in the real world, I have some pages where a full-page screenshot is 30,000 or 40,000 pixels high, and that's not crazy, right? Rendering and screenshotting that kind of page, uploading it, storing it: it can take 15 seconds just to render
and another five to diff. So if you have 100 of these tests and they all run serially, that's over 30 minutes you're adding to your test suite, right? And none of us want to do that. Your feature specs, if you're writing feature specs already, are already too slow, right? And they're already too flaky.
So that's a hard one, and I think performance is actually the biggest problem here. And then there's non-deterministic rendering, which we'll talk about now; I'm hand-waving over the other problems, so if you want to talk about them more, I would love to. On the non-deterministic rendering front: simply, there's a bunch of things that change in browsers. We're not just doing static pages. Animations are the big, obvious one. So take this pure CSS animation.
If you visually diff this a bunch of times, what diff are you going to get? You might get this diff. You might get this diff. You might get this diff. These aren't useful to you; they're just noise. So in Percy, what we do is actually freeze animations, by injecting a particular CSS style into the page that tries to stop all of these animations from happening, so you can just say nothing has changed. If you want to know more about that, I have a post on blog.percy.io about how we actually do it.
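A sketch of what that injection can look like from a Capybara session; the exact stylesheet Percy uses is in that blog post, so treat these property names as an approximation:

```ruby
require 'json'

# Approximate animation-freezing CSS, injected before taking a snapshot.
FREEZE_CSS = <<~CSS
  *, *::before, *::after {
    animation-play-state: paused !important;
    transition: none !important;
  }
CSS

page.execute_script(<<~JS)
  var style = document.createElement('style');
  style.appendChild(document.createTextNode(#{FREEZE_CSS.to_json}));
  document.head.appendChild(style);
JS
```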
Or another one: dynamic data is a big problem. If you have anything on your page that changes, especially in your tests, you're going to see visual diffs from it. A date picker is a good example. You can fix these with fixture data instead of Faker data; you can move in a direction where the things you use in tests are more static and deterministic, which I think is a relatively good fix. But this is still a big problem, and I have some ideas about how to address this kind of thing.
Then there are old test browsers. Like we talked about before, what you see on the right here is what was rendered by capybara-webkit, and what you see on the left is what's rendered by Firefox. And these are not the same thing: the border image doesn't work, and the web font here is not a web font in this one. And the problem with this is that often the browsers,
the old test browsers that we're using underneath, are not really modern in any fashion, right? capybara-webkit is an old fork of WebKit that doesn't support these things. PhantomJS, all the way up until the new 2.0 version, doesn't support these things either; it was a fork of Chrome 14-era WebKit from five years ago, right? It doesn't render the modern web.
It also has 1,700 open GitHub issues that are basically untriaged, so, go for it. So that's a really hard problem, right? And then there are some problems you can't really control for, like sub-pixel anti-aliasing. The way that text is rendered on a page is not totally deterministic, right? These things might shift by one pixel. GPUs don't actually guarantee that floating-point operations will always come out the same, right? So if you have a gradient that's rendered on one machine and you try to render that same gradient on another machine, they may not be pixel-perfect. They probably won't be. If you compile some code with different optimization flags, GPU floating-point operations can come out differently, right?
So we look at pages as if they are the same all the time but actually getting them to be pixel perfect is a big problem. Some tools attempt to solve this with some sort of like OpenCV computer vision researchy things where you try to say like, oh, is this a button? Has the button moved? And you sort of try to derive the page back from the image, right?
So that's hard. So P diffs are only half the battle here, right? Back to our main goal: what if we could see every pixel changed, in any UI state, in every PR, right? And this is really what I think is the difference between doing visual regression testing sometimes and what I frame as continuous visual integration, right? In the same way that your unit tests are not the entire thing you're doing to test your system, you need processes for continuous integration: you need to be merging changes with all of your other developers all the time, and you need to be testing them instantly in CI, as fast and as parallelized as possible.
There's a difference between doing visual regression testing sometimes and doing continuous visual integration, and these are the big problem spaces that create that difference. Being able to do this would require that things are really fast, right? Basically as fast as your test suite. You need to be able to handle complex UI states. You can't just test a static page.
We're not just here to look at all of our static pages; we need to be able to test components and all the different component states. And it needs to be continuously integrated into your workflow, on basically every commit, right? In my mind, this can't wait until you're in production, and even staging is a little bit too late for me; I want this to happen basically all the time.
So in the last part of this talk, I'm gonna talk a little bit about how we architected Percy to try to address these problems. So here's how Percy integrates into an RSpec feature spec. It's basically the same thing that we created earlier, right? You have a feature spec, it visits some page, it does some action on the page.
And then what you do is you just drop in Percy::Capybara.snapshot: give it the page and a name, say this is the home page, right?
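In code, that looks like this (the spec body is illustrative; build setup hooks are omitted):

```ruby
require 'percy/capybara'

RSpec.feature 'Homepage', type: :feature, js: true do
  scenario 'submit dialog opens' do
    visit '/'
    click_link 'Submit a GIF'
    # The one-line drop-in: snapshot this UI state under a human-readable name.
    Percy::Capybara.snapshot(page, name: 'homepage - submit dialog')
  end
end
```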
So what's that actually doing underneath, right? When these things get pushed up to Percy, are we pushing up images? I say that with a question mark because images would come with all the problems that we noticed before, right? So we don't wanna do that. What we actually do here is push up DOM snapshots. And if you think about it, this makes a lot of sense, because the most efficient, most lossless encoding format for a website is not an image of the website or a video of the website; it's the website, right? It's your assets; it's the DOM state that you've created.
So we push up the DOM and HTML snapshots, and technically we push up SHA-256-fingerprinted versions of those assets, so we never upload things twice. The first run might be slow, but after that, my goal is basically that zero time is spent in your test suite; that's not totally true, but it's the goal.
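A hedged sketch of that content-addressed upload idea (not Percy's actual client code):

```ruby
require 'digest'

# Assets are identified by their SHA-256 fingerprint, so an unchanged asset
# is never uploaded twice across builds.
def upload_asset(path, already_uploaded_shas)
  sha = Digest::SHA256.hexdigest(File.binread(path))
  return :skipped if already_uploaded_shas.include?(sha)  # server has it already
  # ... POST the file, keyed by `sha` ...
  already_uploaded_shas << sha
  :uploaded
end
```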
Then we do a bunch of hand-wavy magic underneath that: we push that stuff into storage, we can talk to GitHub and set commit statuses, and we can coordinate work with this Percy hub. And this is the big part that actually addresses most of the performance issues: we can parallelize it. You've pushed up a bunch of DOM snapshots as fast as your test suite can go, and what we actually do underneath is run them as fast as your concurrency limit allows. So, totally out of band of your test suite, we can be parallelizing, running, and rendering these DOM snapshots in a deterministic rendering environment, and then showing them to you in a nice way. So this was the sort of main innovation
that, I think, helps this thing come to fruition. And actually, as of yesterday, we have hit a million visual diffs rendered in Percy, so I was really proud of that milestone. So here's a couple of quick Percy examples. I talked to some of our customers and got permission to show you just a couple of pages
to see what Percy the product looks like and how I've been trying to address this problem. So here's charity: water's build, charitywater.org. They're a very design-centric team, and it's a big, big Rails app. They push 162 total snapshots on basically every build, and this particular build, which is called "Footer, updating the new footer markup,"
had 96 visual diffs, right? And you can go through each one of these pages and say: oh look, look at all these footer changes. And this is the diff, and I can click that and see that the footer is different on all of these pages. And this is a lot to go through, right? So I just recently added this overview mode where I can see all of my pages at once, right, and just confirm,
really quickly, in a visual review, that all of these changes are the ones that I want, right? These are the visual changes that we actually want to make as part of this PR. So here's another example. This PR is updating their press page, and this is just the first iteration of it, where they've removed some CSS styles and, oh look, this page is totally broken, right? They would never want to launch this page, but it gives them an iterative review process where they can go here and say: oh look, this is what our page currently looks like in this PR.
And there's also the important signal that none of the other pages have changed based on this CSS change, right? And then you can go through and see what the other pages in this app look like. So, on the workflow and tooling problem, this is the last thing I'll show you: we just provide this as a CI service, right?
As your tests run, they push information up to Percy, and then Percy marks this PR with another CI status right here: "Percy: visual diffs found," right? And we can just click Details, jump right to the page, and say: oh look, this is the state of this PR; this is that background change that we made.
And I can go through and decide: yes, this is the visual change that was intentional for this PR, let's go ahead and mark it. So I'll do a little demo here; this is the one HTTP request that my talk requires, so let's hope it works. Okay, so I go here and, great, I'm doing a visual review right from GitHub: I looked at these things, this is what I want, click Approve, and then GitHub will
mark that status as green, right? So now this gives you a lightweight visual review process for all of the different UI states of your app, at the PR level, right? Not at some later stage. So yeah, that's basically how this DOM snapshotting mechanism has helped us tackle a bunch of those different problems. So that's it. I just want you to take away from this talk that visual testing is possible, it's a thing, and we should be doing it. It's a new way to think about testing, and it can help give you deployment confidence, right?
I think of this as the last stage of the CD pipeline: in your acceptance phase, you need to make sure that all of this stuff looks correct, right, and you need to be able to approve it. This is still a very manual step, but we can probably automate quite a bit of it. And there's a lot more work to be done to make this a mainstream engineering practice.
One last thing. Because of this DOM snapshotting model, I want to give you a sneak peek of something I'm working on over the next couple of months: I want to be able to do this for Ember tests. So if you are an Ember user, I would love to talk to you; just email me at mike@percy.io,
and let's talk about getting you in as a beta tester of this for Ember tests. Because I actually think that this is probably the world where this makes the most sense, right? Not everyone is writing Rails feature specs, in part because they're really hard to write sometimes. But we are writing a lot of JavaScript tests nowadays. And as we further separate our worlds,
so that this is just an API back end and this is a single-page app front end, and those lines become clearer and clearer, we're gonna have a lot more of these tests. And to be able to get this kind of power, all we need to do is send up those DOM snapshots and render them. So if you're interested in that, please let me know; I'd love to get the beta into your hands. So thanks so much.
Yeah, the question is: what is the baseline? How is the baseline created? Basically, I think you can do that a bunch of different ways. I usually just pick master: whatever master has last created, that is our baseline, right? And then we provide a mechanism in Percy for a more manual version, where you actually approve a master build and that becomes the baseline. So I think you've gotta have both. But basically, if you're really doing "master is always green and always deployable," then you should always be testing against master.
The question was: do you do cross-browser testing? We don't right now, but I think that would be a great evolution of this kind of testing, doing more cross-browser testing. It comes with all of those problems I mentioned, too, and you'd be surprised to learn that most browsers don't provide a full-page screenshot API; Firefox is the main one that does. So I think you can get 90% of the benefits of visual testing with one good modern browser.
But then a great evolution of this kind of idea would be to do cross-browser testing. Yeah, that's a good question. The question is: what tech stack are we using? It's all custom-built on Google Cloud Platform. I've Dockerized all of the environments. It's basically a Rails API with a strictly-Ember front end, and the workers run Xvfb, which is a virtual framebuffer; it's all on Linux, and it runs Firefox. Oh, you're asking about Percy access control, like who can access that Percy page? Right now I just tie it to GitHub auth. So if you have team or collaborator access to the GitHub repo, you can see it in Percy.
And anybody who can see that can hit Approve. I haven't built any complex role-based authorization kind of things yet. Oh, and I totally missed that part earlier, so let me just show it quickly for the people who remain. Part of this is that we have all of these screenshots at a particular width, right? But we have the original DOM of these,
so we can just resize the browser to a smaller width and actually render it. So here's responsive testing: here they have a 320px version of this page, so now I can see the footer change at all those different widths, right? And I can full-width this, and this is what the page looks like "on mobile", basically just at this breakpoint size, right?
So the DOM snapshotting model also takes care of that, in that you can just render the snapshot at different widths. This is not testing on the actual device, right? But it at least gives you the responsive side of it.
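In the snapshot call, that's just a widths option (the option name matches percy-capybara's API as I understand it; treat the exact values as examples):

```ruby
# One DOM snapshot, rendered at several responsive breakpoints server-side:
Percy::Capybara.snapshot(page, name: 'homepage', widths: [320, 768, 1280])
```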
The question was: can you disable the local test run and only have it on CI? That's actually the default behavior. And then I've had some people ask to disable it for only a specific branch. So there's an environment variable we provide called PERCY_ENABLE, which you can set to zero or one, and it will force that environment to be on or off. Cool, thanks so much.