How I auto-refactored GDAL's test suite - TIB AV-Portal

How I auto-refactored GDAL's test suite

Open Source Geospatial Foundation (OSGeo)

Stigter, Craig de

Formal metadata

Title

How I auto-refactored GDAL's test suite

Series title

FOSS4G SotM Oceania 2019

Number of parts

52

Author

Stigter, Craig de

License

CC Attribution 3.0 Unported:
You may use and modify the work or content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/44723 (DOI)

Publisher

Open Source Geospatial Foundation (OSGeo)

Year of publication

Language

Content metadata

Subject area

Genre

Abstract

Craig's talk was the fourth and final talk in the "Cloud / Development" session at FOSS4G SotM Oceania 2019, organised by OSGeo Oceania and held at The National Library in Wellington, New Zealand from November 12-15 2019. FOSS4G SotM Oceania is the coming together of Oceania's geospatial open source and open data community - with four days of workshops, presentations, a community sprint and social events.

FOSS4G SotM Oceania 2019: 7 / 52

1

19:33

Open Standards meets Open Source

2

25:46

Why are coordinate transformations so confusing?

3

25:14

Creating an Atlas of Aotearoa with open source software

4

22:27

Island Mapping: Measuring Urban Footprints with Mapillary & Open Source Tools

5

21:35

Messy maps to make sense of a complex world

6

05:45

Conference Closing - FOSS4G SotM Oceania 2019

7

18:43

How I auto-refactored GDAL's test suite

8

21:33

Running Geospatial Workloads on AWS

9

20:20

Access to Public Transport

10

15:43

The Open Data Cube and working towards a Digital Earth

11

20:32

Community . Focused Projects

12

19:08

Urban mobility across Oceania

13

22:33

FOSS4G Pasifika Roadmap: Then/Now/Beyond

14

21:28

Why is it so hard?

15

21:23

Workflows to repeat

16

22:16

A Journey Towards More Accessible Maps

17

22:11

Antarctic Data Analysis

18

21:12

Geoscience applications of QGIS: a bold new world in mineral exploration

19

25:02

Building for Reliability with FOSS

20

19:11

Where's your hood at?

21

21:50

Building a movement data and analytics platform

22

22:27

Map-based tools for Māori communities

23

21:43

Fishing for Routes

24

19:18

Learning rules and detecting errors in OpenStreetMap data

25

24:23

Why LINZ uses open source software

26

29:35

Mapping and Society

27

27:46

Delicious Democracy: snagging and mapping citizen data

28

20:17

Orbica Explorer

29

1:23:30

Lightning Talks - FOSS4G SotM Oceania 2019

30

22:52

This talk is about maptime

31

20:40

FIXME: Documenting the Uneven Distribution of OpenStreetMap Data Quality Issues

32

25:07

33

25:04

Towards developing the best (and most diverse!) map in the world!

34

23:01

The life of open source spatial

35

21:45

Wrangling GeoTIFFs in the browser

36

44:37

Panel on Building Communities - FOSS4G SotM Oceania 2019

37

16:58

Opening a treasure trove: The Historical Aerial Photography project

38

20:49

jsMapservice: leveraging the open source stack for rapid publishing of spatial data

39

22:27

Cloud Native GIS

40

24:55

How to put on your designer's hat when crafting map experiences

41

20:49

Crash data visualisation platform using Pixi, WebGL and Leaflet

42

20:58

Using open source software to map wildlife populations in the Northern Mariana Islands

43

19:12

Open source and open standards for sensor data sharing

44

24:13

DGGS: a new paradigm for spatial

45

20:49

Lightweight community mapping sites, powered by spreadsheets

46

20:48

Point-in-polygon lookups with vector tiles: quick, effective, and no server needed!

47

19:36

Enabling Local Government Innovation with Open Source Software

48

23:24

OpenStreetMap in French Polynesia

49

19:42

Open Source Discoverability: the journey of unearthing an abundance of hidden data

50

16:21

Mapping Marine Diversity: using QGIS to visualise and extract biodiversity data

51

23:31

Environmental Monitoring with Large-Scale Land Cover Classification

52

15:11

QField: keeping it simple in the field

Transcript: English (automatically generated)

00:01

This is a talk which I'm going to bang through really quickly, because it was originally a 30-minute talk about the depths of automated refactoring in Python. I've been at Koordinates for 11 years now. That's how long "ages" is. And if this is the first you've heard of Koordinates, you've unfortunately missed all our other talks.

00:23

But come see us afterwards if finding or publishing data is a pain for you. I spend a lot of time messing around with Python, and this talk is going to be a little bit Python-heavy. So sorry about that, but tough. So I went to PyCon 2018, and this kind of started there, where I came across this talk by this guy called

00:44

John Reese, who works at Facebook. You should watch that talk; it's online, if you want a more gentle introduction to automated refactoring, because as I say, I'm going to rush through it a little bit. John released a tool called Bowler, which

01:02

is used for large-scale refactoring. When you've got a change that you want to make to your code, and it's a lot of changes, and it's a pain to do manually, this can help you do it. This image shows him making a change. He's finding these TODO comments, "TODO: internationalization",

01:24

followed by a print statement, and he's adding a translation function to the print statement and removing the TODO comment. Just a little change there. And he can run that over however many files he's got in his code base. And I thought that was pretty cool, and I thought, I can do some things with this. So I went off and tinkered with it.

01:43

Downloaded Bowler, installed it, played with it, made a couple of small changes on our code base, thought it was pretty awesome, and mentioned it to the CTO. And went to bed, and that could have been the end of my talk. Except I woke up the next morning and Rob had just casually volunteered me

02:03

to refactor the whole GDAL test suite. Thanks, Rob. So I assume you've used GDAL; probably most of you have. If you don't think you have, you probably still have.

02:21

If you've used QGIS or you've used PostGIS, you've probably used GDAL. It's a big C++ thing that underlies a lot of the data transformation tools we have. It has a huge Python test suite, about 270,000 lines of code, so you can understand why I was a bit mortified by this endeavor.

02:47

But I thought I'd dive in and at least have a look and see what's wrong with GDAL's tests. Why do they need refactoring? I'll just work through this quickly. Basically it's got a really big old test suite, and the runner and the

03:03

format of all the tests is, well, it's all homegrown. There are no frameworks or anything. The tests are very verbose. They're boilerplate-y. The output is ugly. The tests run in a specific order. You can't just run one test; you have to run the whole file in the order that the tests are written.

03:22

It's just generally very inflexible. So I'll just demonstrate with an example. Here's a test. It's actually a made-up test, I made it up, but it's very idiomatic for the style that the GDAL tests were in.

03:41

Now, there's a few weird things going on that you might immediately notice if you're used to testing in Python. Firstly, there's an if statement with some condition that it's checking. The test is checking if cs is not equal to 4783, then fail the test, basically. And that's unusual, because normally you'd have some kind of

04:00

assertion and you'd throw an error in a test. But this doesn't do that; it just returns the string 'fail'. So basically the test contract in gdaltest was to return a value. If that value was 'fail', the test failed; if it was 'success', the test succeeded. If the test threw an exception, the whole test suite stopped, or at least that file did, and it all just exploded.

04:24

So if you're running thousands of tests and it just explodes halfway through, you've got to start from scratch once you've fixed it. So that was annoying. It also does this thing, gdaltest.post_reason('fail'). You might wonder what that's about. I still wonder what that was about.

04:42

I think back in the day maybe it sent something to a log or something. Anyway, you could put whatever string you wanted in there, with a reason why the test failed. 99% of the time that was there with the string 'fail', so the reason was not terribly useful. Kind of pointless; it'd be nice to just remove that.

05:03

And you can also see, just at the bottom there, it does an unlink. It's created a file and it has to clean up after itself, and if it doesn't, no one else will. So there's a temp file sitting in your GDAL code directory, and if your test explodes halfway through, that's just going to stay there.
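A minimal sketch of the old test style described here, for readers who cannot see the slide. It is not the speaker's code: the `gdaltest` object below is a hypothetical stand-in for GDAL's helper module, and the checksum helper, file name and expected value are invented for illustration.

```python
import os
import tempfile


class _GdalTestStub:
    """Hypothetical stand-in for GDAL's gdaltest helper module."""

    def __init__(self):
        self.reasons = []

    def post_reason(self, reason):
        # What the real helper did with the string is not covered in the
        # talk; this stub simply remembers it.
        self.reasons.append(reason)


gdaltest = _GdalTestStub()


def compute_checksum(path):
    # Hypothetical helper: the sum of all bytes in the file.
    with open(path, "rb") as f:
        return sum(f.read())


def old_style_checksum_test():
    # The test creates its own file and is responsible for removing it.
    path = os.path.join(tempfile.gettempdir(), "old_style_demo.bin")
    with open(path, "wb") as f:
        f.write(b"\x01\x02\x03")

    cs = compute_checksum(path)
    if cs != 6:
        gdaltest.post_reason("fail")
        return "fail"  # the cleanup below never runs on this path

    os.unlink(path)  # manual cleanup; skipped if anything above raises
    return "success"
```

The outcome is communicated only through the returned string, so a runner has to compare it against 'success' or 'fail' itself.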

05:22

So those are all kind of bad. It'd be nice if tests could clean up after themselves automatically. So I thought maybe this test should look like this, at the bottom here. If it was written in pytest style, it might look like this.

05:41

Firstly, the if statement is gone. It's replaced with an assert, which will throw an exception if the condition is false rather than return a value. So there's no return 'success' at the bottom; if the test doesn't throw an error, it succeeded. The unlink thing at the bottom has gone. It's been replaced by, you might notice, there's

06:05

an argument to this test function, tmpdir. That's called a pytest fixture. It sets up some stuff before the test runs and automatically cleans it up at the end of the test. So you can kind of assume or infer that this is going to create a temporary directory

06:21

called temp in whatever folder we're in, I guess, and it will delete it and all its contents at the end of the test. I'll just point out as well: no matter how good your regex is, if you wanted to do this with find and replace in your editor,
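A minimal sketch of what a pytest-style test of this kind might look like. The checksum helper and file contents are made up for illustration. `tmpdir` is pytest's built-in temporary-directory fixture: pytest passes it in by argument name, so the test never creates or removes the directory itself.

```python
import os


def compute_checksum(path):
    # Hypothetical helper: the sum of all bytes in the file.
    with open(path, "rb") as f:
        return sum(f.read())


def test_checksum(tmpdir):
    # str(tmpdir) is the path of a fresh directory provided by pytest.
    path = os.path.join(str(tmpdir), "demo.bin")
    with open(path, "wb") as f:
        f.write(b"\x01\x02\x03")

    # No return value and no manual unlink: a failed assert raises
    # AssertionError, and finishing without an exception means success.
    assert compute_checksum(path) == 6
```

Running `pytest` on a file containing this function is enough for it to be collected, since the function name starts with `test_`.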

06:41

you'd probably have a hard time. So we need something a little bit more structured, and what we need is called syntax-aware refactoring. Now, the code here is actually simplified a lot. The original code is kind of awful. Sorry about that, but you can see the original code if you want; there's a link and stuff.

07:06

And yeah, so regex replace isn't going to cut it. We need a syntax-aware find and replace, and it turns out that Python actually defines a simple pattern grammar for these find patterns. I found it in the Python source code. And this is,

07:23

it's kind of like a regex, but it finds stuff in Python source files. It searches the syntax tree. So let's walk through this quickly. We're doing a Bowler query, and we are searching some file names; they will be Python files. And we're looking for a pattern. I'll come back to the pattern.

07:43

So that pattern finds some stuff in a Python file, and then we're going to modify that stuff by calling a callback, which is at the bottom there. The syntax of the patterns looks a little bit complex, but it's not too difficult to learn.

08:01

I'll run through it. We're searching for a simple statement, and it contains a function call. That's actually what "power" means; I don't know why it's called power. A simple statement that contains a function call to gdaltest.post_reason with one string argument, and we're going to assign that string argument to the name "reason".

08:29

That's what that pattern means. Go through that once more: we're looking for a simple statement that contains a function call to gdaltest.post_reason with one string argument, and we'll capture that argument in the name "reason".

08:44

So that capture thing is a little bit like a regex group capture. And then we're going to call the callback, which is at the bottom. The callback is basically going to check if the reason was the string 'fail', then remove the whole statement.
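The same find-then-callback idea can be sketched with Python's standard library. This is not Bowler and not the speaker's code: it swaps in the stdlib `ast` module (with `ast.unparse`, Python 3.9+), which is easier to run anywhere but discards comments and original formatting, one reason real refactoring tools work on a concrete syntax tree instead. The sample source is invented.

```python
import ast

# Invented input in the old test style.
SOURCE = '''
def checksum_test():
    cs = 4783
    if cs != 4783:
        gdaltest.post_reason('fail')
        return 'fail'
    return 'success'
'''


class RemovePostReasonFail(ast.NodeTransformer):
    """Drop statements that are exactly gdaltest.post_reason('fail')."""

    def visit_Expr(self, node):
        call = node.value
        matches = (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr == "post_reason"
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id == "gdaltest"
            and len(call.args) == 1
            and not call.keywords
            and isinstance(call.args[0], ast.Constant)
        )
        # The "callback" step: only remove when the captured reason is
        # the uninformative string 'fail'; keep any other reason.
        if matches and call.args[0].value == "fail":
            return None  # returning None deletes the whole statement
        return node


def remove_post_reason_fail(source):
    tree = RemovePostReasonFail().visit(ast.parse(source))
    return ast.unparse(tree)


NEW_SOURCE = remove_post_reason_fail(SOURCE)
```

A plain text search for `post_reason('fail')` would also match occurrences inside strings or comments and would miss calls split across lines; matching on the tree avoids both problems.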

09:05

Cool, okay. Hopefully you followed that, and that's what it does. So I thought that was handy. It seems like a lot of work to just do that; you possibly could do that one with a find and replace.

09:24

But you haven't seen it run on a big project. Of course, when you do, it looks like this. So that one little change saved almost 9,000 lines of pointless code. So I went a little bit nuts over that, and I got really excited. I'll just quickly whip through some of the other

09:43

changes that I made. The test functions: to run tests in pytest, they have to be named test_something. So I just did that; found things that looked like tests and called them test_something. I got rid of those return value things and I made it just throw assertion errors,

10:05

exceptions. And that was as easy as just getting rid of the stuff from the top left. I mean, there were helper functions called throughout the code, and the helper functions also returned 'success' or 'fail' sometimes. So I had to change the helper functions manually, and then I just went through and automated these changes to

10:26

the things that called the helper functions, in the bottom right. I changed the if statements to assertions, and you might notice that print statement has gone as well. And I decided that was okay. I didn't need to print

10:42

the variables that I was testing, because pytest actually automatically prints them if your assertion fails. It will look at the variables you were checking, and it will print out the values of all the things you were checking against. So I let my code happily get rid of print statements, as long as they were only printing things that I was asserting on.
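The if-to-assert rewrite, including the rule about print statements, can be sketched as follows. This is an illustration with the stdlib `ast` module (Python 3.9+ for `ast.unparse`), not the Bowler-based code from the talk, and the sample source is invented. It only converts an `if` whose body ends in `return 'fail'` and otherwise contains nothing but `print` calls on names that appear in the condition.

```python
import ast

# Invented input in the old test style.
SOURCE = '''
def test_checksum():
    cs = 4783
    if cs != 4783:
        print(cs)
        return 'fail'
    return 'success'
'''


def _is_return_fail(stmt):
    return (
        isinstance(stmt, ast.Return)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value == "fail"
    )


def _prints_only(stmt, names):
    # True for print(a, b, ...) where every argument is a checked name.
    if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
        return False
    call = stmt.value
    return (
        isinstance(call.func, ast.Name)
        and call.func.id == "print"
        and not call.keywords
        and all(isinstance(a, ast.Name) and a.id in names for a in call.args)
    )


def _negate(test):
    # Flip a single == or != comparison; otherwise wrap the test in "not".
    if isinstance(test, ast.Compare) and len(test.ops) == 1:
        swap = {ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}.get(type(test.ops[0]))
        if swap is not None:
            return ast.Compare(
                left=test.left, ops=[swap()], comparators=test.comparators
            )
    return ast.UnaryOp(op=ast.Not(), operand=test)


class IfFailToAssert(ast.NodeTransformer):
    def visit_If(self, node):
        self.generic_visit(node)  # handle nested ifs first
        *leading, last = node.body
        checked = {n.id for n in ast.walk(node.test) if isinstance(n, ast.Name)}
        convertible = (
            not node.orelse
            and _is_return_fail(last)
            and all(_prints_only(stmt, checked) for stmt in leading)
        )
        if not convertible:
            return node
        return ast.Assert(test=_negate(node.test), msg=None)


def convert(source):
    tree = IfFailToAssert().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


NEW_SOURCE = convert(SOURCE)
```

The trailing `return 'success'` is deliberately left alone here; removing it would be a separate, equally mechanical change.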

11:05

This took a while. It was really fun; I enjoyed it, but I got a little bit borderline obsessive over it. Not very borderline, actually; it was just really getting quite unhealthy, and my wife wasn't very impressed,

11:24

because, you know, every night I would spend like an hour and a half on this. And it was so cool, and it was so much fun, and I just, yeah, I kept going. And meanwhile GDAL didn't stop; it's actually a very active project.

11:41

It's not the Linux kernel, but you can expect 10, 15, maybe 20 commits a day on average, and sometimes more. And almost all the commits, as you would expect, would touch the test suite, which I had sort of mostly munged. And I couldn't really keep up with it, because there's a lot of conflicts.

12:03

Luckily, it's not too hard when you're doing this to avoid conflicts on a lot of these things. So I quickly decided that if I was going to make any manual changes, they had to be separate commits, and if I was going to make automated changes, I had to rebase them as well as rebasing my manual things, and I had to recreate the automated commits from scratch.

12:26

I wasn't going to commit them to my git thing and then have to resolve conflicts against them, because what a waste of time. So instead I would rebase my manual changes on what GDAL was doing, and then I would totally recreate all my automated commits, because

12:43

computers are better at doing that than I am at resolving conflicts. And then, obviously, I had to do as little as possible. This was more difficult than you might imagine, because I was having a lot of fun. So I had to resist the temptation to fix everything.

13:04

All that rebasing got tedious. Every day I'd sit down and I'd spend 15 minutes rebasing before I started doing anything useful. So I wrote a script, and that script looked a little bit like this when it was running, only about 10 times slower.

13:22

I sped it up because it'd be too boring otherwise. Any second now... boom. And you can see at the end of that script, after it had recreated all the automated changes, it just cherry-picked a whole bunch of manual stuff on top that I'd had on another branch.
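The shape of such a rebuild script might look like the sketch below. Everything specific in it is an assumption made for illustration: the remote name `upstream`, the branch names, the `refactor.py` entry point and its step names are all hypothetical, and the real script was not shown in detail. The sketch builds the list of commands first so the plan can be inspected before anything is run.

```python
import subprocess


def plan_rebuild(upstream_ref, work_branch, manual_range, steps):
    """Return the commands that rebuild the refactoring branch from scratch."""
    commands = [
        ["git", "fetch", "upstream"],
        # Start over from the latest upstream state, discarding the old
        # automated commits instead of rebasing them.
        ["git", "checkout", "-B", work_branch, upstream_ref],
    ]
    for step in steps:
        # Re-run each automated refactor (hypothetical script), then
        # commit whatever it changed as its own commit.
        commands.append(["python", "refactor.py", step])
        commands.append(
            ["git", "commit", "--all", "-m", f"Automated refactor: {step}"]
        )
    # Finally replay the hand-written commits kept on another branch.
    commands.append(["git", "cherry-pick", manual_range])
    return commands


def run(commands):
    for cmd in commands:
        # check=True stops at the first failing command, e.g. a conflict.
        subprocess.run(cmd, check=True)


PLAN = plan_rebuild(
    upstream_ref="upstream/master",
    work_branch="pytest-refactor",
    manual_range="upstream/master..manual-fixes",
    steps=["remove-post-reason", "if-to-assert"],
)
```

Calling `run(PLAN)` inside a clone would execute the plan. Because the automated commits are regenerated rather than rebased, only the manual commits can ever conflict with upstream.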

13:43

Yeah, it took about three minutes to run; that was well sped up. But it was so much fun to watch. Every time I saw that go, I was like, yes, this is awesome. And it felt like this. I'm just gonna finish up with some notes on why not to do this,

14:04

most of the time at least. It actually takes quite a while to get up to speed with all this, and most of the time you might be better off with a regex replace if you can. But for very large projects, I think this was awesome,

14:21

absolutely worth doing. It only really works if you have well-tested code, because no one's going to review your 100,000-line pull request, so the only thing that's reviewing it is the tests. Luckily, this was a test suite, so I assume it's well tested. And yeah, don't do it if you can do a regex replace.

14:44

I ended up with this, which I felt good about, but possibly I shouldn't have. No one wants to review that. A couple of people did. I don't think they actually went through all of it; they only went through the manual commits.

15:01

So no one's gonna review that. I had to come up with some tips for myself to get this thing merged; otherwise it would sit around forever. Obviously, I had to be polite and humble. I had to get feedback from the mailing lists. I had to leave out anything that was kind of controversial, because I didn't want it jeopardizing the rest of the pull request.

15:22

And I had to be responsive and patient. And, possibly quite importantly, I had to give people shiny stuff, because otherwise no one wanted to look at this thing. It was a monster. So what I did was I did things like adding, you know, green Unicode check marks when your tests pass, and stuff like that.

15:43

And voilà, it got merged. I doubt Even Rouault is listening, but thanks to him for reviewing this and merging it, as well as the other reviewers. He possibly did read it all. He's very thorough. Thanks a lot.

16:05

Craig, a monumental task. Kind of amazing it got merged. And yeah, it's great. Questions?

16:21

What is the most obscure thing that you automatically removed? Oh, that's a good question. I might go back a few slides to jog my memory. They're gonna play

16:54

There are actually eight changes in the end, but they needed a lot of testing and rewriting and rebasing to get there. Yeah.

17:02

And has anybody written non-conforming rubbish tests since your PR? It's still a work in progress, actually. I mean, that PR is merged, but there's sort of a lot of things I didn't get to. For instance, you still can't run an individual test by itself. You have to run the whole file, and that's because

17:23

there's no way that I could figure out which tests required things from earlier tests in the same file. I hope that people are writing things properly now, with pytest fixtures, instead of having tests that set things up and then other tests that depend on that, but I haven't been watching super closely, to be honest. Yeah, I would be at the end of my I

17:54

get nervous when I submit a pull request to a new project, even if it's like one function. How long did it take from the point where you broached this with them, to say, hey, I'd like to do this?

18:03

How long was that process, from that initial bit to it actually being merged? Oh, about three days. Yeah, so there were a couple of back-and-forths on the mailing list, but I quickly got the impression that all of the developers involved would welcome this change, because they couldn't stand the old test suite. It was a major pain point for them. Yeah.

18:25

So that was pretty good. All right, cool. Well, that's it. On behalf of everybody that uses GDAL, thank you very much for your contribution to GDAL.
