How to write pylint plugins
Formal Metadata
Number of Parts | 561
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/44328 (DOI)
FOSDEM 2019
Transcript: English (auto-generated)
00:05
Hello, good morning. Okay, now that you're awake: my name is Alex, and I'm going to talk about pylint plugins. I'm going to talk a little bit about tokenization and a little bit about abstract syntax trees. But the question is: why do we need customized plugins?
00:24
Why do we need even more linters on top of whatever we are already using, and why do we need tooling which will ultimately tell you "your source code is wrong, you have to fix this"? My answer is that the existing tools are not always enough, and static analysis tools like pylint
00:43
are very easy to extend. They can help you make your software better, and I will show you some examples. One use case is that you want to enforce a particular coding style. This can be something which is not valid for the broader Python community; it may be valid only for your own company, or maybe only in the team that you work in, or,
01:03
more commonly, valid only in the current project, and that can be different from the rest of the projects in the company. In Kiwi TCMS, an open source project I work on, we like to use documentation strings with three double quotes, and we don't like the other styles, which are perfectly valid
01:22
Python styles for documentation strings. We just like three double quotes, and we have a customized pylint plugin which discovers violations for us, so we can fix them and keep all of our source code consistent. You can also do things like this: let's say you have client and server relationships in your application.
01:41
You may want to name them the same way, so that both the client and the server start with the same name, with Client at the end of one and Server at the end of the other: easy to find, easy to grep in the source code. You can do all sorts of stuff. Another example: you are using a lot of frameworks and a lot of libraries, and they have best practices, they have some
02:03
recommendations that you need to follow. For example, if you're using Django, it tells you: don't hard-code the authentication User model in a foreign key relationship, because it can be changed; you should be using a setting (AUTH_USER_MODEL) for that. Django also tells you: don't issue queries directly against the User model, because that can be changed too;
02:25
there is a helper function (get_user_model()) which gives you the actual model at runtime, and you can query that. All of this is designed to facilitate downstream applications, which may wish to replace the stock User model and provide something else
02:41
instead of the default. So this is one way to make our application aware of these things and make it follow the practices. Another example: we are using django-simple-history to keep track of changes to some objects, and django-simple-history works with the model's save() method, so if you're using objects.update() or bulk_create(),
03:05
these don't use the save() method, so we skip the history. We don't want to use these methods, so again we have customized pylint plugins to make us aware when we do. Another example, and this is especially true in big projects and legacy projects with old source code:
03:25
you can use static analysis tools to help you find possible sources of problems and bugs. We have had the problem of missing permissions: we have views which process requests from the browser, and they are missing the permission_required decorator.
03:43
That's really bad, and we've seen this a few times. So we figured, okay, let's create a plugin for pylint and find all the places in the source code that may be missing these permissions. Now we have a list of them, we can go check them out and figure out what's going on, and if we add a new view later and forget
04:02
to add the permissions to it, the plugin will tell us. So it's a nice mechanism to find some problems very easily and very quickly. Before we continue, two things are important. You just need to know that they happen;
04:21
it's not necessary to know in detail how they work under the hood. The first thing is parsing, also called lexical analysis or tokenization, and the other thing is building abstract syntax trees from the source code. So first you have the input, which is
04:42
the files of your program. This is all character input; it doesn't mean anything to the tooling. It goes through this box, which is parsing, tokenization, lexical analysis, and we get another data structure which has a little bit more meaning. In my example you see we have a keyword, an identifier, operators and numeric constants.
05:03
This is something that static analysis tooling can work with a lot more easily, and you can use this information to make decisions about your source code. Tokenization in Python is very easy. We have the tokenize module, which provides the tokenize() function. This function receives a single argument, which must behave like the readline() method.
05:26
So if you're working with file objects, the readline method of a file opened in binary mode works. If you're working with strings, you have to encode them to bytes, wrap them in a BytesIO object and use its readline method. The result of tokenize() is a generator which returns TokenInfo objects; TokenInfo is a named tuple type.
05:45
It has five elements. The token type is an integer constant; the names in the brackets are constants available from the tokenize module, so you can use them as well. You have the token value as a string. You have the start and end position of this token in the input character stream, as tuples:
06:07
the start row and start column, and the end row and end column of this token. And then you have the entire line which is currently being inspected by the tokenizer. So this is all the output that tokenize gives you.
06:20
This is what the hello world example looks like, and you can experiment with it; it's very easy. Experiment and see what different pieces of source code look like to the tokenizer. This is the first step that all static analysis tools do, and it is also done internally by Python.
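The hello world example can be reproduced with the standard library alone. This is a minimal sketch; the one-line source is just an illustration. Because `tokenize.tokenize()` expects a readline callable that returns bytes, the source is kept as bytes and wrapped in `io.BytesIO`, and `tokenize.tok_name` turns the integer token types back into readable names.

```python
import io
import tokenize

source = b'print("Hello, World!")\n'

# tokenize.tokenize() wants a readline callable returning bytes,
# so the source is wrapped in a BytesIO object first.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

for tok in tokens:
    # Each TokenInfo is a named tuple: (type, string, start, end, line);
    # start and end are (row, column) tuples.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)
```

The first token is always `ENCODING`, because `tokenize()` starts by detecting the source encoding; for text you already hold as a `str`, `tokenize.generate_tokens()` with `io.StringIO(...).readline` skips that step.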
06:41
So the next thing is abstract syntax trees. It sounds very complicated, and again it's used internally by a lot of tooling and by Python itself. But if you want to work with them, you don't really need to know how they are constructed or all the details behind that.
07:00
You just need to know that it's a tree-based structure. It's very similar to DOM trees in the browser or to XML trees: you have child nodes and parent nodes, you have siblings, you have different types of nodes with different attributes, and you can work with them pretty easily. All these different colors are objects of different types in Python, and this is how you will recognize them
07:26
internally when you're writing plugins. Creating abstract syntax trees is again very easy. We have the ast module, which is built into Python. It is used internally by Python and also by some other tooling, like Cosmic Ray.
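For comparison, here is the same hello world program parsed with the built-in `ast` module. It is a minimal sketch; note that the `indent=` argument of `ast.dump()` needs Python 3.9 or later.

```python
import ast

tree = ast.parse('print("Hello, World!")')

# Pretty-print the whole tree (indent= is available from Python 3.9).
print(ast.dump(tree, indent=4))

# The root is a Module whose body is a list of statements.
# Here it holds one Expr statement wrapping a Call node.
call = tree.body[0].value
print(type(call).__name__, call.func.id, [arg.value for arg in call.args])
# Call print ['Hello, World!']
```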
07:41
However, pylint does not use ast; pylint uses astroid, which is an external dependency. It is very similar to the built-in module and almost everything is named the same way, but you have to be aware that it's a different module. So we have the parse() function provided by astroid, which receives a string and returns an astroid node.
08:03
The root node is the module: everything that astroid parses is represented as a Module which contains something else inside of it. In this example the module doesn't have very interesting attributes, because it doesn't come from the file system. It doesn't have a name, but it has a body, which is the list of statements in the module.
08:26
We have a single statement in this list, which is a call to a function named print. The call receives arguments, which are again a list, and we have only one argument, which in this example is a string constant, and we don't have any keyword arguments.
08:40
So again, you can experiment in the interactive interpreter, or you can create a small script and see what different pieces of code look like as an AST. This is a relatively well-documented library. What you're going to need most of the time is the list of node classes: you need to know their names, and you will see why
09:04
in a second, and you need to know their attributes. There are also some helper methods and helper functions that you may want to use; they are usually defined in the base classes. So again, experiment with that and figure out how it works.
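Here is the same program parsed with astroid, the library pylint actually uses. This is a minimal sketch assuming astroid is installed (it comes with pylint). Besides the node attributes, it uses two helpers from astroid's node base class, `nodes_of_class()` and `root()`, as examples of the kind of helper methods described above.

```python
import astroid
from astroid import nodes

module = astroid.parse('print("Hello, World!")')

# The root node is always a Module; parsed from a string it has an empty name.
print(type(module).__name__, repr(module.name))

# body is the list of statements; the Expr statement wraps the Call node.
call = module.body[0].value
print(type(call).__name__, call.func.name)  # Call print
print([arg.value for arg in call.args])     # ['Hello, World!']

# Helper methods inherited from the node base class:
# nodes_of_class() searches the subtree for a node type,
# root() climbs back up to the enclosing Module.
for const in module.nodes_of_class(nodes.Const):
    print(const.value, const.root() is module)
```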
09:20
Next are the pylint checker interfaces. This is the internal machinery that pylint provides for you, the developer, to hook into the analysis process and create plugins. It is also the machinery that pylint itself uses internally: all the checks that you have, all the errors that you see when you work with pylint,
09:41
are implemented with these four interfaces, and they are implemented as if they were plugins, so it's the same thing. The names are pretty much self-explanatory. We have open() and close(), which are executed at the beginning and at the end.
10:00
Then you have the raw checker interface (IRawChecker). This is not used very often, only in a few places. It provides process_module(), which receives the module parsed by astroid, so you can scan the entire module as a whole if you wish. Then you have the token checker interface (ITokenChecker), which provides the process_tokens() method. This receives the result of tokenize, and
10:23
the most commonly used, more than 90% of the time, is the astroid checker interface (IAstroidChecker). It responds to visit and leave methods, and the exact name of these methods depends on the class name of the node that you want to inspect.
10:41
So for example, if you want to inspect a function, the astroid node is FunctionDef, so you define the methods visit_functiondef or leave_functiondef. If you want to inspect a class definition, they are visit_classdef and leave_classdef. That's why you need to know the names. And the order of execution is this,
11:04
from top to bottom. You can implement more than one interface in your plugin, and the order is important. Another important thing is the order of execution of the visit and leave methods: it is depth-first, and this is important because you can use it to build some sort of state machine in your plugin.
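To see why the depth-first order matters, here is a small standard-library imitation of how pylint dispatches `visit_<nodename>` and `leave_<nodename>` methods. It is a simplified sketch, not pylint's actual walker (which works on astroid nodes): `MiniWalker` walks a built-in `ast` tree, calling `visit_<lowercased class name>` on the way down and `leave_<lowercased class name>` on the way back up. The hypothetical `NestedFunctionChecker` uses that order as a tiny state machine, a depth counter that flags functions defined inside other functions, similar in spirit to the nested-definition checker the speaker mentions later.

```python
import ast


class MiniWalker:
    """Simplified imitation of pylint's visit/leave dispatch over a stdlib ast tree."""

    def __init__(self, checker):
        self.checker = checker

    def walk(self, node):
        name = type(node).__name__.lower()  # e.g. FunctionDef -> functiondef
        visit = getattr(self.checker, f"visit_{name}", None)
        leave = getattr(self.checker, f"leave_{name}", None)
        if visit:
            visit(node)
        for child in ast.iter_child_nodes(node):  # depth-first: children before leave_
            self.walk(child)
        if leave:
            leave(node)


class NestedFunctionChecker:
    """Hypothetical checker: record functions defined inside other functions."""

    def __init__(self):
        self.depth = 0
        self.nested = []

    def visit_functiondef(self, node):
        if self.depth > 0:
            self.nested.append(node.name)
        self.depth += 1

    def leave_functiondef(self, node):
        # By the time leave_ runs, every child of this function has been visited.
        self.depth -= 1


source = """
def outer():
    def inner():
        pass
    return inner

def top_level():
    pass
"""

checker = NestedFunctionChecker()
MiniWalker(checker).walk(ast.parse(source))
print(checker.nested)  # ['inner']
```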
11:24
Collect some information in the children, and when you are leaving the parent node you know that all the children have been visited and you have all the information from them, so you can make a decision. Next, let's create a pylint plugin skeleton. This is the hello world of pylint plugins.
11:44
Every valid Python module that provides a register() function with one argument will be considered a pylint plugin. Pylint will import the module and try to execute this function. You can put anything you like in this function; usually what goes inside is something like linter.register_checker(),
12:03
where you create an object from something which pylint calls a checker class. This is what the checker class looks like. This is all boilerplate code; it is the bare minimum that you need for pylint to be able to execute this class.
12:22
This is where all the logic for discovering coding patterns and deciding whether something is an error is done. So you need this attribute, __implements__ (double underscore implements double underscore), and you give it a list of the interfaces you're going to implement.
12:42
Usually it's only one, but it can be more. You need the name attribute; most of the time it is not used, but it's mandatory. And you need the messages dictionary, msgs; notice the name, that's how it needs to be written. The key in this dictionary is an
13:01
alphanumeric ID. This must be unique across the entire pylint installation, across all the plugins that you want to enable. The good thing is that if it's not unique, pylint will crash and give you a nice traceback, and you will figure out that it's a duplicate. The value in this dictionary is a tuple of three elements, which defines your error message.
13:23
The first one is the short error message, which you are going to see on the terminal when you use pylint; it's only one line. The second one is the human-readable message ID. This is what you're going to use to enable or disable particular checks on the command line, for example --disable=missing-docstring, something that we do almost all the time.
13:44
The last one is a longer help message, which can be several lines long. You can also see it on the terminal with additional options, and it is usually also compiled into HTML documentation. That is the place to explain to the developer who sees the message why it is a problem and how to fix it.
14:00
Then you need to implement some methods from these interfaces. You scan some source code and decide, okay, that's an error: call self.add_message() and give it the human-readable message ID. The rest of the arguments are used to annotate where this error appears in the source code: this module, that particular line, that particular column, and pylint will print this information nicely for you.
14:28
You invoke the plugins with --load-plugins. The only thing that you need to be aware of is that pylint looks for these plugins on the standard Python path, so if they are not there, you either have to move them there or modify the Python path settings, and that does it,
14:44
nothing else. Now I'm going to show you a few examples from our open source project. All of them are on GitHub, and we have a lot more available. These are things that we use to make our project better. First, the documentation string checker; this is the essence of it.
15:02
It implements two interfaces. In process_tokens() we basically scan through all the tokens in the module, find all string constants and keep a reference to them in a dictionary. The key in the dictionary is the string without the quotes, and the value is the string with the quotes.
15:21
Then we implement the astroid-based methods. When visiting modules, class definitions and function definitions, we inspect the documentation string, and what we basically do is a dictionary lookup: we find the docstring in the dictionary, and if it starts with three double quotes, that's fine.
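The token-plus-lookup idea can be sketched with the standard library alone. This is not the Kiwi TCMS plugin: it uses the built-in `ast` module instead of astroid and returns a list instead of calling `add_message()`, but the two passes are the same. The token pass maps each string's value to its raw spelling (stripping the quotes with `ast.literal_eval` is an assumption of this sketch), and the tree pass looks each docstring up in that map. The function name `find_bad_docstrings` is hypothetical.

```python
import ast
import io
import tokenize


def find_bad_docstrings(source):
    """Return names of modules/classes/functions whose docstring doesn't use triple double quotes."""
    # Pass 1 (like process_tokens): map each string's value to its raw spelling.
    raw_strings = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            raw_strings[ast.literal_eval(tok.string)] = tok.string

    # Pass 2 (like visit_module / visit_classdef / visit_functiondef): look up each docstring.
    bad = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        docstring = ast.get_docstring(node, clean=False)
        if docstring is None:
            continue
        if not raw_strings.get(docstring, "").startswith('"""'):
            bad.append(getattr(node, "name", "<module>"))
    return bad


example = "def f():\n    'Single-quoted docstring.'\n"
print(find_bad_docstrings(example))  # ['f']
```

Like the dictionary described in the talk, this keeps only one raw spelling per string value, so two identical strings written with different quotes can confuse it; for a simple style checker that trade-off is usually acceptable.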
15:41
Otherwise, we consider it an error and trigger a message for the developer. Next, the checker for Django: again, in visit_const we look for hard-coded strings. We don't really care whether this is inside a ForeignKey
16:00
definition or someplace else; if it is a hard-coded string, we raise an error message for the developer. That's easy. We also inspect the imports: if we see something like "from django.contrib.auth.models import User", or a wildcard import, that's again an error for the developer to inspect and figure out what's going on.
16:22
The missing permissions checker is probably the biggest one that we have which fits onto the slides, unfortunately. First, in visit_module we try to figure out whether this is a views module. In our project each application has its own application/views.py;
16:41
that's the structure, so we just inspect the module name and keep the result in a boolean flag. Next, in visit_functiondef we try to figure out whether this function, which is in a Django views file, is a helper function or a function-based view, something that responds to HTTP requests. The way we check for that is: if the first argument is named request,
17:05
then this must be a function-based view in Django, and we continue with further inspection. For classes we do a similar thing: we want to make sure that the class we are inspecting is a class-based view in Django and not some helper class defined in the same module.
17:22
The way we do this is to inspect the list of base classes, because class-based views in Django always inherit from something else. So we use this for a simple check. It's not always very robust, but it works for us.
17:40
And the most important thing, the inspection part: we basically scan through the list of decorators for the method or for the class and search for some well-known names. If you don't have any decorators, that's a problem for us. If you do have some, we search for the well-known names and some combinations of them. If we find them, fine; if we don't, it's again an error.
18:03
The developer must figure it out. We have other checkers in the project too. For example, we're looking for empty modules, and we're looking for nested function definitions or nested class definitions, because this is a legacy code base that was not written in a very good way, and we don't like to have these things.
18:25
When we see these things, they usually mean there are more problems inside, so that's why we have these checkers. Searching for raw SQL: Django is ORM-based, but unfortunately we did have a lot of hard-coded SQL statements in the source code,
18:40
which were not compatible with different types of databases; again, we have a checker. We have checkers for the libraries that we use, for example this one for tags.py. This is something internal, internal behavior in the application: we don't want to use objects.get_or_create(), we want to use an internal method which enforces some permissions and some other logic.
19:03
That's why we have this. We have also had some checkers which started life inside our project, and later we were able to contribute them to pylint and to pylint-django, because they were valid for other people as well. And the last thing is:
19:21
we have ideas for other items, other plugins, other checkers to create, which are important for us. So if you want to experiment, if you want to get your hands dirty and start writing pylint plugins, this is a good place for you to start. We can give you exact examples of
19:42
pieces of source code which we don't like and why we think they are problematic, and you can try to create a plugin for that and contribute it back to our project if you want to. The last thing I have to tell you is that we also have a project stand here at FOSDEM, so if you want to come visit us and say hi,
20:01
and talk a little bit more about why and how we are using these plugins, I will be there after this presentation. Now we have five minutes for questions. Thank you. Okay, first question. Okay: do you use pylint?
20:32
Yes? One person, two people, five people. Oh, everybody, okay. flake8, okay.
20:43
What? I didn't hear. Black, okay. Okay, but the thing about Black is that it's nice, however it's more for formatting. Especially in the latest versions of pylint and pylint-django, they have checkers to show you things which are just considered bad practice.
21:09
Yeah, okay, so the
21:29
Yeah. The question is how many things we put in a check, because developers don't always agree with something. First of all, I am a big fan of satisfying all possible
21:45
checks that come from pylint. I think they are well designed and they are created for a purpose, to make your life easier. But if you're going to create your own customized plugins for your team and you have people who don't agree, then maybe it's a good
22:03
time to sit down and make some policies about coding style within the team: why you consider something to be a problem and why not. When you have this agreement, you can create the plugins and people will be happy. Yeah: how many false positives do we have in our plugins?
22:27
The answer is quite a few. I haven't counted them, and this is for a reason. It is relatively easy to create a plugin that will detect
22:41
the most common cases, and it is relatively hard to create a plugin that takes into account all the edge cases. So we prefer to have very simple plugins with more false positives, and just disable them with a comment and ignore them, instead of spending a lot of time
23:04
fine-tuning the plugin. Okay. Yes? "Can you elaborate a little bit on how you would fix the code, actually also with pylint plugins?"
23:23
How can I fix the code when I see a problem? Okay, so the question is basically: can we change the abstract syntax trees with pylint? The answer is sort of yes and no. There are tools which use ASTs to do dynamic replacement of nodes.
23:47
For example, Cosmic Ray is a tool for mutation testing which is based on automatically changing the source code and running your test suite, so you can do this. You can also save the result into a file: when you build an abstract syntax
24:02
tree, it is relatively easy to export it back into Python source code, and it's almost the same as the input. Pylint doesn't have the machinery to change abstract syntax tree nodes and then save them to the file system. This could be added, of course; I mean, it would be relatively easy to add, but it doesn't exist at the moment. The tool is not designed to do
24:25
these things, but it is possible. Okay. Thank you.
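The change-and-export step described in this answer can be tried with the standard library: `ast.NodeTransformer` replaces nodes, and `ast.unparse()` (Python 3.9+) turns the tree back into source. This is a minimal sketch with a hypothetical rename rule, using the built-in `ast` module rather than anything pylint provides. As the answer says, the result is only almost the same as the input, since comments and original formatting are not kept.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Hypothetical rewrite rule: turn calls to old_name() into new_name()."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            node.func = ast.Name(id=self.new_name, ctx=ast.Load())
        return node


source = "result = old_name(1,   2)  # this comment is lost\n"
tree = RenameCall("old_name", "new_name").visit(ast.parse(source))
tree = ast.fix_missing_locations(tree)  # give the new Name node line numbers
print(ast.unparse(tree))  # result = new_name(1, 2)
```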