How to write pylint plugins

Formal Metadata

Title
How to write pylint plugins
Number of Parts
561
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Pylint is the most popular Python source code analyzer, which looks for programming errors, helps enforce a coding standard and sniffs for some code smells. It is possible to write plugins to add your own checks, for example to enforce that all documentation strings are enclosed in three double quotes. This presentation will focus on how the pylint plugin system works, how to examine the AST of a piece of code, and how to create a minimalistic plugin from scratch. It will also examine a couple of plugins that are used in the Kiwi TCMS project.
Transcript: English (auto-generated)
Hello, good morning. Okay, now that you're awake: my name is Alex, and I'm going to talk about pylint plugins. I'm going to talk a little bit about tokenization and a little bit about abstract syntax trees. But the question is, why do we need to have customized plugins?
Why do we need to have even more linters on top of whatever we are already using? And why do we need tooling which will ultimately tell you: your source code is wrong, you have to fix this? My answer is that the existing tools are not always enough, and static analysis tools like pylint
are very easy to extend. They can help you make your software better, and I will show you some examples. One use case is that you want to enforce a particular coding style. This can be something which is not valid for the broader Python community; it can be valid only for your own company, or maybe only in the team that you work in, or,
more commonly, valid only in the current project, and that can be different from the rest of the projects in the company. In Kiwi TCMS, an open source project I work on, we like to use documentation strings with three double quotes. And we don't like to have the rest of the styles, which are perfectly valid
Python styles for documentation strings. We just like three double quotes, and we have a customized pylint plugin which will discover this for us. We can fix it and keep all of our source code in the same style. You can do other things too: let's say you have client and server relationships in your application.
You may want to name them the same way, so both the client and the server start with the same name, and then you have Client at the end and Server at the end: easy to find, easy to grep in the source code if you like. You can do all sorts of stuff. Another example: you are using a lot of frameworks and libraries, and they have best practices, they have some
recommendations that you need to follow. For example, if you're using Django, it tells you: don't hard-code the authentication user as a foreign key relationship, because this can be changed. You should be using a setting for that. Django also tells you: don't issue queries directly against the User model, because that can be changed;
there is a helper function which will give you the actual model at runtime, and you can query that. All of this is designed to facilitate downstream applications which may wish to change the stock user model and provide something else
instead of the default. So this is one way to make our application aware of these things and make it follow the practices. Another example: we are using django-simple-history to keep track of changes to some objects. django-simple-history works with the model's save method, so if you're using objects.update or bulk_create,
this doesn't use the save method, so we skip history. We don't like to use these methods; again, customized pylint plugins, so we are made aware and don't do this. Another example, and this is especially true in big projects, in legacy projects with old source code:
you can use static analysis tools to help you find possible sources of problems, of bugs. We have had the problem of missing permissions: we have views which process requests from the browser, and they are missing the permission_required decorator.
So that's really bad, and we've seen this a few times. We figured: okay, let's create a plugin for pylint and find all the places in the source code that may be missing these permissions. So we have a list of them; we can go check them out and figure out what's going on. If we add a new view later and forget
to add the permissions to it, the plugin will tell us. So it's a nice mechanism to very easily, very quickly find some problems. Before we continue, two things are important. You just need to know that they happen.
It's not necessary to know how they work in detail under the hood. The first thing is parsing, or lexical analysis, or tokenization, and the other thing is building abstract syntax trees from the source code. So first, you have input, which is
the files of your program. This is all character input; it doesn't mean anything to the tooling. It goes through this box, which is parsing, tokenization, lexical analysis, and we get another data structure which has a little bit more meaning. So in my example you see we have a keyword, we have an identifier, we have operators, we have numeric constants.
This is something that static analysis tooling can work with a lot more easily, and you can use this information to make decisions about your source code. Tokenization in Python is very easy. We have the tokenize module, which provides the tokenize function. This function receives a single argument, which must behave like the readline method.
So if you're working with file objects, then the file object's readline should work; if you're working with strings, you have to wrap them in a BytesIO object and use its readline method. The result of tokenize is a generator which will return TokenInfo tuple objects. TokenInfo is a named tuple type.
It has five elements. The token type is an integer constant, and these names in the brackets are constants defined in the tokenize module, so you can use them as well. You have the token value as a string. You have the start and end position of this token in the input character stream as tuples,
so these are the starting row and starting column, and the end row and end column, of this token. And then you have the entire line which is currently being inspected by the tokenizer. So this is all the output that tokenize gives you.
This is what the hello world example looks like, and you can experiment with it. It's very easy, so experiment with that and see what different pieces of source code look like to the tokenizer. This is the first step that all static analysis tools do; this is also done internally by Python as well.
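The hello world tokenization described above can be tried with only the standard library. This is a minimal sketch: the sample source string is an assumption (the slide itself is not part of the transcript), and the string is encoded and wrapped in a BytesIO object so that its readline method can be handed to tokenize.tokenize.

```python
import io
import tokenize

# Assumed sample input; the talk's slide is not reproduced in the transcript.
source = 'print("Hello, world")\n'

# tokenize.tokenize expects a readline callable that returns bytes.
readline = io.BytesIO(source.encode("utf-8")).readline
tokens = list(tokenize.tokenize(readline))

for tok in tokens:
    # Each TokenInfo has five fields: type, string, start, end, line.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)
```

Each printed row corresponds to one TokenInfo tuple. The first token only reports the detected encoding; the tokens for the source text follow it.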
The next thing is abstract syntax trees. It sounds very complicated; again, they are used internally by a lot of tooling, and used internally by Python. But if you want to work with them, you don't really need to know how they are constructed or all the details behind that.
You just need to know that it's a tree-based structure. It's very similar to DOM trees in the browser or to XML trees: you have child nodes, parent nodes, you have siblings, you have different types of nodes, they have different types of attributes, and you can work with them pretty easily. All these different colors are objects of different types in Python, and this is how you can recognize them
internally when you're writing plugins. Creating abstract syntax trees is again very easy. We have the ast module, which is built into Python. It is used by Python internally and also by some other tooling, like Cosmic Ray.
However, pylint does not use ast; pylint uses astroid, which is an external dependency. It is very similar to the built-in module, almost everything is named the same way, but you have to be aware that it's a different module. So we have the parse function provided by astroid. This will receive a string and will return an astroid node.
The root node is the module, so everything that astroid parses is represented as a module which contains something else inside of it. In this example, the module doesn't have very interesting attributes because it doesn't come from the file system. It doesn't have a name, but it has a body, which is a list of the expressions in the module.
And we have a single expression in this list, which is a call to a function with the name print. This function receives arguments, which is again a list, and we have only one argument, which is a string constant in this example, and we don't have any keyword arguments.
So again, you can experiment in the interactive interpreter, or you can create a small script to experiment and see what different pieces of code look like as an AST. This is a relatively well documented library. What you're going to need most of the time is the list of classes: you need to know their names, and you will see why
in a second, and you need to know their attributes. There are also some helper methods and helper functions that you may want to use; they are usually defined in the base classes. So again, experiment with that and you will figure out how it works.
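The tree described above (a module whose body holds a single call to print with one string argument and no keyword arguments) can be explored in a few lines. To stay self-contained, this sketch uses the built-in ast module rather than astroid, so class and attribute names may differ slightly from what pylint plugins see; the sample source string is an assumption.

```python
import ast

# Assumed sample input, parsed with the built-in module (not astroid).
tree = ast.parse('print("Hello, world")')

# The root is a Module; its body is a list of statements.
statement = tree.body[0]   # an expression statement
call = statement.value     # the call to print

print(type(tree).__name__, type(statement).__name__, type(call).__name__)
print("function name:", call.func.id)
print("positional args:", [arg.value for arg in call.args])
print("keyword args:", call.keywords)

# ast.dump shows the whole structure at once, which helps when experimenting.
print(ast.dump(tree))
```

Running the same experiment on other snippets is a quick way to learn which node classes and attributes a checker will have to deal with.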
Next are the pylint checker interfaces. This is the internal machinery that pylint provides for you, the developer, to hook into the analysis process and be able to create plugins. This is also the machinery that pylint itself uses internally. So all the checks that you have, all the errors that you see when you work with pylint,
are implemented with these four interfaces, and they are also implemented as if they were plugins, so it's the same thing. The names are pretty much self-explanatory. We have open and close, which are executed at the beginning and at the end.
Then you have the raw checker interface. This is not very often used, only in a few places. Its process_module method receives the result of astroid's parse, so you can scan the entire module as a whole if you wish. Then you have the token checker interface, which provides the process_tokens method. This receives the result of tokenize. And
the most commonly used, more than 90% of the time, is the astroid checker interface. This will respond to visit and leave methods, and the exact name of these methods depends on the class name of the object, of the node, that you want to inspect.
So for example, if you want to inspect a function, the astroid node is FunctionDef, so you can define the methods visit_functiondef or leave_functiondef. Or if you want to inspect a class definition, this is visit_classdef with leave_classdef. So that's why you need to know the names. And the order of execution is this,
from top to bottom. You can implement more than one interface in your plugin, and the order is important. Another important thing is the order of execution of the visit and leave methods. This is depth first, and this is important because you can use it to build some sort of state machine in your plugin.
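The depth-first visit and leave order can be made visible with a toy walker. This is only an illustration built on the standard ast module: the OrderRecorder class is hypothetical and is not how pylint dispatches its methods internally, it merely mimics the visit_<classname> and leave_<classname> naming scheme.

```python
import ast


class OrderRecorder:
    """Toy walker (hypothetical) that records visit/leave events depth first."""

    def __init__(self):
        self.events = []

    def walk(self, node):
        name = type(node).__name__.lower()
        self.events.append(f"visit_{name}")
        # All children are fully processed before the parent is left.
        for child in ast.iter_child_nodes(node):
            self.walk(child)
        self.events.append(f"leave_{name}")


source = "class A:\n    def f(self):\n        pass\n"
recorder = OrderRecorder()
recorder.walk(ast.parse(source))

for event in recorder.events:
    print(event)
```

The leave event for the class only appears after the visit and leave events of the function inside it, which is the property a checker can rely on when it gathers information from child nodes.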
You collect some information in the children, and when you are leaving the parent node you know that all the children have been visited and you have all the information about them, so you can make a decision. Next, let's create a pylint plugin skeleton. This is the hello world of pylint plugins.
Every valid Python module that provides a register function with one argument will be considered a pylint plugin. Pylint will import the module and try to execute this function. You can put anything you like in this function. Usually what goes inside is something like this: linter.register_checker,
and you create an object from something which pylint calls a checker class. This is what the checker class looks like. This is all boilerplate code; it is the bare minimum that you need to have for pylint to be able to execute this class.
This is where all the logic about discovering coding patterns and deciding if something is an error or not is done. So you need this attribute, __implements__, and you give it a list of the interfaces you're going to implement.
Usually it's only one, but it can be more. You need the name attribute; most of the time this is not used, but it's mandatory. And you need the messages dictionary. Notice the name: that's how it needs to be written. The key in this dictionary is an
alphanumeric ID. This must be unique across the entire pylint installation and all the plugins that you want to enable. The good thing is that if it's not unique, pylint will crash and give you a nice traceback, and you will figure out that this is a duplicate. And then the value in this dictionary is a tuple of three elements. This defines your error message.
The first one is the short error message, which you are going to see on the terminal when you use pylint. It's only one line. The second one is the human-readable message ID. This is what you're going to use to enable or disable particular checkers on the command line, for example to disable the missing docstring check. This is something that we do almost all the time.
And the last one is a longer help message. This can be several lines long. You can also see this on the terminal with additional options. It is usually also compiled into the HTML documentation. That is the place to explain to the developer who sees the message why this is a problem and, maybe, how to fix it.
And you need to implement some method from these interfaces. So you scan some source code and decide: okay, that's an error. You call self.add_message, give it the human-readable message ID, and the rest of the arguments are used to annotate where this error appears in the source code: this module, that particular line, that particular column. Pylint will print this information nicely for you.
You invoke the plugins with --load-plugins. The only thing that you need to be aware of is that pylint is looking in the standard Python path for these plugins, so if they are not there, you either have to move them there or modify the Python path settings. And that does it,
nothing else. Now I'm going to show you a few examples from our open source project. All of them are on GitHub, and we do have a lot more available. These are things that we use to make our project better. So, the documentation string checker: this is what it looks like, the essence of it.
It implements two interfaces. In process_tokens we basically scan through all the tokens in the module, find all string constants, and keep a reference to them in a dictionary. The key in the dictionary is the string without the quotes, and the value in the dictionary is the string with the quotes.
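The dictionary of string constants can be sketched with the standard library alone. This is not the Kiwi TCMS checker: the function names and the sample source are hypothetical, the built-in tokenize and ast modules stand in for pylint's process_tokens and astroid, and the docstring lookup is included only to show what the dictionary is for.

```python
import ast
import io
import tokenize


def collect_strings(source):
    """Map each string literal's value to its raw spelling, quotes included."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            try:
                found[ast.literal_eval(tok.string)] = tok.string
            except (ValueError, SyntaxError):
                continue  # e.g. f-strings on older Python versions
    return found


def badly_quoted_docstrings(source):
    """Return names whose docstring is not written with three double quotes."""
    raw_by_value = collect_strings(source)
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node, clean=False)
            raw = raw_by_value.get(doc)
            if raw is not None and not raw.startswith('"""'):
                offenders.append(getattr(node, "name", "<module>"))
    return offenders


# Hypothetical sample: one docstring in the preferred style, one not.
SAMPLE = (
    'def good():\n'
    '    """Fine."""\n'
    '\n'
    'def bad():\n'
    "    '''Not our style.'''\n"
)
print(badly_quoted_docstrings(SAMPLE))
```

One limitation of keying the dictionary on the string value is that two literals with the same text but different quoting overwrite each other, so a sketch like this can miss or misreport such cases.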
Then we implement these astroid-based methods. When visiting modules, class definitions, and function definitions, we want to inspect the documentation string, and what we basically do is a dictionary lookup. We find this thing in the dictionary, and if it starts with three double quotes, that's fine.
Otherwise, we consider it an error and trigger a message for the developer. The checker for Django: again, visit_const looks for hard-coded strings. We don't really care if this is inside of a foreign key
definition or someplace else; if it is a hard-coded string, we raise an error message for the developer. That easy. We also inspect the imports: if we see something like from django.contrib.auth.models import User, or a wildcard import, that is again an error for the developer, to inspect this and figure out what's going on.
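The two Django checks described above can be approximated with the built-in ast module. This is a sketch under assumptions, not the project's checker: the function name and message texts are hypothetical, and the exact hard-coded string being searched for ("auth.User") is assumed, since the transcript does not spell it out.

```python
import ast


def user_model_problems(source):
    """Return (line, description) pairs for direct uses of the stock user model."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.ImportFrom)
                and node.module == "django.contrib.auth.models"):
            # Both an explicit import of User and a wildcard import are reported.
            if any(alias.name in ("User", "*") for alias in node.names):
                problems.append((node.lineno, "direct import of User"))
        elif isinstance(node, ast.Constant) and node.value == "auth.User":
            # Assumed spelling of the hard-coded reference; its location is ignored.
            problems.append((node.lineno, "hard-coded 'auth.User'"))
    return sorted(problems)


# Hypothetical sample; it is only parsed, never imported or executed.
SAMPLE = (
    "from django.contrib.auth.models import User\n"
    "from django.db import models\n"
    "\n"
    "class Plan(models.Model):\n"
    "    author = models.ForeignKey('auth.User', on_delete=models.CASCADE)\n"
)
for line, description in user_model_problems(SAMPLE):
    print(line, description)
```

Because the source is only parsed, Django does not need to be installed to run the sketch.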
The missing permissions checker: that's probably the biggest one that we have, which fits onto slides, unfortunately. So first, in visit_module, we try to figure out if this is a views module. In our project we have application/views.py, another application/views.py;
that's the structure, and we just inspect the module name and keep this in a boolean flag. Next, in visit_functiondef, we try to figure out, given that this function is in a Django views file: is this a helper function, or is it a function-based view, something that responds to HTTP requests? The way we check for that is: if the first argument is named request,
then this must be a function-based view from Django, and we continue with further inspection. For classes we do a similar thing. We want to make sure that the class we are inspecting is a class-based view in Django and not some helper class which is defined in the same module.
The way we do this is to inspect the list of base classes, because when you use class-based views in Django, they always inherit from something else. So we use this to make a simple check. It's not very robust sometimes, but it works for us.
And the most important thing, the inspection part: we basically scan through the list of decorators for the method or for the class and search for some well-known names. If you don't have any decorators, that's a problem for us. If we do have some, we search for the well-known names and some combinations of them. If we find them, fine; if we don't find them, again, an error.
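The function-based part of that inspection can be sketched with the built-in ast module. Everything named here is an assumption for illustration: the set of accepted decorator names, the helper functions and the sample views are hypothetical, and the views-module check and the class-based branch are left out to keep the sketch short.

```python
import ast

# Hypothetical list; the talk only mentions "some well-known names".
KNOWN_DECORATORS = {"permission_required", "login_required"}


def decorator_name(node):
    """Return the bare name of a decorator expression, or None."""
    if isinstance(node, ast.Call):       # @permission_required("app.perm")
        node = node.func
    if isinstance(node, ast.Attribute):  # @decorators.permission_required
        return node.attr
    if isinstance(node, ast.Name):       # @login_required
        return node.id
    return None


def views_missing_permissions(source):
    """Return names of top-level function-based views without a known decorator."""
    missing = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        args = node.args.args
        if not args or args[0].arg != "request":
            continue  # first argument is not "request": treated as a helper
        names = {decorator_name(dec) for dec in node.decorator_list}
        if not names & KNOWN_DECORATORS:
            missing.append(node.name)
    return missing


# Hypothetical views module; the permission string is made up.
SAMPLE = (
    "@permission_required('testcases.add_testcase')\n"
    "def new(request):\n"
    "    pass\n"
    "\n"
    "def edit(request, pk):\n"
    "    pass\n"
    "\n"
    "def helper(value):\n"
    "    pass\n"
)
print(views_missing_permissions(SAMPLE))
```

As with the checker described in the talk, a simple rule like this will produce false positives, for example for views that are protected in some other way.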
The developer must figure it out. We do have other checkers in the project. For example, we're looking for empty modules. We're looking for nested function definitions or nested class definitions, because that's a legacy code base; it's been written in a not very good way, and we don't like to have these things.
When we see these things, they usually mean there are more problems inside, so that's why we have these checkers. Searching for raw SQL: Django is ORM-based. Unfortunately, we did have a lot of hard-coded SQL statements in the source code
which were not compatible with different types of databases. Again, we have a checker. We have checkers for the libraries that we use. For example, this thing, tags.py: this is internal behavior in the application, and we don't want to use objects.get_or_create; we want to use an internal method which will enforce some permissions and some other logic.
So that's why we have this. We have also had some checkers which started life inside of our project and which we were later able to contribute to pylint and to pylint-django, because they were valid for other people as well. And the last thing is,
we do have ideas for other plugins, other checkers to create, which are important for us. So if you want to experiment, if you want to get your hands dirty and start writing pylint plugins, this is a good place for you to start. We can give you exact examples of
pieces of source code which we don't like and why we think they are problematic, and you can try to create a plugin for that and contribute back to our project if you want to. The last thing I have to tell you is that we also have a project stand here at FOSDEM. So if you want to come visit us and say hi,
talk a little bit more about why or how we are using these plugins, I will be there after this presentation. And now we have five minutes for questions. Thank you. Okay, first question. Okay: do you use pylint?
Yes: one person, two persons, five people. Oh, everybody. Okay. Flake8? Okay.
What? I didn't hear. Ah, Black, okay. The thing about Black is that it's nice too; however, it's more for formatting. Especially in the latest versions of pylint and pylint-django, they have checkers to show you things which are just considered bad practice.
Yeah, okay, so the...
Yeah. The question is how many things we put in a check, because developers don't always agree with something. So first of all, I am a big fan of satisfying all possible
checks that come from pylint. I think they are well designed, and they are created for a purpose, to make your life easier. But then, if you're going to create your own customized plugins for your team and you have people that don't agree, then maybe it's a good
time to sit down and make some policies about coding style within the team: why you consider something to be a problem, and why not. When you have this agreement, then you can create plugins and people will be happy. Yeah: how many false positives do we have in our plugins?
The answer is quite a few. I haven't counted them, and this is for a reason. It is relatively easy to create a plugin that will detect
the most common cases, and it is relatively hard to create a plugin that will take into account all the edge cases. So we prefer to have very simple plugins and more false positives, and just disable them with a comment and ignore them, instead of spending a lot of time
fine-tuning the plugin. Okay. Yes? Can you elaborate a little bit on how you would now actually fix the code, also with pylint plugins:
how can I fix the code when I see a problem? Okay, so the question is basically: can we change the abstract syntax trees with pylint? The answer is sort of yes and no. There is tooling which uses AST to do dynamic replacement of nodes.
For example, Cosmic Ray is a tool for mutation testing, which is based on automatically changing the source code and running your test suite. And you can do this. You can also save this into a file: when you build an abstract syntax
tree, it is relatively easy to export it into Python source code, and it's almost the same as what was the input. Pylint doesn't have the machinery to change abstract syntax tree nodes and then save them to the file system. This can be added, of course; I mean, it would be relatively easy to add, but it doesn't exist at the moment. The tool is not designed to do
these things, but it is possible. Okay. Thank you.