
From Python script to Open Source Project

Formal Metadata

Title
From Python script to Open Source Project
Subtitle
Project maturity checklist
Title of Series
Number of Parts
118
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Did you write a cool and useful Python script? Would you like to share it with the community, but you're not sure how to go about that? If so, then this talk is for you. We'll go over a list of simple steps which can turn your script into a fully fledged open-source project. The Python community has a rich set of tools which can help verify the quality of your code through automated code review and linting. You can benefit by taking advantage of this ecosystem. Complete the steps in this checklist, and your project will be easier to maintain, you'll be ready to take contributions from the community, and those contributions will be up to high standards. Your project will also keep up with other projects on PyPI, and you will be alerted if any new release causes an incompatibility with your code. The same checklist can be used for non-open-source projects as well. The project maturity checklist includes:
- Properly structure your code
- Use a setup.py file
- Add entry_points for your script command
- Create a requirements.txt file
- Use Black to format your code
- Create a tox.ini config and include code linters
- Set up a Git repo
- Refactor your code to be unit-testable and add tests
- Add missing docstrings
- Add type annotations and a MyPy verification step
- Upload to GitHub
- Add a continuous integration service (e.g. Travis)
- Add a requirements updater (e.g. pyup.bot)
- Add test coverage checker (e.g. coveralls)
- Add a Readme file and documentation
- Publish your project on PyPI
- Advertise your project
Transcript: English (auto-generated)
Hello everybody. So, you want to be a rock star? Don't worry, everybody secretly wants to be a rock star. So what does it take to become a rock star? Well, step one: you need to master your instrument, or multiple instruments if you play many of them. But there's a very important step number two, and that is that you need to learn to play in a band. As a rock musician you're always on stage with other people; you're always playing in a team.
So if you want to be a Python rock star, because that's your instrument, then you need to learn to play well with your band, which is your team of other programmers. There are tools that can help you work together and play together better, and those things are standards: things that we can all agree on, best practices that we have found work best for us, and tools that can actually check how well we're doing at following these standards and best practices.
So, hello. My name is Michał Karzyński. I come to you from Poland, as was said, and I work at Intel on a very interesting open-source project called nGraph, which is a deep learning graph compiler that I'm sure you will be hearing a lot about in the coming years. But in my spare time I was playing around a little bit with the OpenAI Gym. I don't know if you know it: it is a reinforcement learning environment where you can teach an agent to solve some puzzles or play some games. What I found was that there wasn't actually a very good way to find out information about the environments that you have installed locally. There are very good APIs that you can use to explore these environments, so I wrote up a little tool that helps you explore them, and I'm going to show you a little demo right now, if I can.
So this is just a command-line utility. You just type something in, and it shows you the environments that you have installed on your computer and helps you pick the right environment for yourself. You can watch a random agent play Space Invaders and die quite quickly, but in the background you also get some information about the rewards that it's getting as it's playing, which is helpful when you're developing your own algorithm in the Gym.
So, okay, I had done that, and I thought: this is a very small thing, but it could be useful to others. So maybe I can release this as a package. Maybe somebody will be able to use this to play around as they are preparing to write their reinforcement learning algorithm for playing in the Gym.
So then I thought: all right, if I want to release this as an open-source project, what will it take? I wanted to put all these tools and best practices that I've learned from working on a larger open-source project into this tiny open-source project, and these are all the things that I found I needed to use.
My slides will be tagged with these little bubbles. The first set of bubbles shows you stages: first you need to prepare your code for some of these things, then you can automate some of the things that I will be talking about, and finally you can put all of this into a nice CI environment for continuous integration. The little tags with a book icon show you references for more information, things you can Google for. The tags with no icon at all are just the names of packages you can pip install. And tags with a little external-link arrow are names of services that I will be talking about, which will come up on some slides. So that's the legend for the next slides.
But okay, let's start. If you want to write a command-line utility, you need to define your user interface. Your user interface in this case is your command-line interface, and the expectation of a user coming to some utility is that the command-line interface will work like this: if you type in the name of the utility with no other options, it'll give you a one-line short description, a reminder of what the syntax of the command is. If you want to find out more, you do the --help thing and you get the longer description of the interface. And then you can provide the values for the various options, either with long names or short names. So that's what a good command-line interface looks like, and actually it's written up in the GNU guidelines for command-line interfaces.
So how can you do this? Well, actually, very easily. There's a tool that I like very much, a package called docopt, which allows you to define your entire command-line interface just by writing a docstring for your script. So you just write this documentation once, and it acts as the help text that will come up when the user uses --help, but it also becomes the input to the docopt function provided by the library, which parses all the arguments, takes all the values from the user from the command line, and gives you back the argument values that you can use directly in your script. So that's the first tool I want to recommend to you, docopt, but there are of course other ways you can approach the same problem.
Okay, so now we have a script with a nice command-line interface. What's the next step? Well, the next step is to put all of this in a project, in a package. The standard we have in Python for laying out code in a directory that will go up on GitHub is like this: the root directory will be the directory for your entire package, and then inside of that you'll have a README file, a setup.py file, and some source code, which can either go into a directory named the same as your module or, better yet, just a directory called src. And then you may have some tests and some docs. So that's where you can put your code.
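
As a rough illustration of the layout just described, the sketch below creates an empty project skeleton with pathlib. The function name create_skeleton, the package name my_package and the placeholder files under tests/ and docs/ are made up for this example; only the README, setup.py, src, tests and docs elements come from the talk.

```python
from pathlib import Path


def create_skeleton(root: Path, package_name: str) -> list:
    """Create an empty project skeleton and return the created paths, sorted.

    Paths are returned relative to `root`, using forward slashes.
    """
    files = [
        root / "README.md",
        root / "setup.py",
        root / "src" / package_name / "__init__.py",
        root / "tests" / f"test_{package_name}.py",  # hypothetical placeholder
        root / "docs" / "index.md",                  # hypothetical placeholder
    ]
    for path in files:
        # Create the parent directories first, then an empty file.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return sorted(path.relative_to(root).as_posix() for path in files)
```

Calling create_skeleton(Path("my-project"), "my_package") would leave a directory that matches the slide's structure, ready to be filled in during the following steps.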
Okay, so that's ready. Now, the next step I was thinking about: what should I do next? Well, I have to refactor my code a little bit, so it's not just one big long function that does everything, but something more maintainable. Maybe I'll have some contributors coming in; maybe they'll want to add some features. It's good to prepare the code in a way that we can all use to work together later. So I'm just going to mention the standard I think we should all be following, which is the Clean Code guidelines from the famous book by Uncle Bob, Robert C. Martin. The TL;DR of Clean Code is basically that you should write small, single-purpose functions with meaningful names and with arguments that have meaningful names. Each function serves a single responsibility, doesn't take many parameters (Uncle Bob says that two is the most you should have) and preferably has no side effects. That allows you to write things that you can easily test, so you should write unit tests for each one of them.
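
To make the docopt and small-function advice concrete, here is a minimal sketch, not the speaker's actual tool. The command name env_tool, the --filter option, the helper names and the hard-coded environment list are all assumptions made for illustration. docopt is a third-party package, and note that it returns a dictionary keyed by option and command names.

```python
USAGE = """Explore locally installed environments.

Usage:
  env_tool list [--filter=<text>]
  env_tool (-h | --help)

Options:
  -h --help        Show this help text.
  --filter=<text>  Only show environments whose name contains <text>.
"""


def filter_names(names, text=None):
    """Return the names containing `text` (case-insensitive), sorted."""
    if not text:
        return sorted(names)
    return sorted(name for name in names if text.lower() in name.lower())


def format_listing(names):
    """Render one environment name per line."""
    return "\n".join(names)


def main(argv=None):
    """Parse the command line with docopt and print the matching names."""
    # Imported here so the pure helpers above work without docopt installed.
    from docopt import docopt  # third-party: pip install docopt

    arguments = docopt(USAGE, argv=argv)  # dict such as {"list": True, "--filter": ...}
    known = ["SpaceInvaders-v0", "CartPole-v1", "Pong-v0"]  # placeholder data
    print(format_listing(filter_names(known, arguments["--filter"])))
```

The usage text is written once and serves both as the help output and as the parser definition. The two helpers take few arguments and have no side effects, so they can be unit-tested without touching the command line; only main does the printing.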
Okay, so some refactoring is done; let's get to the next step. The good practice that we are all following is to use this construct in our module: if __name__ == "__main__", then we execute the function. This actually does two things. One, it allows you to import the code from the module in another file. And two, refactoring the main code into a separate function will come in handy very soon, when I'm talking about entry points.
The next step is to prepare a setup.py file. This is the actual setup.py file that I wrote. It's not perfect, but there was a talk yesterday by Mark Smith about preparing a perfect PyPI package, so you should check that out on YouTube afterwards if you haven't seen it. But this is basically all you have to write to get setuptools to package up your code. The arrow is pointing to a little trick that you can do: if you have a README file written in Markdown, just use that as the long description for your package, which will later be available on PyPI. So I recommend doing that.
If you already have a setup.py file, you can use it. Of course, the basic use of a setup.py file is to prepare your packages, so you can prepare either a source package or a binary distribution package, a wheel, which you can then upload to PyPI. But you can also use setup.py, and you should be using it this way, during local development. Inside of a virtual environment that you created for working on your project, you can use setup.py with the develop option to install it locally. Another way to do this, maybe even better, is to run pip install -e followed by a dot, where the dot indicates the current directory, the one containing the setup.py file. This effectively calls setup.py develop through pip, but allows pip to handle the dependencies as well.
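
A setup.py along these lines might look like the sketch below. It is not the file shown on the slide: the project name, version, package name and command name are placeholders, and the src layout is assumed. It shows the README trick and also includes the console_scripts entry point that the talk turns to next.

```python
from pathlib import Path


def build_metadata(readme_path="README.md"):
    """Collect the keyword arguments that will be passed to setuptools.setup()."""
    return {
        "name": "my-package",  # placeholder project name
        "version": "0.1.0",    # placeholder version
        "description": "One-line summary of the tool",
        # Reuse the Markdown README as the long description shown on PyPI.
        "long_description": Path(readme_path).read_text(encoding="utf-8"),
        "long_description_content_type": "text/markdown",
        # Assumes the code lives in src/my_package/.
        "package_dir": {"": "src"},
        "packages": ["my_package"],
        # "command name = module path:function" (all names hypothetical).
        "entry_points": {
            "console_scripts": ["my_command = my_package.cli:main"],
        },
    }


def run_setup():
    """Hand the metadata to setuptools."""
    from setuptools import setup

    setup(**build_metadata())
```

A real setup.py would call run_setup() (or setup(...) directly) at module level, since pip and setuptools execute the file as a script. It is left uncalled here so that the metadata can be inspected on its own.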
Another thing that setup.py allows you to do is to define entry points, and this is a very useful feature of setuptools that not everybody takes advantage of. Entry points allow you to combine multiple packages into systems of plugins, so you can have a main package and other packages that are plugins for that package, and these things can be defined through entry points. But a very simple use case for entry points is to define the console_scripts entry point, which just creates a command. So my_command in this case will become the command that your user will be able to call at the command line after they install your package, and this syntax here maps it to a specific function in a specific file in a specific module. So if you want to write a command-line utility, you should probably write a console_scripts entry in your setup.py file.
Okay, so the next subject that you need to take care of is requirements. This is a big subject, which I will only be able to skim over due to time limitations, but the gist of it is that you need to provide a way for your users to set up an environment that resembles your environment as closely as possible. The way I use requirements.txt is to provide a list of specific packages at specific versions that I've tested the package with. This is very useful for your users: if it's not working for them, they can check whether one of the dependencies is at a different version. The simplest way to create a requirements file is to use pip freeze, and then you can install these with pip install. I would recommend separating the requirements that you need for actually running the package from the ones that you only need for testing, because that will come in handy later when you're automating some CI processes.
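
The point about checking whether a dependency is at a different version can be sketched with the standard library alone. The helper names below are invented for this example, and the parser is deliberately simplified: it only understands name==version lines, as produced by pip freeze, and ignores everything else.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {name: version} for lines of the form `name==version`."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or unsupported line; ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

Running find_mismatches(parse_pins(open("requirements.txt").read())) inside a user's environment would list every package whose installed version differs from the tested one. The lookup parameter exists only so the comparison can be tested without installing anything.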
So I'm not going to get into pipenv or other approaches to handling requirements, but you should look into those if you're curious.
Okay, next best practice. This is official now, I think: we should all use Black. It's very simple to use. You just install it, and then you run it on your source code, and it just reformats the hell out of it. But it does it in a consistent way. You may not like the result, but the way it works is consistent and we can all agree on it, and it's a huge value that we don't have to argue about how we're going to be formatting commas at the end of lines. There's a way that is standardized, so let's just all stick to it.
Black is just one formatter that you can use; you can actually have a number of them. If you use them, then a very good practice is to use them together with pre-commit. pre-commit is a simple tool that you install, and then the first time you want to use it you run the command pre-commit install, and it sets up a Git pre-commit hook for running all of your code formatters. If you want to use pre-commit with Black, then the configuration file on the left, which you should store in a special YAML file called .pre-commit-config.yaml, will download Black from the internet and prepare it for running. The next time you want to commit a change, you type git commit, and that will trigger Black and run it on your files. If anything is changed by Black, meaning that it had to be reformatted, it will prevent you from actually committing the change. So it's a useful tool for very quickly checking your formatting before you even commit the change.
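
The configuration file on the slide is not reproduced in the transcript. A minimal .pre-commit-config.yaml that runs Black could look like the fragment below; the rev value is only an example tag and should be replaced with the Black release you actually use.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0  # example tag only; pin the release you have tested
    hooks:
      - id: black
```

With this file in the repository root, pre-commit install registers the Git hook, and each subsequent git commit runs Black on the staged Python files.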
Another good way to test if you're actually following all the standards is to use a code linter. My favorite is flake8, but there are of course many others. Why I like flake8 is that it has the plugin architecture that I described before: you have flake8 as the main module that you install, but then there are many, many other flake8 packages that you can add onto it. In this list you just have the ones that I like to use; you can find others. They can test not just the compliance of your code with PEP 8, which is of course the standard we should all be following, but they can also look for bugs and common mistakes, check the sorting of your imports with isort, and other things that you would like to have in your code. It can all be tested with these flake8 plugins.
It's very easy to configure. You can put the configuration in tox.ini, in the flake8 section, and define some values like line length. Now, because we should all use Black, the official line length became 88, which is the 80-character line length plus a 10% tolerance. You can exclude some checks from flake8 if you want by adding this ignore instruction in there. Now if you run the flake8 command, it will load all of these plugins, run all of your source code through all of these tests, and inform you if you're missing something, or if something is not formatted correctly, or maybe you have a common bug or a security fault that you didn't notice somewhere in your code. So this is very useful.
Another useful check is mypy and the type annotations that are now available in Python 3. This takes a bit more work, because you actually have to add the type annotations to all of your code. But if you do it, it pays off, because you can do static type analysis of your code before you commit it. This will check whether anywhere in your entire code base you're calling something with the wrong type of argument, and this can sometimes find bugs that are silly, that you really didn't mean to make, but somehow you're calling a function with the wrong variable, for example. Normally you would have to find that somewhere and fix it, but mypy can find these types of issues for you very quickly, without even running your code. So use mypy for this purpose, if you have the patience to put the type annotations everywhere, but I recommend it. Okay, so now we have some checking.
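
A small, made-up example of the kind of mistake described here: the two functions below are fully annotated, and the commented-out call at the end swaps its arguments. The function names are hypothetical; the point is only that a static checker such as mypy can flag the bad call as an incompatible argument type without executing anything.

```python
from typing import List


def average_reward(rewards: List[float]) -> float:
    """Return the mean of a non-empty list of rewards."""
    return sum(rewards) / len(rewards)


def describe(name: str, rewards: List[float]) -> str:
    """Format an environment name together with its average reward."""
    return f"{name}: {average_reward(rewards):.2f}"


# The call below has its arguments in the wrong order. Python would only fail
# at runtime, deep inside the f-string formatting, whereas a static type
# checker can report the mismatch as soon as the file is checked:
#
#     describe([1.0, 2.0], "SpaceInvaders-v0")
```

Running mypy on the file containing this code (for example, mypy path/to/module.py) is the verification step; the annotations themselves do not change how the code runs.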
How do we put it all together? Well, the tool that everybody's recommending these days is tox. tox is very simple to configure, and it can put all of your tests together into one thing. A simple tox configuration is written in the box on the left. It defines a list of environments that will be tested, in this case Python 3.5, 3.6 and 3.7, and then the definition of the testing environment: the dependencies and the commands we want to run. Even configuration sections for other tools can all be put into this one tox.ini file.
With this setup, all you have to do is run the tox command, or, if you want to run just a single environment, you can run the tox command with the -e option and the name of an environment. It will start by creating a virtual environment for that specific version of Python, installing all the dependencies into that environment, packaging up your code and installing it into the virtual environment, and then running the commands. Here I have flake8 and pytest, but you can build on top of that. All the tests that you need can be run with one call to tox. So that's very useful, and it'll come in handy in a second when we're putting this all into a CI system.
But if we're going to be testing things well, we need to write unit tests. This is where refactoring the code into small functions comes in handy, because now you can write simple unit tests for each function. The test tool that I think is becoming more and more popular all the time is pytest. It's really easy to use, and you can write tests with minimal boilerplate: just import your function, run it, put some assert statements into a test, and you have a test.
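
As a minimal sketch of what such a test looks like, assume a small helper called clamp (a made-up function, not part of the speaker's project). The function and its tests are shown in one block for brevity; in a real project the tests would live in their own file and import the function.

```python
def clamp(value, low, high):
    """Limit `value` to the inclusive range [low, high]."""
    return max(low, min(value, high))


# In a real project these tests would live in a file such as
# tests/test_clamp.py and start with an import like
# `from my_package.utils import clamp` (hypothetical module path).
def test_clamp_keeps_values_inside_the_range():
    assert clamp(5, 0, 10) == 5


def test_clamp_limits_values_outside_the_range():
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10
```

pytest discovers functions whose names start with test_ in files named test_*.py, so no test classes or runner code are needed.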
That's all you have to do. So it's easy to get started, and then you just run all the tests with a simple call to the pytest command.
Okay, so now all the code is prepared, everything is done, and we're ready to share a pretty robust project with the community. So what do we do? Well, of course, we put it up in a Git repository. These days GitHub is king, but GitLab is of course a popular alternative, and there's Bitbucket as well. So I'm not going to recommend just GitHub, but it is the one that has the best integration for all the tools that I will be talking about from now on. So you just set up a Git repository, add all your code to the repository and push it. If you're creating a repository, remember to put a .gitignore file into your Git repository, and the license. The license is the thing that's easy to forget, but it's critical: if you don't put a license in your code, no one can use it. So put the license in.
Set up the Git repository, and then you can proceed to setting up a continuous integration environment. I like to use Travis, but of course, as with everything, there are alternatives. Travis is easy to use because you just drop another YAML file into your repository, and when you put in one like this that calls tox, that one call to tox will run all your tests. If you do that, set up an account on Travis and add your repository to that account, the testing will start, and you will start seeing these little checks on all the PRs that you make to your repository. They are very useful even for yourself: when you're writing code, you can put your own changes through the PR process and see if they pass all the tests.
Another useful tool that's available for free for anybody who has an open-source repository on GitHub is a requirements updater. I like to use the pyup bot specifically for Python requirements, but there's also Dependabot, which is free for other languages as well. There's no configuration required. You just set it up by creating an account and giving it access to your repository, and then the bot will scan your requirements file and figure out whether the requirements are up to date with the versions on PyPI. If they're not, it will start creating pull requests with updates to specific versions of packages. And if you have a CI process in place, then you will know which ones you can merge and which ones you can't, because the ones you can merge are the ones that have a green check mark, and the ones that tests failed for will have a cross.
So that's very useful. Okay, another useful thing is to check your test coverage. The pytest library and other Python unit test libraries can check which lines of your source code were hit when running your test suite and then give you a report. This is actually very easy to use. For pytest, you just add the --cov option and specify your module, and then you'll get a report for your module. If you want more information, you can ask for an HTML report, and that will generate a code coverage report as HTML files, which show you exactly which lines of your source code are tested and which ones are not, so you can see where you still need to add tests. You can then integrate this with another online service, which will track the test coverage over time and maybe even prevent you from merging changes that decrease the code coverage in your repository.
Another thing I want to mention is code review. If you're working as a team, the best thing you can do for each other is to review each other's code. The people you work with have the chance, and you have the chance, to tell them which part of the music they're playing you really like and which one you think should be a little better, and the moment to do this is during code review. But there are also services now, and I think they're getting better, although they're far from perfect yet, that do automated code review. You can sign up for something like Codacy or Code Climate, and it will look at all the PRs in your repository, find things that may be wrong with the code, and make a code review on your PRs.
Okay, another bot you can employ is something that will automatically merge PRs. mergify.io is one that I recently set up. You can configure rules that apply to your PRs, and if a PR matches these rules, then it will get merged automatically. For example, this is a configuration for automatically merging a PR that has passed CI and has at least one positive review. If your PR matches this, then Mergify will merge it, and you can even set up a different rule that will delete the old branch.
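
The slide's configuration is not captured in the transcript. The fragment below is a sketch of two such rules, written from memory of the Mergify rule syntax from around the time of the talk; the condition and action names (in particular status-success and the Travis status string) are assumptions that may differ in current Mergify versions, so check them against the Mergify documentation before use.

```yaml
# .mergify.yml
pull_request_rules:
  - name: merge automatically once CI passes and one review approves
    conditions:
      - "#approved-reviews-by>=1"
      - status-success=continuous-integration/travis-ci/pr
    actions:
      merge:
        method: merge
  - name: delete the branch after merging
    conditions:
      - merged
    actions:
      delete_head_branch: {}
```

Each rule pairs a list of conditions with the actions to take when a pull request satisfies all of them, which is the matching behaviour described in the talk.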
So if you have all this in place, you actually have bots working for you. The pyup bot will find updates to packages on PyPI, Travis will test whether your code passes its tests with these packages, and if everything passes, Mergify can merge these PRs. So without you doing anything, your project can be kept up to date with its dependencies on PyPI.
So I'm getting to the end of my story. Now you're ready to publish your project on PyPI. This is very easy: we have a tool called twine, so once your packages are built you can upload them with twine. You just need to set up an account on PyPI, and your package is published, you're happy, and everybody can use it. I wrote up all the details. I know this went fast, but everything that I said is in an article on my blog, so you can read it at your own pace. And with that, thank you very much.
Thank you so much, Michał, for this very interesting talk.
Do we have questions for Michał?
So, yeah, I haven't set up automated documentation for this particular project, because there isn't much documentation, but I'm torn between Sphinx and MkDocs. I'm a fan of Markdown, so I don't like reStructuredText, which makes me biased against Sphinx, but I think Sphinx is very powerful and I've seen it used to good effect by people, so I guess in the right hands... Thank you.
Okay, do we have another question for Michał? Yes, let's use the microphone for the recording.
Recently I worked on automated versioning, and I came to bump2version together with a script that I made myself. Do you know some tool to do it in an automated way, or do we have to do it ourselves at the moment, to bump the version of the code that you're writing?
That's a good question, and I don't think I have encountered a tool that actually does this. So far I've been doing it manually, but it would be a good thing to automate, yes. True. Yeah. Thank you.
We have one more question over there. I'll come to you.
Do you know if there's a way to install pre-commit globally, or as a Git repo template? Because my experience is that people forget to install pre-commit hooks when they start new projects.
Well, if they forget to install pre-commit, they will pay for it after they commit, because the PR will not pass your tests. So it benefits them to install it, and they'll have motivation to do it at some point.
Do we have more questions? Raise your hands. Okay, so if we don't have further questions for Michał, another round of applause for Michał.