GrimoireLab: a Python toolset for software development analytics

FOSDEM VZW

Gonzalez-Barahona, Jesus M.

Formal Metadata

Title

GrimoireLab: a Python toolset for software development analytics

Title of Series

FOSDEM 2017

Number of Parts

611

Author

Gonzalez-Barahona, Jesus M.

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42071 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The talk will explain how to analyze software development repositories of common use in the free software community with GrimoireLab tools, a toolset for software development analytics written in Python. It will start by explaining how to retrieve data from git, Bugzilla, GitHub, mailing lists, StackOverflow, Gerrit, and many other repositories, and how to organize it in a database. The talk will later explain how this database can be exploited with several components of the toolset, for different purposes. In this context, special attention will be given to how to extract useful information from it using Python/Pandas and iPython/Jupyter Notebooks, and how to use ElasticSearch/Kibana to deploy actionable dashboards that show data in all its glory.

Many free / open source software (FOSS) projects feature an open development model, with public software development repositories which anyone can browse. These repositories are normally used to find specific information, such as a certain commit or a particular bug report. But they can also be mined to extract all relevant data, so that it can be analyzed later to learn about any specific or general aspect of the project. This talk will explain the GrimoireLab method for doing that, which is based on organizing all that information in a database, which can be later analyzed. This approach allows for minimal impact on the project infrastructure, since data is retrieved only once, even if it is later analyzed many times. It also allows for efficiency and comfort when mining data for an analysis, since the results are readily available, databases can be shared and replicated at will, and querying them with any kind of tool is easy.

The tools that retrieve information from the repositories are grouped in the GrimoireLab toolset. It includes mature, widely tested programs capable of extracting information from most repositories used by FOSS projects of any scale. Many of them are agnostic with respect to the database used, although currently ElasticSearch is the best supported.

The produced databases can be exploited in several ways, of which two will be explained during the talk: using Python/Pandas to produce iPython/Jupyter Notebooks which analyze some aspect of the project; and using Python to feed an ElasticSearch cluster, with a Kibana front-end for visualizing the data in a flexible, powerful dashboard.

All these approaches can be used to understand general aspects of the project, such as how efficient the code review or bug fixing processes are, how diverse contributions to the git repository are, or how conversations in mailing lists or StackOverflow are shaped. But they can also be used to drill down and analyze the contributions by a certain developer, the longer code review processes, or the contents of the most lively email and Q&A threads.

The talk will explain the whole process from data retrieval to visualization, and will show some specific cases of real world use, such as the dashboards produced for Eclipse, OPNFV, MediaWiki and many others. Some of the contents of the talk are described in detail in the online book GrimoireLab Training.
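
The retrieve-once, analyze-many-times approach described in the abstract can be illustrated with a minimal sketch. It is not GrimoireLab code: it uses only the Python standard library, swaps in an in-memory SQLite database for the ElasticSearch store mentioned in the abstract, and works on a few made-up commit records. The names `SAMPLE_COMMITS`, `store_items`, `load_items`, `commits_per_author` and `commits_per_month`, as well as the record fields, are hypothetical and chosen only for illustration.

```python
import json
import sqlite3

# Hypothetical commit records standing in for data retrieved once from a
# repository; the field names are illustrative, not a GrimoireLab schema.
SAMPLE_COMMITS = [
    {"hash": "a1", "author": "alice", "date": "2017-01-10", "files": 3},
    {"hash": "b2", "author": "bob", "date": "2017-01-11", "files": 1},
    {"hash": "c3", "author": "alice", "date": "2017-02-02", "files": 5},
]


def store_items(conn, items):
    """Store each retrieved item once, keyed by its identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, data TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO items (id, data) VALUES (?, ?)",
        [(item["hash"], json.dumps(item)) for item in items],
    )
    conn.commit()


def load_items(conn):
    """Read all stored items back without touching the original repository."""
    rows = conn.execute("SELECT data FROM items ORDER BY id")
    return [json.loads(data) for (data,) in rows]


def commits_per_author(items):
    """First analysis: number of commits by each author."""
    counts = {}
    for item in items:
        counts[item["author"]] = counts.get(item["author"], 0) + 1
    return counts


def commits_per_month(items):
    """Second analysis over the same data: commits per YYYY-MM month."""
    counts = {}
    for item in items:
        month = item["date"][:7]
        counts[month] = counts.get(month, 0) + 1
    return counts


conn = sqlite3.connect(":memory:")
store_items(conn, SAMPLE_COMMITS)
store_items(conn, SAMPLE_COMMITS)  # storing again adds no duplicates
stored = load_items(conn)
print(commits_per_author(stored))
print(commits_per_month(stored))
```

Both analyses read from the stored copy, so the source repository would be contacted only during retrieval, which is the property the abstract highlights. In a real setup the sample list would be replaced by the output of a retrieval tool and the SQLite table by whichever database the toolset supports.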