Analyzing 750 billion events and 46 TB of code

Formal Metadata

Title
Analyzing 750 billion events and 46 TB of code
Subtitle
What you can learn from GitHub's shared data on BigQuery
Alternative Title
Analyze terabytes of OS code with one query: How to leverage the code shared on GitHub with ease
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2017

Content Metadata

Subject Area
Genre
Abstract
Google has made available a BigQuery copy of most open source code shared on GitHub. This allows any interested party to analyze 5 years of GitHub metadata and more than 42 terabytes of code easily. In this session we'll cover how to leverage this data to understand the community around any language or project. With this, design requests and decisions can be made by looking at the actual patterns discoverable through analytical methods.

During a lightning talk we can quickly see:
* How this is run.
* How coding patterns have changed through time.
* How to guide your project design decisions based on actual usage of your APIs.
* How to request features based on data.
* The most effective phrasing to request changes.
* The effects of social media on a project's popularity.
* Who starred your project, and what other projects interest them.
* How to measure community health.
* How to run static code analysis at scale.