We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

PySpark and Warcraft Data

Formal Metadata

Title
PySpark and Warcraft Data
Title of Series
Part Number
123
Number of Parts
173
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Production PlaceBilbao, Euskadi, Spain

Content Metadata

Subject Area
Genre
Abstract
Vincent Warmerdam - PySpark and Warcraft Data In this talk I will describe how to use Apache Spark (PySpark) with some data from the World of Warcraft API from an iPython notebook. Spark is interesting because it speeds up iterative processes on your hadoop cluster as well as your local machine. I will give basic benchmarks (comparing it to numpy/pandas/scikit), explain the architecture/performance behind the technology and will give a live demo on how I used Spark to analyse an interesting dataset. I'll explain why you might want to use Spark and I'll also go in and explain when you don't want to use it. The dataset I will be using is a 22Gb json blob containing auction house data from all world of warcraft servers over a period of time. The goal of the analysis will be to determine when and if basic economics still applies in a massively online game. I will assume that the everyone knows what the ipython notebook is and I will assume a basic knowledge of numpy/pandas but nothing fancy. The dataset has been chosen such that people who are less interested in Spark can still enjoy the analysis part of the talk. If you know very little about data science but if you love video games then you should like this talk.
Keywords