Top 15 Python Tips for Data Cleaning/ Understanding

EuroPython

Chua, Hui Xiang

Formal Metadata

Title

Subtitle

With two bonus tips!

Series Title

EuroPython 2020

Number of Parts

130

Author

Chua, Hui Xiang

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and make publicly available the work or content, in unchanged or adapted form, for any legal and non-commercial purpose, as long as you credit the author/rights holder in the manner they specify and pass on the work or content, including in adapted form, only under the conditions of this license.

Identifiers

10.5446/49924 (DOI)

Publisher

EuroPython

Year of Publication

2020

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Data cleaning is one of the most important tasks in data science, but it is unglamorous, underappreciated and under-discussed. Common data cleaning tasks include, but are not limited to:
- Merging/appending
- Checking completeness of data
- Checking for valid values
- De-duplication
- Handling of missing values
- Recoding

Most, if not all, of the time, the datasets that we have to analyze are unclean, i.e. they are not necessarily complete, accurate or valid. This will impact the accuracy of our analysis if we do not clean them properly. This talk covers how to perform data cleaning and understanding using primarily Pandas and NumPy. If you're new to data analytics/data science and are interested in how to use Python to perform analysis, or if you're an Excel user hoping to move to Python, this talk might be for you. Participants should be at least familiar with the basics of Python programming.
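
As a rough illustration of the cleaning tasks named in the abstract, here is a minimal pandas/NumPy sketch. It is not taken from the talk: the `customers` and `orders` tables, their column names, the set of allowed gender codes and the choice of median imputation are all assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; tables and column names are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "gender": ["M", "F", "F", "X", None],
    "age": [34, 29, 29, np.nan, 51],
})
orders = pd.DataFrame({
    "customer_id": [1, 2, 5],
    "amount": [120.0, 80.5, 42.0],
})

# De-duplication: drop rows that are exact repeats of an earlier row.
customers = customers.drop_duplicates().reset_index(drop=True)

# Completeness check: count missing values per column.
missing_counts = customers.isna().sum()

# Valid-value check: flag non-missing entries outside an assumed set of codes.
valid_genders = ["M", "F"]
invalid_gender = customers["gender"].notna() & ~customers["gender"].isin(valid_genders)

# Missing values: fill missing ages with the median (one option among many).
customers["age"] = customers["age"].fillna(customers["age"].median())

# Recoding: map short codes to readable labels; unmapped codes become NaN.
customers["gender_label"] = customers["gender"].map({"M": "Male", "F": "Female"})

# Merging: a left join keeps every customer; `indicator=True` adds a `_merge`
# column showing which customers had no matching order.
merged = customers.merge(orders, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
```

Inspecting `missing_counts`, `invalid_gender` and `unmatched` before changing anything is usually the safer order, since it shows how much of the data each cleaning step would affect.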