Building a Naive Bayes Text Classifier with scikit-learn

EuroPython

Agbaneje, Obiamaka

Formal Metadata

Title

Building a Naive Bayes Text Classifier with scikit-learn

Series Title

EuroPython 2018

Number of Parts

132

Author

Agbaneje, Obiamaka

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in modified form, only under the terms of this license.

Identifiers

10.5446/44904 (DOI)

Publisher

EuroPython

Year of Publication

2018

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Machine learning algorithms used for text classification include Support Vector Machines and k-Nearest Neighbors, but the most popular algorithm to implement is Naive Bayes, which is based on Bayes' theorem, because of its simplicity. The Naive Bayes classifier learns the relationships between the training attributes and the outcome, and predicts by multiplying the conditional probabilities of the attributes given each outcome, under the assumption that the attributes are independent of one another given the outcome. It is widely used for classifying data sets with a large number of features that are sparse or nearly independent, such as text documents. In this talk, I will describe how to build a model using the Naive Bayes algorithm with the scikit-learn library, using the spam/ham YouTube comment dataset from the UCI repository. Preprocessing techniques such as text normalisation and feature extraction will also be discussed.
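
The workflow outlined in the abstract could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the code from the talk: the comments and labels are a small made-up toy corpus (not the UCI YouTube spam collection), and the choice of `CountVectorizer` with English stop-word removal and `MultinomialNB` with Laplace smoothing is one plausible configuration among several.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy comments for illustration only;
# these are NOT taken from the UCI YouTube spam collection.
train_comments = [
    "Check out my channel and subscribe for free gifts",
    "Click this link to win free money now",
    "Subscribe to my channel please, free giveaway",
    "I love this song, it brings back memories",
    "Great video, the guitar solo is amazing",
    "This song is stuck in my head all day",
]
train_labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = spam, 0 = ham

# Text normalisation (lowercasing, stop-word removal) and feature
# extraction (bag-of-words counts) are both handled by CountVectorizer;
# MultinomialNB then estimates per-class word probabilities with
# Laplace smoothing (alpha=1.0).
model = make_pipeline(
    CountVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(alpha=1.0),
)
model.fit(train_comments, train_labels)

new_comments = [
    "Subscribe to my channel for free money",
    "I love the guitar in this song",
]
predictions = model.predict(new_comments)          # predicted class per comment
probabilities = model.predict_proba(new_comments)  # columns follow model.classes_

for comment, label in zip(new_comments, predictions):
    print(("spam" if label == 1 else "ham") + ": " + comment)
```

With the real dataset, the toy lists would be replaced by the comment text and class columns loaded from the downloaded files, and the model would be evaluated on a held-out split rather than on hand-picked examples.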