
Building a Naive Bayes Text Classifier with scikit-learn

Formal Metadata

Title
Building a Naive Bayes Text Classifier with scikit-learn
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Machine learning algorithms commonly used for text classification include Support Vector Machines and k-Nearest Neighbours, but the most popular algorithm to implement is Naive Bayes because of its simplicity: it is a direct application of Bayes' Theorem. The Naive Bayes classifier learns the relationships between the training attributes and the outcome, and predicts by multiplying the conditional probabilities of the attributes under the assumption that they are independent of one another given the outcome. It is popularly used for classifying data sets with a large number of sparse, nearly independent features, such as text documents. In this talk, I will describe how to build a model using the Naive Bayes algorithm with the scikit-learn library, using the spam/ham YouTube comment dataset from the UCI repository. Preprocessing techniques such as text normalisation and feature extraction will also be discussed.
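
A minimal sketch of the workflow described above, assuming one of the per-video CSV files from the UCI YouTube Spam Collection with CONTENT (comment text) and CLASS (1 = spam, 0 = ham) columns; the file name, column names and split parameters here are illustrative, not taken from the talk itself.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load one comment file (hypothetical path and column names).
df = pd.read_csv("Youtube01-Psy.csv")
X, y = df["CONTENT"], df["CLASS"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Feature extraction: normalise to lowercase, drop English stop words,
# and turn each comment into a sparse bag-of-words count vector.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Multinomial Naive Bayes suits sparse word-count features.
clf = MultinomialNB()
clf.fit(X_train_counts, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test_counts)))

A TfidfVectorizer could be substituted for CountVectorizer if weighted term frequencies are preferred over raw counts; the rest of the pipeline stays the same.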