Is that spam in my ham?

Video in TIB AV-Portal: Is that spam in my ham?

Formal Metadata

Is that spam in my ham?
Title of Series
Part Number
Number of Parts
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Lorena Mesa - Is that spam in my ham?

Beginning programmers or Python beginners may find it overwhelming to implement a machine learning algorithm. Increasingly, machine learning is becoming applicable to many areas. This talk introduces key concepts and ideas and uses Python to build a basic classifier - a common type of machine learning problem - providing some jargon to help those who may be self-educated or currently learning.

-----

Supervised learning, machine learning, classifiers, big data! What in the world are all of these things? To a beginning programmer, the questions described as "machine learning" questions can be mystifying at best. In this talk I will define the scope of a machine learning problem - identifying an email as ham or spam - from the perspective of a beginner (not a master of all things "machine learning") and show how Python can help us simply learn how to classify a piece of email.

To begin we must ask: what is spam? How do I know it "when I see it"? From previous experience, of course! We will provide human-labeled examples of spam to our model for it to understand the likelihood of spam or ham. This approach, using examples and data we already know to determine the most likely label for a new example, is the Naive Bayes classifier. Our model will look at the words in the body of an email, finding the frequency of words in both spam and ham emails and the frequency of spam and ham overall. Once we know the prior likelihood of spam and what makes something spam, we can try applying a label to a new example.

Through this exercise we will see at a basic level what types of questions machine learning asks, learn to model "learning" with Python, and understand how learning can be measured.
Good afternoon everyone, welcome to this afternoon session. I'd like to introduce our speaker: she's a platform engineer at Sprout Social in Chicago, she's a Star Trek fan, and she's going to talk to us about spam and natural language processing. [Mic check.] Thank you so much for joining me this afternoon. The name of this talk is "Is that spam in my ham?": text analysis and an introduction to classification. As my announcer said, my name is Lorena Mesa, and as you can see, live long and prosper. I'm here from Chicago. A little bit about me and why I wanted to chat on this topic: I'm actually a career changer. A few years ago I came from being a data analyst in the social sciences; specifically, I worked at Obama for America doing data governance, and I switched into software engineering about three years ago.
Some of the big questions that were driving me at the time are captured in this talk. Some other things I do: I love our community groups, and I hope you enjoyed the workshop yesterday; if you have the opportunity to help with one, or would like to sign up for another one, please do. I founded the PyLadies Chicago group, and I was recently voted onto the board of directors of the Python Software Foundation, which I'm very excited about.
So, a little bit about a great experience that we've all had before. I think we've all had some kind of e-mail at some point in time that lands in our inbox with language like "jump-start your slow PC", and of course we all trust e-mail that comes from an "AOL_member_info@" sort of address. Of course, I never trust anything that tells me "this is great, you really should do it." So I think when we see spam, we usually know just by looking at it that it's a piece of spam: we junk it, we don't care about it, we ignore it. So how do we move from saying "I know it when I see it" to saying "I can programmatically detect whether something is spam" by using Python?
In this talk I'm thinking about three questions: one, what is machine learning? Two, how is classification a part of this world? And three, how can I use Python to solve a classification problem like spam detection? This talk is going to be really focused on a beginner's understanding of machine learning, so if you are looking for more intermediate and advanced topics, I know this to be a great conference to check some of those out; we're really taking this from the land of the beginner. So: machine learning.
If you're like the fellow on the top left, a bit confused, not sure what machine learning is, you might ask: is it a robot? Is it Johnny 5, the robot superhero from the children's movie I loved as a little kid? He's super quirky, he can move his eyebrows. Well, I don't really think machine learning is Johnny 5, so let's go ahead and think a little bit about what machine learning is.
One of the things I like to do when I begin working on a new problem is to find some language to ground myself, to understand what types of problems will be solved. If I look around for language defining machine learning, I might find discussions of "pattern recognition", "computational learning", "artificial intelligence"... what's going on? I don't know about all that, but there is a part that does make sense to me: the study of algorithms that can learn from and make predictions on data. I like that; I want algorithms, tell me more. An even better way to think about machine learning is to borrow some language from Tom Mitchell, the chair of the Machine Learning Department at Carnegie Mellon University, whose book Machine Learning is kind of a quintessential text for folks learning about the field. He says we can think about machine learning in a three-part way: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. OK: we have a task, we have experience, we have a performance measurement. I can do this; this makes sense. So what about experience?
How do we know what we know? Well, I'm a human, and the way I know what I know comes from my memory. I have memories stored up that teach me things about what I like, what I don't like, what I should do, what I shouldn't do. As a kid I was a very active child; I'd be running around like a maniac all the time because I had to be super fast. But what happens when you run around as a little kid, your body still growing? You might trip, you might fall and skin your knee. How many times did that happen before the lesson stuck? It probably took quite some time for me to learn that I shouldn't run around like a maniac, that I should walk around like a normal person so I don't hurt myself. That became a teaching experience for me. Likewise, my grandmother would be in the kitchen making mole, and because I love mole I would always try sticking my hand near the stove, and more than once I definitely burned my hand. The idea of putting your hand near red-hot oil is not very smart, so over time I learned to recognize that as a sign I shouldn't do that. So if we think of experiences as the making of our memory, what does that mean in different problems? What if I ask about the historical experience of the stock market?
Well, if I want to understand what a piece of stock has done historically, I can look at what the records tell me about the price of that stock two years ago on July 17th, one year ago on July 17th, and, depending on how far back I want to go with my analysis, I have historical data that can tell me something about the historical performance of that stock. So for a human we have memories; in other settings we have historical data that can teach us something. Coming back to machine learning and classification, then: what does experience actually mean?
Let's bring this into Mitchell's framework. Our first step is going to be identifying a task. We want to classify a piece of data, so our question is: is an e-mail spam or ham? The idea of "ham" here is just anything that's not spam. It's cute, just go with it. So "spam or ham" is the task. For our experience, we're going to have a set of labeled training data. Essentially what that means is we have a collection of e-mails, and each has a label saying the e-mail is either ham or spam: a collection of e-mails that we already know to be one thing or the other. And then the performance measurement is: is the label correct? What we need to do is be able to verify whether e-mails are indeed spam or ham.
Thinking about a classifier we can use, we can think of naive Bayes. Naive Bayes is a type of probabilistic classifier. I love this image; I really want to know who hung a neon light of Bayes in their office front window. I don't know who that person is, but I thought it was really great. Naive Bayes comes to us from statistical theory; it's based on Bayes' theorem, no surprise. And one of the key things with Bayes' theorem here, when we talk about the likelihood of events, is that we treat events as independent of one another. That's where the "naive" assumption comes from when we say we're going to use a naive Bayes classifier.
For those of us who may not remember exactly what it means to talk about independent and dependent events, let's have a quick refresher. If I were to ask you the probability of flipping a quarter six times in a row and getting heads every time, how would you go about solving that? Well, on the first flip I have two outcomes, heads or tails, so the likelihood of getting heads is 0.5. The second time it's again 0.5, the third time, and so forth: each flip is 0.5. The likelihood of flipping a quarter and receiving multiple heads in a row is built from outcomes that are independent of one another; when we talk about independent events, one outcome doesn't affect the next. Contrast that with dependent events: the horse race on the right-hand side. If my question is the likelihood that horse number 5 is going to win the derby, one of the things I would say is that we need to think about the weather conditions: is it rainy, is it sunny? Perhaps also the age of the horse, the health of the horse. There can be other things tied up in the likelihood of horse number 5 winning. In this context, the probability of horse number 5 winning is dependent on other things, for example the weather. So when we talk about naive Bayes, our assumption is that we have independent events.
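The coin-flip arithmetic above is easy to check; a minimal Python sketch of the independence idea:

```python
# Six independent flips of a fair coin: the joint probability of
# heads every time is the product of the individual probabilities.
p_heads = 0.5
p_six_heads = p_heads ** 6  # 0.5 * 0.5 * ... six times

print(p_six_heads)  # 0.015625
```

Each flip contributes the same factor precisely because no flip depends on the previous one; dependent events like the horse race would not factor this way.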
So when we talk about e-mails, we're really going to be thinking about the words that make up the e-mail. Let's think about these words in pairs: for the likelihood of the word "Messi" appearing with the word "Barcelona", we're going to assume that there is no relationship. That's what naive Bayes tells us to do, even though you might think there is one. Or, back to our spammy language: what's the relationship between "buy" and "now"? We're going to assume there is no relationship, that the likelihood of "buy" is not going to impact the likelihood of "now" appearing in the corpus of words that forms an e-mail.
So for our spam classifier, again, our question is: what is the probability of an e-mail being ham or spam? With Bayes' theorem, here in the middle, we have three things we need to find: one, the likelihood of the predictors given the class; two, the prior probability of the class; and three, the prior probability of the predictors. All of these together help us compute the a-posteriori probability of the class. When I say class, our classes here are ham and spam; those are our only two classes. Our predictors are going to be the words in the e-mail itself. For example, if I'm looking at a piece of content and asking what's the likelihood of a predictor being in the spam class, then for the word "free" we might find that 28 out of 50 spam e-mails have the word "free" in them. We do this for each word in our e-mail, and we find the likelihood of all the predictors by multiplying them together. We also need to consider the prior probability of the class: given the entire collection of data we're looking at, how many documents are of one class and how many of the other? So if we have 150 e-mails we're working with, we might say 50 of those documents are spam: 50 out of 150. And the prior probability of the predictor is saying, how many times has the word "free" appeared across all our e-mails? Let's say 7 out of 150. And there you go: Bayes' theorem here is basically frequency tables. How many times has a word appeared in a class? How many times has a class appeared in the collection of things we're looking at? Great, we've made some calculations, we've found some values between 0 and 1. Which one do we pick? Pretty easy: whichever one has the higher, the maximum, a-posteriori probability. The reason we say "a posteriori" here is that we're not looking at anything new; we're looking at historical data, things that have already happened. Once we've made a calculation for the class ham and for the class spam, we simply pick the larger of the two, and we say this e-mail is going to be either ham or spam. Pretty simple.
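In Python, that comparison is just a few lines. The 28-out-of-50 "free" count and the 50-out-of-150 prior are the talk's toy figures; the ham count of 7 out of 100 is an assumption added here for illustration:

```python
# Toy naive Bayes comparison for a one-word e-mail containing "free".
spam_with_free, spam_total = 28, 50   # 28 of 50 spam e-mails contain "free"
ham_with_free, ham_total = 7, 100     # assumed: 7 of 100 ham e-mails contain "free"
all_emails = spam_total + ham_total   # 150 e-mails in the collection

# Score per class: likelihood of the predictor times the class prior.
# The shared denominator P("free") is identical for both classes, so it
# can be dropped when we only need the maximum a-posteriori class.
spam_score = (spam_with_free / spam_total) * (spam_total / all_emails)
ham_score = (ham_with_free / ham_total) * (ham_total / all_emails)

label = "spam" if spam_score > ham_score else "ham"
print(label)  # spam
```

With more than one word per e-mail, each class score would simply pick up one likelihood factor per word, as described above.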
So why naive Bayes? Well, I think just from walking through that we can see it's pretty straightforward; it's as simple as frequency tables. I think we can all do this together. It may seem a little bit daunting at first, but once you start seeing the application of it, you can see that it's pretty approachable. So if you are starting to think about classifiers and classification problems, this is a great one to start with: the math is accessible, and once you're comfortable you can use other algorithms. We will talk about some of the limitations in a moment, but this is a good one to start with. That's great, but how do we use this to detect spam?
OK, well, I cheated a little bit. I didn't do all my own data collection, munging and cleaning, as one does; instead I went and found a resource out there that was already cleaned and labeled for me. Where did I get it? Kaggle. Kaggle is a website that hosts competitions; there's a classroom component with more teaching-oriented problems, and they have open competition problems as well. I love that my data was clean and labeled and I could get right to work building. In our example here, our training data has 2,500 e-mails, 1,721 of which are labeled as ham (label 1), with the balance labeled as spam (label 0). The labels themselves are just that: 0 or 1, pretty straightforward. And the e-mails themselves (the slide is a little grainy, I apologize) are collections of text with some HTML in them. So what are we going to use to write our very, very simplistic naive Bayes spam classifier? These three things: for any e-mail, we're going to parse the e-mail into message objects; we're going to use lxml, because as I said those e-mails have some HTML embedded in them, and right now all I care about is the words themselves, so lxml strips that out; and we'll use NLTK, the Natural Language Toolkit, which can help us filter out stop words.
So let's go ahead and get to it and train the spam filter. For training the naive Bayes classifier, when I say "train", we're going to go through these steps. The first thing we're going to do is tokenize; I'll explain in just a moment. One thing I will say: when we look at the corpus of words in an e-mail, I am not treating words like "shop" and "shopping" as the same word. You can actually do that, it's called stemming, and that would be a bonus feature; I encourage you to go play around with it, but I didn't do it for this example. So we're going to tokenize the words of each e-mail that we process, and we're going to keep track of the unique words that we see across all the documents we process; this will come into effect later to help us with zero word frequencies. We're then going to increment a word frequency count for each category, our categories here being ham and spam, and increment the category counts, which again feed the prior probability of the classes that we need to take into account. We're also going to keep track of how many words are in each category, and it's good to know how many training examples we've actually processed. That's the last step. So training pretty much starts with tokenizing.
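The bookkeeping described in those steps, word frequencies per category, category counts, a vocabulary of unique words, and a count of processed examples, can be sketched with plain dictionaries. The class and attribute names here are illustrative, not the speaker's actual code:

```python
from collections import defaultdict

class SpamTrainer:
    """Minimal counters for training a naive Bayes spam filter."""

    def __init__(self):
        self.word_freq = {"ham": defaultdict(int), "spam": defaultdict(int)}
        self.category_counts = {"ham": 0, "spam": 0}    # feeds the class priors
        self.words_per_category = {"ham": 0, "spam": 0}
        self.vocabulary = set()                          # unique words ever seen
        self.trained = 0                                 # e-mails processed

    def train(self, words, category):
        """Record one tokenized e-mail under its known label."""
        for word in words:
            self.word_freq[category][word] += 1
            self.words_per_category[category] += 1
            self.vocabulary.add(word)
        self.category_counts[category] += 1
        self.trained += 1

trainer = SpamTrainer()
trainer.train(["free", "offer"], "spam")
trainer.train(["meeting", "notes"], "ham")
print(trainer.category_counts)  # {'ham': 1, 'spam': 1}
```

These counters are all the classifier needs later: likelihoods come from `word_freq` and `words_per_category`, priors from `category_counts`.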
Tokenizing turns text into a bag of words; that's what it is, a bag of words. Essentially this is very simplistic, stripped down a little. What we want to do is pull out the words (this is after we've already removed the HTML embedded in them), and for each word in our text, lowercase the word, and as long as the word isn't in the corpus of stop words for the English language, keep it. Stop words are words like "a", "and" and "or": words that appear often but may not provide a lot of value in thinking about whether this thing is going to be spam or not. You can get that list from NLTK; I'm glad I didn't have to compile it myself. We go ahead and do this for each e-mail, and now we have a bag of words.
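A minimal version of that tokenizer might look like this; the stop-word set below is a tiny stand-in for NLTK's full English list (`nltk.corpus.stopwords.words("english")` in the real pipeline):

```python
import re

# A few stop words standing in for NLTK's full English stop word list.
STOP_WORDS = {"a", "an", "and", "or", "the", "is", "in", "of", "to"}

def tokenize(text):
    """Turn e-mail text (HTML already stripped) into a bag of words:
    lowercase every word and drop English stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

print(tokenize("Speed up YOUR slow PC and win a FREE prize"))
# ['speed', 'up', 'your', 'slow', 'pc', 'win', 'free', 'prize']
```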
Remember that zero word frequency thing I was talking about? Let's think about it. I've done my training, and now I have an e-mail I'm trying to classify that contains the word "free"; but, problem: I have never historically seen the word "free" in the spam collection of e-mails I've looked at. So what happens when I calculate the likelihood of all my predictors? It goes to zero. To offset that, we can add a small constant, which Laplace smoothing permits us to do; that gives us a small offset so that one unseen word doesn't throw our math out the window.
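The fix can be written directly into the likelihood calculation: add one to every count (and the vocabulary size to every denominator) so an unseen word never drives the whole product to zero. A sketch with assumed counts:

```python
def word_likelihood(word_count, words_in_class, vocab_size):
    """P(word | class) with Laplace (add-one) smoothing."""
    return (word_count + 1) / (words_in_class + vocab_size)

# "free" seen 0 times among 200 ham words, with 1,000 unique words overall:
p = word_likelihood(0, 200, 1000)
print(p)  # small but non-zero: 1/1200
```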
So let's talk about classifying. Sorry, this is a giant wall of text, but I just wanted to point out that it's quite literally iteration, counting and dictionaries; there's no black-box magic here. Essentially all we do in the classifier is say: for each category, we're going to create a posterior probability. We go ahead and find the probability of all the predictors, multiply that by the prior probability of the class itself, and pick the one that has the higher value; that's what we classify the e-mail as. It's not that dramatic. In the get-predictors-probability step, if we see something we haven't seen before, we go ahead and add one to the count. And one point right here about floating-point underflow: when you're doing work where you really care about having very precise decimal points, you're going to want a specific numeric type.
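That loop, the product of smoothed word likelihoods times the class prior with the maximum taken over categories, fits in one short function; the counts below are toy values, not the talk's data:

```python
def classify(words, word_freq, category_counts, vocab_size):
    """Return the category with the larger a-posteriori score."""
    total_docs = sum(category_counts.values())
    best_category, best_score = None, -1.0
    for category, freqs in word_freq.items():
        words_in_class = sum(freqs.values())
        score = category_counts[category] / total_docs  # class prior
        for word in words:
            # Likelihood with add-one smoothing for unseen words.
            score *= (freqs.get(word, 0) + 1) / (words_in_class + vocab_size)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

word_freq = {"spam": {"free": 28, "winner": 10}, "ham": {"meeting": 30}}
category_counts = {"spam": 50, "ham": 100}
print(classify(["free", "winner"], word_freq, category_counts, vocab_size=4))
# spam
```

Note how small the running `score` gets as the word loop multiplies fraction after fraction; that is exactly where the precision concern above comes from.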
You could use logs instead, but in this case I used Decimal objects, and there were no tears. There's a note here, which you probably can't read (I will share the slides), that comes from the Stanford Natural Language Processing material describing how to handle floating-point underflow in this computation; that's what I went with. So, OK.
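The underflow problem is easy to demonstrate: multiplying a long e-mail's worth of small likelihoods collapses to exactly 0.0 in ordinary floats, while summing their logs (the alternative mentioned above) stays finite and comparable:

```python
import math

probs = [1e-4] * 1000  # a long e-mail's worth of small likelihoods

product = 1.0
for p in probs:
    product *= p        # underflows to exactly 0.0 in ordinary floats

log_score = sum(math.log(p) for p in probs)  # finite and comparable

print(product)    # 0.0
print(log_score)  # about -9210.3
```

Since log is monotonic, comparing log scores across categories picks the same winner as comparing the raw products would, without the underflow.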
Performance measurement: I've classified the thing, so how do we know how well it did? I took my detector, trained it and evaluated it, and what I eventually came out with is 223 correct and 27 incorrect; my performance measurement is about 89 per cent. As a small footnote, an accuracy of about 90 per cent is, I believe, a common benchmark, and we can obviously do better here; we'll talk about what "doing better" means in a moment. On the idea of how I split up my training data: a 90/10 split is pretty much what I've seen as a standard. Given different problems you might want to chunk things up differently, but I went with 90/10. Essentially all I did was say: take 90 per cent of the data and train on it, then go ahead and classify the remaining 10 per cent. And how do we know if something is correct or incorrect? Whatever label we ultimately assign, we check against the label that shipped with the data; that's how we see what's correct and what's incorrect, and it's basically straightforward. That's how we got the 89 per cent.
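The split and the score reduce to a few lines; with 250 held-out e-mails and 27 misclassified, the numbers work out to about 89 per cent:

```python
emails = list(range(2500))            # stand-ins for the 2,500 labeled e-mails
split = int(len(emails) * 0.9)
train, test = emails[:split], emails[split:]  # 90% to train on, 10% to evaluate

correct, incorrect = 223, 27          # evaluation results reported in the talk
accuracy = correct / (correct + incorrect)
print(f"{accuracy:.1%}")  # 89.2%
```

In practice you would also shuffle before splitting so the held-out 10 per cent isn't biased by the order of the file.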
Some things to watch out for: false positives. This is a really fun one. Google, for example, does spam filtering really well, but even they can have some misses. I actually like to sign up for Patagonia e-mails, and this Patagonia e-mail got filed as spam. A false positive is when something is incorrectly identified, so you can run into this. And when something is incorrectly classified, what's the recourse? In real implementations (since we're asking whether naive Bayes is too naive), we can tell Google that this is actually not spam; I can validate the data and send it back to them, and they can feed that into their implementation and try to improve that feature. So false positives are something to watch out for. And there are some limitations and challenges with naive Bayes.
Obviously, this independence assumption is very, very simplistic. If I get a marketing e-mail about Barcelona and it isn't talking about Messi, I'm going to be very confused (apparently there are rumors about him being traded, so we shall see). But the independence assumption is quite simplistic; it is not the way these things work in the real world. What are the side effects of that? One is that we end up overestimating the probability of the label we ultimately select, meaning we create more of a binary: things lean either more to the left or more to the right in how we assign category labels. Also, remember how I said I cheated and didn't label my own data? Here's another thing: human error. These classifier algorithms are called supervised learning; they require a historical, labeled set of data to learn from in order to make predictions, and human error can creep in. What happens if, say, I'm a professor making use of my student lackeys, some of them have been up all night, and ten of them look at the same e-mail and all come up with different labels for it, but it's all in my training set? That can be very inconsistent, so I need to think about how the labeling of the data happened. As much as we don't like to think about the demands of data cleaning and data collection, that's actually a really important part of the process when working with a supervised machine learning problem. How can we improve our performance?
We can do more and better feature extraction, because as much as I would like to say that e-mails can only be identified by the words in them, we know that's not true. Predicting the sentiment of e-mails is very complicated, very difficult; natural language processing is a huge field, and I'm not getting into all of that myself. But we need to think of other ways we can identify spam. What are some candidates? The subject line: is there something in the subject we can use? What about images: is there an abundance of images in spam e-mails, or maybe none at all? And the sender: remember that really trustworthy e-mail address from earlier? Clearly I would trust anything from an "AOL_member_info@" address, whatever that was. Again, I don't trust most such addresses, so that's another feature. If we're going to consider all other possible features, we can also think about capitalization, irregular punctuation and things like that. Ultimately, we also want more data: more input to learn more.
I'd also highly recommend the book Introduction to Machine Learning with Python; the author is a great data scientist, and I've heard great things about it. And check out your friendly local Python user group; we love talking about and learning these things together. There's also a great talk after this one covering more machine learning, so stay for it. Thanks. If there's anything I hope you may have learned, it's that correlation may be causation, or causation may be correlation, I don't know. We can implement a thing, but the question then becomes how to interpret the results, and that's where I challenge you to go ahead and try some things yourself. Thank you so much.
Do you have any questions? It's OK if no one has a question; you can always look things up later. I'll be out in this area here for a few minutes, but like I said, I do want to hear the next talk, so I'll be around. My name's Lorena, please reach out. It's been a pleasure to be here; thank you so much for listening.