Lorena Mesa - Is that spam in my ham?
Beginning programmers and newcomers to Python may find it overwhelming
to implement a machine learning algorithm, yet machine learning is
becoming applicable to more and more areas. This talk introduces key
concepts and ideas and uses Python to build a basic classifier, which
solves one of the most common types of machine learning problem. It
also provides some jargon to help those who are self-taught or
currently learning.
-----
Supervised learning, machine learning, classifiers, big data! What in
the world are all of these things? To a beginning programmer, problems
described as "machine learning" problems can be mystifying at best.
In this talk, I will define the scope of one machine learning problem,
identifying an email as ham or spam, from the perspective of a
beginner (not a master of all things "machine learning"), and show how
Python can help us learn, simply, how to classify a piece of email.
To begin, we must ask: what is spam? How do I know it "when I see it"?
From previous experience, of course! We will give our model
human-labeled examples of spam and ham so that it can learn how likely
an email is to be one or the other. This approach, using examples and
data we already know to determine the most likely label for a new
example, is supervised learning, and the model we will use for it is
the Naive Bayes classifier.
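For readers who want the math, here is a minimal sketch of the rule a Naive Bayes classifier applies. It assumes the email body is reduced to its words $w_1, \dots, w_n$ and, as the "naive" part, that those words are independent of one another once the label $c$ (spam or ham) is known:

```latex
P(c \mid w_1, \dots, w_n)
  = \frac{P(c) \prod_{i=1}^{n} P(w_i \mid c)}{P(w_1, \dots, w_n)}
  \propto P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

Because the denominator is the same for both labels, the classifier can simply pick whichever label gives the larger numerator: the prior $P(c)$ times the per-word likelihoods $P(w_i \mid c)$.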
Our model will look at the words in the body of an email, counting how
often each word appears in spam and in ham emails, and how often spam
and ham occur overall. Once we know the prior probability of spam and
which words make an email look like spam, we can try applying a label
to a new example.
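A minimal Python sketch of that counting follows. It is an illustration under stated assumptions, not the talk's own code: the names `tokenize`, `train`, and `classify`, the whitespace tokenizer, the add-one (Laplace) smoothing, and the toy emails are all hypothetical choices made here. Scores are summed in log space so that multiplying many small probabilities does not underflow.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. A real spam filter
    # would also handle punctuation, headers, HTML, and so on.
    return text.lower().split()


def train(examples):
    """Count labels and per-label word frequencies.

    examples: list of (body, label) pairs, where label is "spam" or "ham".
    Assumes both labels appear at least once.
    """
    label_counts = Counter()
    word_counts = {"spam": Counter(), "ham": Counter()}
    for body, label in examples:
        label_counts[label] += 1                   # how often spam and ham occur
        word_counts[label].update(tokenize(body))  # word frequencies per label
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return label_counts, word_counts, vocab


def classify(body, label_counts, word_counts, vocab):
    """Return the label with the highest log-probability score."""
    total_emails = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: the fraction of training emails carrying this label.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(word_counts[label].values())
        for word in tokenize(body):
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing so a word missing from one label
            # does not drive that label's whole product to zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up training data for illustration only.
examples = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(examples)
print(classify("free money", *model))      # spam
print(classify("monday meeting", *model))  # ham
```

Smoothing is one of several possible design choices; without it, a single word that never appeared in ham would make any email containing it impossible to label as ham.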
Through this exercise we will see at a basic level what types of
questions machine learning asks, learn to model "learning" with
Python, and understand how learning can be measured.
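The abstract does not say which measures the talk uses. As one hedged illustration, a common approach is to hold back some human-labeled emails from training, classify them, and compare the predictions with the human labels. The hypothetical `evaluate` function below reports accuracy, plus precision and recall with spam treated as the positive class; it assumes a non-empty list of labels.

```python
def evaluate(true_labels, predicted_labels, positive="spam"):
    """Compare predictions against human labels on held-out emails."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(t == p for t, p in pairs)
    # True positives, false positives, and false negatives for the spam class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged mail that is spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # spam that got flagged
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up labels for illustration only.
true = ["spam", "ham", "ham", "spam", "ham"]
pred = ["spam", "ham", "spam", "ham", "ham"]
print(evaluate(true, pred))  # {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5}
```

Precision and recall matter here because the two kinds of mistake are not equally costly: a real email lost to the spam folder is usually worse than one spam message reaching the inbox.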