Recipe for text analysis in social media: a linguistic approach
Formal Metadata
Title: Recipe for text analysis in social media: a linguistic approach
Number of Parts: 132
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported
Identifier: 10.5446/44960 (DOI)
Transcript: English (auto-generated)
00:05
Hello everybody. Before I start, I want you to know that this is not a technical talk. I'm a computational linguist, so I prepare the data. I pre-process the data for modeling later.
00:23
So this is a practical approach on how to prepare and how to pre-process data in order to be able to use it when modeling. So, just for you to know, I'm a computational linguist, as she said.
00:42
First, for this recipe, you need to gather the corpus, then you pre-process it, then you do the text modeling, and then I'm going to tell you about the pros and cons of supervised learning. So, I'm going to talk about pre-processing in social media, pre-processing in text.
01:05
So we need to know first what social media is. And basically, there are four things that you need to take into account. First, it needs to be a web-based app. Also, it needs to be user-generated content.
01:21
Users must be able to create profiles and to be able to connect with other users. And with this, we have the development of social networks, which are basically social media. So what is social media? With the definition we had before, we can think of Twitter, Facebook, Instagram, Pinterest maybe, LinkedIn,
01:51
but also Amazon, because on Amazon you have a lot of review content. Booking, the same, TripAdvisor, and also Wikipedia, if we think of it as a content site where you can share ideas.
02:09
So, the type of content we find in social media is text, as on Twitter, for instance, or any other web-based application. Also images on Instagram, and videos on YouTube, for instance, or Vimeo.
02:28
Then the steps for text analysis, as I said before, are the ones there. And we start with gathering the corpus. To gather the corpus, you can take corpora from free online sources or from web scraping.
02:46
Those are the two main, basic places where you can find data. So, for free online corpora, you have the ones there, and for instance, you have the first one. So you can type import nltk. I know, I'm not writing any code, because it's a video.
03:13
You write nltk.download() and then you get this graphical user interface where you can download all the data.
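A minimal sketch of that step, assuming NLTK is installed; nltk.download() with no arguments opens the graphical downloader, and passing a resource name fetches just that package:

```python
# Minimal sketch of the NLTK download step described above.
import nltk

nltk.download()             # opens the GUI to browse and download corpora/models
nltk.download("punkt")      # or fetch a single resource, e.g. the tokenizer models
nltk.download("stopwords")  # stop word lists for several languages
```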
03:24
It's just so that you know the tools that you have. I'm not coding, but if you have a look at this, you can download a lot of packages which are really useful when you're starting out, especially for NLP-based applications, because they have labels
03:43
and a lot of resources that are really useful for starters. Also, Brigham Young University has really good corpora in English, and also in Spanish and Portuguese, but other languages are difficult to find.
04:04
The British National Corpus you can access online or you can download. And there's this guy called Martin Weisser, and he also has really good corpora of online English, and he's got really good resources there too.
04:24
And then you can do web scraping of social media and information resources. Social media, you know that: Facebook, Twitter, whatever. And then the information resources; in my case, for instance, I wrote a script that was able to retrieve information from the Spanish Academy's web page,
04:45
and it was useful because I needed to check whether the words on a list really exist or not according to the Spanish Academy, and I could do it automatically, so it was really helpful. So the kinds of text we find in social media are tricky ones.
05:06
For some reasons I'll tell you about later, but it determines the way we're going to analyse the data. So we have posts, we have tweets, we have those tags and comments on the posts. I mean, you see there are a lot of comments.
05:22
And in the tweets you can also have the hashtags. All of this is really valuable information for text analysis, because all those tags and comments and whatever are going to be really helpful when organising and classifying text and doing all these tasks we want to perform.
05:43
So now we have the corpus, and we need to go to pre-processing. When pre-processing, you can do a lot of tasks, but I'm going to explain these three, because they are the most important ones and also the most useful ones. Tokenisation is separating a text into smaller units.
06:04
So you can separate into sentences or words or whatever unit you need. And you might think this is very easy, apparently, but you might find some examples like those, like the city of Bombay. You have to decide whether you want to keep this as a whole unit
06:21
or you want to separate each word as a separate unit. So this is a lot of work you need to think about before you do the processing. Also, you will have problems such as ex-Malaysian prime minister. You need to decide whether you want to keep those units as one unit or separate ones.
06:43
So this is it. Also, negative contractions like won't: you need to decide whether you want to keep the verb and the negative form together or you want to separate them, because the negatives especially are tricky in sentiment analysis and text analysis.
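A small sketch of these tokenisation decisions, using NLTK as one possible tool; the example sentence and multi-word expressions are made up for illustration:

```python
# Sketch of tokenisation with NLTK (assumes nltk and its "punkt" resource are installed).
from nltk.tokenize import word_tokenize, MWETokenizer

text = "The ex-Malaysian prime minister won't visit the city of Bombay."

tokens = word_tokenize(text)
# Note how the negative contraction is split: "won't" -> ["wo", "n't"].
print(tokens)

# If you decide some sequences should stay as a single unit,
# a multi-word-expression tokenizer can re-join them after the fact.
mwe = MWETokenizer([("prime", "minister"), ("city", "of", "Bombay")], separator="_")
print(mwe.tokenize(tokens))  # ... 'prime_minister' ... 'city_of_Bombay' ...
```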
07:03
So all of these you need to think about beforehand so that you can have the proper data you need. Also, it depends on the language; it's not only based on the text itself, but on the language. You see, Japanese and Chinese write everything together,
07:21
so you cannot use spaces as word breakers. And Japanese especially also has those four alphabets. So it's not that easy to decide how you are going to separate the tokens. Stop word removal is also pretty good for removing words that are meaningless.
07:45
That's the case of pronouns, determiners, possessives. I know you can't read those on the slide, but it's basically a list of words you want to remove from your text because they will be noise in your algorithms, in your models.
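A minimal sketch of stop word removal using NLTK's built-in English list, assuming the "stopwords" resource has already been downloaded:

```python
# Remove function words (pronouns, determiners, etc.) using NLTK's English list.
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = ["the", "roof", "of", "my", "house", "is", "red"]

content_tokens = [t for t in tokens if t.lower() not in stop_words]
print(content_tokens)  # ['roof', 'house', 'red']
```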
08:02
So it's very useful to have a list of words you don't need. And finally, lemmatization and stemming are useful because you can group words by lemma better than if you have all these inflectional endings, which are also noise.
08:21
So, for instance, lemmatization is removing inflectional endings by coming back to the lemma, the original lemma. So in this case, you will basically need a dictionary or a set of rules that help you go back from the inflectional ending,
08:40
I mean the word with the inflectional ending, for instance smiling, to smile. So you need the set of rules or the dictionary to go back to the lemma, which is smile. Stemming is easier because you just chop off word endings, so you don't need the dictionary.
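A short sketch contrasting the two, using NLTK's WordNet lemmatizer and Porter stemmer; it assumes the "wordnet" resource has been downloaded:

```python
# Lemmatization (dictionary/rule based) versus stemming (just chopping endings).
from nltk.stem import WordNetLemmatizer, PorterStemmer

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

print(lemmatizer.lemmatize("smiling", pos="v"))  # 'smile'  (back to the lemma)
print(stemmer.stem("smiling"))                   # 'smile'  (ending chopped off)
print(lemmatizer.lemmatize("studies", pos="n"))  # 'study'
print(stemmer.stem("studies"))                   # 'studi'  (stems need not be real words)
```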
09:00
You just have a small list of endings, you chop them off, and it's over. So what are the problems we find in social media text? Basically, the most important one is time sensitivity. As you know, Twitter, Facebook and Amazon, or Amazon reviews I mean,
09:21
and all of these sites have content which is dynamic. It means that it's constantly changing, people are constantly creating new content, and it's really difficult to build a model because you cannot decide on a set of parameters and expect it to work
09:40
because parameters are constantly changing. So this is the main problem we have in social media when analyzing the text. Also the short lengths. If you try to analyze Twitter, you'll see that you have, for instance, I don't know, Superman and Clark Kent,
10:02
and you know that they are the same person, but you might have a tweet about one name and a tweet about the other name. So when you come to cluster those words into different groups, you'll see that they might not end up related in the same cluster, because there's no contextual information telling you they are the same person.
10:22
So it's difficult; this is a problem for text analysis. And this brings you to the semantic gap, which is exactly what I just explained. Also the problem of unstructured data, where there are two issues. First, the variance in content quality.
10:40
You see, there are people who write really well, in a really polite manner, but then there are people who write the way words come out of their minds, and they don't try to make sense of the sentences. So this is a problem. And also acronyms and abbreviations, like u for you or 2 for to in this case, or & for and,
11:00
and the misspellings, like were: it is a word, but how do you distinguish were from we're? So all of these problems you'll find a lot in text analysis. And also you have abundant information: there are tons of data, and you need to cut somewhere in order to be able to process it.
11:23
So, applications in the real world. You can use all those strategies for event detection, for instance, to know which piece of news is very famous or popular, or to predict what kind of information is going to be trending next week or whatever.
11:46
Also you can take advantage of collaborative question and answering on sites like Stack Overflow, for instance. If you scrape the web, you will be able to find really important and specific information, more than if you just Google your search.
12:05
Also you can use Wikipedia to fill in the semantic gap I was talking about before. If you use Wikipedia, which is a trustworthy resource, you might be able to somehow create relations between those words
12:22
that are apparently unrelated, but with Wikipedia you find relations between the two. It's also useful for sentiment analysis; I will show an example later. And also to identify influencers and see if there are a lot of mentions of that person, or also for quality prediction.
12:42
Like on Amazon, you might want to know if a review is trustworthy or not, and these kinds of tasks you can perform with NLP are very useful for deciding whether to trust that user or not. So now we're going to go through text modeling.
13:03
I'm going to go through it very quickly because I'm not an expert, but with the pre-processing we did, we should be able to come up with a proper dataset to be able to perform this.
13:21
So if you go to Amazon and you find those reviews, you might want to separate them. Well, they already do, but if you have your own page, you might want to separate positive and negative comments. So to find positive and negative comments, you need a vocabulary of positive words,
13:50
negative words, and neutral words. So if we go back to this, you first need to define the task. Maybe you want to group the words in clusters,
14:02
and you need to decide which kind of clusters you want. So once we have this task defined, we need to decide the strategy we're going to use for sentiment recognition. There are mainly two ways, supervised and unsupervised, but unsupervised learning in this case is really, really difficult.
14:23
So for supervised learning, you will need a labeled corpus, which is time consuming, and predefined categories, so you need to go through all the data before you decide on the categories, and you can use sentiment lexicons. For unsupervised learning, you will use an unlabeled corpus, which is easy.
14:45
You can use k-means for category discovery, which is also useful, but you'll have the problem I mentioned before with time, because the data is constantly changing. So you never know if your model will remain good.
15:03
As I said before, we need vocabulary lists for positive, negative, and neutral words, and you also need the list of comments you want to analyze. This is a very basic algorithm for this, naive Bayes,
15:21
but you can use any one you want. The input will almost always be the same. So the result you'll get is generally a positive score and a negative score, and then you can classify the text depending on the result you get.
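A minimal sketch of that kind of supervised classifier, here a naive Bayes model from scikit-learn trained on a tiny made-up labelled set; the real input would be your own labelled corpus:

```python
# Naive Bayes sentiment classifier sketch with scikit-learn.
# The tiny labelled corpus below is invented just to show the shape of the input.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "great product, I love it",
    "works perfectly, very happy",
    "terrible quality, broke after a day",
    "waste of money, very disappointed",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["really happy with this purchase"]))        # ['positive']
print(model.predict_proba(["really happy with this purchase"]))  # score per class
```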
15:43
This is a very basic task. In my opinion, the best way is using sentiment lexicons, because you have a word list of positive words and a word list of negative words, and it works in a binary fashion, so it's an easy text classification task.
16:11
These are two really good ones. Also you have WordNet for NLP, and you can download WordNet in Python, and it's very useful.
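A minimal sketch of lexicon-based scoring; the two tiny word sets here are placeholders standing in for a real sentiment lexicon such as the ones just mentioned:

```python
# Lexicon-based sentiment scoring sketch.
# In practice you would load a real positive/negative lexicon instead of these toy sets.
positive_words = {"good", "great", "love", "excellent", "happy"}
negative_words = {"bad", "terrible", "hate", "awful", "disappointed"}

def score(text):
    tokens = text.lower().split()
    pos = sum(t in positive_words for t in tokens)
    neg = sum(t in negative_words for t in tokens)
    if pos > neg:
        return "positive", pos, neg
    if neg > pos:
        return "negative", pos, neg
    return "neutral", pos, neg

print(score("I love this phone, the camera is great"))  # ('positive', 2, 0)
```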
16:24
I'll add the link after if you want. But those are really simple ones, and they come with this binary fashion, already classified, so it's pretty useful. Another approach, which I thought was really interesting,
16:41
although it's not that new, but it's a really interesting approach: polarity lexicons. What those guys did is they had a small list of positive adjectives, and they assumed that every adjective connected with an adjective in the list by and,
17:05
which is coordination, would necessarily be a synonym. So they decided to scrape the web and find, well, it was more manual back then,
17:21
but they found a lot of pairs of words connected by and, and so all those words were added automatically to their already-built list of positive words. They did the same with negative words, with the word but, or however,
17:42
so they could automatically have a bigger list of negative words. So this is a really good semi-supervised approach for this task, because you get to have the best of both worlds: the unsupervised part is less time-consuming, and it's easier.
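A rough sketch of that seed-expansion idea; the toy sentences stand in for a large corpus, and real systems weight the coordination evidence statistically rather than accepting every pair:

```python
# Expand a seed list of positive adjectives via "X and Y" coordination patterns.
# Toy corpus for illustration; the same idea works with "but"/"however" for opposite polarity.
import re

seed_positive = {"helpful", "nice"}
corpus = [
    "The staff was helpful and friendly.",
    "A nice and cozy little hotel.",
    "The room was clean but noisy.",
]

expanded = set(seed_positive)
for sentence in corpus:
    for left, right in re.findall(r"(\w+) and (\w+)", sentence.lower()):
        if left in expanded:
            expanded.add(right)   # right word inherits the positive polarity
        elif right in expanded:
            expanded.add(left)

print(expanded)  # now also contains 'friendly' and 'cozy'
```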
18:06
This approach comes from an example in Dan Jurafsky's Stanford NLP course. There's a really good book as well; they have a book that they are editing right now, and the third edition, I guess, is going to be ready in a few months,
18:22
and it's a really good course for a lot of NLP tasks, and this is the best approach I've seen so far. So the results basically depend on the task you want to perform.
18:44
As I said before, what you do in NLP depends a lot on the task you have. I mean, if you are analyzing Twitter, you might need strategies really different from the strategies you're going to use if you are analyzing, I don't know, the text of a novel.
19:03
So this is a very important thing you have to bear in mind, because if your task is different, you're going to need a completely different strategy, also a very different algorithm. So k-means is a good one for clustering. If you have a bag of words and you need to group them in clusters,
19:29
but if you want to, I don't know, for instance, build a spell corrector, or you want to know if the grammar is well formed or not,
19:44
I mean, if you want to check the grammar of a sentence, you might need a completely different approach, and you might want to use n-grams or another strategy. So using the lexicons for sentiment recognition,
20:00
for me, is the best approach, as I said before. And there are pros and cons. Topic discovery is a challenge, but if you already have a small set of words, it's really helpful, and you can somehow increase the number of words automatically. And also, the performance is better, almost always.
20:23
I mean, in all the cases I've seen, the performance is always better. On the other hand, dynamic language is difficult. And it's really time consuming to build the lexicons and label the corpora.
20:42
So, well, it depends: if you need to have a task finished in a small amount of time, or if you have more time, you can use one strategy or the other. So this is the bibliography.
21:00
I'm going to upload the slides in case you want to check. Mining Text Data is a really, really good book. They have a lot of algorithms explained for each kind of task. So it's very useful. And just to finish, I'm from Mallorca.
21:21
I'm a co-organizer of PyData there and I'm a computational linguist. I couldn't find anywhere on the web a proper definition of what a computational linguist is. So I found this slide, which I think is awesome, and I decided to copy it here. You also have the source information there; it's from a presentation.
21:41
And, well, thank you very much for watching. Thank you very much. We have a few minutes for questions. How do you determine your stop words?
22:00
Is there a common set of words that tends to work, or is it based on your application? Well, basically anything without lexical information would do. Like, for instance, if you have the word house, you know that it might be necessary to keep the information of house
22:21
because you can find synonyms, or, I don't know, you can find words that are related, such as roof or whatever. But if you have, for instance, a determiner, it has no semantic meaning, which means that you cannot find relative words,
22:42
I mean related words, sorry. So basically this is what you take into account when listing stop words. Also, it depends on your task. You might want to, I don't know, decide that house in your case makes no sense in your bag of words, so you won't use it. So there are lists of stop words which are already built,
23:05
and in NLTK, for instance, the package has some, and I know many packages have lists. But basically this is the main principle when deciding whether a word is a stop word or not.
23:23
No, you tokenize first, and then you remove whatever you don't like, usually. Thanks for the talk. I was wondering how you apply k-means to a bag of words. Like, there must be a step in between, right? Well, as I said before, I don't do a lot of modeling myself,
23:43
but I usually prepare the data, so I know that the problem in that case with k-means was that the clusters that resulted from the first task
24:00
made no sense at all, because as I said in the example, you might have two words that are related because you know it, but when it comes to clustering, somehow the algorithm doesn't find similarities. So my job was to decide how to improve the data
24:23
so that the algorithm worked better. I don't know exactly how they implemented the algorithm itself. Okay, so I'm not so much talking about k-means or anything; it's more about how you represent the data to an algorithm like k-means, because usually you have a set of vectors,
24:43
and then you compare the vectors, if this makes sense to you. And it's difficult, like you need to encode the word into something that is meaningful for a machine, right? Yeah, I understand, but I don't code the algorithm,
25:01
so I don't know if they did a step in the middle. I mean, I know I had a set of data, a list of words in that case, and I got the results and saw that the clusters made no sense, and I had to come up with a solution for how to improve those clusters.
25:22
So from a linguistic point of view, I saw that words that should be related were not related at all, so I started reading a couple of articles, and I found out that the problem with k-means, for instance, is that you need somehow information that relates the words,
25:45
and this was missing. So this is why I was saying that if you use another resource like Wikipedia, you can somehow find relations between the words, but I'm not coding the k-means algorithm in this case.
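To make the question concrete, one common way to get from raw text to vectors for k-means is TF-IDF features per document; this is only an illustration, not necessarily what was done in the project discussed here:

```python
# Turn documents into TF-IDF vectors, then cluster them with k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "superman saves the city again",
    "clark kent works at the daily planet",
    "new pasta recipe with tomato and basil",
    "how to cook the perfect risotto",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)
# Without extra context, "superman" and "clark kent" share no terms, so they
# may well land in different clusters -- exactly the semantic gap problem above.
```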
26:01
Okay, thank you. Another question? Hi, thanks for the talk. So these days on social media especially,
26:21
if I'm trying to measure sentiment, what I see is that a lot of people respond or review with emojis, or they would use sarcasm, or they would just react with like a GIF of something that they're feeling. So when you're preparing data to measure sentiment, for example,
26:42
do you take these things into account somehow? Yeah, of course. It's really difficult. I mean, most of the time you want to remove all the noise you can. So when it comes to sarcasm, if you find that some words are really not helping and are noise,
27:05
you basically remove them. Also you can try to do another analysis which is deeper, linguistically I mean, and you try to label everything, and with this you improve a lot, but it takes a lot of time.
27:21
It's time consuming, and you also need a lot of people working on that, a lot of people labeling the data and helping with the task. Like you have a list of stop words, is there no such resource out there where you can say, okay, this emoji reflects this emotion? No, no.
27:41
Okay, thank you. Well, not that I know of; so far I've never found such a list. So you mentioned that one of the difficulties is deciding whether to split a combination of words. So in your practical experience,
28:01
what kind of metrics do you use to determine whether you need to split or not? Do you do it manually, or do you use rules? Like what kind of rules? Well, you can use, I mean, of course you can use algorithms and try to do it automatically.
28:24
I mean, you can get good results, but if you want to be really, really specific and make sure that all the words are the way you want them to be, you usually can build a dictionary, or you can use...
28:41
My question is, how do you determine whether you want to split or not? That depends on what you want to do, I mean, the task. If you, I don't know, if you need the names of people, you might want to focus on that task specifically,
29:04
and then you perform that task better, and then you find... I mean, I usually work with lists because the task I did was... I mean, you needed specific words to be found, without any other interference or so,
29:24
but I mean, I don't like doing it this way, you know? There are many ways you can do it better. I mean, I'm a linguist, and my job is to solve the problems that the algorithms cannot solve.
29:43
So my job is not as... I mean, it's not as cool as writing code and having everything work perfectly. I need to find the bugs and resolve them, the problems with the algorithms, which are obviously much more fun, but...
30:02
I work with lists or with rules like regular expressions or whatever I can. Thank you. We have run out of time for questions already, so let's thank Olalia once again for her talk. Thank you.