I Hate You, NLP... ;)

Transcript
And I'd like you all to join me in welcoming Catherine. Thank you for hosting a really great event; I've had a wonderful time meeting people way smarter than me. My name is Catherine, I'm known across the Internet as kjam, and I run a company called Kjamistan, so you can find me writing and talking about things under that name. I like working with text and data, and if you're ever in Berlin, come find me and say hello.
I feel like, because I'm talking on machine learning and I don't have a PhD next to my name, I need to have this disclaimer: I'm not an expert in machine learning. I can't sit here and tell you which algorithms work better and exactly why. I'm a Python developer interested in applying machine learning to text analysis and sentiment analysis, so by all means, if you are a machine learning expert and I say something wrong, feel free to correct me in the questions, or I'll buy you a beer later and you can teach me something. Here are some assumptions I've made about you: I'm not going to be going over Python basics, and I'm definitely not going over machine learning basics. Of course I'm happy to talk about basics later, but I'm assuming that you've already done maybe even a little bit of sentiment analysis, that you've already played around with some natural language processing, and that you have some
basic understanding of machine learning and what matters to you. Here's a little bit of what we'll cover. In the initial description I said there would be live coding, but upon further inspection that becomes difficult when we go into deep sentiment analysis and cross-language sentiment analysis, so instead I'm going to point you to the code repositories I would use; we just don't have time to do it live. We're going to cover some of the tools, talk a little bit about what sentiment analysis is and how it's being used, and cover a lot of the research that's been done. Right now I'm seeing a massive gap between the research that's been done on sentiment analysis and the tools we have available in open source, and I'm hoping to push the conversation in the Python community forward by giving this talk and covering a little bit of what's happening. What we will cover is some magical Python libraries that you can download today and that work in multiple languages for sentiment analysis. I must say, if you want to practice natural language generation, send your hate mail to Salesforce: MetaMind was a really amazing startup using matrix-vector recursive networks — sorry, recursive neural tensor networks — and they had an open API, and it was fabulous. Then Salesforce bought them and their API, and they're currently shutting down the public API. I ran into a guy who works for Salesforce who actually gets to work with this team, and I asked him, please let something stay publicly available, and he said, "Good luck, have fun." So again: send all the hate mail, and if you work for Salesforce, somehow get us inside the internal network.
Another one that's trying to do some of this — and the one and only one still available via API — is MonkeyLearn. Here you can see they have these different models available, some of which are being actively worked on, and you can also create publicly available models of your own. So if you absolutely need something that works tomorrow, I would start from these places. But we can see, just looking through the API, that we get very different precision levels, and that's really why I'm giving this talk: we're going to talk about some of the methods and the theory behind how to create a sentiment analysis that works for you. To begin with: what is sentiment analysis, and how do we go about it? When I look at this tweet — I'm American, so some of my tweets have a lot of American references; forgive me for that — we have a lot of different sentiment. If we look at the emoji, we see crying. If we look at the photo, we see a different emotion behind the crying. If we look at the text, we might just see a word like "life" and determine that this is possibly positive. So when we're doing sentiment analysis we have all of these different things: we have stance toward other entities — Jennifer Hudson here has a stance toward the award — we have emotion, some really complex things, and we think we can boil them down into positive, negative, and neutral. That's an idea I want to debunk today. Sentiment analysis is used in all different sorts of systems. I had the pleasure of going to a sentiment analysis conference recently and hearing just how many different places it's used: there are even people using it in anomaly detection, people charting sentiment across reviews and Twitter, people looking at it via press releases — all different ways people are using sentiment analysis, beyond obviously just simple user satisfaction. If we look at the sentiment analysis steps, the process can basically be broken down into four major steps, and they basically go from left to right and top to bottom, though depending on what you choose they can get mixed up. First is dealing with the corpora: are you building a lexicon, and how are you labeling? Next, you're probably choosing your algorithm, or determining your model or approach. Then you're doing the parsing and preprocessing; you might even do normalization and standardization across the dataset. And finally, you're testing, evaluating, and improving — and if it's not improving, you may revisit any of the earlier steps; it's quite possible you'll need to iterate through all of them.
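The four-step loop just described can be sketched as a skeleton. This is a hypothetical outline, not a real implementation — the function names and the toy lexicon are mine, purely to show how the steps fit together and why evaluation feeds back into the earlier steps.

```python
# A hypothetical skeleton of the four steps: lexicon, model choice,
# preprocessing, evaluation. The bodies are deliberately toy stand-ins.

def build_lexicon(corpus):
    """Step 1: label the corpus / build a lexicon (toy labels here)."""
    return {"love": 1.0, "hate": -1.0}

def choose_model(lexicon):
    """Step 2: pick an approach; here, average lexicon scores per token."""
    def model(tokens):
        scores = [lexicon.get(t, 0.0) for t in tokens]
        return sum(scores) / max(len(scores), 1)
    return model

def preprocess(text):
    """Step 3: parsing / normalization (kept deliberately minimal)."""
    return text.lower().split()

def evaluate(model, labeled_examples):
    """Step 4: test and evaluate; if accuracy is poor, revisit steps 1-3."""
    correct = sum(
        1 for text, label in labeled_examples
        if (model(preprocess(text)) > 0) == (label > 0)
    )
    return correct / len(labeled_examples)

examples = [("I love this", 1), ("I hate this", -1)]
model = choose_model(build_lexicon(None))
accuracy = evaluate(model, examples)
```

In practice each of these stand-ins is where the rest of the talk lives: the lexicon step alone is several of the upcoming slides.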
We're going to go through the talk basically covering all of these steps, and we'll talk about what's happening in research at each one. The first step is choosing your lexicon. One of the oldest and most tried-and-true ways of auto-tagging a lexicon, particularly if you're using social media data, is this idea of distant supervision. I'm going to go out and gather tweets, or crawl Facebook statuses, or whatever I'm going to use, and use the emoticons or emoji as labels. You can also think of this as a good opportunity to use a thesaurus: in quite a lot of the research they expand the corpora they're using via hashtags — I want to grab everything tagged #happy, #joyful, et cetera. This can also help expand into new languages: if you're working in one language and want to bootstrap a model in another, as long as you have a very good thesaurus, you can do that.
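Distant supervision as just described can be sketched in a few lines: use emoticons and hashtags already present in posts as noisy labels, then strip them from the text. The tweets and the label sets below are invented examples, not from any real dataset.

```python
# Distant supervision: emoticons/hashtags act as noisy sentiment labels.
POSITIVE = {":)", ":-)", ":D", "#happy", "#joyful"}
NEGATIVE = {":(", ":-(", "#sad", "#angry"}

def distant_label(tweet):
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:          # no signal at all, or mixed signals
        return None
    # Remove the label-bearing tokens so the model can't just memorize them.
    text = " ".join(t for t in tokens if t not in POSITIVE | NEGATIVE)
    return (text, "positive" if has_pos else "negative")

labeled = [distant_label(t) for t in [
    "great conference this year :)",
    "my train is late again #angry",
    "just landed in Berlin",
]]
# The third tweet has no emoticon/hashtag signal, so it is discarded (None).
```

Expanding into a new language, as mentioned above, amounts to translating the `POSITIVE`/`NEGATIVE` seed sets with a good thesaurus.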
The other thing you're going to need to decide is: now you have all this data, how do you tag it? We have this very naive idea that everything can be tagged positive/negative, or positive/negative/neutral, or even positive/negative/indeterminate/neutral, and research has really proven that wrong. What we have to think about is whether you really need a simple model that's just positive, negative, and neutral. If so, you get a lot of gray area, and you'll probably have more items ending up in neutral, or more ending up as false positives and false negatives. When you move away from that and start thinking about positive and negative on a scale, you start to see that annotations converge a lot faster and that you have fewer of these false positives and negatives. This also means you have to build your own lexicon and label it yourself, so there are pluses and minuses, and depending on the resources of your team you may not be able to do this. There are also quite a lot of people looking at stance detection. This is a mixture of NLP topic identification, or entity detection, plus determining stance, and a lot of it right now is still essentially a bag-of-words model. Researchers are trying to evolve it to be more advanced, to look at a text and say: OK, can I determine that these words actually apply to this organization or this entity, and then detect the stance toward it? So maybe I really like this restaurant — they have great food, but the service was total crap — then I have one stance toward the restaurant, one toward the food, and one toward the service. And finally there's the conversation about emotion: because we have such a wide range of emotions, it's really difficult to reduce them to just positive or negative.
In terms of tagging the lexicon, we have a few different methods. I'll actually start from the bottom: the simplest is Boolean — 0/1, or maybe -1/0/1. Then there's a sliding scale, going from completely negative to completely positive, where I ask people to rate on that scale. What research has actually found is that people tire over time: if you sit me down rating things for five hours, I'm eventually going to go "neutral, neutral, neutral" and just speed through it — we know that just from our knowledge of humans. They also found that, psychologically, people avoid the edges: you get much more "it's kind of negative, it's kind of positive," because people don't want to call anything completely negative or completely positive. One of the best methods that's recently come into practice is best-worst scaling: you have a list of, say, four or maybe five words or n-grams, and you ask people to choose the most positive and the most negative. That's producing some really interesting lexicons, and what they found is that people agree after about three examples — I can show an item to three people, move on to my next sample, and still be in a 90 percent agreement range, which is pretty massive for tagging a lexicon. Saif Mohammad, who works at the National Research Council in Canada, does quite a lot of work on this and on the SemEval competitions — a semantic evaluation machine learning competition that happens every year — and he has some really great state-of-the-art models. What he found is that when you switch to using a scale from -1 to 1, about 90 percent of people agree within 0.4 of each other. That was a really interesting revelation, this move away from 0/1, because we hear constantly that humans don't even agree 77 percent of the time, or whatever statistic you look at. What he was able to show is that when you give people a scale, you can actually find this least perceptible difference — and particularly if you focus on native speakers of the language, you get an even smaller score difference.
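The best-worst scaling idea described above can be turned into real-valued lexicon scores with a simple counting formula: score(term) = (times chosen best − times chosen worst) / times shown. This is a minimal sketch — the annotation tuples are invented, and a production lexicon would use many more annotators per item.

```python
# Best-worst scaling: annotators see small sets of terms and mark only the
# most positive ("best") and most negative ("worst") term in each set.
from collections import Counter

def best_worst_scores(annotations):
    """annotations: list of (shown_terms, best_term, worst_term) tuples."""
    best, worst, shown = Counter(), Counter(), Counter()
    for terms, b, w in annotations:
        shown.update(terms)
        best[b] += 1
        worst[w] += 1
    # Score in [-1, 1]: fraction chosen best minus fraction chosen worst.
    return {t: (best[t] - worst[t]) / shown[t] for t in shown}

annotations = [
    (("great", "okay", "awful", "nice"), "great", "awful"),
    (("great", "meh", "awful", "fine"), "great", "awful"),
]
scores = best_worst_scores(annotations)
# "great" ends up at 1.0, "awful" at -1.0, unchosen terms near 0.
```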
I also had the opportunity to chat — and I'll hopefully be posting that chat — with William Hamilton, who is currently working on his PhD at Stanford. He's developed a new model called SocialSent, and what he's done is taken subreddits and determined that different communities use language differently. When I'm hanging out in r/sports and I say a man is really soft, I don't mean that as a compliment, right — I mean it in a really negative way. Again, we have this mentality a lot of the time in this field that we all use the same words to describe positive and negative, and that's just not the case, especially if you're doing sentiment analysis on a community — and especially a community like, for example, Python developers, who have a lot of different ways of using words. He has all of this data available, and I took a look at r/programming and pulled up some of the most positive and most negative words. I found out that "Python" has a slightly negative association — I know, kind of offensive to our programming community — and some other interesting things: it's funny to me that "200" sits right in the middle, like you can tell we have web developers in r/programming; "spaghetti" is really negative, which must mean spaghetti code rather than the food; and "Minecraft" has a very positive association. So it's a really interesting dataset. It's English-only right now, but he's very open to suggestions, so feel free to send him some or ask him to run the analysis for you. So now we have our corpora and we've chosen our lexicon; next we need to choose an approach or model. How do we do that? What machine learning systems can we use? It turns out we can use a whole bunch, and I know this is a lot to put on one screen.
These are all different ways people are using machine learning for sentiment analysis, and they're achieving really great results with really small datasets — I'll come back to this later. If you're dealing with movie reviews, you're in luck; but if you're not using the movie-review models and datasets and you start to enter this space, there's quite a lot of active research and not a lot of agreement on what's good to use, why or why not, and everybody is getting very different results with very different systems. Obviously the finely tuned, hand-engineered systems are still the best-performing ones right now — these are people who spend all their time researching and pulling out little details — but we don't have time for that, right? We're probably Python developers first and sentiment analysis researchers second. So let me talk a little bit about how the older and newer approaches compare. The old approach is bag-of-words: I count the words and say, hey, there are many positive words, so this must be positive. The new idea is to use word embeddings — word embeddings inside deep learning models — and by doing so get a bit more complexity: I can ask, are these words bunched together, and maybe they're bunched together because they share sentiment? The old way was term frequency-inverse document frequency, TF-IDF, which is basically a nicer way of doing bag of words; the new way is doc2vec and these document embedding models. The old way was co-occurrence — if I say "good job," I mean good and I'm talking about jobs, and they're all related; the new way is skip-grams and even dependency-based embeddings, which start to look at parts of speech and label the words with them, because I mean very different things if I use a word as a noun versus a verb. And the old way was fully supervised state-of-the-art systems; the new way moves toward a semi-supervised approach.
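The "old way" just mentioned — TF-IDF, the weighted refinement of bag-of-words — fits in a few lines of plain Python. This is only to show the idea; in a real project you'd reach for something like scikit-learn's `TfidfVectorizer`. The three toy documents are invented.

```python
# TF-IDF from scratch: weight a term by how often it appears in a document
# (tf) and down-weight terms that appear in many documents (idf).
import math
from collections import Counter

def tf_idf(docs):
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = ["good job", "good food", "terrible service"]
w = tf_idf(docs)
# "good" appears in two of the three docs, so it is weighted lower than
# words that are unique to a single document, like "job".
```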
If you're curious about some of the details, I won't read the slide out loud — the papers are referenced across the bottom — but this is a good summarization of Saif Mohammad's model, which has done really well in competitions. These are all the different features he's pulling out, fine-tuning, and analyzing — and probably there are more features that haven't been released. You can see it's a good starting place if you're interested in tuning your own model.
But let's talk about what we can actually use as Python developers, and what we can use easily. We can use word embeddings, and to work with them we can use gensim. Gensim is this great library, and what it gives you is these word vectors — sorry, I'm assuming people know about word2vec; we'll get to that — so, this idea of representing a word or a document as a vector. Everything lives in a vector space, and it can be multidimensional — it is multidimensional — and what you can do is have gensim load these high-dimensional vectors and give you a matrix or vector representation of each of the words in a document. What that gives you is a mathematical way to represent text. This is kind of where text was held back for quite some time, and this is the beginning of a really good mathematical way to represent text rather than just counting words.
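Once words are vectors, "similarity" becomes geometry: the cosine of the angle between two vectors. The 3-dimensional toy vectors below are made up purely to illustrate the idea — real word2vec or GloVe vectors have hundreds of dimensions, and in practice you'd call gensim's built-in similarity methods rather than compute this by hand.

```python
# Cosine similarity: how closely two word vectors point the same way.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented 3-d "embeddings" for three sentiment-bearing words.
vectors = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "awful": [-0.7, 0.1, 0.2],
}
sim_good_great = cosine(vectors["good"], vectors["great"])  # close to 1
sim_good_awful = cosine(vectors["good"], vectors["awful"])  # negative
```

This is exactly the intuition behind "maybe these words are bunched together because they share sentiment": similar sentiment, similar direction.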
There are two main flavors of word embedding. I've already talked a lot about word2vec, and actually there's not enough attention given to GloVe. GloVe is the Stanford one; it has a slightly different representation and uses a weighted model. Word2vec basically says: does this word appear in the sentence? Then it's somewhat related to the other words in the sentence. Whereas GloVe says: hey, these words are closer together, therefore they may have more of a proximity relationship. Depending on which word embedding model or method you use, you can get really different results. Compare word embeddings built on bag-of-words-5 — that is, a window of five words around the target — with bag-of-words-2, and then with a dependency-based embedding, which again uses parts of speech. If you take a look at the "Florida" line: with bag of words we get a mixture of other ways to talk about Florida and cities in Florida, but with dependencies we get other states. This may be significant for whatever you're trying to do: if you want other entities — like entities rather than just related words — you might want a dependency word embedding rather than a bag-of-words one. There's an online comparison tool where you can explore this. I took a look at "Python," and it looks like there are several meanings mixed together; bag-of-words-5, bag-of-words-2, and the dependency embedding each give slightly different results. The really neat thing is that the tool also gives you a bit of a visualization of the word embeddings, so if you want to play around and start making decisions about which embeddings to use, I recommend going there.
Another thing I really want to mention here: there was a recently released paper — the reference is at the bottom — showing that word embeddings are not neutral. They're based on human language, and they have the same biases you would expect to see in humans. To come up with some of these vectors the authors used the Google News vectorization, the 300-dimensional one, which is used all over the place, and I found some pretty atrocious stuff — maybe not in the first ten examples, but, for example, there was some intense racism if I looked at the top 30 words within a neighborhood of that space. So beware, especially if you're using it for something like sentiment, where very biased words may be mixed in, or if you're using it for natural language generation: you need to be very aware that these biases exist. One interesting thing, however, is that the authors were able to find vectors capturing, for example, misogyny — they specifically focused on misogyny — and were actually able to reverse the direction, by finding the vectors that were pushing things toward, say, women having "lesser" professions.
I won't get into everything that means, but I have some great examples — too much code to go over in the talk, so I recommend looking at them afterwards; they're great examples of simple machine learning. This is using naive Bayes or simple linear approaches like an SVM, and what it does is take the vector and try to map it to whatever you're using as labels. If you want a document-level approach, you eventually take a vector from a document model or a sentence model, depending on the kind of results you want, and pass in labeled sentences. Labeled sentences are actually a model in gensim: using gensim you can label your sentences and then pass them directly into, say, a logistic regression or whatever else — simple, shallow machine learning. Another thing to be aware of when making these decisions is how you're choosing your grams. You have to choose your grams — unigrams, bigrams — how do you want to represent the text? When you're building word embeddings you're generally making those decisions as you build them. What the Stanford research found — the research that produced the matrix-vector recursive networks and recursive neural tensor networks — is that the longer you let your n-grams get, the more mixed sentiment you have. Which is kind of logical when you think about it: if I go on and on in one big sentence, I'm probably going to express different viewpoints or mixed emotions. If you choose unigrams and bigrams, you see — way over here — some words that are obviously negative or positive, but also a lot sitting in an ambiguous middle. The further you go toward longer n-grams, the more complex sentiment you get, and the more people are able to look at a whole phrase and say the whole phrase is definitely negative. This is based on the Stanford Sentiment Treebank work, which is available in Java.
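Extracting n-grams of increasing length, as in the Stanford study just described, is a one-liner; the example sentence is mine, chosen to show how a longer n-gram starts to carry mixed sentiment.

```python
# Sliding-window n-gram extraction from a tokenized sentence.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the food was great but the service was awful".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
five_grams = ngrams(tokens, 5)
# ("great",) alone looks purely positive; the 5-gram
# ("great", "but", "the", "service", "was") already mixes polarity,
# which is exactly the effect the study observed for longer n-grams.
```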
Another approach people are using for this, which is going pretty well, is LSTMs — long short-term memory networks. These are deep learning networks, and what people are generally doing is taking these word embeddings, or document embeddings, and pushing them into deep learning models. The nice thing about LSTMs — and why they've been doing some really amazing stuff for other natural language processing too — is that they can forget. They can learn things, and they can forget, and because of that ability they can change as they see more or less of a representative model. Language, we know, changes; trends come and go; and because of that, this approach has been really powerful. You can use it with Theano and TensorFlow, and I have some examples.
There's also the ability to use convolutional neural networks. Initially the idea was that convolutional neural networks are really great for images, because you can chunk and label images and move through the pixels accordingly. But people are now using them for sentences, or for documents: they take the word embeddings for each of the words and move through them, and then, through the same principles of pooling and softmax, you can reach a conclusion and say, OK, according to my sigmoid or softmax, this is positive or negative or neutral.
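The softmax step just mentioned is simple enough to write out in plain Python: it turns the raw scores coming out of a network's final layer into probabilities over the positive/negative/neutral classes. The score values below are invented stand-ins for a network's output.

```python
# Softmax: convert raw class scores into probabilities that sum to 1.
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]
scores = [0.5, 0.1, 2.0]            # hypothetical network outputs
probs = softmax(scores)
prediction = labels[probs.index(max(probs))]
```

The `max(scores)` shift doesn't change the result (it cancels in the division) but keeps `exp` from overflowing on large scores.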
Again — and this is worth taking a look at — labeling works a little differently when you're dealing with deep learning on text. A lot of the time you'll have, say, a hundred documents you're manually passing labels for, and often you'll see people using simple arrays: this position means positive, that one negative, and so on. There are different things happening in this space, so I'd keep an eye on how it actually performs, because right here we're back to "everything is positive or negative." I'd be curious to see where it goes — I know some academics are working on more complex representations of the labeling, and because of that this might become an interesting space. And here is where a lot of people are using a dual system: maybe I have a simple classifier, I have a lexicon, and I'm feeding them both into a deep learning system. So maybe I have a constant Twitter feed going into a simple classifier, and I'm letting the deep learning system learn from that classifier. This is where a lot of people are starting to see some big jumps without having to do a lot of manual work, so if you ask me, this might really be the best approach. There's obviously still open research, but it's an interesting idea: that I can use my simple classifier, feed it into my deep learning system, and the deep learning system will eventually become more intelligent and develop a much wider lexicon over time.
Depending on your system, you might need to handle normalization and preprocessing, and this is the classic NLP question: is preprocessing essential? It really depends who you ask. There has been some really interesting research on sentiment analysis saying that doing too much preprocessing is actually the problem: what ends up happening is you strip out some of the ways we speak and snipe at one another, the colloquialisms, and by removing those you're removing part of how we express ourselves. So what I would recommend — and what the literature generally agrees on — is to do minimal preprocessing, maybe just some simple lemmatization or simple tokenization, and test; then do more preprocessing, test again, and see where you hit a nice accuracy for your model. This is where sense2vec and spaCy are really making some cool inroads. sense2vec comes from a startup based in Berlin; they trained on the Reddit data available, and what they give you is a part-of-speech tag attached to each of the vectors.
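The "minimal preprocessing" advice above can be made concrete: lowercase and strip punctuation, but deliberately keep emoticons and colloquial spellings, since those carry sentiment. This is a small sketch of my own, not from any library, and the emoticon regex covers only a handful of old-school faces.

```python
# Minimal preprocessing that preserves sentiment-bearing tokens.
import re

EMOTICON = re.compile(r"[:;]-?[()DP]")  # matches :) ;-( :D etc. (partial list)

def minimal_preprocess(text):
    tokens = []
    for raw in text.split():
        if EMOTICON.fullmatch(raw):
            tokens.append(raw)          # keep emoticons untouched
        else:
            # Lowercase, drop punctuation, but keep colloquial spellings
            # like "gooood" instead of normalizing them away.
            tokens.append(re.sub(r"[^\w']", "", raw.lower()))
    return [t for t in tokens if t]

tokens = minimal_preprocess("Sooo gooood, loved it :)")
# The elongated words and the emoticon survive - aggressive normalization
# would have erased exactly the signals research says to keep.
```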
So "Google" the organization and "google" the verb mean different things, and I might feel differently about them in terms of sentiment — they're making some great progress there. When I pass in — this is using spaCy — when I pass in the vector for the winky face, the old-school emoticon, I get back these other things, and I can see I'm getting back interjections, which is probably much closer to how we actually use the winky face: it's part of how we express sentiment. I might say something I don't mean — like "I hate you, computer" — and then use a winky face to say I'm just kidding. So here we can see that sense2vec with spaCy is picking up on the fact that I may actually be mitigating what I'm saying, and I'd be really curious to see how this gets incorporated into sentiment analysis, because sentiment analysis hasn't incorporated most of it yet. Finally, we get to something like what IBM Watson is doing, and now we're talking about emotions.
There's this famous diagram you see in almost every sentiment analysis talk — Plutchik's wheel — and this idea that all of our emotions can be boiled down to eight core emotions, with everything else a combination of those. I'm no psychologist, so I can't comment on it, but you know, it makes sense. If you're doing sentiment analysis and you want to move away from just positive/negative, you may want to move into emotion. Say you're in charge of customer service channels, and what you really want to know is when anger and rage are activated, because you need to act on that as soon as possible — you don't care so much about the positive emotions. One of the things IBM Watson offers, with a lot of it available via the AlchemyAPI, is classification of anger, joy, fear, disgust, and sadness, and it also has social tendencies and language styles. One of the interesting things I was asking some researchers was: hey, could you maybe cross-reference these — if I'm saying something angry in tone but positive in words, does that mean I'm being sarcastic or ironic? It seems nobody's really doing this yet, so if you are, I'd love to hear about it. I will probably try building something along these lines: looking at how we can detect when our words say one thing but our tone — or maybe our tone over time — says another. And by building these models, maybe even personalizing them — you as one person have your own lexicon, you speak a certain way — can we actually start to understand sentiment at a much deeper level? We'll get into that as we talk about the next portion: what is still unsolved, which turns out to be quite a lot.
Apologies if you came expecting solved problems. Humor is still largely lost on sentiment analysis models: we really don't know how to tell when somebody's being funny, and a lot of the time we can't even tell what they're being funny about. Here I have a social media post which is this mixture of image, text, and context — how could we score it? We don't understand the images, we don't understand the timing, and very few systems try to model intention — and that's a really key part, because how we talk online is this combination. Negation is also still an open problem. Here's a unigram/bigram study: the red line is the assumption that when you negate a unigram or bigram you simply flip it to the negative of its value, and the blue dots are the actual human ratings. When I say "it was not that bad," I don't actually mean it was great. So we can look at these and recognize that it's not so simple to say that negating a word just inverts its emotion. There are even some really interesting studies about modifiers, and how they might point toward a mathematical representation of negation in a sentence.
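The naive rule criticized above — the "red line" in the study — is easy to write down, which is exactly why so many simple systems still use it. The lexicon values here are invented; the point is that flipping the sign overshoots, since "not terrible" is not the same as "great."

```python
# The naive negation baseline: flip the polarity of any term that
# follows a negator. Research (the blue dots) shows real human ratings
# land well short of the flipped value.
NEGATORS = {"not", "never", "no"}
LEXICON = {"good": 0.7, "bad": -0.7, "terrible": -0.9}  # toy scores

def naive_score(tokens):
    score, negate = 0.0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(tok, 0.0)
        score += -value if negate else value
        negate = False
    return score

flipped = naive_score("not terrible".split())
# The naive rule scores "not terrible" at +0.9 - as positive as
# "terrible" is negative - which human judgments do not support.
```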
Back to the Stanford sentiment treebank and its recursive neural tensor networks: they have actually gotten pretty advanced at determining negation and the overall sentiment of complex sentences. What they're doing is creating smaller n-grams, creating a sentiment vectorization for each, and then applying a series of logic rules that say, OK, this part is more negative than that part is positive, so overall the sentence is negative. They're doing some really interesting research, and they have their own Java jar you can download and play around with — but it's still mainly trained on movie reviews, which we'll get to in a second. So we still have this problem of being constrained to movie reviews, and there's still quite a lot that's hard: mixed sentiment, complex emotions; sarcasm, irony, and humor are lost.
We also have speaker intention and personality. There's been some interesting research in terms of trying to determine who you are as a speaker — if I study you over time, can I figure out who you are as a speaker and get a better idea of what you mean? We also have slang, and new and older phrases: part of what William Hamilton did with SocialSent was look at sentiment over time, and he pointed out that on historical documents you can't use any normal sentiment analysis, because in 1920 people talked about things very differently. And we have new phrases too — "cool" means something very different now than it once did. Then there are just general NLP issues. Also, I felt I would be letting you down if I didn't get a Pokémon reference into my slides, so here's one — and I really hope people in America are playing it anyway, because I'm very fearful, watching from Berlin, about what may or may not happen in November.
So here's an example: this is consciously expressing sentiment, but no sentiment analysis model would understand it, because we're talking about cultural references. We also need the ability to say, hey, this emoji or this capitalization can mean emphasis. And we have sentiment arriving with images and GIFs, right? GIFs are now part of Twitter, and this is important: I would not understand the sentiment of this tweet unless I could also look at the image and then determine the sentiment of the image. And the really interesting thing is that a lot of the time there's an emoji somewhere, or maybe there's a hashtag like #happy or #mood or whatever it is, and I can take these, pull them out, pull out the visual representation, put it back into words, and use those words to make a decision. Another really big problem with all of this is speed, speed and memory: if you ever try to make your own word embeddings, I hope you have a server somewhere with a lot of memory, and that you have a lot of time.
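That pull-the-emoji-and-hashtags-back-into-words trick might look something like the sketch below. The emoji-to-word table is made up for illustration; a real pipeline could lean on a library such as `emoji` to demojize text instead:

```python
import re

# Hypothetical mapping from pictographs to sentiment-bearing words.
EMOJI_WORDS = {"😍": "love", "😡": "angry", "🎉": "celebrate"}

def textualize(tweet):
    """Replace emoji with words and unpack hashtags so a plain
    text-based sentiment model can score them."""
    for symbol, word in EMOJI_WORDS.items():
        tweet = tweet.replace(symbol, " " + word + " ")
    # "#SoHappy" -> "so happy" (naive CamelCase split on the tag body)
    def split_tag(match):
        return re.sub(r"(?<!^)(?=[A-Z])", " ", match.group(1)).lower()
    return re.sub(r"#(\w+)", split_tag, tweet)

print(textualize("Great show 🎉 #SoHappy"))
```

After this preprocessing step, the downstream model only ever sees ordinary words, which is exactly the idea described above.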
Compiling these word embeddings with GloVe is actually a bit faster, almost half the time of word2vec, but if you're making word embeddings it can still take 14 or 15 hours, and if you're using GloVe it can end up using gigabytes and gigabytes of RAM. So if you're making these embeddings yourself, memory can become a real problem and a performance bottleneck, both for how do I keep feeding my model, and for how do I run it in a real-time situation.
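A quick back-of-the-envelope calculation shows why the memory adds up; the vocabulary size and dimensionality below are just plausible example numbers, not figures from any particular corpus:

```python
# Rough footprint of a dense embedding matrix stored as float32:
# vocabulary_size x dimensions x 4 bytes, before any library overhead.
vocab_size = 2_000_000        # e.g. a large web-crawl vocabulary
dims = 300                    # a common embedding width
bytes_needed = vocab_size * dims * 4
print(f"{bytes_needed / 1024**3:.1f} GiB")   # about 2.2 GiB for the raw matrix
```

And that is only the final matrix; training-time structures (co-occurrence counts, negative-sampling tables) can multiply that several times over.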
So what some folks have been studying is the ability to create dense word vectors and dense document vectors; they created this idea of a densifier (there's yet another reference, if you can see it over in the corner), and they were able to speed up operations on English word vectors dramatically, with only a slight difference in accuracy. So the more solutions we find for sentiment analysis, the more problems we find as well, and depending on your domain, especially if you're working in different languages, this can get even more difficult. One interesting thing some people are doing is aggregation within lexicons: if I aggregate everybody speaking English in this particular city, what they're finding is that maybe there's more agreement within that community on sentiment and words. So that's something to look at. I also haven't seen a lot of studies on ensemble methods, and I think that's coming into vogue, as well as character-level embeddings.
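An ensemble can start out as simple as majority voting over several scorers. The three scorers below are crude stand-ins invented for the sketch, not real models; in practice each vote might come from a lexicon model, a classifier, and a neural network:

```python
def lexicon_scorer(text):
    positive, negative = {"great", "love"}, {"awful", "hate"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def exclaim_scorer(text):
    return 1 if text.endswith("!") else 0      # stand-in heuristic

def length_scorer(text):
    return -1 if len(text.split()) > 20 else 1  # stand-in heuristic

def ensemble(text, scorers):
    """Majority vote over the sign of each scorer's output."""
    votes = [1 if s(text) > 0 else -1 if s(text) < 0 else 0 for s in scorers]
    return max(set(votes), key=votes.count)

print(ensemble("I love this talk!", [lexicon_scorer, exclaim_scorer, length_scorer]))
```

Even this naive voting scheme illustrates the appeal: one weak scorer's mistake gets outvoted by the others.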
Character-level embeddings are the idea that we can predict, on a character level, given a language. So here is a graphic, admittedly tongue-in-cheek and a little small, but this is like my own version of the famous scikit-learn flow chart, and it basically says that if you have a movie-reviews set, or really any reviews set, you're fine, and if you have anything else and don't have money and time and samples, you're pretty much out of luck. So: have we solved sentiment analysis? Not really. But if you have time, or are willing to work on building a lexicon particularly for your model, it turns out there can be some really great results. I have found a lot of people, and also a lot of companies, building their own corpora and their own lexicons, and they're actually in the 90th percentile on accuracy. So one of the things you can reasonably do is read all of the papers and look at the references.
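Starting your own lexicon can be as simple as a dictionary plus a negation flag. The entries below are invented for illustration; a real domain lexicon would be grown and validated from your own labeled data:

```python
# A tiny hand-built domain lexicon; real ones are grown from labeled data.
DOMAIN_LEXICON = {"delightful": 2, "fine": 1, "buggy": -2, "slow": -1}
NEGATORS = {"not", "never", "no"}

def lexicon_score(text):
    """Sum lexicon scores, flipping a word's score if it follows a negator."""
    total, negate = 0, False
    for raw in text.lower().split():
        word = raw.strip(".!?,")
        if word in NEGATORS:
            negate = True
            continue
        value = DOMAIN_LEXICON.get(word, 0)
        total += -value if negate else value
        negate = False
    return total

print(lexicon_score("The app is not slow and feels delightful"))
```

Crude as it is, a scorer like this, once the lexicon is tuned to your own domain, is exactly the kind of thing those companies in the 90th percentile started from.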
I include a reference in the slides to work suggesting that even a CNN with a simple, straightforward model architecture can get you quite far. Thank you.
Thank you, everyone; we have time for a few questions. If you'd like to leave to get a coffee, please do so very quietly, out of respect for the speaker.

First question: have you had a chance to look at the Google sentiment analysis API they just released this week?

I haven't had a chance to look at that, but I have it bookmarked, so I'll let you know when I do. The problem really is that a lot of these depend on the lexicon, and Google is likely using word embeddings, and word embeddings have problems when you use them for sentiment. So I would doubt it's massively better than what's available from Stanford, which is generally in the high seventy percent range. But if you have used it, let me know; there's so much happening, and it's hard to keep up with everything.

Thanks very much for the talk. I'm a linguist by training, so with all of the examples you mentioned, we realize that being able to accurately tell what the sentiment is really relies on knowing a lot about the context. How accurate do you think we can expect to be based just on the text? It seems that people are really pouring a lot of time and effort into making the text models great, and this is wonderful, but maybe there's just a cap on how well we can do based only on text.

Yeah, I agree with you to a certain degree. What I'm curious about is that, when we start to look at these dependency models, researchers are able to detect stance: that if I'm using a series of phrases, it's directed at a particular object, and particularly if they can do that in a multilingual situation, which people are starting to prove is possible, that's going to take sentiment analysis to the next level. Because then I can start to say: this cluster of words applies to this object, and it's obviously negative, or obviously positive, and I can start to make a little bit of inference, moving away from this idea that words just occur around other words. That was the very first step, and the more we incorporate parsers into our models, the better sentiment analysis has gotten. Thank you.

Thank you, the talk was amazing. What seems really interesting to me is that sentiment analysis is in some sense a problem of modeling the world, so do you think deep learning, which can learn from relatively few instances, will give us better sentiment analysis?

I was really, really impressed with some of the deep learning models that have been coming out, and the fact that they're actually getting really good accuracy with very little training. There are some great papers, and I'm happy to reference and post them; there are trials where they say, we only trained for a month, and we were able to nearly reach state-of-the-art systems. So I really think deep learning is likely the answer here, and as deep learning evolves, I think sentiment analysis will too, because when we see what's being done in terms of natural language generation, if these networks are able to start to understand what we mean, or what we're trying to say, then we can maybe start to predict how we feel. But the problem is that the sentiment analysis field is still very much closed: the best models are owned by places where the source code may never be released. So I think there is pressure on open source developers to try and keep up with things that are happening behind closed doors.

We're out of time, so please join me in thanking Katharine.

Metadata

Formal Metadata

Title: I Hate You, NLP... ;)
Title of Series: EuroPython 2016
Part Number: 99
Number of Parts: 169
Author: Jarmul, Katharine
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI: 10.5446/21170
Publisher: EuroPython
Release Date: 2016
Language: English

Content Metadata

Subject Area Information technology
Abstract: Katharine Jarmul - I Hate You, NLP... ;)

In an era of almost-unlimited textual data, accurate sentiment analysis can be the key for determining if our products, services and communities are delighting or aggravating others. We'll take a look at the sentiment analysis landscape in Python: touching on simple libraries and approaches to try as well as more complex systems based on machine learning.

Overview
-----------
This talk aims to introduce the audience to the wide array of tools available in Python focused on sentiment analysis. It will cover basic semantic mapping, emoticon mapping as well as some of the more recent developments in applying neural networks, machine learning and deep learning to natural language processing. Participants will also learn some of the pitfalls of the different approaches and see some hands-on code for sentiment analysis.

Outline
-----------
* NLP: then and now
* Why Emotions Are Hard
* Simple Analysis
* TextBlob (& other available libraries)
* Bag of Words
* Naive Bayes
* Complex Analysis
* Preprocessing with word2vec
* Metamind & RNLN
* Optimus & CNN
* TensorFlow
* Watson
* Live Demo
* Q&A