I Hate You, NLP... ;)

Transcript
... and I'd like you all to join me in welcoming Catherine, because Catherine hates your computer.

Thank you, and thank you for hosting a really great event — I've had a wonderful time and met people way smarter than me. My name is Catherine; I run a small company, and you can find me writing and talking about data things. I like working with text and data, and if you're ever in Berlin, come say hi — there's a great group of people there.

I feel like, because I'm talking about machine learning and I don't have a PhD next to my name, I need this disclaimer: I'm not an expert in machine learning. I can't sit here and tell you which algorithms work better and exactly why. I'm a Python developer interested in applying machine learning to text analysis and sentiment analysis. So by all means, if you are a machine learning expert and I say something wrong, feel free to correct me in the questions — or I'll buy you a beer later and you can teach me something.

Here are some assumptions I've made about you: I'm not going to go over Python basics, and I'm definitely not going over machine learning basics — if you want to talk about basics later, I'm of course happy to. I'm assuming you've maybe even done a little bit of sentiment analysis, that you've played around with some natural language processing, and that you have a basic understanding of machine learning and what matters to you.

Here's a little of what we'll cover. In the initial description I said there would be plenty of code; upon further inspection, that becomes a little more difficult when you get into deep sentiment analysis and cross-language sentiment analysis. So what I'm going to do is point you to code and repositories that you can use, because we don't have time to go through it all live. We'll cover some of the tools, talk a little about what's out there and how it's being used, and cover a lot of the research that's been done on sentiment analysis. Right now I'm seeing a massive gap between that research and the tools we have available in open source, and I'm hoping to push the conversation forward in the Python community by giving this talk and covering a little of what's happening. We'll also look at whether there's some magical Python library you can download today that works in multiple languages for sentiment analysis.
So in case you want somewhere to send hate mail — or you want to practice natural language generation on some hate mail — send it to Salesforce. There was a really amazing startup called MetaMind that was using matrix vectorization with recursive — sorry, recurrent — neural tensor networks, and they had an open API, and it was fabulous. Then Salesforce bought them, and they're currently shutting down the public API. I ran into a guy who works for Salesforce and actually gets to work with this team, and I asked him: please let something stay publicly available. He said, good luck, have fun. So again: hate mail. And if you work for Salesforce, maybe agitate from inside the internal network.

Another company that's trying to do some things here — and one of the only such tools available via API — is MonkeyLearn. You can see they have different models available, some being actively worked on, and you can also create publicly available models. So if you absolutely need something that works tomorrow, I would start from these places; but just looking through the API we can see very different precision levels. And that's really why I'm giving this talk: we're going to go through some of the methods and the theory behind how to create a sentiment analysis that works for you.
behind how to create a sentiment analysis that works for you so to begin with I wouldn't have about what is sentiment analysis and how to go about it in so what is sentiment analysis really and when I look at the street and I being know American some of my tweets I like to view of the American references so forgive me for being that way but only look this treaty on we have a lot of different sentiment right if we look at the modi we the crime if we look at this photo
that we have here then we see that a different emotion of crime if we look at the text then we
might just see the word life and determined that this is a possible models and so when we're trying sentiment analysis we've all these different things we stance of other fields words Jennifer Hudson had other field towards the the award the some really
complex things and we think we can boil them down into positive negative neutral but I wanted here to debunk them today so sentiment analysis is used in all different sorts of systems I'm and I had the pleasure of going to sentiment analysis and review recently and hearing just how many different places you there even people using it in anomaly detection and they're charting sentiment across the views on Twitter and they're looking at it via release tax and so there's all different ways that people are using sentiment analysis on including in obviously just simple user satisfaction or grant so we we look at the sentiment analysis steps this can basically be broken down into 4 major steps and they
basically go from left to right and top to bottom however depending on what you choose they could be mixed up so the 1st thing is dealing with the corpora are you building a lexicon what how using your labeling next then you're probably going to choose you algorithm or you can determine what the model or approach to then you're doing the 1st and preprocessing you might even do normalization and standardization across the dataset and finally you're testing evaluating and improving and in not approving instead you may revisit any of these earlier steps quite possible that you will need to to through out
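Those four steps can be sketched end to end in a few lines. Everything here is a toy of my own — the function names and the two-word lexicon are placeholders, not any real library:

```python
# A deliberately naive sketch of the four-step loop described above.

def build_lexicon(corpus):
    """Step 1: label your corpus / build a lexicon (here: trivially)."""
    return {"love": 1.0, "hate": -1.0}

def preprocess(text):
    """Step 3: minimal preprocessing -- lowercase and tokenize."""
    return text.lower().split()

def score(text, lexicon):
    """Step 2: the 'model' -- here just a naive lexicon sum."""
    return sum(lexicon.get(tok, 0.0) for tok in preprocess(text))

def evaluate(pairs, lexicon):
    """Step 4: check predictions against labels; revisit earlier steps if poor."""
    correct = sum((score(t, lexicon) > 0) == (label > 0) for t, label in pairs)
    return correct / len(pairs)

lexicon = build_lexicon(corpus=None)
acc = evaluate([("I love this", 1), ("I hate this", -1)], lexicon)
print(acc)  # both toy examples are scored correctly
```

Real systems replace each stub with something serious, but the loop — and the "go back and revisit" arrow — stays the same.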
We're going to walk through the talk in basically that order, covering each of the steps and what's happening in research around them. The first step is choosing your lexicon. One of the oldest and most tried-and-true ways of auto-tagging a corpus, particularly if you're using social media data, is distant supervision. The idea is: I'm going to go gather tweets, or Facebook statuses, or whatever I'm going to use, and take the emoticons or emoji in them as labels. You can also think of hashtags as a similar signal — in quite a lot of the research they expand their corpora by using hashtags, so I grab everything tagged #happy, #joyful, et cetera. This can also help you expand into new languages: if you're working in one language and want a model in five, as long as you have a very good thesaurus, you can do that.
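The distant-supervision idea fits in a few lines of Python. The marker lists below are illustrative stand-ins, not a real resource, and the cleaning step matters: you strip the marker out so a model can't just learn to spot it:

```python
# A toy distant-supervision labeler: emoticons/hashtags already in the text
# become noisy labels, then get stripped from the training text.
POSITIVE_MARKERS = {":)", ":D", "#happy", "#joyful"}
NEGATIVE_MARKERS = {":(", ":'(", "#angry", "#sad"}

def distant_label(tweet):
    """Return (cleaned_text, label) or None when no usable marker is present."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_MARKERS for t in tokens)
    has_neg = any(t in NEGATIVE_MARKERS for t in tokens)
    if has_pos == has_neg:          # no marker, or contradictory markers
        return None
    cleaned = " ".join(t for t in tokens
                       if t not in POSITIVE_MARKERS | NEGATIVE_MARKERS)
    return cleaned, ("pos" if has_pos else "neg")

print(distant_label("great talk :) #happy"))   # ('great talk', 'pos')
print(distant_label("my build broke :("))      # ('my build broke', 'neg')
print(distant_label("no markers here"))        # None
```

For a new language, the marker sets are roughly what you'd swap out via your thesaurus.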
The other thing you're going to need to decide: you have all this data — how do you tag it? We have this very naive idea that everything can be tagged positive/negative, or positive/negative/neutral, or even positive/negative/indeterminate/neutral. Research has pretty much proven that wrong. What you have to think about is whether you really need a simple model that's just positive, negative, and neutral — if so, you get a lot of gray area, and you're going to have more ending up in neutral, or more false positives and false negatives. When you move away from that and start scoring positive and negative on a scale, things converge a lot faster and you get fewer of those false positives and negatives. It also means you have to build your own lexicon and label it yourself, so there are pluses and minuses, and depending on your team's resources you may not be able to do it.

There are also quite a lot of people looking at stance detection. This is a mixture of NLP — topic identification or entity detection — and then determining stance. A lot of it right now is essentially still a bag-of-words model, and people are trying to evolve it to be more advanced: can I determine that these words actually apply to this organization or this entity, and then detect the stance toward it? Maybe I really like this restaurant — they have great food, but the service was total crap — so I have one stance toward the restaurant, one toward the food, and one toward the service. And finally there's the conversation about emotion: because we have such a wide range of emotions, it's really difficult to reduce everything to just positive.

In terms of tagging the lexicon, we have a few different methods. I'll start from the simplest: a Boolean, 0/1, or maybe -1/0/1. Then there's a sliding scale, going from completely negative to completely positive, where I ask people to rate on the scale. What researchers have actually found is that people tire over time: if you sit there asking me to rate things for five hours, eventually I'm just going to go neutral, neutral, neutral and churn through it — we know that just from our knowledge of humans. They also found that psychologically people avoid the edges, so you get much more "it's kind of negative," "it's kind of positive," because people don't want to commit to completely negative or completely positive.

One of the best methods that's recently come into practice is best-worst scaling. You take a list of, say, four or maybe five words or n-grams, and you ask people: choose the most positive and the most negative. That's producing some really interesting lexicons, and what they found is that people agree after about three examples — I can show an item to three people, move on to my next sample, and be at around a ninety percent agreement ratio, which is pretty massive for tagging a lexicon. A lot of this comes from Saif Mohammad, who works at the National Research Council in Canada, runs several tasks in the SemEval competitions — a semantic-evaluation machine learning competition that happens every year — and has some really great state-of-the-art models. What he found is that when you switch to a scale from -1 to 1, about ninety percent of people agree within about 0.4 of each other. That was a really interesting revelation, and a reason to move away from plain 0/1 labels.
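A sketch of how best-worst annotations become a real-valued lexicon, under one common convention: an item's score is (times chosen best minus times chosen worst) divided by how often it appeared. The annotation tuples below are invented:

```python
# Turn best-worst annotations into scores on a -1..1 scale.
from collections import Counter

def best_worst_scores(annotations):
    best, worst, seen = Counter(), Counter(), Counter()
    for items, chosen_best, chosen_worst in annotations:
        seen.update(items)
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    return {w: (best[w] - worst[w]) / seen[w] for w in seen}

annotations = [
    (("great", "okay", "awful", "nice"), "great", "awful"),
    (("great", "meh", "awful", "fine"), "great", "awful"),
    (("nice", "meh", "okay", "fine"), "nice", "meh"),
]
scores = best_worst_scores(annotations)
print(scores["great"])   # 1.0: picked as best every time it appeared
print(scores["awful"])   # -1.0: picked as worst every time
print(scores["okay"])    # 0.0: never picked either way
```

Annotators only ever answer "which is most positive, which is most negative," which is exactly the question people turn out to agree on.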
We hear constantly that humans don't even agree seventy-some percent of the time, or whatever statistic you look at. What he was able to show is that when you give people a scale and let them use the whole scale, you can actually find this least perceptible difference — and if you then focus on native speakers of the language, you get an even smaller core difference.

I had the opportunity to chat — and I'll hopefully be posting the chat — with William Hamilton, who is currently working on his PhD at Stanford. He's developed a model called SocialSent, and what he's done is taken subreddits and determined that different subreddit communities use language differently. When I'm hanging out in a sports subreddit and I say a player is really soft, I don't mean that as a compliment, right? I mean it in a really negative way. Again, we have this mentality a lot of the time in this field that we all use the same words to describe positive and negative, and that's just not the case — especially if you're doing sentiment analysis on a specific community, and especially a community like, say, Python developers, which has a lot of its own ways of using words. He has all of this data available, and I took a look at the programming subreddits and pulled up some of the most positive and most negative words. I also found out that "Python" has a slightly negative association — I know, how dare they. And I found some interesting things: it's funny to me that "200" lands right in the middle of positive — you can tell we have web developers in the programming subreddits — and "spaghetti" is really negative, which must mean spaghetti code rather than the food. "Minecraft" has a very positive association. So it's a really interesting dataset, and he's very open to suggestions — feel free to send him ideas or ask him to run something for you.

So now we've built our corpus and chosen the lexicon; we need to choose an approach, a model. How do we do that? Which machine learning systems can we use? It turns out we can use a whole bunch — I know this slide is a lot to read — and there are all different ways people are using machine learning for sentiment analysis, achieving really great results, sometimes with really small datasets. I'll come back to this later, but: if you're dealing with movie reviews, you're in luck; if you're not using the well-studied movie-review data, you enter a space where there's quite a lot of active research, not a lot of agreement on what's good to use and why, and very different results from very different systems. Obviously the finely tuned, hand-engineered systems are still the best-performing ones right now — built by people who spend all their time researching and pulling out little details — but we don't have time for that, right? We're probably Python developers first and sentiment analysis researchers second.

So let me talk about how the older and newer approaches compare. The old approach is bag-of-words: I count the words, and I say, hey, there are many positive words, so this must be positive. The new idea is to use word embeddings, and research is using word embeddings inside deep learning models; by doing so you get a little more complexity and can ask: are these words bunched together, and maybe they're bunched together because they share sentiment? The old way was term frequency-inverse document frequency, which is basically a nicer way of doing bag of words; the new way is doc2vec-style document embedding models. The old way is n-grams — if I say "good job" I mean "good" and I'm talking about "job" and they're related; the new way is skip-grams and even dependency modifications, which start to look at parts of speech and label the words with them, because I mean very different things if I use a word as a noun versus as a verb. And the old way was fully supervised state-of-the-art systems, while the new way moves into semi-supervised approaches.

If you're curious about the old-school state of the art, this slide references the papers along the bottom, but it's a good summarization — again of Saif Mohammad's model, which has done really well in competitions: all the different features he's pulling out, fine-tuning, and analyzing, and probably some more features that haven't been released. You can see it's a good starting place if you're interested in tuning your own model.
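Since TF-IDF keeps coming up as the "old way," here it is by hand on a toy corpus — a minimal sketch; real projects would reach for a library implementation, and smoothing conventions differ between libraries:

```python
# Term frequency-inverse document frequency, computed directly.
import math

docs = [
    ["good", "food", "good", "service"],
    ["bad", "service"],
    ["good", "movie"],
]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)          # how frequent here
    df = sum(term in d for d in docs)        # how many docs contain it
    idf = math.log(len(docs) / df)           # rare overall -> higher weight
    return tf * idf

print(tf_idf("good", docs[0], docs))  # frequent here, but common overall
print(tf_idf("food", docs[0], docs))  # rarer overall, so it outweighs "good"
```

The weighting is the whole trick: a word that shows up everywhere tells you little, so it gets discounted.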
But let's talk about what we as Python developers can actually use, and use easily. We can use word embeddings, and for that, year in and year out, I use gensim. Gensim is this great library, and what it gives you is word2vec — sorry, I'm assuming people know about word2vec, but briefly: it's this idea of representing a word or a document as a vector, so everything lives in a vector space, and it can be — it is — multidimensional. Gensim can load in these high-dimensional vectors and give you a matrix or a vector representation of each of the words in a document. What that gives you is a mathematical way to represent text. This is where text was held back for quite some time, and it's the beginning of a really good mathematical way to represent text, whether or not you find the vectors themselves interesting.
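The "words as vectors" idea in miniature: real embeddings are hundreds of dimensions and would be loaded with a library like gensim, but the math is the same on made-up 3-dimensional vectors — similarity becomes cosine similarity:

```python
# Toy word vectors (the numbers are invented for illustration only).
import math

vectors = {
    "good":  [0.9, 0.1, 0.2],
    "great": [0.8, 0.2, 0.3],
    "awful": [-0.7, 0.1, 0.4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(vectors["good"], vectors["great"]))  # close to 1: similar words
print(cosine(vectors["good"], vectors["awful"]))  # negative: dissimilar words
```

Once words are points in a space, "these words bunch together" stops being a metaphor and becomes something you can compute.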
There are two different main methods for word embeddings here. I've already talked a lot about word2vec, and actually there's not enough attention given to GloVe. GloVe is the Stanford one; it has a slightly different representation and uses a weighted model. Word2vec basically says: does this word appear in the sentence near these other words? Then it's somewhat related to them. GloVe says: these words are closer together, therefore they might have more of a proximity relationship. And depending on which word embedding model or method you use, you can get really different results. If you compare embeddings built on a bag-of-words-5 window — that is, the five words around the target — with a bag-of-words-2 window, and then with a dependency-based embedding, which again uses parts of speech, you see real differences. Take the "Florida" line: with bag of words we get a mixture of other ways to talk about Florida and cities in Florida, but with dependencies we get other states. That may be significant for whatever you're trying to do: if you want other entities — like entities, rather than just related words — you might want a dependency-based word embedding rather than a bag-of-words one. There's an online comparison tool where you can look at this; I tried "Python," and it looks like the senses are somewhat mixed together, and bag-of-words-5, bag-of-words-2, and the dependency embeddings give slightly different results. The really neat thing is that the tool also gives you a bit of a visualization of the word embeddings, so if you want to sit down and make decisions about what to use for your embeddings, I recommend going and playing with it.

Another thing that I really must mention here: there was a recently released paper — referenced at the bottom — showing that word embeddings are not neutral. They're based on human language, and they have the same biases you would expect to see in humans. To examine some of these vectors, the authors used the Google News vectorization — the 300-dimensional one, which is used all over the place — and found some pretty atrocious stuff. Maybe not in the first ten nearest neighbors, but, for example, there was some intense racism going on if you looked at the top thirty words within a neighborhood. So beware, especially if you're using embeddings for something like sentiment, where very biased words may be mixed in, or for natural language generation: you need to be very aware that these biases exist. One interesting thing, though, is that the authors were able to find vectors capturing the bias itself — they specifically focused on misogyny — and they were actually able to reverse those direction vectors, having found the directions that were pushing things toward, say, women having "lesser" professions. We could talk at length
about what that actually means, but here I have some great example repositories — too much code to go over in the talk, so I recommend looking at them; they're great examples of simple machine learning. The recipe is: take the vectors and feed them, with whatever you're using as labels, into a simple linear approach like an SVM or logistic regression. If you want to work at the document level, you eventually take a vector from a document model or a sentence model — which works better depends on your results — and you pass in labeled sentences. In gensim you can label your sentences and then pass them directly into, say, a logistic regression or whatever it is. That's the simple recipe for shallow machine learning.

Another thing to be aware of when you're making these decisions is how you're choosing your grams — unigrams, bigrams — how do you want to represent the text? When you're choosing word embeddings, you're generally making those decisions as you build them. What the Stanford researchers found — the same group that came up with the matrix factorization and recursive neural tensor networks — is that the longer your n-grams, the more mixed sentiment you have, which is kind of logical when you think about it: if I go on and on and on in one big sentence, I'm probably going to express different viewpoints or mixed emotions. So with unigrams and bigrams there are words that are obviously negative and positive, but there's a lot in between; and the further you go toward longer n-grams, the more complex the sentiment gets — and the more people are able to say, looking at a phrase: this phrase is definitely negative, the whole phrase. This is based on the Stanford sentiment treebank work, which is available in Java.
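That shallow recipe — a document vector in, a label out — can be sketched without any libraries. In practice you'd feed gensim doc2vec vectors into something like scikit-learn's logistic regression; here a tiny perceptron and made-up 3-d "document vectors" stand in for both:

```python
# A minimal linear classifier over (pretend) document embeddings.

def train_perceptron(data, epochs=20, lr=0.1):
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for vec, label in data:            # label: +1 or -1
            pred = 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else -1
            if pred != label:              # nudge the boundary on mistakes
                w = [wi + lr * label * xi for wi, xi in zip(w, vec)]
                b += lr * label
    return w, b

def predict(w, b, vec):
    return 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else -1

# pretend these are 3-d document embeddings with sentiment labels
train = [([0.9, 0.1, 0.0], 1), ([0.8, 0.3, 0.1], 1),
         ([-0.7, 0.2, 0.1], -1), ([-0.9, 0.0, 0.2], -1)]
w, b = train_perceptron(train)
print(predict(w, b, [0.85, 0.2, 0.05]))   # 1: lands on the positive side
```

The point is how little machinery sits between "I have vectors" and "I have a classifier" — the embedding does most of the work.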
Another approach people are using for this, which is going pretty well, is LSTMs — long short-term memory networks. These are deep learning networks, and what people generally do is take these word embeddings or document embeddings and push them into the deep learning model. The nice thing about LSTMs — and why they've been doing some really amazing stuff for other natural language processing as well — is that they can forget. They can learn things, and they can forget things, and because they have that ability, they can change as they see more or less of a representative sample. Language, we know, changes; trends come and go; and because of that, this approach has been really powerful. You can use it with Theano and TensorFlow, and I have some examples.
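The "it can forget" part is literally one line of arithmetic inside the cell. Here's a toy, scalar version — real gates are learned, vector-valued functions of the input and hidden state, and the logits below are made up:

```python
# The LSTM cell-state update: c_t = f * c_prev + i * candidate,
# where f (forget gate) and i (input gate) are sigmoids in (0, 1).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_update(c_prev, forget_logit, input_logit, candidate):
    f = sigmoid(forget_logit)   # near 1: keep old memory; near 0: discard it
    i = sigmoid(input_logit)    # how much of the new candidate to admit
    return f * c_prev + i * candidate

keep = lstm_cell_update(c_prev=2.0, forget_logit=6.0, input_logit=-6.0, candidate=1.0)
drop = lstm_cell_update(c_prev=2.0, forget_logit=-6.0, input_logit=6.0, candidate=1.0)
print(keep)   # close to 2: the old state survives
print(drop)   # close to 1: the old state is forgotten, new input dominates
```

That multiplicative forget gate is what lets the network discard a stale trend while keeping what still matters.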
There's also the option of using convolutional neural networks. Initially the idea was that convolutional neural networks are really great for images, because you can chunk up labeled images and convolve through the pixels; but people are now using them for sentences or for documents. They take the word embeddings for each word, move through them the same way, and then, through the same principles of pooling and softmax, reach a conclusion: OK, according to my sigmoid or softmax, this is positive or negative or neutral.
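The last step described there — softmax over the network's class scores — is small enough to show on its own. The scores below are made-up logits standing in for the pooled CNN output:

```python
# Softmax turns raw class scores into probabilities; we then take the argmax.
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]  # shifted for stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]
scores = [0.5, 0.1, 2.2]            # hypothetical logits from the pooling layers
probs = softmax(scores)
print(max(zip(probs, labels)))      # the highest-probability class wins
```

Note the probabilities always sum to one — the network is forced to spread its confidence across positive, negative, and neutral.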
And again, the labels look a little bit different when you're dealing with deep learning on text. A lot of the time you'll have, say, a hundred documents that were manually tagged, and a lot of the time you represent the labels as simple arrays — this position means positive, that one negative. There are different things happening in this space, so I'll keep an eye on how it actually performs, because again, right here, we're back to everything being positive or negative. I'd be curious to see more, and I know some academics are working on more complex representations of the labeling, so this might become an interesting space.

And here is where a lot of people are using a dual system: maybe I have a simple classifier, and I have a lexicon, and I'm feeding them both into a deep learning system. Maybe I have a constant Twitter feed going into a simple classifier, and I'm allowing the deep learning system to learn from that classifier. This is where a lot of people are starting to see some big jumps without having to do a lot of work. If you ask me, this might really be the best approach; there's obviously still open research, but it's an interesting idea — that I can use my simple classifier, feed it into my deep learning system, and the deep learning system will eventually become more
intelligent, with a much wider lexicon, over time.

Depending on the system, you might need to handle regularization and preprocessing, and this is one of the classic NLP debates: is preprocessing essential? It really depends who you ask. There's been some really interesting research in sentiment analysis saying that doing too much preprocessing is actually the problem: what ends up happening is you get rid of some of the ways we speak and snipe at one another, the colloquialisms we use, and by stripping those out you're actually removing part of how we express ourselves. So what I would recommend — and the literature generally agrees — is: do minimal preprocessing, maybe just some simple lemmatization or some simple tokenization, and try it; then do more preprocessing and try again, and see where you hit a nice accuracy for your model.

This is where sense2vec and spaCy are really making some cool inroads. spaCy is a startup based in Berlin doing very fast parsing; they have English and German-language models available, and sense2vec was trained on Reddit data. What they do is tag each of the vectors, so with sense2vec you have a tagged vector: Google the organization and google the verb mean different things, and I might feel differently about them in terms of sentiment. When I use sense2vec and pass in the vector for the winky face — the old-school ;) emoticon — I get back interjections, which is probably much closer to how we actually use winky faces: they're part of how we express sentiment. I might say something that I don't mean — "I hate computers" — and then use a winky face to say I'm just kidding. So here we can see that the tagged vectors are picking up on the fact that I may actually be mitigating what I'm saying, and I'd be really curious to see how that gets incorporated into more tools, because most of them haven't incorporated it yet.

Finally we get to what IBM Watson is doing, and here we're talking about emotions. There is this famous
diagram that you see in almost every sentiment analysis talk, with this idea that all of our emotions can be boiled down to eight core emotions, and everything else is some combination of those. I'm no psychologist, so I can't comment on it, but you know, it makes a lot of sense. So if you're doing sentiment analysis and you want to move away from just positive/negative, you may want to move into emotions. Say you're in charge of customer service channels: you really want to know when anger and disgust are activated, because you need to act on that, and maybe you don't care so much about the positive emotions, or about fear. One of the things IBM Watson has, available via the Bluemix API, is exactly this kind of classification — anger, joy, fear, disgust, sadness — and it also reports social tendencies and language styles. One of the interesting things I was asking some of their researchers is: could you maybe detect that I'm angry but saying something positive, which would mean I'm being sarcastic or ironic? The answer was that nobody's really doing that yet, which was interesting to hear — I would love to see people building something that does it: taking a look at how we handle the words saying one thing while the tone, or the tone over time, says another. By building these models — maybe even personalizing them, since you, one person, have your own lexicon and speak a certain way — can we actually start to understand sentiment at a much deeper level? We'll get into that as we talk about the next portion: what is still unsolved, which is quite a lot. So apologies
if you came hoping these were solved problems. Humor is lost on sentiment analysis models: we don't really know how to tell when somebody's being funny, and a lot of the time we can't even tell what they're being funny about. Then there's the modern social media presence, which is this mixture of images and text and context — and we barely understand the images, we barely understand the context, and there are very few attempts to model intention. That's a really big missing piece, because how we talk online is exactly this combination.

Negation is still an unsolved problem too. Here's a unigram and bigram study: the red line is the assumption that negating a term simply flips its value to the negative, and the blue dots are the actual observed representations. When I say "it was not that bad," I don't actually mean it was great, right? So we can start to look at these and see that it's not so simple to say that negating a term just inverts its emotion. There are also some really interesting studies on modifiers and how they might point toward a mathematical representation of negation in a sentence.
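That red-line-versus-blue-dots finding can be caricatured in code: a naive rule flips the sign, while the data suggest something closer to a shift toward neutral. The shift constant here is invented purely for illustration:

```python
# Two toy treatments of negation applied to a lexicon score in [-1, 1].

def negate_flip(score):
    # the naive "red line" rule: negation inverts the polarity
    return -score

def negate_shift(score, shift=0.6):
    # closer to the observed "blue dots": move a fixed amount toward neutral
    return score - shift if score > 0 else score + shift

bad = -0.8
print(negate_flip(bad))    # 0.8: "not bad" scored as strongly positive
print(negate_shift(bad))   # about -0.2: nudged toward neutral, not flipped
```

The shift version captures the intuition that "not bad" lands near neutral rather than at "great."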
Back to the Stanford sentiment treebank and the recursive neural tensor networks: they've actually gotten pretty advanced at determining negation and the overall sentiment of complex sentences. What they're doing is creating smaller n-grams, creating sentiment vectorizations for those, and then applying a series of logic rules: OK, this part is more negative than that part is positive, so overall the sentence is negative. They're doing some really interesting research, and they have their own Java jar that you can download and play around with — but it's still mainly trained on movie reviews, which we'll get to in a second. We really do have this problem of everything being trained on movie reviews.

So there's quite a lot that's still hard. We have mixed sentiment and complex emotions; sarcasm, irony, and humor are lost. We also have speaker intention and personality — there's been some interesting research on trying to determine who you are as a speaker: if I study you over time, can I determine who you are as a speaker and get a better idea of what you mean? We also have slang, and new and old phrases. Part of William Hamilton's work with SocialSent was looking at sentiment over time, and he pointed out that if you look at historical documents, you can't use any normal sentiment analysis, because in 1920 people talked about things a lot differently — and we have new phrases too, where, you know, "cool" means something very different than just a temperature. And then there are just general NLP issues. Also, I felt like I'd be letting you down if I didn't get a Pokémon reference into my slides — which I really hope people in America are enjoying, anyway, because I'm very fearful, watching from Berlin, about what may or may not happen in November.
animation in the following way so but here's an idea of this is consciously exploring sentiment but no sentiment analysis would understand it so we're talking about cultural references we also need an ability to say hey by now whole command mean emphasis we also have
sticking with images and get right so now what did he is part of Twitter this is important right arm I would not understand the sentiment of this tree if I could also look at the image and then determine sentiment of the
image right and the really interesting thing is a lot of time right so the government to
somewhere — maybe somewhere there's an emoji, or a hashtagged mood, or whatever it is — and I can pull these out, pull out the visual representation, put it back into words, and use those words to make a decision. Another really big problem with this is speed — speed and memory. If you ever try to make your own word embeddings, I hope that you have a server somewhere with a lot of memory, and that you have a lot of time, because compiling these word embeddings is slow. GloVe is actually a bit faster — a lot faster, almost half the time of word2vec — but if you're making word2vec embeddings it can take 14 or 15 hours, and if you're using GloVe it can end up eating gigabytes and gigabytes of RAM. So if you're making these embeddings yourself, it can become a real problem and a performance bottleneck, both for how do I keep feeding my model, and also how do I use it in a real-time situation. So what some folks have been studying is the ability to create dense word vectors and dense document vectors, and they created this idea of a "densifier". You can see the numbers over in the corner: they were able to go from a few executions per second with the full English word vectors to 178 with the densified model, with only a slight difference in accuracy. So the more solutions we find for sentiment analysis, the more problems we have, and especially if you're working in different languages this can get even more difficult. What some people are doing that is interesting is aggregation within a lexicon: if I aggregate everybody speaking English in this particular city, what they're finding is that maybe there's more agreement within that community on sentiment and words. So that's something to look at. I also haven't seen a lot of studies on ensemble methods, and I think that's coming into vogue, as well as character-level embeddings.
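The character-level idea mentioned here — modelling and predicting text one character at a time — can be sketched with a toy frequency model. The real systems the talk refers to use learned neural character embeddings; this corpus and code are only an illustration of the concept:

```python
from collections import Counter, defaultdict

# Toy character-level model: predict the next character from bigram
# counts over a tiny corpus. Real character-level embeddings are
# learned by neural networks; this only illustrates the idea of
# treating the character, not the word, as the basic unit.
corpus = "sentiment analysis is hard. sentiment is subtle."

counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    counts[current][following] += 1

def next_char(c):
    """Most frequent character observed after `c` in the corpus."""
    return counts[c].most_common(1)[0][0]

print(next_char("e"))  # 'n' -- "e" is followed by "n" twice in "sentiment"
```

A neural version would replace the count table with a network that embeds each character and predicts a distribution over the next one, which is what lets it generalize to sequences it has never seen.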
So, character-level embeddings: the idea that we can predict on a character level, given a language. Here is a graphic — admittedly tongue-in-cheek, and a little small — but this is like my own version of the scikit-learn flow charts, and it basically says: if you have movie reviews, or any reviews set, you're fine; if you have anything else and don't have money, time and samples, you're pretty much out of luck. So do we have solid sentiment analysis? Not really. But if you have time, or are willing to work on building a lexicon particularly for your model, there can be some really great results. I have found a lot of people, and also a lot of companies, that are building their own corpora and their own lexicons, and they're actually in the 90th percentile for accuracy. So one of the things you can reasonably do is read all of the papers — I'll include references — and look at things like a CNN with a simple model architecture.
Moderator: Everyone, we have time for a few questions. If you'd like to leave to grab a coffee, please do so very quietly, out of respect for the speaker.

Question: Have you had a chance to look at the Google sentiment analysis API they just released this week?

Answer: I haven't had a chance to look at that yet — I have it bookmarked. The problem really is that a lot of these depend on the lexicon, and Google is likely using word embeddings, and word embeddings have problems when you use them for sentiment. So I would doubt it's massively better than what's available from Stanford, which is generally in the high 70 percent range. But there's so much happening that it's hard to keep up with everything.

Question: Thanks very much for the talk. I'm a linguist by training, so with all of the examples you mentioned, we realize that being able to accurately tell what the sentiment is really relies on knowing a lot about the context. How accurate do you think we can expect to be based just on the text? People are pouring a lot of time and effort into making the text models really great, and this is wonderful, but maybe there's just a cap on how well we can do based only on text.

Answer: Yeah, I agree with that statement to a certain degree. What I'm curious about is more when we start to look at these dependency models: research is able to detect stance in phrases — that if I'm using a series of phrases, it's directed at a particular object — and particularly if they can do that in a multilingual situation, which people are starting to prove you can do, that's going to take sentiment analysis to the next level. I can start to say: this cluster of words applies to this object, and these are obviously negative words, or obviously positive, so then I can start to make inferences, moving away from this idea that it's somehow just words around other words. The more we incorporate the parsers into our models, the better sentiment analysis has gotten.

Question: Thank you, that's really interesting. It seems to me that sentiment analysis is really a problem of modeling with very few instances — do you think deep learning can give you user modeling and sentiment analysis the way humans do it?

Answer: I was really, really impressed with some of the deep learning models that have been coming out, and the fact that they're actually getting really good accuracy with very little training. There are some great papers — I'm happy to reference them and post them — where people are essentially saying "we were only training for a month and we were able to nearly get to state-of-the-art systems". So I really think that deep learning is likely the answer here, and as deep learning evolves, I think sentiment analysis will too. Because when we see what is being done in terms of natural language generation — these networks are able to start to understand what we mean and what we're trying to say — then we can maybe start to predict how we feel. But the problem is that the sentiment analysis field is still very much a closed field: a lot of the places doing it keep their source code to themselves, Metamind and so on. So I think there is this pressure on open source developers to try and keep up with things that are happening behind closed doors.

Moderator: We don't have time for further questions, so please join me in thanking Katharine.
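One concrete takeaway from the "build your own corpus" advice in the talk: even a tiny hand-labeled domain corpus scored with a from-scratch bag-of-words Naive Bayes — the classic baseline listed in the talk outline — can serve as a starting point. This is only a sketch: the training examples below are invented for illustration, and a usable domain corpus needs far more labeled data.

```python
import math
from collections import Counter

# Hand-made toy corpus (invented for illustration).
TRAIN = [
    ("pos", "great talk loved the examples"),
    ("pos", "really useful and clear"),
    ("neg", "terrible audio could not follow"),
    ("neg", "boring and confusing talk"),
]

def train(examples):
    """Count word frequencies per label and label frequencies."""
    counts = {"pos": Counter(), "neg": Counter()}
    priors = Counter()
    for label, text in examples:
        priors[label] += 1
        counts[label].update(text.split())
    return counts, priors

def predict(text, counts, priors):
    """Return the label with the highest Naive Bayes log-probability."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    total_docs = sum(priors.values())
    best_label, best_logprob = None, float("-inf")
    for label in priors:
        # log prior + Laplace-smoothed log likelihood for each word
        logprob = math.log(priors[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.split():
            logprob += math.log((counts[label][word] + 1) / denom)
        if logprob > best_logprob:
            best_label, best_logprob = label, logprob
    return best_label

counts, priors = train(TRAIN)
print(predict("loved the clear examples", counts, priors))  # pos
print(predict("terrible boring audio", counts, priors))     # neg
```

With a real domain corpus in place of `TRAIN`, this same shape of model (or its scikit-learn equivalent, `MultinomialNB` over a bag-of-words matrix) is the usual first baseline before moving to embeddings or neural models.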

Metadata

Formal Metadata

Title I Hate You, NLP... ;)
Series Title EuroPython 2016
Part 99
Number of Parts 169
Author Jarmul, Katharine
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify and reproduce the work or content in unmodified or modified form for any legal and non-commercial purpose, distribute it and make it publicly accessible, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license
DOI 10.5446/21170
Publisher EuroPython
Release Year 2016
Language English

Content Metadata

Subject Area Computer Science
Abstract Katharine Jarmul - I Hate You, NLP... ;)
In an era of almost-unlimited textual data, accurate sentiment analysis can be the key for determining if our products, services and communities are delighting or aggravating others. We'll take a look at the sentiment analysis landscape in Python: touching on simple libraries and approaches to try as well as more complex systems based on machine learning.

Overview

This talk aims to introduce the audience to the wide array of tools available in Python focused on sentiment analysis. It will cover basic semantic mapping, emoticon mapping as well as some of the more recent developments in applying neural networks, machine learning and deep learning to natural language processing. Participants will also learn some of the pitfalls of the different approaches and see some hands-on code for sentiment analysis.

Outline

* NLP: then and now
* Why Emotions Are Hard
* Simple Analysis
* TextBlob (& other available libraries)
* Bag of Words
* Naive Bayes
* Complex Analysis
* Preprocessing with word2vec
* Metamind & RNLN
* Optimus & CNN
* TensorFlow
* Watson
* Live Demo
* Q&A
