Claudia Guirao Fernández - Python, Data & Rock'n'Roll. An approach to topics, evolution, and correlations through the lyrics of some of the greatest rock bands of all time. We will talk about the different phases of this personal project, in which I approach a passion through a scientific method. This is a project that combines different techniques: web crawling, NoSQL, Natural Language Processing, and data visualization.
welcome everyone we have that the blood through the ball present and what can use with computer fj thank you have the right to common many scholarly and I am working and said that the sentences that contain candidates you can find me what to do hostility down but it might where we are training so in case you're
interested in joining us this compendium there. What I'm talking about today is a personal project it is book where that assigns unbroken musique on Python going. It is about satisfying my curiosity: I love rock music, and I was asking myself if I could do
something to know more about oppression because of knowledge some of you would guess that they probably have not been rocking in air with a friend someone will talk about on excellence is bigger than this week they stay married will develop a means not an and and the that on some music that and also engaged in these very you know of today the soon taken apart the because they explain my topic more so much better than I could so please review it and expected that so love me development development
stages were these 4 that the gradient and study its because of this processing and glass and and the mammalian on the future. The 1st one of these categories, as I said, is about rock lyrics and and and on the other ways to get out thanks having both on have from an underweight fostered by. This time it is web scraping: all the dataset is coming from mn these, using requests and Beautiful Soup. By the way, if you are going to do this massively you should take a look at Scrapy. For my convenience I've stored all that data in MongoDB this is mean working hard for the stock in the that is
mucosa and and have to make to this scale what is that all those groups were selected by my personal preferences. In case you want to add on 0 1 about me and I will do it on their signals 1 valiant screaming meex cast copyright so minds would get it but I cannot believe said that maybe it is something that was about In this. Well, this is more or less what it looks like: I'm enrolled in intro where and have the group the arguments on the very the website and also the 4 weeks. I have scraped more than 7 thousand samples from more than 50 groups this sound and bn are not used to work with what the it is only an the party group. So we have the top 10 groups, or the most productive groups, by the number of songs I'm happy and then move with the frequency of was. This is the data processing we have to do it
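The scraping step described here can be sketched in a few lines. The talk names requests and Beautiful Soup; to stay self-contained, this sketch swaps in Python's standard-library html.parser and parses an inline page instead of fetching one. The page layout (a div with class "lyrics"), the sample HTML, and the names LyricsExtractor and extract_lyrics are assumptions made for illustration, not the speaker's code.

```python
from html.parser import HTMLParser


class LyricsExtractor(HTMLParser):
    """Collect the text inside <div class="lyrics"> (hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside the lyrics container
        self.chunks = []  # text fragments found inside the container

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested divs so the matching closing tag is found.
            if tag == "div":
                self.depth += 1
        elif tag == "div" and ("class", "lyrics") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_lyrics(html):
    """Return the lyrics text of one page, one fragment per line."""
    parser = LyricsExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Invented sample page; a real run would fetch the HTML over HTTP first.
SAMPLE_PAGE = """
<html><body>
  <h1>Some Song</h1>
  <div class="lyrics">First line of the song<br>Second line of the song</div>
  <div class="footer">Not part of the lyrics</div>
</body></html>
"""

# One record per song, with hypothetical field names.
song = {
    "group": "Some Group",
    "title": "Some Song",
    "lyrics": extract_lyrics(SAMPLE_PAGE),
}
print(song["lyrics"])
```

A dictionary of this shape could be stored as one document per song in a NoSQL document store, which is the kind of storage the abstract mentions.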
seems that the unemployed means there's war about information if they kind of have to add to their
progress towards this would remove the it is full of that learned and that kind of stuff and it had to do e-mail a very fast and for got someone also we have and use contractions and what they have done In the process thereof. Lyrics can be studied in another direction it is called will groups sets was it ones and have not been announced that the that set but I have done is a time satisfy my produced at the. The 1st thing I did was compare my words, the lyrics of the songs, with the Brown corpus. In case you don't know, this is a corpus developed in the sixties and it is it is our combined with different views of Governments confidence in it but it doesn't contain semantic so friend FIL whether it is to take into account the frequencies in my lyrics and their frequency in the Brown corpus goes on with the primitive form the get missing. Not surprisingly, those are the 10 most rock words OK who by the way they baby the last so everything's going again, and those are the non-rock words however generality schools make. I still want to play with the frequencies, and if you consider the full corpus from it is if
you consider the food got was the most frequent word is love and once again in the and if you brought form there Markovitz groups maybe love is 1 of the most in queen is 1 of the most frequent lost that went in the the variance also invented going but not only 1 I question came to my mind in before after the analyses coverage in some instances what they did is taking up on the left and length of the unique works in each group considering will be by invited by the most reach lead exclude is popular numbers been thing under is 1 point ahead of AC DC whom the the same by the other will be embarrassed this song and considered that mean and you have that surprising rated isn't seen as an element of 93 words by some the rare by considering that will be under that these needs are almost with the sense it is not the strength of the hair more than the average words for some n on the floor we have registered because they don't have so much to leaks need 1
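The comparison against a reference corpus and the count of unique words per song can be sketched as follows. This is a minimal, standard-library illustration: the two tiny texts stand in for the scraped lyrics and for the Brown corpus, and the function names, the frequency-ratio ranking, and the floor value for unseen words are assumptions rather than the method used in the talk.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def distinctive_words(lyrics_tokens, reference_tokens, top=2, floor=1e-6):
    """Rank words by how much more frequent they are in the lyrics
    than in the reference text (floor avoids division by zero)."""
    lyr = relative_freqs(lyrics_tokens)
    ref = relative_freqs(reference_tokens)
    ratio = {word: freq / ref.get(word, floor) for word, freq in lyr.items()}
    return sorted(ratio, key=lambda word: (-ratio[word], word))[:top]


def mean_unique_words(songs):
    """Average number of distinct words per song."""
    return sum(len(set(tokenize(song))) for song in songs) / len(songs)


# Toy data, invented for the example.
songs = ["baby baby I love you", "love me baby tonight"]
reference = ("the committee said the report was ready and "
             "I told you and me about love tonight")

lyrics_tokens = [token for song in songs for token in tokenize(song)]
print(distinctive_words(lyrics_tokens, tokenize(reference)))
print(mean_unique_words(songs))
```

Ranking by a ratio of relative frequencies favours words that are common in lyrics but rare in general text, which is the idea behind the "most rock words" comparison.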
on the weekends it's only fine but they didn't actually use so much work needs and that's laws is rock about 6 and last it was a asked about their songs with of peace and 6 in its Linux we have 1 1 thousand
700 model is 1 thousand 700 songs it is almost the uh tend of the songs are talking about a lot of but if we asked the same about drops on the 24 on the 24 sometimes thing about 4 track they were dropped so I prefer change that free that sentence but is 1 piece of property sliding made these fine we already know that will be my bound only let us a few local Iraq has not indispensable problem what discipline and the honor so we have information that rock songs are always talking about the same the same words that are
not so much difference between the groups and so must phrase we can solve for wire. I didn't surrender and tried some more sophisticated techniques and the requirement that so this is what they Spain this morning I actually to organize understand the war prepared get it out of preparing. As I said, we have only 1 100 thousand or more than 100 thousand words in our vocabulary with that is so small that problem area, and what we need is the TF-IDF measure, term frequency-inverse document frequency, in its classical definitions my former. But what we have to take into account in its interpretation is that this algorithm allows us to get the most relevant words from our groups. We do that feature extraction with scikit-learn. I'm going so fast because you already have this presentation and I don't have so much time to explain the irony common. We already have the similarity between the groups and we are ready to cluster. What I'm going to do is k-means. I tried with 3 clusters, but unfortunately what we get from the algorithm is something like this: 3 groups on some words that are not so meaningful. I cannot T does this class that I cannot that also and this 1 burn groups and not so. If I did it by hand, I wouldn't put these groups together, but if we
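One way to see why words shared by every group carry little weight is to compute TF-IDF and cosine similarity by hand. The sketch below uses the classical definition (term frequency times log(N / document frequency)) in plain Python; the talk mentions scikit-learn, whose default weighting is a smoothed variant, so the numbers would differ. The three toy "groups" and the function names are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs maps a group name to its token list.
    Returns a mapping: group name -> {word: TF-IDF weight}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each word once per group
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vocabulary per group, invented for the example.
docs = {
    "group_a": "love baby love night highway".split(),
    "group_b": "love baby night highway road".split(),
    "group_c": "love war death fire war".split(),
}

vectors = tfidf_vectors(docs)
print(vectors["group_a"]["love"])  # "love" appears in every group
print(cosine(vectors["group_a"], vectors["group_b"]))
print(cosine(vectors["group_a"], vectors["group_c"]))
```

With this weighting, a word that appears in every group gets weight zero, so it contributes nothing to the similarity between groups; only the less widely shared vocabulary separates them.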
take a look at the next slide and more but you can see that the closer groups are the most similar on it myself a little bit more sense while the under suspicion our near also the we and who you find on the lower and also in the same for all closed, but we cannot clearly see any cluster. The main conclusion is that with these algorithms we cannot cluster our groups however a balance it is because, as I said, they are so similar. Once again, I didn't surrender. I tried hierarchical clustering, and probably you also cannot see any cluster they knew groups on was sad about. Once again, I tried another technique, LDA. For those who are not into and I mean I will make he she is being mentioned that if you take a look at these slides and sees
it is obvious that the 1st and the 2nd one are talking about food that under 4 about parents or are neurons, and the last one is some each of there topics. This is what LDA does. I did the same with my songs it's not broken by from 1 from what again. I did 10 topics, and what I get looks like this. Once again, there is not much meaning that can be used, and I don't have an answer about topics looking at these lists. So I moved forward and tried a more complicated than need for it to be. I don't have so much time to explain it by on my approximation where they after the area is the worst closer to 1 in the particles so long all these words close to know making they of
to low appoint enough and if we tried the same with writing we
have every yeah and site for 49 while this is going to be better than the previously and by the way as I promised in my so my where it's going to talk about David Bowie evolution know everyone
laughter and the plants and so on as that what in the is taking into account the domain of the most frequent words in that discography of they've always grouping by on
sorted by the uh their release of the date of that released energy taken over there we have blue dots are sorry that blue that that's our lot of words and you can see indeed and these are the ones they are not so frequent maybe Bowie worse on the outside of and and this is they don't so that if you don't speak so much about it but in this 1 never let me down then he repeated it is worth more than 25 times as a bonus track have between evolution and require actually laughter I did the same but we use for wars a lot of time 1 unknown and currently it is always about love as that in the 1st 2 I want of you can see it's not the have he had already have it online but in the queen and going to it not so much would be that but but in the middle of the score the that a bit more than 80 times in and out of room this summer what the what's and I did this for it in and 1 it by and I would like to what
I would like to do next is do it massively: scrape more groups are more there's more standards and try to make some clusters that actually work, include some patterns are morally object there will be made on the hybrid recommender system. So of course for their people will get this thing don't have i need some something so feel free to go with me. The notebook will be online, and the PDF of this presentation is already online because could take an answer. Thank you for your attention and keep drop the of I cannot these articles think you grow them so we have
time for questions the like to into free until the was pretty amazing and I was wondering like but was the 1 who work together OK so do seemed and maybe there's some on American English noun which affecting some words may be our below says because the dialect could be thinking of problem actually there are more read this book that I make them in Miami course cost maybe that's that this effect in mind so they're but probably the in the same it works well so thank you thank you other questions high-security the use of the rules for world along most times for similar isn't it because you only 2 groups you like the idea from the pro problem that's what I think 15 groups if not of the you know I think it is quite clear but probably the last thing is associated with that SUSY they're like the same groups have formed here that for the history so like many of the other about a piece on forever this effect in if I take some more nowadays groups providing change thank you I'm looking at all 1 error anything out of the room in which was so Mary and he has to be questions about the things like a critique in a lot of more lyrics and trying to attack minorities some of them so you can hear the algorithm will like finding more clusters of writing more relationships between what's and stuff like that what if the if I morning you according to persuade you tax money manually OK so the sums of all local a friendship and what to freely practice name probably will head if we to make on how to use another classifier not agree that they're not using because I don't have time to actually if perhaps tracks of something that thank you and thank you for the salt of drugs have you thought about to euphemisms I'm thinking about drugs and sex are things we don't normally talk and like in songs and don't express in those words I'm not surprised to have found a high count of the word drugs on but I'm non-native speakers but I'm sure this suggestions of like how we talk about drugs and I have to 
tell them dead actually I I didn't want to put especially so much work in that field and dropped probably they are produced and explicitly the war tracks on some of the key economies of standards so maybe very whose insisting on but I'm not communities of different native so I can add enough power for their in the way thank you for your attention Thank you


