Packet Hacking Village - Personalized Wordlists With NLP by Analyzing Tweets
Formal Metadata
Title | Personalized Wordlists With NLP by Analyzing Tweets
Number of Parts | 335
License | CC Attribution 3.0 Unported: You may use, change, and copy the work or its content in unchanged or changed form for any legal purpose, and you may distribute it and make it publicly available, provided you attribute the author/rights holder in the manner they specify.
Identifiers | 10.5446/48745 (DOI)
Transcript: English (auto-generated)
00:00
All right. Good afternoon, everyone. 3 p.m. It is my pleasure to introduce Utku Sen. Thank you. Hello, everyone. By the way, can you hear my voice in the back? Is it okay? Great. Welcome to my talk, Generating Personalized Wordlists with Natural Language Processing by Analyzing Tweets. Let me introduce myself first. I am Utku Sen.
00:25
I usually do research and write tools on the offensive side of security. I'm currently working in Twitter security. You can find detailed information on my website, and you can follow me on Twitter if you're interested. In this presentation,
00:41
I will start by talking about password guessing attacks. Then I will explain why reducing the wordlist size is crucial for password guessing. After that, we will see how we can generate wordlists based on a target's personal topics. Finally, I will demonstrate the Rhodiola tool, which does that job. Passwords have been our main security mechanism for digital
01:05
accounts since the beginning of the internet. Because of that, passwords are a main target for attackers. Of course, there are lots of different ways to obtain a target's password. For example, an attacker can prepare a phishing website to trick targets into
01:21
entering their credentials on a rogue site. Or an attacker can conduct a password guessing attack through brute forcing. Password guessing attacks are usually described in two main categories: offline and online attacks. Offline password guessing attacks are usually conducted against captured hashes or encryption keys. For example, in
01:45
hash cracking, the attacker calculates a candidate password's hash and compares it with the target hash. There are two variables, other than the password's complexity, that affect the chance of success: hardware resources and the type of hashing algorithm. More hardware resources
02:02
provide more speed and therefore increase the chance of success. The other factor is, of course, the hashing algorithm. For example, cracking an MD5 hash will be faster than cracking a bcrypt one. Online password guessing attacks are something different. In an online password guessing attack, the attacker sends username-password
02:23
combinations to a service like HTTP, SSH, et cetera, and tries to identify the correct combination by checking the response from the service. Many different variables affect the chance of success: for example, our connection speed and the server's bandwidth; the server can also block our IP address or lock the target's accounts, et
02:46
cetera. There are lots of countermeasures. Online attack speed and hardware resources have no positive correlation, so online attacks are much, much harder than offline attacks. Most web applications have password complexity
03:02
rules where users have to use at least one number, uppercase characters, and, of course, a special character. Therefore, reducing the brute-force pool to an acceptable size is very important for attackers. Instead of brute forcing all combinations, we can actually make some smarter choices. For example, we
03:23
can try the most common passwords first. If that doesn't work, we can make some smarter combinations. To reduce the combination pool, the hashcat team created a technique called the mask attack. The main idea is that people choose their passwords with similar
03:40
patterns; they are not pure gibberish. Therefore, we can define a pattern, called a mask, and brute force within its boundaries. Let's say our target password is Yulia1984. With the pure brute-force approach, we need to brute force all nine characters over the whole charset. That is 62^9 combinations, which is a very,
04:02
very big number. For the mask attack, since we define the pattern, we don't need to brute force everything. For example, the first character can only be an uppercase letter, so there are 29 choices for it. The total number of combinations is way smaller than with the pure brute-force technique; it's around 200 billion. But of course, it's still
04:24
too much for online attacks. We usually can't send 200 billion requests to a web application and check its responses. But of course, pure brute-force attacks and mask attacks are not the only ways to guess passwords. There is also a science-fiction method based on smart guessing. For example, in Sherlock's The Hounds of
04:45
Baskerville episode, Sherlock Holmes checks the target's personal belongings and guesses the password in just one shot. So, we can talk about a third method now. It seems very unrealistic, but in theory, it's possible to find the Yulia1984
05:04
password in fewer than three shots. We just need some Sherlock Holmes skills. Let's assume the target posted the tweet on the slide and that we are a Sherlock Holmes candidate. We can make the following deductions: the target's daughter's name is Yulia, and the target loves
05:25
her very much, since he or she tweets about her. And the target's favorite author is George Orwell, whose most popular book is 1984. So, combining them together, the answer is Yulia1984. Is it that simple? We will come back to this
05:41
later. According to research conducted at Carnegie Mellon University, most people choose words from their hobbies, sports, movies, et cetera, for their passwords. This means that most user passwords contain meaningful words, and those words are related to the
06:02
password's owner. So, in theory, we can become a Sherlock Holmes of password guessing. We can actually show that people mostly use meaningful words in their passwords. When we analyze the leaked MySpace and Ashley Madison password lists and generate the most-used masks, we can see that almost 95
06:23
percent of those passwords are formed from sequential alphabetic characters. So, there is a high probability that these are meaningful words. Let's try to prove that they actually contain meaningful words. So, what is a meaningful word? We can say that a letter sequence is a meaningful English word if it's
06:44
listed in an English lexicon. For those who are not familiar with NLP terminology, a lexicon is the complete set of meaningful units in a language. We used the WordNet lexicon for this job. Our
07:04
analysis showed that almost 40 percent of the words in those lists are included in the WordNet lexicon; hence, they are meaningful English words. Next, we need to apply POS tagging, which means part-of-speech tagging, to these words to understand what kind of words they are. POS tagging is a process for finding a
07:22
word's class. There are eight different parts of speech in the English language; I learned them in English lessons. For those who are not familiar with them: there are nouns like table and chair, and there are verbs like eating and going, et cetera. So, we analyzed those words
07:41
with the help of Python's NLTK library, and the results show that most of these words are singular nouns. Let's recap the facts we have so far. First, our analysis shows that people use meaningful words for their passwords. Second, from research conducted by various
08:02
universities, we know that passwords are mostly based on personal topics. So, Sherlock Holmes's method is legitimate in theory. But can it be done in practice? What Sherlock Holmes did was analyze the target's personal topics. Then he combined them in his mind and came up with a candidate password. But how
08:25
can we do it in real life? To achieve this, we need information about the target, and an algorithm that extracts good password candidates from that information. We need a data source, just like Sherlock Holmes had. We need a
08:41
source where we can find the hobbies and other interest areas of targets, and we all know that kind of source: it's Twitter, of course. On Twitter, people tend to write posts mostly about their hobbies and other areas of interest. Since there's a character limit for each tweet, users have to write things in a more focused way, and this makes things easier for us
09:05
because we don't need to deal with large amounts of gibberish text. So, let's use Twitter as a data source and try to build our personalized wordlist-generation algorithm. First of all, we need to gather tweets from the target via Twitter's API. Then, we need to get rid of unnecessary data. But how do
09:27
we know if a word is necessary or unnecessary? Since we're trying to find personalized things, we can remove stop words, since everybody uses them. For example, we can remove words like I, my, she, etc. from
09:41
the tweets. Secondly, as you recall, our research showed that people mostly use nouns for their passwords, so we can remove any verbs. As you may remember from the previous slides, the leaked word-
10:00
lists were mostly formed from nouns. So, we apply POS tagging to the rest of the words to detect the most-used nouns and proper nouns. For the example tweet, the nouns are daughter and author, and the proper nouns are George, Orwell, and Yulia. Sometimes, users combine two meaningful
10:21
words for their passwords. But of course, they are not like two random words; they have a kind of semantic similarity. So we also need to combine similar words. We used WordNet's path similarity algorithm for detecting the semantic similarity of the words extracted from the tweets. The path similarity algorithm gives us a score between
10:43
zero and one, and we combine two words if their score is greater than 0.12. In the example shown on the slide, we would combine cat and flamethrower with each other. Researchers have also found that some of the most-used semantic themes in passwords are
11:02
locations and years. For this job, we send the extracted nouns to Wikipedia and parse related years and cities from them. In this example, we sent George Orwell to Wikipedia, and it returned words like London, 1984, et cetera.
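The filtering and enrichment steps just described can be sketched in a few lines of Python. This is a simplified stand-in, not Rhodiola's actual code: the stop-word set and the tiny POS lookup table below are hypothetical placeholders for NLTK's stopword corpus and POS tagger, and the year extraction is a plain regex over harvested text.

```python
import re

# Toy stand-ins for the NLP resources mentioned in the talk. A real
# implementation would use NLTK's stopword corpus and pos_tag instead.
STOP_WORDS = {"i", "my", "she", "he", "is", "so", "a", "the", "and", "about"}
TOY_POS = {"daughter": "NN", "author": "NN", "loves": "VBZ",
           "favorite": "JJ", "yulia": "NNP", "george": "NNP", "orwell": "NNP"}

def candidate_nouns(tweet: str) -> list[str]:
    """Drop stop words, then keep only nouns and proper nouns (NN*/NNP*)."""
    words = re.findall(r"[a-zA-Z']+", tweet.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return [w for w in kept if TOY_POS.get(w, "").startswith("NN")]

def related_years(text: str) -> list[str]:
    """Pull four-digit years out of harvested text (e.g. a Wikipedia page)."""
    return sorted(set(re.findall(r"\b(1[89]\d{2}|20\d{2})\b", text)))

tweet = "My daughter Yulia loves reading. George Orwell is my favorite author."
print(candidate_nouns(tweet))
# ['daughter', 'yulia', 'george', 'orwell', 'author']
print(related_years("Orwell published Nineteen Eighty-Four in 1949; it is set in 1984."))
# ['1949', '1984']
```

The surviving nouns and the years harvested for them are exactly the raw material the talk combines into candidates like Yulia1984.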
11:21
So, the last step is combining all of our data. From the example tweet, we got the words George Orwell; we sent them to Wikipedia, and it returned the word 1984. Beyond that, we also had Yulia as a proper noun. So, when we combine all of our data, we will have the correct password, Yulia1984, in there somewhere. So, instead of millions of
11:42
combinations, we could crack this password in far fewer steps, just like Sherlock Holmes did. To automate the whole process, I coded a tool named Rhodiola. It's written in Python and mostly based on the NLTK library. It follows
12:01
the algorithm that I described in the previous section. Given a Twitter handle, it can automatically compile a personalized wordlist with elements such as nouns, proper nouns, and the cities and years related to them. Currently, it only supports English, but I will finish Turkish and German
12:21
support soon. You can use Rhodiola in three different modes. In the base mode, Rhodiola takes a Twitter handle as an argument and generates a personalized wordlist without any fancy stuff. For example, when you give it Elon Musk's username, it will generate words like Tesla, cars, SpaceX, boring,
12:42
machine, 2018, et cetera. In regex mode, the user can generate additional strings from a provided regex. These generated strings are appended as a prefix or suffix to the words. For this mode, Rhodiola takes a regex value as an argument.
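The prefix/suffix idea behind regex mode can be illustrated with a small sketch. This is not Rhodiola's implementation: the pattern expansion here handles only a toy "literal plus one digit" shape (a real regex mode would use a full regex expander), and the names are hypothetical.

```python
from itertools import product

def expand_simple_pattern(literal: str, digits: str) -> list[str]:
    """Expand a toy pattern <literal><one digit from the set>,
    e.g. ('root0', '12') -> ['root01', 'root02']."""
    return [literal + d for d in digits]

def append_suffixes(words: list[str], suffixes: list[str]) -> list[str]:
    """Append every generated string to every base word as a suffix,
    the way the talk describes regex mode working."""
    return [w + s for w, s in product(words, suffixes)]

suffixes = expand_simple_pattern("root0", "12")
print(append_suffixes(["Tesla", "SpaceX"], suffixes))
# ['Teslaroot01', 'Teslaroot02', 'SpaceXroot01', 'SpaceXroot02']
```

Prefix placement would be the symmetric `s + w` case; which side the generated string lands on is what the regex value's placement controls.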
13:00
The regex value defines the string placement. With the regex shown on the slide, it will generate passwords like Teslaroot01, Teslaroot02, and so on. In mask mode, the user can provide hashcat-style mask values for the wordlist generation. In this example, we used a mask in which the first character is an uppercase letter,
13:22
the second one is lowercase, and the third one is uppercase again. If you don't have Twitter API access, or you want to use another data source, you can bring your own data. Rhodiola provides two different options. In the first one, you provide Rhodiola with a text file containing lots of harvested text. In the second one, you
13:43
provide a URL list, and Rhodiola harvests text from those URLs automatically and builds the personalized wordlist right away. You can download the tool from our GitHub page and try it yourself. Okay, so:
14:02
demo time. For this demo, I will get a Twitter handle from a volunteer in the audience, pass it to Rhodiola, and we will check the results together. So, is there anyone who is actively using Twitter in English and willing to share their username with me? Yes, sir. But
14:21
let me open up my terminal first. Okay, I'm all ears. A, B, A, V, E, E, K, yeah, D, E, E, alright.
15:00
S. This one. Okay. Let's see what we get. Now it's downloading tweets from that Twitter handle. It will probably download about 2,500, because that's the Twitter API's limit. I hope I have an internet
15:25
connection right now. Mmm. Weird. Alright. The DEF CON WiFi is
15:48
not working. Is anyone else using the DEF CON WiFi
16:00
right now? Is it not working? Sorry? The WiFi. Yeah. Weird. Actually, it was working for like three days and it just stopped during my presentation. Anyway, it can't
16:23
stop me, you know? I will use my hotspot. Sorry? DefCon-Open? Nah, that's the rogue WiFi, right? Alright. Come on. Yep.
16:56
Now we have working WiFi. Come on, man. No
17:14
internet? Yeah. Okay. Great. I hope we can find your
17:21
password then. Yeah, you have lots of tweets, sir.
17:48
Sorry? Uh, actually, it can't do that at the moment; it just downloads about 2,000 tweets. It actually
18:04
downloads faster, but because of my internet connection it's a little bit slow. Now, okay. Now it analyzes the downloaded tweets, and we'll get the most-used nouns and proper nouns; we will see them in a second. Okay. So, the most-used nouns are: wow, giants, let's
18:40
hope, let's go giants. So, is there a sports team or
18:44
something? Yeah. United, Manchester United, NFL. So, does this make sense to you? Yes. Great. Now, okay. Now
19:03
it sent those words to Wikipedia and parsed related locations and years from them. And let's see what kind of passwords we have. So, uh, I don't know. Actually,
19:28
I don't know the reason, but it couldn't parse the related years; it should have, probably because of my connection or something. So, sir, is your password listed
19:40
in this wordlist? Could you please take a look again? No? All right. Of course, it would be a miracle to crack a DEF CON attendee's password with this method. All right, let's turn back to the slides.
20:02
Right. Anyway, I can stop the mirroring. So, as a conclusion: since people tend to use words from their hobbies, movies, sports, et cetera, for their passwords, users should not use these
20:22
words in their passwords, since we can create fairly accurate wordlists from the available data. Beyond that, any actor that has much more data about a person will be able to create even more accurate wordlists for that target. So, people should avoid using those kinds of terms in their
20:44
passwords, and should use a password manager with random passwords. So, that's all I have today. Thank you for listening. Are there any questions?
21:01
Yes, please, sir. What kind of lists? Yeah, of course: they are the leaked MySpace and Ashley Madison password lists. When you analyze them with the PACK
21:22
tool, which you can find on GitHub, you will see the same statistics. Yep, please. Yeah, currently you can't do that, but I will add it in the future. Since it's an experimental project,
21:42
I didn't want to limit the password length. Yep, please. Yeah, to be honest, zero. Of
22:12
course, I couldn't crack any. For example, my friends mostly use complex passwords, et cetera, and I couldn't crack any of
22:21
their passwords. But when you try it on, for example, your mother, or some older person, or some non-geek person, it may work. But I would need
22:42
to know the real identity of the target, so I could get their hobbies and so on and cross-check with all of them. If only I knew the real identities behind the MySpace wordlists, your method would probably work. It
23:02
could actually be done with the Ashley Madison wordlists; the real identities have probably already been revealed. Maybe I can check them, yeah. Please. Uh,
23:21
good idea, actually; I hadn't thought about that. Yeah, good idea. So, as far as I can see, there are no more... Thank you. Yeah, there's a question. I used Python's NLTK library mostly. Sorry? I only know of Python
23:43
support; it probably only has Python support. Sorry, was that your answer? Only, only Python. Only English, yeah. Actually, I think
24:00
NLTK kind of supports French or Spanish, but they are not as strong as the English support. So, if you are working on a different language, you need to find a local implementation for the NLP stuff. Actually, each language has
24:21
its own websites. For example, for Turkish, I can download some resources from university websites, but I haven't researched other languages. There's no single place where you can download everything. Okay, so that's all. Thank you.