Password Village - Practical PCFG Password Cracking
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 374 | |
Author | ||
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/51599 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
Transcript: English(auto-generated)
00:01
So real quick about me, I have a bit of a reputation for approaching this from an academic mindset, simply because I got started in the whole password cracking thing through my research when I was getting my PhD, but I really do strongly believe in learning by doing. So I'm an active
00:22
member of Team John the Ripper and I do participate in password cracking contests like, you know, Crack Me If You Can, which is going on right now. Luckily this talk is being filmed before the contest starts, so no spoilers here, unfortunately, but good luck to everyone else who's participating. So password cracking really is my hobby and a
00:44
little bit of an obsession, but it's not my day job unfortunately, but my day job has been very exciting recently though because I really focus on medical device security. And you can imagine with all the greatness around COVID-19, it's been an interesting time. So one project that I
01:03
really kind of want to highlight is the open ventilator monitoring and alerting project that I've been helping to contribute to. And there's actually a talk at the Biohacking Village this Sunday, which our team members are giving, that I really highly recommend people go ahead and listen to. And it really is going to talk to you about, you know, the
01:23
lessons learned and how other people can help contribute as well. Because this has been a big problem because as I'm sure you're aware, there's been a huge demand for ventilators to be able to help deal with COVID-19. So there's been a lot of different projects that have kind of stood up to try to help produce, you know, low-cost ventilators to help fill that need
01:44
pretty quickly there. So rather than have every single do-it-yourself ventilator develop their kind of whole own monitoring and alerting framework, we're trying to produce one common one that can be applied to all these different projects across the board. So because when you have these
02:00
ventilators treating, you know, patients, the patients are highly infectious. So you don't want to have the nurses exposed to that. But if something goes wrong, you know, seconds count. So you need to be able to forward all the sensing information that these ventilators are producing back to a centralized nursing workstation. And you need to do that securely because you're running this on a real hospital network. So that's
02:22
been a really fulfilling project that I've been working on. So the other thing that I'm kind of helping out with here, as I move my head, is I'm helping run the DEF CON Biohacking Village's Capture the Flag contest. So this was originally supposed to be in Vegas. That changed, of course. So now a lot of
02:42
medical equipment is actually sitting in my house. So I have to be able to provide a way for hackers from all over the world to log in and hack these infusion pumps here without also hacking my smart thermostat. As part of that, I actually had to repurpose one of my password cracking rigs, as you can see there, in order to run all the VMs that are helping to keep
03:01
people, you know, on those ventilators and hacking those and not, you know, hacking my smart thermostat. So probably the first questions I should start kind of addressing here is, you know, what does that PCFG stand for in, you know, PCFG password cracking? So originally, and I guess still technically,
03:25
it stands for probabilistic context-free grammar, which is the kind of modeling framework it uses in order to model how people create passwords. So if you're into, you know, the theory of automata or, you know, formal languages, this might actually mean something to you. But most people, you know, they hear that and they're like, oh god, that's like math and stuff like that. I mean there's
03:43
no way it's gonna run on my computer. And then they kind of, like, slowly walk away. So I decided I needed to have a more descriptive name. So I went ahead and rebranded it to Pretty Cool Fuzzy Guesser. And this kind of explains a little bit better what it's actually doing underneath the hood, because you train this on a list of passwords and then
04:03
it'll go ahead and create guesses that are similar to those passwords but different, which is really kind of important in order to help, you know, expand your cracking session. Side note, this is my favorite slide I've ever made, so it's all downhill from here. So really kind of what it's doing is it's
04:21
using machine learning in order to crack passwords. And when I say machine learning, I mean that in the traditional sense of a whole bunch of if-then statements. So it's not using neural networks or artificial intelligence, but you are training it on passwords that you expect to be somewhat similar to the target passwords that you're cracking. And when it processes that training password set, it extracts all sorts of probability information about
04:43
the components of those passwords that it finds there. So it figures out things like, you know, capitalization masks, whether numbers go at the beginning of the password versus the end, the probability of individual letters and numbers found in that password, keyboard walks, and so on. And so it goes ahead and creates a model based upon all those different types of
05:01
probability information there, and then it uses those in order to generate very highly probable password guesses in probability order. So it'll start with the most probable password guess and go to the second most probable password guess and then go to the third one, and so on, until you crack the password that you're trying to find or you give up. So let me just move my signal here a little
05:23
bit here. So just to kind of tie this back into what's probably going on right now, as I said, I don't know what the actual contest is going to be like for Crack Me If You Can, but KoreLogic helpfully provided a brief summary of, you know, what the scenario is going to be at least. And so we're going to be
05:42
targeting 12 different individuals, and those individuals change their passwords over time in order to be able to deal with more complex password creation requirements. And that sounds a little bit like something that a PCFG might actually be useful for, so I'm really optimistic for this contest. You'll see how optimistic I am on Saturday when we're
06:01
giving this talk, but this is kind of the scenario that this was originally developed for in the first place. You know how a subject creates passwords, so you want to create passwords kind of similar to that, but you also want to go ahead and change them and maybe, for example, you use more complex rules or complex password creation requirements
06:23
added on top of that there. So I'm available probably on Discord right now, and so I'll be able to answer questions about how potentially you might be able to tweak this in order to help in a scenario like this. One nice thing about there being a lot of academic papers about this, though, is that when I give a talk like this, I don't actually have to create any of my own graphs. I just
06:43
can go to other papers there, look at the research that other people have done, and just pull out their graphs in order to be able to talk with it here. So one thing that I really kind of want to highlight though, and you need to kind of look at this with a bit of a skeptical eye, is that you'll notice that all these cracking sessions are really short. I know
07:02
one trillion guesses here, that might sound like a lot, but when you start talking about GPU password cracking, you're talking about under a second in order to generate all those. So that's just no time whatsoever. Part of the reason for this here is that the PCFG approach is very slow. It doesn't scale very well with multi-threading
07:21
currently. So when you start talking about the passwords that you want to be able to crack, it works very well when you're going ahead targeting very slow password hashes, where you can only make thousands of guesses a second because the hash is very slow. But when you start talking about things like unsalted MD5, other attacks are going to be much more effective because you
07:41
can just make so many more guesses in the same time frame there. So when you start looking at faster password hashes, you can certainly go ahead and still use a PCFG to supplement your attack, and still go ahead and crack some passwords that you might not normally get. But in general, for the faster password hashes, you really are going to want to go ahead
08:01
and use more traditional types of password cracking attacks in order to really make use of the hardware that you have available to you. So I want to talk about this graph, and really focus on it, because this was a really neat study done by Carnegie Mellon University. And one of the
08:21
problems when you do academic research, especially when you start talking about offensive tactics, is that the academics are running attacks themselves. So you're looking at how effective students are at cracking passwords versus someone professional potentially. So CMU took probably the most
08:43
straightforward approach to solving that problem: they went out and reached out to KoreLogic. You might have heard of them; they're running this Password Village, and they run the Crack Me If You Can competition. So when you're trying to find an expert, they're like way up there. So they're a pretty good representation for that. So what they did was they gave one of the KoreLogic
09:02
engineers a password list, they asked them to crack it, and they recorded how many passwords they cracked over time and the number of guesses they made, and then they compared that against other cracking sessions as well. And one thing that makes me smile every time I see this here is that the PCFG did really well compared
09:22
to the pros, which was KoreLogic, for that short cracking session. So when you start asking, can this represent how a real professional password cracker operates, the short answer is it certainly appears to be able to do that. So, full disclaimer, when you
09:44
gave KoreLogic more time, they definitely performed way better. This is a logarithmic graph, so that's about 100 times more guesses. And also, I'll admit this wasn't fair to KoreLogic either, because that's not typically how people crack passwords in real life, with
10:03
such a short cracking session. And when it is that short, usually it's against a really strong password hash, and you have a lot of time to really manually tweak the attack that you're running. That being said, if any of you are listening, I would love to have a repeat or a rematch of this
10:23
attack, just to see how this performs with all the new improvements that have been made to PCFGs, and I'm sure that KoreLogic's really been upping their game over the years as well. So that's why I'm somewhat
10:40
hopeful that we'll be able to find it useful in the contest that's going on right now. So enough about all the research side of that there, let's talk about how to actually make use of this PCFG password cracker. So, the first thing is you just go ahead and download it from the GitHub repo. And the requirements of it,
11:01
I really strive to make it as simple as possible. So, you need to have Python 3, and that's it. So, there's an optional chardet Python module that can help during the training, and that's because it helps detect what character encoding the training set is in, because character encoding is the bane of my
11:20
password cracking existence. But even that's optional, and actually it's now being installed as part of pip3. So, if you have pip, you probably don't even need to install it yourself. And this is really useful though, because I find a lot of situations where, when I'm cracking passwords, I don't have internet access. So, it's really nice to be able to go ahead and quickly just throw my tool on a box and get it
11:41
to run. So, if you can run Python 3 on a box, you can probably run this here. So, I've tried it on a bunch of different OS's. I've actually even gotten it to run on NetBSD, which is the only thing I've ever gotten to be able to run on NetBSD. So, hopefully this is easier than your typical academic's tool set in order to be able to get it installed and start cracking passwords with it quickly as you can. So, we start talking about
12:04
hardware requirements, because that's always an important portion when we start talking about password cracking. The PCFG tool set is single-threaded and CPU-bound, which is why it's so awfully slow. But it will use an entire CPU thread. So, you do really need to dedicate one
12:21
full CPU thread to the PCFG. The other thing is it has very high RAM usage. It basically maintains a lot of different data structures in memory, and those data structures become more complex over time, so it just grows. So, I could have done some things, tried to go ahead and prune that or move some of it to disk, but
12:43
RAM is cheap, so I haven't. So, it'll just keep on growing over time. So, initially it starts at pretty low usage, but if you're talking about running this password cracking session for like a week or two, you really need to have at least 16 gigabytes of RAM to really kind of just fully dedicate to the PCFG tool set itself.
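To make the training and probability-ordered guessing described so far concrete, here is a minimal Python sketch. It is a toy written for illustration, not the actual pcfg_cracker code: the names `train`, `guesses`, and `tag` are hypothetical, it only models letter, digit, and "other" runs, and it leaves out capitalization masks, keyboard walks, multi-words, and everything else the real grammar learns. The priority queue with a pivot follows the general idea of the published PCFG "next" approach, as an assumption about how such a generator can be built.

```python
import heapq
import re
import string
from collections import Counter, defaultdict

# Split a password into runs of ASCII letters, digits, or anything else.
RUNS = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")


def tag(run):
    """Label a run by type and length, e.g. 'password' -> 'A8', '12' -> 'D2'."""
    if run[0] in string.ascii_letters:
        kind = "A"
    elif run[0] in string.digits:
        kind = "D"
    else:
        kind = "O"
    return f"{kind}{len(run)}"


def train(passwords):
    """Learn base-structure and fill probabilities from a password list."""
    base_counts = Counter()
    fill_counts = defaultdict(Counter)
    for password in passwords:
        runs = RUNS.findall(password)
        if not runs:
            continue  # skip empty entries
        base = tuple(tag(r) for r in runs)
        base_counts[base] += 1
        for t, r in zip(base, runs):
            fill_counts[t][r] += 1
    total = sum(base_counts.values())
    base_probs = {b: c / total for b, c in base_counts.items()}
    fills = {}
    for t, counter in fill_counts.items():
        n = sum(counter.values())
        # Most frequent fill first; ties broken alphabetically for determinism.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        fills[t] = [(value, count / n) for value, count in ranked]
    return base_probs, fills


def probability(base_prob, base, indexes, fills):
    """Probability of one guess: base structure times each chosen fill."""
    p = base_prob
    for t, i in zip(base, indexes):
        p *= fills[t][i][1]
    return p


def guesses(base_probs, fills):
    """Yield (guess, probability) pairs, most probable first."""
    heap = []
    for base, base_prob in base_probs.items():
        start = (0,) * len(base)
        p = probability(base_prob, base, start, fills)
        heapq.heappush(heap, (-p, base, start, 0))
    while heap:
        neg_p, base, indexes, pivot = heapq.heappop(heap)
        yield "".join(fills[t][i][0] for t, i in zip(base, indexes)), -neg_p
        # Only advance positions at or after the pivot so that each
        # combination is pushed exactly once.
        for pos in range(pivot, len(base)):
            if indexes[pos] + 1 < len(fills[base[pos]]):
                child = indexes[:pos] + (indexes[pos] + 1,) + indexes[pos + 1:]
                p = probability(base_probs[base], base, child, fills)
                heapq.heappush(heap, (-p, base, child, pos))
```

Trained on the five passwords `password1`, `password1`, `monkey1`, `dragon2`, and `monkey12`, this toy produces eight distinct guesses, including `dragon12`, which never appeared in the training list. With a large grammar, the queue of pending candidates tends to keep growing as the session runs, which is one plausible contributor to the memory growth mentioned above; the real tool's data structures may differ.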
13:04
So, the next step is to actually make use of it and run it. So, I apologize up front that I tend to use the words ruleset and grammar interchangeably, and at least to me they mean the exact same thing. But really what I'm talking about is that I mentioned machine learning a couple of
13:22
times here. You have to go ahead and train a grammar on an existing password data set. Now, you may want to have different grammars for different targets that you're trying to target. So, if you're trying to target a web application catering to younger people, you might want to train it on passwords that resemble that. If you're trying to target corporate passwords,
13:41
you might want to train it on corporate passwords instead, and use those rulesets against target passwords that you think will match that. So, you can have as many rulesets as you want to be able to really fine-tune your cracking session. So, the default ruleset that comes with the PCFG password cracker was actually trained
14:00
on a subset of 1 million passwords from the RockYou data set, which came out in 2009. It was web passwords, so there wasn't really any strong password requirement whatsoever. I've been thinking about updating that, so if you have a good data set that you think I should use, I'm open to hearing about it to make it a little bit more effective. But that being said, the RockYou data set is
14:22
still extremely effective even to this day. It's just that blink182 is not nearly as popular. So, after you choose a data set that you want to use, though, now you go ahead and start generating guesses. So, it's a Python program, so you just go run python3. You run the pcfg_guesser.py tool from the repo. You
14:46
give it the name of the ruleset. By default, it's default. So, if you don't go ahead and specify that, it'll go ahead and use the one trained on the RockYou data set. And then you go ahead and specify a session name as well. By default, this is default as well. And so the session is used to restart a password cracking session. So,
15:03
if you have to cancel it for whatever reason, you can go ahead and restart it back up again. So, I really want to kind of highlight, though, that the PCFG tool set is only a password guess generator tool set. It will generate password guesses. It will generate those password guesses in probability order. So, start with the most probable
15:21
password guess, second most probable password guess, and keep on going down the line. It will not actually hash and crack any passwords. So, you need to use another password cracking tool set for that. Both John the Ripper and Hashcat work, as does basically any other password cracking tool that accepts guesses from standard input. As I mentioned earlier, I'm on Team John the Ripper,
15:41
so I'm going to go ahead and use John the Ripper for pretty much all of my examples here. But you can totally use Hashcat as well. So, in order to do this here, you run the previous command that I talked about. And then you pipe it into, for example, John. And John has an option called --stdin, so that you type
16:00
that in there and, instead of reading data from like a word list or generating its own password guesses, it'll go ahead and use the password guesses that are piped into it instead. And you're cracking passwords! That's really all there is to it. So, there's definitely optimizations for actually using this in the real world though. So, the first thing I really kind of want to highlight is a lot of times
16:21
you want to know what the status of a cracking session is. So, the challenge when you are using the pipe, though, is that if you go ahead and hit the enter button on your keyboard, instead of sending the enter to John the Ripper, it's going to go ahead and forward that to my tool instead. So, you might want to be able to, you know, get John the Ripper to output a status report. So, the way that you do that is you send a SIGUSR1
16:42
signal to John the Ripper. If you're running this on a Linux system, you just type in kill -SIGUSR1 and then the process identifier of John the Ripper. And when you do that and hit enter, it'll be just like hitting enter in John the Ripper itself, and it'll go ahead and output the status of its current cracking session. So, now you
17:01
can do things like, okay, not only see the passwords are getting cracked, but you can see like the number of hashes, the total number of hashes are cracked so far. You can see like, for example, the guessing speed. So, in this case it's making about 4 million guesses a second. And then you can see like how long it's been running and, you know, all the other, you know, options as well. So, I want to kind of dig into that one,
17:21
you know, output of that cracking session though, because I think, you know, this really kind of helps demonstrate the power of using the PCFG. Because this here is showing the passwords as they get cracked. So, you can kind of see that it's not just going ahead and, you
17:40
know, figuring out one rule and then exhausting that rule and then going to the next rule like you would see in the more traditional password cracking session. Instead, it's creating much more fine-grained rules and iterating between all those depending what the current probability of it is here. So, when you see these passwords being cracked, it's kind of fun to try to figure out like how did the underlying system, you know, generate that
18:00
password guess? Why is it making the guess right now that it is? So, kind of if you look up, you know, initially here like this is pretty easy. Okay, it's just taking like some five-letter word.
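The split between a guess generator and a separate cracker reading guesses from standard input, as described above, can be sketched in a few lines of Python. This is only an illustrative stand-in for the receiving end of the pipe, not how John the Ripper or Hashcat work internally: `crack_from_stream` and `sha256_hex` are hypothetical names, and unsalted SHA-256 is used purely because it ships with Python's standard library.

```python
import hashlib


def sha256_hex(text):
    """Hex digest of an unsalted SHA-256 hash (illustration only)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def crack_from_stream(guess_lines, target_hashes):
    """Hash each incoming guess and report which target hashes it matches.

    guess_lines can be any iterable of lines, including sys.stdin when
    this is the receiving end of a pipe from a guess generator.
    Returns (cracked, tried): a hash -> plaintext dict and a guess count.
    """
    remaining = set(target_hashes)
    cracked = {}
    tried = 0
    for line in guess_lines:
        if not remaining:
            break  # everything is cracked, stop consuming guesses
        guess = line.rstrip("\r\n")
        tried += 1
        digest = sha256_hex(guess)
        if digest in remaining:
            remaining.remove(digest)
            cracked[digest] = guess
    return cracked, tried
```

Passing `sys.stdin` as `guess_lines` turns this into the consumer side of a pipeline; the generator on the other side only has to print one guess per line, in whatever order it thinks is most probable.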
18:23
I apologize, my microphone just died here. So, the fun of doing, you know, DEF CON remote. So, you know, it's using five-letter words plus four digits here. Moving on though, this is kind of an interesting one here: SESS IS COOL. So, I looked into my
18:40
input dictionary and SESS IS COOL was not in my input dictionary or my training set at all. And I found out that it was actually doing multi-words for this here. So, it was combining, you know, SESS and then ISCOOL. So, one kind of cool thing about this, and I'll talk about this a little bit later, is that instead of, you know, going ahead and breaking this up into three words like we normally would think about it, it actually broke it up into two
19:00
different words. So, SESS plus ISCOOL. So, that way it can go ahead and go through and say, like, okay, is, you know, Katie cool? Is, you know, Ally cool? Is, you know, Bob cool? Because there's a lot of cool people out there. So, it can go ahead and iterate through those and try that type of, you know, mangling rule. And what's really cool about this is that
19:22
it learned that ISCOOL is a common word from the training set itself. So, I never actually programmed that logic into it. It learned it by itself by looking at the training data which is, as I said, pretty cool. But you
19:40
can see after that there it kind of went into brute force. It wasn't pure brute force, and I'll talk about the different types of brute force here; it's actually combining very short words, kind of like a combinator attack, but still, you know, it's able to kind of get that out that way. And then it went ahead and tried, you know, words with, you know, special characters, the same special characters at the beginning and the end of
20:00
them too. And that's, you might be able to see that in a traditional password cracking session, but you actually have to have a rule in order to generate that. And trying to create those rules is a real pain, so you won't see those in, you know, most, you know, common publicly available rule sets, but it was able to learn that from the training data, which was, I thought, pretty cool as well. So, down
20:21
here, and I kind of, you know, need to get off the screen here, but you can see it's trying some longer words, and while these are normal words, it's actually generated them via the multi-words as well. So, like finger plus nail, or ninety plus nine. And once again, this is kind of really useful, because now you don't have to have things like ninety-nine, ninety-eight, ninety-three, you know, in your data set,
20:43
or your word list as well, because it's generating those on the fly. Kind of going down a little bit further here, this is your more traditional kind of rule: it's just two digits plus capitalizing the first letter of a word. So, you can see it's starting to do that, but, you know, settings is a pretty uncommon word to use as a password. So, it's trying it later in the cracking session here. And now it's
21:02
even combining even more mangling rules. So, it's doing a multi-word, you know, of wood plus fish, and, you know, Tara plus Don, and it's adding digits to the end of that as well. So, you can see how it starts stacking these different rules together. And I kind of want to highlight that this cracking session's been going on, as you can see from the status output, for about 13
21:21
minutes. So, all the really easy passwords have already been cracked. It's already guessed, you know, 123456, and password 12345, and so on. So, these are starting to get into more of the, you know, fuzzier parts of the rule set that you might not normally see in a normal password cracking session. So, as I mentioned a little bit
21:41
earlier, if you hit the enter button, it's going to go to my program, not, you know, John the Ripper. But, there's a lot of information that I want to be able to provide to people about the status of your cracking session. So, whenever you hit enter, or, you know, basically any other key here, it'll display an output of what it's currently doing. So, you can try to figure out, you know, whether you want
22:00
to continue it, whether it's working correctly, and whether it's kind of doing what you want it to do as well. So, kind of going through this here, I hit enter twice, and so you can kind of see how it's generating, you know, these password guesses as doing it here. So, the first one, you know, it's a, you know, basically going ahead and trying to combine two words. So, it's a multi-word type of attack, and you
22:22
can, if you dig into, like, the real details of it there, you can kind of see that it's trying, like, the hundred and forty-third most probable word, with no capitalization, and it's combining it to the ninety-third most probable four-letter word, with no capitalization as well. So, you can see that the probabilities it assigns to, like, even individual words and stuff like that is
22:40
very, very fine-grained. So, it's going to try some words, and then, like, do other mangling rules, and stuff like that, and then it'll go back to the less probable words later on in the cracking session. So, now this next one here, I'll try to get out of the way, or something like that, you can see that it switched to a real brute force attack using OMEN, or Ordered
23:01
Markov ENumerator, and I'll talk about that a little bit later. But, really, I kind of want to highlight, though, that it's trying, you know, more traditional cracking rules, so it's like, you know, combining words, and then it's switching to brute force, and then it'll switch to another mangling rule after this here, and then it'll keep on going based on whatever the
23:20
current probability is. Now, as I said, I really struggle with documenting my code, so I try to go ahead and add as much documentation into the runtime behavior of it as possible. So, if you, instead of hitting, you know, enter, or anything else along those lines, you hit H and hit enter, it'll provide, or just
23:40
H actually, it'll provide a status report explaining what all these different fields mean, and that report is actually much longer than what's displayed on the screen here. It explains what all those different letters, like A5 or C5, actually stand for. The one thing that I really want to highlight, though, is this
24:01
one metric here called probability coverage. Because the PCFG password cracker creates guesses in probability order, it starts with the very high probability passwords and moves on to less and less probable passwords, and the model that it has will basically never finish. It'll just keep on finding new combinations of
24:21
words to go through. So, a real challenge becomes: when do you give up on a cracking session? Say you haven't cracked a password. When should you kill this off and try some other type of cracking attack that might be more successful? Or, when do you just choose to say, I'm not going to crack this password
24:40
and move to a different case? So, this probability coverage is a very fuzzy metric that I developed to try to give you a bit of a rule of thumb about when that should be. What this metric says is that if the target password comes from the same probability distribution as the password
25:01
data set that I trained on, and if my grammar and my model of how these passwords were created were exactly correct, this is the probability that we would have cracked this password by now. Now, neither of those assumptions is actually true in real life. The probability model of the password you're trying to crack is probably very different. The grammar that I generate and train on is absolutely not perfect. But, at least as
25:22
I said, it gives you a rule of thumb to say, okay, this is starting to get a little bit high. It says I had something like a 90% chance of cracking this password, and I haven't cracked it yet, so maybe I should give up. And you'll notice this number jumps up really fast initially, because it's making high probability password guesses, and then it slows to a crawl, almost not advancing
25:41
after you get through 70, 80, or 90 percent completion. So, this is really good for figuring out when you can devote that one single CPU and that RAM to something else. Another usage tip I want to highlight is that sometimes the
26:01
cracking dynamic when it comes to speed is completely reversed. You might be trying to crack very computationally expensive password hashes, or a lot of, let's say, salted hashes, in which case you're really only making a couple of guesses a second. Well, this generator is generating, let's say, between a hundred thousand and
26:20
four million guesses a second. So, it gets backlogged and essentially freezes while it waits to be able to send more guesses to the password cracking program. So, occasionally if you hit enter it won't actually display the status, or it'll take a while to display the status, and that's usually what's happening. So, if that's happening and you're
26:40
curious whether the password cracking session has crashed or not, I recommend going back to my earlier advice about sending a signal to, let's say, John the Ripper and seeing how that's doing, in order to make sure that your password cracking session is still running. So, as I talked about, the multi-word feature has probably been the
27:01
biggest addition in the new 4.0 rewrite, and it has completely shocked me how effective it has been. I won't go into too much detail about it, but the one thing I really want to stress is that it is not language specific at all. It learns what constitutes a
27:22
word from the training set that you give it. So, it'll pick up things like new band names, or proper nouns that are really hard to specify in a language dictionary, or whatever new Pokemon just came out. It identifies patterns like "I love" and so on. So, this is very useful for being able to
27:43
target new password hashes. As I said, it's not language specific, but I would say it works best with European, English-type languages. It still really struggles with some of
28:02
the other languages like Mandarin, but that is absolutely something that I want to focus on more going forward. It's not perfect. It's definitely a work in progress. There is a balance to strike with creating false positives in the matches. And if it doesn't see some of the base words in the training set by themselves, it won't
28:20
identify them. But it's something that is evolving, and part of the new pull request that I just received from somebody else actually has some improvements to this that I'm really excited about pushing to main. So, one of the other big features that has been added recently is the
28:41
Ordered Markov ENumerator, OMEN. The only reason why I bring this up is that a similar approach can be taken for pretty much anything. If someone creates a better cracking attack or cracking mode, it can totally be incorporated into a PCFG style attack. I'll be a little bit like the
29:01
Borg in that respect. The real challenge is figuring out how to assign a probability to a password guess. If you can assign a probability to a password guess, I can probably incorporate it into a PCFG. So, in the last little bit here, I really want to highlight some additional
29:21
tricks that are very useful when it comes to cracking passwords. The first one is the skip brute flag in the PCFG guesser. Basically what this does is disable OMEN guess generation. That's not to say that OMEN guess generation is a bad thing to do. It definitely helps increase the success of a password cracking session.
29:42
But this is a way to parallelize your attack. If you have another system that's really cranking through your brute force attack, you might want to do all your brute force on that other system or on another thread, and then run the PCFG guesser just to focus on the word mangling rules instead. In order to do that, all you do is,
30:01
when you run it, type in the skip brute flag. Another flag that's really useful is the all lower flag. What this means is it'll stop doing any sort of case mangling on the password guesses. So, let me try to move my picture just a little bit here to make it easier to read.
30:22
As I go back, I apologize. Okay, so a lot of times you may not want to do case mangling inside the PCFG itself. One reason might be that
30:42
the hash that you're targeting is case insensitive, like LANMAN. That's probably not the best example, though, because if you're cracking LANMAN hashes you're not using PCFG to do it. You're just brute forcing that sucker and taking it out that way. What's more likely, though, is that case mangling is very distinctive in how individual people do it
31:02
there. So, if someone does a certain type of case mangling, they have a tendency to keep using that strategy for all their other passwords. When you start doing things like targeted password cracking, you may not want to just do what everyone does. You want to build really specific case mangling for that particular individual. In that case,
31:21
the better way to do this is that John the Ripper supports a really powerful feature called pipe. What pipe does is, instead of just taking the guesses in from standard input and using them as-is, it lets you apply additional rules on top of them like you would in a traditional password cracking
31:40
dictionary type of attack. You can specify your very specific case mangling rules inside John the Ripper's rule set, pipe the lowercase password guesses right into John the Ripper, and have John the Ripper capitalize them itself. That can be very powerful when you have an idea of what type of case mangling you want to target.
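As a rough illustration of that split, with lowercase guesses produced in one place and a target-specific capitalization habit applied in another, here is a minimal Python sketch. It is not John the Ripper's rule engine and not code from the PCFG tool set; the function names (`capitalize_first`, `upper_last`, `apply_case_rules`) and the sample guesses are hypothetical and exist only for this example.

```python
from typing import Callable, Iterable, Iterator, List


def capitalize_first(word: str) -> str:
    """Hypothetical habit: uppercase only the first character."""
    return word[:1].upper() + word[1:]


def upper_last(word: str) -> str:
    """Hypothetical habit: uppercase only the last character."""
    return word[:-1] + word[-1:].upper()


def apply_case_rules(
    guesses: Iterable[str], rules: List[Callable[[str], str]]
) -> Iterator[str]:
    """Apply each case rule to each lowercase guess, in order.

    Duplicates produced for the same guess (for example, a rule that
    changes nothing in an all-digit guess) are emitted only once.
    """
    for guess in guesses:
        seen = set()
        for rule in rules:
            candidate = rule(guess)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


# Stand-in for the lowercase stream a guess generator would produce.
lowercase_guesses = ["password123", "ilovepokemon", "12345"]
rules = [capitalize_first, upper_last]
mangled = list(apply_case_rules(lowercase_guesses, rules))

for candidate in mangled:
    print(candidate)
```

The point of the design is that the generator stays generic, while the small rule list is the only part that has to be tailored to the individual being targeted.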
32:08
So, I of course moved it to the wrong portion here. Let me move my screen again, I apologize. Some improvements: as I mentioned, there was
32:21
an amazing pull request that was submitted to me with a bunch of new features. I'm slowly incorporating them into the core, but I actually have the features available as their own tool called Segmentr.py. And I apologize if I mispronounce your name, because I've only seen it written, but Chun-Won Wang submitted this
32:41
here, and it really impresses me. Probably the biggest feature I'm really excited about is leetspeak replacement. This has been a feature that has been my white whale as far as implementing it, and every single time I've gone at it, it's just not been very effective, but that
33:01
is currently incorporated into this tool called Segmentr.py that's included in the repo, which will try to parse that information out. I'm looking at getting that incorporated into my core trainer and into password cracking sessions in order to really target it. He also improved some of the multi-word detection, so he made that
33:20
better, and he has also incorporated some new approaches into the password scorer, which is a different tool where you submit your password and it tells you what the probability of your password is, which is nice as well. So, all credit goes to him for these really impressive pull requests, and if
33:40
anyone else is looking at helping out too, I'm all about that. So, thank you very much once again for that. Okay, let me move my screen around again here. Okay, so the next thing I
34:02
want to talk about is the compiled PCFG guesser. I've been talking about the Python tool set all along until now. The compiled PCFG guesser is a completely different fork, and you get the idea from the name. Instead of being written in Python, it's written in compiled C code. It's a little bit harder to get installed and
34:20
running, simply because when you start talking about compiling code, it runs great on my machine, but it has challenges elsewhere. I tried to use the hashcat build makefile for this. So, if you can build hashcat on your computer, you have at least a better chance of being able to get this running as well. But if you have problems, please
34:41
reach out to me on the GitLab, or rather GitHub, site, and I can try to help you fix them. I will say that the trainer portion will always be written in Python. I just like writing Python too much to change that over. So, basically you'll create the training rule sets with the Python trainer, and then copy them over to be
35:01
used in the compiled version. Also, the compiled version has a tendency to lag in features behind the Python toolset because, once again, I like writing Python. I'm not the best C coder in the world. Basically, if I write a hello world program, it's going to have like five buffer overflows and a segfault. So, take
35:23
that as you will, but I'm making this available, and if someone wants to write a better one, I'm totally open to that as well. But it doesn't have save and restore, it doesn't have status outputs, and it has no OMEN guess generation. So, all that being said, why bother with this?
35:41
And really, at the end of the day, the main reason is it's about 20 times faster than the Python toolset. I've always heard that C code is faster than Python, but when I saw that, I was like, holy crap. So, I will be upfront. Even with all these limitations, when I'm
36:00
cracking passwords, I'm using the compiled C version much, much more often now than the Python one, because that 20x speed improvement is hard to beat for most password cracking sessions. So, now I'm going to talk real quick about training on passwords. I've been talking about
36:22
this a lot, and there are a lot of different reasons why you'd want to create a new password training set. Language is a huge one. You want to train on passwords that are similar to the ones you're trying to target. Another big one is that corporate passwords are very, very different from what you'll see from websites. And I'm sure
36:41
you've probably heard KoreLogic talk about this before, you know, yesterday. But that's something that is very evident. So, if you're trying to target corporate passwords, you probably want to train on corporate passwords rather than on passwords from some gaming website. Another reason to go
37:05
ahead and train it, though, is if you're targeting a specific password creation policy, or you know which mangling rules your target prefers. One way to really target that is to train only on passwords that match that policy. And there are other things you can do with the password rules, or
37:23
the grammar that I generate. I made sure that I didn't include anything like a CRC check or any other integrity checks in it. So, you can actually open up the files themselves, which are just text files, and start editing the probabilities of different things in them by hand, too. Say there's one word you really want to make highly probable, but you don't want to have to train on a whole new training set.
37:42
You can just open the file up, put that word in there, give it whatever probability you want, and it will be read in and used in your password cracking session. The other reason to train on a password training set is that the trainer extracts a lot of information from that password set. So, it's really useful for analyzing a new dump that
38:00
you have access to. For example, it'll pull out common emails, dates, and websites, to help you figure out where this password data set came from. So, the next question, of course, is where do you get these password data sets from? So,
38:20
there are a lot of challenges with this, too, because a lot of data sets are not optimal to train on. I don't know if you know of hashes.org, but it's a really great site for downloading all these dumps as they come out. So, for example, let's say you want to train on this data set here. I'm not going to try to pronounce the name of that site, because I'm sure I'll
38:43
just horribly, horribly mangle it. But when I did some googling about this site, it was a site for new college students trying to find a job in China. So, that's an interesting data set that you might want to use to train for cracking passwords. So, if
39:01
you download something from hashes.org, the first and most important thing is to select the plain text option to train your rule set on. You don't want to include the hash in your training set, because the trainer will think it's part of the password and it goes poorly. One other thing I really want to highlight here, and this is a feature that I'm hoping to get added to the PCFG tool
39:23
set, is that I was informed by the owner of the site that they actually do some additional things for encoding non-UTF-8 characters, so my trainer will not fully parse those correctly. That's something I need to add in too, so that it, you
39:42
know, uses the correct character encoding for non-English passwords. So, I just want to put that warning out for anyone trying to train this on things like Mandarin. But there are a couple of problems with a lot of these dumps. The first one is they don't contain duplicate passwords. So,
40:02
duplicate passwords are really important when it comes to figuring out what the probability of a password is, because if you don't have duplicates, 123456 looks just like a very random string. So, duplicates are useful, but I will say that when you run longer cracking sessions with PCFGs, that lack of
40:21
duplicates becomes less and less important, because you've already exhausted all the really probable password guesses. The one issue, though, is that the OMEN portion really does struggle without the duplicates. So, you might not want to enable OMEN guessing if you train on a data set that doesn't contain any duplicates. The other problem with these dumps is that they only contain the passwords that have been
40:40
cracked. So, basically you don't learn anything about the passwords that haven't been cracked. That's not a deal breaker, but it's useful to keep in mind that the crack percentage is going to be very useful when it comes to figuring out how good a data set is for creating a new rule set. So, in order to train on a
41:01
password data set, it's really just a Python program once again. You give it the name of the rule set that you want to create, as well as the password data set that you want to train on. And it'll run and do all the parsing and stemming of the password data set
41:20
here. It will try to auto-detect what the encoding is, but when in doubt set it to UTF-8, because the encoding really does matter quite a bit. In the first pass it takes through the data set, it learns all the character frequencies and base words for multi-word detection. So, it actually makes a couple of different passes through the same data
41:41
set in order to learn more and more about it. On the second pass, it does most of the real parsing of the passwords. So, it figures out things like keyboard walks, alpha strings, how probable the digits are, and so on. Most of the stuff you traditionally think of when you talk about the probabilities of different things, it does on the second run
42:00
through. And then it makes a whole third run through to see how effective things like OMEN would be for cracking passwords. That gets back to how OMEN assigns probabilities to its different levels. And so, this takes a while. If you're training on a million passwords, it's done
42:20
within a minute or two. If you're training on a billion passwords, it takes significantly longer, and it has to keep all this data in memory. So, if you're training on some of these really gigantic data sets, it's just not going to work. One thing you might want to do is select a subset of that password set, chosen
42:41
randomly, to train your rule set on instead. After you're all done with that, it'll display statistics about the data set you just trained on, which are really useful for figuring out where it came from: password lengths and things like that. But one thing that I've added that I found really
43:01
useful is that it'll display the top URLs. The ones at the very top are usually email account information, but if you go down a little bit you can usually see what the website is, because people have a tendency to use the website name in their password. I also
43:22
highlight the dates that it finds in there as well, because that's useful for trying to date when that password data set got leaked. Now, I want to highlight that there's a long tail when it comes to the dates, because people create passwords before the
43:40
password data set gets stolen. So, you'll see a lot of passwords with years from before the data set actually got dumped. But if you go down the list a little bit, you can say, okay, that's probably about where the cutoff was for when this password data set was disclosed. So, one last thing I
44:01
really want to talk about real quick is that I am trying to get this to work with other cracking modes. One of the really popular cracking modes is called Prince. Prince basically takes a lot of different words, combines them all together, and makes lots of guesses based upon that. But one challenge with
44:20
Prince is that it's very dependent upon the input word list that you give it, because the word list needs to have high quality words in it. But it also needs to have a level of cruft in there too, because if you want to, let's say, add the number one to the end of a word, you have to have one in your word list by itself. But
44:41
the challenge is that the larger your word list is, the more words it's trying to combine, and then it starts to have issues as well. So, we have all this probability information about how passwords were generated. Maybe we can use this to create a very bespoke word list for a Prince style guessing session. So, I created another tool called
45:00
Princeling that basically does just that. I'm sorry, my microphone just went out again there. But, yeah, it creates a very high quality word list and does it automatically. Because, one
45:21
thing I like about Prince is that it's the attack that I run when I want to goof off. Password cracking sometimes takes a lot of brain cells, because you're looking at how you're cracking and trying to optimize your cracking session. And Prince is for when I have no idea what I want to do. I want to go watch Tiger King on Netflix. Let's just launch this off and come back and see if it was successful. And Prince is usually actually quite successful. So, it's a
45:42
pretty good tool to be able to use, and anything that you can do to automate Prince even more, I'm all for, which is why I went ahead and created that. So, I'm going to go ahead and stop the live stream here, and hopefully I'll be on Discord to answer any questions that you have. I hope you enjoyed this. I hope this is helpful. And once again, thank
46:00
you for attending the Password Village here at DEF CON Safe Mode.
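The probability coverage rule of thumb described in the talk can be sketched in a few lines of Python. This is a toy under a stated assumption: coverage is taken to be the running sum of the model probabilities of the guesses generated so far, in descending probability order. The function names and the toy distribution are made up for illustration and are not the PCFG tool's actual code.

```python
from itertools import accumulate
from typing import List, Optional


def coverage_curve(guess_probabilities: List[float]) -> List[float]:
    """Cumulative probability covered after each guess.

    Guesses are assumed to be emitted in descending probability order,
    which is how the talk describes the PCFG guesser working.
    """
    ordered = sorted(guess_probabilities, reverse=True)
    return list(accumulate(ordered))


def guesses_needed(curve: List[float], target: float) -> Optional[int]:
    """Number of guesses needed to reach `target` coverage, or None."""
    for count, covered in enumerate(curve, start=1):
        if covered >= target:
            return count
    return None


# Toy model: three very likely passwords, then a long tail of unlikely ones.
toy_model = [0.30, 0.20, 0.10] + [0.004] * 100
curve = coverage_curve(toy_model)

# Coverage rises quickly at first, then crawls through the tail.
print(guesses_needed(curve, 0.55))
print(guesses_needed(curve, 0.901))
```

Even in this toy, a handful of guesses covers more than half of the probability mass while the remainder takes many times as many guesses, which matches the "jumps up, then slows to a crawl" behavior described for the metric. As the talk stresses, the number is only meaningful to the extent that the target resembles the training data and the grammar is accurate.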