Pillaging DVCS Repos...for fun and profit

Video in TIB AV-Portal: Pillaging DVCS Repos...for fun and profit

Formal Metadata

Pillaging DVCS Repos...for fun and profit
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Adam Baldwin - Pillaging DVCS Repos For Fun And Profit https://www.defcon.org/images/defcon-19/dc-19-presentations/Baldwin/DEFCON-19-Baldwin-DVCS.pdf White Paper Here: https://www.defcon.org/images/defcon-19/dc-19-presentations/Baldwin/DEFCON-19-Baldwin-DVCS-WP.pdf

Distributed Version Control Systems, like git, are becoming an increasingly popular way to deploy web applications and web-related resources. Our research shows these repositories commonly contain information very useful to an attacker. This talk, which was part of my small contribution to the Penetration Testing Execution Standard (PTES), will demonstrate how to identify these repositories and techniques to pillage as much information as possible from them. Lastly, there will be a release of a toolkit to automate the discussed techniques, supporting git, hg and bzr repositories!

Adam Baldwin has over 10 years of mostly self-taught computer security experience and is currently the co-founder and Chief Pwning Officer at nGenuity, focusing on security of web applications. He at one time possessed a GCIA and, if his CPEs are up to date, should still have a CISSP. Prior to starting nGenuity, Adam worked for Symantec. Adam is a minor contributor to the W3AF project and has previously spoken at ToorCamp, DjangoCon 2010, and JSConf 2011.

So welcome to Pillaging DVCS Repos for Fun and Profit. A little bit about me: I'm Adam Baldwin, not to be confused with this Adam Baldwin. I co-founded nGenuity, a pen testing firm; I'm a pen tester of webs, and I curate evilpacket.net. That's about it. So what
is DVCS? For those of you who might not be familiar with what Git, Mercurial, or Bazaar are: they're distributed version control systems, source code management tools that keep track of revisions to files and crap like that. Sounds great. They're basically made up of a meta directory and a working tree. At a high level, the meta directory is what we're going to be talking about today; it contains a bunch of really juicy information. Great, that's the high-level overview. So what's the
problem with this? That sounds like a great tool for developers, for managing revisions and things like that. The problem is that developers and operations are using it to push web apps out to production. It's a really popular way to deploy, right? Like, git push, and now your shit's in production. The problem is that they're leaving these dot-meta directories public. We're supposed to be blocking these; it's pretty standard practice to block these directories. However, they're getting left in web roots, and as an attacker we want access to all of the information that's in this repository. The situation might be that we're doing a pen test, or we're an attacker, and we want to get some foothold on this site. This might be that foothold. And no, that doesn't say .gov. In the situation that I ran into, we were deploying a site, they left the .git meta directory in their web root, and I wanted to see what type of exposure this creates. As you'll see later on, we can't simply clone the repository out of the web root; it doesn't work that way. So the first step is identifying repositories, then we need to figure out how to extract information from those repositories, then of course some random process, and then we'll profit, as you will see. So here's how we identify repositories. It's pretty damn simple. All three of them, git, hg, and Bazaar (there are others; I focused on these three), have predictable file names. They all have something in the repository that you can access and then run a regex match against. These are very lightweight files, which is why I picked them: they have great patterns and they're lightweight. There are other files that you could use, but they're of variable size, and it's a real pain when you have to download a couple-meg file to pattern match against. As an example,
you can request example.com/.git/HEAD, and it could be in any directory; it doesn't have to be just the root. If you want to see the
patterns that we're using, there's a plugin for w3af, the Web Application Attack and Audit Framework scanner. Just go look at the code, and the patterns that we're using are in there. Very useful for identifying these. As far as I know, there are no other scanners that are looking for these. If somebody knows of one, please correct me and let me know afterwards. So we
wanted to understand just how large the scope of this problem was. Who's doing this? So we scanned the Alexa top million sites, and we found roughly 2,000 repositories out of those million sites. So not real common, but as you will see, the information that we can extract from these repositories is pretty significant, extremely useful to an attacker, and gives you that foothold that you might need. Obviously the most popular out of all those was git, so that's what we'll demo in a bit. So once
you've identified that you have a git repository, or an hg repository, or Bazaar, you need to clone that repository, and you can't simply run, say, git clone against that URL. So here's the process that we came up with to actually extract data from the repositories. The first thing you need to do is check for directory browsing. If directory browsing is enabled on the site, it's game over: you can simply wget the .git directory and then restore the repository with git reset --hard, and then you've got your contents. You've got the source code; you get the entire tree. Barring that, we need to get the predictable files. There's an index file in every single one of them, an index or a dirstate. That index file, that dirstate, contains a listing of all of the stuff that's in that repository, and for a web application, that's the web root: all the crap that they're sticking in that web root, source code files and anything else that might be included. So even if you're not able to extract any additional data, you can glean a lot of information from just that single index file, which is a predictable file in all instances. After we get that index, we parse it, download the references that the index points to, and then try to restore the repository. In some instances that's not a hundred percent possible because of how the repository stores data, but you're going to get enough information. So, pillaging: once we have our information, what kinds of things are we looking for? What are we going to find? Well, easy enough: platform details, backup files, SQL dumps, source code (it's a web app, right, that's what it's going to be), credentials, certs, API keys, just to get you started thinking. Out of those top million
sites, here's the kind of stuff that we found. I'm not going to go through all of them, but some of the more interesting ones: SQL dumps of entire databases, htpasswd files, Excel documents, mail spools. You kind of get the picture of the stuff that we're going to find. Basically, these repositories shouldn't have this stuff checked into them. That's one of the core problems: not just that the meta directory is hanging out in the web root, but that the developers are checking this crap in. They're not being audited, and no one cares what they put into the repository. Even if they did, and they go, "Oh, I shouldn't put that in there," they just remove it and forget about the old revision that's hanging out in the tree. So you can find stuff that's not just in the current revision; if you look a few revisions back, you're going to find some really juicy stuff, potentially just by looking at the logs.
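The identification step described earlier in the talk (request one small, predictable file from the meta directory and regex-match the response) could be sketched roughly like this in Python. The file paths and patterns below are assumptions chosen for illustration; they are not the patterns from the w3af plugin or the toolkit, and the function names are hypothetical.

```python
import re

# Hypothetical signatures: (small predictable file, pattern its body should match).
# These choices are illustrative assumptions, not the talk's actual patterns.
SIGNATURES = {
    "git": (".git/HEAD", re.compile(r"^(ref: refs/\S+|[0-9a-f]{40})\s*$")),
    "hg": (".hg/requires", re.compile(r"^revlogv1$", re.MULTILINE)),
    "bzr": (".bzr/branch-format", re.compile(r"^Bazaar-NG meta directory")),
}


def probe_urls(base_url):
    """Return the URL to request for each DVCS type under a given directory."""
    base = base_url.rstrip("/") + "/"
    return {kind: base + path for kind, (path, _) in SIGNATURES.items()}


def looks_like_repo(kind, body):
    """Check a response body against the signature for one DVCS type."""
    _, pattern = SIGNATURES[kind]
    return bool(pattern.search(body))
```

Matching on the body rather than on the HTTP status alone helps with sites that answer every request with a 200 and a custom error page.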
Here's the epic fail montage. Remember: pillage, then burn. Just some of the really, really fun stuff that we found: database passwords in shell scripts; database dumps (and that does not say .gov) as well as user tables; absolute piles and piles and piles of SQL dumps that people had made and just said, "Oh, we'll check them into the repository and push them to our git server, because that's cheap backup and it's a really good idea." Git ignore: the git
ignore file or the hg ignore file. Every one of these has a different ignore file, but they're really useful. It's the developers' way of saying, "We don't want to be bothered by this stuff in our repository, so we're going to ignore it. We're not going to check it into the repository, but we're going to put a pattern into the ignore file." It's a really good place to look, because these are the juicy things that they don't want us to know about, but now we know about them. It's kind of like the robots.txt thing. How about customer
invoices? This particular site had, I'm pretty sure, every customer invoice it had ever generated hanging out in its repository. They weren't predictable file names in the web root, so you wouldn't have been able to easily brute-force them with something like DirBuster, but you could extract them from the repository. How about some
API keys? Apparently they still use MySpace. Google API, Facebook, an SSH key for the EC2 instance. Those aren't useful at all, are they? htpasswd files: this is a
really interesting one as well. It's a German site where, when you went to the actual file in the web root, it was htpasswd protected, but if you pull it out of the repository, you can obviously get that particular file. I'd really be curious about the discussion, and I'd love to have a session with the EFF on whether this is actually bypassing access control. These files are public because they're in the .git directory; they're just sitting there. You don't actually have to bypass access control to access them. So what type of line are you crossing if you're accessing this stuff in the meta directory? Coincidentally enough, these are account numbers and routing numbers for a very large pile of their customers. So yeah, no authorization required. How about credit
card numbers in docx files? Seriously. Yeah, this is actually a non-US company, so I don't know. PCI, PCI. Wow, that
went fast. So we have a demo, and then a tool release to automate all of this, because I really didn't give you any technical details. So, demo time. Okay, demo. All right, so what
I did was pull down the DVCS-Pillage toolkit. Once I push that up on GitHub, you'll get the URL in a bit. What I included was a tool for bzr, hg, and git, and I included the PiCloud scripts that we used to scan the top million sites, so you can just use them. PiCloud is very handy for scanning a lot of stuff simultaneously. And some regexes, and that's about it. So to start out, I'm going to show you: here's the .git
directory of our target site. As you see, there are a lot of different predictable file names. We've got configs, we've got the index file, and the object store. This is how git stores all their crap, right? It's the SHA value of a particular file, and those are the files that we're going to go after. And so let's show that: git clone, git clone.
Yeah, it doesn't work. Same thing with bzr and hg; it's not going to work. If we run the particular tool, git pillage, what it does is pull down the index file, parse it, and let you know, "Hey, you're about to make 1,900 requests." Depending on the site and on what they check in, this could take a really, really long time. Some sites had twenty or thirty thousand objects in their data store. For those situations, don't download everything; figure out what you might want to go after. Look in the index file yourself and see what juicy information you might want. They might not have anything you actually want to download. So in our case we're going to hit yes.
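As a rough illustration of the index-parsing step, here is a minimal Python sketch that lists the SHA-1 and path of every entry in a git index file, which is also the number of object requests a download would need. It assumes index format version 2 or 3 only; the function name is hypothetical, and the toolkit's own parser may work differently.

```python
import struct


def parse_git_index(data):
    """Return (sha1_hex, path) for each entry of a version 2 or 3 git index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    if version not in (2, 3):
        raise ValueError(f"unsupported index version {version}")

    entries = []
    offset = 12
    for _ in range(count):
        # 40 bytes of stat data, then the 20-byte SHA-1, then 2 bytes of flags.
        sha1 = data[offset + 40:offset + 60].hex()
        (flags,) = struct.unpack(">H", data[offset + 60:offset + 62])
        name_start = offset + 62
        if version >= 3 and flags & 0x4000:
            name_start += 2  # extended flags are present on this entry
        name_end = data.index(b"\x00", name_start)
        path = data[name_start:name_end].decode("utf-8", "replace")
        entries.append((sha1, path))
        # Entries are NUL-padded so that each one is a multiple of 8 bytes long.
        offset += ((name_end - offset) + 8) & ~7
    return entries
```

Calling `len(parse_git_index(data))` on a downloaded `.git/index` gives the request count up front, so you can decide whether to fetch everything or only the paths that look interesting.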
It's just a lot of annoying scrolling text. It's basically telling you all the objects that it's going to get, and it's pulling down those references. For anybody who uses the tool: if you get 404s, that's normal. Git doesn't let you get some of them, just based on how it works. After it downloads those references, it tries to check out each file to restore it into your working directory. It works on git, Bazaar, and Mercurial.
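For the git case, the download step can be sketched as follows: each SHA-1 from the index maps to a loose object file under `objects/`, and that file is zlib-compressed with a short type-and-size header. This is a minimal sketch with hypothetical function names, not the toolkit's code. Objects that git has packed into pack files are not stored at these paths, which is likely one reason some requests come back as 404s.

```python
import hashlib
import zlib


def object_url(base_url, sha1):
    """Build the URL of a loose object under an exposed .git directory."""
    return f"{base_url.rstrip('/')}/.git/objects/{sha1[:2]}/{sha1[2:]}"


def decode_loose_object(raw, expected_sha1=None):
    """Decompress a loose object and return (type, content)."""
    data = zlib.decompress(raw)
    # The object name is the SHA-1 of the decompressed header plus content.
    if expected_sha1 is not None and hashlib.sha1(data).hexdigest() != expected_sha1:
        raise ValueError("object does not match its SHA-1 name")
    header, _, body = data.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body
```

Writing each decoded blob to the path listed for it in the index approximates the checkout; alternatively, saving the raw files under a local `.git/objects/` lets git itself do the restore.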
All right, so it's kind of hard to see because of the large font. Once it checks out all the files, it just runs those file names against a regex file that we included in the toolkit (I'd love some contributions to that) and shows you, "Oh, these might be files of interest that you want to go look at right away." Pretty cheesy, pretty error-prone; as you can see, there's crap we just don't care about.
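The file-name triage can be approximated in a few lines of Python. The patterns below are hypothetical examples based on the kinds of files mentioned in the talk (SQL dumps, htpasswd files, WordPress configs, keys, backups); they are not the contents of the toolkit's regex file.

```python
import re

# Hypothetical patterns for file names that are often worth a first look.
INTERESTING = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\.sql(\.gz)?$",          # database dumps
        r"(^|/)\.htpasswd$",       # HTTP basic auth credentials
        r"(^|/)wp-config\.php$",   # WordPress configuration
        r"(^|/)id_[rd]sa$",        # SSH private keys
        r"\.(pem|key)$",           # certificates and keys
        r"\.(bak|old|orig)$",      # backup copies
    )
]


def files_of_interest(paths):
    """Return the paths that match at least one pattern, in input order."""
    return [p for p in paths if any(rx.search(p) for rx in INTERESTING)]
```

Because only names are matched, this is as error-prone as the talk admits: it produces false positives and misses secrets in ordinary-looking files, so it works as a starting point for manual review rather than a verdict.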
I'm going to sanitize this file so you guys don't know who this is from. Well, first of all, there's all the source code for the site, yay. You've got everything in their web root that we were able to check out. So now you went from having no knowledge of the site to actually having the entire source code tree for that web application, which is really, really powerful from an attacker's perspective. Let's sanitize the wp-config. This is a WordPress site, right?
Let's hope that worked. Oh, there are some cookie hashes, the database password, you know, again, all the crap that you find in a WordPress config: auth keys, all that stuff. I'd sanitize it in there, but that's just so I can sanitize this file so you don't know which site it came from. It's just a random WordPress site.
Yeah, so that's pretty much it. That's the tool. You can get the tool at the GitHub repository there, and I will be putting that up after the talk; I forgot to set it public before I got up here. Please fork and contribute. There are lots of bugs, and there's stuff it could use feature-wise. It could use some love, so please contribute. Questions and answers in the Q&A room.