Examining the Internet's pollution
Formal Metadata
Number of Parts | 93
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/36262 (DOI)
Transcript: English (auto-generated)
00:00
Karyn Benson, take it away. Hi everyone, thank you so much for coming today to learn about examining the Internet's pollution. As announced, I'm Karyn Benson, and I'm really excited
00:21
to be talking here today at my first DEF CON. To start off, a couple of years ago on Reddit, somebody asked the garbage collectors there about the illegal, strange, and valuable things they had seen while going through other people's trash.
00:41
You can go find this thread and read what they found, but the main takeaway is that they found a number of interesting and valuable items. So today I'm gonna ask the analogous question for the Internet: what sort of interesting and valuable information can we find
01:01
by looking at packets and traffic that you may consider the Internet's trash? I feel pretty qualified to talk to you about this, not because I'm Oscar the Grouch, but because I just defended my PhD, for which I spent the last four years
01:20
looking at this type of traffic. Prior to that, I looked at not-so-trashy traffic while writing intrusion detection software. So I've looked at some packets. All right, a quick outline of the talk. First, I'm gonna go a little more in depth
01:42
on what this trash is and the various ways you can collect it. I'll talk about how we collect it and how you could collect it on your own networks, and I'll say a little about the data I used for this presentation. The bulk of the talk is going to be
02:01
about the interesting and valuable items you can find in the trash, and then I'll conclude. All right, so what is Internet trash? Since this is a term I made up, what do I mean by it? Basically, I mean any unsolicited packets. This means you're not going out and trying to get people to send packets to you.
02:22
You're just passively capturing everything that arrives at your own IP addresses. This has a name other than trash: Internet Background Radiation, or IBR. People have studied it for a long time to look at worms and things like that,
02:42
but I'll tell you more about what has happened in the past couple of years. Probably the most obvious example of IBR is scanning. When you're searching for hosts that run a service, you send packets to hosts that will respond to you, as well as to hosts that are behind firewalls
03:00
that won't respond to you, and possibly to people like me who are just collecting the garbage of the Internet. We also get backscatter, which is any packet sent in response to a forged, or spoofed, packet. Typically, you think of these in the context of denial-of-service attacks.
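Backscatter goes to whatever source address the attacker forged, so a telescope only sees the share of it whose forged source happens to fall inside the telescope's own address space. Here is a minimal sketch of the resulting arithmetic. It assumes, as a simplification not stated in the talk, that spoofed sources are drawn uniformly from all of IPv4. The function names are hypothetical.

```python
IPV4_SPACE = 2 ** 32


def backscatter_fraction(telescope_prefix_len: int) -> float:
    """Probability that one uniformly spoofed source lands inside a telescope of size /prefix_len."""
    telescope_size = 2 ** (32 - telescope_prefix_len)
    return telescope_size / IPV4_SPACE


def estimate_victim_response_rate(observed_pps: float, telescope_prefix_len: int) -> float:
    """Scale the backscatter rate seen at the telescope up to the victim's total response rate."""
    return observed_pps / backscatter_fraction(telescope_prefix_len)


# A hypothetical /8 telescope sees 1/256 of uniformly spoofed backscatter.
print(backscatter_fraction(8))                  # 0.00390625
print(estimate_victim_response_rate(100.0, 8))  # 25600.0
```

If attackers skew their spoofed sources, for example by avoiding unrouted space, this estimate is biased, so treat it as an order-of-magnitude figure.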
03:21
You have a victim and an attacker, and the attacker doesn't necessarily want everyone to know they are the one launching the attack, so they forge the source address (the "from" field) of the packet. When they send it to the victim, the victim may have a hard time differentiating
03:41
between forged and genuine packets, so it may respond, but not to the attacker. Instead, it will hopefully respond to us. Next, we have misconfigurations, which occur when you erroneously believe that a machine is hosting a service.
04:00
These can be small-scale, like someone mistyping an IP address, but they can also be large-scale and affect many hosts; we see this a lot in peer-to-peer networks. Similar to misconfigurations are bugs, where some software error causes packets to reach
04:20
an unintended destination, such as a byte-order bug. So even if you know your DNS server's address correctly, a software issue may make you send the packet to an unintended destination. We also get a bunch of spoofed traffic, where for some reason people are using the wrong source address.
04:44
They typically aren't trying to attack me, but we still get some packets like this. Finally, there's some traffic we simply can't identify. This can be TCP SYN packets to non-standard ports,
05:04
or UDP packets whose payload we don't understand. Encrypted packets are one example, because it's difficult to tell what their intent is. So that's a summary of the major classes of IBR. How can we collect it?
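Header flags alone can separate some of these classes. An unsolicited SYN-ACK, RST, or ICMP reply is likely backscatter, while a bare SYN is a connection attempt. The sketch below assumes a hypothetical simplified `Packet` record and is not the speaker's classifier. A bare SYN cannot distinguish scanning from misconfigurations or bugs, and that separation needs the kind of deeper analysis described later in the talk.

```python
from dataclasses import dataclass

# TCP flag bits as laid out in the TCP header.
SYN, RST, ACK = 0x02, 0x04, 0x10

# ICMP replies/errors are sent in response to something, so unsolicited ones are likely backscatter.
ICMP_RESPONSE_TYPES = {0, 3, 11, 14}  # echo reply, dest unreachable, time exceeded, timestamp reply
ICMP_ECHO_REQUEST = 8


@dataclass
class Packet:
    """Hypothetical, simplified view of one packet that arrived at unused address space."""
    proto: str            # "tcp", "udp" or "icmp"
    tcp_flags: int = 0
    icmp_type: int = -1


def classify(pkt: Packet) -> str:
    """Coarse first-pass label for unsolicited traffic (a heuristic sketch, not a full classifier)."""
    if pkt.proto == "tcp":
        if pkt.tcp_flags & RST or (pkt.tcp_flags & SYN and pkt.tcp_flags & ACK):
            return "backscatter"            # a reply to a connection we never opened
        if pkt.tcp_flags & SYN:
            return "scan_or_misconfig"      # a connection attempt; intent still unclear
        return "unknown"
    if pkt.proto == "icmp":
        if pkt.icmp_type in ICMP_RESPONSE_TYPES:
            return "backscatter"
        if pkt.icmp_type == ICMP_ECHO_REQUEST:
            return "scan_or_misconfig"
    # UDP (and anything else) needs payload inspection; e.g. a DNS response is backscatter.
    return "unknown"
```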
05:23
You've probably heard of honeypots, where you purposely set up machines to be infected with malware. Maybe you run an old operating system or some vulnerable service. This gives you really in-depth information, because you actually get infected and can understand
05:42
the attack vectors and their consequences. But if we don't want to go that deep, there are other setups. The first is simply collecting one-way traffic. Say this is your network,
06:01
and these are the machines in use. You announce some BGP prefix, and you probably have some sort of middlebox tracking connection state: which connections are bidirectional and which haven't received an acknowledgment yet.
06:21
If a connection never receives an acknowledgment, it's probably unsolicited traffic, so you can store it as your collection of IBR. Similarly, you can run a greynet, where your state is the set of IP addresses in use, so you know that traffic to any other address can be written to storage as it comes into your network.
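Both setups reduce to simple set lookups. Here is a minimal sketch, assuming a hypothetical 5-tuple `Flow` record and documentation-range addresses; real middleboxes track state incrementally and time flows out, which this ignores. A flow is one-way if its reverse never appears. A greynet stores packets to addresses inside its prefix that no host uses.

```python
import ipaddress
from typing import Iterable, NamedTuple, Set


class Flow(NamedTuple):
    """Hypothetical 5-tuple record for one direction of a conversation."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: str


def one_way_flows(flows: Iterable[Flow]) -> Set[Flow]:
    """Return flows whose reverse direction never appears in the capture (candidate IBR)."""
    seen = set(flows)
    return {f for f in seen
            if Flow(f.dst, f.src, f.dport, f.sport, f.proto) not in seen}


def greynet_should_store(dst: str, used: Set[str], prefix: str) -> bool:
    """Greynet rule: store a packet if it targets our prefix at an address no host uses."""
    addr = ipaddress.ip_address(dst)
    in_use = {ipaddress.ip_address(a) for a in used}   # normalize textual addresses
    return addr in ipaddress.ip_network(prefix) and addr not in in_use
```

A network telescope is the limiting case in which `used` is empty, so everything arriving at the prefix is stored.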
06:44
A related idea: if all the addresses you use fit in some small prefix but you hold a much larger one, you can announce the whole prefix and then, based on the destination, decide which packets to route onward
07:02
or write to storage. Finally, the extreme case is a network telescope, where you announce a prefix you don't use at all and record all the traffic that comes in. In the order I presented these,
07:20
they become easier to scale and implement, and there are usually fewer privacy concerns, but you lose the ability to do really in-depth analysis, because you're not responding and people can learn to avoid your IP addresses. For this talk, I'm going to use traffic collected
07:41
at several network telescopes. We have multiple large academic network telescopes and receive a ton of data from them: we're currently capturing about five terabytes of compressed pcap per week, and we have traffic going all the way back to 2008,
08:01
so we can do historical studies. With this data, we see traffic from all over the Internet. In terms of countries, we see every country except a few islands in the Pacific Ocean, and in terms of IP addresses,
08:21
we see about 5% of the IP addresses announced in BGP, so it's a pretty good sample. The figures I'm showing are from July 2013, but looking over time, we're almost always seeing this kind of coverage. I didn't extend this graph, but the number has increased a lot recently, too.
08:42
There are also events such as the Spamhaus attack, a really big DNS-based denial-of-service attack, during which we were able to see traffic from even more hosts.
09:04
Now we get to the exciting part of the talk: the interesting and valuable things found in the Internet's trash. For this section, I'm gonna go through the major classes of traffic, except spoofed traffic,
09:21
and for each one I'll tell you about the finding I think is most exciting. For scanning, I'll talk about some trends and their relationship to vulnerability announcements. To collect this data, we used our historical data going back to 2008, and we applied
09:43
Bro's parameters for deciding whether an IP address is a scanner: if a source sends packets to 25 different IP addresses on the same protocol and port within five minutes, Bro alerts that you're being scanned. This is maybe not the best definition of a scanner,
10:04
because it obviously depends on how many IP addresses you have, and it definitely misses slower scans, but it gives us a first look at the macroscopic scanning happening on the Internet, or at least on our networks.
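Here is a minimal sliding-window sketch of that rule, using the parameters quoted in the talk: 25 distinct destinations on the same protocol and port within 300 seconds. It is not Bro's actual scan-detection code, and the input tuple layout is a hypothetical choice. The third check in the usage example below shows the slow-scan blind spot mentioned above.

```python
from collections import defaultdict, deque

THRESHOLD = 25          # distinct destinations, per the parameters quoted in the talk
WINDOW_SECONDS = 300    # five minutes


def find_scanners(packets):
    """packets: iterable of (timestamp, src, dst, proto, dport), sorted by timestamp.
    Returns the (src, proto, dport) keys that reached THRESHOLD distinct dsts within the window."""
    windows = defaultdict(deque)   # (src, proto, dport) -> deque of (timestamp, dst)
    scanners = set()
    for ts, src, dst, proto, dport in packets:
        key = (src, proto, dport)
        window = windows[key]
        window.append((ts, dst))
        # Forget observations that fell out of the five-minute window.
        while window and ts - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        if len({d for _, d in window}) >= THRESHOLD:
            scanners.add(key)
    return scanners
```

For example, a source hitting 25 addresses on TCP 445 one second apart is flagged. The same 25 probes spaced 20 seconds apart are not, because at most 16 of them ever share a window.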
10:23
I broke the data up, starting with what was happening from 2008 to 2012. The colors correspond to ports, and in terms of both packets and IP addresses, the purple port is very popular,
10:42
and this is TCP 445. The first increase comes right when the Conficker outbreak occurred, and subsequent increases often correspond to new releases of Conficker. But we can't say all of this is necessarily Conficker,
11:03
because there are other scans of this port, though most of it does happen to be from Conficker. So we can come up with heuristics to determine which packets originate from Conficker. To do this, we can exploit a bug that Conficker has
11:22
in its pseudorandom number generator. For the most part, when it randomly scans the Internet to propagate, the bug means it only targets IP addresses A.B.C.D where B is less than 128 and D is less than 128, so it's really scanning only a quarter of the Internet.
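The restriction is easy to express as a predicate on the second and fourth octets. The sketch below uses hypothetical function names. A source whose targets all pass is only consistent with Conficker, because a scanner drawing uniformly from all of IPv4 lands each probe in that quarter with probability 1/4. The chance of passing therefore shrinks as (1/4)^n with the number of targets n.

```python
import ipaddress
from typing import Iterable


def in_conficker_space(addr: str) -> bool:
    """True if addr = A.B.C.D has B < 128 and D < 128, the quarter of IPv4
    that the talk says Conficker's buggy PRNG restricts itself to."""
    _a, b, _c, d = ipaddress.IPv4Address(addr).packed
    return b < 128 and d < 128


def all_targets_in_conficker_space(targets: Iterable[str]) -> bool:
    """Necessary (not sufficient) condition for a Conficker-like scanner."""
    return all(in_conficker_space(t) for t in targets)


def chance_uniform_scanner_passes(n_targets: int) -> float:
    """Probability that a uniform whole-Internet scanner still lands all n targets in that quarter."""
    return 0.25 ** n_targets
```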
11:42
So we used a heuristic based on the birthday problem, which asks: given a random group of people, what is the probability that two of them share a birthday? The answer is often surprising:
12:02
with only about 23 people, it's more likely than not that two of them share a birthday. Another way of asking the question is: how many unique birthdays can we expect to see among n people, given 365 possible birthdays? Turning this into a way of identifying Conficker,
12:22
if we have scanned IP addresses A.B.C.D, we can look at the individual bytes of each address. Looking at D, we ask: how many unique D values can we expect to see among n targets if D ranges over either 128 or 256 values,
12:41
those being the two possible ranges for D? You can repeat this for the other bytes and, in expectation, start to differentiate between randomly scanning a quarter of the Internet and scanning the entire Internet. So if we look at the Conficker outbreak and the amount of scanning that happened
13:01
around that time (this graph is in log scale, and we do have some missing data), we see an increase right when Conficker was discovered. What we would expect here is that we wouldn't see any hosts matching
13:21
our Conficker heuristic. However, when we look at the number of IP addresses meeting the heuristic, this is what we see: up until about August, no IP addresses met it, and then all of a sudden we started seeing some traffic.
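The birthday-problem expectation for n uniform draws from m values is E[U] = m(1 - (1 - 1/m)^n). Below is a minimal sketch of the comparison described above. It checks whether the number of distinct B and D octets a source hit is closer to the expectation for 128 values or for 256. The closest-expectation decision rule and the function names are assumptions for illustration, not the authors' exact test.

```python
import ipaddress


def expected_unique(n: int, m: int) -> float:
    """Birthday-problem expectation: distinct values seen after n uniform draws from m values."""
    return m * (1 - (1 - 1 / m) ** n)


def looks_like_quarter_scanner(targets: list) -> bool:
    """Hypothetical decision rule: for octets B and D, is the observed number of distinct
    values closer to the 128-value expectation than to the 256-value one?"""
    packed = [ipaddress.IPv4Address(t).packed for t in targets]
    n = len(packed)
    for idx in (1, 3):  # B is octet index 1, D is octet index 3
        observed = len({p[idx] for p in packed})
        gap_quarter = abs(observed - expected_unique(n, 128))
        gap_full = abs(observed - expected_unique(n, 256))
        if gap_full <= gap_quarter:
            return False
    return True
```

With 500 targets, for instance, the expectations are roughly 125 distinct values for a 128-value range and roughly 220 for a 256-value range, so the two cases separate cleanly. With only a handful of targets they converge, and the rule becomes unreliable.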
13:43
And this is well before Conficker was actually discovered, so it's evidence that someone was testing Conficker, buggy generator and all, prior to this. On the first day, the IP addresses
14:01
were all in the same province, and for the first couple of days they were all in the same province in China. So maybe this is helpful: as far as I know, nobody has claimed Microsoft's $250K bounty for catching the Conficker worm's author, so perhaps this information could be useful for that.
14:24
That was before 2012. If we look at what has happened since 2012, not surprisingly, Conficker is dying out, and the most popular port has been replaced by port 23, which is Telnet.
14:42
The best explanation I have is that people may be scanning for Internet of Things devices. If you have a better idea, let me know. We can also see some other interesting things here. This spike in gray covered a variety of ports, and it corresponded
15:02
to traffic from the Carna botnet: somebody decided to create a botnet, scan the whole Internet, and publish all of the results anonymously. We see this, and we can verify from their published data that the traffic actually came from the Carna botnet.
15:22
If we look at the IP addresses, we notice some periods where there's increased activity on a port. Take Heartbleed: here you can see in red
15:41
where the Heartbleed vulnerability announcement occurred, and about a week later we see a lot of increased activity on the pink port, TCP 443, which is where Heartbleed was most likely to be exploited. Similarly, a little bit later,
16:01
we see a lot of hosts scanning TCP port 5000. Searching Google for TCP port 5000 around that time turned up an Akamai report that they were seeing lots of Universal Plug and Play devices
16:20
being used in denial-of-service attacks, and prior to that report, we see evidence of scanning on that port. So we were potentially seeing activity before the port was used in attacks. All right, that was scanning. Hopefully we will release our scanning data set pretty soon,
16:41
but moving on to backscatter, I'm gonna talk about an attack we've been seeing on authoritative DNS servers. Just a reminder: backscatter is a response to a spoofed packet. Suppose there's a web server that you want to launch a denial-of-service attack on.
17:01
You could attack the web server directly. However, there is another weak point: every legitimate host that wants to contact the web server needs to find the IP address associated with its name, so it has to make a number of DNS queries.
17:21
So you could also perform a denial-of-service attack on the authoritative name server, and one way to do this is with an open resolver. Normally, a DNS resolver should only resolve domains for machines that you administer; an open resolver answers anyone.
17:43
For example, UCSD's DNS server should only resolve domain names for clients at UCSD. Open resolvers are typically considered bad because they can be used in DDoS attacks. And you could use an open resolver
18:02
to pull off this attack on the authoritative name server. The attacker spoofs a DNS query and sends it to the open resolver, and since the open resolver resolves names for everyone, it's more than happy to ask the authoritative name server
18:20
and get a response. Since the original query was spoofed, the resolver doesn't respond to the attacker; instead, with some probability, it responds to our network telescope. We've been seeing a lot of this traffic recently
18:44
from open resolvers. This is 2014 data: until about the end of January 2014, we saw almost no traffic from open resolvers.
19:01
We saw about 3,000 open resolvers per month. Then, starting in February 2014, we saw 1.5 million open resolvers per month. We also noticed that once this attack took off, we saw traffic from the same open resolvers over and over again.
19:20
This is only a small fraction of the open resolvers on the Internet. The Open Resolver Project, which was actively scanning at the same time, saw about 20 times as many open resolvers as we did. This means that this attack
19:40
is using only a subset of the open resolvers. We can also look at other data from the attack, such as the status code that comes back with the DNS response. If everything is fine, it's NOERROR, but you can also get a number of failures,
20:01
including SERVFAIL, which indicates a problem, most likely with the authoritative name server. In that month of data, we got SERVFAIL errors from nearly every open resolver we saw, whereas the Open Resolver Project's scan sees this error very rarely.
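The status code lives in the fixed 12-byte DNS header defined in RFC 1035. The QR bit (0x8000 in the flags word) marks a response, and the low four bits carry the RCODE, where 2 means SERVFAIL. Here is a minimal sketch for tallying which resolvers sent SERVFAIL backscatter to the telescope. The `(src_ip, udp_payload)` input format and the function names are assumptions.

```python
import struct

RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def dns_response_rcode(udp_payload: bytes):
    """Return the RCODE name if the payload is a DNS response (QR=1), else None."""
    if len(udp_payload) < 12:                  # shorter than a DNS header
        return None
    _txid, flags = struct.unpack("!HH", udp_payload[:4])
    if not flags & 0x8000:                     # QR bit clear: a query, not a response
        return None
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def servfail_resolvers(packets):
    """packets: iterable of (src_ip, udp_payload) for UDP traffic from source port 53.
    Returns the set of resolvers that sent at least one SERVFAIL response."""
    return {src for src, payload in packets if dns_response_rcode(payload) == "SERVFAIL"}
```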
20:22
So this is evidence that the attack is actually overwhelming authoritative name servers. One interesting thing is that we see some data on January 29th, and then the attack really takes off at the beginning of February.
20:41
On that first day, all the queries were for body.com, which is a popular website, so this reflects a testing phase. Since then, the domains seem to be used only
21:00
for a very short period of time, and most of them seem to have bogus registration information. All this analysis was from the first month of activity, and we're still observing this type of attack right now. All right, that was backscatter.
21:21
Now I'm gonna go on to misconfigurations, in particular BitTorrent misconfigurations. If you want to download a torrent through BitTorrent, you typically contact the BitTorrent distributed hash table,
21:41
and its nodes tell you the location of the torrent, or of some other BitTorrent node that is closer to the torrent you want. However, there can be malicious nodes in the distributed hash table, and they can lie to you about the location of the torrent. If this happens over and over again,
22:01
it's going to be a lot harder for you to actually find the torrent and get the latest episode of Game of Thrones or whatever you want to watch. This is called an index poisoning attack: purposely inserting fake information into, or about, what's in the hash table.
22:20
After you receive this false information, you try to set up a connection, so when people send BitTorrent packets to the network telescope, we get an idea of which torrents they are trying to download. Here is some data from July 2012,
22:41
ranking torrents by the number of packets associated with them, and you'll notice that a lot of them have the word "China" in their name. A year later, we see about the same thing. This attack doesn't seem to be going on right now,
23:03
or if it is, it's at a much lower level. Typically, in this China attack, the IP addresses given out for these torrents satisfy this equation,
23:24
or fall in this set of IP addresses. Basically, they're in certain /13 blocks, so it seems they're being generated programmatically with a buggy pseudorandom number generator. Sometimes we see a lot of packets
23:41
from this attack, and sometimes we don't see any; currently, we're not seeing very many. But more recently, about a year ago, we saw a huge spike in the amount of BitTorrent traffic. We're getting traffic from about 250 times
24:02
more IP addresses per hour, and we don't really know everything that's going on. To recap: when you want a torrent, you ask a DHT node for the torrent's location,
24:26
it comes back with locations, and then you potentially contact our network telescope. We wanna know who's spreading this false information, but we can't really learn that by looking at the IBR.
24:40
Instead, we can set up nodes that actually interact with the distributed hash table. We set up two BitTorrent clients and examined what happened over two months, and both contacted our network telescope fairly frequently.
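In the Mainline DHT (BEP 5), a `get_peers` reply can carry a `values` list whose entries are "compact peer info": a 4-byte IPv4 address followed by a 2-byte port, both in network byte order. Here is a minimal sketch of how an instrumented client might flag replies that point into dark space. It assumes the bencoded reply has already been decoded and `values` extracted. The telescope prefix is a hypothetical documentation range, and the function names are illustrative.

```python
import ipaddress
import struct
from typing import List, Tuple

# Hypothetical telescope prefix (a documentation range) for illustration only.
TELESCOPE = ipaddress.ip_network("192.0.2.0/24")


def decode_compact_peers(values: List[bytes]) -> List[Tuple[str, int]]:
    """Decode BEP 5 'values' entries: 4-byte IPv4 + 2-byte port, network byte order."""
    peers = []
    for entry in values:
        if len(entry) != 6:
            continue                               # skip malformed (or IPv6) entries
        ip_bytes, port = struct.unpack("!4sH", entry)
        peers.append((str(ipaddress.IPv4Address(ip_bytes)), port))
    return peers


def telescope_peers(values: List[bytes]) -> List[Tuple[str, int]]:
    """Peers in a get_peers reply that point into the telescope, i.e. at unused addresses."""
    return [(ip, port) for ip, port in decode_compact_peers(values)
            if ipaddress.ip_address(ip) in TELESCOPE]
```

Logging the sender of each reply that yields a non-empty `telescope_peers` list is one way to attribute the false locations to specific DHT nodes.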
25:00
So we looked at who was telling our clients to contact the network telescope. The most popular client string was a libtorrent one, but it accounted for only about 70% of the clients, and it's a pretty popular client string among legitimate hosts as well.
25:21
Most of the IP addresses were in China, but they were in multiple ASes, so this wasn't too successful in identifying who is actually sending this false information. But we did notice one really suspicious behavior. In the hash table, all the nodes have an ID.
25:45
That means they believe the nodes at the IP addresses in our network telescope also have IDs, and the IDs they request all have 4 as the third byte.
26:02
That's kinda weird. Also, when you receive locations, you typically receive not just one location but several at a time, and this behavior is similar for a lot of other IP addresses we see. So we're receiving a lot of BitTorrent traffic
26:24
as a result of a bug or misconfiguration in a peer-to-peer network. Peer-to-peer networks have also caused a lot of traffic because of a bug in one of the systems. If we look at the number of sources
26:41
sending us traffic over time, we notice some interesting things, like the Conficker outbreak and the point when we started seeing a lot of BitTorrent traffic. Then, all of a sudden, in October 2010, the shape of the graph definitely changes.
27:01
It becomes very diurnal, and we weren't really sure what was happening. We were able to identify the responsible payload: certain bytes seemed fixed, and we could hypothesize about what the other ones were used for. But we still had no idea what this was, and as for the most frequently used ports,
27:23
we weren't really sure what those were either. But we did notice that a large number of the sources sending this traffic were located in China. In fact, in a month's time we received traffic from 30% of all BGP-announced
27:43
IP addresses in China, so this is huge. Also interestingly, within the USA category, some of the IP addresses belonged to the UCSD computer science department, where I went to school.
28:01
So we were able to coordinate with someone who could monitor traffic going in and out of UCSD's network to capture traffic from these IP addresses, and we confirmed that this traffic wasn't spoofed and was actually happening. All of the CSE machines basically contacted
28:21
a common IP address, and in response they got a pretty large packet. Based on this packet, they then sent about 40 more packets to different machines, all of which were encoded in this original big packet. And it wasn't just one packet;
28:43
they were exchanging a lot of packets. Eventually, the UCSD machines would receive a packet like this one, from 113.70.40.122, but instead of replying to that address, they would respond to 122.40.70.113
29:01
immediately after receiving this packet. And this reply matched the BPF filter that we had used to identify all of this traffic. So this is a byte-order bug, and it is why we were receiving so much of this traffic.
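A byte-order bug of this kind is easy to reproduce. The sketch below is a hypothetical model, not Qihoo 360's actual code: it parses an address in network (big-endian) order and, on the buggy path, writes it back little-endian, which is what happens when a host-to-network conversion such as `htonl()` is skipped on a little-endian machine. The reversed address usually has nothing to do with the real sender, which is how replies like these can land in a telescope's unused address space.

```python
import socket
import struct


def reply_address(observed_src: str, buggy: bool) -> str:
    """Hypothetical model of how a byte-order bug reverses a reply address.

    The address is parsed correctly as a big-endian (network-order) integer,
    but the buggy path serializes it little-endian, as if the host-to-network
    conversion had been skipped on a little-endian host.
    """
    value = struct.unpack("!I", socket.inet_aton(observed_src))[0]
    fmt = "<I" if buggy else "!I"  # "<" little-endian, "!" network order
    return socket.inet_ntoa(struct.pack(fmt, value))


print(reply_address("113.70.40.122", buggy=False))  # 113.70.40.122
print(reply_address("113.70.40.122", buggy=True))   # 122.40.70.113
```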
29:20
We identified that this software bug was in Qihoo 360, which is the most popular security software in China. If you look at their license agreement, you see that they will use peer-to-peer technology to update program modules, malware definition databases,
29:42
and components of the software. So basically, we were getting information about when people were getting software updates. We contacted Qihoo and told them, hey, you have this bug, and then we could see how long it took
30:01
for them to fix it. One odd thing about the traffic was that every four to five weeks there was a large spike, probably related to big update events, but there wasn't a big decrease following one of these.
30:21
Instead, it decreased about a month later, around the time a new version of Qihoo 360 became available on their website. We're still getting some traffic, but in general, this bug has been fixed. All right, now on to the last part,
30:41
which is looking at some unknown traffic. The bug was also an example of unknown traffic, but I'll go through another one. Basically, if you investigate some of these packets a little more, you might be able to identify
31:01
where they're from. In the beginning, when I explained the unknown category, I showed a packet whose payload appears to be encrypted. This one IP address was getting a lot of traffic sent to it,
31:20
and it all appeared to be encrypted, based on the entropy of the bytes. But we did a byte-wise analysis, looking at the values of the first byte, the second byte, the third byte, and so on. We found that one particular byte always seemed to be somewhat related to the total length of the packet.
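The two checks described here can be sketched as follows: Shannon entropy per payload to flag likely-encrypted data, and a per-offset test for a byte that tracks the packet length. The function names are hypothetical, and the length test assumes the simplest relationship (byte equals length plus a constant, modulo 256). The talk only says the byte was "somewhat related" to the length, so a real analysis might need to test other relationships. Also note that an n-byte payload can reach at most log2(n) bits of entropy per byte, so short packets never approach 8.0.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; values near the maximum suggest encryption."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def length_linked_offsets(payloads, max_offset=16):
    """Byte offsets whose value tracks the payload length.

    Assumes byte == (length + k) mod 256 for a constant k shared by every
    payload long enough to contain the offset, and requires payloads of at
    least two different lengths as evidence.
    """
    linked = []
    for off in range(max_offset):
        candidates = [p for p in payloads if len(p) > off]
        diffs = {(p[off] - len(p)) % 256 for p in candidates}
        lengths = {len(p) for p in candidates}
        if len(diffs) == 1 and len(lengths) > 1:
            linked.append(off)
    return linked


p1 = bytes([7, 9, 15]) + bytes(range(7))           # length 10, byte 2 = 10 + 5
p2 = bytes([1, 200, 25]) + bytes(range(100, 117))  # length 20, byte 2 = 20 + 5
print(length_linked_offsets([p1, p2]))       # [2]
print(shannon_entropy(bytes(range(256))))    # 8.0
```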
31:40
Then I read a bunch of white papers and found that in the Sality botnet's encryption scheme, these four bytes are an RC4 key used to encrypt and decrypt the entire rest of the message. So when we decrypted almost all of the packets
32:00
sent to this one IP address, we found that they all started like this, which confirmed that these are Sality command-and-control packets. This is kind of interesting, because you might think, okay, I understand why someone would have a bug, or why someone would purposely put false information
32:21
into a BitTorrent DHT, or how a byte-order bug happens, but this also happens in peer-to-peer botnets, and that's why we received so much traffic. In fact, if we look at how many IP addresses were sending us traffic per month,
32:42
to this one IP address, we see about the same number of infections as Symantec was seeing in the early part of this decade. So in conclusion, it's pretty likely that you are transmitting internet background radiation, and if you use network telescopes or similar technologies,
33:03
you can find a whole bunch of interesting things. Beyond these kinds of security-related events, we can also learn about the networks and machines generating the traffic. For example, you can do outage detection
33:21
with traffic reaching network telescopes. This is a graph from a paper that analyzed events during the Arab Spring, and as you can see, the number of packets coming from Libya went down to zero during certain periods, and these corresponded to known times
33:40
when the Libyan government had pulled the plug on the country's internet. We can also look at path changes. When you send a packet on the internet, it carries a TTL field that is decremented by every intermediate router, so that packets caught in routing loops don't circulate forever,
34:01
and based on this, you can infer how many hops away the source is, assuming it started from one of the common default initial TTL values. If that hop count changes, you know that a path change occurred, which can help you analyze outages and understand routing dynamics. So if you have traffic like this
34:22
where the TTL stays at the same value over time, it's probably using the same path, but if it looks more like this, you know that the path has changed. As a final example, we can also look at DHCP lease durations,
34:41
so when you join a network using DHCP, you announce that you want to join, and you're given an IP address to use. Typically, at some point you stop using that IP address, which means someone else can later use the same address. So we can look at DHCP lease durations
35:03
using any traffic that carries some sort of client identifier. If these are the packets you receive over time, you know that the lease duration is at least this long and at most this long. As I noted before,
35:21
BitTorrent has IDs as well, so we can use BitTorrent traffic to estimate lease durations for various autonomous systems. In this autonomous system, almost every lease has a minimum duration of less than seven days, and this is really useful for understanding
35:42
the effectiveness of blacklisting, or whether people will lose access to the internet because you have blacklisted their IP address. Hopefully you enjoyed the talk today, where we discussed some of the crazy things
36:03
that happen on the internet, and thank you.
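Two of the passive measurements described above, TTL-based hop inference and DHCP lease bounds, fit in a few lines each. This is a sketch under stated assumptions, not necessarily the method used in the talk. Hop counts assume the sender began at the smallest common default TTL (32, 64, 128 or 255) at or above the observed value. Lease bounds assume time-sorted (timestamp, client ID) observations for one IP, where a change of ID means a new lease. All function names are hypothetical.

```python
# Common default initial TTLs (an assumption; real stacks vary).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def infer_hops(observed_ttl: int) -> int:
    """Hops travelled, assuming the nearest default TTL at or above the observed one."""
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def path_changes(ttls):
    """Indices where the inferred hop count differs from the previous packet's."""
    hops = [infer_hops(t) for t in ttls]
    return [i for i in range(1, len(hops)) if hops[i] != hops[i - 1]]


def lease_bounds(observations):
    """Per-lease (client_id, lower, upper) bounds for a single IP address.

    `observations` is a time-sorted list of (timestamp, client_id) pairs, where
    client_id is any per-client identifier (a BitTorrent ID, for example).
    upper is None when there is no neighbouring lease on one side.
    """
    runs = []  # each run: [client_id, first_seen, last_seen]
    for ts, cid in observations:
        if runs and runs[-1][0] == cid:
            runs[-1][2] = ts
        else:
            runs.append([cid, ts, ts])
    bounds = []
    for i, (cid, first, last) in enumerate(runs):
        # At least as long as this client was observed on the address.
        lower = last - first
        # At most the gap between the previous client's last packet
        # and the next client's first packet.
        if 0 < i < len(runs) - 1:
            upper = runs[i + 1][1] - runs[i - 1][2]
        else:
            upper = None
        bounds.append((cid, lower, upper))
    return bounds


print(infer_hops(52))                       # 12
print(path_changes([52, 52, 52, 50, 50]))   # [3]
print(lease_bounds([(0, "A"), (5, "A"), (10, "B"), (30, "B"), (40, "C")]))
# [('A', 5, None), ('B', 20, 35), ('C', 0, None)]
```

In the example, lease B lasted at least 20 time units (its own first and last packets) and at most 35, because it must fit between A's last packet at 5 and C's first packet at 40.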
36:22
Hi, very fascinating research and a great presentation, thank you. Looking toward the future, I noticed this was all IPv4. Have you given any consideration to IPv6-based telescopes, and do you think they're practical given how sparse IPv6 prefixes and addresses are? So I haven't, but some people wrote a research paper
36:42
where they announced a covering IPv6 prefix and captured traffic to everything within it that other people weren't announcing in BGP. They didn't find as much, but I think as IPv6 evolves, this will evolve as well.
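The covering-prefix approach relies on longest-prefix-match routing. Traffic to an address goes to the most specific announced route, so whoever announces a broad covering prefix receives only the traffic aimed at the parts nobody else announces. Here is a minimal sketch using the `ipaddress` module. The prefixes come from the IPv6 documentation range, and the function name is hypothetical.

```python
import ipaddress


def reaches_telescope(dst: str, covering: str, announced_more_specifics) -> bool:
    """True if traffic to dst would be routed to the covering-prefix announcer.

    Routers forward on the longest matching prefix, so only addresses inside
    the covering prefix but outside every more-specific announcement end up
    at the telescope. All prefixes here are hypothetical inputs.
    """
    addr = ipaddress.ip_address(dst)
    if addr not in ipaddress.ip_network(covering):
        return False
    return not any(addr in ipaddress.ip_network(p) for p in announced_more_specifics)


announced = ["2001:db8:1::/48"]
print(reaches_telescope("2001:db8:2::1", "2001:db8::/32", announced))  # True
print(reaches_telescope("2001:db8:1::5", "2001:db8::/32", announced))  # False
```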
37:01
Thank you. Thank you, that's a very convincing case that this is incredibly useful data. How can other security researchers get access to it? So I know that UCSD's data is available to academic researchers,
37:23
though you might need to sign a bunch of things; I don't know the whole process. But you can also start with your own network, if you have one.
37:43
Is there a question over there? Thank you.