
Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading


Formal Metadata

Title
Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading
Alternative Title
Asura PCAP File Analyzer for Anomaly Packets Detection
Series Title
Number of Parts
322
Author
License
CC Attribution 3.0 Unported:
You may use, adapt, and copy, distribute and transmit the work or content in changed or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Recently, the inspection of huge traffic logs has been imposing a great burden on security analysts. Unfortunately, there have been few research efforts focusing on scalability in analyzing very large PCAP files with reasonable computing resources. Asura is a portable and scalable PCAP file analyzer for detecting anomaly packets using massive multithreading. Asura's parallel packet dump inspection is based on task-based decomposition and can therefore handle massive numbers of threads for large PCAP files without the tedious parameter selection required when adopting data decomposition. Asura is designed to scale out in processing large PCAP files by taking as many threads as possible. Asura takes two steps. First, Asura extracts a feature vector represented by associative containers keyed by (sourceIP, destIP) pairs. By doing this, the feature vector can be drastically smaller than the original PCAP files. In other words, Asura can reduce the packet dump data to the size of the set of unique (sourceIP, destIP) pairs (for example, in experiments, Asura's output after this first reduction step is about 2% of the size of the original libpcap files). Second, a parallel clustering algorithm is applied to the feature vector. In this second step, Asura adopts an enhanced K-means algorithm. Concretely, two functions of K-means, (1) calculating distances and (2) relabeling points, are improved for parallel processing. In experiments on public PCAP datasets, Asura identified 750 packets labeled as malicious from among 70 million (about 18GB of) normal packets. In a nutshell, Asura successfully found 750 malicious packets in an 18GB packet dump. For Asura to inspect 70 million packets, it took a reasonable computing time of around 350-450 minutes with 1000-5000 threads, running on a commodity workstation. Asura will be released under the MIT license and available at the author's GitHub site on the first day of DEF CON 26.
Transcript: English (auto-generated)
We'll start with Ruo Ando. Hi everyone. I'm really excited to talk here. So thank you for listening to me. In this presentation, I'm going to talk about a real application using multi-threading for analyzing a huge PCAP file. This is a tool which takes full advantage of multi-core processors and achieves a high performance improvement. Actually, is there anyone in this room that thinks that Wireshark is a little bit slow? No? The design goal of this tool is a kind of multi-threaded Wireshark with automated detection. So I guess multi-threading can be one of the new frontiers for packet inspection. Okay. Sorry. My name is Ruo Ando.
I'm working in a governmental organization, so I'm not weird. My talk is divided into four parts. At first, I would like to talk about the current catastrophic situation of traffic analysis. The funny thing here is that we have too many packets to be inspected; however, for the program to find a solution, we need even more packets. This is kind of a hopeless situation. I'll tell you about it later.
And the second one is the main part. When you build a tool for analyzing a huge PCAP file using massive threads, you have to make some selections about how to convert code into a concurrent version: the selection of features and containers, and of synchronization mechanisms such as mutexes, lock-free structures, and so on. And the third part is a demo and experimental results. Simply stated, speedup is the ratio of serial execution time to parallel execution time. So I'll show you the comparison. Then let me conclude this talk. This slide
shows the catastrophic situation. As everyone in the audience already knows, internet traffic is increasing at an exponential rate. However, there are two huge problems. Sorry. I'm so excited. What's going on? Thank you.
Huge traffic imposes a great burden on security researchers and analysts. But the traffic explosion is not similar to hacking or exploiting, because hacking and exploiting are impulsive and will be finished within several minutes. That is not the case with the traffic explosion. Unfortunately, the traffic keeps exploding, like an accident at a nuclear power plant. In my case, in my laboratory, I have 200 to 300 raw files stored on the server to inspect. During this 20-minute presentation, a three to five gigabyte file is stored, waiting to be inspected. This is a really hopeless situation for me. So automation is really important for me and everyone in the audience.
But from my experience, open source data mining tools don't work in many cases. Because in the world of automation and marketing, a commercial tool is not going to find people trying to hide their activities. And to make things worse, open source data mining tools simply ignore people trying to hide, and assume their behavior is the same as everyone else's. So I would like to emphasize that the packet dump is the last resort. The PCAP file is there, and it is the source to be trusted. And about machine learning, at least comparing one million versus one trillion samples, machine learning has a scalable property. I tell you what: if machine learning doesn't work on a data set comprising one million training samples, what is needed? What is needed is a much bigger set. This is unexpected, because when machine learning fails on a data set comprising one million training samples, the intuitive conclusion is that it doesn't work at all. But according to this paper, all we need is many more packets. So the situation is very curious, isn't it?
So Asura has four features. At first, Asura should run on commodity workstations and laptops. It can run with reasonable computing resources, because GPUs and cluster computing systems such as Spark are still expensive, high cost, and sometimes flaky. And more importantly, Asura uses raw POSIX Pthreads, which is a really old programming style.
Writing a program and choosing the appropriate level of abstraction is really important. Usually, hardly anyone misses the old programming methods, except hackers. What we are coping with here is a real-world packet stream, which is huge, not nice, not organized in a regular pattern, and unfortunately unpredictable. So flexibility is important, just as you use assembly language for analyzing malware binaries. Raw threads and MPI expose the control of parallel computing at the lowest level. But at the lowest level, we don't have libraries, containers, and schedulers. So you have to implement these utilities from scratch by yourselves, like in the era of the 1980s or 1990s. I guess this field can be one of the new frontiers for packet inspection. As a result, Asura is compact but powerful. Actually, Asura has about 2,000 lines of code, but can process about 75 million packets in 200 to 400 minutes.
These two steps are intuitively simple. Asura takes two steps: reduction using task decomposition, and clustering using data decomposition. As you know, reduction takes a collection of data and reduces it to a single scalar value. Clustering is the task of grouping data in such a way that data in the same group are more similar to each other than to those in other groups. The important thing here is that the reduction passes a container to the clustering. The container is the really important key in this two-stage processing. The container is a class template of C++, and the feature selection is essentially the anomaly detection of packets based on features. There are many research efforts and the features could be many, but the important thing here is to find a proper representation for reducing a massive PCAP file. We use the representation in the middle of this slide, key-value, and we use two structures. This is a little bit complicated; please see the source code for the details. Let me talk more about containers. Containers are a really important point for multithreading. You have three options. The first one is the STL. The STL is an old, basic, and regular programming style, but the STL is not concurrent-friendly, so it is a standard practice to wrap a lock around STL containers to make them safe for concurrent access.
The second one is Intel TBB. It's an excellent library, and Intel TBB provides highly concurrent containers, but a highly concurrent container is sometimes high cost: it takes a lot of time. I guess this one is mainly for scientific computation, and unfortunately what everyone here is doing is not like scientific computation. The data is not well organized and predictable, so in this case TBB is not suitable, I guess. And the third one is the emerging technology of Thrust. Thrust is a C++ template library for GPUs. By using Thrust, you can write code that has reductions, scans, and so on accelerated by the GPU. But unfortunately, as far as I know, there's no plan to implement hash tables, maps, and associative containers on the GPU, so I guess it will take time for this to become common for packet inspection. I guess this could be
a future work. And this slide is the main architecture of Asura. If you have a case where the computation time on individual PCAP files is variable and unpredictable, you'd be better served by task decomposition. Specifically, if you have a case where the amount of computation time will vary, a dynamic scheduler will be best.
And here, as with any dynamic scheduler for task decomposition, load balancing is important to take into consideration. You have to implement the scheduler by yourself. Please see the upper side of this slide. This is a shared container, which is a queue. The dynamic scheduler involves setting up the shared container, which holds the data, and all threads pull out tasks when their previous task is completed. So you should protect the shared container so that the threads can be assigned correctly and tasks are not lost through some corruption of the shared container.
Let me explain the result. To put it simply, the speedup here is the ratio of sequential computing time to parallel computing time. And scalability. So what is scalability? I think scalability is a measure of how much speedup the program gets as you add more and more cores and threads. I guess the kernel tuning here is proper. I don't know exactly, but with proper tuning of the Linux kernel, Asura can handle 75 million packets with 500 threads in about 287 minutes.
There is some room to be improved, because the size of the shared container is not proper: lock contention and context switching occur too much. But I guess it's reasonable that it can process more than 70 million packets in several hours. I would like to skip
the attacks detected in detail, because of some issues with the public data set. So instead, let me show you a demo. First of all, the binary is compiled according to the configuration of the number of threads. And reduction step one. Sorry, this demo is too fast, so I cannot see it. Reduction step two. Building binaries for clustering. The clustering has five to seven dimensions. And the data is truncated for this demo, which is too short. Do you know what's going on? Sorry, I don't know what's going on, because the machine learning is too fast. The machine learning relies on such a huge data set, and the processing speed is so fast. So machine learning might stop the program. We cannot expect it to solve.
You know, aside from that, we don't fully understand. The demo is too fast, so I cannot talk about it. So let me conclude this talk. I talked about a little bit weird application, which is called Asura, using multi-threading. For coping with real-world packet streams, which are huge, not nice, and sometimes unpredictable, flexibility is needed, just like you should use assembly language for malware binaries. But using raw threads and MPI takes advantage of retrieving the performance of multi-core processors. And Pthreads expose the control of parallel computing at the lowest level. Unfortunately or not, we should implement everything ourselves: libraries, containers, schedulers, which is really exciting for me. And as a result, they offer maximum flexibility. So as a result, Asura is compact but powerful. Asura has about 2,000 lines of code and can process more than 70 million packets in 200 to 500 minutes. For future work, Asura must be sped up, because there's room to improve the size of the containers and to apply TBB and GPUs. I really recommend applying multi-threading for packet inspection. It's really exciting. This can be one of the new frontiers for packet inspection. So thank you everyone. That's all. Thank you for listening.