Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading
Formal Metadata

Title | Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading
Number of Parts | 322
License | CC Attribution 3.0 Unported: You may use, copy, distribute, make publicly available, and adapt the work or its content in changed or unchanged form for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
Identifiers | 10.5446/39733 (DOI)
Transcript: English (auto-generated)
00:00
We'll start with Ruo Ando. Hi everyone. I'm really excited to talk here. So thank you for listening to me. And in this presentation, I'm going to talk about a real application using
00:28
multi-threading for analyzing huge pcap files. This is a tool which takes full advantage of multi-core processors and achieves a high performance improvement. Actually, is there anyone
00:50
in this room who thinks that Wireshark is a little bit slow? No? The design goal of this tool is a kind of multi-threaded Wireshark with automated detection. So I guess multi-threading
01:11
can be one of the new frontiers for packet inspection. Okay. Sorry. My name is Ruo Ando.
01:22
I'm working in a governmental organization, so I'm not weird. So my talk is divided into four parts. At first, I would like to talk about the current catastrophic situation of traffic analysis. The funny thing here is that
01:47
we have too many packets to be inspected. However, by the time the program has a solution, we have even more packets. This is kind of a very hopeless situation. I'll tell you about it later.
02:06
And the second one is the main part. When you build a tool for analyzing a huge pcap file using massive threads, you have some selections to make on how to convert code into a concurrent
02:25
version: the selection of features, containers, and synchronization mechanisms such as mutexes, lock-free structures, and so on. And the third part is the demo and experimental results.
02:48
Simply stated, speedup is the ratio of serial execution time to parallel execution time. So I'll show you the comparison. Then let me conclude this talk. This slide
03:09
shows the catastrophic situation. As everyone in the audience already knows, internet traffic
03:20
is increasing at an exponential rate. However, there are two huge problems. Sorry. I'm so excited. What's going on? Thank you.
04:02
Huge traffic imposes a great burden on security researchers and analysts. But the traffic explosion is not similar to hacking or exploiting. Because hacking and exploiting
04:21
is impulsive and will be finished within several minutes. But this is not the case for the traffic explosion. Unfortunately, the traffic explosion keeps going, like an accident at a nuclear power plant. So in my case, in my laboratory, I have 200 to 300 raw files
04:56
to be stored in the server to inspect it. Well, during this 20 minute presentation,
05:07
a three to five gigabyte file is stored to be inspected. This is a really hopeless situation for me. So automation is really important for me and everyone in the audience.
05:26
But from my experience, open source data mining tools don't work in many cases. Because in the world of automation and marketing, a commercial tool is not going to
05:46
find people trying to hide their activities. And to make things worse, open source data mining tools simply ignore people trying to hide, and assume their behavior is a part of everyone
06:07
else. So I would like to emphasize that the packet dump is the last resort. The pcap file is there, and it is the source to be trusted. And machine learning, at least one million
06:32
versus one trillion, machine learning has a scalability property. I tell you what, if machine learning
06:43
doesn't work on a data set comprising one million training samples, what is needed? What is needed is a much bigger set. This is unexpected, because
07:02
when machine learning fails on a data set comprising one million training samples, the intuitive conclusion is that it doesn't work at all. But according to this paper, all we need is a much bigger set of packets. So the situation is very curious, isn't it?
07:30
So Asura has four features. At first, Asura should run on commodity workstations and laptops.
07:42
It can run with reasonable computing resources, because GPUs and cluster systems such as Spark are still expensive, high cost, and sometimes faulty. And more importantly, Asura uses plain old pthreads, which is a really old programming style.
08:11
Writing a program at, and choosing, an appropriate level of abstraction is really important. Usually, hardly anyone misses the old programming methods, except hackers.
08:27
What we are coping with here is a real-world packet stream, which is huge, not nice, not organized in a regular pattern, and unfortunately unpredictable. So flexibility
08:46
is important. You use assembly language for analyzing malware binaries; likewise, raw threads and MPI expose the control of parallel computing at the lowest level. But
09:09
at the lowest level, we don't have libraries, containers, and schedulers. So you have to
09:21
implement these utilities from scratch by yourselves, like in the era of the 1980s or 1990s. I guess this field can be one of the new frontiers for packet inspection.
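As a rough illustration of what working at this lowest level looks like, here is a minimal pthread sketch, not Asura's actual code: it splits a packet index range across worker threads and joins their partial results. The chunk layout and the trivial counting worker are assumptions made for illustration.

```cpp
#include <pthread.h>
#include <vector>
#include <cstddef>

// Hypothetical per-thread argument: a half-open range of packet indices.
struct Chunk {
    size_t begin;
    size_t end;
    long result;  // packets handled by this worker
};

// Worker: counts the packets in its chunk (a stand-in for real parsing work).
static void* count_packets(void* arg) {
    Chunk* c = static_cast<Chunk*>(arg);
    c->result = static_cast<long>(c->end - c->begin);
    return nullptr;
}

// Split [0, total) across nthreads workers and sum their partial results.
long parallel_count(size_t total, size_t nthreads) {
    std::vector<pthread_t> tids(nthreads);
    std::vector<Chunk> chunks(nthreads);
    size_t per = total / nthreads;
    for (size_t i = 0; i < nthreads; ++i) {
        chunks[i].begin = i * per;
        chunks[i].end = (i + 1 == nthreads) ? total : (i + 1) * per;
        pthread_create(&tids[i], nullptr, count_packets, &chunks[i]);
    }
    long sum = 0;
    for (size_t i = 0; i < nthreads; ++i) {
        pthread_join(tids[i], nullptr);  // wait, then fold in the partial result
        sum += chunks[i].result;
    }
    return sum;
}
```

Everything beyond creating and joining the threads, such as the chunking policy, is exactly the kind of utility that has to be written by hand at this level.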
09:40
As a result, Asura is compact but powerful. Actually, Asura has about 2,000 lines of code, but can process about 75 million packets in 200 to 400 minutes.
10:07
These two are intuitively simple. Asura takes two steps: reduction using task decomposition and clustering using data decomposition. As you know, reduction takes a collection of data
10:25
and reduces it to a single scalar value. Clustering is the task of grouping data in such a way that data in the same group are more similar to each other than to data in other groups. The important thing here is that reduction passes a container
10:50
to clustering. The container is a really important key in this two-stage processing. The container is a C++ class template,
11:07
and then there is feature selection. The anomaly detection of packets is based on features. There are many research efforts, and the features could be many,
11:26
but the important thing here is to find a proper representation for reducing a massive pcap file. We use the representation in the middle of this slide,
11:44
key-value pairs, and we use two structures. This is a little bit complicated; please see the source code for details. Let me talk more about containers. Containers are a really important
12:14
point for multithreading. You have three options. The first one is the STL. The STL is an old, basic,
12:29
and regular programming style, but the STL is not concurrency friendly, so it is standard practice to wrap a lock around STL containers to make them safe for concurrent access.
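A minimal sketch of that standard practice, assuming a hypothetical flow-key to packet-count table rather than Asura's actual container: every access takes a mutex before touching the underlying STL map.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// A lock-wrapped STL container: the mutex makes an otherwise
// non-concurrency-friendly unordered_map safe for concurrent access.
class LockedCounter {
public:
    void add(const std::string& key, long n) {
        std::lock_guard<std::mutex> g(mu_);  // held for the whole update
        table_[key] += n;
    }
    long get(const std::string& key) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = table_.find(key);
        return it == table_.end() ? 0 : it->second;
    }
private:
    std::mutex mu_;
    std::unordered_map<std::string, long> table_;
};
```

The cost of this simplicity is that the single lock serializes all threads touching the table, which is exactly the trade-off against the concurrent containers discussed next.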
12:51
The second one is Intel TBB. It's an excellent library; Intel TBB provides
13:03
highly concurrent containers, but a highly concurrent container is sometimes high cost. It takes a lot of time. I guess this one is mainly for scientific computation, and unfortunately what everyone here is doing is not like scientific computation. The data is not
13:28
well organized and predictable, so in this case TBB is not suitable, I guess. And the third one is the emerging technology of Thrust. Thrust is a C++ template library
13:45
for the GPU. By using Thrust, you can write code to have sort, reduce, scan, and so on accelerated by the GPU. But unfortunately, as far as I know,
14:04
there's no plan to implement hash tables, maps, and associative containers on the GPU, so I guess it will take time for this to become common for packet inspection. I guess this could be
14:22
future work. And this slide shows the main architecture of Asura. If you have a case where the computation time on individual pcap files is variable and unpredictable,
14:49
you'd be better served by task decomposition. Specifically, if you have a case where the amount of computation time varies, a dynamic scheduler will be best.
15:05
And here, as with any dynamic scheduler for task decomposition, load balancing is important to take into consideration. You have to implement the scheduler
15:22
by yourself. Please see the upper side of this slide. This is a shared container, which is a queue. The dynamic scheduler involves setting up the shared container,
15:41
which holds data and allows all threads to pull out tasks when the previous task is completed. So you should protect the shared container so that threads can be assigned correctly
16:06
and tasks are not lost through some corruption of the shared container.
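The dynamic scheduler just described could be sketched like this; the use of std::thread, the hypothetical pcap file names, and the completion counter are illustrative assumptions, not Asura's implementation.

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Mutex-protected shared container of tasks (here, hypothetical pcap
// file names). Protecting it keeps tasks from being lost or duplicated.
class TaskQueue {
public:
    void push(const std::string& t) {
        std::lock_guard<std::mutex> g(mu_);
        tasks_.push(t);
    }
    // Returns false when no work is left.
    bool pop(std::string& out) {
        std::lock_guard<std::mutex> g(mu_);
        if (tasks_.empty()) return false;
        out = tasks_.front();
        tasks_.pop();
        return true;
    }
private:
    std::mutex mu_;
    std::queue<std::string> tasks_;
};

// Dynamic scheduling: each worker pulls its next task as soon as the
// previous one completes, giving load balancing for free. Returns the
// number of tasks processed.
long drain(TaskQueue& q, int nthreads) {
    std::mutex done_mu;
    long done = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < nthreads; ++i) {
        workers.emplace_back([&] {
            std::string t;
            while (q.pop(t)) {
                // ... parse the pcap file named by `t` here ...
                std::lock_guard<std::mutex> g(done_mu);
                ++done;
            }
        });
    }
    for (auto& w : workers) w.join();
    return done;
}
```

Because slow files simply keep one worker busy longer while the others drain the queue, no static assignment of files to threads is needed.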
16:21
Let me explain the results. To put it simply, the speedup here is the ratio of sequential computing time to parallel computing time. And scalability. So what is scalability?
16:45
I think scalability is a measure of how much speedup the program gets as you add more and more cores and threads. And I guess kernel tuning matters here.
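As a tiny worked example of the definition above (the timing values below are illustrative, not measured results from the talk):

```cpp
// Speedup as defined above: sequential execution time divided by
// parallel execution time. A perfectly scalable program would keep
// this growing linearly as cores and threads are added.
double speedup(double sequential_secs, double parallel_secs) {
    return sequential_secs / parallel_secs;
}
```

For instance, a job that takes 600 seconds sequentially and 150 seconds on four threads has a speedup of 4, i.e. perfect scaling for that thread count.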
17:13
I don't know exactly, but with proper tuning of the Linux kernel, Asura can process 75 million packets with 500 threads in about 287 minutes.
17:33
There is some room for improvement, because the size of the shared container is not optimal.
17:44
Lock contention and context switching occur too often. But I guess it's reasonable that it can process more than 70 million packets in several hours. I would like to skip
18:07
the details of the attacks detected, because of some issues with the public data set. So instead, let me show you a demo. First of all, the binary is compiled according to the configuration
18:35
of number of threads. And reduction step one. Sorry, this demo is too fast, so
18:44
I cannot see. Reduction step two. Building binaries for clustering. Clustering has five to seven dimensions. And the data is truncated for this demo, which is
19:06
too short. Do you know what's going on? Sorry, I don't know what's going on. Because
19:25
machine learning is too fast. The machine learning relies on such a huge data set, and the processing speed is so fast. So machine learning might stop the program; we cannot expect it to solve this.
19:45
You know, aside from that, we don't fully understand it. The demo is too fast, so I cannot talk about it. So let me conclude this talk. I talked about a little bit weird
20:08
application, which is called Asura, using multi-threading. For coping with real-world pcap streams, which are huge, not nice, and sometimes unpredictable,
20:25
flexibility is needed, just like you should use assembly language for malware binaries. And using raw threads and MPI has the advantage of retrieving the full performance of multi-core processors.
20:50
And pthreads can expose the control of parallel computing at the lowest level. But, unfortunately or not, we should implement
21:05
everything: libraries, containers, schedulers, which is really exciting for me. And as a result, they offer maximum flexibility. So as a result, Asura is compact,
21:22
but powerful. Asura has about 2,000 lines of code and can process more than 70 million packets in 200 to 400 minutes. For future work, Asura must be sped up,
21:47
because there's room to improve the size of the containers and to apply TBB and the GPU. I really recommend applying multi-threading for packet inspection.
22:10
It's really exciting. This can be one of the new frontiers for packet inspection. So thank you everyone. That's all. Thank you for listening.