Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading

Video in TIB AV-Portal: Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading

Formal Metadata

Title
Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading
Alternative Title
Asura PCAP File Analyzer for Anomaly Packets Detection
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
Recently, the inspection of huge traffic logs has been imposing a great burden on security analysts. Unfortunately, there have been few research efforts focusing on scalability in analyzing very large PCAP files with reasonable computing resources. Asura is a portable and scalable PCAP file analyzer for detecting anomalous packets using massive multithreading. Asura's parallel packet dump inspection is based on task-based decomposition and can therefore handle a massive number of threads for large PCAP files without the careful parameter selection that data decomposition requires. Asura is designed to scale out in processing large PCAP files by taking as many threads as possible. Asura takes two steps. First, Asura extracts a feature vector represented by associative containers keyed on the (sourceIP, destIP) pair. By doing this, the feature vector can be drastically smaller than the original PCAP files. In other words, Asura reduces the packet dump data to the number of unique (sourceIP, destIP) pairs (for example, in the experiment, Asura's output after the first step is about 2% of the size of the original libpcap files). Second, a parallel clustering algorithm is applied to the feature vector. In this second step, Asura adopts an enhanced k-means algorithm. Concretely, two functions of k-means, (1) calculating distances and (2) relabeling points, are improved for parallel processing. In the experiment on public PCAP datasets, Asura identified 750 packets labeled as malicious from among 70 million (about 18GB) normal packets. Inspecting the 70 million packets took a reasonable computing time of around 350-450 minutes with 1000-5000 threads on a commodity workstation. Asura will be released under the MIT license and available at the author's GitHub site on the first day of DEF CON 26.
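The abstract says that the distance calculation and the relabeling of points in k-means are the parts improved for parallel processing. A minimal sketch of what such a relabeling step could look like follows, assuming C++11 `std::thread` rather than raw Pthreads and a simple contiguous split of the points. The names `Point`, `squared_distance`, `relabel_range` and `parallel_relabel` are hypothetical and are not taken from Asura's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <thread>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Relabel points[begin, end): each point gets the index of its nearest centroid.
void relabel_range(const std::vector<Point>& points,
                   const std::vector<Point>& centroids,
                   std::vector<int>& labels,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double d = squared_distance(points[i], centroids[c]);
            if (d < best) {
                best = d;
                labels[i] = static_cast<int>(c);
            }
        }
    }
}

// Data decomposition: split the points into contiguous chunks, one per thread.
// Threads write disjoint slices of `labels`, so this step needs no lock.
std::vector<int> parallel_relabel(const std::vector<Point>& points,
                                  const std::vector<Point>& centroids,
                                  std::size_t num_threads) {
    assert(!centroids.empty() && num_threads > 0);
    std::vector<int> labels(points.size(), -1);
    const std::size_t chunk = (points.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= points.size()) break;
        const std::size_t end = std::min(points.size(), begin + chunk);
        workers.emplace_back(relabel_range, std::cref(points), std::cref(centroids),
                             std::ref(labels), begin, end);
    }
    for (auto& w : workers) w.join();
    return labels;
}
```

Calling `parallel_relabel(points, centroids, 4)` returns one cluster index per point. The centroid update that follows each relabeling pass in k-means is left out of this sketch.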
Well, let's start. I'm really excited to talk here, so thank you for listening to me. In this presentation I'm going to talk about a weird application that uses multithreading for inspecting huge PCAP files. This is a tool which takes full advantage of multi-core processors and achieves a high performance improvement. Actually, is there anyone in this room who thinks that Wireshark is a little bit slow? No? The design goal of the tool is a kind of multithreaded Wireshark with automated detection. So I guess multithreading can be one of the new frontiers for packet inspection.
OK, so my name is Ruo, and I'm working in a governmental organization, so I'm not weird. My talk is divided into four parts. First, I would like to talk about the current catastrophic situation of traffic analysis. The thing here is that we have too many packets to be inspected, and for this problem we have few solutions [unclear]. This is a very hopeless situation, which I will talk about later. The second one is the main part. When you build a tool for analyzing huge PCAP files using massive threads, you have some questions: how to convert code into a concurrent version, the selection of features and containers, and synchronization mechanisms such as mutex locks, something like that. The third part is a demo and experimental results. Simply stated, speed-up is the ratio of sequential execution time to parallel execution time, so I'll show you the comparison. Then let me conclude this talk.
This slide shows the catastrophic situation. As everyone in the audience already knows, Internet traffic is increasing at an exponential rate [unclear]. So again, the huge traffic log is a great burden on security researchers and analysts. But the traffic explosion is not similar to hacking or an exploit, because hacking and exploits are impulsive and will be finished within several minutes. This is not the case for the traffic explosion. Unfortunately, the traffic explosion keeps exploding, like an accident at a nuclear power plant. In my case, in my laboratory, I have 200 to 300 raw files stored on the server to be inspected. During this 20-minute presentation, a 3 to 5 gigabyte file is stored to be inspected. It is really a hopeless situation for me.
So automation is really important for me and for everyone in the audience. But in my experience, open-source data mining tools don't work in many cases, because they come from the world of advertisement and marketing. Commercial tools are not designed to find people trying to hide their activities. To make things worse, open-source data mining tools simply ignore people trying to hide and assume unusual behavior is a part of [unclear]. I would also like to point out that packet dumps with properly labeled PCAP files are rare, and a source that can be trusted is hard to find [unclear].
Machine learning has a [unclear] property. I'll tell you what: if machine learning doesn't work on a data set comprising 1 million training examples, guess what is needed. What is needed is a much bigger data set. This is unexpected, because if machine learning doesn't work well on a data set comprising 1 million training examples, the intuitive conclusion is that it doesn't work at all. But according to this paper, all we need is a much bigger set of packets. So the situation is very curious.
So Asura has four features. First, Asura should run on commodity workstations and laptops with reasonable computing resources, because GPUs and cluster systems such as Spark are still expensive and high-cost [unclear]. More importantly, Asura uses POSIX Pthreads, which is a really old programming style. When writing a program, choosing an appropriate level of abstraction is really important [unclear]. What we are coping with here is a real-world stream, which is huge, not nice, not organized in a regular pattern, and unfortunately unpredictable. So flexibility is important, just as you use assembly language for analyzing malware. Pthreads and MPI expose control of parallel computing at the lowest level. But at the lowest level we don't have ready-made containers and schedulers, so you have to implement these utilities from scratch by yourselves, like in the 1980s or 1990s. I guess this field can be one of the new frontiers for packet inspection. As a result, Asura is compact but powerful: Asura has about two thousand lines of code but can process about 75 million packets in 200 to 400 minutes.
To be intuitively simple, Asura takes two steps: reduction using task decomposition, and clustering using data decomposition. As you know, reduction takes a collection of data and reduces it to a single, smaller result [unclear], and clustering is the task of grouping data in such a way that data in the same group are more similar to each other than to those in other clusters. The important thing here is that reduction passes a container to clustering. The container is a really important key in this two-stage processing.
A container is a class template of C++. Next is feature selection. For anomaly packet detection based on features, there are many research efforts, and the features could be many. But the important thing here is to find a proper representation for reducing the massive PCAP file. We use this representation [shown on the slide; explanation unclear]. We use two structures, and this is a little bit complicated, so please see the source code for the details.
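As an illustration of the reduction described here and in the abstract, the sketch below collapses packets into an associative container keyed on the (source IP, destination IP) pair. The talk mentions two structures without showing them, so `PacketRecord`, `FlowStats`, `FlowTable`, `reduce_packets` and `merge_tables` are hypothetical names, the per-pair counters are an assumed choice of feature, and PCAP parsing is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, already-parsed packet record; real PCAP parsing is omitted.
struct PacketRecord {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint32_t length;
};

// Hypothetical per-pair feature: packet and byte counters.
struct FlowStats {
    std::uint64_t packets = 0;
    std::uint64_t bytes = 0;
};

using FlowKey = std::pair<std::uint32_t, std::uint32_t>;  // (source IP, destination IP)
using FlowTable = std::map<FlowKey, FlowStats>;

// Reduce a batch of packets into one entry per unique (src, dst) pair.
void reduce_packets(const std::vector<PacketRecord>& packets, FlowTable& table) {
    for (const PacketRecord& p : packets) {
        FlowStats& s = table[FlowKey(p.src_ip, p.dst_ip)];
        s.packets += 1;
        s.bytes += p.length;
    }
}

// Merge the table produced by one task into an accumulated table.
void merge_tables(FlowTable& dst, const FlowTable& src) {
    assert(&dst != &src);  // merging a table into itself would double its counters
    for (const auto& entry : src) {
        FlowStats& s = dst[entry.first];
        s.packets += entry.second.packets;
        s.bytes += entry.second.bytes;
    }
}
```

However many packets go in, the table holds one entry per unique address pair, which is why the output of this step can be a small fraction of the original dump.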
Let me talk more about containers. Concurrency is a very important point for multithreading. You have three options. The first one is STL. STL is an old, basic and regular programming style, but STL is not concurrency-friendly, so it is standard practice to wrap a lock around STL containers to make them safe for concurrent access. The second one is Intel TBB. Intel TBB is an excellent library, and it provides highly concurrent containers. But a highly concurrent container sometimes comes at a high cost: it takes a longer time. I guess this one is mainly for scientific computation, where the data is well organized and pretty predictable. Unfortunately, what everyone here is doing is not like scientific computation [unclear]. The third one is the emerging technology of Thrust. Thrust is a C++ template library for GPUs. By using Thrust, you can write code for reduce, scan and so on, accelerated by the GPU. But unfortunately, as far as I know, there is no plan to implement a hash table, map, or other associative container on the GPU. So I guess it will take time for it to become common for packet inspection. This could be future work.
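The first option, wrapping a lock around an STL container, can be sketched as follows. This is a minimal illustration that assumes C++11 `std::mutex` in place of a raw Pthreads mutex. `LockedMap` and `count_concurrently` are hypothetical names, and a single coarse-grained lock is only the simplest possible design, not necessarily the one Asura uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper: a std::map guarded by one mutex (coarse-grained locking).
template <typename Key, typename Value>
class LockedMap {
public:
    // Add `delta` to the value stored under `key`, creating the entry if absent.
    void add(const Key& key, const Value& delta) {
        std::lock_guard<std::mutex> guard(mutex_);
        map_[key] += delta;
    }
    // Return a copy of the stored value, or Value() if the key is absent.
    Value get(const Key& key) const {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = map_.find(key);
        return it == map_.end() ? Value() : it->second;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return map_.size();
    }
private:
    mutable std::mutex mutex_;
    std::map<Key, Value> map_;
};

using PairCounter = LockedMap<std::pair<std::uint32_t, std::uint32_t>, std::uint64_t>;

// Start `num_threads` threads that each count `per_thread` packets for one address pair.
void count_concurrently(PairCounter& table, std::size_t num_threads,
                        std::uint64_t per_thread) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&table, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                table.add({10u, 20u}, 1);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Every access takes the same mutex, so the map stays consistent, but all threads serialize on it. That cost is what shows up as lock contention when the thread count grows.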
This slide is the main picture of Asura. If you have a case where the computation time on an individual PCAP file is variable and unpredictable, you will be better served by task decomposition. Specifically, if the amount of computation time varies, dynamic scheduling will be best. With dynamically scheduled task decomposition, load balancing is important to take into consideration, and you have to implement the scheduler by yourself. Please see the upper side of this slide. This is a shared container, which is a queue. Dynamic scheduling involves setting up a shared container which holds the tasks and allows threads to pull a new task when the previous task is completed. So you should protect the shared container so that tasks can be assigned correctly and are not lost through some corruption of the shared container.
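The dynamic scheduling described above can be sketched with a mutex-protected queue from which worker threads pull tasks. This is a minimal illustration under stated assumptions: C++11 threads stand in for Pthreads, a task is just a file name, all tasks are queued before the workers start, and `TaskQueue` and `run_workers` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Shared container of tasks; here a task is the (hypothetical) name of a PCAP file.
class TaskQueue {
public:
    void push(const std::string& task) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push(task);
    }
    // Pop one task into `out`; returns false when the queue is empty.
    bool try_pop(std::string& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        out = tasks_.front();
        tasks_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<std::string> tasks_;
};

// Each worker pulls the next task as soon as it finishes the previous one, so
// faster workers take more tasks. Assumes every task was queued before the
// call, which lets an empty queue mean "no work left".
// Returns how many tasks each worker handled.
std::vector<std::size_t> run_workers(TaskQueue& queue, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<std::size_t> handled(num_threads, 0);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < num_threads; ++id) {
        workers.emplace_back([&queue, &handled, id] {
            std::string file;
            while (queue.try_pop(file)) {
                // Placeholder for the real per-file work (parsing and reduction).
                ++handled[id];  // each worker writes only its own slot
            }
        });
    }
    for (auto& w : workers) w.join();
    return handled;
}
```

Because the lock covers both the emptiness check and the pop, no task is handed to two threads and none is lost, which is the protection the speaker asks for.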
OK, let me explain the experimental results. Simply, speed-up here is the ratio of sequential computing time to parallel computing time. And scalability: what is scalability? Scalability is a measure of how much speed-up the program gets as you add more and more cores and threads. I guess the kernel tuning matters here [unclear]. With proper tuning of the Linux kernel, Asura can process 75 million packets with 500 threads in about 287 minutes. To tell the truth, there is some room for improvement, because the size of the shared container is not proper, and lock contention and context switching occurred too much. But I guess it is reasonable to process more than 70 million packets in several hours. I'd like to skip the details of the detected attacks because of some issues with the public dataset. Instead, let me show you a demo.
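The two measures mentioned here are usually written as follows. The symbols are assumptions introduced for illustration: T_seq is the sequential running time, T_par(p) is the running time with p threads, S(p) is the speed-up and E(p) is the parallel efficiency.

```latex
S(p) = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}(p)}, \qquad
E(p) = \frac{S(p)}{p}
```

A program scales well when S(p) keeps growing as p increases. The transcript gives a parallel time (about 287 minutes with 500 threads) but no sequential baseline, so the speed-up itself cannot be computed from the figures quoted here.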
First of all, the binary is compiled according to the configuration, such as the number of [unclear]. And this is reduction, step one. This step is too fast, so [unclear]. Next we have step two, building the binary for clustering. Clustering has five to seven dimensions, and the data is truncated for this demo, so it is too short. Do you know what's going on? I don't know what's going on either, because machine learning is too fast.
The fascinating thing is that machine learning relies on such a huge data set, and the processing speed is so fast, that machine learning might produce results we cannot expect [unclear]. So, as I said, there are things we don't fully understand [unclear].
So let me conclude this talk. I have talked about a somewhat weird application called Asura, which uses multithreading for coping with real-world PCAP streams, which are huge and not nice. Sometimes flexibility is needed, just as you would use assembly language for malware. Using Pthreads and MPI takes full advantage of the performance of multi-core processors, and Pthreads expose the control of computing at the lowest level. But, fortunately or not, we have to implement everything ourselves: containers, schedulers and so on, which is really exciting for me. As a result, they offer maximum flexibility. Asura is compact but powerful: it has [unclear] thousand lines of code and can process more than 70 million packets in 200 to [unclear] minutes. For future work, there is room to improve: configuring the size of the containers, and applying TBB and GPUs. I really recommend applying multithreading to packet inspection. It's really exciting, and it can be one of the new frontiers for packet inspection. So thank you, everyone. That's all.