ntopng: an actionable event-driven network traffic analysis application
Formal Metadata
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/61596 (DOI)
FOSDEM 2023, 160 / 542
Transcript: English (auto-generated)
00:07
OK, good morning. This time it's our turn to talk about something different: ntopng. What is ntopng? It's an open source application, of course.
00:21
We are here. And you can download the code on GitHub. You'll see the link at the end of the talk. What is ntopng doing? It is, first of all, a real-time network traffic monitoring application. So it means that it displays on a web interface what is happening in your network live. OK, no delay.
00:41
That is, unless we are receiving flows coming from a router, which are somewhat delayed because by nature they are averaged: they have a certain lifetime. And it is designed for network monitoring and cybersecurity. It means that there are some behavioral checks. So we are not bound to rules.
01:00
You have seen the Suricata presentation before. So you see there are some rules: in case this happens, then that. This is not our case. We work based on behavior: we look for a host that is misbehaving, more or less similar to what you've seen before, one that suddenly starts to send too much traffic with respect to the past, or starts to fire up a new application.
01:24
For instance, it accepts connections on a certain port for TLS that was not open before. This is a typical example. So the application simply starts up and learns what is happening on the network. There are some levels of learning.
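The "learn first, alert on change" idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ntopng implements it: the `PortBaseline` class, its `learning` flag and the example address are all hypothetical.

```python
from collections import defaultdict


class PortBaseline:
    """Hypothetical sketch: learn which ports each host serves, then flag new ones."""

    def __init__(self):
        self.known = defaultdict(set)  # host -> set of ports seen so far
        self.learning = True           # while True, observations are learned silently

    def observe(self, host, port):
        """Record an accepted connection; return True if it should raise an alert."""
        if port in self.known[host]:
            return False               # already part of the learned behavior
        self.known[host].add(port)
        return not self.learning       # new behavior only alerts after learning


baseline = PortBaseline()
baseline.observe("192.0.2.10", 443)    # learning phase: no alert
baseline.learning = False              # learning is over

alert_known = baseline.observe("192.0.2.10", 443)   # port seen before
alert_new = baseline.observe("192.0.2.10", 8443)    # port never seen before
```

Here the known port stays silent and only the port that was never open before produces an alert, which mirrors the behavior described in the talk without any hand-written rule.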
01:41
Sometimes it is immediate learning, because you specify some sort of configuration. But usually this is not the case: the application learns what is happening and, in case something goes wrong or behaves differently, fires up an alert. This is the idea. And the architecture is actually divided in two parts.
02:03
First of all, the packet processing part. That is based on either PF_RING or libpcap. So this means that you can run on Windows, Linux, macOS, FreeBSD, whatever. Instead, PF_RING is something that we have co-developed; it is a Linux technology for accelerating packet capture,
02:22
but not only for that. It is also, for instance, for merging traffic from multiple adapters, or for distributing traffic. So it's much more than simply RX acceleration. On top of this, there is an open source library that we still maintain at ntop called nDPI. This is the only open source library that is doing deep packet inspection. For us, it means that we try
02:42
to understand from the traffic what the application protocol is: whether it's TLS, which is a generic protocol, or Google Mail, which is a very specific protocol. And out of the traffic, we extract the metadata. So for instance, we extract certificate information, and we generate something we call a risk.
03:01
So looking at the traffic, we see if there is something wrong, such as an expired certificate, just to give you an idea. And we trigger an alert. On top of this, there is ntopng, because this first part is basically provided by the operating system. ntopng has a C++ engine that is processing packets and that is, in essence, doing
03:22
traffic analysis. It creates internally the representation of the data based on the concept of network interfaces, because we can have multiple network interfaces from which the traffic is received. It can be a virtual interface, such as a NetFlow collection, or a physical interface, eth0. And then we have something we call behavioral checks,
03:41
where we check flows and hosts. Flows means that each independent communication, such as a TCP connection, is checked. Instead, for hosts, we take the host as a whole. So in essence, if a host is doing a port scan, each individual communication is OK, or more or less OK. But the fact that this host is
04:00
doing this in a sequence, on a network or on a host, is a problem. So this is called behavioral checks. And on top of this, we trigger alerts that can be consumed locally or sent elsewhere. This is the first part. On top of this, we have the Lua interface. Why Lua? Because we like C++, but C++ is not for everybody.
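The difference between a flow check and a host check mentioned above, using the port scan example, could be sketched roughly as follows. This is a simplified illustration in Python, not ntopng code: the flow dictionaries, the field names and the threshold of 20 ports are assumptions made for the example.

```python
from collections import defaultdict

SCAN_THRESHOLD = 20  # hypothetical: distinct destination ports before we call it a scan


def hosts_scanning(flows, threshold=SCAN_THRESHOLD):
    """Host-level check: look at all flows of a host together.

    Each flow on its own looks harmless; the pattern only appears when the
    flows from one source towards one destination are aggregated.
    """
    ports_by_pair = defaultdict(set)
    for flow in flows:
        ports_by_pair[(flow["src"], flow["dst"])].add(flow["dst_port"])
    return sorted(
        {src for (src, _dst), ports in ports_by_pair.items() if len(ports) >= threshold}
    )


# 25 short connections from one host to 25 different ports of the same target,
# plus one ordinary flow from another host.
flows = [
    {"src": "192.0.2.5", "dst": "192.0.2.9", "dst_port": port}
    for port in range(1, 26)
]
flows.append({"src": "192.0.2.7", "dst": "192.0.2.9", "dst_port": 443})

scanners = hosts_scanning(flows)
```

No single entry in `flows` would fail a per-flow check, yet the aggregated view singles out the scanning host.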
04:22
So we need to simplify, for instance, the development of the web interface. So for instance, the REST API is written in Lua, sitting on top of C++. So we have created an API that allows us to avoid typical problems of C++. At the same time, we simplify the way the application is working.
04:40
So therefore, we use Lua for operations that are not critical, such as the web GUI, or for checks that are not necessarily real time. For SNMP, for instance, we fetch the data every five minutes and do the checks. Traffic ingestion, as I said, is done in multiple ways.
05:02
Sometimes it's actual traffic, so packets. Sometimes it is not: it's a flow. And this is handled by the C++ engine, which does it efficiently. And then we have other types of ingestion based on events. So something that we don't really control completely,
05:21
but that is relevant for us. We have seen Suricata in the presentation some minutes ago. This is a typical example of input. Why this? Because as I said at the beginning, we don't have rules. We don't want to have rules. So we don't want to say: if the payload contains this and this and this, then that.
05:41
Because we don't believe that this is what we should do. Instead, there are wonderful tools such as Suricata that are doing that very well. So the idea is to combine network monitoring and behavior analysis with this type of tool. So indirectly, through tools such as Suricata, which is optional, of course (you don't have to run it with ntopng),
06:02
you can have this type of information that can be combined directly by ntopng. Of course, we have firewall logs and syslog. Why is this important? Because we can have a look at information that is not visible from the traffic. Alfredo and I have always played with packets. But we understand that packets have limitations, especially
06:23
for encryption. We have seen before rules saying: if you are downloading a binary application. That is fine if it's plain text. But with TLS, you will never see that happen. So you have to use things like rules on top of this, but they are just guesses. Instead, through syslog or other means, we know that.
06:44
So if we see an attack on WordPress saying that this host is trying to guess the password of the administrator user, this is much more relevant information. And from the network standpoint, it looks simply fine. Everything is OK. The problem is visible only from the application. That's why we believe in the network, but at the same time, we need to have some other information that
07:02
is injected into the application. And of course, we have historical data. We use a database called ClickHouse, so we can store billions of records and everything works very fast. This is also an open source database. And for time series, we use a round-robin database (RRD) or InfluxDB.
07:22
And as I said before, we have checks that are divided in two parts. C++ for efficiency: so the first part, in essence, is where you have to process traffic inline, such as when you have an incoming packet and you have to check whether this packet, belonging to a flow, is relevant.
07:41
And then we have other types of checks that are not so real time, such as checks on an SNMP interface. These need to be easy to develop, but at the same time they don't need to be fast. Because as I said, if we poll SNMP every five minutes, we have plenty of time for doing that.
08:01
And of course, we have notifications that we send out. So we trigger a shell script, a webhook, syslog, email, Telegram: the usual things, nothing new here. OK, let's now start the talk after this introduction to ntopng. The problem is the following. We have added over 150 checks, behavioral checks,
08:22
on the traffic. But there is always somebody that comes and says, I want to do something different. How can we support these people? How can we enable new programmers, or let's say, people that are used to Python, shell scripts, this type of programming language, or that don't want to learn the internals of our application?
08:41
How can we do that? And many times, this happens when you are in a rush. So there is an attack. There is something happening on your network that you want to check. And we have two levels of the problem. First of all, we have to extend the behavioral checks in order to have some behavioral detection in a different way.
09:01
And the second part, which Alfredo will describe later, is how we can use ntopng as a data lake from languages such as Python, which is very popular, so that you can use ntopng as a source of data for your application. Of course, you have time series. As I said, for instance, we save data in InfluxDB
09:21
if you want. So you can use Grafana for creating your own dashboard. But these are simple dashboards. If we want to do something more complicated, if we want to go beyond that, how can we do that? So this is the idea today. We like C++. C++ is super efficient. We like it. We have been playing with it for many years.
09:43
But we understand that it is not what everybody wants. We need something easier. And we would also like to understand how it is possible to develop checks in minutes for people who are saying: OK, if I see this specific certificate or if I see this specific behavior,
10:00
then there is a problem. Something very peculiar to an organization, so not general for everybody, but for specific people. So for instance, how do I trigger an alert if there is TLS traffic between host A and B? For instance, a printer should not make any TLS traffic, just to make an example. So if this happens, how can I trigger an alert? Another problem is the following.
10:21
If I have a certificate signed by a certain organization, or for instance, if I have a BitTorrent connection that is going above 1 gigabit, or notify me if there is a Zoom call with bad quality. Things like this: very peculiar, very specific checks that people want to do, maybe on an autonomous system
10:45
and not on another, or on a network and not on another. So things that are not general that we can implement for everybody. How can we do that? So let me explain how ntopng works internally. Let's have a look at a flow, or a communication.
11:00
So ntopng creates a data structure inside itself as soon as we see the first packet of the flow. So we see IP source, IP destination, source port, destination port, protocol, VLAN, whatever. This is the first event that is relevant for us. And then, as I said, everything sits on top of nDPI, so the yellow part.
11:20
So we have another event when the application protocol is detected. Actually, this one is divided in two parts. First of all, as soon as the main protocol is detected, such as TLS, and then we can refine this information with metadata, saying, OK, this is TLS that is going to Google Mail and not Google Search or Google something else.
11:41
So second event, nDPI. And then we have, for long-standing flows, some periodic activities. So in essence, every minute, we can do something, like triggering an action. And then at the end, the flow end notification, so as soon as the flow is over.
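The flow lifecycle just described (first packet, protocol detected, periodic update, flow end) can be pictured as a set of hooks that a check may implement. The sketch below is a Python illustration only; the class and method names are hypothetical and do not reproduce the ntopng C++ or Lua interfaces.

```python
class FlowCheck:
    """Hypothetical base class: one hook per flow lifecycle event."""

    def on_first_packet(self, flow):       # flow structure has just been created
        pass

    def on_protocol_detected(self, flow):  # application protocol is now known
        pass

    def on_periodic_update(self, flow):    # called every minute for long-lived flows
        pass

    def on_flow_end(self, flow):           # flow is over
        pass


class RecordingCheck(FlowCheck):
    """Example check that simply records the order in which hooks fire."""

    def __init__(self):
        self.events = []

    def on_first_packet(self, flow):
        self.events.append("first_packet")

    def on_protocol_detected(self, flow):
        self.events.append("protocol_detected")

    def on_periodic_update(self, flow):
        self.events.append("periodic")

    def on_flow_end(self, flow):
        self.events.append("flow_end")


def simulate_flow(check, flow, minutes_alive):
    """Drive one flow through its lifecycle, with one periodic call per minute."""
    check.on_first_packet(flow)
    check.on_protocol_detected(flow)
    for _ in range(minutes_alive):
        check.on_periodic_update(flow)
    check.on_flow_end(flow)


check = RecordingCheck()
simulate_flow(check, {"proto": "TLS"}, minutes_alive=2)
```

A check that only cares about the detected protocol would override just one hook and leave the rest untouched.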
12:01
So what we wanted to do was create a Lua API that allows people to write simple checks that are efficient, efficient enough for most people, because not everybody is at 100 gigabit. Many people have one gigabit networks, or two or five gigabit networks. So they need some sort of efficiency, but they are not super extreme.
12:21
So let's say, use Lua for prototyping a check, for people who need speed. Or use Lua for people who have, let's say, an industrial network or a network that is running at one gigabit or two. In essence, we have created an API that allows Lua to see, internally in ntopng, properties of the flow.
12:42
For instance, the number of bytes, multicast, layer 7 information, these types of things. And the API calls are very small. In essence, we don't want the application to be inefficient simply because we hand over to Lua the whole representation of the host or of the flow. We expose simply the methods that we are interested in.
13:02
So on the left side, you will see the C++ code, how it implements the stuff. On the right side, you will see an example of the Lua code. So in this case, just to give you an idea of how it works, whenever there is one of the events, for instance, we have to check the flow because nDPI is done, so the protocol has been detected.
13:20
So if you want to block, let's say, Google Mail, what you need to do is execute a Lua check after this has happened. So in essence, in the C++ code, we have put the call to the Lua VM that executes a script. A script can be applied to many flows, not just to one. So this is where this happens.
13:42
And this is an example of a check. So we have a simple example. If you have a flow that is either TLS or QUIC, started from host 192.68, 178, to 1.1, and if it's TLS and if the protocol issue is.
14:01
So a very simple check that a friend of mine has asked for, because he's monitoring IoT networks, and they found a vulnerability on a specific type of rule. And the client was a specific device. So something that is not general. OK, so this is the way it works, very simple to write.
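The talk shows this check in Lua on a slide that is not reproduced in the transcript. As a rough stand-in, the same kind of condition (a TLS or QUIC flow between one specific client and one specific server) can be written as a small Python predicate. The field names and the addresses below are hypothetical, chosen only for illustration.

```python
WATCHED_PROTOCOLS = {"TLS", "QUIC"}


def custom_check(flow, watched_client, watched_server):
    """Return True when the flow matches the site-specific condition.

    Hypothetical flow layout: a dict with 'l7_proto', 'client' and 'server' keys.
    """
    return (
        flow["l7_proto"] in WATCHED_PROTOCOLS
        and flow["client"] == watched_client
        and flow["server"] == watched_server
    )


# Example addresses from the documentation range, not the ones used in the talk.
CLIENT = "192.0.2.20"
SERVER = "198.51.100.1"

matching = custom_check(
    {"l7_proto": "QUIC", "client": CLIENT, "server": SERVER}, CLIENT, SERVER
)
other_protocol = custom_check(
    {"l7_proto": "DNS", "client": CLIENT, "server": SERVER}, CLIENT, SERVER
)
other_client = custom_check(
    {"l7_proto": "TLS", "client": "192.0.2.21", "server": SERVER}, CLIENT, SERVER
)
```

The point is the shape of the check: a handful of comparisons on flow properties, specific to one organization, that would not make sense as a built-in check for everybody.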
14:22
The problem is the following: the overhead introduced. This is a very slow Intel i3, so just to give you an idea of the worst case, it's 30 microseconds for everything on average, whereas with C++ we can do it in one microsecond. Now, you say, this is bad. In a way, it is bad. I agree, because we are 30 times slower.
14:41
But you have to think, first of all, that on a one gigabit network this is not a problem. Also, you have to think that most of these checks are asynchronous. This is one of the few that are synchronous. So in essence, as soon as the protocol has been detected, we call this method. But it does not run where the packets are coming in. In essence, we have another thread that is calling this while the traffic is coming.
15:02
But we don't stop the execution. So in essence, to make it short: if you take this overhead that you have introduced, sum it over everything, and stay below certain boundaries, so that every minute you can execute the flow checks on all the flows, you are good. And of course, we trigger an alert.
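To make the budget argument concrete, here is the arithmetic for the two figures quoted in the talk (30 microseconds per Lua check, 1 microsecond per C++ check). It assumes, as a simplification, one thread that spends the whole minute doing nothing but checks, so the results are upper bounds rather than measured throughput.

```python
MICROSECONDS_PER_MINUTE = 60 * 1_000_000


def checks_per_minute(cost_us: int) -> int:
    """Upper bound on checks one thread can run in a minute at the given cost."""
    return MICROSECONDS_PER_MINUTE // cost_us


lua_budget = checks_per_minute(30)   # Lua figure quoted in the talk
cpp_budget = checks_per_minute(1)    # C++ figure quoted in the talk

print(f"Lua: up to {lua_budget:,} flow checks per minute")
print(f"C++: up to {cpp_budget:,} flow checks per minute")
```

Under this assumption a single thread running the Lua check could still cover about two million flows in each one-minute cycle, which is why the 30x slowdown is acceptable on networks that are not at the extreme end.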
15:20
And the result of the alert is a notification on the GUI that can be sent, for instance, through Microsoft Teams, just to give you an idea. We can trigger a shell script for something, or send an alert to my friend on Telegram. So this is the way it works. OK, now I have this. OK, so we have seen how to extend the ntopng engine
15:42
with Lua scripts to access traffic information and use that information to check the traffic and trigger alerts, for instance. Now, we recently also released a Python package that you can install with pip install ntopng, which allows you to use it as a library
16:02
to create a Python script which is able to access traffic information in ntopng. And this happens through the REST API. This means that you can run your script even from a remote location. For example, you can access live data in ntopng.
16:22
In this case, we are importing the Ntopng class. We are connecting to ntopng using the Ntopng class. We get an instance of an interface in ntopng, for instance, eth0. And we use this method to get all the hosts which are active in my network with all the metadata.
16:43
And there are plenty of methods in this class or other classes in this library that allow you to get traffic information. So you can get alerts, flows, hosts, whatever. And you can also get historical data. In the same way, you connect to ntopng and you get an interface.
17:02
From this interface, you get an instance of the historical class. And you can run queries in the database. For instance, you can get alert statistics from this time to this time, for instance, for the last 24 hours. And just print all the alerts that you have.
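Since the Python package talks to ntopng through the REST API, the underlying request can be sketched with the standard library alone. Everything specific below is an assumption to be checked against the ntopng REST documentation: the endpoint path, the `ifid` parameter, the `Token` authorization scheme, the port and the placeholder token. The snippet only builds the request and the time window; it does not contact a server.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, path, token, **params):
    """Build (but do not send) an authenticated GET request.

    Assumption: the server accepts an 'Authorization: Token <value>' header.
    """
    url = f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Token {token}"})


def last_24h(now=None):
    """Return (epoch_begin, epoch_end) covering the last 24 hours."""
    end = int(now if now is not None else time.time())
    return end - 24 * 3600, end


# Assumed endpoint for the active hosts of interface 0; verify before use.
req = build_request(
    "http://localhost:3000/",
    "/lua/rest/v2/get/host/active.lua",
    "MY_TOKEN",
    ifid=0,
)
epoch_begin, epoch_end = last_24h()
```

Sending the request (for example with `urllib.request.urlopen(req)`) and decoding the JSON response is left out on purpose, because the response layout depends on the ntopng version; the package shown in the talk wraps these details behind its classes.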
17:21
Now, those are two examples of querying the engine to get the data. Of course, we have seen that when a check or an external event detects something, ntopng can trigger an alert. And we have seen that ntopng supports several endpoints.
17:43
So we can send this alert using mail, or a messaging system like Telegram or Slack. We can run a shell script. We can call a webhook. So we can run a shell script, for instance, and this script can be a Python script.
18:03
So let's try to put all the pieces together. So we receive an event, which is generated by an internal check or an external check. This event can call a Python script. This Python script can get information from the alert itself or can query the engine through this API
18:22
that we created to get more data, to fetch more data and augment the alert information. And this can have some logic and trigger some action. So you can write your actions here to react to this event. In order to implement this, what you have to do in Ntopng
18:41
is, first of all, you have to enable the check that you want to use to analyze the traffic. For instance, in this case, we are using a custom check that the user creates in Lua, as Luca showed you before. Then if you want to write your Python script
19:01
that reacts to this event, you have to write an alert handler, which is a script that you place under the ntopng scripts/shell directory. This case is a simple script, which just gets the traffic information, the metadata, on standard input. And for instance, if the alert type is our user script,
19:22
I want to do something. In this case, I'm just logging the IP address related to the host that triggered the alert and a message from our custom script. Then you have to go inside ntopng. You go under notifications. You set that you want to send alerts to the shell script.
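An alert handler of the kind described above (read the alert from standard input, react only to one alert type, log the host address and a message) might look like the following Python sketch. The JSON encoding and the field names `alert_type`, `ip` and `message` are assumptions made for illustration; the real alert layout should be taken from an actual alert delivered by ntopng.

```python
import io
import json


def handle_alert(stream, wanted_type="user_script"):
    """Read one JSON alert from a stream and return a log line, or None.

    Hypothetical fields: 'alert_type', 'ip' and 'message'.
    """
    alert = json.load(stream)
    if alert.get("alert_type") != wanted_type:
        return None                      # not the alert type we care about
    host = alert.get("ip", "unknown")
    return f"host {host}: {alert.get('message', '')}"


# Demonstration with an in-memory stream; a real handler would pass sys.stdin.
sample = io.StringIO(
    json.dumps(
        {"alert_type": "user_script", "ip": "192.0.2.30", "message": "custom check fired"}
    )
)
log_line = handle_alert(sample)

ignored = handle_alert(io.StringIO(json.dumps({"alert_type": "other", "ip": "192.0.2.31"})))
```

Keeping the logic in a function that takes a stream makes the handler easy to try out with sample alerts before it is wired to the notification endpoint.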
19:43
Here you have all the options like email, whatever. And you select your handler. And then you specify for your handler that you want to receive just critical alerts. So you specify the severity. You specify the category of alerts that you want to handle, in this case cybersecurity, and the entity.
20:01
In this case, I want to handle alerts about hosts. And then we can extend our handler. We have seen how to print just the alert information, but we can, again, use our Python library and ntopng to access more information about the hosts.
20:22
So we receive this alert, which has been triggered on a specific host in our network. For instance, this host has been infected by malware. It's generating unexpected traffic, whatever. We want to get more information about this host to build a report, for instance. In fact, in our library,
20:41
we also have the ability to generate a report, or you can generate your own report using the API that we have. So we build this report and send an email. So this is a simple script that you can use. It's a few lines of code to handle alerts and generate reports and get, for instance,
21:01
an email on your mobile phone with the alert. So this is the big picture of the example that we have seen right now. So we have defined a user script that triggers an alert, or we receive, again, events from any other source or internal checks.
21:21
We are calling our script, which is getting more information from the engine to build a report and send this report by email. So the result is this. So the system is checking your traffic, is building a report when something happens, and will send you an email with a report of the traffic for the host with the top alerts sorted by severity
21:43
or by count, the top contacts for the host, the chart of the traffic generated by the host, and you can add more, like the top applications used by the host, et cetera. Do you want to wrap up? Okay. So we have seen that with ntopng,
22:00
you can collect traffic information from traffic, flows, and events, events from Suricata for instance, et cetera. And when we started with, actually, when Luca started with ntop, and then we moved to ntopng, it was mainly a traffic monitoring tool. Today it is also a cybersecurity tool,
22:22
able to do behavioral checks, not just for providing visibility, but also providing cybersecurity monitoring. And you are now able to extend this engine both with your scripts integrated in ntopng, or even with C++ plugins, let's say checks,
22:44
if you need to scale with performance, or you can use our Python library to write Python tools that can run externally, even on remote boxes, to access traffic information in the ntopng engine,
23:03
and produce, for instance, a PDF as we have seen, reporting what's going on in your network. Of course, all the code is available on GitHub, so if you want to contribute, you are welcome, especially now, you don't have excuses, we have a lot of libraries, scripting languages
23:21
for interacting with the engine, so something else to add? No, the only thing I want to say is that this is an efficient way from our point of view to do network monitoring and cybersecurity, and at the same time to extract the information in a way that does not interfere with the main engine, that allows, I believe, most of the people sitting in this room
23:43
to do whatever they like to create a monitoring tool that is tailored for their own needs, and that's the first step that is open source. That's all, thank you very much. Any questions?
24:05
It's just a simple question: how does it compare with SIEM tools? It looks like it does everything a SIEM could do. SIEM tools. Yeah. I don't know the tools. No problem, I am not familiar with them.
24:22
Any other question? Can the scripts be compiled to be more performant,
24:41
or do you not have these tasks in your development timeline? To compile scripts to have more performance, you have scripts or, like, C++; we saw that the C++ check takes one millisecond,
25:02
but the Lua script take 30 milliseconds. Microseconds, okay. Yes, of course, you can compile them, but you have to code it in C++ at the moment. So we use Lua just in time, compile the one seen before by some switch,
25:21
but it is not available everywhere, for instance on ARM, and we want to support it everywhere. So yes, it is possible, but again, these are microseconds, not milliseconds, so one million of them per second. Any questions? Somebody else?
25:44
Hi, thank you. Do you have some figures about the performance you are able to achieve on a typical server, about flows per second? Some figures to share on the Lua scripting, and also some example on Python,
26:03
which should be less efficient? Okay, when you process packets with ntopng itself, you're able to process a few gigabits per second, depending on the drivers you use and how you tune ntopng, let's say,
26:21
to scale with the performance, you can get 10 gigabit, for instance, but more or less, we range from a few gigabit to 10 gigabit in ntopng itself. You can use it in combination with our probes, which is nProbe, or we have other probes, like Cento. In that case, you can scale with the performance up to 100 gigabit per second,
26:42
but the architecture changes a bit. Yeah, it's one, 100 gigabit and plus. Yeah, as of the checks, it depends on the checks that you enable, of course. Okay, I think we are running out of time. Many thanks for being here now. Thank you.
27:00
Thank you.