Network Traffic Classification for Cybersecurity and Monitoring
Formal Metadata
Number of Parts | 287
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/56962 (DOI)
FOSDEM 2022, 52 / 287
Transcript: English (auto-generated)
00:09
Hello, my name is Luca Deri and today I am going to talk about network traffic classification for cybersecurity and monitoring. Before I start, I want to tell you a little bit more about me.
00:21
I am the founder of ntop, a company that develops open source tools for network security and visibility. You probably know ntopng and probably also nDPI, because I presented it last year at FOSDEM. I am the author of various other open source tools and a contributor to other tools such as Wireshark.
00:42
Finally, I teach at the University of Pisa as a lecturer. Last year at FOSDEM I presented nDPI. This year I want to extend my presentation, adding new features that are present in nDPI but are probably not known to everyone.
01:01
In particular, I want to talk about network traffic analysis using nDPI. This is because on the market there are many tools and toolkits such as DPDK, PF_RING, netmap or, if you want, also eBPF that allow you to capture events or packets for the purpose of network traffic analysis.
01:22
Unfortunately, most applications are still based on the top or bottom X hosts that do certain activities, whereas today the nature of the traffic is much more complex and we need to do something more than that. In order to avoid reinventing the wheel many times, we have decided to put into nDPI additional features
01:43
that are not purely deep packet inspection oriented and that allow applications sitting on top of it to analyze traffic. Please note that you can use nDPI on top of DPDK. You don't need the whole ntop open source PF_RING stack to use it. And also remember that this library is designed for speed.
02:04
So this means that we have tried to optimize the library as much as possible and to overcome the limitations of typical solutions based on Python and R, which can do similar things but only in post-processing because they are not fast enough or they require too many resources. Just to recap what nDPI is: it is a toolkit that was primarily designed for learning about network traffic protocols
02:29
and reporting what the application protocol behind a certain network communication is. Today we are going to talk about network traffic analysis and I'm going to present some examples of network traffic problems that can be solved with nDPI.
02:44
The first problem is string searching. In traffic, sometimes we have to search for specific strings, not just because we want to search the payload for a certain word but also because we need to match the traffic against certain criteria. A typical example is substring matching, which is implemented in nDPI with the Aho-Corasick algorithm.
03:06
Substring matching is necessary whenever you want, for instance, to match a certain domain name against a dictionary. So, imagine that you have a list of blacklisted hosts, a list of domain names that are not nice to contact,
03:23
a list of spammers and many things like that. So, we are talking about strings. And you want to do this matching. The matching has to be substring because when you have a domain name, you don't have to write the whole host name, only a subset of it. Aho-Corasick is an efficient string searching algorithm.
03:42
Unfortunately, Aho-Corasick is a little bit complicated to implement because it requires the implementation of an automaton: in essence, a state machine, or a network if you want, whose nodes represent all the possible words of the dictionary, so that whenever we have a string to match, the algorithm searches inside this automaton and tries to find the best match, if any.
04:09
As you can see in the picture taken from Wikipedia, you have two types of nodes, the blue and the gray ones. The blue nodes are terminal ones, so those that basically contain the match, and the gray ones are those that are used to build the tree.
04:23
I don't want to go too much into the algorithm because we don't have much time and I just want to describe how we can use it. In essence, the first thing to do is to initialize an automaton with ndpi_init_automa, and then you have to add all the possible words to it. In this case, it's a simple hello and world, and then you have to finalize the automaton.
04:43
In fact, the main problem of Aho-Corasick is that whenever you want to add a word, you have to rebuild the automaton. The same happens if you want to remove one. So make sure that you have all the possible words, otherwise you need to start over with another automaton and do a hot swap in case your application is processing traffic. Then at the end, you see ndpi_match_string, which allows you to check if inside this sentence there is at least one word matching the dictionary.
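To make the init, add, finalize and match steps concrete, here is a minimal Python sketch of an Aho-Corasick automaton. It is only an illustration of the technique: the class and method names (Automaton, add_string, finalize, match) are hypothetical and are not nDPI's C API, and nothing here reflects how nDPI stores its nodes.

```python
from collections import deque


class Automaton:
    """Tiny Aho-Corasick automaton (illustrative; not nDPI's C implementation)."""

    def __init__(self):
        self.goto = [{}]        # goto[state][char] -> next state
        self.fail = [0]         # failure link of each state
        self.out = [[]]         # dictionary words recognised at each state
        self.finalized = False

    def add_string(self, word):
        # Words can only be added before the failure links are computed.
        if self.finalized:
            raise RuntimeError("automaton already finalized: build a new one")
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(word)

    def finalize(self):
        # Breadth-first pass that computes the failure links.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Words ending at the failure target also end here.
                self.out[nxt] += self.out[self.fail[nxt]]
        self.finalized = True

    def match(self, text):
        """Return every dictionary word that occurs as a substring of text."""
        if not self.finalized:
            raise RuntimeError("call finalize() before matching")
        state, found = 0, []
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found.extend(self.out[state])
        return found


automaton = Automaton()
for word in ("hello", "world"):
    automaton.add_string(word)
automaton.finalize()
print(automaton.match("say hello to the world"))
```

The sketch also shows why adding a word after finalization means starting over: the failure links are computed once for the whole dictionary, so the simplest safe option is to build a second automaton and swap it in.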
05:10
And of course, nDPI returns such a string. We have optimized the algorithm for networking, so we can find strings that end
05:20
with a certain suffix, or also strings that begin with a certain prefix. In essence, everything you can expect for matching a domain name is present in this library, even though you can also use it for matching pure strings. Just to give you an idea, the memory used by the algorithm to create the dictionary increases with the number of words.
05:44
And as you can see, when you have about half a million words, that is half of the Alexa top one million hosts, the size is about 900 megabytes. The build time also increases with the number of words. We ran this test on a very slow dual-core machine just to give you an idea of the speed.
06:03
So with half a million words, it takes about seven seconds on a dual-core Intel at 3.2 GHz. But the nice thing is that, as you can see, the search time stays more or less flat regardless of the number of strings in the dictionary. It is just the memory and the time to build the automaton that grow.
06:23
The second problem I want to show you that you can solve with nDPI is IP matching. In this case, we need to find an IP address in a tree, which is typical whenever you need to match IP addresses against several network prefixes. That again is typical if you have a list of blacklisted hosts, a list of spammers,
06:42
a list of network ranges that are not nice to contact, just to give you some examples. A radix tree is the basis of this algorithm. We start from a trie: in essence, a tree where each node holds a single letter. So in this case, cats, c-a-t-s, cat, c-a-t, and so on.
07:04
Whenever a node is a terminal node, so it's a match, it is marked with this yellow color, whereas if the node is intermediate, so it's used to build the tree, it's blue. This is the trie. Now, the trie is important because we need to match a certain prefix.
07:27
And matching a prefix is very important because a network, in essence, is a set of IP addresses that start with a certain prefix. So this is why we are interested in that. And the performance is good.
07:42
It's O(w), where w is the length of the string to be inserted. But here, we're talking about IP addresses. So how can we turn a trie into something meaningful for us? Simple. We start optimizing. So we start collapsing the nodes that can be merged together.
08:03
So in this case, c-a can be collapsed into a single node. So we move from a trie to a radix tree, where radix means something in common. And as you can see, this is a data structure that is naturally ordered. So therefore, if you navigate it, then you will have results in a specific order.
08:21
Now, in 1968, Morrison created a special version of the radix tree called PATRICIA. In a Patricia tree, basically, the nodes hold numbers instead of letters. And it's pretty efficient for subnet matching in both IPv4 and IPv6. You can also do partial searches, so you can match a whole network range.
08:43
So a /24 or a /32 in the case of IPv4. You can support, as I said, both IPv4 and IPv6. In nDPI, what you have to do is the following. First of all, you have to create the Patricia tree. You have to specify the number of bits you are going to use: 32 for IPv4 or 128 for IPv6.
09:01
And then you start adding nodes, in this case with ndpi_patricia_lookup. And then you can call ndpi_patricia_search_best, because we always try to find the best match, which is usually what you want to do with networks. Along with the fact that you have or don't have a match, you can bind some metadata.
09:23
For instance, you can bind some information about the network itself: whether it's a good network, a blacklisted network, a network of spammers. So anything you have in mind, you can add to it. And in terms of performance, on the same machine I showed you before, you can have a Patricia tree built in less than a second.
09:47
It occupies about 17 megabytes with 76,000 prefixes, which is quite a lot. And as you can see, the search again is under one microsecond, so it's pretty fast. Again, the important thing here is the speed, because we want to use nDPI live.
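The create, add and search-best steps can be sketched in Python as a longest-prefix match over address bits. This is a simplified, uncompressed binary trie rather than a real Patricia tree (which collapses single-child chains), and the names PrefixTree, add and search_best are hypothetical, chosen only for this illustration.

```python
import ipaddress


class PrefixTree:
    """Binary trie for longest-prefix match (uncompressed, unlike a real Patricia tree)."""

    def __init__(self, bits):
        self.bits = bits    # 32 for IPv4, 128 for IPv6
        self.root = {}      # node: {0: child, 1: child, "data": metadata}

    def add(self, cidr, data):
        """Insert a network such as '10.0.0.0/8' and bind metadata to it."""
        net = ipaddress.ip_network(cidr)
        if net.max_prefixlen != self.bits:
            raise ValueError("address family does not match the tree")
        value = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (value >> (self.bits - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["data"] = data

    def search_best(self, address):
        """Return the metadata of the most specific matching network, or None."""
        addr = ipaddress.ip_address(address)
        if addr.max_prefixlen != self.bits:
            raise ValueError("address family does not match the tree")
        value = int(addr)
        node = self.root
        best = node.get("data")
        for i in range(self.bits):
            bit = (value >> (self.bits - 1 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            if "data" in node:
                best = node["data"]   # a longer prefix wins over a shorter one
        return best


tree = PrefixTree(32)
tree.add("10.0.0.0/8", "internal")
tree.add("10.1.2.0/24", "blacklisted")
print(tree.search_best("10.1.2.3"), tree.search_best("10.9.9.9"))
```

Walking one bit at a time keeps the code short; the point of the Patricia variant described in the talk is to skip over runs of bits where there is no branching, which saves both memory and lookups.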
10:03
Another typical problem we have to address is probabilistic counting. Whenever we need to know how many of something there are, we need to allocate a data structure, which is usually a hash table. So for instance, if you want to say, how many hosts does my host contact?
10:22
So you have to keep a list or a hash table of these values. Or, for instance, if you want to know how many different countries a certain host has contacted. That is a typical question that allows you to answer: am I doing something local or something global? So for instance, with Skype, you're going to contact half of the world, whereas with other typical protocols such as HTTP or TLS
10:45
you're going to stay usually in a certain geographical area. So in order to answer these questions, the simplest thing to do is to use a generic data structure. But unfortunately, those data structures take a lot of memory, in particular if you have a lot of data. So if you are unlucky, if you have a host scanner or a network scanner, you will end up using a lot of memory.
11:06
That's why we use probabilistic data structures. It means that these data structures are not perfect. They introduce a little bit of error. But in return, you will use much less memory and you will have a lot of speed.
11:20
The one I'm going to present is called HyperLogLog, which was created by Flajolet some years ago. It's a probabilistic data structure, so it gives you an estimate of the cardinality of a set. Again, for a host, for instance, you can count the IP addresses of the hosts it has contacted, the number of countries, things like that.
11:43
And the memory that you're going to use here depends on the error that you're going to accept. I'll show you an example. Suppose I want to create two data structures, one for counting the number of different hosts that have been contacted, and one for counting the number of different countries that have been contacted.
12:01
When I initialize the data structure, I have to specify an i value. And this table shows you, for a certain i, the memory that is used and the error that you should expect when you ask for the cardinality. In my case, I use 8. So it means that with 256 bytes, I'm able to have an error of about 6.5%.
12:25
That is pretty good, because if I want to spot scanners, I don't really need to be super precise. I just need to know roughly the number of hosts or domains or whatever that have been contacted. That for me is pretty good. So with 256 bytes, I can do exactly that, and calling ndpi_hll_count, I have the result.
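The trade-off between memory and error can be illustrated with a small HyperLogLog in Python. This is a textbook-style sketch, not nDPI's implementation: the class name, the use of SHA-1 as the hash, and the 7 to 16 range for i are assumptions made for the example. With i = 8 it keeps 2^8 = 256 one-byte registers, and the usual standard error of about 1.04 / sqrt(256), roughly 6.5%, matches the figure quoted above.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**i one-byte registers (illustrative only)."""

    def __init__(self, i):
        if not 7 <= i <= 16:
            raise ValueError("this sketch supports 7 <= i <= 16")
        self.i = i
        self.m = 1 << i                     # number of registers
        self.registers = bytearray(self.m)  # 256 bytes when i == 8

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        idx = h >> (64 - self.i)                     # first i bits pick a register
        rest = h & ((1 << (64 - self.i)) - 1)        # remaining 64 - i bits
        # Position of the leftmost 1 bit in the remaining bits (1-based).
        rank = (64 - self.i) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        """Estimate how many distinct items have been added."""
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


contacted_hosts = HyperLogLog(8)
for n in range(10000):
    contacted_hosts.add(f"10.0.{n // 256}.{n % 256}")
print(round(contacted_hosts.count()))
```

Adding the same address twice never changes a register, so duplicates cost nothing, and the memory stays at 256 bytes whether the host contacted ten peers or ten million.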
12:47
Another typical problem is anomaly detection. That is basically whenever we want to understand if there is something that deviates from our expectation. So in this picture, you can see some people are not marked with the same red color as the others.
13:01
This is our goal. And we want to do it for two main reasons. First of all, because we want to clean data: if we find outliers, data that is a little bit unusual, it might be an error in measurement. Otherwise, because we have found a problem. So the reasons are manifold.
13:20
Now I'm going to explain to you how we can do that with nDPI. Usually we do this with time series. A time series is an ordered set of data points. I think you're pretty familiar with those if you use Grafana, if you play with networking data, InfluxDB, these types of things. And once you have a time series, in essence, you have the data.
13:42
But here we have to introduce two new words. One is observation: the value that we really read from the network. The other one is forecast: the value that we expect to have at the next iteration. So for instance, now if I want to predict the next value of the traffic of my network in one minute, this is going to be the forecast.
14:06
And when the minute has passed, I'm able to do the observation, to read the real value. The discrepancy between forecast and observation, squared and summed over the series, is called SSE, the sum of squared errors, and it is used to understand how far the prediction is from reality.
14:27
And how do we use it? Look at this picture. The series is the blue one. The green one is the prediction. So it's a way for us to mimic the real series, in the future, because we predict the value.
14:42
And as soon as we have the value, we compare the real value with the prediction. In our algorithm, we have the ability to create two bands, one low and one high. And we say if the series falls between the low and the high band, then we are good. If it falls outside, then we have an anomaly.
15:01
Very simple. I don't want to make it too complicated because there is a lot of mathematics behind it. But in essence, there are three algorithms for doing that. The first algorithm, called single exponential smoothing, takes into account only the value, so it gives a weight to the value we read. In double exponential smoothing, we also give a weight to the trend.
15:21
So for instance, you can give an extra bonus to the fact that the value is increasing or decreasing. In the third case, if you have a signal that repeats over time, which is called a season, then to predict the future we can add a correction factor based on the seasonality.
15:40
For instance, if you have traffic of a host that during the night is low and during the day is high, you can speculate that the future traffic will follow this same pattern. So these are the three algorithms. The last one is called Holt-Winters. So in essence, we have three values, called smoothing factors, that are used to estimate the value:
16:02
alpha, beta and gamma. Now in nDPI, we have implemented all three algorithms and you can decide, based on the nature of your data, if you want to use the first two or the third one. The first two don't take into account seasonality. So if you have a seasonality, you're basically obliged to use the last one. Otherwise, you can use the first two.
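For reference, the three smoothing schemes and the SSE can be written down in their common textbook form. This is a sketch in standard notation, not taken from the slides: x_t is the observation, F_t the forecast for time t, s_t the level, b_t the trend, c_t the seasonal correction and L the season length (the additive Holt-Winters variant is assumed).

```latex
\begin{align*}
% Sum of squared errors between observations and forecasts
\mathrm{SSE} &= \sum_{t} \left(x_t - F_t\right)^2 \\[6pt]
% Single exponential smoothing: level only
s_t &= \alpha x_t + (1-\alpha)\, s_{t-1}, & F_{t+1} &= s_t \\[6pt]
% Double exponential smoothing: level and trend
s_t &= \alpha x_t + (1-\alpha)\,(s_{t-1} + b_{t-1}) \\
b_t &= \beta\,(s_t - s_{t-1}) + (1-\beta)\, b_{t-1}, & F_{t+1} &= s_t + b_t \\[6pt]
% Holt-Winters (additive): level, trend and season of length L
s_t &= \alpha\,(x_t - c_{t-L}) + (1-\alpha)\,(s_{t-1} + b_{t-1}) \\
b_t &= \beta\,(s_t - s_{t-1}) + (1-\beta)\, b_{t-1} \\
c_t &= \gamma\,(x_t - s_t) + (1-\gamma)\, c_{t-L}, & F_{t+1} &= s_t + b_t + c_{t+1-L}
\end{align*}
```

Each factor lies between 0 and 1: values close to 1 make the model follow the latest observation, values close to 0 make it lean on its own history.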
16:21
As you can see, we need the values of alpha and beta in the algorithm. So in order to do that, either you use average values or something that you believe makes sense, or otherwise you do something called fitting. So in essence, we provide functions that allow you to estimate those values based on the past.
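One simple way to picture fitting is a grid search: try a set of alpha and beta candidates on past data and keep the pair with the lowest SSE. The Python sketch below does exactly that for double exponential smoothing. The function names des_sse and fit, the grid step, and the way the level and trend are initialized are assumptions for illustration; the talk does not say how nDPI's own fitting function searches.

```python
def des_sse(values, alpha, beta):
    """Sum of squared one-step-ahead errors of double exponential smoothing."""
    if len(values) < 2:
        raise ValueError("need at least two past values")
    # Assumed initialization: first value as level, first difference as trend.
    level, trend = values[0], values[1] - values[0]
    sse = 0.0
    for x in values[1:]:
        forecast = level + trend
        sse += (x - forecast) ** 2
        new_level = alpha * x + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return sse


def fit(values, step=0.1):
    """Grid search for the (alpha, beta) pair with the lowest SSE on past data."""
    grid = [round(k * step, 10) for k in range(1, int(round(1 / step)))]
    return min(((a, b) for a in grid for b in grid),
               key=lambda ab: des_sse(values, *ab))


past = [12, 15, 14, 18, 21, 19, 24, 26, 25, 30]
alpha, beta = fit(past)
print(alpha, beta, des_sse(past, alpha, beta))
```

A finer step gives better factors at the cost of more evaluations; the fitted pair is only useful as long as the future signal resembles the data it was fitted on.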
16:42
I will show you how it works. In essence, you allocate, in this case, a double exponential smoothing data structure, and inside the data structure we continuously add the values that we read from the field. Every time we add a value, we receive back from ndpi_des_add_value a prediction and a confidence band.
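The add-a-value, get-a-prediction-and-band loop can be sketched in Python as follows. The class name, the warm-up length and especially the band formula (z times the root mean squared error seen so far) are assumptions made for this example; nDPI's own band computation may differ.

```python
import math


class DoubleExpSmoothing:
    """Streaming double exponential smoothing with a simple confidence band."""

    def __init__(self, alpha, beta, z=2.0, warmup=3):
        self.alpha, self.beta = alpha, beta
        self.z = z              # band width in units of RMS error (assumption)
        self.warmup = warmup    # predictions observed before anomalies are flagged
        self.level = None
        self.trend = 0.0
        self.sse = 0.0          # running sum of squared errors
        self.n = 0              # number of errors accumulated in sse

    def add_value(self, value):
        """Return (forecast, lower, upper, is_anomaly), or None for the first value."""
        if self.level is None:  # first sample: nothing to predict yet
            self.level = float(value)
            return None
        forecast = self.level + self.trend
        band = self.z * math.sqrt(self.sse / self.n) if self.n else 0.0
        learning = self.n < self.warmup
        anomaly = (not learning) and abs(value - forecast) > band
        # Update the model with the new observation.
        new_level = self.alpha * value + (1 - self.alpha) * forecast
        self.trend = self.beta * (new_level - self.level) + (1 - self.beta) * self.trend
        self.level = new_level
        self.sse += (value - forecast) ** 2
        self.n += 1
        return forecast, forecast - band, forecast + band, anomaly


des = DoubleExpSmoothing(alpha=0.5, beta=0.5)
for reading in [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 100]:
    print(reading, des.add_value(reading))
```

In this sketch an anomalous reading still updates the model, so a single spike widens the band for a while; whether to exclude flagged values from the update is a design choice left open here.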
17:01
So it means that we give back to you, as a user, the value and the upper and lower boundaries that we expect to see. If your value is within the boundaries, then we are good. If it falls outside, we have an anomaly. If you have some measurements from the past, you can feed this algorithm with whatever values you want.
17:23
It doesn't really matter. And then at the end, you can call ndpi_des_fitting and we return the best alpha and beta values for the past. So it means that if your signal stays similar to what we have seen before, these are the best two values that you can use for predicting the future. In essence, with nDPI, you can have something like this.
17:42
At the beginning, you see learning, because the algorithm is still trying to learn how this signal behaves. And then at some point, we start operating. OK, OK, OK. At some point, you have an anomaly because the value that we read, 173, falls outside the confidence band. Actually, it is lower.
18:01
In this case, the second reading is too high. The last thing I want to talk about today is binning. Data binning is a technique that allows you to split values into something called bins, so in essence a vector of numbers. A typical example, for instance, is packet length.
18:21
We usually split packet lengths into bins of up to 64 bytes, from 64 to 128, 128 to 256, and so on. So this way, we don't have to keep all the individual values, but we can keep ranges. This is the goal of the bin. And we can use it for many things. For instance, if we want to compare two host time series, in essence, I can consider each point of the time series as the value of a bin.
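The packet length example can be sketched in a few lines of Python. The first three bin boundaries (64, 128, 256) come from the talk; the remaining ones and the function names are assumptions for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each bin; one extra open-ended bin catches the rest.
# 64, 128 and 256 come from the talk, 512 and 1024 are assumed.
EDGES = [64, 128, 256, 512, 1024]


def bin_index(length):
    """Index of the bin that a packet of the given length falls into."""
    return bisect_left(EDGES, length)


def bin_packet_lengths(lengths):
    """Count packets per bin instead of keeping every individual length."""
    bins = [0] * (len(EDGES) + 1)
    for length in lengths:
        bins[bin_index(length)] += 1
    return bins


print(bin_packet_lengths([60, 64, 65, 128, 200, 1500]))
```

However many packets are seen, the state stays at six counters, which is what makes bins cheap to keep per host or per flow.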
18:46
OK, if I want to see if two connections start with the same packet lengths, I can use this as a bin, or the length of the packets, as I showed you before. I want to show you an example.
19:00
This is code that is present inside nDPI as an example. Suppose that we have two or more time series. Suppose that we have the time series of many hosts of our network. I want to see when two hosts are similar. So I would like to know when two hosts behave in a similar way from the network standpoint.
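One possible way to compare hosts through their bins is to normalize each bin vector and measure the distance between them. The Python sketch below uses percentages and Euclidean distance, where 0 means identical shape; both choices, and all the function names, are assumptions for illustration and are not claimed to be what nDPI's similarity function computes.

```python
import math


def normalize(bins):
    """Scale a bin vector to percentages so hosts with different volumes compare fairly."""
    total = sum(bins)
    return [100.0 * b / total for b in bins] if total else [0.0] * len(bins)


def bin_distance(a, b):
    """Euclidean distance between two normalized bin vectors (0 means same shape)."""
    if len(a) != len(b):
        raise ValueError("bins must have the same number of slots")
    na, nb = normalize(a), normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


def most_similar(hosts):
    """hosts maps a name to its bins; return (distance, name1, name2) of the closest pair."""
    names = sorted(hosts)
    pairs = [(bin_distance(hosts[p], hosts[q]), p, q)
             for i, p in enumerate(names) for q in names[i + 1:]]
    return min(pairs)


hosts = {
    "host-a": [10, 0, 0],
    "host-b": [9, 1, 0],
    "host-c": [0, 0, 10],
}
print(most_similar(hosts))
```

Comparing every pair is quadratic in the number of hosts, so for large networks a real tool would need something smarter than this brute-force loop, such as clustering.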
19:22
So for this, there is this example. RRD is an archive format for time series. You take some RRD files of hosts generated by ntopng or by other tools that poll via SNMP, it doesn't really matter. And then you give them to this tool.
19:41
In essence, this tool tries to compare those bins and to find those that are similar. Here you see ndpi_bin_similarity. And this allows you to find hosts that behave more or less the same way and others that are very different. For instance, we have applied this technique inside ntopng to find from SNMP those network ports that produce similar values,
20:04
so that, for instance, you can see which ones behave the same way in case of an attack; or, for ports that are supposed to behave in a certain fashion, you can find out if they are similar or not, just to give you an idea. We have many, many ways of using that. And don't forget that this algorithm is super fast, because with over 10,000 hosts,
20:21
we are able to read the values, do the comparison and all this in less than a second. So nDPI, as I said, is designed for speed. We have many more features, but I don't have the time to describe all of them. For instance, we have streaming data analysis. We have clustering, also called unsupervised machine learning.
20:41
We have other functions for high-speed JSON serialization, jitter, entropy, and so on. But I think you need to read the source code because we are running out of time. The last thing is the following. We have recently received an award from Google for our work in this field, and we would like to use this money to invest in the community and the development.
21:00
So if there are people interested in working in this field and being paid for developing open-source software, contact us. Thank you very much for being here today, and I encourage you to download nDPI and to play with it. If you want, you can contact me at any time. Thank you very much.
21:35
Showtime. Yeah.
21:41
Yeah. Thank you, Luca, for your talk. It was a very interesting topic. So I got a few questions. So how about contributing to nDPI and ntopng and so on? Yes, we would like to encourage people to contribute to it because we see that there is a need for help in various areas,
22:08
in particular cybersecurity, for dissecting protocols not just to understand what the application protocol behind them is, but to understand the behavior of hosts. And so we have recently been awarded this prize by Google.
22:21
And the idea is to use this money to pay people, to pay students, to pay contributors. I mean, it doesn't have to be a job. It has to be a contribution to open source. So if you see a value in what we are doing and you are open to helping us with new algorithms,
22:42
new protocols, new implementations or new ideas, please feel free to contact us. We would like to be in touch with you, to understand what your ideas are and, of course, to involve you in this project. Yeah, sure. The next question is, what other open source projects is nDPI being used by?
23:07
Well, I know that, for instance, it is embedded in OpenWrt as a package, and there are many people that are using it inside small devices for blocking traffic using iptables.
23:21
This is a typical example. Or I see that there are people that are using it to classify traffic and therefore to generate, let's say, data for machine learning algorithms. So it seems it is mostly used to generate input for other applications.
23:40
And what we said today is that we would like to also promote the fact that the library is also offering an API for traffic analysis, so that if you need to create applications, you just take DPDK, PF_RING or anything for packet capture, and then put nDPI on top for traffic analysis and focus on what you have to do.
24:03
So this is another opportunity for people. So the open source work, I think, should not be limited to application recognition, but should also cover the traffic analysis part. So you guys are also asking what languages are supported by nDPI, or is it more likely to be limited by capture methods?
24:28
Well, nDPI is a library that allows you to dissect the traffic and to generate the metadata. It's written in C. So in our idea, I mean, it can help you to simplify the design
24:46
of your application, because in essence, you delegate to an existing component the analysis. So I don't know if this is helping with answering the question.
25:07
Is ntopng based on nDPI, or what is the relationship between these two open source projects? Yeah, let's say that inside ntopng we use nDPI as a layer for everything.
25:21
Because like I said, we have delegated to it many of the features that you have to implement. So ntopng is based on nDPI, but there are also other tools. PMAS just made a good suggestion on the chat saying that DANOS, which is also based on DPDK, is using it.
25:44
So let's say it implements something that every monitoring application, or let's say every traffic processing application, needs when it doesn't have just to route traffic or to be simply limited at the layer. Another example I forgot is Open vSwitch, because there is also somebody who integrated nDPI into Open vSwitch.
26:08
So that it is possible, for instance, to control the traffic between peers that are talking to each other and to limit them to selected protocols. So for instance, you can say, allow SSH to pass or block Netflix, these types of communications.
26:24
You already mentioned that nDPI is written in C. What language bindings are available for nDPI? There is also a binding for Python, so you can use Python.
26:40
The Python binding is using nDPI and is also extending it in the context of machine learning for understanding the traffic. So Python is definitely one of them. And there is also somebody who ported it to Go, so you can also use it from Go. And being a pure C library, I don't think it's difficult to bring it to other languages such as Rust, for instance.
27:07
We didn't do it, but I believe it should be pretty simple. Also, because we try to be self-contained, namely that, you know, for instance, packet capture is not part of nDPI, simply because
27:20
we want you to use this library on top of, for instance, DPDK or netmap or anything. It doesn't have to be bound to, for instance, libpcap. So this means that the library is pretty portable and doesn't bring any dependency with it besides the basic libc.
27:41
So it should be pretty easy to move it to other languages. Yeah, okay. Just a last question from my side. nDPI, we shouldn't look at it just as a protocol dissector, so we can do much more, I guess.
28:05
Yes, yes, that is the idea. That was the idea of the talk today. We can use it also for processing traffic. So, namely, even if you don't need the deep packet inspection part at all, you can use nDPI, for instance, for determining who is the top talker or what are the hosts that are contacting many other hosts.
28:22
You know, typical questions that you have in cybersecurity in general when you have to analyze network traffic. Yes. And the last one: the challenges that come from encryption, since there is more encryption now than ever, I guess.
28:41
Yes, encryption is creating trouble for DPI because we cannot inspect the payload anymore, but it's also offering opportunities, simply because when you talk with, you know, a plaintext protocol, there is no real fingerprint. I mean, the fingerprint is probably inside the process of HTTP when you have to.