
Turris Sentinel: Choosing the right database


Formal Metadata

Title
Turris Sentinel: Choosing the right database
Number of Parts
47
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Turris Sentinel is a network-security-oriented project maintaining a network of honeypot-like probes running on Turris routers. In the last few years we developed an open-source framework for data processing and explored several message-queuing and storage technologies which power our data-processing pipelines. In this lecture I would like to talk about our experiences with the Redis, InfluxDB and PostgreSQL databases: their pros and cons and their position in the ecosystem.
Transcript: English(auto-generated)
Welcome everybody to our next talk, in HS4 at FrOSCon Cloud Edition 2021. This talk, or this lecture, is in English, since Martin is an English-only speaker, at least for international conferences; but since FrOSCon is international, that's not a problem. Martin will talk about Turris Sentinel and his experiences with databases for their network of honeypot-like probes, and about the pros and cons of the different database types they have tested and used. If you have any questions, please join us in the BBB conference room, because there is a chat where you can place your questions, and we will have a Q&A session after the talk in which Martin will try to answer every question you have. If you are watching the stream and have a question, just hop over to the conference room and type your question into the chat. Now, without further ado: Martin, about Turris Sentinel.
Thank you for the introduction. Hello again, ladies and gentlemen, thank you for watching. I would like to first introduce myself. My name is Martin Prudek and I work for the Czech domain registry, a company called CZ.NIC, as the guarantor of a security team called Turris Sentinel, which provides security enhancements for Turris routers. As you may know, CZ.NIC is not only a domain registry; it also supports or directly develops a wide range of other security-, DNS-, education- or network-related projects, like FRED, a domain registry software which currently covers the Czech national domain and also the domains of several other countries. Another big project is Knot DNS, an authoritative DNS server which currently serves the Czech domain's DNS stack. Knot Resolver is a lightweight and highly scalable caching recursive DNS server which is used, among others, by Cloudflare. BIRD is an internet routing daemon used by the majority of European internet exchange points, for example London's LINX or Frankfurt-based DE-CIX. And in the context of this presentation, the most important project is Turris. The Turris project emerged some eight years ago as a scientific project whose aim was to monitor the internet connection of an average Czech household and to detect potential threats coming from inside or outside its network. To achieve this, router-like probes were developed and distributed among people in exchange for a symbolic one Czech crown. The routers were built on powerful hardware, they were able to handle up to a gigabit connection, and of course they were open source. Their operating system, called Turris OS, was based on top of the widely known open-source OpenWrt, and besides this it was equipped with a bunch of open-source monitoring software which was able to collect data about attempts to attack the router or to scan the router's open ports. This system, which I can now call old, was called ucollect, and its main outputs were a dynamic firewall, which was pushed back to the routers so that the routers were secured by the very data they provided to our headquarters, and the other very important output was the greylist, which was free for everybody to use.
The routers were a great success and there was increased public demand for them, even in the commercial sphere, so the company decided to run an Indiegogo crowdfunding campaign for the next generation of the routers. The campaign was successful, and in 2016 the second generation of the routers was developed and became commercially available in the Czech Republic and around the world. The second generation was named Turris Omnia, and it wasn't the last one, because currently the newest generation is called Turris MOX. It's pictured here, and we think it's actually the first modular router in the world. These new routers also started to be equipped with a new version of our data collection system. We named this system Turris Sentinel, and data collection was just optional this time, so you were not forced by any contract to collect the data, and independently of your decision about data collection you were, and still are, able to use its outputs like the dynamic firewall and the greylist. Now, why am I bothering you with the history? The reason is that this lecture is going to be about databases, about the databases which we use, and the old data collection system I mentioned was built around a single central database which was responsible for data storage, its updates and the final visualization. This system was quite well designed for scientific purposes and scaled to work okay with a few thousand connected routers. But with the commercial success of the later generations of routers, the database seemed to be in big danger of underperforming under the big load of all those inserts, updates, selects and everything. So before this could happen, and before the database got really swamped, we decided to start the development of a new data collection system. I already said its name: it is Turris Sentinel, and we started the development with three important rules in mind.
The first important rule, or concept, was to be prepared for any number of connected devices, because this time we can never be sure how many new routers will be deployed and thus how many connected clients there will be, not even mentioning that we are thinking about the possibility of deploying Sentinel even outside Turris routers, which is now under development. The second most important rule was that we try to process the data in streams, so there is no need to update data already stored in our databases; to achieve this we created a, I think, quite complex pipeline infrastructure on our servers. If you are interested in our pipeline infrastructure in more detail, there are several other talks in which I or my colleagues described the exact components of the pipelines, which are mostly also open source. The third rule we had in mind was that there should be no central database this time; instead we try to use three different types of databases, and to use each of them in the way in which it was designed to be used best, so to say.
So this is probably the real beginning of this lecture. First I will give a brief overview of the databases we currently use. I will say only a very few words about Postgres, as it is a very well known database, so there is probably no need to describe it and lose time with it. Then I will say a few more words about Redis, an in-memory database, and then I will spend a big portion, the majority, of this lecture on InfluxDB. By the end of this lecture I would like to show you the current architecture overview and the position of all the database instances in it.
Let's start with Postgres. As all of you probably know, Postgres, or PostgreSQL, is a traditional relational database whose development started some 30 years ago, so it now has quite a long tradition. It supports a wide range of data types, from primitive to very complex ones, and it also has very powerful and advanced concurrency, proven reliability and disaster recovery. We used Postgres as the central main database in the old data collection system, so we were also facing some drawbacks of this database. For us the most important one was the lack of some easy or implicit data retention, because with everything we did to the database, with all the updates, we were facing the problem of ever growing data, and this was the problem we had to resolve. There are other drawbacks: there is, as far as I know, no support for columnar tables, and also the lack of some implicit compression of the stored data is a big pain. The second database we use is Redis, an in-memory database. It is usually referred to as a key-value storage, but its authors prefer to talk about it as an in-memory data structure store, which emphasizes the fact that Redis is able to store even more complicated data types: among others strings, lists of strings, sets of strings, even sorted sets of strings, and also object-like data types which are called hashes in the context of Redis. Redis can be used almost like a traditional database, you can use it as a cache, it can be used as a message queue, in all those ways. Redis has built-in replication, support for Lua scripting, and transactions. When you are deploying Redis you can choose between different levels of on-disk persistence, and a very important thing for the use cases we are working on is its support for data retention, because whenever you store a key in Redis you can explicitly set its expiration, and you can even be notified by a so-called Redis keyspace notification when the key expires.
Now to our use cases of Redis. I already said that we use it in many different situations. We mainly use it as a storage for the list of the IP addresses placed on our greylist. We really take advantage of the data retention here: when we place some address on the greylist and there are no recorded incidents within a defined period of time, the address simply expires and is dropped from the greylist, whereas when there are incidents recorded while the address is on the greylist, the address expiration is simply prolonged and it stays there for some longer period. Before an address is even placed on the greylist, it must surpass a defined threshold of a score, we call it a score of evilness, and we use Redis also to keep these scores for each IP address. Then the usage as a message queue: we use Redis as a message queue in our certificator component. The certificator component takes care of issuing the certificates that ensure that the communication between our probes and the server is secure. And the last use case, cache: we simply use Redis as a cache for our web interface, where the most used queries are cached.
And now an example. We use Python in our pipelines, so this is a Python example. I mentioned that Redis supports transactions, and we want all the operations listed below to run in a single transaction, so first we acquire a pipe, and all the commands placed into this pipe are then finally executed in one transaction. This example tries to show you how we increase the score of an IP address before it surpasses the threshold and is placed on the greylist. In these few commands we just want to increase the score. First, when there is no key for the score in Redis yet, which is what the nx=true means, we set the score to zero and set its expiration. Then, in both cases, whether the score was newly created or there was an older one, we increase it by the defined amount, and then we set the expiration of the key once again, because there is no guarantee in Redis that the key does not expire within this very transaction. So in the situation where the score key existed before, we did not create it on this line, and then by a big accident the key expires between these lines, it would be implicitly created by the increment but without an expiration set; this is the reason why the expiration must be explicitly prolonged. That is it for our Redis example.
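The slide itself is not reproduced in this transcript, so here is a minimal sketch of the pattern just described, using the redis-py client; the key name, TTL and score increment are made-up values, not the real Sentinel configuration.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

GREYLIST_TTL = 3600      # hypothetical expiration period in seconds
SCORE_INCREMENT = 10     # hypothetical score added per incident
attacker_ip = "198.51.100.23"
key = f"score:{attacker_ip}"

# All commands queued on the pipe are executed as one transaction.
pipe = r.pipeline(transaction=True)
# Create the score key with value 0 only if it does not exist yet (nx=True),
# and give it an expiration right away.
pipe.set(key, 0, nx=True, ex=GREYLIST_TTL)
# Increase the score whether the key was just created or already existed.
pipe.incrby(key, SCORE_INCREMENT)
# Prolong the expiration explicitly: if the key happened to expire between
# the two commands above, INCRBY would have recreated it without a TTL.
pipe.expire(key, GREYLIST_TTL)
pipe.execute()
```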
I will spend almost all of the rest of the presentation on InfluxDB, as I think it is not so widely known and you may want to know something more about it. InfluxDB is a time-series database. Its development started in 2013, and it caught our attention only a year ago, when a big release named 2.0 was finished. This release brought major changes, which I will cover in the rest of the presentation. For now we can say that InfluxDB is very widely used, mostly in IoT, monitoring, analytics and more.
But first I would like to define what the term time series exactly means. A time series is, we can say, a measurement of a single value over time. It can be regular, in which case we refer to it as a metric; that can be, for example, a temperature or some CPU load parameter. Besides regular, a measurement of a single value over time can also be irregular; then we talk about events, which can be, for example, a state change of some IoT sensor, or a situation when some attacker used a specific password in an attempt to log into a Turris router. A time series consists of measurements, or single points of measurement. Every point inside a time series consists of a few fields. The first is the very name of the measurement, which must be defined. The second defines the name of the measured quantity; it is simply called the field. Then we for sure want to store the value itself, and since we are talking about a time-series database, we for sure also want to store the timestamp. Then we are free to use as many tags as we want. The important thing about tags is that they are indexed, so whenever you want to store some important data besides the value, it should be stored as a tag, because searches for an exact value stored in the value field can take significantly longer. All the values with the same combination of measurement, field and tags form a time series, so we can say that all the values inside a single time series differ only in their timestamp and value, and that all the tags, the field and the measurement are the same inside one time series.
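To make that structure concrete, here is a small sketch of how such a point could be written with the official InfluxDB Python client; the measurement, tag and field names, the bucket and the connection details are illustrative assumptions, not the real Sentinel schema.

```python
from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# Hypothetical connection details.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("incident_count")                                 # measurement name
    .tag("country", "CZ")                                   # tags are indexed
    .field("count", 1)                                      # field name and value
    .time(datetime.now(timezone.utc), WritePrecision.NS)    # timestamp
)
write_api.write(bucket="base", record=point)
```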
The data model of InfluxDB is one of the main things that changed in version 2.0. The current data model consists of organizations and users, which are independent of each other. An organization consists of buckets, which are the exact places where the data are stored. The buckets also take care of the data retention, because we are able to say what the maximum age of the data stored inside a bucket is. Besides buckets there are also tasks, which are usually regular actions that may transfer data between buckets and can modify them in several ways. InfluxDB also keeps track of the runs of the tasks, and their logs are available. Besides buckets and tasks there are also dashboards, which I will probably show you later in this lecture in a small demo. I have already mentioned the users; they are independent of the organizations, so every user can be assigned access to one or more organizations, and every user can also have one or more access tokens, for example for different applications. Another big change in version 2.0 is the new language for data querying and for scripting regular tasks, which is called Flux. It was inspired by JavaScript and it is designed for stream data processing,
because each time series inside the InfluxDB database is processed as a single table in Flux. So when we want to query more time series, the result is something called a stream of tables, and we are then able to merge the tables inside the stream, split them into more tables, create some time windows, and much more. To achieve all of this, Flux is provided with a quite rich standard library which can do all the mentioned things for us. Usually we first want to get some output from the database, so we use the from function, in which we define the bucket we want to use. Then there is a mandatory function which must follow, called range; this function defines which time period we are interested in. Then some filtering can follow, and then comes probably the most important thing, the transformations of the stream. We can, as I already said, merge tables using the group function. There we should remember that every new tag added to the data creates a brand new table inside the stream processing, so when we want to ignore some tags, we have to group the tables together, or split them based on some other tag. If we want to split the tables based on time periods, we should use the window function. And we are surely able to do some data aggregation based on several parameters, using for example the count function, sum, and much more. The output is provided by the function yield, which is implicitly placed at the end of every query, but it can also be placed in the middle of a query, so that we are able to get more outputs from a single query.
Now let's look at a very basic example of the Flux language. In this example we simply query a bucket called my-bucket, we want all the data younger than one year, and we want to see only the data which come from a measurement simply called temperature. We can complicate this query a bit, in the way that we want to know the mean value of the measured temperature; then we simply have to add the group function to group all the possible tables together, so that we have all the measured temperatures inside one table, and then we simply apply the mean function to the table. Maybe it is also important to note that if we did not use the group function, there would be as many single outputs, single mean values, as there were tables. Another slightly more complicated example can be used when we want to see the mean values per month. As before, we start by querying at most one-year-old data, we apply the filter, we group everything into one big table, and then we create a table for every month using the window function. So now we can say that there should be some 12 different tables; we compute a mean value for each of them and simply place all those mean values, all 12 of them, into one table which we are going to see on the output.
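Since the slides are not part of the transcript, here is a rough reconstruction of the three queries just described, written as Flux strings and run through the official Python client; the bucket and measurement names (my-bucket, temperature) are the illustrative ones from the talk, and the connection details are assumptions.

```python
from influxdb_client import InfluxDBClient

# Hypothetical connection details.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
query_api = client.query_api()

# 1. All temperature data younger than one year, one table per series.
basic = '''
from(bucket: "my-bucket")
  |> range(start: -1y)
  |> filter(fn: (r) => r._measurement == "temperature")
'''

# 2. A single mean value: group() first merges all tables into one,
#    otherwise mean() would produce one value per table.
overall_mean = '''
from(bucket: "my-bucket")
  |> range(start: -1y)
  |> filter(fn: (r) => r._measurement == "temperature")
  |> group()
  |> mean()
'''

# 3. A mean value per month: window() splits the merged table into roughly
#    12 monthly tables, mean() aggregates each of them, and the final
#    group() puts the 12 results back into one output table.
monthly_mean = '''
from(bucket: "my-bucket")
  |> range(start: -1y)
  |> filter(fn: (r) => r._measurement == "temperature")
  |> group()
  |> window(every: 1mo)
  |> mean()
  |> group()
'''

for table in query_api.query(monthly_mean):
    for record in table.records:
        print(record.get_value())
```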
It is very nice to see these simple queries, but as I already said, we were using our old data collection system before, so we were also facing the problem of migrating the data from our old central Postgres database to the new InfluxDB. The Flux developers simplified this task for us, because they also prepared a Flux sql package, and using this package it is very easy to connect to our current Postgres database and get all the data from it. This is an example of such a migration script. At the beginning we simply connect to the database and define the SQL query to load the data from the old database. In this very query we simply want to select all the incidents recorded on our Telnet minipots, our minimal honeypots, and we want the incidents to be enhanced with the country where the originating IP address is registered. When we want to process the resulting data in InfluxDB, we first must define what the timestamp will be, and because the timestamp stored in Postgres had only second precision, we have to multiply it by this very big number, because InfluxDB internally stores the data with nanosecond precision. Then we have to define the measurement name, simply incident count, and set the value to one, because there is exactly one row for each incident; some aggregation could be done here, but I will talk about that later. And because the value can only be defined as a string this way, we then have to convert it explicitly to an integer. The rest is just defining the field name and setting the appropriate tags, where here we have only the country tag.
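A hedged sketch of what such a migration script might look like follows; the Postgres connection string, table and column names are invented placeholders rather than the real Sentinel schema, and the Flux is shown as a string that could be pasted into the Data Explorer or submitted through the query API.

```python
# Hypothetical Flux migration script; run it e.g. with query_api.query(migration_flux).
migration_flux = '''
import "sql"

sql.from(
    driverName: "postgres",
    dataSourceName: "postgresql://user:password@localhost/old_sentinel",
    query: "SELECT timestamp, country FROM telnet_incidents"
)
    // Postgres stored second-precision timestamps, InfluxDB wants nanoseconds.
    |> map(fn: (r) => ({
        _time: time(v: r.timestamp * 1000000000),
        _measurement: "incident_count",
        _field: "count",
        // The value is first written as a string and converted to an integer explicitly.
        _value: int(v: "1"),
        country: r.country
    }))
    // Put country into the group key so that to() writes it as a tag.
    |> group(columns: ["_measurement", "_field", "country"])
    |> to(bucket: "base", org: "my-org")
'''
```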
Another problem was the presentation of the data, because all our queries were written in SQL, and we therefore had to rewrite all our SQL queries into Flux. As an example I have prepared this query, where we want to get the number of incidents reported from different countries. In SQL it is quite straightforward, but I think it is straightforward in Flux as well. We simply define the range and the filter as before, then we group the data by the country column, so there is a different table for every country, then simply count the number of incidents per table, that is per country, for ease of use rename the value column to a column called count, define that we want to keep only the columns containing the count and the country, make one big table out of it, and we are done.
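Again a reconstruction rather than the original slide: a sketch of such a per-country count in Flux, reusing the assumed bucket and measurement names from the migration sketch above.

```python
# Hypothetical query; run it with query_api.query(...) as in the earlier sketch.
incidents_per_country = '''
from(bucket: "base")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "incident_count")
  |> group(columns: ["country"])           // one table per country
  |> count()                               // number of incidents per table
  |> rename(columns: {_value: "count"})    // nicer column name
  |> keep(columns: ["country", "count"])   // drop everything else
  |> group()                               // merge back into one big table
'''
```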
If we want something more complicated, this is probably the most complicated SQL query we were using. It first uses this nested query to get the list of the most active countries, usually the 10 most active ones, and then we want to select the number of attackers for all those most active countries, so we want to count the distinct IP addresses. I think that, once again, in Flux this is maybe even more straightforward, because there is no nested query; it is, I think, much more linear. First we simply compute the list of the most active countries using these few Flux commands, which are really similar to the ones before. What is maybe interesting is that we are only interested in the records that include the country tag. Then we obviously count the number of incidents for each country, group them, sort them and limit the result to the top 10. The last line is probably the only piece of magic in this query, because there we transform the resulting table into a set of countries, which is then stored inside the top countries variable. The variable is then used in the following query, so we are only interested in the records whose country tag is contained in the set. We then want to see the trend of the incidents caused by these countries, and we want the numbers for every day, so we use the window function to have a table for every day, then we group it by country and simply count. And that's almost all.
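One possible way to express that two-step query in Flux is sketched below; it follows the description above (turning the top-ten table into a plain set, then filtering on it with contains), but the exact function choices, bucket and measurement names are my assumptions, not the original slide.

```python
# Hypothetical query; run it with query_api.query(...) as in the earlier sketch.
top_countries_trend = '''
// Step 1: the set of the ten most active countries.
top_countries = from(bucket: "base")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "incident_count" and exists r.country)
    |> group(columns: ["country"])
    |> count()
    |> group()
    |> sort(columns: ["_value"], desc: true)
    |> limit(n: 10)
    // Turn the single resulting table into a plain array ("set") of country codes.
    |> findColumn(fn: (key) => true, column: "country")

// Step 2: the per-day trend of incidents for exactly those countries.
from(bucket: "base")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "incident_count")
    |> filter(fn: (r) => contains(value: r.country, set: top_countries))
    |> group(columns: ["country"])
    |> window(every: 1d)                 // one table per country and day
    |> count()
    |> group(columns: ["country"])       // one trend table per country
'''
```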
What I have not yet mentioned is the data retention we are using in our use case, and we use InfluxDB mainly because of its simple data retention. We decided to use four buckets, each bucket with a different accuracy. In the beginning the data are stored into the base bucket, which has a data retention of one week, and every hour the data are aggregated, basically summed up, and the aggregated data are transferred, using an InfluxDB task, to a bucket named hourly. In exactly the same way, every day the hourly bucket is aggregated and daily data are stored, and in exactly the same way the weekly data are prepared. So in the base bucket there should be no data older than one week, in the hourly bucket there should be no data older than three months, and in the daily bucket there should be no data older than one year, whereas in the weekly bucket we plan to store data, I would not say forever, but for a longer period which is not yet defined. I already said that we use three tasks to transfer the data between the buckets, and the aggregation is done by those tasks.
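For illustration, a downsampling task of this kind could look roughly like the sketch below; the bucket names follow the description above, while the task name, schedule and organization are assumptions.

```python
# Hypothetical task definition; this Flux would be registered as a task,
# for example in the Tasks section of the web interface or via the tasks API.
hourly_downsample_task = '''
option task = {name: "base-to-hourly", every: 1h}

from(bucket: "base")
    |> range(start: -task.every)
    |> aggregateWindow(every: 1h, fn: sum)    // basically sum up the last hour
    |> to(bucket: "hourly", org: "my-org")
'''
```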
I also did not yet mention that InfluxDB comes with a quite nice web interface, which I am now going to show, or at least I will try. Yeah, it seems that something is happening, so the interface looks like this. I think it is quite user-friendly, especially for beginners. Also, the InfluxDB developers provide a big library of code examples for many different programming languages, so it is very easy to start using it; for example, we used the Python example almost one-to-one. If you do not want to use any code, there are other options, for example the Telegraf plugins, which are fit to scrape your data from several sources, including Cisco routers, the system, databases and much, much more. For the purposes of this presentation I installed a Telegraf system plugin this morning, so I hope I should have some data there by this time. I can explore it through this user interface: clicking telegraf, maybe system, its load, choose the server name, submit a query, and we can see that there are some peaks of load over time. I also set up a task which is called "downsample system load", and the task is supposed to downsample the system load every five minutes, in the way that it computes its mean value. So I can try to display both queries: this is the exact system load, and I can also add the aggregated values, with the mean value displayed here. There is also a place for dashboards here in the web interface, and when you use Telegraf, the system dashboard is even pre-configured for you. It is not yet able to scrape all the system data, so it is only possible to see the load I showed before, but it is not a complicated task to make it work. So this was the web interface, and now, when you know almost everything about InfluxDB, and when you probably already knew almost everything about Postgres and about Redis, I can show you the complete architecture overview of our pipelines, of our server infrastructure, with the databases highlighted.
To give you the full picture: the left side is supposed to display the router, inside which there are the minipots, which are our minimal honeypots. There are also collectors of our firewall logs, and the HaaS proxy, Honeypot as a Service, which is a full-fledged SSH honeypot, but it is not run directly on the routers; all that traffic is redirected to our headquarters, so that the routers are not under threat. Then there are the other Sentinel components, like the proxy which gathers all the collected data and sends them using MQTT to our servers, the component which acquires the certificate for the communication, and the dynamic firewall subscriber, which receives the updates of the dynamic firewall from our headquarters using ZeroMQ. Now to the server infrastructure.
I have already mentioned the certificator component. The certificator component uses a Redis database as a message queue here, because all the requests acquired by our certification app are placed into the Redis queue using the LPUSH command and are then processed by our certification authority.
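As a minimal sketch of that pattern (hypothetical key name and payload, not the real certificator protocol), the producer and consumer sides could look like this:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)

# Producer side: the certification app pushes each request onto a list.
request = {"router_id": "0000000A", "csr": "-----BEGIN CERTIFICATE REQUEST-----..."}
r.lpush("certificator:requests", json.dumps(request))

# Consumer side: the certification authority blocks until a request arrives
# and pops it from the other end of the list.
_key, raw = r.brpop("certificator:requests")
request = json.loads(raw)
```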
This is for sure one of the only two places in our pipelines where we currently use Postgres, and we use Postgres here as a storage of the public keys of the routers, so that we are able to check that the communication from the routers, which is signed by their private keys, really comes from them and not from somebody else. This is done by this component. The other place where we use Postgres is actually here: here we store the issued certificates, so that when there is already an issued certificate in Postgres, there is no need to issue another one. So we use Postgres only as a certificate storage
and as a storage of the public keys of the routers. Then we can continue to our other pipelines. In the middle, these are the data processing pipelines: the data go through our Mosquitto broker, then there are a few data-enhancing components inside the pipeline itself, and the final component, called dumper, simply stores the data, in the form of time series, into our InfluxDB database. In this situation we tried to find the best trade-off between the complexity of a relational database and the simplicity of a time-series database, and we even did some benchmarks to help us decide when it is better to use InfluxDB and when it is better to use Postgres. We ended up with something like this: Postgres is probably better in performance, whereas InfluxDB is much easier in the context of data retention, and also the storage needs for the data are a bit lower. So that's it.
Other uses of Redis are also here, at the end of our data processing pipeline, where it is connected to our web interface, which is called Sentinel View, and it is there to help InfluxDB with the most used queries, so that Redis is able to cache them for better performance. The last position where we use Redis is in the middle of our dynamic firewall pipeline. In the dynamic firewall pipeline we aggregate the data about the attackers, and we store the score of every attacker in Redis, and in the very same Redis we also store the list of IP addresses currently placed on our greylist. We use Redis keyspace notifications to inform the routers, through the event listener component, about the addresses which were dropped from the greylist, and we use the scoreboard component to notify the routers, the clients, about the addresses which were newly added to the greylist. This is probably all from me, so thank you for your attention, and I think that we have plenty of time for questions and answers. Thank you, Martin, for your insights.
As I said before, if you have any questions for Martin, please join us in the BigBlueButton conference, come to the chat and ask your question there, and I can read it to Martin so he can answer it. In the meantime, I can ask one or two questions of my own. Can you tell us roughly when you switched from the old Postgres-based system to the new one? We haven't switched yet, actually; we are now in the last part of the process of switching from the old data collection system to the new one. In the next few weeks we are probably going to migrate the last few routers which are still using the older version of our operating system, so quite different things are connected together here, it is not very straightforward. And regarding the switch from Postgres to InfluxDB, to be honest, we started our new data collection system, called Turris Sentinel, also with Postgres, and we decided to switch to InfluxDB later. Currently we have deployed our web interface Sentinel View with InfluxDB only in a testing environment, and we are going to deploy it to production in the next few weeks.
And can you tell us roughly how many data points, or how many gigabytes of data, you are collecting or storing? Because you have a retention system, you are destroying a lot of data, which is good from the GDPR perspective, but what is the usual amount of data you are handling when you analyze it? Actually, the amount of data we gather is quite skyrocketing, because with the new data collection system we also deployed new types of monitoring software, new minipots, for example an FTP minipot and an HTTP minipot; there was already a Telnet minipot, and the newest one is an SMTP minipot, and especially the SMTP minipots are very intensively used by the attackers. We deployed the majority of the minipots almost a year ago, and we have several terabytes of data collected by this time. So that's a good amount of data to analyze.
Terabytes, yeah, okay. Apparently no one has a question here, at least there is nothing coming in from the stream or the chat, so I can tell everyone: if you want to get in touch with Martin, just send him an email at martin.prudek at nic.cz. Martin, thank you very much for your lecture, thank you very much for showing us what you are doing with the honeypot network and what you are planning to do in the next few weeks. I wish you good luck with the migration, and for everybody: have a lovely day, stay at FrOSCon, and enjoy some more lectures, like the one following here, which will start at 16:00 and will be about Matrix bots, so the messenger protocol Matrix and how to program bots for yourself and by yourself. So have a lovely afternoon and bye-bye. Thank you so much, thank you for the opportunity to speak here, bye.