VoIP Troubleshooting and Monitoring with SIP3
Formal Metadata
Number of Parts | 561
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/44644 (DOI)
Transcript: English (auto-generated)
00:05
More VoIP and monitoring. We have Oleg here, who is going to tell us about SIP3. It is his first ever public presentation, so please give him some love and encouragement. Thank you, guys. Thank you for coming.
00:25
I'm going to tell you about yet another voice-over-IP monitoring and troubleshooting system, called SIP3. You might have heard about Tapir. Tapir is the previous version of our system,
00:46
but I'm pretty sure that for most of you it will be something new today. Now let me introduce myself. I'm agafox everywhere, on GitHub and in social media. That's all you need to know, but here's a bit more.
01:03
I have been designing and developing various distributed systems, as a developer and as an architect, for more than 10 years, and for the last six I have been working in telecom. Telecom has done lots of great things over the last couple of decades. But it's obvious that we need to update telecom and adapt it to modern technologies and modern stacks to keep it up to date with the market.
01:33
That's what I've been trying to do all along, in SIP3 as well as at my previous jobs.
01:43
So let me start with a scary story. It's called JAWS 3261. There was a company that built a pretty good voice-over-IP provider service. At some point they realized that they were handling something like 20,000 messages per second.
02:05
And the engineers who originally designed and built all these systems were spending most of their time helping with customer support tickets. Because a customer support ticket is a simple thing: "I tried to call from number A to number B and it didn't work."
02:25
What was the problem? And when was this call? Something like three hours ago. But when exactly? I don't know, give or take 15 or 20 minutes. Just imagine: at 20,000 SIP messages per second, 20 minutes is a good amount of data (roughly 24 million messages).
02:43
This company stored tcpdump captures, rotated by time and by size. Then they tried to find and correlate the SIP sessions, and all the information related to a given call, from these dumps.
03:02
It was insane. On top of that, this company didn't have much of a budget, but they asked us to help. We decided to help, and the first thing we did, because we are lazy engineers, of course, was research what was available on the market and, of course, in open source.
03:24
Let me recap: they had a simple requirement. Relieve the poor engineers and make the support team responsible for troubleshooting. Easy, right? Just number A, number B and the time, that's it. We did deep, deep, deep research. By deep research, I mean we looked further than the second page of Google.
03:47
Well, than the first, actually. And here's what we found: Oracle Palladion, Homer and VoIPmonitor. So, three solutions. Unfortunately, we didn't have the budget to deploy any of these.
04:04
We took Homer, which back then was Homer 5, and unfortunately the open source version couldn't handle 20,000 SIP messages per second, because it was based on MySQL, and this was two years ago. So the only thing we could do was create a monster.
04:25
We created Tapir. And this monster did the job: Tapir could deal with any amount of data without any problems. But, OK, it was limited; it was developed for one particular case, right?
04:40
Number A, number B and a date, that's it. So we made the poor engineers happy. They kept going and building new business logic, and the customer was happy as well. Tapir is based on lots of open source frameworks, libraries and projects
05:02
for capturing, processing and storing data. So we couldn't not give back to the communities. That's why we pushed Tapir to GitHub. I still remember the night we gathered the team. We had some beers, did the git push and started waiting for glory and a growing community.
05:29
For the next two years (this was two years ago), we didn't sleep. We didn't eat, because we were fixing issues and handling pull requests. It was crazy.
05:41
And now I'm proud to say that we have the most starred telecom project ever. I think you can clap here. OK, that's how we dreamed it would be. Reality is a bit different.
06:01
After two years, we have 36 stars. I think 35 of them are from our friends and the 36th is from my mom. But the worst part was that we had only one open issue, and it was just a question. So, without marketing, you can't... I mean, in open source today you can't do anything without marketing.
06:22
But we truly wanted to make the project good, and we didn't give up, as you can see. That's why I'm here now. We just took a break. We kept working on our main projects and activities, and meanwhile we were collecting information about what else
06:43
voice-over-IP providers, various ones but mostly big ones, want to see in this type of product. While collecting this information, we arrived at simple requirement number two.
07:00
Relieve the poor engineers and make the support team responsible for troubleshooting; then relieve the poor support team and make computers responsible for monitoring, so that troubleshooting isn't needed. Easy. So what did we do? We built SIP3. SIP3 is the next version of Tapir, and from now on it is the name of the brand and of the product
07:25
we are going to work on and develop. So let me give you some technical details, because so far I've only been explaining the background. This is our architecture diagram. It's pretty standard for any monitoring platform.
07:44
Absolutely standard; you could just change the names. So, Captain is kind of the stomach. It captures data. It's an adapter: it encapsulates different protocols from
08:01
the network, the raw protocols, and sends the information to Salto over our internal protocol. Salto is the heart. It's the engine, an event-driven pipeline. By the way, all our products are written in Java and JVM-based languages.
08:20
For instance, the new version of SIP3 is written in Kotlin, the language from JetBrains, the people behind IntelliJ IDEA. So Salto is a pipeline, and it's responsible for receiving data from different sources. You can see that we can take data from third-party sources like OpenSIPS, FreeSWITCH and Asterisk.
08:43
Salto aggregates and correlates all call-related information, but only partially, not fully. It sends data such as metrics to third-party monitoring, to InfluxDB (because we use InfluxDB) and to MongoDB. Twig is our brain. Salto correlates information only partially because it can't correlate it fully at that stage:
09:08
not all information arrives in real time. That's why Twig does a lot of work aggregating the rest of the information. And Hoof is just a beautiful UI.
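Salto correlates call data only partially, and Twig completes it later. As a rough illustration of what a first correlation pass could look like, here is a minimal Python sketch that groups SIP messages by Call-ID and derives a coarse call state from the INVITE's final response and a later BYE. The `SipMessage` and `CallSession` types, the field names and the state names are hypothetical; SIP3 itself is written in Java/Kotlin, and its real data model is not shown in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SipMessage:
    call_id: str
    timestamp: float                # capture time, seconds since epoch
    cseq_method: str                # method from the CSeq header, e.g. "INVITE"
    method: Optional[str] = None    # set for requests, e.g. "BYE"
    status: Optional[int] = None    # set for responses, e.g. 486


@dataclass
class CallSession:
    call_id: str
    started_at: float
    state: str = "ATTEMPT"
    messages: List[SipMessage] = field(default_factory=list)


def correlate(messages):
    """Group messages by Call-ID and derive a coarse call state (hypothetical model)."""
    sessions = {}
    for msg in sorted(messages, key=lambda m: m.timestamp):
        session = sessions.get(msg.call_id)
        if session is None:
            session = CallSession(msg.call_id, msg.timestamp)
            sessions[msg.call_id] = session
        session.messages.append(msg)

        # Final (>= 200) responses to the INVITE decide whether the call was answered.
        if msg.cseq_method == "INVITE" and msg.status is not None and msg.status >= 200:
            if msg.status < 300:
                session.state = "ANSWERED"
            elif session.state != "ANSWERED":
                session.state = "FAILED"
        # A BYE request closes an answered call.
        elif msg.method == "BYE" and session.state == "ANSWERED":
            session.state = "ENDED"
    return sessions
```

Provisional responses such as 100 Trying are stored with the session but do not change its state. Anything that arrives late, or from another leg of the call, would be merged in a second stage, which is the role the talk assigns to Twig.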
09:20
That's it. I skipped the database layer because I want to spend one more slide on it. For payload we use MongoDB, and we have optimized it a lot. Mongo is distributed and sharded out of the box, so it performs very well. But to handle any amount of data, we use lots of tricks.
09:41
Mostly best practices. First, for SIP we use a couple of levels of partitioning. At partitioning level one, we separate data by SIP method: SIP calls versus SIP registrations. At partitioning level two, we separate out a call index, because you don't actually need to index every message you have.
10:08
For instance, if you have a 100 Trying, why would you index it? It carries the same fields (From, To and so on) as the request it answers. That's why our index is built from the initial requests, plus some additional information from the other methods.
10:26
We also partition by time, because without partitioning by time you can't build a good monitoring system. All these optimizations give us a fast search engine and features like advanced search.
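The three partitioning levels can be pictured as a routing decision made for every message. The Python sketch below is a hypothetical illustration: the collection names (`sip_raw_call_YYYYMMDD`, `sip_index_register_YYYYMMDD`, ...), the daily UTC granularity and the set of "initial" methods are all assumptions, not SIP3's actual MongoDB schema.

```python
from datetime import datetime, timezone

# Hypothetical: which requests start an indexed record.
INITIAL_METHODS = {"INVITE", "REGISTER"}


def method_group(cseq_method):
    """Level 1: registrations and calls go to separate partitions."""
    return "register" if cseq_method == "REGISTER" else "call"


def partition_name(kind, group, ts):
    """Level 3: one partition per UTC day, e.g. 'sip_index_call_20190203'."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"sip_{kind}_{group}_{day}"


def route(msg):
    """Return (raw partition, index partition or None) for a message dict."""
    group = method_group(msg["cseq_method"])
    raw = partition_name("raw", group, msg["ts"])
    # Level 2: only initial requests create index entries; 100 Trying etc. do not.
    if msg.get("method") in INITIAL_METHODS:
        index = partition_name("index", group, msg["ts"])
    else:
        index = None
    return raw, index
```

Every message still lands in a raw partition, so nothing is lost; only the much smaller index has to be searched first, and old days can be dropped by removing whole partitions.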
10:43
It's new because, as I said, Tapir could only search by number: number A, number B and date/time. Here you see advanced search, a new dashboard where you can search by any kind of information you want.
11:01
Here it is. It's a real example from production. The engineer first searched for SIP REGISTER with state "unauthorized". He noticed some anomalies. He put a caller mask on SIP REGISTER and found that somebody was sending a registration every 5 milliseconds,
11:27
just incrementing the number each time. Then he checked and saw that all these requests came from the same source address. So it's fraud detection: a real troubleshooting case that turned out to be fraud.
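The pattern found by hand here (a burst of REGISTERs from one source address, each with a new number) can also be checked automatically. Below is a minimal Python sketch, assuming REGISTER events are available as `(timestamp, source IP, From user)` tuples; the window length and threshold are arbitrary illustrative values, and this is not a SIP3 feature described in the talk.

```python
from collections import defaultdict, deque


def find_register_floods(events, window_s=1.0, max_per_window=50):
    """Flag source IPs sending more than `max_per_window` REGISTERs within `window_s`.

    `events` are (timestamp_s, src_ip, from_user) tuples for REGISTER requests.
    Returns {flagged src_ip: number of distinct From users seen from it}.
    """
    recent = defaultdict(deque)   # src_ip -> timestamps inside the sliding window
    users = defaultdict(set)      # src_ip -> distinct From users
    flagged = set()
    for ts, src, user in sorted(events):
        window = recent[src]
        window.append(ts)
        while ts - window[0] > window_s:   # drop timestamps older than the window
            window.popleft()
        users[src].add(user)
        if len(window) > max_per_window:
            flagged.add(src)
    return {src: len(users[src]) for src in flagged}
```

A high count of distinct From users for one flagged address is the "incrementing number" signature from the example above.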
11:42
I changed the numbers, of course. But to realize that you have issues, that there are anomalies on your network, especially at 20,000 to 100,000 SIP messages per second, you need metrics. You can't live without metrics, and you can't rely only on, for instance, average call duration or...
12:04
OK, you can have average call duration and ASR as metrics, but you need multiple dimensions. You need to be able to see average call duration or other metrics for, let's say, a particular customer, your interconnection partner, whatever.
12:23
Because you can have partial disruptions of your service, and you need to know about them. That's why SIP3 collects and correlates metrics by any dimension. So you can say: OK, I want it by trunk.
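To see why a single global number can hide a partial disruption, here is a small Python sketch that computes ASR (answered calls as a percentage of attempts) and ACD (average talk time of answered calls) per value of any chosen dimension. The call-record fields (`answered`, `duration`, `trunk`, `ua`) are hypothetical.

```python
from collections import defaultdict


def metrics_by(calls, dimension):
    """Compute ASR (%) and ACD (s) per value of `dimension`.

    Each call is a dict with 'answered' (bool), 'duration' (seconds of talk
    time, 0 if unanswered) and one key per dimension, e.g. 'trunk'.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # attempts, answered, talk seconds
    for call in calls:
        t = totals[call[dimension]]
        t[0] += 1
        if call["answered"]:
            t[1] += 1
            t[2] += call["duration"]
    return {
        key: {
            "attempts": attempts,
            "asr": 100.0 * answered / attempts,
            "acd": talk / answered if answered else 0.0,
        }
        for key, (attempts, answered, talk) in totals.items()
    }
```

With six calls, two of them answered, the overall ASR is about 33%, which looks merely mediocre. Grouping the same records by trunk can show that one carrier completes half its calls while another completes none.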
12:40
I want it by user agent. I want it by this or that. You can have it. And here is a real example from our customer, Telestax. They are a CPaaS provider running the Restcomm platform, and they have deployed Restcomm on Amazon in three different regions.
13:05
In the US, Japan and Ireland. And they asked us to provide them with some metrics, some information about their service.
13:21
They already had an environment with its own requirements. For instance, they had an infrastructure-as-code project in which they deploy all these services to the cloud with Ansible. And they already used Datadog as their monitoring system. So what we did was provide them with part of SIP3, integrated with Datadog.
13:46
How did we do it? Under SIP3 you can see this logo. It's a Java library called Micrometer, and it's essentially an adapter for almost any time series database you want to use
14:03
or can think of. For instance, Micrometer can send metrics to Datadog, Prometheus, InfluxDB, Elasticsearch, New Relic, whatever you choose. Because in our world today, integrations are like a new feature.
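The idea behind such an adapter is that code records dimensional metrics (a name plus tags) once, and a backend-specific exporter decides how they leave the process. The Python sketch below illustrates only that idea; it does not use or imitate Micrometer's actual API, and `CounterRegistry`, `increment`, `export` and `influx_line` are hypothetical names. The exporter writes InfluxDB line protocol (`measurement,tag=value field=value timestamp`) with simplified escaping.

```python
from collections import defaultdict


class CounterRegistry:
    """Hypothetical minimal dimensional registry: one counter per (name, tags)."""

    def __init__(self):
        self._counts = defaultdict(float)

    def increment(self, name, amount=1.0, **tags):
        key = (name, tuple(sorted(tags.items())))
        self._counts[key] += amount

    def export(self, exporter, timestamp_ns):
        """Hand every counter to a backend-specific `exporter` function."""
        return [exporter(name, dict(tags), value, timestamp_ns)
                for (name, tags), value in sorted(self._counts.items())]


def _escape_tag(text):
    # Line protocol: commas, equals signs and spaces in tag keys/values are escaped.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def influx_line(name, tags, value, timestamp_ns):
    """Serialize one counter as an InfluxDB line-protocol point (simplified)."""
    measurement = name.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape_tag(k)}={_escape_tag(str(v))}"
                       for k, v in sorted(tags.items()))
    return f"{measurement}{tag_part} value={value} {timestamp_ns}"
```

Swapping `influx_line` for another exporter function changes the backend without touching the code that calls `increment`, which is the flexibility the talk attributes to Micrometer.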
14:21
I mean, you need to provide flexibility in integrations. So now it's demo time. Yes, a dangerous demo. As you know, we started from... OK, I need to do something to make it visible, right?
14:55
Can anybody help me? I think it's... It's a duplicated screen.
15:07
So I want to make this screen.
15:25
OK, nice. So, as you know, we started with a GitHub project, and it had no success. That's why we now want to build our community around a demo project.
15:41
So we have demo.sip3.io. You can log in there and try it yourself, with artificial data for now. But we are going to put all our new features into this project.
16:01
I think trying it is even better than seeing it on GitHub. But the GitHub version is coming too. Here you see simple search, which we think is good for support teams, because it answers most customer tickets.
16:22
Then there is advanced search, where you can search by method or by anything else you want: IP address, list of hosts.
16:45
So again, it's the same thing. Unfortunately, for the demo we prepared only one leg, but we will add more information, and you will see a pretty good picture with correlation of all legs and all methods. Currently it works only with SIP.
17:02
No RTP or RTCP yet. You can also check out the technical dashboard. We put it there just as an example. You have ASR, average call duration, call attempts by state, average call duration by direction and so on. It's just a showcase. We will add more showcases, and we will also add a business dashboard
17:23
that will show various business metrics. We realized that since we have all the information about network exchanges, we can use it for business purposes as well. We can show you business insights, and this is great.
17:42
Now let me try to get back. You can see some code. Here is our roadmap. We are going to release the GitHub version soon. After that we will work on DPDK capturing, because we are now working on RTP and RTCP.
18:04
And when we talk about 20,000 SIP messages per second, that's an insane amount of media. libpcap (originally we used wrappers over libpcap) can't cope with that. That's why we need to go low-level and use DPDK.
18:21
DPDK is a toolkit for capturing packets directly from the network card, bypassing the kernel. After that, we are going to implement machine learning: we have metrics, we have different dimensions, so we can predict anomalies. Actually, even now, with InfluxDB and its various plugins,
18:44
you can do time series metric analysis and so on. We are also going to add more UI improvements. That's it for the next three or four months, I think. And that's all from me.
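Machine-learning anomaly prediction is listed as future work, so there is nothing in the talk to reproduce. As a much simpler statistical baseline for the same goal, here is a rolling z-score check in Python: it flags a point that sits more than a threshold number of standard deviations away from the mean of the preceding window. This is a plainly swapped-in technique, not SIP3's method, and the window and threshold are illustrative defaults.

```python
from statistics import mean, pstdev


def zscore_anomalies(series, window=12, threshold=3.0):
    """Indices whose value is more than `threshold` standard deviations away
    from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Flat history: any change at all is unusual.
            if series[i] != mu:
                anomalies.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

Applied to an ASR series per trunk (for example the output of a per-dimension aggregation), a sudden drop from about 50% to 12% is flagged, while normal jitter of a point or two is not.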
19:02
Thank you. Any questions? I can give away this T-shirt for a question.
19:29
I also have stickers if you want them. It's my first presentation as a speaker, but I love conferences and I'm a sticker addict.
19:46
So, I have a proprietary system and we want to add monitoring. What kind of exports do you support? I mean, what should I export to the system in order to get these stats?
20:03
Can you repeat the question? So, they have a proprietary system and are considering using this one; what kind of export do they need to use your tools? They have a proprietary system:
20:21
do you have a protocol, I don't know, HEP or JSON or similar, that they can use to... OK. First of all, we capture from the network, but if you have a proprietary protocol, we can build an integration, because we have our own internal protocol. In addition, we take advantage of open source,
20:42
and at the moment you can send us HEP v3. In the future maybe that will work for RTP and RTCP as well, because those guys did a great job integrating with other platforms, so we can take advantage of that.
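HEP v3 (the encapsulation protocol from the Homer/SIPCAPTURE project) wraps each captured packet in a binary envelope: the ASCII magic `HEP3`, a 16-bit total length, then a series of chunks, each with a vendor ID, a type ID, a length and a value. The Python sketch below encodes and decodes such an envelope for one SIP-over-UDP/IPv4 packet. The chunk type IDs are taken from our reading of the HEP3 specification and should be verified against it; only the generic (vendor 0) chunks listed are handled, and this is not SIP3's implementation.

```python
import socket
import struct

# Generic (vendor 0x0000) chunk type IDs as we read the HEP3 spec; verify before use.
IP_FAMILY, IP_PROTO = 0x0001, 0x0002
IPV4_SRC, IPV4_DST = 0x0003, 0x0004
SRC_PORT, DST_PORT = 0x0007, 0x0008
TS_SEC, TS_USEC = 0x0009, 0x000A
PROTO_TYPE, CAPTURE_ID, PAYLOAD = 0x000B, 0x000C, 0x000F


def _chunk(type_id, value):
    # Chunk header: vendor ID (0 = generic), type ID, length including this 6-byte header.
    return struct.pack("!HHH", 0x0000, type_id, 6 + len(value)) + value


def encode_hep3(src_ip, src_port, dst_ip, dst_port, ts, payload, capture_id=1):
    """Wrap one captured SIP/UDP/IPv4 packet in a HEP3 envelope."""
    sec, usec = divmod(round(ts * 1_000_000), 1_000_000)
    body = b"".join([
        _chunk(IP_FAMILY, struct.pack("!B", 2)),    # 2 = IPv4
        _chunk(IP_PROTO, struct.pack("!B", 17)),    # 17 = UDP
        _chunk(IPV4_SRC, socket.inet_aton(src_ip)),
        _chunk(IPV4_DST, socket.inet_aton(dst_ip)),
        _chunk(SRC_PORT, struct.pack("!H", src_port)),
        _chunk(DST_PORT, struct.pack("!H", dst_port)),
        _chunk(TS_SEC, struct.pack("!I", sec)),
        _chunk(TS_USEC, struct.pack("!I", usec)),
        _chunk(PROTO_TYPE, struct.pack("!B", 1)),   # 1 = SIP
        _chunk(CAPTURE_ID, struct.pack("!I", capture_id)),
        _chunk(PAYLOAD, payload),
    ])
    # Envelope header: "HEP3" magic plus total length (header included).
    return b"HEP3" + struct.pack("!H", 6 + len(body)) + body


def decode_hep3(packet):
    """Return {chunk type ID: raw chunk value} for a HEP3 packet."""
    if packet[:4] != b"HEP3":
        raise ValueError("not a HEP3 packet")
    (total,) = struct.unpack("!H", packet[4:6])
    chunks, offset = {}, 6
    while offset < total:
        _vendor, type_id, length = struct.unpack("!HHH", packet[offset:offset + 6])
        chunks[type_id] = packet[offset + 6:offset + length]
        offset += length
    return chunks
```

Because every chunk carries its own length, a receiver can skip chunk types it does not understand, which is what makes HEP a convenient integration point between capture agents and collectors.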
21:02
OK, somewhat related: until we get there, is there a Docker image or virtual machine that someone can just take and run? Sure. In a month, as I said, we will release an open source version, and it will all run in Docker. We are also working on the Ansible part, where you can deploy just the components
21:22
you need, because this system consists of different components, and you can deploy them with Ansible scripts or Puppet, Chef, whatever you prefer. Is data ingestion based on pcap files, packet captures? Right now it's live capture or pcap files, yes.
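For readers unfamiliar with what a pcap file actually contains, here is a stdlib-only Python reader for the classic pcap format (not pcapng): a 24-byte global header whose magic number reveals byte order and timestamp resolution, followed by records that each have a 16-byte header and the captured frame bytes. This only illustrates the file layout; the talk says SIP3 uses wrappers over libpcap, and parsing the Ethernet/IP/UDP layers inside each frame is left out.

```python
import struct

# Classic pcap magic numbers as they appear on disk -> (byte order, timestamp units per second).
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def iter_pcap(data):
    """Yield (link_type, timestamp_s, frame_bytes) for each record of a classic pcap file."""
    if data[:4] not in _MAGIC:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    order, units = _MAGIC[data[:4]]
    # Global header: version major/minor, thiszone, sigfigs, snaplen, link type.
    _major, _minor, _zone, _sigfigs, _snaplen, link_type = struct.unpack(
        order + "HHiIII", data[4:24])
    offset = 24
    while offset + 16 <= len(data):
        # Record header: seconds, sub-second part, captured length, original length.
        sec, frac, incl_len, _orig_len = struct.unpack(order + "IIII", data[offset:offset + 16])
        offset += 16
        yield link_type, sec + frac / units, data[offset:offset + incl_len]
        offset += incl_len
```

Link type 1 means Ethernet frames. The captured length can be shorter than the original length when the capture used a small snaplen, which matters for large SIP messages.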
21:42
What about encrypted traffic? You can't decrypt it. OK. That's the use case in enterprises. It's a good question, but at the moment it's only on our roadmap. Maybe other data sources you'd suggest? OK.
22:00
The thing is, we are a small team and we work on... Thank you for your question. We work on features based on our customers' requests. So, yes: when we are big and fast-growing, we will implement SSL, or if a customer asks for it, we will do it for them, right?
22:22
Also, when it comes to integrations, there are plenty of choices. For instance, we were thinking about integrating SS7 protocols, because some of our customers work closely with mobile operators, and we are good at CAMEL and Diameter.
22:42
So if they ask us, we will implement that as well. At the moment we have the roadmap I just showed you, and we are trying to move very fast and take on new requirements from our customers as they come. That's it.
23:00
I have one more question. You said you are using HEP3. Is it on your roadmap to support other kinds of messages besides SIP, like logs, and RTCP? I mean, you said you are going to support RTCP.
23:22
Yes, we are going to support RTP and RTCP. I was more interested in logs, because it would be interesting to attach them. But you have Elasticsearch; why do you need us for that? We are more like experts in real-time communications, in voice over IP. For all these metrics, you need to know how to aggregate them,
23:46
what to extract and where to put it. We want to be the experts there. We don't want to compete with Elasticsearch. So you can still use Elasticsearch for your logs and use this for voice over IP, at the moment.
24:02
I was thinking of attaching context to a call, that's why. Ah, OK. OK, we are a bit over time.