Network troubleshooting in heterogeneous cloud environment with Skydive
Formal Metadata
Number of Parts | 47
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/37939 (DOI)
Transcript: English (auto-generated)
00:06
Hello everyone, my name is Sylvain Baubeau, I work for Red Hat, and I'd like to introduce my colleague, another Sylvain, also from Red Hat: Sylvain Afchain. Today we're going to talk about Skydive, a piece of software that was created two
00:23
years ago at Red Hat. It's a real-time network topology and protocol analyzer. The reason behind Skydive, and one of its primary use cases, is troubleshooting. Troubleshooting the network is particularly hard: by nature it's distributed, so you
00:44
have to SSH into many machines, and when using SDN you can have multiple SDNs stacked. For example, you could have OpenStack running with the Neutron SDN, and then on top
01:01
of this you could have a Flannel network, and then troubleshooting gets really, really hard. As for the toolbox available to you: if you use a proprietary SDN you may have other tools, but the basic toolbox is the iproute2 utilities, so ip address
01:22
and netns, bridge, and so on. You also have the Open vSwitch tools, so ovs-vsctl and ovs-ofctl to show the flows and so on. And of course you have tcpdump and Wireshark for packet analysis.
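As a rough illustration of the manual, per-host workflow described here, the following Python sketch builds (but does not run) the SSH command lines one would issue on each machine. The host names, the bridge name br-int and the interface eth0 are assumptions chosen for the example; they are not from the talk.

```python
import shlex

# Hypothetical host names, bridge and interface, used only for illustration.
HOSTS = ["compute-1", "compute-2"]
OVS_BRIDGE = "br-int"
IFACE = "eth0"


def toolbox_commands(bridge=OVS_BRIDGE, iface=IFACE):
    """Return the basic per-host diagnostic commands as argument lists."""
    return [
        ["ip", "address", "show"],            # interfaces and addresses
        ["ip", "netns", "list"],              # network namespaces
        ["bridge", "link", "show"],           # Linux bridge ports
        ["ovs-vsctl", "show"],                # Open vSwitch bridges and ports
        ["ovs-ofctl", "dump-flows", bridge],  # OpenFlow rules on one bridge
        ["tcpdump", "-n", "-c", "10", "-i", iface, "icmp"],  # 10 ICMP packets
    ]


def ssh_plan(hosts=HOSTS):
    """Build one ssh command line per (host, tool) pair without running anything."""
    plan = []
    for host in hosts:
        for cmd in toolbox_commands():
            plan.append(["ssh", host, shlex.join(cmd)])
    return plan


if __name__ == "__main__":
    for argv in ssh_plan():
        print(shlex.join(argv))
```

Even with two hosts this is a dozen commands whose output has to be correlated by hand, which is the pain point the talk goes on to address.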
01:44
So, the goals of Skydive. We have to deal with many SDNs, so we did not want to be tied to a single one: we wanted Skydive to be SDN-agnostic, usable with Flannel, Neutron and so on.
02:05
Another goal is real-time analysis, so on a running platform such as production. But another use case is post-mortem analysis: for example, if you had an issue, the reflex is to delete the instance and try again, and then you
02:27
are not able to troubleshoot anymore. So we made Skydive record everything so that you can do the analysis later. It also has to be lightweight, because it is supposed to run on your production machines, and really easy to deploy, because when
02:45
a problem occurs, you have to be able to deploy Skydive really fast. So we came up with this software, which has a distributed architecture. To answer
03:01
the easy-to-deploy question, it is one single binary for all the different parts of Skydive: you just copy it and then you can run it. We also have an all-in-one mode, so you just start it and you have everything. It is composed of only two components. The first is the Skydive agent:
03:25
it is supposed to run on all your compute nodes, and it is responsible for capturing the network topology and the network flows,
03:41
and then it forwards everything to the other component, the Skydive analyzer, whose role is to aggregate all this topology information, do some more analysis, and serve the API. As this presentation is really short, we are going to
04:06
do a large demo, starting with a very quick overview of Skydive. In our environment, we have one Skydive analyzer and two Skydive agents. So what does it look like? This is the web interface. Everything is also accessible
04:25
using the command line, but let's use this. Here you have one agent and another one. You see all the network interfaces, so the loopback, the eth ones,
04:40
and they are both connected to a top-of-rack switch. You can click on any node and get information about the interface: the MAC address, the MTU, and so on. You can also get the interface metrics, so the number of packets
05:02
dropped, received, and so on. Let's go back here, and now let's create some network objects and see how Skydive reacts. If we create an interface,
05:28
Skydive is listening to the Netlink events (sorry for the lag), so we see that the interface just appeared,
05:43
where is it? Yeah, so here it is, and then we can also add an interface on this interface, and we can see that Skydive was able to see the new interface.
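To make the event-driven idea concrete, here is a minimal Python sketch of subscribing to Netlink link notifications on Linux instead of polling. It is not Skydive's implementation (Skydive is a separate code base); the function names parse_link_events and listen are made up for this example. The constants and structure layouts are the standard rtnetlink ones.

```python
import socket
import struct

# Constants from the Linux rtnetlink headers.
RTMGRP_LINK = 0x1   # multicast group for link (interface) changes
RTM_NEWLINK = 16    # interface created or changed
RTM_DELLINK = 17    # interface removed
IFLA_IFNAME = 3     # attribute carrying the interface name

NLMSG_HDR = struct.Struct("=IHHII")   # length, type, flags, sequence, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change
RTATTR = struct.Struct("=HH")         # attribute length, attribute type


def parse_link_events(data):
    """Yield (event, ifindex, ifname) for each link message in a netlink buffer."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # malformed or truncated message
        if msg_type in (RTM_NEWLINK, RTM_DELLINK):
            body = offset + NLMSG_HDR.size
            _family, _type, index, _iflags, _change = IFINFOMSG.unpack_from(data, body)
            name = None
            attr = body + IFINFOMSG.size
            end = offset + length
            while attr + RTATTR.size <= end:
                alen, atype = RTATTR.unpack_from(data, attr)
                if alen < RTATTR.size:
                    break
                if atype == IFLA_IFNAME:
                    raw = data[attr + RTATTR.size:attr + alen]
                    name = raw.rstrip(b"\0").decode()
                attr += (alen + 3) & ~3  # attributes are 4-byte aligned
            yield ("new" if msg_type == RTM_NEWLINK else "del", index, name)
        offset += (length + 3) & ~3  # messages are 4-byte aligned


def listen():
    """Print link events as the kernel pushes them (Linux only, no polling)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # pid 0 lets the kernel assign an address
    while True:
        for event in parse_link_events(sock.recv(65535)):
            print(event)
```

Calling listen() on a Linux host and then creating an interface in another terminal should print a "new" event for it almost immediately, which is the behaviour shown in the demo.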
06:01
So we really do not do any polling. Instead, we try to subscribe to every topology mechanism: for OVSDB we listen for events, and likewise for Netlink and for Docker. We try to avoid polling as much as possible. And then,
06:22
now we're going to generate and capture some packets. To do this, we can select an interface, so I'm going to select the eth1 of each machine,
06:42
and you can select a path, and then you can specify a BPF filter to select only a certain kind of packet. Oh, sorry, I'm going to do this again because I want to enable some options,
07:04
up here. Yes, let's ask for 10 packets. Okay. Skydive also bundles a packet generator, so you can choose,
07:29
you can select generic traffic, so you can generate ICMP, TCP, UDP and so on. Okay, I'm just going to generate ICMP packets from there to there, and then
07:45
at the bottom right, you can see the ICMPv4 packets that were just generated. And here, with this button, you just
08:04
can open this in Wireshark and get access to the full data. Okay, so that was the very simple demo. Now, for something a little more complex, we are going to create a
08:21
Docker container. So here (sorry) we can see that Docker creates a network namespace for every container, and we can see it here. And an interesting thing (oh, stop moving),
08:43
okay, so that's the network namespace and the physical interface, but we can see that there is an icon here to say that it's a Docker container. When Skydive saw this network namespace, it asked Docker whether it knew about this namespace, and so you can have,
09:06
it's right there. Sorry. So you can see the namespace here, but you can also click on the container and get the container ID, the container name and the
09:27
demo, so we have other connectors: we have connectors for OpenStack Neutron and for OpenContrail, and a Kubernetes connector is underway. And yes, I don't know what's,
09:44
Okay, can you hear me? Perfect. Hi, I'm the other Sylvain, and I'm going to keep adding complexity to the demo. We are going to deploy,
10:00
using Docker Swarm, an n-tier application: basically a data store with a MySQL container, and two WordPress containers. So first I'm going to initialize Docker Swarm. Okay, that is done now, and we will see that we have some lag,
10:30
okay, so we can see that Skydive already detected what Docker Swarm did: we have many more network namespaces created. I'm not going to look at what these containers
10:48
are doing, but I'm going to go on and create what I described before, meaning the WordPress and the MySQL services. We're going to start by creating a Docker Swarm network,
11:02
used to interconnect our two services, MySQL and WordPress. We start the MySQL service, so it's starting,
11:21
and we should see the MySQL container appear very soon, I hope. Yes, just here. It's a bit too much for my laptop with the three VMs. So we have many more containers,
11:48
network namespaces involved in the deployment. Here, we know that this is the MySQL instance. We can click on it, as Sylvain explained just before, and we get many more details about the service: we have the Docker labels just here, and we can see we probably have one
12:08
namespace related to the Docker network. I'm going to go on and start WordPress, the first one, on agent one. That is done, so it should be there soon. Yes,
12:42
so now we have everything connected. We can see that we have the other container just there in this namespace (sorry for the click), and just here. It seems that these two namespaces are interconnected by this namespace, and this one as well, with this path,
13:04
in order to check which namespace is used by the network. Okay, I'm going to do this fast. This one is probably the one used for the network that I created just before,
13:23
so I'm going to start a capture between this point and this point in order to confirm that we have some packets. I'm going to do that very quickly because we are short on time. Not this interface, but this one. I'm just going to capture the traffic for
14:02
MySQL. Okay, we now have a capture, and I can go to the WordPress interface and generate a bit of traffic. This is not the right port, sorry about that. Okay, so we should have some
14:37
traffic there now, just here, and if we expand the flow, we can see that we have MySQL
14:44
traffic just there. I am not going to continue, but I'll just explain what comes next. With the other container scheduled on the other host, we are able to follow packets passing from one host to the other through the overlay,
15:03
so the VXLAN interface, and we are able to follow packets within a tunnel. We support multiple tunneling encapsulations: VXLAN, GRE and Geneve, and you can have a mix of them. It means that if you have, for example, an OpenStack deployment,
15:21
and on top of it a container deployment with another SDN, you can follow the packets leaving the first level of encapsulation and entering the next one. So I did capture some traffic, and I'm going to skip this one. We did everything with the
15:44
web UI, but as Sylvain said just before, everything is doable from the command line: you can query the topology, you can query the flows and the packets, you can create captures, and you can do packet injection. We also have an alerting mechanism that I'm not going to
16:03
show you right now, but I can explain it. You can write alert rules for flows and topology, meaning that if there is a change in flows or traffic, or in interfaces and so on, you get notified that something changed. You can write, for example,
16:21
rules for bandwidth, for an interface going up or down, or for a certain kind of container disappearing. On the roadmap, we are working on an eBPF probe in order to capture flows in a lightweight
16:41
manner, and on a DPDK one; we have a proof of concept for both. We are also working on the layer 3 topology: currently we have a kind of layer 2 topology, and we are working to add layer 3 and application topology too. It's an open source tool, and you can reach out to us on IRC or on the mailing list.
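The alert rules mentioned a moment ago (bandwidth thresholds, interfaces going up or down) can be pictured with a small, self-contained Python sketch. This is only a toy model of the idea and not Skydive's alert syntax or API: the names Rule, bandwidth_above, went_down and evaluate, and the snapshot fields state and rx_bytes, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A snapshot maps an interface name to its observed attributes (hypothetical fields).
Snapshot = Dict[str, dict]


@dataclass
class Rule:
    name: str
    # Returns True when the rule fires for the (previous, current) attributes.
    fires: Callable[[dict, dict], bool]


def bandwidth_above(limit_bps: int, interval_s: float) -> Rule:
    """Fire when the receive rate between two snapshots exceeds limit_bps."""
    def check(prev: dict, cur: dict) -> bool:
        rate = (cur["rx_bytes"] - prev["rx_bytes"]) * 8 / interval_s
        return rate > limit_bps
    return Rule(f"bandwidth>{limit_bps}bps", check)


def went_down() -> Rule:
    """Fire when an interface that was UP is now DOWN."""
    return Rule(
        "interface-down",
        lambda prev, cur: prev["state"] == "UP" and cur["state"] == "DOWN",
    )


def evaluate(rules: List[Rule], previous: Snapshot, current: Snapshot) -> List[Tuple[str, str]]:
    """Compare two snapshots and return (interface, rule name) for every rule that fires."""
    alerts = []
    for ifname, cur in current.items():
        prev = previous.get(ifname)
        if prev is None:
            continue  # newly seen interface, nothing to compare against
        for rule in rules:
            if rule.fires(prev, cur):
                alerts.append((ifname, rule.name))
    return alerts
```

For instance, evaluating the rules against two snapshots taken 10 seconds apart, where eth1 received 2,000,000 bytes and changed from UP to DOWN, reports both a bandwidth alert (for a 1 Mbit/s limit) and an interface-down alert for eth1.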
17:07
Thank you. If you have questions... Yes, basically this is a single, statically linked binary,
17:34
so you can just copy the binary somewhere and start it as an all-in-one service,
17:41
and it comes with everything: with one binary you have the client and you have the analyzer, if you want a distributed environment, but you can also run just one. Sorry? Okay, okay, but you can come. Thank you!