Efficient Network Analytics with BPF/eBPF using Skydive
Formal Metadata
Number of Parts | 50
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/43096 (DOI)
All Systems Go! 2018, 42 / 50
Transcript: English (auto-generated)
00:06
Okay, hello everyone. I'm Sylvain Baubeau, I work for Red Hat, and I'm replacing my colleague who was supposed to give this presentation. Today we're going to talk about Skydive. Last year we did a presentation about Skydive, so I don't know how many of you saw the
00:23
presentation. Did anyone? No? Okay, one person. So what is Skydive? Skydive is a real-time network topology and protocol analyzer. Basically, it collects the network topology of your whole infrastructure,
00:47
and it allows you to capture traffic and analyze it. The reason why we started this project is that networking is obviously very complex.
01:04
For example, right now on your cloud you could have an OpenStack, and on top of this you can have Kubernetes, each one with a different SDN, so the complexity is huge. It changes a lot: VMs and containers are created and deleted all the time, so it's
01:27
very, very dynamic. Often you make use of tunneling, so you have VXLAN, GRE, Geneve and so on, which can make troubleshooting complicated.
01:43
And that's the main use case for Skydive: troubleshooting. There was a complete lack of open source tooling for troubleshooting. You were basically stuck with ip, netns, tcpdump and the rest of the usual toolbox, but
02:02
that's not enough. So our goal was to design software that is agnostic to the SDN, and not only the SDN but any platform: we are not tied to OpenStack, but we can work with it; we are not tied to Kubernetes, but same.
02:21
We want to be able to do this troubleshooting and analysis in real time, but also post-mortem. For example, a user created a VM, had connectivity issues, and deleted everything. So we had to find a way to go back in time and see what happened.
02:48
We also wanted something very lightweight and easy to deploy, because if you have an issue in production, you don't want to deploy very complex software.
03:03
So it's a single binary: you put it on the machines and you're good to go. Skydive can really be seen as a toolbox, you can use it any way you want, but one very common use case is just visualization, just being able to see what's
03:27
in your infrastructure. You can barely see it here, but that's the network topology. Here you can see, for example, a top-of-rack switch, and on each port of this top-of-rack switch there is a machine, and in the different
03:45
nodes there are the physical interfaces, the network namespaces, the Open vSwitch bridges, things like that. And of course you can zoom in, you can zoom out, you can restrict the view,
04:01
because it can be huge on your infrastructure. You can click on every node and get precise information: the MTU, the names of the containers and so on. Another thing you can do with Skydive is capture traffic, to be able to
04:22
troubleshoot. On this screenshot you can see the node on the left, the yellow one: we are capturing the traffic on this node, and when you click on it you get all the flows, which are the rows on the right, so you
04:41
can see the different TCP flows with their source and destination IPs. You can get even more information on a specific flow: we look at the link layer, the network layer, the application layer, and we get the metrics, so we see the
05:01
number of packets and the number of bytes for this flow, the start and the stop time, and we also measure the RTT. We also compute something useful, the tracking ID. For example, if you have two VMs talking over SSH, the traffic goes through different interfaces,
05:30
and sometimes there is tunnelling, so the traffic can be encapsulated. So we compute what we call a tracking ID, which allows us to follow this SSH traffic across all the
05:43
nodes of your infrastructure. Here we selected one flow, and all the yellow nodes are the interfaces where this traffic was seen. As we collect all the metrics, both the interface metrics and the flow metrics,
06:02
we are able to graph them, so we developed a Grafana plugin for this. It's available directly from your Grafana installation, I think through the Grafana marketplace or something like that. This plugin talks directly to the Skydive API, and you
06:26
can draw, for example, the bandwidth for a specific VM or for a user of your cloud, or whatever you want. So this is a demo of Skydive in action. Here you have a pretty huge
06:49
infrastructure. The yellow part is the top-of-rack switch, then all the other nodes. You can expand the different groups, in this case a namespace, and you can create
07:03
a capture. For a capture you select a source node and a destination node, and it will capture the traffic on the path between those two interfaces. Here we restricted it to ICMP with a BPF filter. Then we can also inject
07:21
traffic: with the same mechanism you select the nodes and so on. Okay, now a very short slide on the architecture, at a very high level. At the centre of Skydive there is a graph engine: we create nodes and edges, and we store the
07:49
information on the nodes as metadata. As I said, since we want to do post-mortem analysis, every change to this graph is archived, so we are able to recreate the graph
08:05
as it was at a specific point in time: two years ago, say, what did my network topology look like? This graph is populated by probes, which we will see in more detail later.
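To make the idea of an archived graph concrete, here is a minimal sketch in Python. It is not Skydive's implementation: the class and method names (`HistoryGraph`, `set_node`, `nodes_at`) are hypothetical, it only handles nodes and not edges, and timestamps are plain numbers. Every change closes the previous version of a node instead of overwriting it, so the graph can be rebuilt for any past instant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeVersion:
    node_id: str
    metadata: dict
    valid_from: float
    valid_to: Optional[float] = None  # None means "still current"


class HistoryGraph:
    """Toy node store that archives every change instead of overwriting it."""

    def __init__(self):
        self.versions = []

    def _close_current(self, node_id, at):
        # Mark the currently valid version of the node (if any) as ended.
        for version in self.versions:
            if version.node_id == node_id and version.valid_to is None:
                version.valid_to = at

    def set_node(self, node_id, metadata, at):
        self._close_current(node_id, at)
        self.versions.append(NodeVersion(node_id, dict(metadata), at))

    def delete_node(self, node_id, at):
        self._close_current(node_id, at)

    def nodes_at(self, at):
        """Return {node_id: metadata} as the graph looked at time `at`."""
        return {
            v.node_id: v.metadata
            for v in self.versions
            if v.valid_from <= at and (v.valid_to is None or at < v.valid_to)
        }


graph = HistoryGraph()
graph.set_node("eth0", {"Name": "eth0", "MTU": 1500}, at=10)
graph.set_node("eth0", {"Name": "eth0", "MTU": 9000}, at=20)
graph.delete_node("eth0", at=30)

print(graph.nodes_at(15))  # the interface as it was before the MTU change
print(graph.nodes_at(35))  # after deletion the node is gone, but its history remains
```

The trade-off of this design is that storage grows with every change, which is why the history lives in a database rather than in memory.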
08:23
The way you interact with this graph and get information from it is through the API. This API accepts a syntax which we call the Gremlin language. Gremlin is a graph traversal language, and it looks like what you see at the top.
08:47
Here it's a very, very basic query that we run on the command line: we just ask Skydive for the nodes that have a specific name, in this case my Ethernet interface, and it gives you all the matching nodes as JSON.
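As a rough illustration of such a query, the sketch below builds a Gremlin expression that selects nodes by name and wraps it in a JSON request body. The helper names are hypothetical, and both the `G.V().Has(...)` spelling and the `GremlinQuery` field name are assumptions to check against the Skydive API documentation; nothing is sent over the network here.

```python
import json


def has_query(key, value):
    """Build a Gremlin expression selecting nodes whose metadata `key` equals `value`."""
    # json.dumps produces a safely quoted string literal for each argument.
    return "G.V().Has({}, {})".format(json.dumps(key), json.dumps(value))


def topology_request_body(gremlin):
    """Wrap a Gremlin expression in a JSON body (field name is an assumption)."""
    return json.dumps({"GremlinQuery": gremlin})


query = has_query("Name", "eth0")
body = topology_request_body(query)

print(query)
print(body)
```

Building the expression through a helper rather than by hand keeps quoting consistent when the interface name comes from user input.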
09:09
If you go back to the use case I showed you before, the tracking, so the interfaces where my SSH traffic was seen,
09:24
in white there is the corresponding Gremlin query: we identified a flow, we got the tracking ID of this flow, and then we ask Skydive for the nodes that have seen this flow. Same for the Grafana plugin: it accepts Gremlin queries,
09:46
so basically you can graph anything you want. Here we are graphing the ICMPv4 traffic, and we are aggregating all the flows. So now a more precise architecture
10:06
slide. We have two components in Skydive. The first, which you see on the left, is what we call the agent. It has to be started on all the machines of your infrastructure, so your compute nodes or your Kubernetes nodes. The agents have
10:24
probes and they collect their local topology, and they send this topology to one or more analyzers. The analyzer aggregates all those local graphs and creates one big graph from them,
10:41
and it serves the API and the web UI, and it stores everything in a database, so it's a pretty common design. In our case we mainly support Elasticsearch. So on the agents, where do we get the information from? First we
11:02
talk to netlink, which gives us information about the interfaces. We also use ethtool to find out which features are supported by this interface or this card. We also collect all the network namespaces that exist on this node,
11:25
we talk to Open vSwitch using the OVSDB protocol, we talk to Docker, to Kubernetes, to Neutron, and so on, and even the sockets, which is a probe that I will describe later.
11:43
So now we are going to see what's new since last year. We added a way to capture traffic using DPDK, for high-performance use cases.
12:04
We were already able to capture traffic on OVS, but only on the whole bridge; now you can capture only the traffic for a specific port. We fetch the routing tables and the ARP tables for the nodes. We also worked closely with the IBM
12:30
team on adding support for the POWER architecture, so this is available for OpenStack and you also have the Docker image for the POWER architecture.
12:47
We also have improvements on the deployment side of Skydive. We have a nice Ansible library to deploy it on your infrastructure. There is also a blog post which describes how to install
13:03
Skydive and make use of the Ansible networking modules, so that you can get information about your switches using LLDP, and Skydive is populated with your switches, for instance.
13:20
And we have, not very sexy, an RBAC mechanism. Another feature we have since last year is workflows. A typical thing you want to do with Skydive is check the connectivity between two machines. Basically, you would select
13:43
the nodes, capture the traffic at the right points of your infrastructure, generate some traffic, and then query the Gremlin API to see if the flow you injected was seen properly. A workflow is basically a way to automate those actions.
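The connectivity check described above (capture, inject, query, verify) can be sketched as follows. The talk states that real workflows are written in JavaScript; this Python version only illustrates the logic. `FakeSkydive` and all of its methods are hypothetical stand-ins that simulate a path of interfaces in memory, not the real API.

```python
class FakeSkydive:
    """In-memory stand-in for the real API; every method here is hypothetical."""

    def __init__(self, path, drop_at=None):
        self.path = path        # interfaces a packet crosses, in order
        self.drop_at = drop_at  # interface where the packet is lost, if any
        self.captured = set()   # interfaces with an active capture
        self.seen = {}          # interface -> labels of the flows seen there

    def create_capture(self, interface):
        self.captured.add(interface)

    def inject(self, src, dst, label):
        # Walk the path from src to dst; record the packet on every interface
        # that has a capture, and stop where the packet is dropped.
        start, end = self.path.index(src), self.path.index(dst)
        for interface in self.path[start:end + 1]:
            if interface == self.drop_at:
                break
            if interface in self.captured:
                self.seen.setdefault(interface, []).append(label)

    def interfaces_that_saw(self, label):
        return {i for i, labels in self.seen.items() if label in labels}


def check_connectivity(api, src, dst):
    """Capture at both ends, inject traffic, then check where it was seen."""
    api.create_capture(src)
    api.create_capture(dst)
    api.inject(src, dst, label="icmp-probe")
    return {src, dst} <= api.interfaces_that_saw("icmp-probe")


path = ["vm1-eth0", "br-int", "vxlan0", "vm2-eth0"]
print(check_connectivity(FakeSkydive(path), "vm1-eth0", "vm2-eth0"))
print(check_connectivity(FakeSkydive(path, drop_at="vxlan0"), "vm1-eth0", "vm2-eth0"))
```

Adding captures on the intermediate interfaces as well would show where along the path the traffic stops, which is the troubleshooting use case the talk describes.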
14:07
The downside is that you have to write it in JavaScript, but the nice thing is that you can run it almost everywhere: you can run it in your browser, you can run it as a ...
14:25
and there is also a JavaScript engine embedded into Skydive, so you can have Skydive execute your workflows. Once you create a workflow,
14:41
it will appear in the web UI, and you have a nice way to trigger your workflows. We also have a new Kubernetes probe that was created a few months ago, so it's still at an early stage. It basically synchronizes the Kubernetes
15:06
resources and puts them into the Skydive graph. We support many Kubernetes resources: the namespace, the service, the pod and some others. Simply importing these
15:23
resources is not enough, so we create links between them. For example, you can see which pods are part of a service, or you can see which pods a network
15:40
policy applies to. That's the application layer, obviously, but you can also go down to the physical layer: you can get your service, then your pod, then your Linux, then your Docker container, and then your veth and your ...
16:11
It's pretty easy to deploy Skydive on your Kubernetes: it makes use of DaemonSets, so it will run on all your nodes. There is also a Helm chart available,
16:28
and that's what it looks like here. It's pretty messy, but you can see there is the Minikube, then the services and the pods, and on the right side there are links to
16:44
the physical machine and its Docker containers. What else is new: we also have a way to capture traffic using eBPF. Last year we had AF_PACKET and OVS, but for better performance we created this probe.
17:07
It's separated in two parts, kernel space and user space. Kernel space, obviously, is where the eBPF bytecode runs. It's attached to a socket, and then
17:24
we get the packets from the kernel and we create a flow table on the kernel side. When we see a packet, we try to find which flow it maps to, so we compute a session
17:44
key, which is basically a hash of all the layers we saw in the packet, and we also maintain the counters, so we are able to give you metrics on those flows. Periodically,
18:01
this flow table on the kernel side is synchronized with the user space side, and when we do this, we compute a few more things like the UUID and the tracking ID, and then we are able to do the mapping with the topology. Using it is very easy.
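The flow table logic can be illustrated with a small user-space sketch. This is Python, not the eBPF C code of the probe: the names `session_key` and `FlowTable` are hypothetical, the choice of hashed fields is an assumption, and sorting the two endpoints so that both directions share one key is a design choice of this sketch rather than something the talk states. FNV-1a (64-bit) is used as the hash.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return h


def session_key(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash the addressing fields of a packet into a 64-bit flow key."""
    # Sort the endpoints so both directions of a conversation share a key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    raw = "{}:{}|{}:{}|{}".format(a[0], a[1], b[0], b[1], proto).encode()
    return fnv1a_64(raw)


class FlowTable:
    """Maps a session key to per-flow packet and byte counters."""

    def __init__(self):
        self.flows = {}

    def on_packet(self, src_ip, dst_ip, proto, src_port, dst_port, length):
        key = session_key(src_ip, dst_ip, proto, src_port, dst_port)
        counters = self.flows.setdefault(key, {"packets": 0, "bytes": 0})
        counters["packets"] += 1
        counters["bytes"] += length
        return key


demo = FlowTable()
demo.on_packet("10.0.0.1", "10.0.0.2", "tcp", 40000, 22, 100)
demo.on_packet("10.0.0.2", "10.0.0.1", "tcp", 22, 40000, 60)
print(demo.flows)  # one flow entry holding the counters for both directions
```

Because the per-packet work is only a hash and two counter updates, the expensive steps (identifiers, mapping to the topology) can be deferred to the periodic synchronization with user space.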
18:22
It's just another capture type, so you just select it. This is the kernel part of this probe; it's really, really simple, about 300 lines of C, so it's really tiny. At the end we can see that we compute the hash using
18:43
the FNV hash function, so really simple stuff. Regarding performance, it's a bit like comparing apples and oranges, but I'm going to do it anyway. Why apples and oranges? Because
19:03
the eBPF flow probe is not feature complete: there is no support for tunneling, no TCP reassembly, no IPv4 defragmentation, so it's still a work in progress. It's also not very easy to measure its performance, as the kernel-side work is not accounted
19:24
to the Skydive process. This slide summarizes the pros and cons of the different capture types. With AF_PACKET you basically have support on every kernel, and the overhead is huge,
19:44
but there is no limitation: all the Skydive features are supported with AF_PACKET. It's not really a capture type, but we also have BPF filters: you can restrict the amount of traffic you capture just by specifying a BPF filter. This way
20:05
you don't get everything, so you have no per-packet metrics and so on, but you can still do very useful things. And then you have eBPF: you have to use a recent kernel, the overhead is really small, but there is no classification. So we ran a benchmark, a really small one:
20:30
we used iperf, and Skydive was pinned to just one core. When we generated traffic and captured it with AF_PACKET, at four gigabytes
20:45
we started to see packet drops. If we specify a BPF filter, I did not put the filter on the slide, but it only matches specific TCP flags, we were able to capture and analyze 15 gigabytes,
21:02
and with eBPF it was 2027. Basically, what we do is really simple, so there is almost no overhead. One bottleneck was that when we ask Skydive for the flows, it was returning them as JSON objects, and the
21:24
serialization was just killing it. So we switched from JSON to protobuf over our WebSocket, and that reduced the query time a lot.
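A rough way to see why a binary encoding helps is to compare the size of one flow record encoded as JSON and as a fixed binary layout. This sketch uses Python's standard `struct` module as a stand-in; it is not protobuf (which uses field tags and variable-length integers), and the record fields are invented for the illustration.

```python
import json
import struct

# Hypothetical flow record; the field names are illustrative only.
flow = {"src_port": 443, "dst_port": 51234, "packets": 1200, "bytes": 1800000}

as_json = json.dumps(flow).encode()

# Two unsigned 16-bit ports followed by two unsigned 64-bit counters,
# in network byte order with no padding.
LAYOUT = "!HHQQ"
as_binary = struct.pack(
    LAYOUT, flow["src_port"], flow["dst_port"], flow["packets"], flow["bytes"]
)

print(len(as_json), len(as_binary))

# Decoding the binary form gives back the same values without any text parsing.
decoded = struct.unpack(LAYOUT, as_binary)
print(decoded)
```

The JSON form repeats every field name in every record and renders numbers as text, so both the size and the encoding cost grow quickly when a query returns many flows.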
21:43
Another use of eBPF in Skydive is what we call the socket info. Basically we want to see which process is talking to which process, and which container is talking to which container, so it mimics the ss command.
22:03
So far, to do this, we used /proc parsing, which was not viable. Then we found this very nice library, tcptracer-bpf, which makes use of eBPF to do this, and we integrated it as a probe in Skydive. So we put,
22:29
as metadata on the host, a list of all the active sockets: who's listening, who's connecting to whom. Then you can write Gremlin queries, so we can ask which host is
22:45
hosting the httpd process, or we can ask who's talking to the 10.0.0.10 address on the HTTPS port, and when you select flows, you can also go back to the
23:07
sockets that generated those flows. With this we can show you a nice view of
23:22
what we call the flow matrix. It's an OpenStack deployment, so you can see which process is talking to which: for example Neutron or httpd talking to OVSDB and so on. Now for the roadmap: we plan to
23:46
add a hybrid capture. What we mean by hybrid capture is that we want to capture only the first packet and do a full analysis on it, so we look at the DHCP,
24:02
even the application layer, so what's inside this packet, and then for the other packets we can do just a lightweight capture with eBPF; we don't have to do a full analysis of all the packets. We also want to increase the use of eBPF, because right now the way we
24:27
retrieve the network namespaces is just by parsing /proc, so we are not notified of new network namespaces. We are also investigating using eBPF to get retransmission
24:44
counters from the Linux stack. Another thing we did this last year: we thought that it could be useful for some projects to have a graph engine,
25:01
as I said with a full history, so we extracted this from Skydive to create a dedicated component which is called Steffi, which gives us a nice joke, and you can integrate it in your tool. There is also an LLDP probe which is being developed right now.
25:26
And that's pretty much it, if you have any questions. What is LLDP? LLDP is the Link Layer Discovery Protocol, which is used
25:46
to discover the topology. Basically, your switch sends packets every second, and so just from your machine you can know that you are on this specific switch
26:01
and on which port of this switch, and the speed of the link, and all that information is used to discover the topology. Hi, is there anything specific required to run on Kubernetes?
26:27
Is it tested across different providers, or is it expected to just work? I expect it to just work, yeah. By providers you mean... So yeah, it works on OpenShift because we test it,
26:46
so basically it's supposed to work. Okay, thank you very much.