Network Traffic Analysis of Hadoop Clusters

FOSDEM VZW

Kämpf, Mirko; Balassi, Márton

Formal metadata

Title

Network Traffic Analysis of Hadoop Clusters

Subtitle

Understand the common usage patterns and identify typical / atypical workloads.

Series title

FOSDEM 2017

Number of parts

611

Author

Kämpf, Mirko

Balassi, Márton

License

CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/42215 (DOI)

Publisher

FOSDEM VZW

Year of publication

2018

Language

English

Production year

2017

Content metadata

Subject area

Computer science

Genre

Conference/Talk

Abstract

Cybersecurity is a broad topic and many commercial products are related to it. We demonstrate a fundamental concept in network analysis: reconstruction and visualization of temporal networks. Furthermore, we apply the method to describe the operational conditions of a Hadoop cluster. Our experiments provide initial results and allow a classification of the cluster state in relation to current workloads. The temporal networks show significant differences between operation modes. In reality we would expect mixed workloads. If such workload parameters are known, we can handle atypical events accordingly; that is, we can create alerts based on context information rather than on packet content alone. We show an end-to-end example: (1) data is collected with Python, using the sniffer script; (2) using Apache Hive and Apache Spark, we analyze the network traffic data and create the temporal network; (3) finally, we visualize the results using Gephi. As a next step, we plan to contribute to the Apache Spot project.

Expected prior knowledge / intended audience: No special skills are required, but minimal exposure to the Hadoop ecosystem is helpful.

Speaker bio: Márton Balassi is a Solution Architect at Cloudera and a PMC member at Apache Flink. He focuses on Big Data application development, especially in the streaming space. Márton is a regular contributor to open source and has recently spoken at a number of open source Big Data conferences, including Hadoop Summit and Apache Big Data, and at meetups.

Mirko Kämpf is a Solution Architect at Cloudera and the initiator of the Etosha project. He holds a Diploma in Physics and has worked on several projects related to complex systems analysis. His focus is on time-dependent network analysis and time series analysis, using tools from the Hadoop ecosystem, and especially on the related metadata management. Mirko actively uses open source tools, has written several articles for the Cloudera engineering blog, and speaks at Big Data conferences and meetups.
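The temporal network reconstruction mentioned in the abstract can be illustrated with a minimal sketch. This is not the speakers' code. It assumes each captured packet has already been reduced to a hypothetical record of (timestamp, source IP, destination IP, bytes), groups the records into fixed time windows, and sums the bytes per directed edge. The function names, the window size, and the sample records are all illustrative assumptions. The talk itself uses Apache Hive and Apache Spark for this step, whereas the sketch uses plain Python to show the idea only.

```python
import csv
import io
from collections import defaultdict


def build_temporal_network(packets, window_seconds):
    """Group packets into fixed time windows and sum bytes per directed edge.

    Returns a dict mapping window index -> {(src, dst): total_bytes}.
    """
    snapshots = defaultdict(lambda: defaultdict(int))
    for timestamp, src, dst, size in packets:
        window = int(timestamp // window_seconds)
        snapshots[window][(src, dst)] += size
    return {window: dict(edges) for window, edges in snapshots.items()}


def to_gephi_edge_csv(edges):
    """Serialize one snapshot as a CSV edge list (Source, Target, Weight)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(edges.items()):
        writer.writerow([src, dst, weight])
    return buffer.getvalue()


# Hypothetical sample records: (timestamp in seconds, src IP, dst IP, bytes)
packets = [
    (0.5, "10.0.0.1", "10.0.0.2", 1500),
    (1.2, "10.0.0.1", "10.0.0.2", 500),
    (3.0, "10.0.0.2", "10.0.0.3", 800),
    (61.0, "10.0.0.1", "10.0.0.3", 200),
]

# One network snapshot per 60-second window (window size is an assumption).
network = build_temporal_network(packets, window_seconds=60)
```

Each snapshot can then be written out with `to_gephi_edge_csv(network[0])` and loaded as an edge table in Gephi's spreadsheet importer, which reads `Source`, `Target`, and `Weight` columns. Comparing snapshots across windows is what would reveal the differences between operation modes that the abstract describes.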