
Network Traffic Analysis of Hadoop Clusters

Formal metadata

Title
Network Traffic Analysis of Hadoop Clusters
Subtitle
Understand the common usage patterns and identify typical / atypical workloads.
Series title
Number of parts
611
Author
License
CC Attribution 2.0 Belgium:
You may use and modify the work or content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Year of publication
Language
Production year: 2017

Content metadata

Subject area
Genre
Abstract
Cybersecurity is a broad topic, and many commercial products address it. We demonstrate a fundamental concept in network analysis: the reconstruction and visualization of temporal networks. Furthermore, we apply the method to describe the operational conditions of a Hadoop cluster. Our experiments provide initial results and allow us to classify the cluster state according to its current workload. The temporal networks show significant differences between operation modes. In reality, we would expect mixed workloads. If such workload parameters are known, we can handle atypical events accordingly; that is, we can create alerts based on context information rather than on packet content alone. We show an end-to-end example: (1) data collection is done in Python, using the sniffer script; (2) using Apache Hive and Apache Spark, we analyze the network traffic data and create the temporal network; (3) finally, we visualize the results using Gephi. As a next step, we plan to contribute to the Apache Spot project.

# Expected prior knowledge / intended audience:
No special skills are required, but minimal exposure to the Hadoop ecosystem is helpful.

# Speaker bio:
Márton Balassi is a Solution Architect at Cloudera and a PMC member of Apache Flink. He focuses on Big Data application development, especially in the streaming space. Márton is a regular contributor to open source and has recently spoken at a number of open source Big Data conferences and meetups, including Hadoop Summit and Apache Big Data.

Mirko Kämpf is a Solution Architect at Cloudera and the initiator of the Etosha project. He holds a Diploma in Physics and has worked on several projects related to complex systems analysis. His focus is on time-dependent network analysis and time series analysis using tools from the Hadoop ecosystem, and especially on the related metadata management. Mirko actively uses open source tools, has authored several articles on the Cloudera Engineering Blog, and speaks at Big Data conferences and meetups.
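The abstract does not include the analysis code. As a rough illustration of the temporal-network reconstruction in step (2), here is a minimal plain-Python sketch; the talk itself performs this step with Apache Hive and Apache Spark. It assumes a hypothetical record layout of (timestamp in seconds, source IP, destination IP, bytes), such as a packet sniffer might produce, and fixed-width time windows. The function names `build_temporal_network` and `snapshot_to_csv` are illustrative only. Each window becomes one weighted, directed snapshot graph, and the edge list uses the Source/Target/Weight column names intended to match Gephi's spreadsheet import.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (an assumption, not the talk's sniffer format):
# (timestamp_seconds, src_ip, dst_ip, num_bytes)
Record = Tuple[float, str, str, int]
Edge = Tuple[str, str]


def build_temporal_network(
    records: Iterable[Record], window_seconds: float
) -> Dict[int, Dict[Edge, int]]:
    """Group traffic records into fixed-width time windows.

    Each window maps a directed (src, dst) pair to the total bytes sent
    between those hosts during that window. The ordered sequence of windows
    is the temporal network: one weighted snapshot graph per time slice.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    snapshots = defaultdict(lambda: defaultdict(int))
    for ts, src, dst, num_bytes in records:
        window = int(ts // window_seconds)  # index of the time slice
        snapshots[window][(src, dst)] += num_bytes
    # Convert nested defaultdicts to plain dicts, ordered by window index.
    return {w: dict(edges) for w, edges in sorted(snapshots.items())}


def snapshot_to_csv(edges: Dict[Edge, int]) -> str:
    """Render one snapshot as an edge list with Source/Target/Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(edges.items()):
        writer.writerow([src, dst, weight])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        (0.5, "10.0.0.1", "10.0.0.2", 1500),
        (12.0, "10.0.0.1", "10.0.0.2", 500),
        (61.0, "10.0.0.3", "10.0.0.2", 800),
    ]
    network = build_temporal_network(sample, window_seconds=60)
    for window, edges in network.items():
        print(f"window {window}:")
        print(snapshot_to_csv(edges), end="")
```

The window width is the main design choice in this sketch: shorter windows expose short-lived traffic bursts, while longer windows give smoother but coarser snapshots.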