AI VILLAGE - It's a Beautiful Day in the Malware Neighborhood
Formal Metadata

Title: It's a Beautiful Day in the Malware Neighborhood
Number of Parts: 322
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/39783 (DOI)
Transcript: English (auto-generated)
00:00
Welcome to AI Village. The next talk is "It's a Beautiful Day in the Malware Neighborhood" by Matt. We'd like to thank our sponsors Endgame, Cylance, Sophos, and Tinder. And of course, silence your cell phones, and if you have an open seat next to you, please raise your hand so people coming in know there's a seat. Thanks. Hey, good afternoon everyone. Even though Cylance is a
00:23
sponsor, don't worry, this isn't a sponsored talk; this tool is completely open source. So again, my name's Matt Maisel, and I'm the manager of security data science at Cylance. Today I'm going to be talking about the use of nearest neighbor search techniques applied to malware similarity, specifically in a tool
00:41
called Rogers. It's open source on GitHub right now, and I have a feature branch where I'm working on some updates from the content I'll present today. This tool is designed for malware analysts and security data scientists to perform malware similarity research. So, just some motivation. Building databases of our malware is interesting for analysts and for data scientists. Search and
01:03
retrieval of similar samples can provide valuable context to analysts and systems. The objective in this case is to build a database, index our malware by some attribute or set of attributes, and then, when we have some unknown sample, query that database. Hopefully, if we've ever seen a similar sample, we get back
01:22
valuable context: maybe other labeled samples, maybe samples that have been reversed and that we have a lot of details on. That's use case number one for these systems. Use case number two is that if it's a sample we've never seen before and it doesn't match anything in our corpus, we can prioritize it for manual analysis or maybe more advanced reverse
01:43
engineering. Finally, the last use case for search and retrieval systems in malware similarity is to augment larger systems that are doing clustering or classification. We can use nearest neighbor search techniques to process incoming alerts and leverage any hits we get back, with their
02:02
context, to determine whether we want to route that sample to other workflows in our environment. Historically this has of course been done with big databases of cryptographic hashes. Fuzzy hashing, notably ssdeep, is also still a standard approach, and still the de facto one, I'd argue. So how does this relate to nearest neighbor search? Consider malware similarity as being performed
02:23
through comparison of raw bytes, or of extracted static and dynamic features that distill the semantic characteristics. We can take these features and represent them in an n-dimensional feature space, and with that we can feed them into a lot of nearest neighbor search algorithms, as well as other machine learning algorithms. Nearest neighbor search, simply put, is the task of being
02:43
given a set of samples X, our corpus, taking a query or unknown sample x_q, querying the index, and getting back the k nearest neighbors according to some distance function. There are many different distance functions that could be used here. A gentleman earlier today mentioned Euclidean; we can use cosine, and we can even use other
03:02
metrics like string metrics that operate on edit distance or Levenshtein distance. Ultimately, nearest neighbor search is hard at scale in high-dimensional spaces, so we have to look at approximate variants that allow some error threshold epsilon, which bounds how far we can be from the true distance whenever we query our index. As a
03:24
really simple example, in this two-dimensional space, if we query this red dot, the k nearest neighbors for k equals three would be these three samples in the inner circle. If this were an approximate variant, there might be a chance that we accidentally return some of the samples in the larger radius here.
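To make the query concrete, here is a minimal brute-force sketch of exact k nearest neighbor search, assuming NumPy (illustrative only, not code from Rogers; the distance function could just as well be cosine or an edit distance):

import numpy as np

def knn_query(X, x_q, k=3):
    # Euclidean distance from the query to every indexed sample (brute force).
    dists = np.linalg.norm(X - x_q, axis=1)
    # Indices of the k smallest distances, i.e. the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    return nearest, dists[nearest]

# Toy corpus of 2-D feature vectors and a query point.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [5.0, 5.0]])
neighbors, distances = knn_query(X, np.array([0.15, 0.15]), k=3)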
03:45
So don't worry, I don't have any algorithms on the slides, and I actually cut a lot of this out, but there's a lot of interesting theory and literature around nearest neighbor search from the past several decades. I categorize the methods into three different areas. First, there are tree-based methods, where we partition our data set into
04:04
cells in our feature space, and use tree data structures to exploit that structure and rapidly look up and identify the right cell, shown here again in a two-dimensional space. We use a tree data structure to quickly look up and identify nearest neighbors in
04:20
that particular cell. There are also hashing-based nearest neighbor methods, where we typically apply a non-cryptographic hash with the property that any small change in the input space only results in a small change in the output space. The idea is that we're actually looking for collisions between similar objects. There are different hashing algorithms we can come up with here;
04:44
locality sensitive hashing is one popular one, where the whole idea is to find hashing functions that take some input, in this case a sample represented by our feature vector, such that similar samples end up in the same bucket, with the same hash code. Ultimately this reduces the number of candidates, the ones that land in the same bucket,
05:03
that we actually have to do distance comparisons on, because doing full pairwise distance is super expensive and no one wants to do that.
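A minimal sketch of one flavor of this idea, random-hyperplane (cosine) LSH; this is illustrative only and not the specific scheme used by any of the systems mentioned later:

import numpy as np

rng = np.random.default_rng(0)

def lsh_bucket_ids(X, n_bits=16):
    # One random hyperplane per bit; similar vectors tend to fall on the
    # same side of most hyperplanes and therefore share a bucket id.
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = (X @ planes) >= 0
    # Pack the sign pattern into a single integer bucket id per sample.
    return bits.astype(int) @ (1 << np.arange(n_bits))

X = rng.normal(size=(1000, 64))
buckets = lsh_bucket_ids(X)
# Only samples that collide in the same bucket get compared with a real
# distance function, which avoids the full pairwise comparison.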
05:20
Finally, a more recent approach to nearest neighbor search is graph-based methods. The general idea is that we build proximity graphs, maybe several layers of graphs stacked together. The algorithms do an initial offline phase of building the graph, connecting neighbors with edges based on their similarity. Then at query time we land in one part of the graph and navigate around,
05:42
basically traversing the graph, potentially across multiple layers, and building up our candidate set for comparison. The downside is that a lot of the graph construction algorithms are extremely expensive: we have to build specific types of graphs during the offline build phase that make them easy to search and traverse in a short amount of time. So
06:03
there's a crap ton of methods out there. I highly recommend checking out the ANN-Benchmarks page on GitHub; there's a paper associated with it as well. Every so often the developer reruns all the latest and greatest implementations of these various nearest neighbor search methods across a wide
06:22
variety of data sets for benchmarking. The typical benchmark used here is the trade-off between queries per second, so how fast we can look up items, and recall, which in this case is the fraction of the true nearest neighbors returned in our search. The general idea is that up and to the right is better, but you can see there's usually
06:42
a trade-off: we can query our index for nearest neighbors very quickly, which matters in a large production system, but at the expense of really low recall. Conversely, if we want high recall, meaning we want our approximate methods to bring back the exact results, we typically trade that off at the cost of queries per
07:01
second. One algorithm to point out here, which I'll get into in a bit, is HNSW, or hierarchical navigable small world. That's this line right here, and it does fairly well. This is just one example, on the New York Times data set for k equals 100. If you go back to their site you'll see that it actually does
07:21
fairly well across a wide variety of data sets, and also does well at varying levels of k. Hence why it's one of the algorithms I specifically picked to look at, and whose implementation I use in Rogers for malware similarity. This method was created recently, in 2017, building on a lot of different algorithms in
07:41
graph-based nearest neighbor search. The basic idea at a high level is to construct a multi-layer graph and use it to greedily identify candidates for comparison. As I alluded to in my overview slide, there's a phase where we construct the graph; then we query candidates through a traversal mechanism and iteratively search
08:01
neighboring nodes until some stopping criterion is met. HNSW defines all of that: the stopping criteria and the way the graph is built. Just to sketch it out, after we've built this graph consisting of multiple layers (it's really multiple layers of graphs), we set parameters for the algorithm that determine how frequently, basically how shallow, a sample ends up
08:24
in one of these layers. We start from the top layer here and go down to the bottom layer. The idea at query time is that once we've constructed these layers of graphs, we start at some point and navigate. In this case there's only one sample, so it searches the neighborhood, goes right to the sample, eventually
08:42
reaches a local minimum because there are no other neighbors to look for, and then drops down to the next layer. This process continues until we eventually reach the final layer. It can also be tuned at query time to determine how deep into the layers of graphs we'd like to go.
09:00
Ultimately, the paper's construction ensures that any samples we visit across all the layers are likely to be nearest neighbors, and that's what we use to determine the candidates we end up doing comparisons against for our query. So that's a graph-based method.
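For reference, the build-then-query flow looks roughly like this with one common Python binding, hnswlib (parameters here are illustrative, and this may not be the exact binding or settings Rogers uses):

import numpy as np
import hnswlib

dim = 256
X = np.random.rand(10_000, dim).astype("float32")

# Offline build phase: construct the multi-layer graph over the corpus.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=X.shape[0], ef_construction=200, M=16)
index.add_items(X, np.arange(X.shape[0]))

# Query phase: ef controls how much of the graph is explored per query.
index.set_ef(50)
labels, distances = index.knn_query(X[:1], k=5)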
09:20
There's also another really recent method that I caught at the NIPS workshop, a machine learning research conference, back last fall. It's really interesting: it's called prioritized dynamic continuous indexing, or PDCI, and there's an earlier iteration of it just called dynamic continuous indexing. The authors actually designed an exact randomized algorithm. It's built
09:43
around the idea of avoiding partitioning samples by vector space. Going back to the example of the tree-based methods, where we split up our feature space along each feature, that approach has a lot of issues. The PDCI authors noticed that instead we can just build indices that
10:03
take our samples and project them along a random direction. We can control the number of indices we build, which largely determines how well the method works. So we construct multiple indices, and the main gist is that as you visit each index and query it, you end up at the place where
10:24
your query is projected to. Any samples that are nearby in projection, whether larger or smaller, you pop off. And if a sample ends up appearing in all of the indices, the paper shows it's highly likely to be one of the exact nearest
10:40
neighbors, so you add it to the candidate set for comparison. This is particularly interesting because it's an exact nearest neighbor search method, and some of the guarantees in the paper are pretty compelling. Unfortunately, because it's an academic paper (the author is a very well respected PhD
11:01
student, I think at Berkeley), there's no open source implementation yet. So I went ahead and tried to do a naive implementation in Rogers. So that's PDCI. Again, these two algorithms are the ones I focus on for this talk.
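Here is a very loose sketch of the core DCI intuition, random one-dimensional projections with candidates that stay close to the query in all of them; this is nowhere near the prioritized algorithm from the paper (or the Rogers code), just the general shape of the idea:

import numpy as np

rng = np.random.default_rng(0)

class ToyDCI:
    def __init__(self, X, n_indices=10):
        self.X = X
        # One random direction per index; each sample is stored by its
        # scalar projection onto that direction.
        self.dirs = rng.normal(size=(n_indices, X.shape[1]))
        self.proj = X @ self.dirs.T  # shape (n_samples, n_indices)

    def query(self, x_q, k=5, per_index=50):
        q = self.dirs @ x_q
        hits = {}
        for j in range(self.dirs.shape[0]):
            # Samples whose projection lies closest to the query's projection.
            near = np.argsort(np.abs(self.proj[:, j] - q[j]))[:per_index]
            for i in near:
                hits[i] = hits.get(i, 0) + 1
        # Keep samples that are near the query in every projection, then rank
        # the survivors by their true distance.
        cand = [i for i, c in hits.items() if c == self.dirs.shape[0]] or list(hits)
        d = np.linalg.norm(self.X[cand] - x_q, axis=1)
        order = np.argsort(d)[:k]
        return [cand[i] for i in order], d[order]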
11:22
So, other malware similarity systems: there are quite a few that take different approaches to nearest neighbor search, or to similarity in general. Of course VirusTotal has different ways to index data, using ssdeep, but also a clustering API that, from my understanding of the docs, is based on feature hashing of the structural data pulled out by static feature
11:42
extraction. That's actually where I sourced some of my data sets for evaluating these methods, which, as you'll soon see, was to my detriment unfortunately. Our very own Brian Wallace, one of the AI Village core team members, wrote a blog post in Virus Bulletin a few years back that
12:02
exploited the way the ssdeep message digest is built in order to eliminate the number of comparisons needed. More recently, you can take that idea and apply it to Elasticsearch as well, so you can use an off-the-shelf database for the same kind of indexed ssdeep approach.
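For context, plain (un-indexed) ssdeep comparison looks like this with the Python ssdeep bindings; the file paths are made up, and the indexed variants described above avoid running compare() against every stored digest by first matching on chunks of the digest:

import ssdeep  # Python bindings for the ssdeep/libfuzzy fuzzy-hash library

h1 = ssdeep.hash(open("sample_a.bin", "rb").read())  # hypothetical file
h2 = ssdeep.hash(open("sample_b.bin", "rb").read())  # hypothetical file

# Similarity score from 0 (no match) to 100 (near-identical).
score = ssdeep.compare(h1, h2)
print(h1, h2, score)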
12:24
So again, that's using one of these similarity digest methods, which fall into the larger group of hashing-based nearest neighbor methods. Then there are the popular academic implementations of malware similarity systems. BitShred, from back in 2011, is highly cited; it
12:40
uses pairwise Jaccard similarity, and it uses Hadoop to do it, because fully pairwise comparison is very expensive, hence the need for Hadoop. There's also the malware provenance system, which is a little more recent, that uses MinHash, a locality sensitive hashing family that approximates Jaccard similarity,
13:02
applied across a sliding window of n-gram features on disassembled samples. Those are two of the final ones here. There's also Malheur, which focuses more on behavioral feature similarity comparisons, specifically for Cuckoo. It has a lot of
13:20
different capabilities built in for clustering and also classification, but under the hood it uses behavior features for prototype identification, or prototype selection. You can identify prototypes within a large cluster, pretty much like a centroid around those points, and use those as a way to quickly do comparisons. Then there's also SARVAM, which takes some
13:45
computer vision ideas from image indexing: it takes a binary as raw bytes, converts it into a grayscale image, and then indexes that. So there's a handful of systems out there and a lot of algorithms to choose from, each with different properties, again going back to that performance trade-off of queries per second versus recall.
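The byte-to-image trick is roughly the following sketch, under my reading of that approach and with a hypothetical file path; the real system then indexes image features computed from the resulting array:

import numpy as np

def bytes_to_grayscale(path, width=256):
    # Interpret the binary's raw bytes as pixel intensities and reshape them
    # into a 2-D array, one row per `width` bytes.
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    rows = len(data) // width
    return data[: rows * width].reshape(rows, width)

# img = bytes_to_grayscale("sample.bin")  # hypothetical path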
14:02
So when approaching the design of a system to do malware similarity, and specifically to use that system to evaluate different nearest neighbor search techniques, I defined four key design ideas. Number one, of course, we
14:21
have to extract and store sample metadata from our raw features. We then have to transform that raw data into some feature representation, again in this n-dimensional feature space. We might have a variety of vectorization pipelines we want to experiment with; talks earlier today mentioned a few, like TF-IDF, and
14:42
I use feature hashing specifically in my more recent approach. So we might want to swap out that vectorization pipeline depending on what we want to evaluate, what features we want to include, and also how large our data set might be. After we transform our features with one of these pipelines, we then fit the different nearest neighbor methods and do some bookkeeping, maybe
15:02
saving some of the database structures that are required. That's the fit stage. Finally, once we've fit all these indices, we want to actually query samples, picking the parameter k to determine how many nearest neighbors we pull back. And possibly, if we have that database of sample metadata, we might also want to display
15:23
contextual features: maybe we have a case database of previous incidents in our environment and we want to annotate the samples we pull back with some of that. It might help give the analyst more context.
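Those four steps, extract, transform, fit, query, look roughly like this with stock scikit-learn pieces (feature names and values here are invented, and Rogers has its own classes for each stage):

from sklearn.feature_extraction import FeatureHasher
from sklearn.neighbors import NearestNeighbors

# 1. Extract: per-sample metadata as sparse dicts of feature name -> value.
samples = [
    {"section:.text": 1, "import:CreateFileA": 1, "entropy": 6.1},
    {"section:.text": 1, "import:RegSetValueExA": 1, "entropy": 7.4},
    {"section:UPX0": 1, "import:LoadLibraryA": 1, "entropy": 7.9},
]

# 2. Transform: feature hashing into a fixed-width n-dimensional space.
hasher = FeatureHasher(n_features=2**16)
X = hasher.transform(samples)

# 3. Fit: build the nearest-neighbor index over the corpus.
index = NearestNeighbors(metric="cosine").fit(X)

# 4. Query: vectorize the unknown sample the same way, pick k, pull neighbors.
x_q = hasher.transform([{"section:.text": 1, "import:CreateFileA": 1}])
distances, ids = index.kneighbors(x_q, n_neighbors=2)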
15:42
What I came up with from all of that is this design for Rogers. Rogers is a Python 3 application that has a sample class; in this case it's really only a sample class consisting of the PE class, which focuses on basic static feature extraction using pefile. But it's built in a way that you
16:00
could expand the number of sample classes if you bring in things other than just portable executables. Then there are vectorizers: these use the scikit-learn pipeline APIs, which I rely on extensively, and right now I have a latent semantic analysis vectorizer which, as an earlier talk mentioned, uses TF-IDF and then projects down.
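That LSA vectorizer is conceptually just TF-IDF followed by a truncated SVD projection; a minimal sketch with made-up token strings (not Rogers' actual pipeline definition):

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Pretend each sample has been flattened to a string of static-feature tokens.
docs = [
    "kernel32 CreateFileA .text .rdata",
    "kernel32 RegSetValueExA .text .rdata",
    "UPX0 UPX1 LoadLibraryA GetProcAddress",
]

# TF-IDF weighting, then project down to a small dense feature space.
lsa = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"),
                    TruncatedSVD(n_components=2))
X = lsa.fit_transform(docs)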
16:23
More recently, because I started getting into data sets that were a little larger to handle, I started looking at feature hashing approaches. This can be extended with anything supported by scikit-learn, or with other vectorizers as well. The final component is the index class, where I have implementations or use
16:42
libraries, for example for HNSW, or LSHForest in scikit-learn, which is going to be deprecated soon anyway. I also have implementations for indexed ssdeep and for PDCI, and at this point they use SQLite as the store, at least for the indexed ssdeep and PDCI
17:02
methods. All the feature data, I forgot to mention, is stored in protobuf, in a message definition with a structure that lets you add different modalities of features. So you can add static features, dynamic features, and contextual features, and then, if you want, give each one a
17:21
variable type, if you'd like to automatically build feature vectorizers later on, but that's not really supported yet in the vectorizer class. Cool, so now, unfortunately, the sad panda part of my talk. Doing these types of experiments and getting data sets for malware similarity is difficult, and
17:41
unfortunately I didn't get as much time for the experiments as I wanted. What we're looking at here are two charts. This chart is recall at k for exact nearest neighbors: on the X axis we have the values of k we picked for each experiment, and on the Y axis is the recall. Again,
18:02
this is the fraction of the true nearest neighbors returned for the query. On the other side we have precision at k for the neighborhood class. This is a similar kind of metric, where we're looking at the relevant documents over k, or, sorry, the relevant results returned over the total number of relevant documents
18:22
in our data set. In this case I'm just using the class labels for the samples to say: if I query a sample and all the results that come back are the same class (because I have labels for them), that indicates high precision at k. If you look at this chart, it does very well for
18:41
these samples, and I'll illustrate why in a second. But the actual exact nearest neighbor results are pretty bad: we get about 0.3 for PDCI, and HNSW is also pretty low. And I did do parameter selection, a grid search over parameters for each of these methods, and tried to come up with something, but I really couldn't get
19:02
anywhere. Just to highlight it again, this data set comes from the VT clustering data set, with about twenty-seven thousand samples and fifteen classes. So, quick demo time, to illustrate at least the interface. Rogers is exposed as a command line application, so if you want to use it on the CLI you can, but I also have APIs
19:23
exposed so that you can import Rogers and build an index, and here I use Plotly just to visualize some of the results. I'll blow this up a little bit real quick. Like I said, it's still probably hard to
19:41
see in the back, but the idea is that I've previously fit an index, this one specifically HNSW, and I have some standard APIs here for passing in a sample, setting the number k, and getting back the neighbors. In this case, if we clear this, we can see we get back the sample; here's the query sample, and I'm just
20:01
doing some print statements to make it easy to see what we're getting back. Then here are the nearest neighbors, and look how similar these are; this is cosine similarity. The graph just displays them, and if we look at some of the features, here's the query sample, which has the Lamer label, and if we look at some of these other samples,
20:21
one is RegRun, there's another one too, and yeah, totally different. So the labels themselves are different, and what I realized after getting into this is that, given that my feature space is limited to the static feature extraction methods, and given that a lot of these
20:42
are, I think, just packed samples, everything really looks identical. So I think that might explain why I had really bad results, and it illustrates the need for me to get better data sets for evaluation. So, just as a little bonus: Xori was released at Black Hat, and because of the frustration I had with the feature extraction around the basic static stuff, I went ahead and
21:02
implemented an Xori feature extraction class for PE. In this case I only got as far as pulling out a bag of words of the mnemonics, so this is just an example of running Xori on this particular sample. I was able to run it on around 500 samples and built a vectorizer.
21:20
Now we can rerun the same query (sorry, this is a different sample) and leverage the bag-of-words mnemonics in addition to the other static features that are pulled out, and it gives us slightly different results. Again, I haven't done any formal comparison of these methods.
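The bag of words over mnemonics amounts to something like this; the mnemonic sequences below are invented, and the extraction step from the disassembler output is assumed to have already happened:

from sklearn.feature_extraction.text import CountVectorizer

# One space-separated mnemonic sequence per sample, e.g. pulled from
# disassembler output (sequences here are made up).
mnemonic_docs = [
    "push mov call pop ret mov xor jmp",
    "push push call add ret xor xor jmp",
]

# One column per mnemonic, counts per sample; these columns can then be
# stacked alongside the basic static features before indexing.
vectorizer = CountVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(mnemonic_docs)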
21:41
So this kind of wraps things up, I think. The general idea is that Rogers is a tool for experimenting with different nearest neighbor search techniques, but it's also a tool for building out vectorizers for the different methods. Doing similarity in your environment might depend on your use cases: you might only want to do static comparison,
22:02
you might only want to do dynamic, you might want to do both, and you might want to apply an automated disassembler like Xori. At least in the vectorizers I published with this tool, it's very limited to PE, so there's definitely opportunity for different modalities. There are also opportunities for doing feature selection and learning representations, to
22:22
come up with a better feature space that could be used for similarity comparison. For experiments, I did run some parameter optimization in my case, but unfortunately I just need more data sets for doing benchmarks. I also think it would be interesting to evaluate different distance metrics beyond just Euclidean for some of this,
22:40
and use that to determine similarity for some of these methods; for example, HNSW uses cosine by default, but PDCI is Euclidean. And finally, more use cases: here we're only indexing malware samples, but you could also index benign samples as well, and of course the key is being able to
23:02
continuously update the index with new samples as they come in and have been classified or analyzed. So doing a partial fit, or an insert operation, would be pretty easy to extend as well. Cool, so at this point, questions. Again, this tool is up on GitHub. I do have a feature branch, I apologize,
23:22
I've got to get it out; it's just been crazy the past week. That feature branch will add feature hashing, will add PDCI, and will publish the experiment. Again, I do feel the experimental results were pretty weak, but maybe that can be explained by the data set I was using; it would certainly be interesting to experiment more. So yeah, pull
23:42
requests are welcome. Any questions at all? Go ahead. Sorry, go. Show the vectorizers? Yeah, so, Xori... actually, because I was experimenting with this one right here: this is a
24:06
signature vectorizer, and this one is actually using the YARA rules repo. I just used the YARA detections as features themselves and tried to figure out, applying TF-IDF in this case, which signatures are more
24:21
useful compared to others. Go ahead. Oh, so yeah, in this case there really isn't any feature selection, other than applying TF-IDF and then projecting down. Any final questions? I'm sorry, I can't hear
24:47
you at all. I apologize, I still can't hear you. Oh, the metrics and the distance?
25:01
So yeah, as I mentioned, HNSW uses cosine and PDCI is using Euclidean, but the metrics are just this recall at k. Basically, I do exact nearest neighbor search on my data set using brute force, and I compute that for Euclidean and for cosine, and those are
25:20
basically the ground truth I use in this case for the recall at k, for the exact neighbors.
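In other words, the recall-at-k number for a query is just the overlap between the approximate result and the brute-force ground truth; a small sketch with made-up neighbor ids:

def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true k nearest neighbors the approximate index returned.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

exact = [4, 17, 23, 42, 99]    # brute-force neighbors (Euclidean or cosine)
approx = [4, 23, 7, 42, 101]   # what the approximate index returned
print(recall_at_k(approx, exact))  # 0.6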
25:41
Yeah, so as I mentioned, I don't do any feature learning or anything like that, and I also think that going beyond some of the basic static features would obviously change the results significantly, so I hope to do that in the future. Oh sorry, didn't see you. Right.
26:16
Yeah, so just a little background: the question is basically to do more experiments. Experiments are fun;
26:21
sometimes they're sad, but you've always got to push forward to the next one. The other question was what did I learn doing this implementation. It was definitely a lot of fun to do the PDCI implementation; again, it's really naive, in Python 3, using SQLite to store those indices. It was great to have the chance
26:40
to go through a paper that has no published source code, no published implementation for reference, and try to implement it, and there were also a lot of Python 3 things I picked up during this project. Yeah, so those
27:03
algorithms don't do any feature engineering and they don't do any feature selection; you basically pass your feature vector into the index, so that's a pre-processing step, which in this case I did at the vectorization phase. Cool, thanks everyone.