We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

AI VILLAGE - It's a Beautiful Day in the Malware Neighborhood

Formal Metadata

Title
AI VILLAGE - It's a Beautiful Day in the Malware Neighborhood
Title of Series
Number of Parts
322
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Malware similarity analysis compares and identifies samples with shared static or behavioral characteristics. Identification of similar malware samples provides analysts with more context during triage and malware analysis. Most domain approaches to malware similarity have focused on fuzzy hashing, locality sensitivity hashing, and other approximate matching methods that index a malware corpus on structural features and raw bytes. Ssdeep or sdhash are often utilized for similarity comparison despite known weaknesses and limitations. Signatures and IOCs are generated from static and dynamic analysis to capture features and matched against unknown samples. Incident management systems (RTIR, FIR) store contextual features, e.g. environment, device, and user metadata, which are used to catalog specific sample groups observed. In the data mining and machine learning communities, the nearest neighbor search (NN) task takes an input query represented as a feature vector and returns the k nearest neighbors in an index according to some distance metric. Feature engineering is used to extract, represent, and select the most distinguishing features of malware samples as a feature vector. Similarity between samples is defined as the inverse of a distance metric and used to find the neighborhood of a query vector. Historically, tree-based approaches have worked for splitting dense vectors into partitions but are limited to problems with low dimensionality. Locality sensitivity hashing attempts to map similar vectors into the same hash bucket. More recent advances make the use of k-nearest neighbor graphs that iteratively navigate between neighboring vertexes representing the samples. The NN methods reviewed in this talk are evaluated using standard performance metrics and several malware datasets. Optimized ssdeep and selected NN methods are implemented in Rogers, an open source malware similarity tool, that allows analysts to process local samples and run queries for comparison of NN methods.