
Detecting random strings; a language based approach


Formal Metadata

Title
Detecting random strings; a language based approach
Alternative Title
Detecting Randomly Generated Strings
Number of Parts
109
Author
Mahdi Namazifar
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Numerous botnets employ domain generation algorithms (DGA) to dynamically generate a large number of random domain names, from which a small subset is selected for their command and control. A vast majority of DGA algorithms create random sequences of characters. In this work we present a novel language-based technique for detecting strings that are generated by chaining random characters. To evaluate the randomness of a given string (a domain name in this context), we look up substrings of the string in the dictionary that we've built for this technique, and then we calculate a randomness score for the string based on several different factors, including the length of the string, the number of languages that cover the substrings, etc. This score is used for determining whether the given string is a random sequence of characters. In order to evaluate the performance of this technique, on the one hand we use 9 known DGA algorithms to create random domain names as DGA domains, and on the other hand we use domain names from the Alexa 10,000 as likely non-DGA domains. The results show that our technique is more than 99% accurate in detecting random and non-random domain names.

Speaker Bio: Mahdi Namazifar is currently a Senior Data Scientist with the Talos team of Cisco Systems' San Francisco Innovation Center (SFIC). He received his PhD in Operations Research from the University of Wisconsin-Madison in 2011. His PhD work was on theoretical and computational aspects of mathematical optimization. During his PhD, Mahdi was also affiliated with the Wisconsin Institute for Discovery (WID) and the French Institute for Research in Computer Science and Automation (INRIA). He was also a National Science Foundation (NSF) grantee at the San Diego Supercomputer Center in 2007 and a Research Intern at the IBM T.J. Watson Research Lab in 2008. After graduate school and before his current position at Cisco, he was a Scientist at Opera Solutions, working on applications of machine learning to a variety of problems from industries such as healthcare and finance.
Transcript: English (auto-generated)
Currently I'm a senior data scientist with Twitter, but before that I was lucky enough to be part of the amazing Talos team at Cisco, and this work on detecting random strings is something I did when I was with Talos. So first I want to give you the definition of the problem that I'm trying to address here. You're given an arbitrary string and you want to decide whether this string is a random sequence of characters. One thing to note here is that I said a random sequence of characters; this work does not address strings that are random sequences of dictionary words. The other thing is that my focus is on strings that are at least eight characters long, because for anything shorter than that it is very difficult, even for a human being, to detect randomness. So I'm focusing on strings of length eight or more. So why do
we even look at this problem? Our motivation for this was detecting domain names that are generated using domain generation algorithms. You know better than I do how these are used, and this is not a new problem. This problem has been
studied quite a lot; there's a rich literature around it, at least some of which I'm aware of, including a bunch of works done at Cisco. Usually these works look at this as a classification problem and take a machine learning classification approach to solve it, but here my approach is a little bit different. So I'll give you the big picture of the approach in one slide, and then I'll go deeper into
the details of it. So the first thing I do is put together as many dictionaries as I can find from different languages, and basically out of these I want a word list, as many words as possible. Once I have this, I put them all together and call it the mega dictionary. How do I use it? I basically take the arbitrary string that I'm given, I take substrings of it, and I
look them up in this mega dictionary. So based on the number of dictionary hits that I find out of these substrings, based on the length of these substrings, and based on the different languages that these substrings are from, I come
up with a randomness score, and based on this randomness score, I determine whether or not this is a random string. The idea is that if we can see how the substrings of a string can be covered by words from different languages, then we have an idea of whether it is a random string or not.
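To make that big picture concrete, here is a minimal, self-contained toy version in Python. The toy dictionary, the brute-force substring search, the coverage-based score, and the 0.5 threshold are all illustrative placeholders, not the actual implementation described in the rest of the talk.

```python
# Toy stand-in for the mega dictionary described below.
TOY_MEGA_DICTIONARY = {"face": ["english"], "book": ["english", "polish", "dutch"]}

def dictionary_hit_spans(s, mega_dictionary, min_len=3):
    """Brute-force search for substrings of s that are dictionary words."""
    return [(i, j) for i in range(len(s)) for j in range(i + min_len, len(s) + 1)
            if s[i:j] in mega_dictionary]

def looks_random(s, mega_dictionary=TOY_MEGA_DICTIONARY):
    """Crude stand-in for the real scoring: the smaller the fraction of the
    string covered by dictionary words, the more random it looks."""
    covered = set()
    for i, j in dictionary_hit_spans(s, mega_dictionary):
        covered.update(range(i, j))
    score = 1.0 - len(covered) / len(s)   # placeholder randomness score
    return score > 0.5                    # placeholder threshold

print(looks_random("facebook"))   # False: fully covered by "face" + "book"
print(looks_random("xkqzvbnp"))   # True: no dictionary hits
```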
So I'll get into some details here. The first part is the mega dictionary. First, I try to find as many language-based dictionaries as I can, basically from the Internet. These are almost 70 different languages that I found dictionaries for. Some of them
are constructed languages. For some of them, for instance English, I have multiple versions of the dictionary, like British English and American English dictionaries. So these are the languages that I was able to find free dictionaries for. I should also note here that out of these dictionaries, I only want the key, not the value: in each dictionary you have the word and you have its definition, and I don't care about the definition, I just want the words. So that was the languages. I also get some names based on census data: female names, male names, surnames. Also, I get a list
of Scrabble words. These are words that are not necessarily in the English dictionary; they could be acronyms, for instance. These two items were given to me by my good friend and former colleague, Adam Katz, and they
helped a lot, actually. The next thing is I get the Alexa 1,000 domain names and add them to my word list. This is again important: the word Yelp or the word eBay might not be in any dictionary, but these are actually important words. I also add some numbers, and I've also gotten my hands on
a dictionary or a list of texting acronyms. YOLO, TTYT, BRB, things like that. So for some of these words, I need to
do some special treatment. For instance, for words coming from Eastern European languages, I need to get rid of the accents on the characters. For Mandarin, I need to take the characters and basically transliterate them to Roman characters, and for that I use the Pinyin standard. For Russian and Ukrainian, I needed to use a special decoding, and I also needed to take care of the fact that I and Y in this decoding are used interchangeably, and a bunch of other special treatments like that. The next thing I need to note here is that the same word might appear in multiple dictionaries.
The word book appears in at least these three dictionaries: English, Polish, Dutch. To take care of this, I run a MapReduce job to find all the dictionaries that a given word appears in. So the result of this looks something
like this: for each given word that I have in these dictionaries, I have a list of the dictionaries that the word appears in. So here in this example, Sui appears in the French dictionary, in the Catalan dictionary, and a bunch of others. So this is, at the end, what my mega dictionary looks like: it's a Python dictionary, where the keys are the words and the values are lists of dictionary names.
The lookup complexity is constant, so it's pretty fast to look up anything here, and at the end I have about 12 million words in this mega dictionary. So that was the dictionary part.
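As a rough illustration, here is a minimal sketch of how such a mega dictionary could be assembled in Python, assuming each language's word list has already been loaded; normalization is reduced to lowercasing and accent stripping for brevity.

```python
import unicodedata
from collections import defaultdict

def strip_accents(word):
    """Drop combining marks, e.g. for accented Eastern European characters."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_mega_dictionary(word_lists):
    """word_lists: iterable of (language_name, iterable_of_words) pairs.

    Returns a dict mapping each normalized word to the list of languages
    (dictionaries) it appears in, e.g. {"book": ["dutch", "english", "polish"]}.
    Lookups on the result are constant time.
    """
    mega = defaultdict(set)
    for language, words in word_lists:
        for word in words:
            mega[strip_accents(word.strip().lower())].add(language)
    return {word: sorted(langs) for word, langs in mega.items()}

# Example usage with toy word lists:
mega = build_mega_dictionary([
    ("english", ["book", "face"]),
    ("polish",  ["book"]),
    ("dutch",   ["book"]),
])
print(mega["book"])   # ['dutch', 'english', 'polish']
```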
Now, how am I going to use this dictionary for detecting random strings? I find substrings of a given string and look them up in this dictionary. How do I do that? I do it by traversing strings, and this is how I define traversing a string. I
can traverse a string from left to right this way: I have two indices, one at the beginning and one at the end. If I'm traversing from the left, I fix the right index and move the left index one position at a time, and as a result these are the substrings that I get. If I traverse the same string from right to left, I get these substrings. It will become a bit clearer later why I do it once from right to left and once from left to right, but let's just fix the definition here and then look at an example.
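Here is a minimal sketch of one way this traversal could be written in Python, under one reading of the definition: the left-to-right pass fixes the right index at the end of the string and advances the left index, and the right-to-left pass is the mirror image. The exact substrings shown on the slides may differ.

```python
def traverse(s, from_left=True):
    """Yield the substrings visited while traversing s in one direction.

    from_left=True:  fix the right index at the end, move the left index
                     one position at a time -> s, s[1:], s[2:], ...
    from_left=False: fix the left index at the start, move the right index
                     one position at a time -> s, s[:-1], s[:-2], ...
    """
    n = len(s)
    for step in range(n):
        yield s[step:] if from_left else s[:n - step]

print(list(traverse("facebook")))                    # ['facebook', 'acebook', ..., 'k']
print(list(traverse("facebook", from_left=False)))   # ['facebook', 'faceboo', ..., 'f']
```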
So how do I find the substrings of a given string? Let's say that we are dealing with the English dictionary only, and let's say that this is my given string. I start traversing this string from the left, and at each step I find the substring and look it up in my mega dictionary. I continue until I find a hit. So none of these substrings appeared in my English dictionary until I hit this word, which did appear in my English dictionary. So I
take that word out of my string, put it aside, and now I'm left with this substring. Again, I reset the indexes and start the traversal from the left again, and these are the substrings. None of them are in the English dictionary until I find this word, which is a hit. I take it out, I'm left with this substring, and so on. So at the end, I get these three words out of the given string. I did this once from
the left. I do the same thing once from right to left. From left to right I get this list, and from right to left I get this list. You see that? These two passes, once from the right and once from the left, give me a better chance of finding the right words in the given string. So I need to pick between these two lists, and because the minimum length of the words in this list is four, I pick this one: the longer the words you find in the string, the higher the chance that they are not there by chance. So that was just looking up in the English dictionary.
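Building on the traversal sketch above, here is a minimal sketch of how the word extraction and the choice between the two passes could look. The min_len cutoff and the exact removal logic are assumptions; the rule of preferring the list whose shortest word is longer is the one described in the talk.

```python
def extract_words(s, dictionary, from_left=True, min_len=3):
    """Greedily pull dictionary words out of s using one traversal direction.

    min_len is an assumed cutoff so that single letters don't count as hits.
    """
    words = []
    remaining = s
    while remaining:
        hit = None
        for sub in traverse(remaining, from_left):        # traversal sketch above
            if len(sub) >= min_len and sub in dictionary:
                hit = sub
                break
        if hit is None:                                   # no more hits: stop
            break
        words.append(hit)
        # Remove the hit and keep going on what is left of the string.
        if from_left:
            remaining = remaining[:len(remaining) - len(hit)]  # hit was a suffix
        else:
            remaining = remaining[len(hit):]                   # hit was a prefix
    return words

def best_word_cover(s, dictionary):
    """Run both passes and keep the list whose shortest word is longer."""
    left = extract_words(s, dictionary, from_left=True)
    right = extract_words(s, dictionary, from_left=False)
    if not left or not right:
        return left or right
    return left if min(map(len, left)) >= min(map(len, right)) else right
```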
But what about the case where we have built a dictionary based on almost 70 different languages? How do we use that? Let's say that we are looking at this string, and let's say that I found these substrings in it, and these are all the dictionaries that each one of them appears in. So the question here is: at the end of the day, how many languages do I need to cover these substrings that I found? If I take the union of these dictionary lists, it's going to be way too many. If I take the intersection, it's going to be empty; they don't have any common element. So I need to find the minimal set of dictionaries, or languages, that covers these substrings. So how
do I do that? That's actually the minimum hitting set problem. It's a very well-known, very well-studied problem, and it's basically the father of a bunch of other well-known problems such as the set covering problem. This is the definition of the problem; I don't want to get into it, you can always look it up. Unfortunately, this is an NP-hard problem. But the good news here is that our sets are small enough that even if I do a greedy search in the space of possible solutions, I
can find the minimal hitting set. So that's exactly what I do: I have this very simple greedy algorithm for finding the minimal hitting set. It is by no means the optimal or best greedy algorithm for this, but I don't care, because it runs in a very small fraction of a second and gives me what I need. That's good enough for me. So back to our example. We had these sets that we
wanted to find the minimum hitting set of, and by just applying a very simple greedy search, I find these minimum hitting sets, and these are the cardinalities of these sets too. So my minimum hitting set number is two, meaning that I need at least two different languages to cover the substrings that I found.
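Here is a minimal sketch of a simple greedy heuristic of the kind described, assuming the input is the list of language sets for the substrings that were found; this is a generic greedy hitting-set routine, not necessarily the exact algorithm from the talk.

```python
def greedy_hitting_set(sets):
    """Greedy approximation of the minimum hitting set.

    sets: list of sets of language names, one per substring found.
    Returns a small set of languages that intersects every input set,
    by repeatedly picking the language that hits the most uncovered sets.
    """
    uncovered = [s for s in sets if s]
    chosen = set()
    while uncovered:
        # Count how many still-uncovered sets each candidate language would hit.
        counts = {}
        for s in uncovered:
            for lang in s:
                counts[lang] = counts.get(lang, 0) + 1
        best = max(counts, key=counts.get)
        chosen.add(best)
        uncovered = [s for s in uncovered if best not in s]
    return chosen

# Toy example: three substrings and the dictionaries they appear in.
langs = greedy_hitting_set([
    {"french", "catalan", "italian"},
    {"english", "french"},
    {"english", "dutch"},
])
print(len(langs), langs)   # 2 languages suffice, e.g. {'english', 'french'}
```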
So based on this minimum hitting set number, the length of the string itself, the percentage of it that was covered by the substrings I found, the sum of the lengths of the words found in the string, and the lengths of the substrings themselves, I define a randomness score, and that becomes my touchstone for detecting randomness. One last
thing to mention here is that I do it twice. I run the string first against the English language only, since English is, in a sense, universal and many people use it. If, according to English alone, I don't have a verdict about the randomness of this string, next I go to all the languages. So this is basically a two-phase filter.
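As a rough illustration of this two-phase idea, here is a sketch that reuses the helpers above: the string is first checked against English only and, only if that gives no verdict, against all languages. The score function, its weights, and the thresholds are all made up for illustration; the talk only lists the ingredients of the score.

```python
def randomness_score(s, cover_words, n_languages):
    """Hypothetical score combining the features named in the talk: string
    length, fraction of the string covered by dictionary words, and the
    minimum hitting set number. The weights below are made up."""
    covered = sum(len(w) for w in cover_words)
    coverage = covered / len(s) if s else 0.0
    return (1.0 - coverage) + 0.1 * n_languages + 0.01 * len(s)

def looks_random_two_phase(s, english_words, mega_dictionary):
    # Phase 1: English only. The 0.4 / 0.9 thresholds are placeholders.
    words = best_word_cover(s, english_words)              # sketch above
    score = randomness_score(s, words, 1)
    if score < 0.4:
        return False                                       # clearly not random
    if score > 0.9:
        return True                                        # clearly random
    # Phase 2: no verdict yet, so repeat against all languages.
    words = best_word_cover(s, mega_dictionary)
    sets = [set(mega_dictionary[w]) for w in words]
    n_languages = len(greedy_hitting_set(sets)) if sets else 0
    return randomness_score(s, words, n_languages) > 0.9
```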
There are a bunch of other considerations here. If you have a sequence of alternating vowels and consonants (try this yourself: sit down and write out a random sequence of alternating vowels and consonants), it looks pretty legitimate; you would think it has to be a word in some language somewhere. So I penalize the score if I see something like this. Another thing I consider is that if I see a dash or underscore in the string, it means there is some natural separation at that point, because otherwise why would there be an underscore or a dash in the middle of a string? So I treat that as a separation and look at each of the separate pieces separately.
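These two considerations are easy to sketch; the helpers below are illustrative, and the actual penalty applied in the score is not specified in the talk.

```python
import re

VOWELS = set("aeiou")

def alternates_vowel_consonant(s):
    """True if the letters strictly alternate between vowels and consonants,
    which the talk penalizes because such strings look deceptively word-like."""
    letters = [c for c in s.lower() if c.isalpha()]
    if len(letters) < 2:
        return False
    return all((letters[i] in VOWELS) != (letters[i + 1] in VOWELS)
               for i in range(len(letters) - 1))

def split_on_separators(s):
    """Dashes and underscores mark natural separations, so score each piece on its own."""
    return [piece for piece in re.split(r"[-_]", s) if piece]

print(alternates_vowel_consonant("dilifoweme"))   # True: pronounceable-looking but random
print(split_on_separators("my-random_string"))    # ['my', 'random', 'string']
```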
So now I want to get into some results here, based on the experiments that we ran, and I look at both false positive and false
negative rates for this. Let's first look at false negatives. I'm using nine domain generation algorithms; these are from known malware and botnets, and they were reverse engineered by the Talos team. They gave me the code and I basically generated these samples. So this is the number of samples that comes from each of these algorithms, and this is the number of strings generated by
this specific algorithm that were missed by my randomness detection. So if you look at the missed percentages, I don't know how clear it is on the screen, but basically the rate is pretty low. Across the board, it's probably around 1%,
less than 1% false negative rate. So how about false positives? For this, I took the Alexa 10,000 domain names and filtered out strings that are shorter than eight characters; I put them aside, as I mentioned in the first slide. So I'm left with almost 5,400 domain names, and I ran them through the code. The rationale here is that in the Alexa 10,000 you're unlikely to see lots of DGAs, so hopefully a lot of those are legitimate websites. And out of these 5,400 domains that I checked, 42 of them were
detected by my algorithm as being random. And this is the whole list of them. Some of them, like for instance this one, this one, if you showed it to me, I would say this is a random string. Or I don't know, like this one. This is
pretty random looking to me. But some other ones are not random. Like for instance, as a matter of fact, I know that this is a pretty legitimate Turkish website. Or this one is a legitimate Farsi website. So at the end, out of 5,400, we
have 42, so we can say that it's about a 1% false positive rate, which is not bad. So 1% false positives and 1% false negatives, and overall I think that's a good
rate compared to other studies that I've seen on this problem. And this would conclude my talk. Thank you very much.