AI VILLAGE - AI DevOps: Behind the Scenes of a Global Anti-Virus's Machine Learning Infrastructure

Cite

DEF CON

Long, Alex

Formal Metadata

Title

AI VILLAGE - AI DevOps: Behind the Scenes of a Global Anti-Virus's Machine Learning Infrastructure

Title of Series

DEF CON 26

Number of Parts

322

Author

Long, Alex

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/39767 (DOI)

Publisher

DEF CON

Release Date

2018

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Thus far, the security community has treated machine learning as a research problem. The painful oversight here is in thinking that laboratory results would translate easily to the real world, and as such, not devoting sufficient focus to bridging that gap. Researchers enjoy the luxuries of neat bite-sized datasets to experiment upon, but the harsh reality of millions of potentially malicious files streaming in daily soon hits would-be ML-practitioners in the face like a tsunami-sized splash of ice water. And while in research there’s no such thing as ““too much”” data, dataset sizes challenge real-world cyber security professionals with tough questions: ““How will we store these files efficiently without hampering our ability to use them for day-to-day operations?”” or ““How do we satisfy competing use-cases such as the need to analyze specific files and the need to run analyses across the entire dataset?”” Or maybe most importantly: ““Will my boss have a heart-attack when he sees my AWS bill?”” In this talk, we will provide a live demonstration of the system we’ve built using a variety of AWS services including DynamoDB, Kinesis, Lambda, as well as some more cutting edge AWS services such as Redshift and ECS Fargate. We will go into depth about how the system works and how it answers the difficult questions of real world ML such as the ones listed above. This talk will provide a rare look into the guts of a large-scale machine learning production system. As a result, it will give audience members the tools and understanding to confidently tackle such problems themselves and ultimately give them a bedrock of immediately practical knowledge for deploying large-scale on-demand deep learning in the cloud.