AI Village - Loss Is More! Improving Malware Detectors by Learning Additional Tasks
Formal Metadata

Title: Loss Is More! Improving Malware Detectors by Learning Additional Tasks
Number of Parts: 335
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/48321 (DOI)
DEF CON 27 | 19 / 335
Transcript: English (auto-generated)
00:00
Our next speaker is Dr. Ethan Rudd, Senior Data Scientist at Sophos. Hello, everybody. Yes, my name is Dr. Ethan Rudd.
00:21
I'm a Senior Data Scientist at Sophos. And the title of my talk is Loss Is More: Improving Malware Detectors by Learning Additional Tasks. So before I go into the meat of the talk, I just wanted to let you guys know a little bit about who I am so that you know that I'm not just some random guy that sort of stumbled
00:42
in here off the street. This is my first time at Def Con. Very excited to be giving a first Def Con talk. And thank you. I've been on the Sophos data science team for about two and a half years in a research capacity.
01:01
Prior to that, I worked on several projects in several areas of applied machine learning. My PhD research was funded by the IARPA Janus face recognition project. I did a project at Google with their Advanced Technologies and Projects team. And then I've been involved in several other small business and university projects.
01:22
So I mentioned the face recognition stuff because we're also running a great facial recognition demo at the unwind session. So please check that out. And you can check me out on Twitter or Google Scholar for various research if you like this talk. So what is this talk about?
01:41
Well, as we've seen, there have been several great talks on machine learning for information security prior to this one. But for many machine learning malware detectors, we're looking at training on a single malicious or benign label when there's actually lots of additional information available,
02:01
lots of additional labels, lots of additional metadata, etc. And so really the question that we answer in this talk is: can we craft a bunch of auxiliary labels to train on, rather than just having a single malicious or benign label, and can we get better performance? Well, as it turns out, we can.
02:21
And interestingly enough, we also find that these performance gains can be attributed to a better informed classifier. And I'll explain what I mean by that a little bit later on. So in before and after photos, if you will, what we're talking about is adding additional loss functions during the optimization. So on the before side,
02:41
you'll see that we have only a single loss function. This is how many malicious and benign detectors are trained. And they work pretty well. There are a lot of them that are commercially deployed. You can get good performance. But if you add a bunch of auxiliary loss functions on a bunch of labels, hence the 'after' part of this before-and-after comparison,
03:04
we get way, way better performance as it turns out. So before I dive into exactly how we formulate this, I'll just give a brief review of machine learning for malware detection. So up until about 2015,
03:21
most malware detectors were largely signature driven. There were a few machine learning approaches, but ML really took off around then. Now a lot of the detectors actually consist of hybrids of ML and signatures, and they use signatures largely for blacklisting. And one can really triage detection as in this diagram here,
03:44
where ML and signature detectors actually both work on static and dynamic features. For ML, static is a little bit more common and we focus largely on static detection in this talk. And the reason by the way that static features are more common is that to get ML to work well,
04:03
it requires lots and lots and lots of data. And it's easier to collect a lot of static data. So we find that we can actually do very, very well with that. So a typical detection pipeline is built on some sort of a binary classifier,
04:20
deep neural network, or maybe a gradient boosted machine. I'll discuss the deep neural network use case for this talk. And we're talking about training on millions to hundreds of millions of labeled malicious and benign samples. And we're also talking about classifiers that are periodically retrained
04:40
to be able to reflect current threat landscapes. These can be deployed in a lot of different contexts. They can be deployed actually on endpoints. They can be deployed in the cloud. They can be deployed in security operations centers. It really doesn't matter for the purpose of this talk. Now, as far as labeling sources that a lot of vendors use,
05:02
they rely on vendor aggregation services or threat intelligence feeds, which basically take a bunch of vendors or different labeling sources throughout the industry and submit malicious and benign samples to those and say, okay, how do these label the samples? And then some sort of an aggregate label
05:22
is generally derived. Often there's also a little bit of time lag after the samples are submitted to the vendors, left in place to basically let vendors
05:43
update their blacklists and let the scores settle down. But long story short, most approaches take an aggregate malicious or benign label that is obtained from these threat intelligence feeds. Now, these detection engines,
06:04
also because they use ML, they need to convert the malicious and benign samples to some sort of an ML friendly numerical representation. There are a variety of ways to do this. Some try and do something that's closer to raw bytes.
06:22
Some use various types of feature vector representations. We presume a specific one, and we use portable executable (PE) malware and benignware in this work. But the approach is fairly broad and can be done in a lot of different ways.
06:41
So the way that a typical neural network will look is you'll have features that are extracted from malicious and benign samples during training. A forward pass is done through the network. The output of the classifier is taken and then some loss function along with the associated label is used to correct the representation
07:02
so that we have a good representation that we can then later deploy. At deployment time, we take this learned representation and here, I just want to highlight, we don't have any labels on the malware samples. That's what our classifier serves to do. We deploy that to wherever we're deploying,
07:21
whether we're deploying on endpoints, whether we're deploying on SOX, whether we're deploying the cloud. And then we submit our feature vectorized forms of our files in real time to the classifier. And in this case, our classifier's neural network. It could be whatever, but we're dealing with neural nets in this work.
07:43
And we use the predicted output basically as a score. This says, okay, how malicious is this file? A maliciousness score, if you will, that one can threshold in a variety of ways. So this is how things are currently done or commonly done, I should say.
08:03
However, just a little bit more on these threat intelligence feeds. They have lots and lots more information than just whether a given file is malicious or benign. In fact, that's even a simplification from what they're providing. They also provide information on individual vendor detections.
08:21
They provide, of course, the net number of vendor detections. And then they provide, at the very least, some information on the detection names per vendor. Some provide a lot more. So really revisiting the original question that I posed, can we craft and learn from auxiliary labels and get a better detector?
08:42
And the answer is yes, in fact, we can. The technique that we've derived to do this, we refer to as ALOHA, or Auxiliary Loss Optimization for Hypothesis Augmentation, hence the nice Hawaiian art here. All right.
09:03
So in short, we've got all this auxiliary information that we want to utilize. And as we saw in the case of just a malicious and benign label, well, we have this loss function that we use here. So why not just add more loss functions? And that is really the crux of what ALOHA does.
09:24
We have more labels, more loss functions. And this has a couple of nice advantages. First, although we have more labels and we use more network outputs during training, we do not actually have to use these during deployment time.
09:41
So we can notionally get a much better network representation during training. But at deployment, we don't have to update our infrastructure at all. Now, alternatively, we can use the additional auxiliary outputs to do certain additional tasks. So if we want to do things that are,
10:02
I'll say specific to like an EDR or an MDR type application, where we're getting more fine-grained information about the particular malicious samples, say maybe in a SOC, but maybe not on our endpoints, we can sort of dual-purpose this training and use the learned models in a variety of ways.
10:21
So for our labeling sources in this work, we selected nine vendors from our aggregation feeds and used detection labels from each of the respective vendors. We also used the net number of vendor detections; there were more than nine vendors in our feed, there were tens, and we used the integer value
10:42
of the number of vendor detections as an auxiliary target as well. We also use our main target of this aggregate malicious and benign label. And then we also use 11 semantic malware attribute tags. So these describe the content
11:02
of malicious and benign samples. These are derived from the detection names within our feeds. The derivation process, I can speak a little bit more to at the end, but I would actually refer you guys to a, in my opinion, very good paper that we wrote on it. And I'll include the link for that at the end.
11:22
But basically these tags are not mutually exclusive, and they summarize the content of malicious samples in ways a human can understand. So for each of these additional labels and network outputs,
11:41
we have additional loss functions. And so our main aggregate loss function is actually a binary cross entropy loss taken between the output of the network and the aggregate malicious/benign label. Now this is a pretty common loss function
12:02
for a lot of neural networks that are doing malware classification. Well, with respect to our auxiliary losses, we have a loss function that is specific to the vendors, one for the tags, the semantic tags, and then one for the counts. And for the vendor loss functions,
12:21
we actually take a sum of binary cross entropy losses for each individual vendor response. For the tag losses, we do a very analogous thing, but for each of the attribute tags. Now I would again point out that none of these tags are mutually exclusive. So we use binary cross entropy here rather than,
12:43
say, a softmax categorical cross entropy, and take the sum. Then for the count loss function, we use a Poisson loss. Now, prior to that, we do an exponential activation
13:01
to constrain our count output to be non-negative, as counts can't be negative. So our total loss is written at the bottom here. And this consists of the main malicious/benign loss with all of our auxiliary losses just summed
13:21
and multiplied by a constant. Now, for the constant in this case, we use 0.1. We didn't explore good values of this in depth, but other work has, and so we did this sort of in a principled manner, consistent with some other work that I'll reference towards the end here.
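To make that concrete, here's a minimal PyTorch-style sketch of the combined objective, roughly total = malicious/benign BCE + 0.1 x (vendor BCEs + tag BCEs + Poisson count loss). The layer sizes, head names, and mean-reduced losses are illustrative assumptions for the sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AlohaSketch(nn.Module):
    """Illustrative multi-output detector: one shared base, several heads."""
    def __init__(self, n_features=1024, n_vendors=9, n_tags=11):
        super().__init__()
        self.base = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        self.mal_head = nn.Linear(256, 1)             # main malicious/benign logit
        self.vendor_head = nn.Linear(256, n_vendors)  # one logit per vendor
        self.tag_head = nn.Linear(256, n_tags)        # one logit per (non-exclusive) tag
        self.count_head = nn.Linear(256, 1)           # log of predicted vendor count

    def forward(self, x):
        h = self.base(x)
        return (self.mal_head(h), self.vendor_head(h),
                self.tag_head(h), self.count_head(h))

bce = nn.BCEWithLogitsLoss()
# log_input=True applies exp() internally, matching the exponential
# activation that keeps the predicted count non-negative.
poisson = nn.PoissonNLLLoss(log_input=True)

def total_loss(outputs, y_mal, y_vendors, y_tags, y_count, aux_weight=0.1):
    mal, vendors, tags, count = outputs
    main = bce(mal.squeeze(1), y_mal)                 # aggregate malicious/benign loss
    aux = (bce(vendors, y_vendors)                    # per-vendor binary cross entropies
           + bce(tags, y_tags)                        # same functional form for the tags
           + poisson(count.squeeze(1), y_count))      # vendor-count loss
    return main + aux_weight * aux
```

At deployment, only mal_head would be kept and the auxiliary heads pruned, which is why the serving infrastructure doesn't have to change.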
13:40
So during training, and you've seen this at the beginning of the talk, but basically we have all these loss functions aggregated together: a main malicious and benign loss, which is what we're ultimately trying to optimize for and detect, and then, with respect to the auxiliary loss functions, our vendor counts, our individual vendor detections,
14:01
and our attribute tags. Now, aggregating all of these losses together, we can use all of them, or just one or two of the auxiliary losses, or we could even potentially add more if we had more information in our feed. I would very much point out that this is,
14:22
you know, this is sort of a proof of concept model, a very general sort of architecture that I'm describing. But the point is that adding these auxiliary losses theoretically helps. At inference time, however, absolutely nothing has to change whatsoever
14:43
with respect to the network outputs. You'll see that in the prior slide, we had all of these different outputs that we added, but we pruned those, we pruned the associated model parameters at inference time. And so our deployment infrastructure can remain entirely the same.
15:01
We don't have to change that at all, which is nice from an engineering perspective. So I've made these claims that the Aloha model works very well. Now I intend to actually provide some evidence of that. And to do that, we collected a data set of approximately 9 million training samples,
15:23
100,000 validation samples, and 7.7 million test samples. And the training and validation splits were taken temporally before the test split to ensure basically a fair evaluation. I mean, in order to ensure
15:42
temporal consistency, we ordered our samples as follows. And for our aggregate malicious/benign label, we used what we call a 1-/5+ criterion here, which basically means that for one or fewer
16:02
vendor detections, we label as benign; for five or more, we label as malicious; and then we ignore those with two to four detections. Now, I'd mention there are more sophisticated ways to do this; this is just the one we chose, largely for simplicity.
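As a concrete illustration, here's a minimal sketch of that criterion (the function name and toy counts are just for the sketch):

```python
def aggregate_label(n_detections):
    """1-/5+ criterion: at most one vendor detection -> benign (0),
    five or more -> malicious (1), two to four -> ambiguous, dropped."""
    if n_detections <= 1:
        return 0
    if n_detections >= 5:
        return 1
    return None  # ambiguous; ignored at training time

counts = [0, 1, 3, 5, 23]                      # toy vendor-detection counts
labels = [aggregate_label(n) for n in counts]  # [0, 0, None, 1, 1]
```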
16:21
But this works pretty well; it's a fairly standard practice. When we actually look at vendor counts across our data set (and looking at this was one of the reasons why we chose the 1-/5+ criterion), what you'll see is that there are
16:40
a disproportionate number of 1- samples, specifically zero detections, and then a lot that have many, many vendor detections. And bear in mind, these are plotted on a logarithmic scale. However, we still see, and this was one of our motivations for using a count loss initially,
17:01
that just taking these basic thresholds washes out a lot of finer-grained information. And as you can see, it's not really a common occurrence, but we might be able to say something about relative sample difficulty by adding that.
17:24
We also looked at the respective vendor agreements with one another and these are plotted in this confusion matrix here for each of our nine selected so-called high coverage vendors. And as we can see, we see an agreement that occurs most of the time but not all the time.
17:43
I mean, vendors are consistent approximately 85 to 95% of the time, but they don't always agree. So perhaps there's some independent auxiliary information that we can glean from these. As features, we use the same features
18:00
as Saxe and Berlin did in their work, Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features. In full disclosure, Saxe is my boss. So that's one of the reasons why we chose to use these features. We used the features that he and others
18:20
within our group derived. And I won't go into these in depth, but I leave the paper there and I just wanna give sort of a semblance of what these are. So basically they fall into three different camps. So 512 of the dimensions of our net 1024 dimensional feature vector are based on
18:43
windowed byte statistic histograms, which are basically aggregate statistics over the entire file. We then have 256 dimensions devoted to a
19:01
two-dimensional string length hash histogram, basically across a logarithmic scale of different string lengths, where we apply the hashing trick. And then we also have specific PE metadata fields, like the exports, like the imports, et cetera, that are hashed into another 256-dimensional vector.
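As a rough illustration of the hashing trick on the PE metadata strings (a sketch under assumed bucket counts and hash choice, not the exact Saxe-Berlin extractor):

```python
import hashlib
import numpy as np

def hash_strings(strings, dim=256):
    """Hashing trick: fold an arbitrary set of strings (e.g. imported
    function names) into a fixed-size vector by hashing each to a bucket."""
    vec = np.zeros(dim, dtype=np.float32)
    for s in strings:
        bucket = int(hashlib.md5(s.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

imports = ["kernel32.dll:CreateFileA", "ws2_32.dll:connect"]  # hypothetical
pe_metadata_block = hash_strings(imports)  # one 256-d block of the 1024-d vector
```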
19:23
And all of these get concatenated, so that's our representation of individual files. So that's how our dataset breaks down. When we compare performance here, we tried using different combinations of
19:40
our main malicious and benign loss with different auxiliary losses. First, we used just our malicious and benign loss as our baseline; that's sort of tantamount to a lot of the types of models that are currently deployed. Then we applied each individual loss type. Then we applied everything combined,
20:03
which I guess you might say is the full Aloha model. And we fit each of these classifiers for each different loss combination. Actually, we fit five different classifiers and we report our results in terms of mean and variance statistics over
20:22
receiver operating characteristic curves to be able to gauge statistical significance. Now, I know that there's a lot of talent in the room with a lot of different backgrounds. So for those of you that might need a refresher on receiver operating characteristic curves, or ROC curves, basically we look at
20:42
this false positive rate across the x-axis, and then a true positive rate or a detection rate at that false positive rate across the y-axis. And so typically what's done in the industry is at various false positive rates that are deemed sort of acceptable to the user,
21:01
a threshold is chosen and then you'll get the true positive rate at that threshold.
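For a concrete (toy) illustration of reading the detection rate off an ROC curve at a fixed false positive budget:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 10000)                # toy ground-truth labels
y_score = 0.6 * y_true + 0.7 * rng.random(10000)  # toy maliciousness scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)

def tpr_at_fpr(target_fpr):
    """Detection rate at the last operating point with FPR <= target."""
    idx = np.searchsorted(fpr, target_fpr, side="right") - 1
    return tpr[idx], thresholds[idx]

for budget in (1e-3, 1e-2, 1e-1):  # false positive budgets a user might accept
    rate, thresh = tpr_at_fpr(budget)
    print(f"FPR<={budget:g}: TPR={rate:.3f} at threshold {thresh:.3f}")
```

So what we see when we add our count loss is that we do in fact get better performance in terms of both the area under the receiver operating characteristic curve,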
21:22
which is basically a gauge of how good the curve is overall. But specifically we also see, at lower false positive rate regions, a particular bump, and this becomes
21:41
a bump in detection rate. Now, this becomes relevant because as we get to lower and lower FPR regions, there are more deployment scenarios that we can address with our models. Similarly for the vendor loss, when we add that to our baseline,
22:00
we see a boost in the receiver operating characteristics curve or ROC curve at the relevant region. We don't see quite as much of a boost in terms of the area under the curve. In fact, the area under the curve stays statistically pretty similar. And that's not to say that this isn't still a very significant result.
22:23
In fact, again, the AUC is a net statistic on the curve, but we don't really care so much about the higher false positive rate regions because the detection rates there are very, very good already. And so we can deploy those very easily.
22:42
But as we're getting down, we see that although the AUC is relatively similar, we still see this as basically a win. The tags loss gives us similar results. And I'd mention that actually both of these loss functions, the tags and the vendor loss functions,
23:02
not only do they assume sort of a similar functional form, but they also give us an even better result than the Poisson loss function did or the count loss at the lower false positive rate areas. But they are slightly worse in terms of AUC performance.
23:23
When we combine everything together, what we find is that we get even better results. And we find that not only are our results far better, but our variance between different model instantiations is reduced. And we see basically there are two modes of improvement here.
23:41
There's an improvement at the higher false positive rates that is above 10 to the negative third. And then there's an improvement below that. And basically the higher FPR improvements, these are really what are driving the area under the curve improvements.
24:00
But again, the lower FPR ones are still quite relevant. So in summary, we see that yes, adding additional losses does seem to improve performance. And notably it also reduces variance across different instantiations of the model.
24:20
We suspect that this variance reduction is actually occurring because, as you have more things to optimize for, you're inherently sort of constraining your optimization process. So there aren't as many different ways that parameters can vary. We also see that there is similar behavior
24:41
for similar loss types. So both the vendor losses and the tag losses consist of sums of binary cross entropy losses. And again, these seem to drive different things with respect to our ROC curve. We suspect that we see these higher FPR gains in detection
25:04
for the count loss, because it actually does communicate something about the difficulty of samples. And then with respect to the tag losses, perhaps the network's able to correlate some sort of information between when,
25:20
say, just one or two of these vendor tags trigger versus when, say, all of them trigger. And so it drives things really at lower FPRs. So, okay, we've presented, or I've presented some evidence hopefully that the ALOHA model is able to deliver
25:41
better detection performance. But now I'll just really briefly discuss what's driving this performance gain. Is it some sort of a smoother optimization surface that's brought about due to a regularization effect of multi-objective optimization? Or is it perhaps due to a more informed representation
26:03
from all of these different auxiliary label sources? And going into this, we sort of suspect the latter of these two. But we wanna actually make sure and then see what's going on here. So in order to test this, we used auxiliary loss functions
26:21
on so-called non-informative targets. So we employed various mechanisms of duplicating labels, or of providing labels that delivered no additional information about the sample.
26:41
One way that we did this was with a pseudo-random label, where we took the hash of the file contents and just took the sign of that as an auxiliary target. So for a given file, you're always going to be looking at the same label, but the labels are pretty much randomly assigned. We also tried adding a duplicate target
27:02
and optimizing for that. And then we also applied a duplicate target with a different type of loss function: we scaled and shifted a copy of the target label, and then we used a mean squared error loss on this, which is a common regression loss.
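As a minimal sketch of the pseudo-random target (deterministic per file but carrying no information; using the hash's low bit here as a stand-in for its sign):

```python
import hashlib

def pseudo_random_label(file_bytes):
    """Hash the raw file contents and take one bit: the same file always
    gets the same label, but the label says nothing about maliciousness."""
    return float(hashlib.sha256(file_bytes).digest()[0] & 1)

print(pseudo_random_label(b"MZ\x90\x00"))  # same input -> same 0.0/1.0 label
```

And so from these, what did we find?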
27:21
Well, we found that adding these non-informative losses did not improve our performance in a noticeable manner at all. In fact, it was statistically identical, if not worse, when we added auxiliary targets to our original target as well.
27:42
So this suggests that, yes, the Aloha network gains are actually coming from additional information from the additional labels. The network's doing what we want it to do, and it is learning a better representation, a better informed representation. So overall, what we find is that, yes,
28:02
our Aloha technique works well and it seems to be a result of the neural network's ability to actually correlate information from auxiliary labeling sources. It's not just simply an artifact of regularization. We also have the advantage that the network can be trained and deployed with minimal changes
28:22
to existing infrastructure that's out there. So no re-engineering of anything on the endpoint, anything on the SOC, anything in the cloud has to take effect. And then also there are additional applications that these outputs can be used for, like EDR and MDR.
28:41
So one additional application, as an example, is since we have outputs that describe the content of the malware, we can actually group malware by the predicted tags. And we might have an application where we might want to deploy that for internal use or say as a service,
29:01
but yet still be able to deploy our model. Well, we can do that all under this one training regime just by pruning our losses and pruning our outputs respectively. So before I take some Q&A here, I just wanted to also mention some related research and some directions for future work
29:21
if you found this topic interesting. There's been a lot of research by our group and also by other groups that is related. It's interesting to look into and it can perhaps be leveraged in some very similar ways. So first, I'd also mention that ALOHA is a USENIX paper now,
29:42
so I'll be presenting this at USENIX next week. Oh yeah, and it's available on arXiv as well, so feel free to check it out if you want more information. And interestingly, a gentleman by the name of Jason Trost actually did a nice blog post
30:01
where he used ALOHA architectures for a much different problem, using Endgame's domain generation algorithm detection code. He tailored that to use basically these ALOHA losses, and he wrote up his results nicely.
30:24
There's also a paper out of Microsoft Research called MtNet, and this paper is actually similar to ours in a variety of ways. It uses an impressively large dataset of dynamic features and it largely substantiates a lot of our findings.
30:41
However, they use only one type of loss function. They use multiple loss functions, but only softmax categorical cross entropies. They do something sort of similar to the tagging approach that we do, describing the content of each sample, except they use Microsoft malware family names,
31:03
and so they actually do employ some sort of a mutual exclusivity assumption here. But anyway, it's another great paper and it's very cool to see that they're also able to sort of substantiate our findings with a much different data modality. It's also PE files, but it is dynamic features.
31:23
I'll also mention the paper on malware attribute tagging. So SMART, Semantic Malware Attribute Relevance Tagging, is a paper that we also put out there, which will pretty much tell you everything that you want to know about malware tagging. The tagging approach
31:43
is the same as we employ in this work. So please see that for details on the tagging problem and the tag prediction problem. There are several other models that we employ in the SMART paper as well. So if you're interested in that, check it out.
32:03
I will also mention one other paper, MOON, a mixed objective optimization network. The reason why I bring this up, although it is applied to facial attributes and has nothing to do with malware, is that it is actually the approach that I use in my face recognition demo,
32:20
which again, please stop by during the wind down session. If you want to see basically how this type of optimization can be employed very powerfully in action, the approach is fundamentally the same as Aloha, but with a much, much different data modality. One final work that I'll mention, and then I promise I'm done,
32:41
is a paper that we did called Learning from Context. It uses multi-view learning, or multi-input learning, in contrast to our approach of using multiple labels and multiple loss functions. But using this approach, we are able to include extra information
33:03
in the representation, just in a different way. We're sort of turning the Aloha approach on its head. And this type of approach could be trivially combined with Aloha, I would mention. So that's maybe a nice direction for future work. So having multiple PE file features
33:24
and then also other auxiliary information; for example, we took embeddings of the PE file path on disk and concatenated those together. But also potentially multiple labels, you know, just adding multiple other sources of information
33:41
into the representation, it seems to work well. So it's definitely an avenue for future research. I'll finally close with an obligatory Sophos pitch. So I'm with the Sophos Data Science team. We do really cutting-edge research and we're always interested in transparency and collaboration and,
34:01
as hopefully I've communicated, publication. And while we're not the only one, we are one of the only research teams in the MLSec industry that is getting papers accepted at some of the top-tier academic venues like USENIX. Our group consists of about 10 to 15 people, split about half between research
34:22
and about half in development. And so check out our group if you're interested. You can talk with me or you can talk with Rich Harang, who's also here, and who's one of our directors of data science. Here's a picture of our team, lots of great colorful characters.
34:40
There's some more Sophos presentations going on this week. As I said, I have a facial recognition booth. Rich has a talk on hacking facial recognition on the 10th. And then he also presented a talk on security data science at B-Sides. So if any of you saw that, there's just a name to correlate.
35:01
And then I'd like to thank Sophos for funding and for promoting this research. And I'd particularly like to thank my collaborators and my co-authors here for all the work that they did. This was definitely a team effort. So with that, I'll open it up for questions.
35:29
Yes. Thank you.
35:50
Yeah, so that's a great question. The question is, so this talk was about incorporating these auxiliary losses on neural networks, but have we tried other classifiers like ensemble models,
36:02
random forests, boosting, et cetera? And while we don't have concrete results on those, there's nothing that would preclude a person from doing so. The representation that's learned by an ensemble model is a little bit different.
36:22
So in, say, a random forest or a boosted model, I guess the question is how you would do shared splits in a way that works well across the data. So I'd say that the technique could very well be applied; I don't know how well it would work.
36:41
I can say that I've looked at some of the libraries that are out there for this, like LightGBM, like XGBoost, et cetera. And they generally assume that you're gonna be using only one loss function, but there's nothing that would preclude somebody from implementing it. I just don't know how well it would work. Thank you.
37:02
Let's see, more questions. Yes, please. So the question is, what are some of the next steps of the process, and some of the features that we want to develop? So, features in terms of representation of the,
37:21
sorry, in terms of representations of the malware fed to the classifier or features in terms of just like extra things to tack onto the classifier? Sure, sure. Yeah, so extra additions to this overall technique. So I would certainly say that the approach that I'm most interested in is actually having
37:41
a unified multi-input and multi-output model that's really able to learn multiple labels or learn from multiple labels, but also have multiple just heterogeneous inputs. Like you could have as an example
38:02
the character embeddings of the file path. You could also apply this, I would say, to a lot of different malware types. I've been talking about PE files this entire time, but there are a lot of different types of malware that one could apply this to.
38:21
So I'd say that those are two different avenues that I'd certainly like to go down. And then I'd also say that there are other sources of data that are on some of these threat feeds. And so I think that looking into those would be very interesting as well.
38:42
Yes, please. Yeah, so I'd say that not only does it not do worse,
39:02
I mean, in aggregate it does better. And I would say that if you have multiple inputs, yes, we do see in fact, and we have seen (I'd actually point you to that Learning from Context paper), that we do get a nice performance bump. But actually, yes, having missing data
39:22
is a little bit more of a problem with that. So for our loss functions here, if we have a missing label, we can just zero that entry out in the loss and back-propagate that; a rough sketch of that masking is below.
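Here's a rough sketch of that per-label masking, assuming missing labels are flagged with NaN (the flag convention and shapes are assumptions for the sketch):

```python
import torch
import torch.nn.functional as F

def masked_bce(logits, targets):
    """Per-entry BCE where missing labels (NaN) contribute zero loss,
    so no gradient flows back from unlabeled auxiliary targets."""
    mask = ~torch.isnan(targets)
    safe = torch.where(mask, targets, torch.zeros_like(targets))
    per_entry = F.binary_cross_entropy_with_logits(logits, safe, reduction="none")
    per_entry = per_entry * mask.float()               # zero out missing entries
    return per_entry.sum() / mask.float().sum().clamp(min=1.0)

logits = torch.randn(4, 9)                     # e.g. nine per-vendor outputs
targets = torch.randint(0, 2, (4, 9)).float()
targets[0, 3] = float("nan")                   # one missing vendor label
print(masked_bce(logits, targets))
```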
39:41
But if we have a missing input, well, that becomes a lot more hairy, and that's an area of research that I'd really like to see addressed a little bit better. Let's see, I think we have time for one more. One more question. Yes, please.
40:08
Yeah, yeah, that's a really good question. So the question is which inputs are most prominent in terms of the respective output response? And so this goes back to a lot of the model interpretability literature.
40:22
So I'd say LIME, SHAP values, layer-wise relevance propagation, a lot of the literature in that area would be very good to look at, or techniques like activation maximization. Yeah, those are a few techniques, and this is definitely an area where I think
40:42
that not only I, but, without speaking for the entire industry, a lot of people are interested in. So anyway, I can chat more on that after, but yeah, thank you. Thank you for the question, good question. Thank you. Thank you.