VDLDS - All Your Voice Are Belong To Us

Video in TIB AV-Portal: VDLDS - All Your Voice Are Belong To Us

Formal Metadata

Title
VDLDS - All Your Voice Are Belong To Us
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2013
Language
English

Content Metadata

Subject Area
Abstract
Anytime you want to bypass the system, you tend to have a telephone conversation instead of leaving a paper trail. Data Leakage Prevention (DLP) is on top of the list for most organizations, be it financial or medical industry. In order to overcome this issue we need to devise a new system that can monitor phone conversations. Voice Data Leakage Detection System can be used for tracking Credit card, social security numbers, along with other PII data. An extension of this can be used for tracking Accounting and Financial information that leaves the organization before the information is actually public. This will help spot the people leaking insider information to traders, competitors and other news sources. By utilizing a signature system, each environment can quickly capture sensitive information like Acquisition/Sale of organization, or honeypot data to find the insider leaks. Ganesh Devarajan is the Sr. Security Architect within Go Daddy's Security Research Team. His focuses are Web Applications security, Malware Analysis, Reputation Service and Cloud security. Ganesh has a wide variety of experience in his field. Prior to joining Go Daddy in 2010, he worked as a security researcher for the TippingPoint DVLabs and THECASE Research Center in Syracuse, NY. He has publications in a variety of fields, ranging from Supervisory Control and Data Acquisition (SCADA) Securities, Role Based Access Control (RBAC), Wireless Securities and Runtime Software Application patches. His talks have been presented at various venues, including RSA, Department of Defense (DoD) Cybercrime conference, Computer Security Convention DEFCON, LayerOne, Reboot, National Petrochemicals & Refiners Association (NPRA), SMi, Hawaii International Conference on Social Sciences (HICSS), International Information Security Conference (IFIP/SEC) and Hacker Halted. Don LeBert currently works as a Security Engineer for GoDaddy.com Inc. 
He has been working with hosting providers for the past 5 years filling the role of Networking Administrator, Server Administrator and Server Manager. Don currently holds a Bachelors degree in Information Systems and Masters degree in Information Security.

Hey everyone, my name is Ganesh Devarajan. I work for GoDaddy as a security researcher, and this is Don. He also works at GoDaddy as a security engineer, and I'll let him talk about himself.

Yes, so my name is Don LeBert. I work at GoDaddy; I'm a security engineer there. My job is pretty varied. This doesn't exactly pertain to what I do on a daily basis, but because they're all so nice and so kind, they let us work on a project that we had. My day-to-day can be anything, like setting up new systems. Prior to working at GoDaddy I basically worked at crappy ISPs and service providers, so I got hired on, and ever since then I've been happy to actually do research every now and again.

So our talk is VDLDS, which stands for Voice Data Leakage Detection System. The whole idea behind this was: we're sitting over there in our office, and we have a huge call center at GoDaddy, and what we were thinking was, how cool would it be if we could just walk by the call center and start getting all the credit card numbers? Our call center is really strict: you can't take your phones in there, you can't talk on the phone when you're walking through it, and you can't have a piece of paper or a pen where you can write stuff down. So they're pretty strict that way. But what we were trying to figure out was: how can I manipulate the system? How can I steal information from a phone conversation? Most people, whenever they get a call from a customer service representative, or they're calling to pay their bills or whatever, feel more confident giving out personal information and credit card information over the phone. So we thought it would be interesting if we could decode that voice stream, convert it, figure out the numbers in there, and pull them out for our own personal gain, kind of thing. So that's how the thing started.

First, a disclaimer: if you get caught doing something like this, we are not responsible. It's all on you.

So the agenda is basically: we want to go through what the problem is that we came across, what it is that we want to cover, some of the background that's out there, the different scenarios that we could think of (and there are a million other scenarios out there), the different deployment architecture models that we have for our tool, and a demo and future work. We'll probably do the future work before the demo.

So, the problem. As we know, when you're on your phone, people tend to let things slip, and they do that for multiple different reasons. Even when you're in an enterprise, or just on your cell phone, you tend to say things thinking, oh well, it's either encrypted by the cell phone provider or maybe it's encrypted by my enterprise. Most people are willing to give out their credit card number. I mean, how many people have sat there, even at work, on your Cisco phone, and said, okay, I'm going to call the pool guy because stuff just blew up and I totally need to have him get out there today; I'm going to give him the credit card number over the phone so he's at my house by the time I'm home? Well, yeah, that's great, but it's not as safe as you think it is. So you give out your credit card number. Sometimes maybe you're applying for schools, so you've got to drop your social security number. Well, they're not going to wait for you to walk over to their station or send it through the internet; you're going to give it over the phone. A lot of the time people also try to use that phone system, which they don't think they are going to get caught on, to slip out insider information. They'll try to talk about your company's acquisitions before it hits the streets at all. The problem with that is, obviously, as an enterprise
we're always trying to figure out how we can protect our data up to the point when we're ready to release, because the rumor mills really cause a lot of damage, and if you can control that, or at least know where it's coming from, then you're going to be a lot better off. A lot of the time in enterprises you have hundreds of people that are on the phone all the time, and there are systems in place that do monitor these phone calls. But just because you can monitor them and archive them and maybe have that audio for later doesn't mean you're going to be able to sit there and go through a hundred phone calls an hour, or sit down and listen to every 20-minute or hour-long phone call. There's just not enough manpower.

So, some of the background again: who does this kind of thing? The government is already doing it. Whenever you have a tap warrant on someone's phone, you can go through the service provider and tap all the calls. But how efficient is that if the person talks on the phone for 20 hours a day or something? You don't want to sit there and listen to every single conversation. But if you can figure out the keywords that you want, then you can trigger on those. So with things like "jihad" and "terrorist" and all that, the government is already tracking those kinds of calls, and they do all the voice matching, recognition, and all those kinds of things. But those are all done at the service provider level, and we wanted to bring something which is, first, not that expensive, and second, much easier to do. You can also go plant a physical bug in the phone and do the recording and transmit it out simultaneously, but we thought, let's just make it simpler and do it on the wire; that way I don't have to be
physically present over there to do some of this kind of work. And there is commercial software out there which does this recording for you, but the ones that we actually looked into didn't have the capability of detecting stuff on the fly, which would trigger further investigation. At least as a provider, we have a huge call center, and over there we get tons of calls. I can't have somebody go sit and listen to every single call. So if I could trigger on, say, an angry customer, or somebody who's just abusive, or somebody phishing for credit card numbers or things like that, I can trigger on that, log it, and figure out who's doing what. So
what is DLP? DLP is data loss prevention, and it's something that every enterprise deals with every day. Most of the time DLP is targeted at things leaving the network, but it's also targeted at data at rest. Well, the things you're dealing with there are either on a server or on a client workstation. They're digital documents, they're text files, they're emails. They're not voice, they're not audio, they're not video. And a lot of the time you run into the situation where the data becomes encrypted and goes out of your network anyway, because you don't necessarily have everybody's private key. Anybody can get GnuPG, encrypt a document, and send it out using somebody's public key. It's a problem, but we can at least try to limit the attacks, or at least the vectors for getting this information out, which is why voice is a great thing to start with.

So the scenarios that we're trying to deal with cover a broad spectrum. One of the things that we want to attempt with this technology is to deal with social engineering. We want to be able to detect the same kind of talk pattern that social engineers use. When you're in a call center, you can have one person make 40 or 50 calls, and they're going to say the same thing every time until somebody either does what they want or gives up some of the information that they're really going after. And it doesn't matter whether it's one person or three; with this you should be able to detect that type of activity, because we've made it so that you can write a rule that's going to parse through the text output in the manner that you need. So if something new is out there that you need to pull out in a different manner, that's possible.

Another scenario is obviously insider trading and leaks. Everybody has that issue. We can attest to the fact that when people invested in GoDaddy recently, it's very shortly after that
information comes out internally that it's all over Twitter, and shortly after Twitter it's out on the internet, and then there's a blog post less than an hour later. So is it people doing that on their cell phones? In our case probably not so much; we require that they don't use cell phones in the call centers or on the floor. Is it possible that they're calling out over our phone systems? Absolutely. And that's also the best way to get the most information out fast. There are calls to bookies. We want to deal with our call centers. Most importantly, we want to stop, or at least know, when people are giving out our vendor account information. Obviously, who you have in your infrastructure is just as important as who you're going to use or where it's going in the future, so, whose products you might be buying.

At the same time, we made this pretty modular. If you're coming at it from one end, you're being a nice guy and you're trying to block that stuff or detect it leaving your network, you can use it that way. But it could also be used to spy on people. It can sit on their workstation at home; maybe they have a softphone, so that's VoIP traffic, and you can sit there and spy on your wife or your spouse or whatever. You can try to steal their personally identifying data. It all flies out over the phone, it all flies off their system. You can try to snag all their credit cards. Or maybe we just want to know who the hell is cursing us out and interact with that: maybe blank it out, try to bleep out that type of talk, or reset the call.

One of the things that I also heard: a friend of one of my colleagues happened to record all his phone conversations, and when the bank came in for the foreclosure, he was able to play that back and basically get out of the foreclosure and, through the bank, get money out of it. So it worked out to their advantage there. And
some of the other things in the news about data leakage: Sony lost so much credit card information, and Citibank, and ADP. It's happening every other day, and these are the ones that we know of, which make the public news because it's a big company. Apart from this, you have all these smaller grocery stores or gas stations where people are skimming your credit card information and then selling it. How that works on the black market is they say, okay, I can provide you a hundred thousand credit card numbers, and here's a sample, and they provide the first thousand of them. Then people run some quick checks on those, and if it works out, they immediately go and purchase. There are sites dedicated to buying and selling credit card information like this, and it's all on an invite-only basis. There are tons of places where it's happening right now, and most of it goes to the Eastern European countries and somewhere up there.

Some more background on what VoIP is. We don't want to go into more detail about what VoIP is and how it is done; there are a few other talks which are covering all that already. What we are trying to do is this: we have phone conversations between people, and whatever is going over the internet is basically the SIP protocol and the RTP protocol, mostly. We're not even looking at the SIP right now. All that we do is take the RTP payload from those sessions, and we want to split those sessions out and then
get the actual data out of that, the WAV file out of that.

So, a little background on the RTP protocol. This is the RTP header. The first two bits are the version number. Then we have the padding bit, which says whether there are padding bytes to follow at the end of the packet. Then we have the X, which is the extension bit: this is the actual RTP header, but you can have an extension header following it, which is optional. Then we have the CC, which is the contributing source count, and it says how many CSRC records follow in this one. And we have M, which is the marker. Markers are more specific to custom applications which are using RTP for their own data transfer, so those applications can have a unique marker identification there to parse the data differently.

Then the payload type. The payload type is much more important for us in this particular project, because it says how the payload is encoded: is it just plain raw, or is it G.711 mu-law encoded or A-law encoded? Mu-law encoding is what is used in North America mostly, and A-law is what is used on the European side for VoIP data transfer.

Then we have the timestamp and a sequence number. The sequence number is just to make sure everything is in order, a count basically. The timestamp helps out a little bit more. When you are having a conversation for, say, 10 minutes, and there is a lot of lag, the packets reach you a little slower and there's a lag in your conversation. With the timestamp in there, we can actually compress the audio file: a 10-minute phone conversation could be compressed to eight minutes or four minutes or whatever. We can even use that to help out a little bit with all the silence packets and pull those out. And then the regular sync source.
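For readers following along, the header fields walked through above can be pulled out of a raw packet with a few lines of Python. This is only a minimal sketch and not the speakers' code: the names `parse_rtp`, `RtpHeader`, and `PAYLOAD_NAMES` are made up for illustration, and the layout assumed is the standard RTP fixed header, where static payload types 0 and 8 are G.711 mu-law and A-law.

```python
import struct
from collections import namedtuple

# Illustrative container for the fields of the RTP fixed header (hypothetical name).
RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence timestamp ssrc csrcs payload",
)

# Only the two G.711 payload types mentioned in the talk; others are left unnamed.
PAYLOAD_NAMES = {0: "G.711 mu-law (PCMU)", 8: "G.711 A-law (PCMA)"}


def parse_rtp(packet: bytes) -> RtpHeader:
    """Split one RTP packet (the UDP payload) into header fields and audio payload."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes; packet is too short")

    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6               # V: top two bits
    padding = bool(b0 & 0x20)       # P: padding bytes at the end of the packet
    extension = bool(b0 & 0x10)     # X: an extension header follows the CSRC list
    csrc_count = b0 & 0x0F          # CC: number of CSRC records
    marker = bool(b1 & 0x80)        # M: meaning depends on the application profile
    payload_type = b1 & 0x7F        # PT: how the payload is encoded

    offset = 12
    csrcs = struct.unpack("!%dI" % csrc_count, packet[offset:offset + 4 * csrc_count])
    offset += 4 * csrc_count

    if extension:
        # 16-bit profile id, then 16-bit length counted in 32-bit words.
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words

    end = len(packet)
    if padding:
        # The last byte says how many padding bytes to drop, itself included.
        end -= packet[-1]

    return RtpHeader(version, padding, extension, csrc_count, marker, payload_type,
                     sequence, timestamp, ssrc, csrcs, packet[offset:end])
```

Calling `parse_rtp` on each UDP payload of a captured call and looking up `payload_type` in `PAYLOAD_NAMES` tells you which G.711 decoder the payload needs; sorting by `sequence` before concatenating the payloads restores the order in which the audio was sent.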
So the basic architecture that we have is pretty simple. We have two people talking on the phone, and a sniffer captures all the packets. If you are having a conference call, or if VDLDS is deployed at a central position where we can monitor all the conversations going on in the organization, then we need to split all of that out into different sessions. What I started out with was dpkt, a Python library. D-P-K-T, yeah; it's there in the references. So it's the dpkt module. What I did was split the packets: first took the packet, took the Ethernet headers out, the IP headers, all the UDP headers, and then went into the RTP header, parsed that, and pulled the data out. Then, based on whatever the encoding type was, which I showed on the previous slide, apply the decoding to the entire file based on the sequence number. I got almost to that point, and that's when I said, wow, this is really painful, doing all this work. And then I realized I had to do all the session splitting and everything. I don't know if you guys have heard of this tool called Netdude. It's a Linux-based one, and it was a completely outdated tool, so we kind of rewrote Netdude in Python, too. I said, okay, you know what, let's just start using that to split all the sessions in the packet capture. So I started doing that, and then finally it struck me: I should probably Google this and see if somebody else has done it. And sure enough, there are a whole bunch of other people who have done the session splitting and spitting it out into audio files already. So that's when I said, okay, I'm going to use one of the common ones, which is spread across multiple platforms, and that's when we chose VoIPong and Oreka. Those are the two that
we chose, which we can use to do the parsing for us. Going further into this: after the session breaker we have the audio. The way we deployed it, at least for the proof of concept, my laptop was sniffing the traffic from my desk phone, and whatever conversation was happening was getting split onto the local machine. We were basically doing a check which, on a regular basis, goes through that and finds whether there are new files. If there are new files, it shoves them over to our web server, wherever the master is. The master could be running locally or on a remote server, so we can transfer all the files over there and do the further processing there. The master basically does the audio-to-text conversion, and then we have our own engines to detect the keywords, with all the regular expressions to detect credit card numbers and stuff like that, and then dump it into the display for logging. So, a little
bit more detail on the actual architecture of our tool. We have the payload converter: we have the payload, which is basically the VoIP packets with RTP packets in there, and we convert that into a media store, a media database. Then we send it through our transcribing engine, where we generate the transcripts for those. Then we go to the detection logic engine, where we figure out whether it's PII data, or a different kind of thing like an insider leak, or curse words. Based on what kind of rule you want to apply, we run that against it, and if there's an event generated, we just send it to the display portal, and it gets parsed and displayed there. And if you want to add your own custom signatures, because I'm pretty sure you would want to add something more specific to your organization, those kinds of things can also be done very easily. So, some of the other
things when we started looking into it: how can we get the audio data out of the packets? When we started looking, we first said, Wireshark has the VoIP telephony piece in it, so let's just use that: detect the streams, dump them to a WAV file, and see how that goes. And then we had a few false positives in there. For example, if you are using Microsoft Office messenger, which also uses SIP and RTP to communicate, those kind of started mixing in. And we had other cases where it's the same stream, but in the stream there are certain other control packets which go through the RTP session, so those are things that we wanted to take out. There's dpkt, like I mentioned; that was the Python library that we used. Then VoIPong: again, there's a live CD available from VoIPong if you don't want to compile their source code, and you can just install it and it starts dumping out all the audio files. Then we have vomit and Oreka and rtpbreak and RTP SK and rtpdump and rtpplay, and all these guys can dump voice audio files out of the packet capture.

So obviously a good portion of this is actually getting the audio into some type of form that we can parse through and pull that data from. There were a couple of different architectures that we looked at, and a lot of work had gone into one or the other. Obviously we looked at the ones that are very specific to the platforms in general: Microsoft Speech, and OS X obviously has its own speech recognizer class built in. I really wanted to stay away from that. Anything that isn't that mobile really isn't that good. Obviously there are a lot of companies out there that just don't use Windows or OS X, and you'd have to develop for each completely separately from the others. Also, there are larger companies out there now that use Linux as their main desktop, so you really need
to target at least all three, if not more, of those platforms. The one in particular that I went for was Sphinx. There are multiple versions of Sphinx out there, each for different reasons. There's Sphinx 3, which is a completely C compiled module that you can use to turn audio into text. There's Sphinx 4, which is Java, with which you can do the exact same thing, and it was built to replace Sphinx 3. I really started putting a lot of time into Sphinx 4. As a matter of fact, the demo was at one point completed using nothing but Java, or Jython, because it was the fastest way to actually interact with their system, and it worked very well. The problem that we ran into, as we looked into this, is that Java is really heavy, so it was a much better idea to try something else. And the same people that make Sphinx also came out with PocketSphinx, which is very lightweight. It's for embedded systems. It gives you most of the same stuff; it's a little bit simpler, but it gave us a lot more avenues for the future. We weren't necessarily going to be stuck using something that had to deal with Java. We could have a C library that worked on Android, it worked on iOS, it worked on the iPad, it worked on Windows, so it covered all of our bases in one spot, and it works fairly well.

As far as Sphinx 4 goes, like I said, I really got into this. I thought it was mobile, but it was not as mobile as I needed. Obviously it was easy to hide, and it was easy to interact with. It was easy to interact with because I could use Jython, and heck, that's Python; everybody can do that. It was really easy to hide because, let's face it, everybody has seen Java just randomly take up everything on their system for no friggin' reason, so why not? But before we really got going with it, it turned out that we really wanted to hit the mobile area, and that's where PocketSphinx is. It just really allowed us these future avenues to go through. I'm not really going to get
into a lot of how voice-to-text works. It's based on phones, and basically you try to determine what is being said based upon what the standard phone is: what is being said at that moment, based upon what was previously said and what should be said next. So you have acoustic models, and you have dictionaries that break down words into syllables, essentially, that you try to match on, and then you base that upon the probability of what's going to be said next, so your language model. As we go further, you find that, yeah, it's possible to get a completely bad translation of what was said, based upon who's speaking, what is actually being said, which language model you're using, and how the audio has been transcribed. So, the architecture that we use for the actual detection on what is output: as we put these WAV files in there, they've got to be of a specific format to get the most likelihood that it's going to transcribe what you want. So to combat the issues that come with something transcribing poorly, we had to make the detection engine really customizable. It's script-based: you can program something in Python, and in that text file you can do anything you can do in Python. Just make sure that you know what you're doing and you'll be able to pull anything out. It takes a little bit more skill, but it's going to be custom and you're going to be able to format it accordingly. It's also rule-based. Because it's script-based, it's easy enough to have a script there where all it does is pull a bunch of rules from a file, regular expressions that you can match on, and then by default push it out there, so, you know, Joe Sixpack can take care of it. Nobody really has to be too much of a regex ninja. It does have lower customization, but that's what you sacrifice. So, well, the deployment
architecture is, like I said, we can put it on taps with the PBX system and then start dumping all the conversations. Basically anything which is going through the wire gets converted into a file and keeps getting dumped onto a local hard disk, and then we can send it out for processing, reviewing, and analysis. So that's one way of having it done: you just have to deploy this at one location. The second option is basically
having it as a distributed one. So if you are specifically trying to target just one particular person in the organization, or a few people, then you can deploy this agent on their box and have that agent just report to the master, wherever you have it set up. It's all possible to set up in the config file: you just set in the config "this is the master host", and it just reports to the master, which is on a remote box. Then it dumps all the conversations over there, and the master parses and transcribes them. First it does the standardization using SoX, then it does the transcription, then it does the detection logic, and then it spits it out in a not-so-fancy portal at this point of time, but it will be fancy soon, hopefully. And the other option is, like,
you've actually hacked into one of the boxes, or you know that you have compromised it, and you want to transfer this whole agent over there and have it report back to you. That was one of the reasons, like he said, we wanted to keep it very lightweight and small, so that we don't have to deal with heavy transfer of files both ways, up and down. We can even have the master run on their machine. It's very light on CPU, so you run it over there, get the output, and then just send the parsed output back to you, that kind of thing. So before we go into the demo, some of
the future work that we have been thinking about is to disguise this whole thing as an app, maybe a weather app or something, which says it needs internet access and needs the speech recognition software to put in the zip code or something, that kind of thing, and basically listens in on every single phone conversation that you have on your cell phone and is able to transfer just the key things that we need from that point to our server. Then there's the real-time part. Right now the audio file is generated, dumped onto the local disk, and then we do all the parsing. But we were thinking that if we can do this in real time, then we can actually start beeping things out: whenever you are giving out numbers, or you're cursing or something, you just basically beep the whole thing out. Or a simple thing would be just to drop those packets, or inject and add more packets in there, or filter out certain packets specific to that. Some of our references.
And I'll show, so here's the first one, which is
basically Wireshark. So here's what the
basic conversations that happen before the actual payload starts going out look like. We have conversation like, okay, what's your phone number, what's my phone number, what's your name: all the SIP protocol data transfer, basically. Then
these are all the actual RTP packets which are going between me and him when we are having this proof-of-concept call. And what you can do is, Wireshark does have this feature.
So this is the conversation that we were having; let's play the audio out.
"Hey Ganesh, what type of a conversation are you trying to pull here? You just trying to get a packet capture?" So that's just the basic conversation
that we were just pulling together. So Wireshark does this. What we were thinking about first was to use tshark to do the whole thing inline, using Wireshark to do the decoding for us. But as you see over there, we have three conversations listed as VoIP calls, but it's just the center one which actually works; the other two are just false entries. The other tool that we actually looked into was [unclear].
Yes, yes. Oh, so let me just bring that back here, and
so basically what I'm doing right now is sending a few packets through, replaying a packet capture, and then looking at all the different VoIP conversations that are taking place. And if I can just show that one, PJ. So I
did a tcpreplay of a prerecorded packet capture and then dumped that one, and all the conversations which were recorded in that packet capture, that's what is displayed over there. And finally, you want to start out? Oh yes, so
in order for our demo to work, we basically pre-recorded
the conversation that we had in the office and dumped it into a WAV file, the file that came out, and we thought we'd play the conversation, because we didn't want to bring two phones over here, set up the PBX thing, and have it all dumped out and everything. So these are the two files that we have. "Well, my name is Brown, my credit card number is one two three four..." "Hello? Hey." Second conversation: "What the ... is this? Well, you just call me, bastard. What kind of asshole just calls a guy and doesn't say anything? Freaking dumbasses, what a bunch of dummies. I am not saying the words bitch or whore over the phone line. That is all." So we basically wanted to figure out how the curse words work in the audio translations, so one of our co-workers was kind enough to volunteer to do that for us. The other thing that we wanted to show was, as part of the agent, we have to do the standardization of the audio files and everything. So when Oreka gets installed, it basically gets a default installation
with GSM format, so we have to convert that to PCM, and we do that. And this is how all our output logs basically look. You can't see much, so I'm going to just scroll across and show you guys. It basically says what time the conversation happened, who's the originator, and who is the recipient. If it is internal, within the same organization, then it just shows the extension number, and if it is actually going outside the organization, it shows the actual cell phone number or landline number that you're reaching out to. Then there's what the duration of the call was, and what the direction was: was it you who called out, or was it an incoming call, those kinds of things. So those are all the things we
have in the log, and we basically parse those things out and then we do a POST to the master web server, and Don will show you that.
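A minimal sketch of the parse-then-POST step just described, assuming a comma-separated log line and a JSON request body. The field order, the `CallRecord` name, and the master URL are all hypothetical; the talk does not show the agent's actual log format or upload endpoint. The request is only constructed here, not sent.

```python
import json
import urllib.request
from dataclasses import dataclass, asdict


@dataclass
class CallRecord:
    """Call metadata named in the talk; field names are hypothetical."""
    timestamp: str
    originator: str   # extension (internal) or full phone number (external)
    recipient: str
    duration_s: int
    direction: str    # "outgoing" or "incoming"


def parse_log_line(line: str) -> CallRecord:
    """Parse one log line, assuming a comma-separated layout."""
    timestamp, originator, recipient, duration, direction = (
        field.strip() for field in line.split(",")
    )
    return CallRecord(timestamp, originator, recipient, int(duration), direction)


def build_post(record: CallRecord, master_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST that reports a call to the master."""
    body = json.dumps(asdict(record)).encode("utf-8")
    return urllib.request.Request(
        master_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical log line: extension 1001 calling an outside number for 42 seconds.
record = parse_log_line("2013-08-02 14:05:11,1001,16025550100,42,outgoing")
request = build_post(record, "http://master.example/upload")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that call is left out so the sketch runs without a server.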
Basically everything was done previously, so we set up a file that we're going to use to post this the way it would typically happen on your system if you had it set up in one spot or the other. I'll show you basically the POST first, which is going to seem normal
enough.
And of course, yeah, it would fall off the screen, but it'll scroll, I promise. At least I thought it would. Just a second, let me
shrink this up a bit.
So all this is going to do is, we have those two sample files, we're going to post them up really quickly, and you'll see that's the first one going up, hopefully, and that's the second file being uploaded. While that's being uploaded, the system on the other end is actually transcribing the data. It's running one of our rule sets that we've set up to get everything through there. Let me show you the end
result, as small as I possibly can make it. So
I just needed something messy and quick, and that's really what this is: just
quick reports that were generated based
upon the two things that we uploaded. So
you'll see that this is the outgoing call, the local IP, and the remote IP, which are obviously kept in a packed format, and then you'll see what the number was, which are the actual extensions that were used. Then as you look down you'll see the alerts: it pulled out the credit card number, one two three four five six seven eight, and the social security number, and then down here you have the transcription. So that is exactly what we said. That's enough for me to walk off with your credit card number and your social security number if I have it running on your system. The other one didn't work so well, due to such things as language models. The actual language model that we're using isn't so good at dealing with profanity, because typically when you're cursing you're going to have spaces around the words. Well, the way that our actual audio came out, it didn't. So as you can see, it detected
absolutely nothing, and obviously there's not one curse word there. So it is not perfect on everything, but it does detect normal speech patterns. One of the things for future work is obviously training for different types of dialects. So it works somewhat, and at other times it fails miserably. And there are different dialects, so basically even for the English language model, we
have the British English model and the US English model, and some of the dictionaries that we looked into also have different languages, like Spanish or German, and was it French? Yeah, they have French as well. So they have some of the major language models already defined in dictionaries, and that's what we used. And just for ease of use, we also made a form.
So say you did manage to get Wireshark sniffing on somebody's box, then you could just quickly upload the capture and have it transcribed, and it would come out the other end automatically for you. That's about it
that we had. If you guys have any questions for us, we'll be happy to take them.
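The rule-based detection described in the talk (regular expressions pulled from a rules file and matched against the transcript) can be sketched roughly as follows. This is an illustration under stated assumptions, not the speakers' engine: the `name: regex` rule syntax, the two patterns, and the digit-word normalization step are all hypothetical. The normalization is included because a recognizer emits "one two three four" as words, so a numeric pattern needs the digits joined first.

```python
import re

# Spoken digit words as a recognizer would emit them.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Stand-in for the rules file; the "name: regex" layout is an assumption.
RULES_TEXT = r"""
# name: regular expression
credit_card: \b\d{13,16}\b
ssn: \b\d{9}\b
"""


def load_rules(text):
    """Parse 'name: regex' lines, skipping blanks and '#' comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, pattern = line.split(":", 1)
        rules[name.strip()] = re.compile(pattern.strip())
    return rules


def normalize_digits(transcript):
    """Turn runs of spoken digit words into a single numeral string."""
    out = []
    for token in transcript.lower().split():
        digit = DIGIT_WORDS.get(token)
        if digit is not None and out and out[-1].isdigit():
            out[-1] += digit          # extend the current run of digits
        elif digit is not None:
            out.append(digit)         # start a new run
        else:
            out.append(token)
    return " ".join(out)


def detect(transcript, rules):
    """Return (rule_name, matched_text) pairs found in the transcript."""
    text = normalize_digits(transcript)
    return [(name, m.group(0))
            for name, rx in rules.items()
            for m in rx.finditer(text)]


rules = load_rules(RULES_TEXT)
sample = ("my credit card number is " + " ".join(["four"] + ["one"] * 15)
          + " my social is one two three four five six seven eight nine")
for name, value in detect(sample, rules):
    print(name, value)
```

Keeping the patterns in a plain rules file is the lower-customization path mentioned in the talk; anything the patterns cannot express would go into a custom script instead.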