Deceptive Security using Python

Formal Metadata

Title
Deceptive Security using Python
Subtitle
Deceiving hackers to protect your resources
Title of Series
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Imagine you are passing through an unknown street at midnight and you find that some anti-social elements are following you. To save yourself from them you start running and look for a safe place to hide. On the way, you find a good person and request him to help you. He hides you in his place to protect you. When these anti-social elements visit the good person’s place and enquire about you, the good person misguides them and redirects them to some other place in order to protect you. This is exactly how deception works. In this analogy, YOU are the resources to be protected, the anti-social elements are the hackers who want to gain access to the resources, and the good person is a deception technique that protects the resources from hackers by making them fall into the trap. The talk begins with an introduction to deception technology, deception types and methods, and the deceptive security life cycle. In this talk, we will demonstrate the following deception tools implemented in Python:
• WebTrap (https://github.com/IllusiveNetworks-Labs/WebTrap): designed to create deceptive webpages to deceive and redirect attackers away from real websites. The deceptive webpages are generated by cloning real websites, specifically their login pages.
• DemonHunter (https://github.com/RevengeComing/DemonHunter): a distributed low-interaction honeypot with an Agent/Master design.
Finally, we will conclude the talk with how we built a deception tool and demonstrate its working. How we implemented a deception tool in Python using machine learning: We designed a deception tool in Python using the PyBrain package to model and mitigate XPath injection attacks for web services. XML can be used to store data, and this data can be queried using the XPath query language. Because XPath is a query language, it has injection issues similar to SQL. To handle this issue, we proposed a solution that uses a count-based validation technique and Long Short-Term Memory (LSTM) modular neural networks to identify and classify atypical behavior in user input. Once atypical user input is identified, the attacker is redirected to fake resources to protect the critical data. Our experiment resulted in over 90% accuracy in the classification of input vectors.
Outline
1. Introduction to deception, deception types, applicable deception technology methods and the deception life cycle (08 minutes)
2. Demonstration of the WebTrap deception tool (04 minutes)
3. Demonstration of the DemonHunter deception tool (04 minutes)
4. Discussion of our deception tool and demonstration (06 minutes)
5. Conclusion and questions (03 minutes)
Audience
No Python experience is needed. In general, anyone can attend this talk and learn about applying deception techniques and machine learning to application security.
Transcript: English(auto-generated)
Perfect. So you're going to talk about security with Python with us today, aren't you? Yeah. Wonderful. So, well, whenever you're ready, the floor is yours.
Hello, everyone. My name is Gajendra Deshpande, and I'm working as an assistant professor at KLS Gogte Institute of Technology, India. Today, I will be delivering a talk on deceptive security using Python. So these are the contents which we are going to discuss today briefly: an introduction to deception, then two deception tools, WebTrap and DemonHunter, then our experiment, that is, how we developed a deception technique, then the conclusion, and finally the references.
Imagine you are passing through an unknown street at midnight and you find that some antisocial elements are following you. To save yourself from them, you start running and look for a safe place to hide. On the way, you find some good person and you request him to help you. So he hides you in his place to protect you. When these antisocial elements visit the good person's place and enquire about you, the good person misguides them and directs them to some other place in order to protect you.
This is exactly how deception works. In this analogy, you are the resource to be protected. The antisocial elements are the hackers who want to gain access to the resources. And the good person is a deception technique that protects the resources from hackers by making them fall into the trap.
Now, let's understand the basic idea behind deception, how it works. The definition of deception is that it's a technique where hackers' methods are used as a security mechanism, that is, phishing the phishers.
Now, let's assume that you have a legitimate website of a bank, and what hackers do is they create a similar user interface which looks exactly the same, but the backend is different. So you enter your details, assuming that it's a legitimate website, but in the backend, hackers are collecting your data to carry out further attacks.
Now, deception is a military tactic used by both attackers as well as defenders. So in our case, we are using it to protect our resources.
Now, this diagram shows how deception works. Now, there are two users. One is the benign user and the second one is the malicious user. Now, both have access to the common user interface. Now, depending on the type of user, depending on the input, depending on the activity, either they will be provided with the real system or the deceptive system.
So benign user, if he's authorized, if he's authenticated correctly, then he will be given access to the real system. Otherwise, the malicious user will be redirected to the deceptive system, which looks exactly the same, but it's not a real system.
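The routing decision in the diagram can be sketched in a few lines of Python. This is only an illustration of the idea described above, not code from the talk: the `Request` fields, the two backend names, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; field names are assumptions."""
    username: str
    authenticated: bool      # did the credentials check out?
    looks_malicious: bool    # verdict of whatever input classifier is in use


REAL_SYSTEM = "real-system"            # hypothetical backend names
DECEPTIVE_SYSTEM = "deceptive-system"


def route(request: Request) -> str:
    """Pick the backend for a request arriving at the common user interface."""
    # Only a correctly authenticated user with benign input reaches the real system.
    if request.authenticated and not request.looks_malicious:
        return REAL_SYSTEM
    # Everyone else is sent to a look-alike that holds no real data.
    return DECEPTIVE_SYSTEM
```

The point of the design is that both branches return something that looks like a working system, so the malicious user gets no signal that the request was flagged.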
Now, there are two types of deception technology. One is active deception and the second one is passive deception. In active deception, inaccurate information is intentionally provided to the hackers so that they fall for the trap. In passive deception, incomplete information is provided.
So intruders will try to gain the other part of the information and fall for the trap. Now, they can also be classified as client-side deception and server-side deception. Client-side deception is mostly used by hackers to deceive legitimate users, whereas server-side deception is used by security providers to deceive the hackers.
Now, you can develop a better deception by combining both approaches, that is, active deception and passive deception. That is, you can come up with a better deception which has incomplete as well as inaccurate information. Now, let's see deception's evolution and its advantages.
Now, honeypots were introduced in the year 1998. Honeypots are small traps in the network. When attackers try to access those points, they fall for the trap. Then a honeynet is nothing but a network of honeypots. Honeynets were introduced in the year 2000. Then a honeytoken is a small piece of information which is embedded in the real information.
When somebody steals the real information, this token will give the alert to the system administrator saying that so-and-so message has been stolen. And it gives the information about how it has been stolen.
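As a rough illustration of the honeytoken idea, the sketch below plants a unique decoy value among real records and raises an alert when that value is later observed somewhere it should never appear. The function names, the decoy account name `svc_backup`, and the alert text are all hypothetical.

```python
import secrets


def make_honeytoken(prefix="HT"):
    """Create a unique value that no legitimate user will ever type or send."""
    return f"{prefix}-{secrets.token_hex(8)}"


def plant(records, token):
    """Return a copy of the records with one decoy row that carries the token."""
    decoy = {"username": "svc_backup", "password": token}  # hypothetical decoy
    return records + [decoy]


def check_for_theft(observed_text, token, alerts):
    """Append an alert if the planted token shows up in observed traffic or logs."""
    if token in observed_text:
        alerts.append(f"honeytoken {token} seen: data was likely stolen")
        return True
    return False
```

Because the token is random and unique, any sighting of it is a strong sign that the surrounding data was copied, which is what lets the administrator be alerted.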
Next, honeypots 2.0 were introduced in the year 2012. Deception technology came into existence in the year 2016. The advantages of deception are increased accuracy, minimal investment, and being future-ready. So it is applicable to new technology as well as existing technology.
Now, let's first discuss the WebTrap deception tool. It is designed to create deceptive web pages to deceive and redirect attackers away from the real websites. So the deceptive web pages are generated by cloning the real websites, specifically their login pages. So this project has two files, basically. One is the web cloner and the second one is the deceptive web server.
Now what the web cloner does is it clones the real websites and creates the deceptive web pages. And the deceptive web server is responsible for serving the cloned web pages (note here, it is serving cloned web pages, not the real web pages) and for reporting to the syslog server upon requests.
So if anybody tries to access the cloned web pages, then it will be logged in the syslog server. Now you can install the WebTrap tool by following these commands. But the problem with the WebTrap tool is that presently it works only on Ubuntu 18.
Now the usage is shown in this slide. You need to make use of the WebCloner.py file, and you need to specify the output directory as the next parameter, then the URL of the website you want to clone. An example is shown here. Here we are cloning the Wikipedia login page into the directory Wikipedia login page.
Next is the deceptive web server. To use it, you need to specify the file TrapServer.py. To the TrapServer.py file, you need to specify the directory name and the syslog server. So here TrapServer.py is serving the login page from the Wikipedia login page folder, which has the deceptive web pages.
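The serve-and-report behaviour can be approximated with the Python standard library alone. The sketch below is not WebTrap's code and does not use its command-line interface; it is a minimal stand-in, with hypothetical function names, that serves a directory of cloned pages and forwards every request line to a logger, which can be pointed at a syslog server.

```python
import http.server
import logging
import logging.handlers


def make_trap_handler(directory, logger):
    """Build a request handler that serves `directory` and logs every hit."""

    class TrapHandler(http.server.SimpleHTTPRequestHandler):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, directory=directory, **kwargs)

        def log_message(self, format, *args):
            # The base class calls this for each request (and each error);
            # forward the line to our logger instead of printing to stderr.
            logger.warning("trap hit from %s: %s",
                           self.client_address[0], format % args)

    return TrapHandler


def serve_trap(directory, syslog_host, syslog_port=514, http_port=8080):
    """Serve cloned pages and report each access to a syslog server over UDP."""
    logger = logging.getLogger("trap")
    logger.addHandler(
        logging.handlers.SysLogHandler(address=(syslog_host, syslog_port)))
    handler = make_trap_handler(directory, logger)
    with http.server.ThreadingHTTPServer(("", http_port), handler) as server:
        server.serve_forever()
```

Since nobody has a legitimate reason to visit the cloned pages, every logged request is worth looking at; the port numbers above are placeholders.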
So when somebody tries to access this folder or the files within this folder, it will be logged in the syslog server. The next tool is DemonHunter.
So DemonHunter is used to create low-interaction honeypot servers, and it has agents and a manager to check the logs. So it allows you to create your own honeynet, all customized by yourself, from the ports to the protocol handlers. That means you can have your own port numbers, you can have your own protocols.
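To make "low interaction" concrete, here is a minimal sketch of a single honeypot listener written with `asyncio`. It does not use DemonHunter's API; the function name, the banner, and the `events` list (standing in for reports sent to a manager) are assumptions for illustration.

```python
import asyncio


async def start_honeypot(host, port, banner, events):
    """Listen on (host, port), greet with a fake banner, record what arrives."""

    async def handle(reader, writer):
        peer = writer.get_extra_info("peername")
        writer.write(banner)              # pretend to be a real service
        await writer.drain()
        try:
            data = await asyncio.wait_for(reader.read(1024), timeout=5)
        except asyncio.TimeoutError:
            data = b""
        # A real agent would forward this to its manager; here we just keep it.
        events.append({"peer": peer, "port": port, "data": data})
        writer.close()

    return await asyncio.start_server(handle, host, port)
```

The listener never implements the protocol it imitates. It only shows a banner and records the first bytes the client sends, which is what keeps the interaction low and the risk small. Choosing the port and banner per listener is one way to get the "own ports, own protocols" flexibility described above.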
So usually different protocols are also allowed. So in this diagram you can see here centrally there is a manager component which manages everything. So that means it manages the honeypot devices, it manages protocols and it manages port numbers. And note here that these protocols can be of different nature.
So they may not be just HTTP only. So it can be a combination of HTTP, UDP, SMTP, FTP, etc. Now why we developed a deception tool is we know that cyberspace is a national asset. And XML is the heart of many mainstream technologies nowadays,
including web services, service-oriented architecture or microservices, cloud computing, etc. So web services vulnerabilities can be present in the operating system, network, database, web server, application server and so on. Now when a new technology is introduced, it comes with its own new challenges, plus old challenges will also be present.
Say for example, when we say SQL injection, it is available or it is present with respect to relational databases. But when we use XML as an alternative to relational databases, the same kind of injection attacks can be performed on XML document also.
So next, the problem which we tried to solve was to secure the web resources from XPath injection attacks using modular recurrent neural networks. And for that, we proposed a solution that uses a modular recurrent neural network architecture to identify and classify atypical behavior in user input.
So once the atypical user input is identified, the attacker is redirected to fake resources to protect the critical data. So in this case, we developed our own input validation technique, that is, the count-based validation technique. So in the next few slides, I will discuss how we developed the count-based validation technique and how it works.
Now, we need to first understand how XPath injection attacks work. So in this slide, you can see that there's a small piece of XML code which stores the username and password.
So at the bottom, you can see that there are two lines. One is in blue color and second one is in red color. So the line which is mentioned in the blue color, it actually indicates the valid query where valid username and passwords are mentioned. Whereas if you consider the last line, which is specified in the red color, that's a malicious query.
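The slide itself is not part of the transcript, so the following is a hypothetical reconstruction of the pattern it describes: a login query built by pasting user input straight into an XPath expression. The element names `user`, `username` and `password` and the sample credentials are assumptions.

```python
def build_login_query(username, password):
    """Naive query construction: user input is pasted straight into XPath."""
    return (
        "//user[username/text()='" + username + "' "
        "and password/text()='" + password + "']"
    )


# Valid query: real credentials are used.
valid = build_login_query("alice", "s3cret")
# //user[username/text()='alice' and password/text()='s3cret']

# Malicious query: the input is an attack vector, not data.
attack = build_login_query("' or '1'='1", "' or '1'='1")
# //user[username/text()='' or '1'='1' and password/text()='' or '1'='1']
```

In XPath 1.0, `and` binds more tightly than `or`, so the predicate in the second query ends in `or '1'='1'`, which is always true, and the query selects every user node regardless of the stored credentials.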
So you can see here, no real data is used there. Attack vector is used. So you can find some Boolean operators and some unwanted characters. Now, on XPath injection, it clearly states that to perform XPath injection attack, you really don't need any skills.
So any beginner can perform these attacks, any beginner can perform or create an attack vector to perform the attacks and typical likelihood of exploit is very high.
So that's why it is a very important thing to handle. Now, we studied a few research papers and we found that there are some gaps related to the existing work. Now, what we found was that a neural network approach to identify and classify atypical behavior in input was not yet done.
So the study showed different approaches to handle XPath injection attacks. It also showed methods applied and their disadvantages. We can conclude from the study that neural networks are not applied to detect XPath injection attacks and existing results are not promising. The study showed how modularity in case of neural networks helps to achieve improved performance.
Modular neural networks have not been applied to cybersecurity, particularly to the detection of XPath injection attacks. Now, this slide shows the system design. So it shows the working of the entire system.
So you can see there are three tiers. One is the presentation tier, then the business tier and the data tier. The presentation tier has the login form. It's nothing but the user interface through which the user or the attacker interacts. Next we have the business tier, where the data processing or the application logic resides. Next we have the data tier, where we are storing the real XML document and the fake XML document and also the custom error messages.
Now, if you consider some examples, some examples of valid inputs are email ID, mobile number, etc. Examples for malicious inputs are also mentioned. Then there is a third category that is some invalid inputs.
So that is a very large input string, a string with special characters, etc. So what happens here is when an attacker uses invalid inputs, it's going to generate an error message, and the error message also gives you some information related to the system. For example, which browser the client is using, which system the client is using, etc.
So to avoid that, you can design custom error messages which hide the system information. Now in this algorithm, we are describing how our count-based validation technique works. Now the first step is to scan the user input and next determine the length of the user input.
Then count the frequency of every character. So you need to count the frequency of characters, digits and special characters. Now you can see here in table 4 on the bottom right corner, we have specified the character and the threshold. So that means only up to the threshold the characters are allowed.
If it is exceeding the threshold, then appropriate error codes have been assigned. So if the frequency of the character is below the threshold, then the value is set for that particular character in table 4. Then set the error code to 40. Else if the frequency of characters mentioned that is the special characters is above the threshold, then set the particular character threshold to 4000.
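The steps of the algorithm (scan the input, measure its length, count character frequencies, compare against per-character thresholds) can be sketched as follows. Table 4 and the actual error codes are not reproduced in the transcript, so the length limit, the threshold values and the shape of the result below are hypothetical placeholders rather than the authors' numbers.

```python
from collections import Counter

# Hypothetical limits; the talk's Table 4 is not available in the transcript.
MAX_LENGTH = 64
THRESHOLDS = {"'": 0, '"': 0, "=": 0, "[": 0, "]": 0, "/": 0, "@": 1, ".": 3}


def count_based_check(user_input):
    """Return (ok, violations); violations maps a character to (count, limit)."""
    violations = {}
    if len(user_input) > MAX_LENGTH:
        violations["<length>"] = (len(user_input), MAX_LENGTH)
    counts = Counter(user_input)          # frequency of every character
    for char, limit in THRESHOLDS.items():
        if counts[char] > limit:          # allowed only up to the threshold
            violations[char] = (counts[char], limit)
    return (not violations, violations)
```

With these placeholder limits, `count_based_check("alice@example.com")` passes because one `@` and one `.` stay within their thresholds, while `count_based_check("' or '1'='1")` fails on the quote and equals characters. In the system described in the talk, the outcome is encoded as error codes that feed the second neural network; this sketch simply returns the violations.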
Now what we have done is we have modularized the neural network. We have not used a single neural network because that will result in a lot of samples. So to reduce the samples, we have divided it into three neural networks.
So in the first neural network, we are training it on login attempts. In the second neural network, we are training it on error codes. So now to build a neural network, we have used a recurrent neural network with 15 neurons, with the hidden layer as an LSTM network,
that is, a long short-term memory network, and the output layer as softmax. Then we used the resilient propagation trainer to train the network on a training data set, and the test data set is created in real time to validate against the data set.
Then if the training error is 0.0%, then classify the input as in table 3. So table 3 is mentioned in the next slide. If it is classified as valid, then you need to display the message as login successful and redirect to the real system.
If it is classified as malicious, then the content from fake XML file has to be displayed. Otherwise invalid, then custom error message has to be displayed. Now you can see here we are training our third neural network based on this data set.
So we have output of neural network 1, we have output of neural network 2, then the final classification. So if one of the output is malicious, then it will be classified as malicious. If one of them is invalid, then it will be classified as invalid. But the flow is from valid to invalid to malicious.
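The labelling rule behind the third network's training set, together with the response chosen for each final class, can be written down directly. This is a rule-based sketch of the table described above, not the trained network itself; the function names and the severity numbers are assumptions.

```python
# Ordering taken from "the flow is from valid to invalid to malicious".
SEVERITY = {"valid": 0, "invalid": 1, "malicious": 2}


def combine(verdict_1, verdict_2):
    """Final label is the more severe of the two sub-network verdicts."""
    return max(verdict_1, verdict_2, key=SEVERITY.__getitem__)


def respond(label, fake_xml_content, custom_error):
    """Choose what the user sees for each final classification."""
    if label == "valid":
        return "Login successful"      # followed by a redirect to the real system
    if label == "malicious":
        return fake_xml_content        # deceive instead of denying access
    return custom_error                # invalid: hide system details
```

For example, `combine("valid", "malicious")` gives `"malicious"` and `combine("invalid", "valid")` gives `"invalid"`, which matches the rule that a single malicious or invalid output is enough to decide the final class.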
Now we used the PyBrain library for the neural network. Then for the web service, we used the BottlePy micro web framework. For the web server, we used BottlePy and Apache. Then similarly for drawing graphs, we used Python, NumPy and matplotlib.
Then PyBrain is a modular machine learning library for Python. It is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. So to download it, you can follow the URL and you can follow the instructions. There is a very nice tutorial, you can follow it to install PyBrain and execute the examples.
Similarly, Bottle is a fast, simple and lightweight micro web framework for Python. It is distributed as a single file, so it has no dependencies and there is no need to install it. Just include the file in your directory and start using it. It has built-in routing, templates, utilities and server modules.
So again, more information can be found in the specified URLs. So these are the results we have got. So in case of true positive, you can see here we are getting more stable results with respect to modular neural network. Whereas results are unstable with respect to single neural network.
So similarly when we compare false positives, it is the same. So results are better with the modular neural network than with the single neural network. The same is the case with true negatives and also false negatives. Now when you compare the response time here, you can see that when we use the modular neural network, the response time is less.
When we use the single neural network, it takes more time. So the ratio is 1 to 1.5. Now when we summarize the results, you can see that the results of modular neural networks are better compared to the single neural network.
So that is, including and excluding outliers, the results of modular neural networks are better in all the cases. Now these are the screenshots. So you can see here in this slide, we have the fake data file and we have the real data file. So if you observe the structure, both look similar, but the thing is the data in one file is fake.
It's not real. So this is the user interface which we have created for our experiment. So in the first case, that is the valid scenario, the user will enter the right or legitimate username and password.
So it will be classified as valid, and it gives the message login successful. Then similarly, if the user enters a malicious query, then note here that it is not going to deny access. Instead it will display the fake data.
And also note here that when it is displaying the fake data, we are also capturing details like the server, the web browser, the query the attacker has used, the port number, etc. And we are also capturing the login attempts. Then next, what the attacker will do is he will try to log in with the fake credentials, and this time it shows login successful.
But note here that this is not access to the real system; instead it is giving access to the deceptive system. So the conclusion is that our solution offers improved security over existing methods by misleading attackers to false resources and custom error pages.
Our results also show that the system accepts legitimate input, even though the user input may contain some special characters, and rejects only truly malicious inputs. So our solution combines a modular neural network and the count-based validation approach to filter the malicious input. It also resulted in an increased average detection rate of true positives and
true negatives and a decreased average detection rate of false positives and false negatives. Secure systems have to be successful every time, but an attacker has to be successful only once. So with deception, I can only say that we can buy extra time to protect our resources, but we may not be able to protect the system entirely.
So these are the references you can refer for more information. Thank you. Thank you.