AI VILLAGE - JMPgate: Accelerating reverse engineering into hyperspace using AI

Video in TIB AV-Portal: AI VILLAGE - JMPgate: Accelerating reverse engineering into hyperspace using AI

Formal Metadata

Title: AI VILLAGE - JMPgate: Accelerating reverse engineering into hyperspace using AI
License: CC Attribution 3.0 Unported
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
One of the most exciting potential applications for artificial intelligence and machine learning is cognitive augmentation of humans. At its best, AI allows humans to handle more information, react faster to complex events, and potentially even sense features of the world that we are currently incapable of perceiving. This has many applications in the security field, such as aiding humans in the task of binary reverse engineering. Reverse engineering binary code is one of the most challenging skill sets in the security field to learn. The ability to look at a block of raw machine code and understand what it does, as well as recognize similarities to code previously seen, often requires years spent doing tedious analysis of large amounts of code. In this talk I show how we can use machine learning to handle the tedious parts of this process for us. If we show a generative neural network a wide variety of machine code, the network will learn the most relevant features needed to reproduce and describe that code. Once the network is trained, we can show it a new segment of code and capture the state of the neurons at the end of the segment. This neural state is effectively a summary of the entire sequence, condensed into a single vector. Comparing these vectors allows easy measurement of the similarity of several code sequences by simply computing the Euclidean distance between them. These vectors can also be used as inputs to other machine learning models that can perform a variety of tasks, such as identifying the compiler settings used to generate the code. As part of the presentation, I will also be releasing a tool, the JMPgate framework, which can be used to accomplish tasks like identifying library code within an executable binary.
Our next talk is by Robert Brandon, on JMPgate: accelerating reverse engineering into hyperspace using AI.

We'd like to thank our sponsors: Endgame, Cylance, Sophos, and Tinder. We'd also ask that you raise your hand if you have an open seat next to you, so that people in the back know there's a seat available. And finally, please silence your cell phones. Here's Rob.
How you doing, folks? I'm going to talk a little bit about something that's been kind of a passion of mine for the last couple of years. Quick intro: who am I? I've been working in tech for a while; I finished my PhD in computer science at the University of Maryland, Baltimore County last year, largely on the research I'm going to be talking about today, and I'm currently a threat hunter with Booz Allen Hamilton's Dark Labs.

When you're doing research, in a lot of cases it's a good idea to figure out the big question you're trying to answer; otherwise it's real easy to find rabbit holes to go down and waste a lot of time without solving anything. The big question I've been trying to figure out is: is there a good way to represent machine code in a form that computers can understand? You could sit back and say, well, of course computers understand machine code, because they execute it. But that's like saying a line cook at McDonald's understands all of cuisine because they can follow a set of instructions. What I mean by "understand" is: can they take a particular bit of code and in some way place it in the context of all the other code that exists? It's a question of whether there's a representation of machine code that captures the semantic meaning of that code, in a way that lets both computers and humans make comparisons between different pieces of code.

This has a lot of applications. One of the big tasks in reverse engineering is binary similarity: given a program, you don't just want to know whether it's malware, you want to know what kind of malware it is. Does it have some kind of encryption routine that suggests it's ransomware? Is it similar to other samples we've seen? This also has applications in vulnerability discovery. If you have a program and you want to know whether there are any vulnerabilities in it, you're going to ask questions like: does it include a library that has known vulnerabilities? In a lot of cases you can do that with signatures, but those signatures tend to break when the library gets recompiled. There are definitely existing ways to approach the problem. BinDiff is awesome, but it doesn't scale well: it's great if you've got three or four binaries to compare, but if you've got three or four thousand, it very quickly becomes computationally infeasible. We also have similarity hashes like ssdeep and sdhash; those are great if you're asking a byte-level similarity question, but those hashes don't help you encode anything about the semantic meaning of a function or a program.

The other problem I'm trying to tackle is: how do you model binaries for machine learning? Most machine learning algorithms need some kind of fixed-length input, and when you're working with programs, those fixed-length feature vectors are usually constructed by domain experts. They'll look at things and say, okay, here's what's important: how many sections are there, how long is the text section, how many bytes is the binary, how much data is there, what's the entropy. But those features aren't always comprehensive, one expert will pick a different set of features than another, and there's no real principled way to decide which is right. Some of those features, like n-gram computations, can also be pretty computationally intensive once you get past a certain point. Machine code itself doesn't easily fit into a fixed-length feature vector: it can be incredibly variable in length. If you're looking at functions, the length of a function depends on the personal style of whoever wrote the program; some people like writing really long functions and putting everything in one place, other programmers write very short functions. And of course labeled data is really hard to obtain: nobody has sat down, looked at all the programs out there, and categorized them.
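As an illustration of the kind of expert-chosen, fixed-length feature vector described above (the specific features here are my own picks for the sketch, not the ones from any particular tool):

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def handcrafted_features(data: bytes) -> list:
    """A fixed-length (4-dimensional) feature vector of the kind a
    domain expert might choose: total size, entropy, printable-byte
    ratio, and zero-byte ratio. Real expert feature sets are larger,
    but every input still maps to the same fixed dimensions."""
    n = len(data) or 1
    printable = sum(1 for b in data if 0x20 <= b < 0x7f) / n
    zeros = data.count(0) / n
    return [len(data), byte_entropy(data), printable, zeros]

# Any binary blob, whatever its length, maps to the same 4 dimensions.
print(handcrafted_features(b"\x55\x89\xe5\x83\xec\x10"))
```

The limitation the talk points out applies here directly: which four numbers to compute is an arbitrary expert judgment, and nothing guarantees they capture what matters.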
The other really significant challenge in this field is that, compared to fields like computer vision and natural language, there really aren't a lot of machine learning researchers working in security, and even among those, the subset working on binary analysis and reverse engineering is very small. Because of that, I like to find approaches from other domains where people have been successful and apply them to the domain I'm working in. For binary analysis I've found natural language processing to be extremely useful, because there are a lot of structural similarities between computer languages and natural language; both were created by the same wetware, after all. Both consist of arbitrary-length sequences, and both carry very rich semantic and conceptual content on top of the raw tokens: a function has meaning to a human at a level higher than the actual sequence of instructions. Fortunately, there's been a lot of research on how to process and represent language, ongoing since before computers existed, and I always like to avoid reinventing wheels.

A lot of natural language models rely on converting text into some kind of high-dimensional space: the "hyperspace" of the title. A hyperspace is basically any Euclidean space with more than three dimensions. It's a space where you can do things like add vectors and, more importantly for data science and machine learning, compute the distance between vectors. In data science it's very common to model similarity as the distance between two vectors in some high-dimensional feature space, and machine learning algorithms, at bottom, are mostly trying to figure out how to take data points and draw a line between them. As an example: suppose you have a one-dimensional data set of X's and O's and you want to figure out how to draw a line separating the two. In one dimension there may be no way to do that, but if you do something like take the square of each value and lift the data into a higher dimension, in that higher dimension you can draw a line between the two classes.

There are a lot of techniques for moving language into these spaces. One of the most common is the bag-of-words model, where you take a straight count of each word appearing in a document and use the counts to construct the vector. That translates fairly well to machine code: you can do a bag of opcodes by taking a straight count of the opcodes. You can also do n-grams, which take a sentence like "the cat ran past" and turn it into the two-word terms "the cat", "cat ran", and "ran past". That works reasonably well, but the problem with n-grams on machine code is that once you get over about 5-grams, the processing power needed to compute all the n-grams present in a whole set of binaries gets out of hand. So most of the really successful natural language approaches have moved away from straight counts toward a concept called embeddings. What an embedding does is take those word counts, or a whole document, and convert them into a dense vector in some higher-dimensional space. By dense I mean there aren't a lot of zeros in it; it's made up of real numbers that encode complex relationships. Documents, in general, are sparse: your typical document does not contain all of the words in the English language, so its bag-of-words vector is going to have a whole lot of zeros; an embedding replaces that with a dense vector of real numbers.
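Both of the count-based techniques just described fit in a few lines of Python; the opcode sequence below is hypothetical:

```python
from collections import Counter

def bag_of_tokens(tokens):
    """Bag-of-words over a token sequence: unordered counts."""
    return Counter(tokens)

def ngrams(tokens, n):
    """Sliding-window n-grams, preserving local order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# The natural-language example from the talk: "the cat ran past".
words = ["the", "cat", "ran", "past"]
print(ngrams(words, 2))  # [('the','cat'), ('cat','ran'), ('ran','past')]

# The same machinery applied to a (made-up) opcode sequence.
ops = ["push", "push", "xor", "mov", "xor", "ret"]
print(bag_of_tokens(ops))
print(ngrams(ops, 3))
```

The combinatorics the talk warns about are visible here: the number of distinct n-grams grows rapidly with n and with vocabulary size, which is why going past roughly 5-grams over whole binary corpora becomes impractical.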
A really cool property of word embeddings is that vectors within this high-dimensional space naturally cluster into regions where high-level meaning, for humans, resides. For example, if you train a metric-space model on a lot of English-language documents, you can end up with a region of the space where the concept of capital cities lives, and another region where the concept of countries lives. You can then do things like take the vector for London, subtract the vector for Britain, add the vector for France, and land somewhere in the region of the vector for Paris.

One problem with figuring out how to apply this to machine code, though, is that most natural language embedding models construct their vectors by looking at the co-location of words: what words are used together. The meaning of a word is something you can infer from the company it keeps. When you try to apply these concepts to machine code, the first problem you run into is: what is the equivalent of a word in machine code? You're looking for something that has fairly high-level semantics to a human, something that encodes a lot of dense information, but isn't so general that it has no meaning. You could use opcodes, but an opcode like push really doesn't say much about what the program does on its own. A basic block is another intuitive structure you could use, and they're easy to find: a basic block is just contiguous executable code without any branches, a linear sequence of instructions until you hit a jump or a call or some other point where the code has to make a decision about what to do. But at least to my mind, when trying to figure out the basic meaningful unit of a program, the function seemed like the natural choice. Programmers commonly break things up into functions when they're coding, and reverse engineers looking at a binary usually try to break it up into functions and figure out what each function does. The problem, now that we've decided to use functions, is that functions don't exhibit the same kind of co-occurrence properties that natural language does.
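The capital-city analogy above can be sketched with toy vectors; the numbers here are invented for illustration, not learned embeddings:

```python
import math

# Toy 3-d "embeddings" invented for this sketch; real word vectors
# are learned from data and have hundreds of dimensions.
emb = {
    "London":  [1.0, 0.9, 0.1],
    "Britain": [1.0, 0.1, 0.1],
    "France":  [0.2, 0.1, 0.9],
    "Paris":   [0.2, 0.9, 0.9],
    "push":    [0.5, 0.5, 0.5],
}

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]

# vector("London") - vector("Britain") + vector("France")
query = add(sub(emb["London"], emb["Britain"]), emb["France"])

# The nearest stored vector to the query point:
nearest = min(emb, key=lambda w: math.dist(emb[w], query))
print(nearest)  # "Paris" with these toy values
```

With these hand-picked values the arithmetic lands exactly on Paris; with real learned embeddings it lands nearby, which is the point of the analogy.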
When you're looking at an English-language document, words that are right next to each other generally have something to do with each other. If you're looking at a static binary, you really can't make that assumption. You're going to have function definitions sitting right next to each other, contiguously, that have nothing in common: you might have printf defined right next to the socket-handling code, which is right next to the encrypt-all-the-things routine of a piece of ransomware. The sequential position of those functions in the binary is really just a location; it doesn't say anything about what the functions do. So with machine code you can't take a co-occurrence type of approach; you have to look at the composition of the functions, what instructions comprise each function, rather than what sits next to it.

If you're going to work with composition, you have to figure out how to represent the composition of functions. Looking at the average length of x86 instructions across a wide variety of machine code, most instructions come out to something like seven bytes once you include all the operands. That makes n-grams computationally infeasible: you can do two- or three-grams, but at that point each n-gram is just a subsection of a single instruction, and half an instruction really doesn't give you a lot of usefulness. You could sit down with a human who really knows assembly and have them figure out which patterns are significant. You might find things like: if I see a bunch of pushes, followed by a bunch of xors, followed by a bunch of stores, that's probably some type of encoding or encryption routine; that's a significant pattern. But there are a lot of patterns like that, they're extremely variable in length, and figuring out which patterns are significant, and how to encode them, is not really a tractable problem. If you step back and think about it intuitively, though: if you want to know which features are significant for representing something, then knowing how to build it, which features are important to construct it, is exactly what you need. And since that's a really hard problem for humans, why not let a neural network figure out how to compose functions, and then just take whatever features the network learns?

Fortunately there's a type of neural network that does exactly that: the character RNN. A char-RNN is a generative neural network that generates byte sequences. The great thing about a generative network is that you don't need labeled data to train it: the data is its own label. The other really nice thing about this architecture is that all of the popular deep learning frameworks out there have example code for it in their reference libraries; it's a fairly common, well-trodden architecture. Training a generative RNN works like this: you show it a sequence one byte at a time, and it tries to predict the next byte in the sequence. If it gets it right, great; if it gets it wrong, you give it feedback and let it correct its weights. In the classic example, you have a network that at each time step is trying to predict the next output based on the sequence it's seen so far. You show it the letter "c" and it predicts the letter "a": good job, network, that's right. It says, okay, I've seen "ca", I'm going to predict the next letter is "t". If the actual next letter turns out to be something else, the training process corrects the weights at that point to hopefully get it closer next time. As an example of the results, here's a bit of assembly generated by a char-RNN I trained on lots of assembly. You can see it generates plausible-looking assembly: it sets up all the registers right, there are no registers used that you wouldn't expect to see, and it even learned to clean up the stack when it's done.

What you end up with, after you train one of these networks, is a method for embedding functions into a high-dimensional space. The final set of activations in your generative recurrent network can be treated as a high-dimensional vector: if you have 100 neurons in your network, each of them has an activation, which is a number, so you've got a 100-dimensional vector in a vector space. And because of the way the training process works, similar sequences of code produce similar activations within the network, so the final activations of similar functions land near each other in that space.
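The embedding mechanism can be sketched with a toy recurrent network. This is a hand-rolled, untrained Elman RNN standing in for the trained LSTMs from the talk, so the numbers it produces are meaningless; what it shows is how a variable-length byte sequence collapses to a fixed-length final hidden state:

```python
import math, random

class TinyRNN:
    """Minimal Elman-style RNN in pure Python, untrained; it only
    illustrates the mechanism. h_t = tanh(Wx * x_t + Wh * h_{t-1}).
    The models in the talk are single-layer LSTMs with 100 or 500
    units, trained to predict the next byte."""
    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.wx = [[rng.uniform(-0.1, 0.1) for _ in range(256)]
                   for _ in range(hidden)]
        self.wh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                   for _ in range(hidden)]

    def embed(self, code: bytes):
        """Run the byte sequence through the network and return the
        final hidden state: a fixed-length vector summarizing it."""
        h = [0.0] * self.hidden
        for b in code:
            h = [math.tanh(self.wx[i][b] +
                           sum(self.wh[i][j] * h[j]
                               for j in range(self.hidden)))
                 for i in range(self.hidden)]
        return h

rnn = TinyRNN()
v1 = rnn.embed(b"\x55\x89\xe5")        # a short function prologue
v2 = rnn.embed(b"\x55\x89\xe5" * 40)   # a much longer sequence
print(len(v1), len(v2))                # both 8: fixed length
```

In a trained network, unlike this random one, nearby code sequences would map to nearby final states, which is what makes the vectors useful.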
So you can say two pieces of code are similar based on the fact that they activate, to the same degree, a network that has looked at a whole lot of x86 code. Of course, being a data scientist: there's no science without testing. You can say, hey, this looks great, but how do you know all these numbers you're generating actually do what you want them to do, and aren't just neural-network garbage? To test this, I trained generative LSTMs on a data set consisting of functions from ReactOS and other open-source code, compiled with both GCC and Visual Studio; overall the data set contained about 23 million functions. I stuck with single-layer models, one with a hundred nodes and one with 500, to probe how much representation you can get out of a particular network size; given that these models take forever to train, I kept the architectures simple.

The next big decision is whether to work with assembly or with the raw binary. There's been some prior research, and in a lot of cases people want to disassemble, because if you're trying to eyeball things, humans read assembly a lot better than raw bytes. But there are downsides. The first problem is: what is the correct disassembly? Every disassembler disassembles things slightly differently. You might have one disassembler that wants to use AT&T syntax and another that uses Intel syntax, and figuring out which one is correct is a hard problem. I prefer to keep things as close to the original data as possible, without introducing more bias, so I worked with the raw binary. There is one source of semantic confusion I did have to work around, though. In x86, function calls are encoded relative to the current address: if printf sits at address 0x20 in memory and it's called from a function at address 0x10, the instruction effectively says "call +0x10"; call it from some other address and the encoded offset is different. That introduces a problem, because the same function being called from multiple places never looks like the same call. To work around this, I do some basic normalization of the data: if a function is imported by the binary, then before I send the bytes over to be vectorized, I compute a 32-bit hash of the function name and substitute that value into the call, so that every call to printf carries the same token instead of a site-dependent offset.
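The call-site normalization step might look something like the following sketch. The text-based input format and helper names here are my own inventions for illustration; the talk describes operating on raw bytes:

```python
import re
import zlib

def normalize_calls(disasm_lines, imports):
    """Replace the site-dependent target in calls to known imports
    with a stable 32-bit hash of the imported name, so every call to
    e.g. printf vectorizes identically. `imports` maps a call-target
    address to an imported function name. Hypothetical line format:
    (address, "call 0x20")."""
    out = []
    for addr, line in disasm_lines:
        m = re.match(r"call\s+0x([0-9a-f]+)", line)
        if m and int(m.group(1), 16) in imports:
            name = imports[int(m.group(1), 16)]
            token = zlib.crc32(name.encode()) & 0xffffffff
            line = f"call <{token:08x}>"
        out.append((addr, line))
    return out

imports = {0x20: "printf"}
code = [(0x10, "call 0x20"), (0x90, "call 0x20"), (0x30, "mov eax, 1")]
normed = normalize_calls(code, imports)
# Both printf call sites now carry the identical token.
print(normed[0][1] == normed[1][1])  # True
```

CRC32 stands in here for whatever 32-bit hash the real tool uses; the point is only that the token depends on the name, not on the call site.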
So how do we evaluate these embeddings? In a lot of cases the criterion for deciding whether embeddings are any good amounts to "well, do they work for my problem?" That's great from an engineering perspective, but not so great from the perspective of scientific rigor. One way to evaluate embeddings is to plot them out and look for interesting structure; you can do some random sampling and have somebody eyeball whether they look right or not. That's okay, but having a human eyeball them doesn't scale, and you're also prone to bias: if you really want your algorithm to work, you might be tempted to ignore the results you can't quite explain and focus on the ones you can. So to evaluate these embeddings I'm going to borrow from research in the natural language space and try to come up with standard tests for the models. In the natural language realm you have things like standardized lists of synonyms, so you can check whether the embeddings agree that all of those word pairs are synonyms; we don't really have anything like that for machine code. The criteria I'm proposing are, first, consistency: are the embeddings consistent with embeddings generated by other models? And second, since the ultimate job of an embedding model is to extract the semantic content of something into a high-dimensional space: can we come up with standardized tests to measure how much semantic content is actually being extracted?

Here's a scatter plot of some of the embeddings, colored by operating system and compiler. It's not quite as obvious with these colors, but even just eyeballing it you can see that the stuff compiled with GCC, which here came from Debian and Arch Linux, sits in a very different area of the space than the stuff compiled with Visual Studio. In hindsight that's not too surprising: Visual Studio and GCC emit completely different function prologues, so when I look at it now it seems obvious. But the important part is that when we trained the embedding we didn't optimize for this; we never told the model that these things are different. That's just something the model picked up on its own.
For evaluating consistency, I'm going to define two measures. Hard consistency: if, using model one, we find the nearest neighbor of a function, and we do the same using model two, the two neighbors have to have the exact same name. That's a fairly harsh criterion; it's something that's definitely not going to happen by random chance, and it's arguably harsher than is reasonable. For example, is the word "fluffy" closer to the word "dog" or to the word "cat"? Both are plausible, so which one should be the closest neighbor? To relax that a bit, I also have a measure of soft consistency: if the nearest neighbor of function A under model one is some function X, then under model two, X has to be within the ten nearest neighbors. That way it may not be exactly the closest one, but it's still in the same area. Evaluating the models, I actually got some pretty good results for consistency. For the evaluation I took a random sample of ten thousand functions, because computing k-nearest-neighbors over the whole set takes forever. Out of those ten thousand functions, around a quarter met the criterion for hard consistency between models: the 100-node network and the 500-node network agreed on the exact same nearest neighbor about 25% of the time. When I relaxed that to the soft consistency of ten nearest neighbors, we were still getting around fifty-plus percent, which is a really good result; if you work out what you'd expect from random chance, you'd expect essentially zero consistency of this kind. So this gives you some confidence that the neural networks are learning consistent, useful things about the data. Now, on to the problem of standardized tests; tests aren't just for your kids anymore.
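The hard and soft consistency measures just described can be sketched in a few lines; the embedding values below are toy data I made up for the check:

```python
import math

def nearest(qi, vecs, k=1):
    """Indices of the k nearest neighbors of vecs[qi], excluding itself."""
    order = sorted((j for j in range(len(vecs)) if j != qi),
                   key=lambda j: math.dist(vecs[qi], vecs[j]))
    return order[:k]

def consistency(model_a, model_b, k=10):
    """Fraction of functions whose model-A nearest neighbor is
    (hard) also model B's nearest neighbor, or (soft) within model
    B's top k. model_a and model_b are parallel lists holding each
    model's embedding of the same functions."""
    n = len(model_a)
    hard = soft = 0
    for i in range(n):
        nn_a = nearest(i, model_a, 1)[0]
        top_b = nearest(i, model_b, k)
        hard += nn_a == top_b[0]
        soft += nn_a in top_b
    return hard / n, soft / n

# Sanity check: a model compared against itself is perfectly consistent.
a = [[0.0, 0.0], [1.0, 0.0], [0.9, 0.1], [5.0, 5.0]]
hard, soft = consistency(a, a)
print(hard, soft)  # 1.0 1.0
```

Note that hard consistency implies soft consistency, so the soft score is always at least the hard score, matching the 25% versus 50-plus percent figures above.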
The tests I came up with: given an embedding, can we train a model to tell which compiler was used to compile a function? Going a little further, can we tell which optimization settings were used with that compiler? And on top of that, can we determine whether a particular function imports from a particular library, for example ws2_32.dll? To do that I had a labeled dataset (I compiled all of these things myself), and I trained several classifiers. Training a logistic regression classifier to tell which compiler was used got 100% accuracy, which is no surprise: looking at the PCA plot from earlier, you can tell the two classes are basically linearly separable in this space. What was actually even more impressive was the softmax classifier I built for detecting compiler optimization settings, which got between 72 and 85% accuracy depending on the embedding; it did a little better at higher dimensionality. That's really impressive when you consider that in a lot of cases, especially for small functions, a compiler may not generate different code at all: if you have a function that just takes two integers and returns their sum, it really doesn't matter whether you compile with -O0 or -O3; there's not much the compiler can change. And even in the cases where the softmax classifier didn't get the correct optimization setting, it still picked the correct compiler. It would say, in effect, "I know this is TCC, but it's maybe 51% that it was compiled with -O1 and 49.5% that it was compiled with -O3." That's definitely not a bad position. In addition to that, I trained a random forest classifier on functions that import from ws2_32.dll versus functions that don't, and it got between 78 and 91% accuracy, again increasing with the dimensionality of the embedding.
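A minimal sketch of the compiler-identification experiment, assuming the function embeddings arrive as NumPy arrays; the two synthetic clusters here stand in for the real embedding data, which isn't available from the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: two compilers whose function embeddings form two
# well-separated clusters (real embeddings would come from the encoder).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 16)),   # "compiler A"
               rng.normal(3.0, 1.0, size=(200, 16))])  # "compiler B"
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # essentially 1.0 on linearly separable data
```

When the classes really are linearly separable in the embedding space, as the PCA plot suggested, a linear model like this is all that's needed; the optimization-level and import classifiers would swap in a softmax (multinomial) or random forest model on the same vectors.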
That's actually really good, and the classifier really appeared to be keying on real features, things like "OK, this function has an import from the Windows networking library." So it looks like these things work; now, how do we actually do something with them? This is where the tool I'm working on comes in, which I'm calling JMPgate. One of the challenges in the reverse engineering space is that there is definitely no shortage of tooling: you've got IDA Pro, you've got Binary Ninja, you've got radare2, and all these frameworks do really awesome stuff, but none of them interoperate; none of them really work in the same way. So in a lot of cases you kind of have to pick which tool you like: "I like this approach, but I don't want to touch anything other people built," or "that one's too expensive." My goal here is to let you take these models and use them with whatever kind of program you want: if you're an IDA user, you should be able to use the same models that a radare2 user trained. The architecture of JMPgate is: you have your client, which is IDA Pro or Binary Ninja or whatever front end you really like to use. The key piece is the vectorizer, which is basically just a Python class that implements a simple interface, and you can write vectorizers for whatever framework you want. If you're a PyTorch person you can write your vectorizers in PyTorch; if you're a Keras person you can use Keras; or, if you really like coding everything from the ground up, you can do that too. As long as you can send it a flat string of bytes, you're good to go. Your vectorizer converts the code up into the high-dimensional space and sends it on to modules that do whatever task you want: train a classifier, or find the ten nearest neighbors in your collection of functions and send back whether you have something within a certain distance.
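The vectorizer interface described above might look something like this; the class names and the byte-histogram stand-in are hypothetical illustrations of mine, not JMPgate's actual API:

```python
import numpy as np

class Vectorizer:
    """Base interface a JMPgate-style vectorizer might expose: the
    client hands over a flat string of bytes, the vectorizer returns a
    point in the embedding space."""

    def vectorize(self, code: bytes) -> np.ndarray:
        raise NotImplementedError

class ByteHistogramVectorizer(Vectorizer):
    """Toy stand-in for a trained neural encoder: embeds a function as
    its normalized 256-bin byte histogram."""

    def vectorize(self, code: bytes) -> np.ndarray:
        counts = np.bincount(np.frombuffer(code, dtype=np.uint8),
                             minlength=256)
        return counts / max(len(code), 1)
```

A real vectorizer would load a PyTorch or Keras model instead of counting bytes; the point of the design is that downstream modules only ever see the returned vector, so front ends and model frameworks stay decoupled.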
The framework is intentionally left very generic; you can plug in whatever you want. Example applications that I've either tested or am working on right now: function similarity ("I've got this binary I just found; among all the other binaries I've seen, do I have similar functions?"); compiler identification ("in this binary, which functions were compiled with Visual Studio?"); and detecting crypto, which is a little more challenging because it requires building a dataset of crypto functions and training a classifier on it, but this framework enables that. You're really only limited by what you can think of to do with these vectors in a vector space model of code. Some of the ongoing stuff I'm doing: building out a 64-bit dataset and model, which should be a lot more interesting, since 64-bit is where everything is moving right now, and you don't have the same calling-convention complexity that you have with x86, so compiler problems on 64-bit ought to be fairly feasible. I'm also in the middle of transitioning the entire project from Python 2 to Python 3; it's not up on GitHub right now while I finish that transition, but hopefully I'll get it up there within the next week or so. That URL is where you'll be able to get it, and you can message me on Twitter.
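The function-similarity application could be sketched as a simple distance query over stored embeddings; this helper is a hypothetical illustration, not part of the released tool:

```python
import numpy as np

def find_similar(query_vec, corpus_vecs, threshold=1.0):
    """Return (index, distance) pairs for every corpus function whose
    embedding lies within `threshold` Euclidean distance of the query,
    sorted nearest first."""
    dists = np.linalg.norm(corpus_vecs - query_vec, axis=1)
    order = np.argsort(dists)
    return [(int(i), float(dists[i])) for i in order if dists[i] <= threshold]
```

Because similarity reduces to Euclidean distance in the embedding space, the same query works no matter which front end produced the vectors.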