Code is not text! How graph technologies can help us to understand our code better.

Video in TIB AV-Portal: Code is not text! How graph technologies can help us to understand our code better.

Formal Metadata

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Andreas Dewes - Code is not text! How graph technologies can help us to understand our code better. Today, we almost exclusively think of code in software projects as a collection of text files. The tools that we use (version control systems, IDEs, code analyzers) also use text as the primary storage format for code. In fact, the belief that "code is text" is so deeply ingrained in our heads that we never question its validity or even become aware of the fact that there are other ways to look at code. In my talk I will explain why treating code as text is a very bad idea which actively holds back our understanding and creates a range of problems in large software projects. I will then show how we can overcome (some of) these problems by treating and storing code as data, and more specifically as a graph. I will show specific examples of how we can use this approach to improve our understanding of large code bases, increase code quality and automate certain aspects of software development. Finally, I will outline my personal vision of the future of programming, which is a future where we no longer primarily interact with code bases using simple text editors. I will also give some ideas on how we might get to that future.
Keywords EuroPython Conference EP 2015 EuroPython 2015

So, well, hello everyone, I think we can start. Hi, my name is Andreas, and today I will talk about code and graph technologies, and hopefully you can learn some new things about treating code as a graph. First of all I want to thank the organizers, of course. I'm really excited to be here, and it's most exciting to see that there is so much interest in this topic. A few words about me: I started working on code quality some two years ago, and in fact founded a small university spin-off, QuantifiedCode, where we like to think about code quality and how we can improve code. Most of the stuff that I show in this presentation is the result of that work. All right, so when I ask you to
think about code, most of you would probably think of something like this. Except if you are a systems programmer, of course; then you would probably see ones and zeros flying around, like receiving the Matrix directly in your brain. But most people, when they think of code, think of something like this: a collection of text files that we open and edit, and that we then share with others through version control systems like Git. In this talk I want to show you
that code can also look something like this: not text, but a graph. As for the outline of the talk: first I want to show you why graphs are interesting, then how we can store our code in a graph, then what we can learn from the graph, and finally how we as programmers can profit from that knowledge.

OK, so to get started, here is a 30-second introduction to everything you need to know about graphs. In this talk, when I talk about a graph, what I mean is a collection of nodes, sometimes called vertices (the circles here), which are connected to each other through so-called edges. An edge can have a direction, as in this case, so it goes from this node to that node. It can also have a label, and it can have some data associated with it. The node itself can also have some data: a type, for example, and some other attributes that are associated with it.

Graphs are a pretty old idea. I think what has changed in the last years is that we have a lot of new technologies and solutions for handling graphs and storing data in graphs, like graph databases such as Neo4j, OrientDB, ArangoDB or Titan, that allow us to store very large graphs and to perform queries over them. In this talk I want to keep the technology side out of the picture and talk about graphs in general instead of discussing a specific technology. All right, so in
programming and compiler theory, graphs are nothing new. Graphs have been used there for a long time, mostly in interpreters and compilers, for various use cases such as code optimization, code annotation, rewriting of code, and also as an intermediate language that the interpreter uses for code generation. What most of these use cases have in common is that they are not intended for the end user: the graph is only used internally by the interpreter, and we as programmers are not supposed to look at it or interact with it. In this talk I want to show you what we can actually learn when we do not stick to this rule and actually do things with the code graph that is generated. OK.
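As a minimal sketch of the graph notion introduced above (nodes with a type and attributes, connected by directed, labeled edges that can carry data), one could model it in plain Python as follows. The class names `Node`, `Edge` and `Graph`, their fields, and the example node types are illustrative assumptions, not something shown in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_type: str                                   # e.g. "functiondef" (hypothetical)
    attributes: dict = field(default_factory=dict)   # further data on the node


@dataclass
class Edge:
    source: Node                                     # edges are directed: source -> target
    target: Node
    label: str                                       # e.g. "body" (hypothetical)
    data: dict = field(default_factory=dict)         # further data on the edge


class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = []

    def add_node(self, node_type, **attributes):
        node = Node(node_type, attributes)
        self.nodes.append(node)
        return node

    def add_edge(self, source, target, label, **data):
        edge = Edge(source, target, label, data)
        self.edges.append(edge)
        return edge

    def outgoing(self, node, label=None):
        """Return the targets of all edges leaving `node`, optionally filtered by label."""
        return [e.target for e in self.edges
                if e.source is node and (label is None or e.label == label)]


graph = Graph()
function = graph.add_node("functiondef", name="encode")
loop = graph.add_node("for")
graph.add_edge(function, loop, "body", position=0)
```

A graph database stores the same kind of information, but adds persistence, indexes and a query language on top.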
So let's dive right into an example. Here is a piece of code: a simple function that encodes a dictionary of values so that you can serialize it to JSON, for example. You can see it's just a for loop that checks the type of the elements in the dictionary, and if it finds a complex value it writes that out, with the imaginary and the real part stored separately, into another dictionary, and then it carries on with the other values of the dictionary. So, pretty simple.

Now we want to generate a graph for this. This is actually pretty easy in Python: the only thing we need to do is import the ast module and call its parse function. We can just pass the whole source in as a string, and as the output we get a data structure that looks like this. You can see that for every element in the code we have a node in the resulting graph on the right. For example, the function definition here is the first node in the graph, and then we have various other nodes that are related to the function definition through the body edge, and that contain for example the assign statement that you see here, which contains a name and a dictionary, and on the other side also, for example, this for loop.

That is basically it: we could in principle start working with this, thinking about it and doing stuff with it. But what we want to do now is to do this not only on a small scale but on a large scale, so we need some way to store this information in a database, and I'm going to explain how we can do that.

For doing this we can make use of a trick that, for example, Git or Bitcoin use, which is so-called Merkle trees. If you ever wondered why Git can give you a very fast response when you have a project of, let's say, 10,000 or 100,000 files, change some of them
and make a commit, and why it doesn't take hours to recalculate the state of the project, here is the answer. The way Git does that is by treating the whole project as a tree, or a graph, in this fashion. We can see an example here, from the Flask project. When you make a commit, Git stores a snapshot tree associated with that commit, which has, for example, this upper node here that contains the whole project, and which also has a hash that identifies the given commit. This graph contains several subtrees, for example for the flask directory or the docs directory, which in turn also have edges, and this continues until finally we reach some blobs (files, in this case), which also have an associated hash.

A Merkle tree works by starting at the lowermost nodes of the tree and then working its way up, every time generating the hash of a given element using all the properties of that element as well as the hashes of all elements that are below it in the tree. So, for example, this hash here would depend on the value of this node, but also on the hash values of all the nodes that are below that node in the tree. Like this, if we change only one file in the project, Git doesn't need to recalculate the whole tree or the whole repository. It only says: OK, something has changed in the tree, so let me look at which parts of the tree actually changed. It would say: this subtree is unchanged, so I can just reuse the hash for that; this other subtree has changed; and it recurses down until it finally finds the file that has been changed. So I need to store a new value for that file in the database, but apart from that I can just reuse the data that I already have. This is what makes Git and other systems really fast. The blockchain does kind of the same thing, only
that there we have, in addition to the Merkle tree, a so-called proof of work on top that verifies that all the changes you make to the structure are valid.

For storing our code in a graph, we can basically take this idea directly and take it a step further. That is, we do not stop at the file level as Git does: we don't say "OK, this is a file, I generate a hash for it and then store it in the tree." Instead, we can now use the graph data that we have for each file and store, in place of the file, the whole graph of the code in the tree, so to say. And if
we do this and have a look again at the example from earlier, we would start at the lowest nodes here and calculate the hash value for each of them, then go up the tree and continue calculating the hashes for all the other nodes until we reach the top node here, and we would then store each of these nodes individually in the graph database, together with their relations. So now I have the state of the code tree in the database.

Now imagine that we change something in our code. For example, we change some parts of this assign statement: we change a variable name here and make some changes in the dictionary. We can apply the same technique that we saw earlier to store the changes in the graph database. Again we would calculate the hashes of the subtrees, and then we would see that the hash of the whole thing, this function definition, has changed, and that this node here has changed, so we need to store a new entry for it in the database. But we would also see that, for example, this for statement is still the same, so we don't need to create a new node in the database; instead we can just create a link to the existing one. That's pretty efficient.

If we look at this on a large scale, here is an example for the Flask project, where we store multiple commits of the project in a graph database. What I plot here on the axes is the total number of edges and vertices that you have in the project, so kind of the whole content of your code files, versus the actual number of edges and vertices that we store in the database. You can see that in the beginning, when we add the first commit, the number is increasing rapidly because we are encountering a lot of new edges and nodes that we need to create in the database. But then, if you keep adding more stuff to the database, you see that
we already have many of the things in the database, so we do not need to create new nodes and new edges for them; instead we can reuse the existing ones. You can see here that when we are at 500,000 vertices in the graph, we only have to store about 8,000 extra vertices in the database. This makes this way of storing code really efficient, also on a large scale. This is pretty
cool, because using this technique we can store everything in the graph database. IDEs, for example, also store some parts of the code in a graph, to do things like code completion or code browsing, but they never store the whole code in a graph database. With this approach we can actually do this, and we can store not only the code of a single project in a graph database but combine the code of multiple projects in the same graph database, and for example see whether some code is shared between individual projects, or learn other relations from the code that we have in there. And of course, as I said earlier, we can do this not only for the state of a given project at one point but for the whole commit history, so we can see not only the state at a fixed point in time but also the changes and the differences between individual states of a project over time.

OK, so the end result of this looks like this, for example. This is again the graph of the Flask project, where we have stored various commits from the master branch in a graph. It has [number unclear in the recording] vertices and about twice as many edges. You can see, for example, the modules, and then several classes and a lot of functions. Looking at the clusters of the graph, you can already see what I talked about earlier, namely the hashing of the individual nodes: because every node with the same properties is stored only once in the graph database, you can see here in the center a few nodes that have a lot of edges coming into them. These nodes are actually special node types of the Python syntax tree that, for example, tell the interpreter that we want to load a variable or that we want to store something to a variable. You can see that those also exist only once, and so they have a lot of incoming edges.

All right, so you are probably thinking:
man, this guy is bananas. What can I do with this? This graph thing is rather useless. It's pretty, but what can I learn from it? And so the
next part of the talk I want to use to show how we can actually work with this graph data.
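As a rough sketch of the storage scheme described above (hash every syntax-tree node from its own properties plus the hashes of its children, and store each distinct node only once), the following uses Python's `ast` and `hashlib` modules with a plain dictionary standing in for the graph database. The function `store_tree`, the record layout and the choice of SHA-1 are assumptions made for illustration; the talk does not show its actual implementation.

```python
import ast
import hashlib


def store_tree(node, store):
    """Store `node` and its subtree in `store`; return the Merkle-style hash of `node`."""
    properties = []   # scalar fields of this node (names, constants, ...)
    children = []     # (edge label, position, child hash) for each outgoing edge
    for label, value in ast.iter_fields(node):
        items = value if isinstance(value, list) else [value]
        for position, item in enumerate(items):
            if isinstance(item, ast.AST):
                # Hash the children first, so the parent hash depends on them.
                children.append((label, position, store_tree(item, store)))
            else:
                properties.append((label, position, repr(item)))
    payload = repr((type(node).__name__, properties, children))
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    # Identical subtrees produce identical hashes and are stored only once.
    store.setdefault(digest, {"type": type(node).__name__,
                              "properties": properties,
                              "children": children})
    return digest


VERSION_1 = """
def encode(d):
    result = {}
    for key, value in d.items():
        result[key] = value
    return result
"""
# Second "commit": only the assign statement changes.
VERSION_2 = VERSION_1.replace("result = {}", "result = {'version': 1}")

store = {}
tree_1 = ast.parse(VERSION_1)
tree_2 = ast.parse(VERSION_2)

root_1 = store_tree(tree_1, store)
size_after_first = len(store)
root_2 = store_tree(tree_2, store)
new_records = len(store) - size_after_first

# The for statement is the second statement of the function body in both versions.
for_hash_1 = store_tree(tree_1.body[0].body[1], store)
for_hash_2 = store_tree(tree_2.body[0].body[1], store)

print("records after first version:", size_after_first)
print("extra records for second version:", new_records)
print("for statement reused:", for_hash_1 == for_hash_2)
```

Because `ast.iter_fields` does not yield line and column numbers, moving code around does not change a node's hash here; whether the original system treats positions the same way is not stated in the talk.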
When it comes to graph databases and working with graphs, there are two things that you always do. The first thing: in order to get a starting point for your exploration, you need to somehow select some edges or some nodes from the graph. You usually do this by using indexes that you have on the edges and the vertices, for example to retrieve a list of all the function definitions, or a list of all the nodes that you're interested in. As soon as you have this list of nodes, you can work with them by just following the topology of the graph, for example by going through all the outgoing edges of a given vertex and traversing the graph in an interactive way.
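The two steps described above, selecting starting vertices through an index and then following outgoing edges, can be sketched on an in-memory syntax tree. Here a dictionary keyed by node type plays the role of the database index; the helper names `build_type_index` and `outgoing` and the sample source are hypothetical.

```python
import ast
from collections import defaultdict

SOURCE = '''
def index():
    return render("index.html")

def about():
    return render("about.html")

class View:
    def index(self):
        return None
'''


def build_type_index(tree):
    """Index all vertices by node type, like an index over vertex types."""
    index = defaultdict(list)
    for node in ast.walk(tree):
        index[type(node).__name__].append(node)
    return index


def outgoing(node):
    """Yield (edge label, child vertex) pairs for the outgoing edges of a vertex."""
    for label, value in ast.iter_fields(node):
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, ast.AST):
                yield label, item


tree = ast.parse(SOURCE)
type_index = build_type_index(tree)

# Step 1: use the index to get a starting set of vertices.
functions = type_index["FunctionDef"]
function_names = [function.name for function in functions]

# Step 2: follow the topology of the graph from one of those vertices.
first_function_edges = [(label, type(child).__name__)
                        for label, child in outgoing(functions[0])]

print(function_names)
print(first_function_edges)
```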
OK, a first example for this is a rather easy one: it shows the names that we have in a given project, sorted by usage frequency. Again I use the Flask project here. I just retrieve some vertices from the graph using an index over some node types, for example the function definitions, then I group the resulting vertices by their name and order them by frequency, descending. Then I can see, for example, that I have a lot of names which contain the word index, and also some other things [unclear in the recording]. So the next example
would be to show you the history of a given function in the code. This could be useful, for example, if you want to see which commit introduced a specific version of a function, or how a specific function has changed over the whole history of your project. You could do this by starting from the uppermost vertex in the graph, the root node if you want, and then following down the path that is given by the commits, the module name of the function that you are interested in, and the name of the function. So you could just crawl your graph for the information and get all the different versions of the function in return.

Now you will probably say: OK, this is nice, but you could maybe also do this using some fancy regex stuff, so why do we need graphs for that? Navigation and exploration of the graph is only one aspect. What is more interesting for me, and what you can do with a graph, is code visualization, so let's look at an example here. One interesting thing, especially in large projects, is to get an overview of how complex the code is, because as programmers our everyday job is mostly to fight complexity and to manage the complexity of large software projects.

As an example, we can analyze the cyclomatic complexity of a project. The concept of cyclomatic complexity is actually pretty old; it's from the stone age of programming, so to say, 1976. It basically counts the number of different paths that your program can take. For example, if you have an if statement in your code, you increase the cyclomatic complexity by one, because the code can either go into the if statement and execute that branch or, if the condition is not met, continue on the other branch of the code. It's kind of helpful to imagine the cyclomatic complexity as the number of unit tests that you would need to cover a given piece of code. So if you have a cyclomatic complexity of
nine, you need nine unit tests, or nine different assertions, to make sure that you test every branch of your code. Using our graph, we can actually calculate the cyclomatic complexity pretty easily. Here I have a traversal written in Python. What the traversal does is start at the root of the project and look for nodes that have the function definition type. If it finds one, it uses that node as an anchor and initializes the cyclomatic complexity to the value one, because there is always one branch in each function, regardless of whether you have any branching statements or not. Then it traverses the graph, following the outgoing edges of the function definition and checking for different types of nodes, for example for statements, while loops, if expressions, if statements, etc. Every time it encounters one of these nodes, it increases the counter of the cyclomatic complexity of the given function. Like that we can traverse the whole code tree and calculate the complexity of all the functions that we have in there, and we can then use that information, for example aggregating it by directories, files and functions, and then visualize it.
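The traversal described above can be approximated directly on Python's `ast` module, without a graph database. This is a simplified sketch: the set of node types counted as branches is limited to the ones named in the talk (for statements, while loops, if expressions, if statements), and a nested function's branches are also counted towards its enclosing function, which a more careful tool might handle differently.

```python
import ast

# Node types that add one path each; the "etc." in the talk is not covered here.
BRANCH_NODES = (ast.For, ast.While, ast.IfExp, ast.If)


def cyclomatic_complexity(function_node):
    """Start at 1 (every function has one path) and add 1 per branching node."""
    complexity = 1
    for node in ast.walk(function_node):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
    return complexity


def complexity_by_function(source):
    """Find all function definitions in `source` and compute their complexity."""
    tree = ast.parse(source)
    return {node.name: cyclomatic_complexity(node)
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}


SOURCE = '''
def encode(d):
    out = {}
    for key, value in d.items():
        if isinstance(value, complex):
            out[key] = {"real": value.real, "imag": value.imag}
        else:
            out[key] = value
    return out

def identity(x):
    return x
'''

results = complexity_by_function(SOURCE)
print(results)
```

For this input, `encode` contains one for loop and one if statement, so its complexity is 3, while `identity` stays at the initial value of 1.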
This is what we have here: a city-style visualization, again for the Flask project. What you're seeing here are the different parts: each city block corresponds to a directory, a module or a class definition in the Flask code, and the individual buildings correspond to the functions in that code. The area of a building is given by its AST weight, which is the number of nodes that you have below the given node in the tree, so kind of like the number of lines of code. The height of each building is given by the total complexity of the node, for example of the function, whereas the color is the so-called specific complexity, that is, the complexity per AST weight. To translate that: it is how complex your code is, divided by the number of lines. So a very long function that has only very few branching statements has a low specific complexity, whereas a very short function with a lot of branching statements has a high one and shows up accordingly. You can see, for example, these two functions here [names unclear in the recording], which have cyclomatic complexities of 22 and 14 and may be candidates for a good refactoring. The nice thing about this way of looking at the code is that it allows you to get a very quick overview of the complexity and the structure of the code without actually going through the text files and the code itself. OK.
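The three quantities that drive the visualization described above can be computed per function as in the sketch below. It assumes that AST weight means the number of nodes in the function's subtree (here including the function node itself) and that specific complexity is total complexity divided by that weight; the function `building` and its dictionary keys are hypothetical names.

```python
import ast

BRANCH_NODES = (ast.For, ast.While, ast.IfExp, ast.If)


def complexity(node):
    """Simplified cyclomatic complexity: 1 plus the number of branching nodes."""
    return 1 + sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))


def ast_weight(node):
    """Number of syntax-tree nodes in the subtree rooted at `node`."""
    return sum(1 for _ in ast.walk(node))


def building(function_node):
    """Map a function to the area, height and color value of its building."""
    weight = ast_weight(function_node)
    total = complexity(function_node)
    return {"area": weight, "height": total, "color": total / weight}


SOURCE = '''
def long_but_flat():
    a = 1
    b = 2
    c = 3
    d = 4
    e = 5
    return a

def short_but_branchy(x):
    if x:
        return 1
    return 0
'''

tree = ast.parse(SOURCE)
buildings = {node.name: building(node)
             for node in tree.body if isinstance(node, ast.FunctionDef)}

for name, values in buildings.items():
    print(name, values)
```

The long function gets a large area but a low color value, while the short, branching one gets a small area and a higher color value, which is the contrast the talk describes.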
Another example is visualizing the dependency graph of a project. Here we look again at the Flask codebase, and we have visualized the import relations and dependencies between individual modules. This is a bit harder to generate, because up to now I only talked about storing the code itself in a graph database, but we actually need some additional information to generate this kind of graph, for example information about the imports and relations in the code, which we cannot just extract from the syntax tree. This is something that we can do in addition, but I won't talk about it in this talk because it's a subject on its own. In any case, what we see here is the central module of Flask, where most of the things seem to happen, and this module imports some other things, for example the flask._compat module that contains the compatibility code, and some others. We can also see that the examples and the tests mainly import the flask module but do not depend on the other things in the code. So again this gives you a very nice overview of the code, and gives you a feel for how the project is structured and how the different modules interact with each other, without actually looking into the code. If you wanted to get that information just by looking at the code files, you would probably need half an hour or an hour going through all the files to see which other modules they use; like this you have everything at one glance. So this is another example where a graph actually gives a much better understanding of the code than the code itself.

OK, another interesting area when analyzing code is of course finding patterns and problems. When we edit code in our IDE or text editor and want to find, for example, certain names or certain things in our code base, we use regular expressions. So, for example, here the string
"hello world", and we would use a regular expression to match either the German or the English word in that case. Now, if we store our code as a graph, we can of course no longer use this, because we don't have any text information available: the whole code is stored as a set of vertices and edges in our graph. So we need to think about a way to do the same thing that we do with text, that is, performing regular expression matching, on the graph. Luckily, people have thought about this, and there are various approaches. For example, there are proprietary query languages like Cypher, which is used by Neo4j, or Gremlin, which is more like a standard for how to perform queries on a graph and is used by several graph databases. We have developed our own language, which is a little bit simpler and which uses a structured syntax to perform pattern matching on the graph. If you compare these two examples: here you would look for a node that contains either hello or hallo, which is designated by the special or operator here, and which has an outgoing vertex that can be reached through the followed-by edge and that contains the word world. Like this we can translate the pattern matching operators from the text world to the graph world.

So what can we actually do with it? One thing is that we can build our own code checkers. You are probably all familiar with pylint or pyflakes, tools that check code for certain style violations or problems. Using this pattern matching we can kind of write our own version of pylint. Here I have an example of a very bad piece of code: a try/except statement that does not name any exception type and that also does not contain any error handling. So basically anything that
goes wrong in this code here would just be swallowed up and nobody would learn about it. So if you, as a team leader or as a programmer, decide that you do not want this to happen, you could just write a regular expression for this that operates on the graph, and the expression that matches this code is shown here. As you can see, it's pretty simple: it just looks for a node of the type try/except that contains, in its handlers section, a node of the type exception handler which doesn't specify a type (this is the empty exception handler, basically) and whose body only consists of a pass statement. Using this regular expression we can now just go to our graph again and catch all instances of this pattern. We could even make some exceptions. For example, here we also have an empty exception type, but we just use it to do some error logging and then we re-raise the exception. It might be the case that we do not want to match this one, so we could just modify our pattern and say: OK, I want to match everything except the exception handlers that have in their body a statement of the type raise. So we can again change our pattern to make sure that we do not catch these false positives. Compared to normal linters, it's not only easy to write new checks but also very easy to adapt and change them, for example for using them in new circumstances.
OK, [unclear]. Next, I want to talk about analyzing changes in your code base. Let's look at an example from the Django project. Code is of course not static: code is changed often, and programmers need to understand what actually happens in the code, for example when you commit. If you look at GitHub, for example, and you want to see what happened in a given
commit, you would get a line-by-line diff, which shows you all the lines that have been removed and all the lines that have been added in a given file. In this case, for example, we have changed a test function: we have removed one of the function parameters, we have added a decorator to the function, we have removed a class inside the function definition, and then we changed this statement here and added another statement. With text this is really easy: we can just do a line-by-line diff to see what has changed. With a graph it's actually a bit more complicated, because when we change our code in the graph, what happens is that certain attributes of our graph change. For example, here we have introduced a change in a variable, we have changed its name, and now we want an algorithm that tells us what has changed. This is a pretty tricky problem, because what our algorithm would see is only that the hash of this subtree has changed and that some parts of the tree seem to have changed, but it would not be able to identify these changes with each other. Actually this is an NP-complete problem, so it's pretty tricky.
We are not the only ones who have this problem, so I want to talk about something similar here, which is chemical similarity. In chemistry you also deal with a lot of complex molecules, for example this one here, which is called [unclear] and which is contained, for example, in [unclear]. It is under investigation for its health effects; people try to find out whether it is good for you or bad for you. What they need to do in order to find this out is, for example, to identify chemical compounds that are similar to this one, or other things that they can use to synthesize it or to reason about it. So chemists have special
databases and algorithms for chemical similarity, where they can, for example, look for all the molecules that, like this molecule here, contain various benzene groups or various types of atoms, and which give them as a result all the different candidates that they can use for chemical screening and for trying to synthesize this sort of molecule. This is actually a pretty complex problem; as I said, it is NP-complete, and there have been various approaches to make it as manageable as possible. For example, there are the so-called [unclear] fingerprints, which are just bit fields that contain zeros and ones for different properties of the chemical molecule. For example, if you have a molecule that contains a benzene group, you would have a 1 at place 142 of the array, and if you have a molecule that contains an OH group, you would have another 1 at another place, and so on. So you can have, for example, 800 different identifiers for a given molecule and then just perform a bitwise AND operation to see how similar two different molecules are on this given scale. Another thing that is also used quite often are Bloom filters, because they can efficiently test for membership in a given set, so you could ask: does this molecule contain benzene groups? Does this molecule contain a certain substructure? It turns out that we can use these things for comparing code and for solving our NP-complete problem as well. Again, this is the subject of a whole different talk and pretty complex, so I don't want to go into the details here, but I just want to tell you about some of the applications that this has.
One interesting thing is of course the detection of duplicated code. We all know that copy-and-pasting code is pretty bad and should be avoided, but it's actually pretty hard to find duplicated code, because programmers change variable names or change
some small parts of the code, which makes it really hard to detect these things using a text-based approach. There are some interesting papers and some research about how to do this with graphs, for example this paper here, which is also the basis for the Clone Digger tool and which uses some of the concepts that I talked about earlier to detect clones in the graph of the code.
Another application is of course to generate more semantic diffs. An example for this is the paper shown here, where they try to extract the minimum number of edit operations from two states of a project. What they do is basically say: this is the project in state A, this is the project in state B, and then they try to figure out what kind of edit operations you would need to perform to get from one state to the other, so which classes you would need to add, which functions you would need to modify, et cetera.
The last thing is of course the detection of plagiarism or copyrighted code, which might be less important in open source but is very important in corporate software development, and there are also some interesting papers about that subject.
OK, so the last thing I want to show is how a semantic diff would look if you actually had the possibility to compare the different states of our project using the graph. Instead of the line-by-line diff, we would now not see, for example, that we have removed a line here and added another one, but would directly see that we have added a decorator to the function. Similarly, we would see that we have modified the function by adding an extra argument to it, that we have removed the class definition, removed this argument here from this function, and also edited this statement. Maybe this doesn't seem like a big deal, but it changes a lot, because we go from: I added this line and removed this line, to: OK, I modified this function parameter and I
added this decorator to the function, which in turn also allows us to perform more analyses, for example trying to find the cause of some error that has been introduced: OK, my code is no longer working, so why is it not working? Oh, we added a new parameter to the function, but we didn't add the corresponding parameter to the function call. So we have an automated way to reason about the state of our software using the semantic diff.
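The idea of a semantic diff can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool shown in the talk: it uses the standard library `ast` module instead of a graph database, it only compares function arguments and decorators, and the names `function_summary`, `semantic_diff`, `OLD` and `NEW` are hypothetical.

```python
import ast


def function_summary(source):
    """Map each function name to its argument names and decorator sources."""
    summary = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            summary[node.name] = {
                "args": [a.arg for a in node.args.args],
                # ast.unparse requires Python 3.9 or newer.
                "decorators": [ast.unparse(d) for d in node.decorator_list],
            }
    return summary


def semantic_diff(old_source, new_source):
    """Describe changes in terms of functions, arguments and decorators."""
    old, new = function_summary(old_source), function_summary(new_source)
    changes = []
    changes += [f"removed function {n}" for n in sorted(old.keys() - new.keys())]
    changes += [f"added function {n}" for n in sorted(new.keys() - old.keys())]
    for name in sorted(old.keys() & new.keys()):
        for kind, label in (("args", "argument"), ("decorators", "decorator")):
            before, after = old[name][kind], new[name][kind]
            changes += [f"removed {label} {x} from {name}"
                        for x in before if x not in after]
            changes += [f"added {label} {x} to {name}"
                        for x in after if x not in before]
    return changes


# Two hypothetical states of the same test function.
OLD = '''
def test_view(self, request, client):
    return client.get(request)
'''

NEW = '''
@override_settings(DEBUG=True)
def test_view(self, client):
    return client.get("/")
'''

changes = semantic_diff(OLD, NEW)
for change in changes:
    print(change)
```

The sketch deliberately ignores changes inside function bodies and treats functions with the same name as the same function, which a real tool working on the full graph would have to handle.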
So let's summarize: text versus graph. Both of them have their advantages. Text is really easy to write and easy to display, and it is a universal format that can be shared, copied and pasted everywhere. However, it's not normalized, in the sense that it's hard to extract information and to relate the different pieces of information that are in it, and it's hard to analyze as well, as we saw. Graphs, on the other hand, are easy to analyze and normalized, so they have the relationships between the different pieces of the graph inside them, and they are also easy to transform. On the other hand, they're pretty hard to generate and they're not yet interoperable, so there's no real standard for how to exchange graphs in different formats and how to represent code in them in a consistent way. So if you ask me what the future of programming will look like: we will probably still use text for small-scale manipulation and editing of our code, but we will use graphs for doing things such as large-scale analysis, visualization and also transformation of code. We can and should try to use the best of both worlds.
[unclear] Are there any questions? Thank you.
I have one question. [unclear] Have you seen an application like National Instruments [unclear]? Sorry, can you repeat that? Do you know something like National Instruments LabVIEW? [unclear]
So, unfortunately I'm not very familiar with LabVIEW. Is that the graphical programming language where you can just draw lines between different things to program? I mean, it's an interesting example of graphical programming, but it also shows very well the drawbacks of graphs, because they are, as I said, really good for us to reason about programs, but a pretty horrible idea for interacting with code, in my mind. I don't know what your opinion is about that. It's another kind of the same thing, it's called LabVIEW, but not text.
Any other questions?
Are you using graphs on a daily basis in your projects? Yes, so we actually use different technologies to store our graphs. But that was not the question: do you use them on a daily basis to analyze the code? Yes. [unclear] Yes, we do. Most of our products, for example for code analysis, are based on first converting code to a graph and annotating it with additional information, like types and relations between different parts of the code, and then performing this kind of pattern matching operations on them, for example.
[unclear] the Clone Digger tool for spotting [unclear]. I can understand that if you've got a class or a function definition it would work quite well. What if a group of three or four lines within a function is duplicated? Is it easy to pick that out? So you can actually do this with semantic diffs. I mean, it's easy to detect cloned code that is identical, so that has the same hash, because you just need to compare the hashes of different trees, and that you can
find pretty easily. What is more complicated is to detect code that has been copied and then modified afterward. This is the whole problem that I talked about, of finding trees or graphs that are sufficiently similar so that you can transform one into the other with a small set of edit operations. So you can do it, but it's not possible in all cases; it's definitely better than with text, though.
With regard to pattern matching: you gave the example with an empty except. Could you also query for multiple types of exceptions, or any exception? You mean, for example, specifying a given exception type that we want to match? I mean, I want to catch a pass with any type of exception, not just the empty except. [unclear] I think if you have a try, you will always have this node of the type try/except, and what you could do of course is to modify the type and say: OK, I want to match only certain types, or anything in a list of types. [unclear] You can basically use the same operators that you also have with regular expressions, so you can say: OK, I want to match this, followed by something, followed by something else. So this is definitely possible.
Any more questions?
One question: did you actually look into this [unclear] function, whether it's really a problem, as the metric says? Did you look into it and analyze it? Yeah, I mean, the complexity of the code doesn't mean that it's problematic or faulty. It's just basically an indication of some part of the project that you should maybe refactor in order to make it easier for other people to understand. But it doesn't tell you anything about the correctness of the program in that case. It's just a way to see where the complex code is in the
project.
That was the question. [unclear] Thank you.
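The empty-except check described in the talk, and the any-exception variant raised in the questions, can be sketched as follows. The pattern language used in the talk is not reproduced in this transcript, so this is a stand-in that walks the syntax tree with Python's standard `ast` module; the function names, the `any_type` flag and the `SAMPLE` snippet are assumptions made for illustration.

```python
import ast


def is_silent_handler(handler, any_type=False):
    """True if the handler body is only `pass`.

    By default only a bare `except:` matches; with any_type=True a handler
    that names an exception type matches as well.
    """
    only_pass = len(handler.body) == 1 and isinstance(handler.body[0], ast.Pass)
    return only_pass and (any_type or handler.type is None)


def find_silent_handlers(source, any_type=False):
    """Return the line numbers of all silent exception handlers in source."""
    return sorted(
        handler.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Try)
        for handler in node.handlers
        if is_silent_handler(handler, any_type)
    )


# Hypothetical code to check: only the first handler swallows errors blindly,
# the second logs and re-raises, the third names a type but still only passes.
SAMPLE = """\
try:
    risky()
except:
    pass

try:
    risky()
except:
    log_error()
    raise

try:
    risky()
except ValueError:
    pass
"""

bare_only = find_silent_handlers(SAMPLE)
with_typed = find_silent_handlers(SAMPLE, any_type=True)
print(bare_only, with_typed)
```

The handler that logs and re-raises is never reported, because its body is more than a single `pass`; this mirrors the false-positive exclusion discussed in the talk.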

