Easy Ada Tooling with Libadalang
Formal Metadata
Title | Easy Ada Tooling with Libadalang
Number of Parts | 644
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/41279 (DOI)
Transcript: English (auto-generated)
00:05
Hello, I'm Pierre-Marie de Rodat, and this is Raphaël Amiard. We are both software engineers at AdaCore, and today we will talk about Libadalang. So, yeah, we have very little time to explain it to you, so let's go.
00:22
So first of all, what is Libadalang exactly? Well, in just three bullet points: Libadalang is a library, and we want it to allow people to get insight about Ada source code and also to modify it. For this, we want to offer both high- and low-level APIs,
00:43
so by low-level we mean some small details like, okay, what is the location of this token, what is the token at this location, things like this. And also we want high-level APIs such as, okay, what's the type of this expression, or can you please rename this type and all occurrences of this type.
01:06
We also want Libadalang to be very versatile, so we want it to be usable from any language or technology. So for this, Libadalang is an Ada library. It also offers a C API, and on top of the C API,
01:23
it can talk to basically everything, so we ship with Libadalang a Python wrapper so that you can use the whole Libadalang library from Python. So this offers an interesting feature, which is you can basically,
01:42
you have a really short path between having an idea and having a working tool, because you can do a quick prototype in Python, and this is great. So, yeah, one important point. So Libadalang is an Ada library. We are going to present you some examples that will be written in Python
02:01
because it fits in slides, but everything you will see in Python can be done in Ada, of course. Let's go. So, well, first of all, let's see why we need Libadalang in the first place. So this is a screenshot from GPS, the worldwide known editor for Ada.
02:23
GPS needs to know where the block starts and ends, so here you can see that it knows that it's in a type declaration that starts here and ends here. You have to do some parsing to understand that, so Libadalang will provide that. It will also provide, well, in intelligent code editors,
02:47
sometimes you want to click on an identifier, and you expect the editor to lead you to the definition corresponding to this identifier. So this is often called name resolution or cross-references. We want to offer that.
03:01
We want to make it easy for IDEs to rename a function, for instance, or do source transformations like this. And also, so here you have some program, and we want to make it easy for everyone to write custom tools that will, for instance, act as linters,
03:25
so detect variables that are improperly cased. This doesn't match, but it doesn't matter. So yeah, variable names: if you have a rule like this, all variable names should be capitalized, well, you can easily write a checker to do that.
03:42
We want that to be easy with Libadalang. At this point, if you know the Ada ecosystem well enough, you might ask, okay, why not just use ASIS? It does precisely that. Well, in Libadalang, because we want to serve as the building block
04:03
for tooling, including IDEs, there are several mismatches. The first one is we want to be incremental, which is: you open your project, okay, Libadalang analyzes the project,
04:21
and you perform a very minor modification. You don't want GPS, for instance, to freeze for seconds or minutes because it recomputes everything that depends on the modification you made. So when something changes, we want to perform minimal computation.
04:40
Most of the time when you're writing code, your code is incorrect because you're writing it, and so we want Libadalang to be as helpful as possible when your code is incorrect. And also something very important, we want Libadalang to be somehow bounded in the resources it uses,
05:03
so we don't want Libadalang to crash your program after three days of a running process because it exhausted all virtual memory. So ASIS and, in particular, GNAT's implementation of ASIS, well, they were implemented with some objectives in mind.
05:23
Here we have needs that kind of contradict them, so GNAT and ASIS are poorly suited for what we need. So we decided to do yet another library. Yes, talk. I just want to make one thing clear here.
05:41
There is no problem actually with the implementation of ASIS, sorry, with the specification of ASIS for those needs. The problem is more with the implementation that is based on GNAT. So GNAT is a compiler, and it was done to do all its work in one pass. So basically it's not adapted to be integrated in IDEs, and hence the ASIS implementation that we provide is not adapted either.
06:05
But we found other problems with ASIS at the API level, and we wanted to take a shot at doing something that is more user-friendly anyway. So this is why we created Libadalang. Thank you for those precisions. Okay, so as of today, what does using Libadalang look like?
06:26
Well, first let's start with the basic level of languages, tokens. You can, in Libadalang, ask to parse a file, and then ask for the list of tokens that came from this file. So here we have an Ada program, a really simple one,
06:43
and this is a simple usage of the API, so in Python. You create a context to host your computations, you ask it to load a source file, and then, so you take your analysis unit, and you take the root node of it, and you ask for the list of tokens corresponding to this root node,
07:03
and you print them, all of them, so this is the result. So, well, asking for the token stream is quite easy. Next level, let's go to the syntactic level. So this is a more complex Ada program. Well, you can ask here to, so you take the root node of your analysis unit,
07:21
and here you ask, okay, find all nodes that comply with this predicate, so this is a type, so find all nodes that are object declarations, print their sloc, well, source location ranges, and their text, and so this is the result, so again, something useful.
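The token and syntax queries described above can be sketched in Python roughly as follows. This is only a sketch, not the code shown on the slides: it assumes the `libadalang` Python bindings are installed, and the helper names `load_unit`, `list_tokens` and `list_object_decls` are made up for illustration. The Libadalang import is deferred into the functions that need it.

```python
def load_unit(path):
    """Parse one Ada source file and return its analysis unit."""
    import libadalang as lal  # assumption: Libadalang's Python bindings are installed

    context = lal.AnalysisContext()     # the context hosts all computations
    return context.get_from_file(path)  # load and parse the source file


def list_tokens(unit):
    """Return (kind, text) pairs for the tokens covered by the root node."""
    return [(token.kind, token.text) for token in unit.root.tokens]


def list_object_decls(unit):
    """Return (source location range, text) pairs for object declarations."""
    import libadalang as lal

    # findall accepts a node type and returns every matching node in the tree.
    return [(str(decl.sloc_range), decl.text)
            for decl in unit.root.findall(lal.ObjectDecl)]
```

With Libadalang available, something like `list_object_decls(load_unit("example.adb"))` (the file name is hypothetical) would give one entry per object declaration node, so a declaration that introduces two objects at once still appears as a single entry.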
07:40
So, yeah, performing this kind of query is useful, for instance, for linters. And next level, and this is getting more and more interesting. So this is yet another Ada program where we defined two double functions that are overloads, so they're called the same.
08:02
They only differ by their signature, so one of them takes an integer and returns an integer, the second one takes a float and returns a float, and there is a call to one of these double functions. So, in Libadalang, okay, so I didn't repeat it there, but we have asked to parse this analysis unit.
08:22
Then here we're trying to get the double call, so we find all call expressions whose called function is named double, and okay, so here we have, so this call is present here, and then all we have to do in Libadalang to get what double function is called is to get the name,
08:43
so the call actually also contains the arguments, and if you get the name, you only have this, and you ask for the referenced declaration, and you print it, and as we are calling double with an integer, Libadalang selects the first overload, and so Libadalang finds which is the one that is called.
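A rough Python sketch of the overload lookup just described, under the assumption that the `libadalang` bindings are installed and that the analysis unit has already been parsed. The helper names `is_call_to` and `resolve_calls` are hypothetical; `f_name` and `p_referenced_decl` are used as they appear in the Libadalang Python API, but treat the details as unverified against any particular release.

```python
def is_call_to(call, function_name):
    """True when the callee name of a call expression matches function_name."""
    # f_name is the called name alone, without the argument list.
    # Ada identifiers are case-insensitive, hence the lower-casing.
    return call.f_name.text.lower() == function_name.lower()


def resolve_calls(unit, function_name):
    """Yield (call node, referenced declaration) for calls to function_name."""
    import libadalang as lal  # assumption: Python bindings are installed

    for call in unit.root.findall(lal.CallExpr):
        if is_call_to(call, function_name):
            # Name resolution picks the overload that matches the arguments.
            yield call, call.f_name.p_referenced_decl()
```

Iterating over `resolve_calls(unit, "Double")` would then print, for each call, the declaration that name resolution selected among the overloads.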
09:05
This is good. You have a question? Okay. On the previous slide? Yes, sorry. The object declaration, the previous one. Previous, yes. The second line of outputs, there's only one line with two object declarations in it.
09:23
Syntactically, this is in the Ada grammar, this is a single object declaration that declares two objects. That's an interesting point, so actually... No, no. Just give it to me. Okay, so that's an interesting point.
09:41
In the Ada grammar, syntactically you have one object declaration node, but indeed semantically you have two. So since Libadalang's prime goal is to make analyzers and tools that act on syntax, we want to keep as close as possible to the syntactic representation,
10:00
which is why you only have one node. We don't modify the tree after parsing or stuff like that, which is another thing that is difficult with GNAT and ASIS, because they are compilers, they want to emit code, so they might get rid of that representation very early on, and then you don't have access to it anymore. What if I have a reference to C, for example, and I want to jump to the declaration?
10:21
So for the moment, reference declaration will just give you the whole reference declaration. That would be enough, but from C I have to find the declaration... Yes, absolutely. So what we plan to do is to have an API that will also give you the precise identifiers that you are looking for. That is not done yet, but it's not too difficult to do.
10:50
Well... Okay. Now, so this is being worked on as we speak, almost. So we want also to provide a feature
11:01
that enables users to actually modify the source code. So here we have, on the top of the slide, an Ada program. And then here, this is the use of the API we intend to facilitate with Libadalang. So, yeah. So first of all, you...
11:21
So imagine we want to turn this into this. So all we have to do is to take the call to Put_Line and to modify the input arguments. So first of all, we find the node corresponding to the call. Then we start a rewriting session, because while we're rewriting things, we want to keep the old thing available for...
11:44
to help you doing the refactoring. And then, so what you do here is to take a kind of rewriting handle to the parameter here. And then what you do here is to say, okay, let's rewrite this parameter
12:01
and rewrite it using a string literal, this one. Then you apply, so that replaces the old source code with the new one. And then you're supposed to get this. So we want to provide that. And work in progress. Which is a long question about that.
12:22
Yes. We're trying to bootstrap Ada from a very small assembly. I'm part of bootstrappable.org. Sorry, I don't... I'm part of bootstrappable.org, and we have problems with Ada,
12:42
because Ada is implemented in Ada. And we have a self-evolution problem. And I wanted to ask, is it possible to use this to do transpiling to C or something, for example? I suppose that we answer that at the end. Yes. Because that's a difficult question. Yeah, a little bit off topic.
13:00
But yeah, let's discuss that after. What would happen if the string hello world appeared twice in the program? For example, another Put_Line hello world. So this... Actually, this example is incorrect, because findall returns you a list. So here we were supposed to extract
13:21
which found element we would have to work on. So in this example, if there were multiple calls to putline, we would have several results, and we would have to pick which one we would want to rewrite. So if I can be a bit more precise, the way you are finding the node
13:41
is not by searching for the text, but you have the option to search for precise contexts. For example, you can say, I want the first call, or I want the call to this function, even if you have another function but the same string literal. So you have a lot of granularity, because you are doing a query on the tree and not on the text itself.
14:01
So here we say we want the first call expression, but you could say something else. And get the node that you want to rewrite very precisely. Okay, I'm afraid we'll get out of time. We are already out of time. Okay, so this is an example. So if you remember, in the previous slides,
14:22
I talked about a linter that will check your variable names identifier. This is one possible implementation of it. So it's the whole script. We just iterate through each given file name. We parse it. And then we check for parsing errors. And if everything is okay, we just look for all object declarations
14:41
and all identifiers inside object declarations, because there can be multiple anyway. And we check the identifier, and if it's not capitalized, we warn about it. So we want it to be really simple to write this kind of tool. And now I will let Raphaël talk about more usage examples of the library.
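The linter script described above could look roughly like this in Python. It is a sketch under stated assumptions, not the script from the slide: the `libadalang` bindings are assumed to be installed, the helper names are made up, and "capitalized" is taken here to mean only that the name starts with an upper-case letter.

```python
def is_capitalized(name):
    """Assumed naming rule: the identifier starts with an upper-case letter."""
    return name[:1].isupper()


def check_file(path):
    """Return warning strings for badly cased object names in one Ada file."""
    import libadalang as lal  # assumption: Python bindings are installed

    unit = lal.AnalysisContext().get_from_file(path)
    if unit.diagnostics:
        # Parsing errors: report them and skip the naming check for this file.
        return ["{}: {}".format(path, diag.message) for diag in unit.diagnostics]

    warnings = []
    for decl in unit.root.findall(lal.ObjectDecl):
        for name in decl.f_ids:  # one declaration may declare several objects
            if not is_capitalized(name.text):
                warnings.append("{}:{}: {} should be capitalized".format(
                    path, name.sloc_range.start.line, name.text))
    return warnings


def main(paths):
    """Check every given file name and print the warnings."""
    for path in paths:
        for warning in check_file(path):
            print(warning)
```

Calling `main(sys.argv[1:])` from a small launcher would turn this into a command-line checker; a stricter rule, such as requiring each underscore-separated word to be capitalized, only needs a different `is_capitalized`.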
15:05
Okay, I guess it's good. So Pierre-Marie showed you a bit of how it's supposed to work and how you use it. I'm going to show you what we did with it so far
15:22
and what we will be able to do with it in the future. So I didn't get that I was going to start with a demo. So a little demo to start with. Where is it? Yeah, I'm going to find it. Don't worry.
15:40
So far, we showed only Python code. So you might be like, okay, those AdaCore guys, they do only Python. So the example I'm going to show is done in Ada. So we don't do only Python. We also do Ada. So what it is is a syntax highlighter slash code browser.
16:00
So it's basically a very small subset of the functionalities that you want in an IDE. So it's a command line tool that you launch on your project. Here, it's the GNATcov project, and it generates a hierarchy of HTML pages. And then, if you click on one of the links,
16:24
then you get highlighted code. So basically, this is done with the Libadalang API. We highlight tokens in a certain fashion, but we have the tree, so we can do a bit more syntactic highlighting. So for example, you can see that types are highlighted correctly,
16:42
et cetera, et cetera. And then, you have links to the cross references tool right here. And if you click on it, it will bring you, even if it resets the size, which is a bug, but it will bring you to the correct source and to the correct line with the line highlighted. So this is very simple,
17:00
but it can still be practical if you want to browse your sources offline. And it is shipped with Libadalang today, so you can already try it if you want. It's in the contrib directory of Libadalang. Okay, so my demo went well. I'm so happy.
17:22
So another thing we did, Python again, is very small syntactic-based analyzers. So this was a fun project done by Yannick, who is not here now, but did a presentation on SPARK. So he was like, oh, we do all these really complicated static analyses based on SPARK and CodePeer,
17:42
but let's do something really simple. This checker is doing something very fun. It's looking for binary operators and looking for cases where the left side and the right side are the same. And most of the time, it's an error, okay? And this is the way you express it with Libadalang.
18:01
So we look for every binary operator. And if it's in the list of interesting operators, so we have multiplication, addition, the concatenation operator, et cetera, et cetera, then we check if syntactically the left side and the right side have the same tokens.
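The same-operands check described above might be sketched in Python like this. It assumes the `libadalang` bindings are installed; the helper names are hypothetical, and the set of operators considered interesting is left as a parameter because the exact list used by the real checker is not reproduced here.

```python
def significant_tokens(node):
    """Token texts of a node, ignoring whitespace and comments (trivia)."""
    return [token.text for token in node.tokens if not token.is_trivia]


def has_same_operands(binop):
    """True when both operands of a binary operator have identical tokens."""
    return significant_tokens(binop.f_left) == significant_tokens(binop.f_right)


def find_same_operands(unit, interesting_ops):
    """Return binary operations whose two operands are textually identical.

    interesting_ops is a set of lower-case operator spellings, for example
    {"=", "and"}; which operators to include is a policy choice of the caller.
    """
    import libadalang as lal  # assumption: Python bindings are installed

    return [binop for binop in unit.root.findall(lal.BinOp)
            if binop.f_op.text.lower() in interesting_ops
            and has_same_operands(binop)]
```

The comparison is purely syntactic: `A = A` is reported, while `A = (A)` is not, because the token sequences differ. Each returned node carries its `sloc_range`, which is enough to print a warning.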
18:20
And if they do, we print a warning. So what is really fun is the number of problems we found with that in our code bases. So basically, we'd assume since we run static analyzers and we have big test suites and everything, no, this cannot happen. It's Ada, right? It's a very safe language and everything. But well, we had a lot of bugs in our code
18:42
linked to that, so it's really interesting. It's also an example of the power you have at your fingertips when you have access to the syntactic part of the code. So you are not into the text anymore. You can browse the tree and find interesting stuff. What we are working on right now, based on Libadalang too,
19:02
is a static analyzer based on semantics. So it's not a full interprocedural analyzer like, for example, CodePeer that we have. Some of you might know about it. But it's less powerful, less ambitious in scope. It allows you to do intraprocedural stuff
19:20
a little bit like the Clang static analyzer. So here, for example, we have a simple example where we have a file and we open it. And I'm going to take that, too. And here, we get a line and we close it every time in the loop, which is obviously an error. But when you write the code, you might do this kind of error.
19:43
So what we want to do is to be able to warn you very early when you write this kind of API code and say, oh, be careful. A file might be closed at this point. And when you close it, it might already be closed, too. And so we are using a simple form of abstract interpretation to make that.
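To make the idea of a simple abstract interpretation concrete, here is a toy sketch in plain Python. It is not the checker mentioned in the talk and does not use Libadalang: the tiny program representation (tuples such as `("open",)` and `("loop", body)`) and all names are invented for illustration. The abstract state is the set of states the file may be in, and loops are analyzed until that set stops growing.

```python
OPEN, CLOSED = "open", "closed"


def analyze(stmts, state, warnings):
    """Return the set of possible file states after running stmts from state."""
    for stmt in stmts:
        kind = stmt[0]
        if kind == "open":
            state = frozenset({OPEN})
        elif kind == "read":
            if CLOSED in state:
                warnings.append("read: file might be closed")
        elif kind == "close":
            if CLOSED in state:
                warnings.append("close: file might already be closed")
            state = frozenset({CLOSED})
        elif kind == "loop":
            body, entry = stmt[1], state
            while True:
                # Iterate to a fixpoint, discarding warnings on the way.
                joined = entry | analyze(body, entry, [])
                if joined == entry:
                    break
                entry = joined
            analyze(body, entry, warnings)  # final pass records the warnings
            state = entry  # the loop may run zero or more times
    return state


# Open a file, then read a line and close the file on every iteration.
program = [("open",), ("loop", [("read",), ("close",)])]
found = []
analyze(program, frozenset({CLOSED}), found)
for message in found:
    print(message)
```

On this program the analysis reports both problems mentioned above: the read may happen on a closed file, and the close may hit a file that is already closed. A user-defined check for another API would amount to a different set of states and transitions.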
20:01
And what's interesting is that users will be able to specify their own checks for their own APIs. So if you have an API that has some simple invariants like that that you want to enforce, you can add a simple checker for it. And it's a work in progress done by one of our interns at AdaCore. And you can check the progress on this repository.
20:25
And we also did a copy-paste detector because we thought it was fun, given the number of bugs we found with the static analyzers. Maybe we could find, like, maybe a whole project duplicated at AdaCore or something like that. It didn't happen, but we found some copy-pastes.
20:41
It's also an example of the API of Libadalang. And it's very lightweight. It's a few hundred lines of code, and it's pretty efficient. So if you want to try to run it on your Ada code base, you can find it on our blog here and in the contrib directory of Libadalang. So inside AdaCore, we also use Libadalang for serious stuff,
21:04
not only prototypes. So we are in the process of changing the semantic engine of GPS, the main IDE, to use Libadalang. So it's a work in progress. It should happen in the following year. And also the new versions of gnatmetric, gnatstub, and gnatpp.
21:23
So gnatpp is a pretty printer. It goes through your code and pretty prints it. Gnatstub generates stubs for your subprogram bodies and specs. And gnatmetric gives you some metrics about your code. And all of those tools are based on ASIS for the moment and are being adapted to run on top of Libadalang.
21:41
And outside of AdaCore, we already have some people using it. Some guys are doing instrumentation with it for coverage. Some people are doing automated refactoring to make code smaller. Some people are making serializers and deserializers to JSON on top of it. So this is an example of the kind of stuff that you can do on top of Libadalang.
22:07
So in conclusion, if you want to check out Libadalang, literally or not, you can go on this URL. You can try it and open issues if you find problems. The API is still a moving target until we release it as a real product.
22:23
But it's very stable for some parts. Some others are moving. So it depends on what you do with it, I guess. Thank you for listening. And if you have any more questions... I have one more to add. If you want to know how Libadalang was implemented beside this,
22:42
we are doing a presentation at, what, 1 p.m. tomorrow? I think it's 2 p.m. It's 2 p.m. Anyway, check out the Langkit presentation in the source code analysis dev room. Yes. Thank you. Thank you very much.
23:01
Questions? Yes? Does it mean that you will give up the support for ASIS? So the question is, does it mean that we will give up support for ASIS? So we are not going to release new versions of ASIS. So it's basically baseline.
23:20
We will continue providing support for the current version of ASIS for undetermined time for the moment. Oh, yeah, yeah. But don't worry, we won't leave Jean-Pierre hanging. It's not part of our plans.
23:42
You also might piss off some customers. Yes, and we don't want to do that. So obviously, as long as we have some request for ASIS support, we will support ASIS. Internally, there have even been some discussions like, oh, we could rewrite the current ASIS based on Libadalang.
24:03
So just to give you an impression of the kind of discussions that happen, we would prefer not to do that, honestly. But if we have to... Depends on customer pressure.
24:21
There's a question about C. Oh, yeah. So you had the question, which was, could we use Libadalang to transpile to C? Not to anything else. The background is this,
24:41
that Ken Thompson's famous paper about backdoors in compilers, and it could be that all compilers are backdoored, and that's why we started the project a while back, which basically starts with us manually toggling switches on the computer
25:01
and writing a small 200 byte program, which is an assembler, and then building a tower of languages until we are at GCC, which now works. But Ada doesn't work because the Ada in GCC is written in Ada,
25:20
so it fails. We'll try to find a way to fix it. I think there is one really big thing that is missing in Libadalang, which is the implementation of execution semantics. I mean, the knowledge is not there for now, at least. So there's still a huge work to do starting from Libadalang
25:41
in order to create basically an interpreter or a compiler on top of it. It's not really the job to translate to another language for now. So basically you have a small part of the front end, you have the cross references, if you really need the legality checks you can use GNAT on top of Libadalang, but then you still have a lot of stuff to do if you want to compile your code.
26:05
There exists one Ada compiler, not open source, but there exists one that actually emits C and provides a full Ada runtime in standard ANSI C.
26:21
But I think the charge for using it is $1 per line of source code. That's the quote I heard at some point. You might be able to get them to say it's an interesting project and not pay it back. I think it concerns all of us by now,
26:44
because everyone has to have untrusted computers and have everyone listening to everything else. So it's probably not good. Maybe they're interested. And what about the completeness of the front end?
27:02
Can it parse the Ada compiler source code? Can it parse? Was that the question? So the question was can it parse the Ada compiler source code? And the answer is yes. It can parse any source code that we could find. Maybe they found bugs in the Ada compiler. Yes, it can. But it doesn't fail now.
27:21
The parser doesn't fail on anything that we could find. The semantic analyzer name resolution still fails on some stuff, but it's getting really small, and that's all we have for the moment anyway.