
IntelliJ Elixir - Elixir Plugin for JetBrains IDEs

Formal Metadata

Title
IntelliJ Elixir - Elixir Plugin for JetBrains IDEs
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Using Java, Kotlin, and GrammarKit to reimplement Erlang, Yecc grammars, and Elixir for static analysis of Elixir source and BEAM bytecode. How decompiling and disassembly tools can quickly answer optimization arguments. IntelliJ Elixir is the Elixir plugin for JetBrains IDEs like IntelliJ and RubyMine. It uses JetBrains OpenAPI, JFlex, and GrammarKit to reimplement the Elixir grammar, which is natively implemented as a bespoke Erlang lexer and a Yecc LALR parser. This meant translating a recursive Erlang lexer into the strict regular expression state machine used by JFlex, with some interesting extensions needed. Porting the grammar from LALR Yecc to the LL Pratt parser generated by GrammarKit involved understanding the non-universality of BNF. Reimplementing and extensively testing the plugin led to finding bugs in native Elixir, showing that alternative implementations of languages in editors and tools can find bugs in the original implementations. The BEAM bytecode decompiler and disassembler has led to a better understanding of how the VM optimizes different Elixir code.
Transcript: English (auto-generated)
Hi, I'm Luke Imhoff. You may know me as KronicDeth on the internet. I'm full of autoimmune diseases. If anyone else is, there's a really good gluten-free bakery like three blocks away called Chambelland that everyone should try. Really good. I'm a member of the Lumen core team, an architectural engineer at DockYard, and
most importantly for this talk, the creator of the IntelliJ Elixir plugin. IntelliJ Elixir is a plugin for JetBrains IDEs: IntelliJ IDEA Community or Ultimate Edition, of course, which supports any language, but it also works for those coming from an Objective-C background using AppCode, C and C++ using CLion, SQL
using DataGrip, Go using GoLand, PHP using PhpStorm, Python using PyCharm, .NET using Rider, Ruby using RubyMine, or JS using WebStorm. In addition to IntelliJ Community Edition, there is also a PyCharm Community Edition, so both of those are completely free and open source. But the debugger and decompiler built into IntelliJ will actually decompile all the closed
source IntelliJ IDEs just fine, which is really helpful for plugin development. IntelliJ Elixir features support for viewing the internals of BEAM files, building whole projects, code folding, commenting and uncommenting code, code completion, Credo integration, which is a static analysis tool for style checking, a graphical debugger, EEx templates, which
are used for our web frameworks, find usages, formatting, Go To Declaration, Related, or Symbol, inspections with quick fixes, live templates, refactoring, run and debug configurations, show parameters, and SDKs, with full customization of both the Elixir
paths and the Erlang paths, so you can mix different versions of Erlang by selecting the code paths exactly. So if you ever see the release guide for Erlang where it's like, you can mix this version of the crypto package, but you need all these dependencies, you could try
those out and not have to have this specific version of Erlang; you can mix and match. There's also a structure view and syntax highlighting. One of the trickier features to implement was syntax highlighting and parsing when the project started. It is a translation of Elixir's native lexer, which is written in just normal Erlang code, so it's just recursive Erlang, to JFlex for IntelliJ Elixir, and JFlex
is a lexer generator that generates really compact Java code, and of Elixir's native parser, which is written using Yecc, which is an Erlang version of Yacc, which is Yet Another Compiler-Compiler and was originally written in C, to GrammarKit, which is a parser generator that some of the JetBrains people made for
making these sorts of plugins. Since the Elixir lexer is written as Erlang functions, it can be a full Turing machine, but JFlex can't. JFlex is able to generate a very efficient lexer by translating regular expressions, not the extended regular expressions that you're thinking of, but just the mathematical definition of regular expressions, to a finite automaton, which is what makes
it so fast. A finite automaton, though, lacks one important feature needed for Elixir: it cannot track infinite recursion in the language. Elixir requires this because you can nest any expression inside of interpolation, including more interpolation, and I have a feeling that an editor's grammar should always be forgiving, because most of the time you're writing bad code,
because you're writing partial code as you type, so I wanted to make sure that it wasn't like, well, I'm just going to define this as a regular expression six levels deep, and if you're doing more than that, the editor will just give up. Like, no, I wanted the grammar to work the way people could potentially write bad code, so I want infinite recursion. To do that, JFlex has an escape hatch, because you can write arbitrary Java code after you match on the normal regular expression.
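Editor's sketch (not from the talk): the escape hatch amounts to running ordinary code whenever a pattern matches, and a stack is the smallest addition that lets a regex-driven lexer follow interpolation nested to any depth. The Python below is a toy illustration of that idea only. The token names, the two states, and the tiny `"..."` / `#{...}` syntax are assumptions made for the example; it is not the plugin's JFlex specification and not Elixir's real lexical grammar.

```python
import re

# Token patterns per lexer state (a toy subset chosen for this example).
CODE = re.compile(
    r'(?P<STRING_START>")|(?P<INTERP_END>\})|(?P<IDENT>[a-z_]+)|(?P<SPACE>\s+)'
)
STRING = re.compile(
    r'(?P<STRING_END>")|(?P<INTERP_START>#\{)|(?P<FRAGMENT>[^"#]+|#)'
)


def lex(source):
    """Return (token_type, text) pairs, using a stack to track nested states."""
    stack = ["code"]  # the extra memory a plain finite automaton lacks
    tokens = []
    pos = 0
    while pos < len(source):
        pattern = STRING if stack[-1] == "string" else CODE
        match = pattern.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        if kind == "STRING_START":
            stack.append("string")
        elif kind == "INTERP_START":
            stack.append("code")  # lex the interpolation like a fresh file
        elif kind in ("STRING_END", "INTERP_END"):
            if len(stack) == 1:
                raise SyntaxError(f"unbalanced closer at offset {pos}")
            stack.pop()
        if kind != "SPACE":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in lex('"a #{b "c #{d}"}"'):
        print(token)
```

Each state on its own is still matched by a plain regular expression; only the push and pop around `#{`, `}`, and `"` go beyond what a finite automaton can do. The sketch does not complain about unterminated input, which suits an editor that mostly sees partial code.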
So I was able to add a stack in Java, and that way keep track of when you're entering a new interpolation by just treating it as entering a new Elixir file over and over again, and keep track of when you exit the... And that actually elevates it to a more complex form called a pushdown automaton. This is actually the minimum level of complexity you need
in the hierarchy in this Venn diagram to actually track nested braces or parentheses. Now that the lexers were equivalent, interpolation worked fine. When I got to the parser, I thought I had a lucky break. Both Yecc and GrammarKit use BNF, and from college I'm like, BNF, great, that's universal.
And so I copied it over, changed the Erlang to Java, I'm like, this is gonna be easy. Yeah, nah, no, no, BNF is not universal. It is actually totally tied to the parser generator you're gonna target. The big difference here is that Yecc is rightmost derivation, which means even though it reads like a human,
it actually starts matching symbols reading backwards, for left-to-right languages, of course, while GrammarKit is leftmost derivation. The reason for that is that rightmost-derivation parsers are really fast, which is great for compilers that don't have to be forgiving, but leftmost derivation can be like, this is what I need to keep going, which is really good for an editor.
But the problem is, for rightmost derivation, you want recursion on one side, and for leftmost derivation, you want the other kind. If you just copy it directly, you'll just have recursion that never picks up a new symbol. So they're completely incompatible, and they're kind of translatable as a human, but it is like an NP-hard problem
to have a general solution for translating a grammar, so it's all by hand. Good luck. The problem with translating from rightmost derivation to leftmost derivation is Elixir's do-end syntax and optional parentheses. This code has two possible interpretations. The do-end can be a block argument to either the left call or the right call. Elixir associates the do-end
with the outermost left call. It turns out this happens for free in LALR, but for LL parsers, a special rule needs to be defined in the BNF that disallows do-end blocks inside nested calls. This unfortunately meant duplicating large chunks of the grammar, because the do-end block can appear in almost any operation, because Elixir is mostly defined with macros.
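Editor's sketch (not from the talk): the "special rule that disallows do-end blocks" can be pictured as a recursive-descent parser that parses paren-less arguments in a mode where `do` is off limits, so the block is left over for the outermost call. The Python below is a toy under stated assumptions: whitespace-separated tokens, one optional argument per call, and an `allow_do` flag standing in for what a BNF grammar would have to spell out as duplicated rules. None of the names come from the plugin's grammar.

```python
KEYWORDS = ("do", "end")


def parse_call(tokens, pos, allow_do):
    """Parse `name [arg] [do body end]`; return ((name, args, block), next_pos)."""
    if pos >= len(tokens) or tokens[pos] in KEYWORDS:
        raise SyntaxError("expected a call name")
    name = tokens[pos]
    pos += 1

    args = []
    if pos < len(tokens) and tokens[pos] not in KEYWORDS:
        # The paren-less argument is parsed with do-blocks disallowed,
        # so an inner call can never claim the block.
        arg, pos = parse_call(tokens, pos, allow_do=False)
        args.append(arg)

    block = None
    if allow_do and pos < len(tokens) and tokens[pos] == "do":
        block, pos = parse_call(tokens, pos + 1, allow_do=True)
        if pos >= len(tokens) or tokens[pos] != "end":
            raise SyntaxError("expected 'end'")
        pos += 1
    return (name, args, block), pos


def parse(source):
    tokens = source.split()
    tree, pos = parse_call(tokens, 0, allow_do=True)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


if __name__ == "__main__":
    # The block attaches to `outer`, not to `inner`.
    print(parse("outer inner do body end"))
```

In a grammar file there is no runtime flag to pass around, so each rule that can contain a call needs a second copy without the do-block alternative, which is the duplication described above.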
There is no module keyword. It is defmodule, and that is a macro, and it is all built up that way, so everything potentially can support do-end, which is only used for macros. When experimenting with optimizations, I like to use the BEAM chunk viewer. A BEAM file is composed of chunks.
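Editor's sketch (not from the talk): to make "composed of chunks" concrete, the Python below walks an IFF-style container the way BEAM files are commonly documented: a `FOR1` header, a big-endian 32-bit size, the form type `BEAM`, then chunks that each carry a 4-byte name, a 32-bit size, and data padded to a multiple of four bytes. That layout is stated here as the editor's understanding rather than something the talk spells out, and the helper names and the fake payloads are made up for the example.

```python
import struct


def pack_chunk(name, data):
    """Build one chunk: 4-byte name, big-endian size, data padded to 4 bytes."""
    padding = b"\0" * (-len(data) % 4)
    return name + struct.pack(">I", len(data)) + data + padding


def pack_beam(chunks):
    """Build a minimal container holding the given (name, data) chunks."""
    body = b"BEAM" + b"".join(pack_chunk(name, data) for name, data in chunks)
    return b"FOR1" + struct.pack(">I", len(body)) + body


def read_chunks(blob):
    """Return [(chunk_name, payload_bytes), ...] in file order."""
    magic, size, form = struct.unpack_from(">4sI4s", blob, 0)
    if magic != b"FOR1" or form != b"BEAM":
        raise ValueError("not a BEAM container")
    chunks = []
    pos = 12           # first chunk starts after the 12-byte header
    end = 8 + size     # the size field counts everything after itself
    while pos < end:
        name, length = struct.unpack_from(">4sI", blob, pos)
        pos += 8
        chunks.append((name.decode("latin-1"), blob[pos:pos + length]))
        pos += (length + 3) & ~3  # skip the payload plus its padding
    return chunks


if __name__ == "__main__":
    # Fake payloads; a real file would hold encoded atom and string tables.
    demo = pack_beam([(b"AtU8", b"abc"), (b"StrT", b"hello")])
    for name, payload in read_chunks(demo):
        print(name, len(payload))
```

A chunk viewer is then mostly a matter of decoding each payload according to its name, which is what the rest of this part of the talk walks through.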
There's an AtU8 chunk that stores the UTF-8 atoms. It has replaced the older Atom chunk, and so you can view the actual characters in the atom and see what is being treated as a constant. The StrT chunk shows something unexpected. It's not a table of strings, but a pool of one continuous string, and the reason why this works is that the bytecode itself in the BEAM file
has an offset into this pool, and then the length of the string to copy out. Under certain conditions, the Erlang and Elixir compilers will be like, oh, this string is the suffix of another string, so I can just smush them together, and your string pool will actually be more compact than you would expect if you have common suffixes or prefixes, and you can actually look at this to be like, did that optimization happen?
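Editor's sketch (not from the talk): the pool-plus-offsets idea is easy to model. In the Python below, every string is referenced as an (offset, length) pair into one continuous buffer, and a string that already occurs somewhere in the buffer is not stored again. The `StringPool` class and its reuse policy are assumptions made for illustration; the talk only says sharing happens "under certain conditions" and does not describe the compiler's actual strategy.

```python
class StringPool:
    """One continuous string; entries are (offset, length) references into it."""

    def __init__(self):
        self.data = ""

    def intern(self, text):
        """Store `text` (reusing an existing occurrence) and return its reference."""
        offset = self.data.find(text)
        if offset == -1:
            # Not present anywhere in the pool yet: append it at the end.
            offset = len(self.data)
            self.data += text
        return offset, len(text)

    def fetch(self, ref):
        """Copy the referenced string back out of the pool."""
        offset, length = ref
        return self.data[offset:offset + length]


if __name__ == "__main__":
    pool = StringPool()
    whole = pool.intern("hello world")
    suffix = pool.intern("world")  # already present as a suffix, so nothing is added
    print(whole, suffix, repr(pool.data))
```

Because only the offsets differ, checking whether two strings were merged comes down to comparing their references against the pool's total length, which is the kind of question the chunk viewer lets you answer by eye.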
While the StrT chunk only holds strings, the LitT, or literal table, chunk holds all literals, so this way you can double-check that your literals really are literals and that they're not being computed at runtime; a true literal, which is slightly faster, shows up here. The ImpT chunk is the import table. It maps an index to a module and function
that was imported. Before I wrote the chunk viewer, I didn't actually realize remote calls are just declared in a separate chunk like this. I thought it was just a call to the runtime with the module and function that I want you to call, so it's kind of interesting to see that modules know this just statically. The counterpart to those imports is the ExpT, or export table, chunk.
The labels map to labels in the code chunk. Anonymous functions defined by the module are in the FunT, or function table, chunk, but somewhat confusingly, there is also the local function table chunk, or LocT chunk. However, this has more entries, because if any normal named function is captured,
it also shows up here, so you can see all the weird names with the dashes are actual anonymous functions that have derived names, but all the ones from 12 onwards with build state, we're capturing that to use as an anonymous function closure that we can invoke with apply. The line number information for stack traces
stored in the Line chunk is composed of two tables that are used to normalize the file name. You may have noticed that the file table can hold multiple names. This may seem unexpected, but Erlang actually supports an attribute to say that this chunk of code actually comes from a different file, and that's used all over the place in Elixir because of macros, so that when a macro pulls in code from another place, we can actually blame it on the file it's coming from
for better stack traces than just assuming the line where the macro occurs is the place where the error happened. It is also used in EEx for Phoenix views so that you end up blaming the EEx file and not the view module that it goes in. The information from these chunks is used in the code chunk. By default, IntelliJ Elixir will inline
as many references to other chunks as it can when viewing the code. So this language does not exist. I made it up, but it is as close as I could get to what beam_disasm, a tool in Erlang, spits out, but with a more flexible and more Elixir-ish looking syntax. You can also turn off all the inlining to kind of see what the code really looks like, but the code in the code chunk
is not Erlang's external term format. It is a more compact form that is especially adapted for code. So it only occurs in code, and nothing can really read it except beam_disasm and this editor. I think people did a port to VS Code after I put this out, so there's a VS Code plugin that does it too now.
By looking at the code, we can find out if changes in our source impact the BEAM files. So one of the big things to always look for is that if your pattern matching is super efficient, you'll get a select_val instruction. So for any arguments you have with your friends about, like, is this code more efficient, you can just look and see if it generates good bytecode. You can also do cool things
like, if you're arguing about, when you're trying to do a pattern match with nested matches, does the order matter? You can look at the code and see if the code is longer when you change that order. Sometimes it is, because matching first means it's the first variable, which means it gets to the next call very easily, and if you flip it the other way, it doesn't. The Dbgi, or debug info, chunk is optional,
but if it is there, the chunk viewer can show the internal implementation. If you click on the whole module, all the debug info's code form will be converted back to code. This looks a little bit messy with extra parentheses because that's just how Macro.to_string works, and I'm not actually using Macro.to_string; I re-implemented it in Kotlin to do this.
If you click on a function, all the clauses of the function are converted back to code, and you can do the same thing with individual clauses. This also works for Erlang, but since this is the Elixir plugin, not the Erlang plugin, I translate the Erlang to the equivalent Elixir for you. So if you're like, I don't really want to learn Erlang,
you don't have to. So this is what I show you in the debug info, but this is what the actual code looks like. So it's close, but see, I'm automatically changing the variable names to be Elixir-ish. You know, we can't use capitals because then they turn into aliases,
so like they're slightly incompatible syntax, and this way you can understand if you're not up to speed with reading Erlang code yet. One of the original reasons I created the IntelliJ Elixir plugin was because I wanted a graphical debugger for Elixir that rivaled the one in RubyMine for Ruby. That debugger in RubyMine allowed me to level up my understanding of Rails quickly, and I felt that the same would be true
if I had a graphical debugger for Elixir. Let's say I want to understand how EEx templates work in Phoenix. I can go into an EEx template and place a breakpoint in the gutter, marked here by the red dot. I can then right-click the test directory to run the tests in the debugger, and it crashes. Oh no. What happened is that the debugger can't debug NIF modules
because the standard debugger in Erlang is an interpreting debugger, and you can't interpret NIFs since they're just C code. But we can ignore it, so we can tell it, either at the individual run configuration level or for the entire IDE if the run configuration isn't set yet, to ignore this file, and it won't do it anymore.
There's also a bunch of stuff you see there as pre-ignored, because those are usually like middleware stuff that, if it was interpreted, your app would run so slow that it would be painful to debug. It ships with a bunch of defaults, which makes my version of the debugger slightly faster than VS Code's because it doesn't ship with that. Then when we rerun the debug configuration, we stop at the breakpoint line as indicated by the blue highlight.
Once stopped, we can look at the stack frames to figure out how Phoenix actually calls our templates. So templates are actually compiled into the view modules in Phoenix. So you see here it says user view. It's not in the template file. We can also click on the lower stack frames and see their variables, which in this case lets us understand how Phoenix layouts render their wrapped views.
In the selected frame, we can see the variables and their values, and one of the benefits of a GUI debugger is that you can choose how you see information, so I'm able to expand the assigns and find the value of the username field. I always found this super useful, going all the way back to when I used Eclipse. I loved Eclipse for C++ because of this GUI support for deeply nested structs, and I loved it so much
that I made sure to do it here for my debugger. If you're in the top stack frame, you can also run arbitrary code. A limitation of the underlying DBG library in Erlang is that it stops you from running arbitrary code in lower frames, because it might have a side effect and might mess with the stack, so you can only do it in the top frame, and I unfortunately have to do the same here. This looks really simple. You're just doing dots,
but that Elixir code is actually a bunch of map lookups, which is not obvious. Here's more complex code. I'm using a with macro, with pattern matching, and do-end, and it all still evaluates because it's translated into Erlang code the same way that the IEx.pry debugger does it. I just copied their code. The API is very unstable, though, and José Valim, the creator of Elixir,
allows me to use it, but he says I'm allowed to use it because he knows I won't complain when he keeps breaking my stuff. I actually did fix a bug with 192 plus last night because of that. You can also step, so we're able to step here. The reason why it's at the top twice is that a lot of Elixir macros
blame the first line of a file, so we step and we go up to the top twice because that's it printing that HTML. Even though we see it on lines eight and 11, it actually assigns that HTML as happening on line one of the module. To understand why a function exists, or whether it's safe to change or delete, it can be helpful to do find usages. We can find modules.
We can find individual functions. This also works with library functions, so I can right-click, find usages on Enum.into, and it'll find all the uses of that function. If you're just trying to understand something in the standard library, how do people actually use this function, or why is this function used here? You can go and find everywhere else someone uses it to understand the code better.
The reason why this works is that there's a decompiler that doesn't depend on Dbgi. It goes into just the ExpT and LocT chunks, and it can do Elixir. It can also do macros. It can do special forms. It can also do Erlang in the same way,
and that's how all the Go To Declarations work. So references and Go To Declaration don't use the compiler or LSP. Part of this is, like I said when I had the question earlier today, that the Elixir language server depends on the Elixir compiler, and there's no forgiving mode.
So if your code doesn't compile, a lot of the Elixir language server stuff just breaks. So instead, this is all built using GrammarKit, which, again, is a JetBrains tool for writing these sorts of parsers, and that, because it's an LL parser, is forgiving, so if you have broken code, I can still find usages in your file even though it's broken.
And it works if you don't compile; the Elixir language server requires that mix compile completed for it to work, and mine doesn't. But what this also means is that the compiler won't keep it around, it won't track all the imports. It'll track that an import happened, but it won't allow you to really blame it
where it came from. It'll be like, this was imported from this module, but it won't tell you, this is the line the import happened on. So by statically analyzing the code, I'm able to show you the actual function called at the top, but with the second entry, I can be like, here's the line where you imported that function. So if you just have a bare function and you're like, where is this coming from?
You can click on that second entry and be like, oh, here's the import line that's bringing it in. That's where this is coming from. So I find that very useful. This also can walk through quoted blocks, so it can go pretty deep. It's okay with Phoenix, but Phoenix does this weird thing where if you do the use,
the application stuff with :controller, it immediately does an apply, and that sometimes works. It's touchy because it's very weird metaprogramming compared to anywhere else. The source code is on my GitHub at intellij-elixir. It's in the JetBrains marketplace,
so you can install it whenever. I have donations for everything I can find, Open Collective, PayPal, GitHub's own sponsor thingy. Any questions?
Am I aware of any implementations of the parser besides mine? Not that I'm aware of. Mine is a complete re-implementation, so I actually found bugs in the original native implementation because I have way more test cases, because I didn't trust that I understood how to use GrammarKit. It took me a year to get it right.
I kept finding bugs in the actual Elixir Erlang and Yecc code that they had to fix. I haven't seen anyone else be nuts enough to do it. Everyone else I know is just using the Elixir language server now, and that's just using whatever mix compile can do.
What kind of refactoring do I support? Because the JetBrains API for refactoring is related to references and find usages, once I implemented find usages, I got rename support, so function renames work across files, not just in the current file. You can rename parameters and variables,
and that works too. I don't think module rename works, and that's mostly because I would have to also support asking you if you want to rename the file and I never got around to it. I mean, Elixir doesn't require the files to match, but it annoys me when they don't, because I'm a build engineer, and I'm like, the file system should match the modules,
even though the language doesn't require it. Any other questions? Yeah. Did I only have to do
the parsing in GrammarKit? No, find usages is what's called an extension point in JetBrains. OpenAPI is what they call the thing plugins target, and I'd implement that, and there were some weird gotchas about, if you go to definition when you're already on the function, should it go to itself?
And Elixir made it even harder because the def keyword is a macro, so if you go to definition on the def keyword, you technically should go to the def macro, but if you go on the name of the function, it should just go back to the function, and I have it go back to the function now, but I had to do some weird things to make find usages not think you're already at the definition, so it works. I'm not 100% sure I used the API correctly, though. It was mostly stepping in the debugger
and figuring out why it didn't work and making it miss those branches, but I've never had someone from JetBrains tell me I'm doing it wrong, so I don't know.
I don't see why I couldn't show this
as a language server to other stuff. I don't know how I would host it. I never really thought about it. Right, I know people have packaged their plug-ins as JARs to kind of do this for other reasons before, but I've never tried it. As for the number of users, it varies a lot.
If I don't do releases for a long time, I get between like 5,000 and 15,000 installs, but I don't know how many actual users, because I don't have any metrics. I just have the download statistics from the thing, so I don't know if that's that many actual users. All I have is unique downloads to go by,
so maybe 5,000-ish. I'm not sure, though.