
How to Build your own MLIR Dialect


Formal Metadata

Title
How to Build your own MLIR Dialect
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
MLIR allows you to define your own intermediate representation (IR) while benefiting from the infrastructure it provides. However, getting started with creating your own IR, called a dialect in the MLIR universe, is sometimes tricky. This tutorial addresses some of the challenges arising from the CMake configuration and explores projects like the standalone MLIR dialect example in more detail. Furthermore, we take a look at how TableGen files that define a single dialect can be split.

Even with tutorials on MLIR like the MLIR tutorial presented at the LLVM Developers' Meeting in 2020 [1], the "Creating a Dialect" guide [2], and the Toy tutorial [3] in the MLIR docs, building an MLIR dialect can feel difficult. This is especially the case when it comes to the CMake configuration. Good starting points are the standalone MLIR dialect example [4] and especially the tutorial [5] given by S. Neuendorffer at last year's LLVM Developers' Meeting. In addition to the existing tutorials, we will look into the details of the CMake configuration of an out-of-tree dialect like [4]. Furthermore, we dive into more complex CMake configurations for projects like MLIR-EmitC [6], showing how to build a project standalone or embedded into another project. Beyond the CMake configuration, we briefly cover how to architect TableGen files. In particular, it is shown how to use multiple TableGen files to define a single dialect, e.g. to define the base dialect, operations, attributes and types.

[1] M. Amini & R. Riddle, "MLIR Tutorial", 2020 LLVM Developers' Meeting, https://youtu.be/Y4SvqTtOIDk
[2] https://mlir.llvm.org/docs/Tutorials/CreatingADialect/
[3] https://mlir.llvm.org/docs/Tutorials/Toy/
[4] https://github.com/llvm/llvm-project/tree/main/mlir/examples/standalone
[5] S. Neuendorffer, "Architecting out-of-tree LLVM projects using cmake", 2021 LLVM Developers' Meeting, https://youtu.be/7wOU7csj1ME
[6] https://github.com/iml130/mlir-emitc
Transcript: English (auto-generated)
Yeah. I hope you have had a great FOSDEM so far. I'm happy to talk about how to build your own MLIR dialect. Just as a first question: who is aware of what MLIR actually is? Who has heard of the MLIR subproject? Awesome. It's not the whole audience, so I'm going to say a little more about what MLIR is. My outline is: what is MLIR, though I only have a really short slide on that; then I will show you the standalone example, which exists in the LLVM repository as part of the MLIR project. And I will tell you a bit more about how you can extend it and how you can build your own dialect, because following the discussions on Discourse and Discord, it always seems like people hit the same pain points, and at least we did, several times. That's why I set up this kind of tutorial, to show you some of the tricks behind it, mainly from the CMake perspective, which can be tricky at times. Besides how to build it, I show you how you can combine it with other dialects, and last but not least, how to extend your dialect with types or attributes. Just as a side note, all code snippets are of course licensed under Apache 2.0 with LLVM exceptions.
So, what is MLIR? MLIR is a reusable compiler infrastructure that was introduced by Google in early 2019, and at the end of 2019 Google donated it to the LLVM Foundation. So it's officially part of the LLVM project, and it lives in the monorepo under mlir. What it allows you to do is define operations, attributes, and types, which are grouped into so-called dialects, and that lets you define your own intermediate representation. Later on in this session we will also have an update on the Flang compiler, which likewise uses MLIR to define its own intermediate representation. These dialects can either be part of MLIR core, meaning they live upstream, like the func dialect, which gives you the ability to define what a function is, or the LLVM IR dialect, which mirrors what LLVM IR is but models it in MLIR. There are tons of other dialects, like a GPU dialect, the TOSA dialect, which is the Tensor Operator Set Architecture, or EmitC, which I am one of the developers behind.
And there are also many, many out-of-tree dialects; the SOG project is using it, for example, or Torch-MLIR, which models PyTorch in MLIR, and many more. These are considered out of tree. When we look at the standalone example, which is really a brilliant starting point when you want to create your own dialect, you find it as part of the LLVM monorepo, and you can just build it against an installed LLVM. You simply run CMake and configure it accordingly: you only need the path to your installed MLIR and the path where the LLVM external lit is found. Then you can just build your target, which here is check-standalone; it builds all the tools and runs the tests.
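For reference, a configure-and-test invocation along those lines (roughly following the standalone example's README; the install prefix and LLVM build directory are placeholders) could look like this:

```sh
# Component build against an installed MLIR ($PREFIX is the install prefix,
# $BUILD_DIR the LLVM/MLIR build directory that provides llvm-lit).
mkdir build && cd build
cmake -G Ninja .. \
  -DMLIR_DIR=$PREFIX/lib/cmake/mlir \
  -DLLVM_EXTERNAL_LIT=$BUILD_DIR/bin/llvm-lit

# Build the tools and run the lit tests.
cmake --build . --target check-standalone
```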
This assumes, as I mentioned, that LLVM and MLIR were built somewhere and then installed to a prefix, and that somehow corresponds to "out of tree". When I began with LLVM and MLIR I was not a compiler developer, but I had some experience with CMake, and the way the terms "in tree" and "out of tree" are used in the LLVM and MLIR world versus the outside world can be confusing, so I want to give at least a short definition. In the LLVM world, "in tree" nearly always refers to a monolithic build; that means you build LLVM, or your LLVM subproject, plus your dialects or whatever else, together. "In tree" can also refer to the source location. Here we have an out-of-tree dialect which happens to be part of the LLVM monorepo, but it's still considered out of tree because you can pull it out; you don't need to have it in the monorepo. So "out of tree" normally refers to working with a separate repository. However, there is also a mechanism you can use to build your project, the LLVM external projects mechanism, and if you look into the CMake configuration of projects using it, or into other tutorials, they either call it an out-of-tree monolithic build, because it's not a component build like the one against an installed MLIR or LLVM, or they even call it in tree, which is somewhat confusing. In CMake terms, "in tree" normally just means you're building in the directory where your source code lives, which is actually bad practice; you shouldn't do that. Normally you do out-of-tree builds, meaning you create a separate directory where you set up your configuration and where you build. That can even be a subdirectory of the source tree, but it's a separate directory that isn't checked into your Git later on. So within this talk I just call it the external projects mechanism. For me it's always an out-of-tree build, regardless of what I do; even if I build LLVM itself I personally wouldn't call it in tree, because I normally use the CMake notation. I just want to make that clear so you don't get confused when you look into some of the projects.
So what we can do is extend the standalone project to support this LLVM external projects mechanism, and the question is: why should we do that? Stephen Neuendorffer gave a great tutorial about how to architect LLVM projects at the 2021 LLVM Dev Meeting, which is available on YouTube; I also have the link in my references. There he refers to this as a monolithic build, and historically he recommends using component builds, which is what the standalone project already gives you. But there are some benefits when you want to use the LLVM external projects mechanism. When we developed the EmitC dialect, we developed it as an out-of-tree dialect, completely independent and buildable against an installed MLIR version. EmitC is now part of MLIR core, so it's upstreamed, and what's quite nice is that when we change or extend our dialect upstream, we sometimes want to see how it behaves together with our out-of-tree sources, which we still have; all our conversions and transformations are not upstreamed yet. It is quite nice to build it all as one project, because you can easily debug into everything and you don't have to keep an installation in sync with what you're building out of source; you just have one monolithic build. So there are some benefits, and now let's look at what we need to do to build the standalone project with the LLVM external projects mechanism.
We create our build directory again, and then we have to define the LLVM targets to build, LLVM_TARGETS_TO_BUILD; here you specify for which architecture you want to build LLVM, in this case just host, though X86 is also an option. You must specify the build type, for example Release, Debug, or RelWithDebInfo, whatever you need. And we need to enable the MLIR project, otherwise it is not built. In addition to that, as we want to build our standalone project, we specify LLVM_EXTERNAL_PROJECTS with our project name, standalone-dialect, and furthermore LLVM_EXTERNAL_STANDALONE_DIALECT_SOURCE_DIR, to say where our source is found. Those are the two additional parameters you need to pass. The LLVM source directory here is assumed to point to the root of the checked-out monorepo.
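A monolithic configure along those lines might look roughly like this; the project name standalone-dialect and the matching source-dir variable follow the naming used in the talk, so check the project's CMake files for the exact spelling:

```sh
# Monolithic build via the LLVM external projects mechanism.
# $LLVM_SRC_DIR points to the root of the checked-out llvm-project monorepo.
mkdir build && cd build
cmake -G Ninja $LLVM_SRC_DIR/llvm \
  -DLLVM_TARGETS_TO_BUILD=host \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_EXTERNAL_PROJECTS=standalone-dialect \
  -DLLVM_EXTERNAL_STANDALONE_DIALECT_SOURCE_DIR=$PWD/..
cmake --build . --target check-standalone
```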
That is what we want to have later on, but right now the standalone example can't do this, so what do we need to change to make it possible? Currently the main CMake configuration looks as follows. What is important here is the find_package call: we call find_package(MLIR), and find_package in general imports information that was exported by a project, so here it imports the information from the installed MLIR version. Furthermore, find_package(MLIR) also calls find_package(LLVM) for us, so we don't need to care about that. The MLIRConfig.cmake is parsed, as well as the LLVMConfig.cmake, and we can then simply do our includes, which pull in some further CMake code for us. For the external projects mechanism build we don't need all of this. What we need to change is to only call find_package if there is an installed MLIR; otherwise there won't be one, because MLIR is being built as part of the same build. We can detect that case: for a standalone build, CMAKE_SOURCE_DIR is equal to CMAKE_CURRENT_SOURCE_DIR; if that is not the case, we have the other build type. So we add an if-else block, and in the external-projects branch we no longer need to load the CMake modules with include. The code we add simply sets, ourselves, the variables that are otherwise imported as exported project settings, such as the MLIR main source and include directories.
And that's actually it; those are the few lines we need to make it buildable. However, your lit tests will fail, so there is a little more to modify. We define STANDALONE_SOURCE_DIR and STANDALONE_BINARY_DIR variables, which are later also used for the include directories, and we adjust our lit.site.cfg.py accordingly: we need to replace CMAKE_BINARY_DIR and CMAKE_SOURCE_DIR with our newly set variables, otherwise the location of the lit configuration is assumed to be in the wrong place. So we just fix that here, and that's nearly it.
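A possible sketch of that change (the variable names and the lit config key are illustrative):

```cmake
# Sketch: use project-specific path variables so the paths are also correct
# when building via LLVM_EXTERNAL_PROJECTS.
set(STANDALONE_SOURCE_DIR ${PROJECT_SOURCE_DIR})
set(STANDALONE_BINARY_DIR ${PROJECT_BINARY_DIR})

include_directories(${STANDALONE_SOURCE_DIR}/include)
include_directories(${STANDALONE_BINARY_DIR}/include)

# In test/lit.site.cfg.py.in, refer to these variables instead of the CMake
# ones, e.g. use "@STANDALONE_BINARY_DIR@" rather than "@CMAKE_BINARY_DIR@".
```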
Now, when you want to use your dialect together with other dialects, and these live in several repositories, or at least in several projects, you can either use LLVM_EXTERNAL_PROJECTS to build multiple dialects; Torch-MLIR, for example, does exactly this. Another option is to use CMake's ExternalProject_Add, which is considered the cleanest way, as it really keeps the projects enclosed and doesn't transfer variables between them. However, what I normally do is use add_subdirectory, but combined with EXCLUDE_FROM_ALL, so only the build targets I really use are exported, or transferred, to the other project. We do this in our MLIR-EmitC repository, and for this we actually have an "embedded" option, which changes our CMake code a little bit. Only when we build embedded do we skip the find_package call, because find_package is already done by the enclosing project; but we still do the includes, which, in contrast, we don't need for the external projects mechanism.
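A minimal sketch of both sides, assuming a hypothetical option name STANDALONE_BUILD_EMBEDDED and an illustrative subdirectory path:

```cmake
# In the consuming super-project's CMakeLists.txt (path is illustrative):
add_subdirectory(externals/standalone EXCLUDE_FROM_ALL)

# In the embedded project itself, guard the configuration the enclosing
# project already provides (option name is hypothetical):
option(STANDALONE_BUILD_EMBEDDED "Build embedded into an enclosing MLIR project" OFF)

if(NOT STANDALONE_BUILD_EMBEDDED)
  find_package(MLIR REQUIRED CONFIG)
  list(APPEND CMAKE_MODULE_PATH "${MLIR_CMAKE_DIR}" "${LLVM_CMAKE_DIR}")
endif()

# The includes are still needed in the embedded case (unlike in the
# LLVM_EXTERNAL_PROJECTS build, where LLVM has already loaded them).
include(TableGen)
include(AddLLVM)
include(AddMLIR)
```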
So, getting to types. This is how the standalone dialect is currently structured, or at least most of it; there are also some tools, standalone-opt and standalone-translate, which are part of it. You can see we have multiple files, and types could simply be specified in StandaloneOps.td, our TableGen definition file. However, it's quite nice not to put everything into one file but to use separate files. So what we do is add new files: a TableGen file StandaloneTypes.td, a header file, and the .cpp file for our implementation. What you need to put into those is the following. Let's start with the TableGen file. First of all, we include the attribute/type base file and the dialect itself, because the dialect provides some definitions, and then we can define our standalone type class, which is the base class for our types. In addition to that, we define a custom type; it's a simple copy of EmitC's opaque type, quite straightforward, but here we use the declarative assembly format, so no custom parser and printer, and it just holds a StringRef parameter. Nothing special, just to illustrate the example. That is how the TableGen file could look.
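As a concrete sketch (names follow the upstream standalone example; the custom type is modeled after EmitC's opaque type, as mentioned):

```tablegen
// StandaloneTypes.td -- a minimal sketch.
#ifndef STANDALONE_TYPES
#define STANDALONE_TYPES

include "mlir/IR/AttrTypeBase.td"
include "Standalone/StandaloneDialect.td"

// Base class for all types of the Standalone dialect.
class Standalone_Type<string name, string typeMnemonic, list<Trait> traits = []>
    : TypeDef<Standalone_Dialect, name, traits> {
  let mnemonic = typeMnemonic;
}

// A simple custom type holding a single string, using the declarative
// assembly format instead of a hand-written parser/printer.
def Standalone_CustomType : Standalone_Type<"Custom", "custom"> {
  let summary = "Standalone custom type";
  let parameters = (ins StringRefParameter<"the custom value">:$value);
  let assemblyFormat = "`<` $value `>`";
}

#endif // STANDALONE_TYPES
```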
Moving on to StandaloneOps.td, we can just replace the include of StandaloneDialect.td with StandaloneTypes.td. That works because the types file already includes the dialect's TableGen file, and that's it. Regarding the CMakeLists, we don't need to change anything. Why is that? add_mlir_dialect already calls mlir_tablegen for you with -gen-typedef-decls and -gen-typedef-defs, so that's fine; there is nothing to change here. For attributes, however, it is different, because add_mlir_dialect doesn't call mlir_tablegen for you. You set LLVM_TARGET_DEFINITIONS yourself, call mlir_tablegen yourself, add a public TableGen target, and that's it. So attributes are handled quite similarly, except that you need to adjust the CMake configuration yourself.
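For attributes, the hand-written TableGen rules could look roughly like this (the file names and the target name MLIRStandaloneAttributesIncGen are our own choices):

```cmake
# include/Standalone/CMakeLists.txt (sketch): attribute TableGen rules added
# by hand, since add_mlir_dialect() only covers operations and types.
set(LLVM_TARGET_DEFINITIONS StandaloneAttributes.td)
mlir_tablegen(StandaloneAttributes.h.inc -gen-attrdef-decls)
mlir_tablegen(StandaloneAttributes.cpp.inc -gen-attrdef-defs)
add_public_tablegen_target(MLIRStandaloneAttributesIncGen)
```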
For the header file, just include the auto-generated typedef classes; that's it, add the define and the include, nothing more to do. For our implementation, we need to make sure that the types can actually be registered by the parent dialect. So what we do is add a GET_TYPEDEF_CLASSES define, include the code generated by TableGen, and then write a function registerTypes, which calls the method addTypes together with some of the auto-generated code. This then needs to be called in our StandaloneDialect.cpp, so we just add the registerTypes call there, and that's nearly the whole trick.
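A sketch of the implementation file, assuming the generated include is named StandaloneOpsTypes.cpp.inc as produced by the add_mlir_dialect call in the standalone example:

```cpp
// StandaloneTypes.cpp -- minimal sketch of registering the generated types.
#include "Standalone/StandaloneTypes.h"

#include "Standalone/StandaloneDialect.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/DialectImplementation.h"
#include "llvm/ADT/TypeSwitch.h"

using namespace mlir::standalone;

// Pull in the TableGen-generated type definitions.
#define GET_TYPEDEF_CLASSES
#include "Standalone/StandaloneOpsTypes.cpp.inc"

// Called from StandaloneDialect::initialize() in StandaloneDialect.cpp.
void StandaloneDialect::registerTypes() {
  addTypes<
#define GET_TYPEDEF_LIST
#include "Standalone/StandaloneOpsTypes.cpp.inc"
      >();
}
```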
You can do the same for attributes, not with addOperations or addTypes, but with addAttributes, to register your attributes. For the CMakeLists itself, just add the new source file to your MLIRStandalone dialect library target; that's it, nothing more to do. For attributes you of course also add the source file, but in addition you need a dependency on MLIRStandaloneAttributesIncGen, the target we created by hand, because it's not auto-generated. This is just to make sure that TableGen generates the code before the MLIRStandalone target is built. You might get lucky without it, but otherwise you can have a race condition in your build system; I experienced that several times and tried to fix it, so just keep it in mind. And that's mainly it.
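The dialect library target could then look roughly like this; the attribute-related lines are the hypothetical additions discussed above:

```cmake
# lib/Standalone/CMakeLists.txt (sketch).
add_mlir_dialect_library(MLIRStandalone
  StandaloneDialect.cpp
  StandaloneOps.cpp
  StandaloneTypes.cpp
  StandaloneAttributes.cpp

  ADDITIONAL_HEADER_DIRS
  ${PROJECT_SOURCE_DIR}/include/Standalone

  DEPENDS
  MLIRStandaloneOpsIncGen
  MLIRStandaloneAttributesIncGen

  LINK_LIBS PUBLIC
  MLIRIR
)
```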
For the standalone dialect itself, we use the default printer and parser here; we just tell TableGen to generate those. And for registerTypes we of course also need a declaration: we have the implementation, but we also add the declaration via extraClassDeclaration so that TableGen emits it on the generated dialect class. Otherwise we cannot call it from our StandaloneDialect.cpp.
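An excerpt of the dialect definition showing both pieces (a sketch following the standalone example):

```tablegen
// StandaloneDialect.td (excerpt).
def Standalone_Dialect : Dialect {
  let name = "standalone";
  let summary = "A standalone out-of-tree MLIR dialect.";
  let cppNamespace = "::mlir::standalone";

  // Let TableGen generate the default parseType()/printType() implementations.
  let useDefaultTypePrinterParser = 1;

  // Declare registerTypes() so it can be implemented in StandaloneTypes.cpp
  // and called from StandaloneDialect::initialize().
  let extraClassDeclaration = [{
    void registerTypes();
  }];
}
```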
All the examples are available in my fork of the LLVM project. I couldn't manage to send them via Phabricator to be reviewed for upstream inclusion before my talk, but I will do so, and I will add some more documentation; that's at least my goal. When I planned this talk, I thought these hints might help one or the other of you, and hopefully it's even more helpful if you find them not only in the slides but also in the upstream example. And there are many good resources out there: the talk given by Mehdi Amini and River Riddle, the MLIR primer, the MLIR tutorial at the 2020 LLVM Dev Meeting; the great docs at mlir.llvm.org, for example on how to create a dialect, the Toy example, how to combine dialects, how to add attributes and types, and all the details of what you can do in the TableGen world; and last but not least, the tutorial given by Stephen Neuendorffer at the 2021 LLVM Dev Meeting. Yeah, that's it from my side. If you have questions, please let me know and I'll try to answer them.
So, trying to summarize your question: as someone starting with compilers, you're mostly focusing on parsing an abstract C-like language, and you want to know whether you can just go the ordinary LLVM IR way or whether you need to switch over to MLIR to do what you want, in short. You can definitely do this; you can keep going the way you're going right now. MLIR is a little bit different. If we look at Clang, talking about an abstract C-like language, there is the Clang AST and then we more or less go directly to LLVM IR, and that is one of the things which, yeah, isn't that nice. If you look into other compilers, they introduce more intermediate representations in between, like we will see later in this session with the Flang update, for example; even Swift has two intermediate representations. MLIR just gives you the ability to define additional intermediate representations. So you could also write a front end for your language, parse it into MLIR, convert it to the LLVM IR dialect and then translate it to LLVM IR; that would be equivalent. It really depends on what you want to do and what kind of infrastructure you want to use, but you can go the way you're already going. So hopefully that at least somehow answers your question.
Okay, the question is not directly related to the talk, but since I'm one of the developers behind EmitC: why did we develop EmitC? Sometimes you cannot compile with Clang, or with LLVM at all, directly for your target. So the idea was to have something independent of the compiler. When we generate C code with EmitC, you then have the freedom to choose which compiler you want to use to translate it for your final target. We are in the domain of compilers for machine learning, and sometimes we have some very exotic targets where Clang is unfortunately not an option as the compiler. So that's the simple reason.
My question is: would MLIR be a good fit for that, or would possibly just C++ with some sort of templates and, I don't know, a dynamic library be enough? So the question is, coming from the other side, essentially for JIT use, whether MLIR might be a good fit to define your own types and attributes. Well, I'm not an expert regarding JIT, but MLIR provides you, and I think most of that code is upstream, with the possibility to register types and attributes at runtime, I'm quite sure at least. So you can extend your dialect after you have compiled MLIR. Depending on what you want to do, if you really want to modify it at runtime, that should be possible with MLIR. I'm not 100% sure, but it's at least worth a look, I think.
Well, I'm partly aware of IRDL, but I don't know, do you mean how you compose it via CMake into the targets, or something else? Yes, probably with IRDL, as far as I know, since you do most of it at runtime, you wouldn't need to build it in advance. So yeah, the CMake part would be somewhat obsolete there. I think so. Thank you.