Designing a programming language for the desert
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 287 | |
Author | ||
Contributors | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/56903 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
FOSDEM 2022110 / 287
2
4
6
8
12
17
21
23
31
35
37
41
44
45
46
47
50
62
65
66
67
68
71
73
81
84
85
86
90
92
94
100
102
105
111
114
115
116
117
118
121
122
124
127
131
133
135
137
139
140
141
142
145
149
150
156
164
165
167
169
170
171
172
174
176
178
180
183
184
189
190
192
194
198
205
206
207
208
210
218
220
224
225
229
230
232
235
236
238
239
240
242
243
244
245
246
249
250
253
260
262
264
267
273
274
277
282
283
287
00:00
BefehlsprozessorCodeCompilerGraphics processing unitCompilerMereologyWorld Wide Web ConsortiumBuildingAverageComputer programmingProblemorientierte ProgrammierspracheIntegrated development environmentDebuggerJava appletAlgebraic closureFatou-MengeCore dumpErlang distributionScripting languageProgramming languageSoftware testingComputer programmingPlanningCodeCombinational logicMultiplication signProblemorientierte ProgrammierspracheCompilerArray data structureSoftware maintenanceGraphics processing unitIntegrated development environmentConstraint (mathematics)NumberMereologyFunctional programmingBefehlsprozessorMulti-core processorTable (information)Focus (optics)CompilerOptical disc driveMachine visionData managementVirtual machineArithmetic meanScaling (geometry)Model theoryDecision theoryAverageNegative numberTotal S.A.Confidence intervalNeuroinformatikIdeal (ethics)Category of beingRevision controlComputer animation
03:44
Programming languageIntegrated development environmentPoint (geometry)Problemorientierte ProgrammierspracheSource codeSupercomputerInsertion lossClosed set
04:16
Problemorientierte ProgrammierspracheParallel portCodeTotal S.A.Problemorientierte ProgrammierspracheProgramming languageProgrammer (hardware)Integrated development environmentConstraint (mathematics)CodeMereologySoftware developerFlow separationComputer programmingMultiplication signArithmetic meanArithmetic progressionFood energyExpected valueComputer animation
05:12
Computer fileSoftware maintenanceDegree (graph theory)System programmingCodeNormal (geometry)Canonical ensembleInclusion mapCompilerInheritance (object-oriented programming)RootAsynchronous Transfer ModePhysical systemRevision controlWindows RegistryServer (computing)Survival analysisData managementComplex numberPhysical systemComputer fileInclusion mapRevision controlData managementType theoryProgramming languageServer (computing)Mechanism designContent (media)Configuration spaceCompilerCompilerRoutingDirectory servicePhysicalismFunctional programmingBuildingText editorException handlingFile systemComputer programmingProcess (computing)Projective planeAbstract syntax treeInformationHeegaard splittingCodeIndividualsoftwareVotingAxiom of choiceUniform resource locatorIntegrated development environmentProgrammer (hardware)Library (computing)MultiplicationStatement (computer science)1 (number)Semantics (computer science)BitNumberRule of inferenceCentralizer and normalizerCore dumpBound stateImage resolutionMultiplication signDot productMereologyHierarchyFood energyMaxima and minimaOrder (biology)Degrees of freedom (physics and chemistry)CloningSoftware maintenanceCanonical ensembleRight anglePurchasingData storage deviceObservational studyIntelligent NetworkSign (mathematics)Group actionService (economics)QuicksortNetwork topologyTerm (mathematics)CAN busSoftwareThermal conductivityPosition operatorIndependence (probability theory)Goodness of fitArithmetic meanDirection (geometry)Router (computing)ResultantSingle-precision floating-point formatWater vaporSystem callImpulse responseScripting languageCovering spaceJava appletComputer animation
12:15
CompilerRevision controlWindows RegistrySurvival analysisServer (computing)Data managementRepository (publishing)Fundamental theorem of algebraIndependence (probability theory)Library (computing)Network topologyQuicksortMaxima and minimaConstraint (mathematics)Physical systemImplementationImage resolutionKey (cryptography)Total S.A.BlogComputer programmingModel theorySubsetPlane (geometry)InferenceComputer configurationCodeCausalityCode generationMathematical optimizationKernel (computing)Game theoryRevision controlLibrary (computing)Data managementComputer fileRepository (publishing)Order (biology)Axiom of choiceMaxima and minimaImplementationDivision (mathematics)CASE <Informatik>Total S.A.CodeProgramming languageDirectory serviceSource codeAlgorithmModel theoryCompilerComputer configurationBound statePhysical systemConstraint (mathematics)Semantics (computer science)NumberScaling (geometry)Kernel (computing)Programming paradigmModulare ProgrammierungNP-completeTypinferenzServer (computing)FreezingError messageComputer programmingStudent's t-testDifferent (Kate Ryan album)Latent heatCombinatoricsSoftware testingMathematical optimizationData Encryption StandardStandard deviationCombinational logicLine (geometry)Statement (computer science)Software maintenanceSoftware bugSystem callPressureGame theoryMereologyLevel (video gaming)Power (physics)Rule of inferenceRight angleSign (mathematics)Process (computing)Component-based software engineeringCorrespondence (mathematics)Equaliser (mathematics)Inheritance (object-oriented programming)Black boxGoogolDeciphermentColor confinementPlanningMetropolitan area networkMusical ensembleFamilyPersonal digital assistantLipschitz-StetigkeitBit rateComplete metric spaceComputer animation
19:18
Computer animationMeeting/Interview
19:36
Bit rateCASE <Informatik>Programming languageComputer programmingCodeDemosceneOpen setColor confinementContext awarenessDesign by contractLogische ProgrammierspracheCrash (computing)Shared memoryGoodness of fitNeuroinformatikComputer fileLevel (video gaming)Complex (psychology)Canonical ensembleVideoconferencingData managementKey (cryptography)Fitness functionFrame problemVolume (thermodynamics)Configuration spaceAxiom of choiceDirectory serviceWeb pageData structureParallel portPhysical systemVolumenvisualisierungReal-time operating systemRight angleParalleler AlgorithmusVertex (graph theory)Shader <Informatik>HookingMessage passingHierarchyUniqueness quantificationSimilarity (geometry)Source codeComputerMeeting/Interview
23:40
Computer animation
Transcript: English(auto-generated)
00:06
Hello, my name is Thorsten Eichten, and I'm here to talk about designing a programming language for the test. I am a researcher at the University of Copenhagen. We're together with my colleagues. I work on a programming language called Fudag, and it's a compiler. So Fudag is a language in the ML family, like OCaml or Haskell.
00:25
And you program it much like you program the language using bulk combinators on arrays. And then we have a compiler that can turn the resulting programs into very highly optimized code running on GPUs or multi-core CPUs. And that compiler is what our research reader, that's what we publish papers about.
00:43
But it's not really what we're going to talk about today. Because the full language is not meant for full applications, it's only meant for the small performance critical parts. It's a pure language, so you cannot encircle the outside world or a user in a way you normally write functions that are then called by a program written in some other language. And that creates some interesting constraints on the language design.
01:03
And I'm going to talk about some of those principles and some of the constraints today. Although my focus will not really be the language itself, but more the tooling around it. So building a programming language of any kind takes a lot of interest. Because the average number of users for all overall programming languages is about zero.
01:23
And language designers, they know this. They know that when you create a new language, odds are that language is not going to succeed, odds are no one is going to use it. But clearly there are visions going beyond that. No one wants to create a language that no one uses. So most languages are designed with the hope and the plan of succeeding eventually.
01:44
Because they tend to have very large domains or being completely general purpose to use them for anything. And they are built with the idea that they must scale to large teams and large programs. Which means you might have or need complicated build tools and debug and package managers.
02:01
And maybe one of those sufficiently smart compilers so you add complicated language features that you have no idea how to compile efficiently. But maybe someday in the future when the language is a huge success, enough people will be working on the compiler to make it run fast. And also you assume that most of the users of the language will have it as their main language. So they will have the time and the motivation to learn all of the subtle details about how the language works and how to use it well.
02:27
So in a sense these are languages that are meant and designed for a resource rich environment. And I don't mean machine resources, maybe this can still be a language that is meant to run on a very small computer. By resource rich environment I mean that the plan is for the language to eventually have a large number of users who can put in a lot of time.
02:45
And a large number of maintainers who can also put in a lot of time. So you can create advanced tools and expect your users to spend time to learn about the language. And companies think like this when they push a new language, they assume it will be a big success. But hobbyists also do so. And there's nothing wrong with that.
03:01
So this table shows a handful of languages that are kind of general purpose. And that eventually these languages would all want to take over the world or at least a domain that we use for everything. And they're complicated and while they might not be there yet, they are designed with the idea that eventually they can have complicated tools. They can have advanced compilers and many users and rich ecosystems.
03:25
And some of them have now become popular enough that such sufficient tooling may actually exist. Some because there was a big company that just put in enough resources to make it happen no matter what, like with Swift. Others like Rust just became popular organically and are starting to grow pretty advanced tools because they are so popular.
03:42
There are lots of people who want to write tools for them. So these languages really want to grow up to be this. They want to be tigers, not because they want to eat other languages. Well, maybe they do, but that's not my point. But they are supposed to thrive in a resource rich environment like the jungle. Where you, sure you can have a big muscular body, you can afford to spend lots of resources because there are lots of resources available.
04:04
You can always get some more. And sure, many of the languages will not grow up to be tigers. They will die. They will have no users at all. But that's not the goal. So what about Futhark? What about our little data parallel language meant for high performance computing?
04:21
Well, that's a small domain. And because Futhark is not general purpose, you can't write full applications in it. Most programmers who use Futhark mostly use some other language and then just want to speed up some part of the program by rewriting that in Futhark and integrating it into a larger program. That also means that Futhark will always be a guest in a larger code base that is mostly not written in Futhark.
04:44
And this is not a resource rich environment because even if Futhark wins, and sure, I have just as much hubris as several other language assignments. So of course I hope and expect that Futhark will win and take over the world or take over its domain. That's been a really tiny domain and that we will still, even when we win, we still won't have that many users.
05:03
Our users won't have that much patience and time to spend on the language. And we won't have that many development resources. And that creates some interesting constraints for how we've designed the language. So this is Futhark. Futhark doesn't live in a resource rich environment. It is in a desert. It's not a tiger. It's a hedgehog. It must conserve its resources and spend energy only when absolutely necessary.
05:24
It must really make sure to maximize the rewards for its investments. And the approach we've taken is a kind of conceptual minimalism. So when we design the language and its tooling and its ecosystem, we try to minimize the
05:42
number of things that require ongoing maintenance, such as servers that host packages or documentation or whatnot. We try to minimize the amount of implicit behavior in the language itself and its tools. Because our users will probably not memorize or study the tools or language enough to recognize implicit behavior.
06:01
So everything should be as explicit as possible. We try to minimize the degrees of freedom to limit the amount of choices that our users will have to take. And we try to minimize novelty, except we're absolutely necessary. So of course, we don't want to just create a language that is a clone of JavaScript. So there will be some novelty. While we try to make sure that we only do something unusual, why it's really crucial to our value proposition, which is high performance execution.
06:27
And really, the core is we have to do just a few things so we can do them well. And we end up saying no to things that are good ideas in most languages. So what I'm going to, what I'm arguing here is not that when you're designing a language for the desert, you should just do a really good job.
06:41
It also means you need to make some harsh trade or say no to things that in a more resource-rich environment, those are good things. But you just can't afford them when you're a desert language. So let's look at some concrete examples of what that actually looks like. So first, let's talk about build systems and multi-file programs.
07:03
So FULA is meant for small programs, but we still want to be able to support splitting a program into multiple files. If nothing else, then to support third-party libraries, as Will sees also. Now, I don't think any programmer enjoys learning about build systems or how imports in a program language resolves to files.
07:22
And to make sure that we don't ask people to learn too much new stuff in FULA, our principle here is that the easiest thing to learn is something that you already know. So in FULA, we create a very strong relationship between these import statements and the file system. So when in FULA, you do an import of foo slash bar, then that imports exactly the file foo slash bar dot foo relative to the importing file.
07:50
And that means that the semantics of importing are just exactly file system imports with no hidden details and no other rules. And all uses of code and other files, except for a very few build-in things, must be through an explicit import.
08:06
So there's no implicitly available environment or anything. Everything has to be explicitly imported. That's boilerplate, but it means that there are fewer rules to memorize. It's obvious, especially by looking at a file, what other files it's using. One downside of this very simple design is that files have no canonical name.
08:22
For example, let's consider this directory, where we have a main dot foo, and then we have two folders that contain other foo dark files. So if we want to import the bar dot foo file in the foo directory, then if we want to import that file from main dot foo, we would say import foo slash bar.
08:41
If we want to import it from bass dot foo, we would just say import bar, because it's in the same directory. If we want to import it from bass dot foo in the kooks directory, then we would have to use two dots to move our directory, then enter foo, and then import bar. So that means that the same file, bar dot foo, has three different names in these three different imports.
09:02
I mean, it's not surprising, really, because it's really just file system semantics. Everyone who knows about the hierarchical file system will understand this. But it is a problem that files don't have a canonical name. So this is a trade-off that we made in order to make everything as explicit as possible, but it might not be the right choice for every language.
09:24
So one thing I should mention is that this is not textual inclusion like C's include. Each file must still be syntaxed and type correct by itself. The compiler doesn't just paste the contents. It is still a well-behaved import resolution mechanism. And one subtle detail that makes this really work is that there is no search path set by some build
09:43
tool config file or include path in the C compiler that makes certain directories magically available as search routes or whatever. All of the imports reference concrete physical files that must be in the file system relative to the importing file. And to compile a foothog program, you then just run the compiler on the main file that then eventually imports every other file in the program.
10:05
This has a really interesting advantage, which is that if the program by itself is type correct, can compile, then each constituent file can also be used directly as a compilation route by passing it directly to the compiler. And sure, that might not result in interesting programs for all of the sub
10:21
files, but if you pass them to the type checker instead, that will also work. And then you can get back for every single file by treating those as a compilation route, a syntax checked or type checked syntax tree for that file. And everything it imports, which makes it really easy to write very simple get functional editor tooling. In our Emacs mode, for example, it doesn't worry about whether a file is part of a larger project.
10:44
It just takes the file you are editing, passes it directly to a command called foothog check, which just runs the type check on the file and hence its imports and uses the information that comes back to implement things like the type under the cursor and go to definition and these other nice things.
11:02
Nothing particularly fancy, a language server could also do this, but it was super easy to implement and requires almost no maintenance. So it's not really the right choice for any language. One downside is that there's no notion of shared libraries since all paths must be relative to each file. There's no way to just put something in a system location and have it available to everything.
11:23
And when you install third-party packages, they must be put in a known and accessible location relative to the program that's going to use those packages. So actually, let's talk a little bit about packages. Language package managers solve some fairly tricky problems. The two main ones are how do we find the packages and make them available to the compiler?
11:42
And how do we deal with conflicting version bounds in dependencies? This can get really complicated. For a central register of packages, you probably need a server like Rust has its crates.io. We're a desert language, we don't really have the time to spend on maintaining a server, especially if that server has to run custom software.
12:04
Also, most package systems allow both upper and lower bounds on dependency versions. And that turns out to be an incomplete problem, so you need a fairly complicated solver. It's difficult to implement efficiently and worse, when the solver fails because the bounds are in conflict,
12:22
then it can be really difficult to explain to the user what actually went wrong and how to fix it. For example, Rust solver in cargo is thousands of lines of code, so it's not easy to implement this well. So for our package manager in Futhag, we went with the simplest thing that could possibly work.
12:42
It's really not much more than a glorified file downloader. To add a dependency on some library, you run the command futhag package add and then you provide a package path. Currently, that package path must be the name of a repository on either GitHub or GitLab, but we intend to make this more flexible in the future. The most important thing is that it must be able to look up the available versions somehow.
13:06
After you've added the dependency, you can actually make it download the files corresponding to all of your dependencies by running the futhag package sync. That will populate a directory called lib, which for example might look like this if we add this DQDK source library. What we can see is that the lib directory just contains futhag source code, just files.
13:24
And we can see that the source library itself depends on another package called segmented, which is then also contained in here. And from within our own program, we would then just import these files like any other file. We know they're in the lib directory, we know the names of the library, so we just write the corresponding import statement.
13:42
It has a little bit clumsy, but it's fully explicit, so it's very easy to understand what's actually going on. And when it fails, it's also easy to understand why it fails. And if you want to do rendering, you just commit this lib directory to your repository. Nothing particularly, I think it's a little bit ugly, and maybe it is, but it's kind of obvious that it works and when it doesn't work so well.
14:08
So package versions, again, we went for the simplest thing that could possibly work. They're just git tags, which is not unusual by itself. So to release a version of your package, you just add a tag to your repository and you push it to the repository.
14:20
And a given package can depend on any new version of another package. And Package Manager, of course, also downloads dependencies of dependencies. So how does it handle cases where you have different dependencies, and then they, in turn, depend on different versions of the same third dependency? Well, this is where the many package managers end up being NP-complete, or trying to solve an NP-complete problem.
14:46
But I was inspired by Russ Cox from Go, who came up with a really simple system for the Go module system that they added a few years ago. He came up with what's called the minimum package version algorithm. And the real trick here is that instead of trying to use the newest version of an available package,
15:04
it tries to use the lowest version, so the oldest version that still satisfies all of the constraints. And then it says that you cannot constrain a dependency on other bounds, only lower bounds. This works only if you basically never break backwards compatibility,
15:22
and if you break backwards compatibility by modifying the major version number in semantic versioning, then that really just counts as another package. Completely unrelated to the old one as far as the package manager is concerned. And this is a really, this is a design with many trade-offs. So this thing about not being able to break compatibility, that's harsh,
15:42
because breaking compatibility in small ways and coping by using upper version bounds is very common in most package ecosystems. And this doesn't support that at all. But it has a lot of pros. For example, Go uses it, so it's not fatally flawed. It clearly scales to fairly large ecosystems.
16:02
It means solving for dependencies is reproducible even without freeze files. The only way this solver can fail is if a package doesn't exist, so you won't have a problem of reporting incomprehensible errors to your users. And the implementation is extremely simple. To show just how simple, this is the implementation of the solving algorithm in the CUDA compiler.
16:24
Now this makes for stuff, the actual code for communicating with GitHub and downloading files and so on, but this really encapsulates the actual solving algorithm, which is amazing. It's very, very short. And this design that we use for CUDA package isn't really CUDA specific at all.
16:41
One of my colleagues used the design to create a package manager for standard ML, and in total that ended up being a 1,500-line SML program. So you could pretty much cover the design for another language, and it's very easy to implement. So those examples of language design for the DES that are mostly about things that are kind of peripheral to the language itself,
17:05
about how to use libraries and how to structure the code. And a few other examples of where we went with this conceptual minimalism approach is that CUDA is based on a familiar program model. It's basically the model you would find in Haskell or ML. Map, reduce, scan, high order functions, type inference.
17:23
It's not the most popular program model, but we know that people can learn it, and we teach it to students all the time, and it's not particularly difficult to learn. There are a few places where we actually have language novelty, but we are very careful about when to introduce it, only when it gives us an enormous advantage. Instead, we put all of the novelty, and we research it, so there has to be some novelty,
17:44
in the compiler itself, which is hidden from the user. It's a black box, you give it a code, hopefully it does a good job, and it spits out a program that runs fast. Another example is that we support very few compiler options, because when you add options to the compiler, especially options that affect code generation,
18:01
then you end up with a combinatory explosion of different code parts, and that is really difficult to test. And again, we're in the desert, we don't have extremely large testing resources. For a fun game, try to randomly generate some optimization options for GCC, and see if you can compile a Linux kernel that works using those options.
18:23
I mean, GCC supports an enormous amount of options, and probably many of them, not every combination has been tested together, so probably you can find some bugs that way. So in conclusion, designing a program language for the desert means coping with persistent scars to your both users and maintainers. All languages start out small, most languages hope to eventually grow large.
18:42
If you're designing a language for the desert, you know that it will never become large, and that means you have to make some trade-offs. And the main trick here is just to keep it minimal. And that sometimes means making choices that you would not make for a language supposed to be popular. And you have to realize there are some things you will not be able to afford. Maybe you will never have that advanced language server implementation.
19:02
But then how can you perhaps find another way to design your language so that someone can go to a division tool in the afternoon? So if that sounds interesting, why not take a trip in the desert yourself and try out our language? I think it's really fun to use.
19:43
Okay, so the Q&A has started according to the message I'm getting. So I'm going to start answering some questions. The first question is about canonical file names. Why don't we just start from the top-level directory so that all files will always have a unique import name?
20:00
That would be better. Unfortunately, the problem is if you're just opening an arbitrary file in your editor, or passing an arbitrary source file to your rebel or something, what is the top-level directory? You need to find that top-level directory somehow. Maybe there's a configuration file. Maybe you can recognize it by the presence of a file with a special name. But that's more logic, more things you have to build into your tools.
20:24
And in my case, or in our case, we didn't want to have that kind of logic. But for most languages, what we did here would not be the right design. But we think it is for us, because it's simple and requires less maintenance.
20:40
The next question is from Ephraim. And apologies if I put you the pronunciation. Who asks, what happens if you have a package dependency loop? So I guess this means you have a package A that depends on B, then depends on A. So in our package manager, that would be an error. When you try to install one of these packages, you'll be told that's impossible. There is a loop.
21:01
And again, maybe that's the right thing in general. Maybe it isn't. There may be situations in a large program, in a larger community where it makes sense to have mutually recursive packages. But when you are starved for resources like we are, and you intend to always be starved for resources, sometimes you have to say no even to features that would be useful to someone.
21:20
Because they raise too many questions that you don't have the resources to answer. So again, it's not about me telling you what's the right way to do a language, or it's tooling. It's about realizing that when you have limited resources, sometimes you need to say no to things that you might want in a different context, or if you had more resources.
21:42
So there's another question by Blake 2B, who asks about graphics programming, and says that Puthag seems somewhat similar to GLSL, the OpenGL shader language. I wouldn't say it's particularly similar. So GLSL. In GLSL you specify the essential sequential code,
22:01
and then you run that sequential code in the context of a shader, which can be on fragments or vertices and so on. I'm not actually much of an expert on that. Puthag is much more high level, and also lets you express the structure of the parallel system. Now I would actually say that if you can express your program within the confines of GLSL or the other shader languages, whatever Vulkan has or some things like that,
22:21
then that's probably better, because they are simpler, and obviously they are more well used. Puthag is for when you actually have interesting complicated parallelism in your program. So writing a renderer maybe isn't an interesting use case for Puthag, but constructing a bounding volume hierarchy to describe the scene of your rendering,
22:45
to accelerate the rendering, that might be more interesting, because that's more complicated to do in parallel. So Puthag only makes sense once you have a certain amount of complexity in your parallel algorithms. For very simple things, just use a simple language. Even simpler language.
23:02
More questions? Someone says again, Blake is asking whether anyone has used Puthag for audio or graphics video programming? I think so. I don't know about video. So generally Puthag is not a good fit for real-time graphics, I think,
23:21
because it doesn't hook directly into the actual graphics port of the GPU. It's for general purpose compute. You can use it for graphics, but it's probably not going to be as fast or as low latency as using Vulkan or Metal or OpenGL, because that's not really what it's built for. It's built for bulk data parallel programming.