Bazel
Formal Metadata

Title: Bazel
Title of Series: FrOSCon 2017
Number of Parts: 95
License: CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/32282 (DOI)
Language: English
FrOSCon 2017, talk 65 of 95
Transcript: English (auto-generated)
00:07
OK. So yeah, once again, good afternoon here at FrOSCon 12. Nice to have you all here. The next lecture is from Klaus Aehlig about Bazel, the CI
00:20
and build tool from Google, which they made open source some years ago, I guess. I'm really looking forward to this talk. And just a small reminder: everyone of you, for the lectures you watch here, please provide us some feedback. It's quite easy. Just go to programm.froscon.de, choose the lectures
00:40
you visited, and give them a rating. It's something like an Amazon rating: you can provide some comments and a one-to-five-star rating, which really helps us improve the quality of the whole conference. So yeah, please give us some feedback; that would be very nice. So after having said all this, yeah, I
01:05
guess we should just start. So please give a hand for Klaus. OK. So thank you very much. And thank you for the opportunity to introduce Bazel at FrOSCon. So the first question is: what is Bazel?
01:21
As already mentioned, it is a build tool. That is, it organizes how you compile or create other artifacts, libraries, and executables from source. It has been open source since 2015, so it's appropriate for the conference. But it has a quite long history as a Google-internal tool, with all the implications that has.
01:43
But in particular, it means we know the tool works. The more interesting question is, why do we want yet another build tool? So the historic motivation why Google started what is now known as Bazel in the first place was to have a tool that scales well.
02:02
Google has the approach of having one big repository with at least the majority of all code that is developed at Google. And everyone works on that repository at head. And so you get a quite big code base with all the dependencies in it. And conceptually, you want to compile everything from source.
02:23
So that is the scenario which Bazel was dealing with historically and, well, is still dealing with today. So the design is really about aggressively doing things in parallel, and being quite aggressive with caching of build operations.
02:43
But we make sure that we still stay correct, in the sense that we get the artifacts as if we used no cache at all and built completely fresh from source right now. And by the same artifacts, I mean byte-for-byte identical output.
03:01
So that's where the "fast and correct" slogan that is associated with Bazel comes from. But I think even if you have a smaller code base, Bazel can be quite interesting. So one of the aspects I quite like about Bazel is the declarative style of build files, which has the advantage that you separate the concern of writing code.
03:20
I want to write some library in C, and here's my C code. That is separate from the concern of what is the best way to compile it, to cross-compile it to a different architecture, and so on. And yeah, that declarative style gives you a central maintenance point for your build rules. You only need to specify in one place how you build things
03:42
and not update everything if you find there's a better way to compile code in some language. And by now, it's a generic tool; that is, you can provide your own rules, in a declarative style, for how to build things. And I will go through an example towards the end of the talk.
04:00
OK, so what is the look and feel of Bazel? Let's do that with a simple hello world example written in C. So you have your main program, simple program. And it uses a library. So you also provide a library, which usually consists of some header files and some implementation,
04:23
so some more C files. So in that simple scenario, how would you instruct Bazel to build these things? Well, the first thing is you provide a workspace file, which is literally an empty file. The idea of a workspace file is, on the one hand, to
04:43
specify where the scope of the source tree ends: all paths are relative to the directory where the workspace file is. And the other purpose of the workspace file is to define external repositories, or external sources, which you might want to include in your build.
05:00
In that simple example, that is literally an empty file. And then you would write build files, which in that example look as follows. So you have a library, a C library: you say, OK, there is a cc_library; it has a name. That cc_library rule is in the lower build file, the one
05:20
parallel to the library files and headers. You give it a name, and you specify what your source files are and what your header files are. In that simple case, you can just use globs. And then you have the executable. So you say there is a cc_binary. It has a name.
05:41
It has a source file, and it has a dependency on a library. And you specify the, well, it's called the label of that library: you have the path to the location of the build file, then a colon, and then the name of the rule in that build file.
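To make that concrete, here is a sketch of how those two build files might look; the exact file names and the lib directory are my assumptions, not taken from the talk:

```python
# lib/BUILD -- the build file next to the library sources
cc_library(
    name = "hello",
    srcs = glob(["*.c"]),  # implementation files
    hdrs = glob(["*.h"]),  # header files
)

# BUILD (top level) -- the executable, in the build file one level up
cc_binary(
    name = "hello_world",
    srcs = ["hello_world.c"],
    deps = ["//lib:hello"],  # label: path to the build file, a colon, the rule name
)
```

The WORKSPACE file at the root of the source tree stays, as mentioned, literally empty.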
06:02
So you say that cc_binary depends on that library. OK, so that's the general look and feel: declarative style. These are my sources, and that is the language they're written in. And it's quite worth noting what is not here.
06:22
So all the things about which architecture I'm compiling for, whether I have to care about cross-compiles, which C compiler to use, and so on, are nothing you have to specify each time you write a library or binary. So you focus on the code, and all that is specified
06:41
at other places, once for the whole project. OK, so now how do you build your library and your binary from that description? So the general way Bazel builds is that you first load the build file, well, first the one that you specify
07:03
from your target, but then also all the ones recursively needed as dependencies. You analyze the dependencies between the targets. And then you look at the rules, and from the rules you create a plan of what to do, so a graph of actions.
07:24
And you execute them, unless you have done so already and have, for example, an entry in your local cache saying: yes, for these inputs, I already ran that action, and that's the output. On subsequent builds, you just update the graph of your build dependencies
07:42
that you keep in memory. So Bazel uses a client-server architecture on the local machine. When you start Bazel the first time in a workspace, it starts a server in the background and then communicates all the requests to that server, which has the advantage that you keep
08:01
the constructed graph in memory and only update by watching the underlying file system. And as I said, the main use case is that you have all your dependencies from source in the same repository, so that graph can be quite large, and therefore it's worth not recomputing it
08:20
every time you want to build something. Okay, so say we want to build that hello world binary. The first thing is: you look at the target hello_world, you find it's in the top-level directory, so you look at the top-level build file.
08:42
Yeah, so you look, conceptually, at the package. All directories below a build file, except those subdirectories that have a separate build file, are what Bazel calls a package: a unit of sources described by a common build file.
09:02
And then you read the rule and you find that, well, there are two dependencies, the source file and a library. And actually, again, there's an implicit dependency on the toolchain, so if you change the specification of which tools you want to use,
09:20
you notice that this target is outdated and you have to rebuild it again, but I didn't draw that in the diagram. So again, with the same process, you don't do any recursive calling, you just look at the next package mentioned, which is the lib subdirectory, and you look at the build file there. There you find as dependencies the globs,
09:42
so you look at the content of the directory and find the corresponding matching files. Now you've discovered all the things you need for your build, and then you construct the action graph. So from that library, lib:hello, you get the lower actions, how to compile,
10:01
well, how to build the library: one compilation action and one linking-to-a-library action. And from the top-level :hello_world target, the cc_binary rule, you get another action to compile the source file and then link with that library to get an executable.
10:22
So these are the actual actions, where you have to read the source files and invoke a compiler, but the tracking of the whole graph with all these logical concepts is what keeps your build correct. And the easiest way to see that is if you add a file,
10:44
a new file, into that library. So none of the actions is directly affected, but you notice that the directory has changed, so you invalidate everything that depends on the directory, in particular the globs, and that forces you to reevaluate the library rule,
11:02
and once you do that, you see: oh, now I need to generate an additional action. And then you run all these actions and build the final output. Okay. As mentioned, the main part of the actual work
11:21
are what we call actions, so invocations of a compiler, of a linker, whatever, so they take the most resources for the build, definitely the most time, CPU time, et cetera. So it is particularly interesting to avoid unnecessary actions.
11:42
Okay, I mentioned we have the dependency graph, which means if nothing has changed, we don't have to redo the action. And in particular, we track all the inputs, so we are sure we're not missing anything. Yeah, and since we track all the inputs, and no characteristic timestamps or anything,
12:02
we can actually cache actions by content of the file. And that at every level, so the first thing you get for free is if you just change a comment, then you do the one compilation step, you find the object file hasn't changed, and we can skip the rest. But for that to work, it's important that you know that you have tracked
12:21
all the inputs and outputs, because if you miss something, then, well, the assumption that you have the graph correct doesn't hold. So yeah, you may only read the inputs that you declared as inputs for a rule, and none of these timestamp approaches. Fortunately, Bazel has a way
12:41
to help you do that correctly. You can ask for the actions to be run in what we call sandboxes. These are not a security feature, but they still provide some form of isolated environment, where you only have the declared inputs, and where you only copy out the declared outputs.
13:01
So depending on the operating system and the amount of effort you want to invest, that can be more or less sophisticated. You can say no sandboxes at all, just run everything standalone, which then doesn't help a lot. What is simple but already helps quite a lot is that you create a temporary directory and copy in, or actually symlink in, only the files
13:21
that you declared as input for that rule, and then copy out the declared outputs. That simple approach basically catches all of the errors of not specifying a dependency. And you can go further, and Bazel has that implemented: you can build a chroot, and yeah, go crazy.
13:45
So the chroot is definitely implemented, and then you're sure you declared all the inputs you need. But that tracking of all the inputs has another advantage, because now you know which files are actually needed, so you don't have to compile on your local machine.
14:03
You can also do the compilation step remotely on a different machine, like in your nearby data center, if you have one. And that enables yet another quite beneficial operation. So if you execute remotely,
14:21
then you can use the same compute center as your colleagues and so on, and then you can have a shared action cache. And remember, I said that the main internal use case is a lot of engineers working on the same code base. So quite likely, someone will have compiled the same source that you have
14:41
in your working tree already. And that way, you just get a cache hit, you get the answer back immediately, and can continue. Okay, so much for the execution model.
15:00
And we've seen Bazel, we've seen some of the built-in rules of Bazel, in particular specialized ones for the languages that are the more important ones, at least the more important ones to the authors of Bazel, so in particular C, Java, and a couple of other languages. There is also a generic rule, or there are generic rules,
15:23
including one rule called genrule, where you just specify a command to be executed, and you have some shell-like variables that get expanded in the way you would expect. I'm mentioning that rule because that is basically the only rule you have
15:41
when you use a traditional makefile. And you've seen that you can compose everything from that rule. But still, adding specialized knowledge for every language is something that won't scale. Definitely not if you want to use it in the open source world, where there are hundreds of languages and you can't get specialized knowledge
16:01
of each and every language into the build tool, so you need a way to extend the tool. Therefore, Bazel has a domain-specific language called Skylark, in which you can describe your own build rules. The language uses Python syntax,
16:22
so the syntax shouldn't be too scary, as it's widely known, but it is not full Python anymore. It is restricted to a simple subset of Python. In particular, it doesn't allow any reference to global state, so that you can evaluate your build files locally, without side effects,
16:41
and in a deterministic, reproducible way. Okay, and to give you a feeling of how Bazel can be extended and what that looks like, let's do a simple example of something that is not a mainstream programming language.
17:01
Say you want to develop rules for LaTeX. It's not mainstream, but for the purpose of this talk, it's enough to know that LaTeX generates PDF files, like the slides I'm using, from textual descriptions, text files, and these text files can refer to other files, images, diagrams,
17:22
but can also input other text files. There's a sequence of commands you might want to execute in order to build it, but this talk is about the build system and not about TeX. Okay, so the first approach I took when I was faced with that problem was saying,
17:42
okay, first of all, what should the LaTeX rule look like? Well, probably you want an entry point, which is the main document, and a bunch of other files that you have, and then you have a script to typeset this. So first of all, call pdflatex the correct number of times.
18:03
Take care of all the timestamps that are implicit in such a process; especially if you implicitly convert PostScript files into PDF, then for each of them you get another timestamp, and as I mentioned, you want reproducible builds. Also, copy all the input into a temporary directory
18:22
to not rely on sandboxing and to avoid polluting the workspace. I should mention that I described it in terms of building and so on, but like every sane build system, Bazel executes the commands and writes the output to a file that is outside the source tree,
18:41
so that it can work with a read-only source tree, which should by now be standard for a proper build system. Okay, so you have your script and the interface, and then the first approach is what we call a macro. So you say: yes, I can build LaTeX,
19:02
and I express it in terms of rules that are already present in the system. In the most simple case, we just have a genrule. So the syntax is as you would expect for a Python-like language: you say def, and then the name. You provide named parameters, and give some default values,
19:22
which probably is only useful for the additional sources, because you can have a single-file document. And then you just compose your genrule with typical Python commands: you compute a name for the rule, you specify the sources,
19:41
and you build a string that is the command you want to run, declare the outputs, and you also declare the implicit dependency on the tool. So whenever you invoke the tool, you have a dependency, tracked by Bazel, on your, well, tool, which in this case is our script that does the correct invocations,
20:01
but it means that if you change that, then we know all our LaTeX documents are outdated and have to be typeset again. Okay. Now that you have written your macro that just maps everything to a genrule, you can load that new thing in a build file. You write a load statement,
20:20
with just the file name relative to the workspace, and you say: from that file, I want to get one command into my build file, which you specify, in this case latex. And then you can use the rule in the same declarative style that you've seen for other rules: just name, main, sources.
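As a sketch of what that macro and its use might look like (the script label //tools:run_latex.sh and all other names are my assumptions):

```python
# latex.bzl -- a macro is just a function that expands to existing rules
def latex(name, main, srcs = []):
    native.genrule(
        name = name,
        srcs = [main] + srcs,
        outs = [name + ".pdf"],
        # $(location ...) is expanded by Bazel to the path of that file;
        # $@ refers to the single declared output
        cmd = "$(location //tools:run_latex.sh) $(location %s) $@" % main,
        tools = ["//tools:run_latex.sh"],  # tracked dependency on the script
    )
```

And its use in a build file:

```python
load("//:latex.bzl", "latex")

latex(
    name = "slides",
    main = "slides.tex",
    srcs = ["macros.tex", "diagram.png"],
)
```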
20:44
And that works quite well. You haven't spent a lot of effort, and you still gained a bit: you have your central point where you can say, maybe later I want to change the rule, and I have one place where I have to do it and don't need to go through all the invocations.
21:03
Since the macro can expand to more than one rule, you can also say: oh, give me a genrule, and then also add an executable rule that runs the presentation, and also always build kind of a summary
21:21
with many pages on a single page, and so on. Yeah. The next thing you notice, in the particular example of LaTeX, is that you start thinking in groups of files. You have that slide, and it contains five diagrams, so it is basically a set of files
21:40
that you want to declare. And for that, there's something built in which is called a filegroup. It's a named set of files, which may be source files but may also be generated files, in which case you get a dependency on the underlying target.
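For illustration, such filegroups might be declared like this (all names are assumed):

```python
# a named set of source files
filegroup(
    name = "diagrams",
    srcs = glob(["figures/*.svg"]),
)

# filegroups can contain other filegroups
filegroup(
    name = "slide_inputs",
    srcs = ["preamble.tex", ":diagrams"],
)
```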
22:00
All right. Yeah, a named set of files. So you maintain that set at one point, and wherever it's used, the dependencies get tracked correctly. And you can add a filegroup into another filegroup, which behaves as if you had added the elements, removing duplicates and implementing them
22:22
in a slightly more memory-efficient way. That is quite a useful concept when you're thinking in groups of sources belonging together. Yeah, and then at some point, you come to the point where you find out that macros work quite well,
22:41
but there are some things missing. The first is your missing type checking, in the sense that you have to remember that the main argument is a single file, and that the source is a list, and if you do that wrong, you get quite confusing error messages.
23:02
And the other thing you will notice eventually is that there is a limit on how long a command line can be. The limit on argv is not terribly small on Linux, but you can hit it. So you change your script to say: okay, instead of providing all the arguments,
23:20
I provide you with one file which contains all the parameters you need to know, in particular all the source files. And you change your macro to a rule. Note that the only thing I have to change is that one file which contains the specification of what I want to build. Okay, and then you say latex = rule; that makes that name a rule,
23:43
and you specify the attributes of that rule, together with the expected type. So I expect main to be a single label, I expect sources to be a list of labels, and I also declare that implicit dependency on the build tool.
24:00
You specify the outputs, and you specify an implementation of that rule, which is a bit more work. So you first compute all the files you need, and then you have something called a file action, which tells Bazel: yes, I need to write, or generate, a file with a given content
24:22
that only depends on the build specification, not on the contents of the source files. This is typically meant for parameter files, because the problem of exceeding the limit of a command line is something you hit with C and Java compilations also, yes.
24:43
The source tree gets big enough, and therefore all the main compilers also allow parameter files. So that is a typical use case for a file action. And then you compute the command line, as you would want it, as a list of strings.
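Putting these pieces together, the rule version might look roughly like the sketch below. I'm using today's Starlark API names (ctx.actions.write for the file action, ctx.actions.run for the actual command), which differ slightly from the Skylark names used at the time of the talk; the attribute names and the tool label are assumptions:

```python
# latex.bzl -- the macro turned into a real rule with typed attributes
def _latex_impl(ctx):
    srcs = [ctx.file.main] + ctx.files.srcs

    # the "file action": a parameter file whose content depends only on the
    # build specification, not on the contents of the sources
    params = ctx.actions.declare_file(ctx.label.name + ".params")
    ctx.actions.write(
        output = params,
        content = "\n".join([f.path for f in srcs]),
    )

    # the actual action: inputs and outputs as lists of files, the command
    # as an argument list (no shell quoting), plus a progress message
    ctx.actions.run(
        inputs = srcs + [params],
        outputs = [ctx.outputs.pdf],
        executable = ctx.executable._tool,
        arguments = [params.path, ctx.outputs.pdf.path],
        progress_message = "Typesetting %s" % ctx.label.name,
    )

latex = rule(
    implementation = _latex_impl,
    attrs = {
        "main": attr.label(allow_single_file = True),  # exactly one file
        "srcs": attr.label_list(allow_files = True),   # a list of labels
        "_tool": attr.label(  # implicit dependency on the typesetting script
            default = "//tools:run_latex.sh",
            executable = True,
            cfg = "exec",
        ),
    },
    outputs = {"pdf": "%{name}.pdf"},
)
```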
25:01
Okay, once you have computed the arguments, you can say: there is an action, where you specify inputs and outputs as lists of files, and you specify a command as a list of strings that can be called by exec. And as an additional bonus,
25:21
you can specify a progress message, so that in the interface you see what is actually happening, instead of a lot of "executing genrule" messages. So yeah, an additional benefit is that you can now specify a list of arguments,
25:41
instead of a string that is then interpreted by a shell, where you have to be a bit careful with quoting, and you get a somewhat meaningful progress message. Those are additional advantages on top of the already mentioned things: you get checking whether all needed parameters are provided and are of the correct type,
26:01
and you can now have these two actions depend on each other to avoid the limitations of your command-line length. Works nicely, but there's an additional thing that is typical for builds. Okay, so back to the LaTeX example:
26:20
you start collecting macros, at least I do that. These are my notations for mathematical things, and these are all the fancy things I want to use in slides, and you organize all that in filegroups, and then you would say: just input that filegroup. Well, what do you do? Well, the first thing seems easy.
26:41
The document should input just one file, with one statement, and the content of that generated file should then input all the files in that filegroup. Well, that only depends on the names, not on the contents, of those files, so you can easily specify your file action, writing out the statements you need,
27:02
except now we have a problem. Whenever you use that generated file, you implicitly also depend on these other files that went into that specification, and you want Bazel to track that dependency. A similar problem we've implicitly seen already
27:21
earlier in the talk: when I depend on a library, on a C library, I not only depend on the generated library file, I implicitly also depend on the header files when I want to use that library. So there needs to be some way for one rule to pass additional information on to a rule that depends on it,
27:41
and that is what Bazel calls providers. So you state LatexInfo = provider(); that makes that name available. And what I didn't tell you so far is that a rule can have a return value, which is then a list of providers. So a provider is basically a named dict, a dictionary.
28:05
So once you have that, the consuming rules, which as we've seen so far compute the inputs and then compose the actions, can go through the sources and ask whether a provider of that name is available for that file, and if so, access the fields of that provider.
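A sketch of how that could look; the provider name LatexInfo and its field are my invention, not taken from the talk:

```python
# a provider is basically a named dictionary of fields
LatexInfo = provider(fields = ["transitive_sources"])

def _latex_inputs_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".tex")
    # depends only on the file *names*, so a plain file action suffices
    ctx.actions.write(
        output = out,
        content = "\n".join(["\\input{%s}" % f.path for f in ctx.files.srcs]),
    )
    return [
        DefaultInfo(files = depset([out])),
        # pass the underlying files on to whoever uses the generated file
        LatexInfo(transitive_sources = depset(ctx.files.srcs)),
    ]

# in the consuming rule's implementation, collect those extra inputs:
def _extra_inputs(ctx):
    extra = []
    for src in ctx.attr.srcs:
        if LatexInfo in src:  # ask whether the provider is present
            extra += src[LatexInfo].transitive_sources.to_list()
    return extra
```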
28:27
Okay, so with that example I hope I've shown you the main concepts that you can use in the extension language. These are the main concepts; there are more specialized and ready-made functions as well,
28:47
but we've seen that you can extend it, and what I hope I made clear is that you can start simply: start with a simple macro that expands to your one-line command, and then later refine the specification
29:00
and add additional cooperation with other rules without changing the already existing calls to that rule. Okay, so to sum up, I've shown what is particular about Bazel:
29:23
first, you have declarative build files, but it's still a generic tool, that is, you can bring your own build rules in a Python-like extension language; and in the execution model, Bazel tracks all the dependencies,
29:40
which guarantees correctness of the result, because we know when the output is outdated; we have sandboxing that helps make sure we declare all the inputs and outputs correctly; and that full knowledge of the build graph also allows for speed, because we can cache more aggressively based on content,
30:02
we can execute remotely, because we know which files we have to send there, and we can do more things in parallel. Remote execution also allows shared caching: oh, a colleague already compiled it, I can use the result. And Bazel is open source, so you can try Bazel yourself;
30:21
here is the contact information. We have a homepage, we have mailing lists, the repository is mirrored on GitHub, and there's an issue tracker. There's a not very active IRC channel, so the mailing lists are probably the better way to get in contact, and release artifacts are signed with that key.
30:44
Yeah, that's an overview of Bazel, and I'm now open for questions.
31:06
Okay, so the question was: who's currently working on Bazel, and what's their affiliation? From Google, there are roughly about 30 people working on Bazel,
31:20
not all of them full-time. As I said, it's open source, and it's also used internally; there are some internal extensions that talk to the internal caching and the internal source control system. Those are components which are not open source, because they don't make sense outside. But if you take that broader scope of Bazel,
31:41
then it's about 30 people working internally on Bazel. Externally, we regularly get contributions from a fairly small number of persons, but it is still the case that the vast majority of commits are from people with an @google.com address.
32:09
Yeah?
32:21
[Audience question, partly inaudible: whether, as with autotools, the user can choose between, say, multiple SSL libraries, and whether such a thing is possible with Bazel.] So the question was: if you have a traditional C project using autotools, then at configure stage
32:43
you can say, for example, which SSL library you want to use; how does that work with Bazel? The answer is: unfortunately, to some extent, but not very well, because the tradition is that you have your big monolithic repository
33:01
where you have all your dependencies. There is some approach to this, though, because I mentioned the workspace file: there you can specify external repositories and give them a name. That also includes, even though it's still not a very stable interface, saying: okay, there's a precompiled library,
33:20
that's the binary, these are the header files; and by changing your workspace accordingly, you can switch to one or the other external dependency. There is no autotools-like configure step for Bazel yet, but there are plans and discussions about how to best add such a thing, though that is still a work in progress, unfortunately.
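To make that concrete, here is a sketch of what such a workspace entry could look like. The repository name, path, and BUILD content are invented for illustration, and the details of these external-repository rules have changed between Bazel releases:

```python
# WORKSPACE -- hypothetical sketch of wiring in a prebuilt system library.
new_local_repository(
    name = "system_ssl",
    path = "/usr/lib/ssl",  # assumed install location
    build_file_content = """
cc_library(
    name = "ssl",
    srcs = ["lib/libssl.so"],
    hdrs = glob(["include/openssl/*.h"]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)
""",
)

# Build targets then depend on "@system_ssl//:ssl"; switching to a
# different SSL library means editing only this workspace entry.
```

The key property is the one the talk mentions: the choice of external dependency lives in one place, the workspace file, rather than being scattered across build files.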
33:47
Any other question? Yeah? [Audience: Bazel is basically a server? Where does that run? I didn't quite understand.] Okay, so the question was whether Bazel is a server application. Well, yes and no.
34:09
Technically, it's a client-server architecture, but it all runs on your local machine: you basically start a daemon for each workspace
34:20
that keeps the dependency graph in memory. When you start building for the first time, a process is started in the background, just so that you have a persistent process keeping the dependency graph in memory for some time, because the typical thing is: you work on the code, then you test something, then you work again, and you don't want to recompute that graph each time. So in that sense, it's a client-server architecture,
34:41
but it's meant to run perfectly well on a single machine and nowadays just communicates over the loopback device with that process that keeps information persistent in memory. It is also client-server in the sense that it supports remote execution as an optional feature,
35:04
so that the Bazel server, the one that organizes the build and stays in memory over multiple invocations in the same workspace, is then itself a client to a remote execution or remote cache service; but that's all optional.
35:21
You can have your single machine and run it there, and you don't really notice, and it's not disturbing, that you then start two processes instead of one. I mean, you start a lot of processes in the background anyway for your compile invocations; just one process then survives at the end, to keep the graph in memory
35:41
in case you need it quickly again. Any other questions?
36:04
The question was whether Android will be providing any build files for Bazel. I'm not a representative of the Android team, but there are definitely rules for Android applications already provided with standard Bazel,
36:20
so there definitely are plans, but I'm not sure about timelines or anything; that you would have to ask the Android project. Sorry?
36:43
Oh, the question was whether Bazel is suited for end-user devices. In the case of Android, I wouldn't run it on a mobile device. Bazel supports building Android applications; the applications are then run on the Android device, but you typically build them on a different machine.
37:06
So I'm not sure whether I understand the question; the question was whether Bazel is on the end-user device. It depends: if you're a developer, yes, Bazel's open source, you can use it on your desktop or whatever. But the application you build is a normal binary,
37:22
a normal whatever file; you don't need Bazel to run the program that you compiled, in the same way that you don't need make to run a program that you built with make. So in that sense, it's a normal build tool like every other build tool: you use it on whatever machine you want to build something, and the artifact
37:41
is independent of the tool you used to create it. Yeah, sorry, what?
38:07
I guess the question was about integration into an IDE. Okay, so, there exist some integrations for some IDEs; let me get that right:
38:20
it definitely exists for IntelliJ and, I think, Eclipse, even though I'm personally using neither of them; but there are integrations, and as far as I can tell from my colleagues, they're quite happy with them. There is also a generic mechanism where Bazel can report what happened: you can invoke Bazel by whatever means,
38:41
and there is a machine-readable report about what happened during the build, a sequence of protocol messages serialized in whatever format you like, so you can also add your own wrapper around Bazel that then has information about what happened during the build
39:00
in machine-readable form. But there do exist IDE integrations, at least for some IDEs.
39:25
[Audience: What data is saved to make sure that things are not recompiled?] Okay, the question was about caching. There are several levels. To start with the first: you have the dependency graph,
39:41
and if none of the inputs changed, that node isn't even considered again. Then there is, on disk on your local machine, a cache that contains a hash of the action, that is, a hash of all the inputs and the command,
40:03
and maps it to the output. For each action, the latest execution is stored on disk as that hash together with the artifacts; so you compute the cache key, and if it is there, you don't rerun the action.
40:25
That is the standard local caching you get with a plain invocation of Bazel. You can then also specify a remote cache, which may or may not be on the same machine, where you store more such action-hash
40:40
to output mappings; so if you change a file, run a build, and then change the file back, you can get the old result from the cache, and it's then your responsibility to set up some cache rotation to sort things out. That cache can also be on a remote side, typically combined with a remote execution service,
41:03
so that you just send the action there and, hopefully, it will cache it; there's a protocol for that as well. These are the main levels of caching. Yeah?
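The local level of this can be modelled in a few lines of Python: the cache key is a hash over the command plus the content hashes of all inputs, and an action is only re-executed on a cache miss. This is a simplified illustration of the idea, not Bazel's actual on-disk format:

```python
import hashlib

def action_cache_key(command, input_contents):
    """Content-based cache key: a hash of the command line plus the
    hash of every input file's contents. input_contents maps file
    names to their bytes."""
    h = hashlib.sha256()
    h.update(command.encode())
    for name in sorted(input_contents):  # stable order for determinism
        h.update(name.encode())
        h.update(hashlib.sha256(input_contents[name]).digest())
    return h.hexdigest()

# A tiny in-memory stand-in for the on-disk action cache.
cache = {}

def run_action(command, input_contents, execute):
    """Run `execute` only if no cached result exists for this action."""
    key = action_cache_key(command, input_contents)
    if key in cache:        # cache hit: skip re-execution entirely
        return cache[key]
    result = execute()      # cache miss: actually run the tool
    cache[key] = result
    return result
```

Because the key is derived from content, not timestamps, touching a file without changing its bytes still produces a cache hit, and reverting a file to an earlier state can reuse an earlier entry, exactly the behavior described above.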
41:35
So the question was whether it's worth trying Bazel, or whether a Makefile is okay if you have a very small project.
41:43
So let's say you have a small project and you already have a Makefile that works and you're happy with it; then you don't have to change. I personally use Bazel also for new private projects and am quite happy with that. I mean, it is a big tool, but that doesn't mean you can't build small projects with it.
42:02
You have to accept that Bazel is written mainly in Java, so you have to have Java on each machine where you want to run Bazel. You have that as a runtime dependency, but for me personally, it's fine to have Java installed on my desktop and work with that. And I think it's definitely worth it for small projects,
42:21
at least where you don't have all your infrastructure in place already, because that declarative approach makes it much easier to write build files and make sure they're correct. And you can change the way you build things later at a single point, where you maintain all your rules. So I like that flexibility even for small projects.
42:43
But in the end, if you have something you're happy with, then use it. If not, and if you can accept Java as a runtime dependency of your build system, then I think it's definitely worth trying Bazel, also for small projects.
43:04
Okay, the question was about resource usage compared to make. Since you keep the full dependency graph in memory, you need a bit more memory. Also, you have a bit of overhead just to start up a JVM, et cetera. But with a modern desktop, that is not a problem;
43:22
it's just more. Since you don't want to recompute the dependency graph, you have that process in the background, which you don't have with make; with make, after the invocation, no process survives. It is more, but it's not terribly more.
43:43
It's about the effort of starting up a JVM and keeping it persistent, and then it depends on the size of your project: of course, if you have a huge dependency graph in memory, then that is resource usage. Whereas if you do the same with make, you pay the price of rereading all the makefiles again
44:01
at the next invocation. So you're basically trading memory for time. Okay, any further questions?
44:24
Okay, I guess that's it. Thank you for your attention.