
Where does TeX end, Lua start, and vice versa

00:00

Formal Metadata

Title
Where does TeX end, Lua start, and vice versa
Title of Series
Part Number
29
Number of Parts
33
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place
Cork, Ireland

Transcript: English (auto-generated)
So we have heard something about LuaTeX, the program, and some library stuff. What I want to do is explore a bit what the consequences are for macro packages. So I'm actually creating Frank's next nightmares.

So where does TeX end, and where does Lua start, and vice versa, the other way around. First a few remarks. Well, you referred to the Alderberg meeting, and I think
if we actually look back we can safely say that nothing happened; no real extensions were made. So when we started the LuaTeX project, one of the things we decided was: we will not provide solutions, we will just open up the machinery, because everybody wants his own solutions and it saves a lot of discussion. And I think there is some general agreement about what people would like to do, but I'm pretty sure that, let's say, a LaTeX spacing model is completely different from a ConTeXt spacing model, and if you try to hard-code all that kind of stuff into an engine you keep coding and never reach a solution. As a consequence, since we don't provide solutions, the macro writers, the people writing macro packages, should provide them. They are the ones who fill in that space with Lua. For instance, take what Hartmut showed, the original image library: each macro package can see how to fit it into their own stuff. In the next couple of slides I will show what the consequences have been so far for ConTeXt, and I will do that step by step, going through the
development of LuaTeX. The first ideas were just to add, well, as I told before, I'm using an editor that has Lua as an extension language, and by playing with that I thought it would be nice to have the same thing in TeX. Then Hartmut and I had a little bit of discussion, and suddenly it was there; we played a bit, and that was basically the starting point: just have some scripting engine in TeX. This is really the simple way of using Lua: users can use Lua as a programming language and just leave everything else untouched.
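In this simplest mode, Lua just computes and produces material for TeX to typeset. A rough sketch of the idea, runnable outside of TeX (in actual LuaTeX the generated string would be handed back through something like tex.print rather than collected in a variable):

```lua
-- Lua as a plain scripting engine: generate TeX input programmatically.
-- In LuaTeX this string would be fed back to TeX for typesetting.
local lines = {}
for i = 1, 3 do
  lines[#lines + 1] = string.format("$%d^2 = %d$", i, i * i)
end
local result = table.concat(lines, " ")
print(result)  -- $1^2 = 1$ $2^2 = 4$ $3^2 = 9$
```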
An example of usage: in our workflows we sometimes have math that needs to be entered. Very simple, what we call calculator math, the basic Texas Instruments calculator math. You can take that kind of input stream and just convert it into something TeX and get it typeset, but basically it has nothing to do with TeX; you're just converting. This is the simple kind of Lua usage that we were first thinking of.

For the next stage, it would be handy if you could use Lua to do some of the calculations that in TeX get bloated. So access to registers was added, first only dimensions and counters, and not much more. It's simply easier to have a loop over a thousand things and calculate the average than to assign things to \dimen and scratch registers and the like. So that was already a bit more TeX-ish, so to say.

I think the first really big thing that was implemented, or rather re-implemented, in Lua was the whole IO system.
There was actually a reason for that; it's the starting point of everything, since TeX is of course IO. But another reason is that we had been hearing for maybe ten years about extending kpathsea with things like reading from zip files, reading from web pages and whatever, and it never came. This opened up the possibility to say: well, just kick out the existing stuff and replace it with your own. So the first thing I did, and it was my first real big Lua experience, was to rewrite the kpathsea functionality in Lua. It's a scripting language, but the result is actually faster than the original C thing, and also more flexible: we can read from zip files and from FTP sites and whatever. I must admit that I seldom do it, but it can be done; in theory you can have a complete TeX distribution running from a zip file.

A logical next step was to look at, well, since LuaTeX is UTF-8, what are we going to do with all the input encodings; these are called regimes, input regimes, in ConTeXt. If you read in your data using Lua, then it makes sense to re-implement this part, and all the code related to this kind of stuff was just kicked out of ConTeXt and replaced by a couple of mapping tables hooked into the Lua code. So this was really the first time that you saw the traditional TeX part becoming smaller and the Lua code becoming larger.

The next big effort was something that had been bothering me for quite a while: the traditional MetaPost conversion involves some messy calculations when it comes down to pen-transformed paths.
That's not my one is it?
So there are quite some messy calculations going on, and over time it had evolved into something that was actually usable and reasonably fast, but it could be done better in Lua. The nice thing is that at that point the lpeg library came around, and really, even if you don't want to use LuaTeX, it is worth looking into lpeg, because this is really something; maybe not revolutionary, but in a sense it is. So it was a good exercise to do that kind of stuff with lpeg as well.
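lpeg deserves study on its own; to give a flavor of the kind of pattern matching involved, here is a plain string-library sketch that picks coordinate pairs out of a simplified MetaPost-like path. lpeg would express this far more robustly as a grammar, and the path syntax here is a made-up simplification:

```lua
-- extract (x,y) coordinate pairs from a simplified MetaPost-style path;
-- a real converter would also handle control points, pens, and so on
local function path_pairs(path)
  local result = {}
  for x, y in path:gmatch("%(([-%d%.]+),([-%d%.]+)%)") do
    result[#result + 1] = { tonumber(x), tonumber(y) }
  end
  return result
end

local p = path_pairs("(0,0)..(1.5,2)..(3,-1)")
print(#p)  -- 3
```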
It also was a little bit faster, not that much, but it was noticeable. This is one of the interesting things: if you think that by replacing TeX code with Lua code LuaTeX will give you a faster system, that is not true. At the same time you are taking a modern engine, and the same is true for XeTeX: the fonts are bigger to start with, so everything related to loading and handling fonts grows, and you end up with a somewhat slower system anyway.

At that time we got the first access to the node list. I will not go into details about what node lists are; nodes and tokens, this is typically the kind of thing that you need to avoid at a conference. But it became possible to manipulate these things. For the moment, think of a node list as a linked list of thingies, and some of these thingies are characters, or basically glyphs. And if you can really touch them, if you can say, okay, I see, let's say, a lowercase letter a and I want it to be uppercase, then you can change it at that level. Now imagine that this kind of stuff happens at the TeX level: at some point it doesn't matter what you implement, it always breaks, because users are kind of unpredictable. But if you can postpone that kind of stuff and do it later on, then you can implement rather robust solutions. So this was the first step where we started adding stuff that was really operating on the internals. And okay, this is not something that you have to do yourself; this is the kind of thing that a macro package writer will do for you.

The real impact came when the OpenType font readers were added.
Be warned: LuaTeX does not implement OpenType support, it provides a loader for OpenType fonts, and you have to implement OpenType support yourself, because today we have OpenType and tomorrow we might have something else. So this is what I started with: we provide the means, not the solutions. It means that if you look in the ConTeXt distribution, the MkIV part of it, you will find a file of, I think, some 18 kilobytes of OpenType feature processing code.
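To make concrete what "implementing OpenType support yourself" means: the engine hands you the font's tables, and things like ligature substitution are then applied by Lua code in the macro package. A toy sketch, with invented ligature data and placeholder glyph names, nothing like the real ConTeXt code:

```lua
-- toy "liga" feature: replace character pairs by ligature glyphs;
-- the table below is invented, real data comes from the font's tables
local ligatures = { ["fi"] = "<fi>", ["fl"] = "<fl>" }

local function apply_ligatures(s)
  local out, i = {}, 1
  while i <= #s do
    local pair = s:sub(i, i + 1)
    if ligatures[pair] then
      out[#out + 1] = ligatures[pair]
      i = i + 2
    else
      out[#out + 1] = s:sub(i, i)
      i = i + 1
    end
  end
  return table.concat(out)
end

print(apply_ligatures("final flight"))  -- <fi>nal <fl>ight
```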
The nice thing is that because it's an open system you can still tweak and deal with all kinds of things. For me it was a good way to explore OpenType fonts and also to come to the conclusion that a lot of stuff in there is pretty undefined, undocumented and messy.
It's also fun, because I like to do things with fonts; this is fun programming. Okay, after OpenType fonts you start thinking: yeah, but we don't have that many OpenType fonts, we have a lot of Type 1 fonts; at least on my machine there are lots of commercial Type 1 fonts. How to deal with them? That was the moment when I realized that I could kick out TFM files: I just started reading the AFM file, mapping the AFM file onto the Unicode range, and treating it basically as an OpenType font. Taco already mentioned the more efficient font inclusion mechanism that takes care of this, and in principle you can use Type 1 fonts in their wide incarnation; they are not limited to 256 characters. As a result I could kick out font encodings, which are a rather substantial part of, I think, any macro package: all kinds of definitions, mappings, fallbacks; the whole crap is gone. The only thing that we still use TFM files for is math, and that's only because I'm still waiting for that guy on Mars to show up, and after that we will kick out that kind of stuff too. I'm already preparing for that.

One of the things you may wonder is: okay, so a lot of things are done in Lua, what is the
impact on the runtime? I keep quite close track of how much time is spent in Lua, and you really must imagine that we are talking about lots of things. If you are looping over the node list, let's say, consider for a moment a whole page, you are normally talking about some 4000 characters plus a bunch of glue and other things. And if you look at what I'm already doing, you're talking about hundreds of loops over these lists, which may mean hundreds of loops over the individual nodes of those lists. And if you have complex fonts like Zapfino, you're talking about maybe thousands of loops over each character, looking forward and backward, because there is contextual lookup and there may be hundreds of lookups per glyph that have a potential match. So we are quickly talking about millions, many millions, of operations on nodes in the scripting language, and it's really incredible how fast Lua is. Okay, we have optimized a lot of the interface into that kind of stuff over time, but it's surprisingly fast, and for me that's quite promising, because it means there is a lot of room where you can waste time.
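The kind of node-list loop I'm talking about can be sketched as follows, here with the node list mocked up as plain Lua tables, since outside LuaTeX there is no real node memory. In LuaTeX you would walk actual glyph nodes and compare numeric ids instead of strings:

```lua
-- walk a (mocked) node list and uppercase the glyph "characters";
-- real LuaTeX nodes carry numeric ids and char codes, not strings
local function uppercase_glyphs(head)
  local n = head
  while n do
    if n.id == "glyph" then
      n.char = string.upper(n.char)
    end
    n = n.next  -- node lists are linked lists, as described above
  end
end

-- a tiny three-node list: glyph, glue, glyph
local g2 = { id = "glyph", char = "b" }
local glue = { id = "glue", next = g2 }
local g1 = { id = "glyph", char = "a", next = glue }
uppercase_glyphs(g1)
print(g1.char, g2.char)  -- A	B
```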
Another nice thing: imagine that you have a font and it doesn't have all the composed characters. One thing you can do is just build them. You need the right data at hand, and one of the things in the ConTeXt MkIV distribution is, I think by now, a two and a half megabyte file with specific data about glyphs and characters and things like that, mostly taken from Unicode plus a bit more. So you can say: okay, I have a u and I have an umlaut, so I can combine the two into a ü, or whatever. This is really something new; here we really start doing things that could not be done before. Well, you could do it, but no user was doing that himself, I think. This is really new stuff.

A really nice feature, and the best example of what we
already have, is fonts. If you have a font, TeX more or less remembers for each character what the current font is. This means that if you put something in a box and you unbox it later, it's basically frozen: it knows what fonts were used for the characters in that box. Color is different: there was much talk about these interfering whatsits. Color is normally implemented using whatsits or marks or that kind of stuff, and these are actually interfering things: there are directives in the stream saying, okay, here we start a color, there we finish a color. Now this is different from fonts; for instance, if you copy a box, the surrounding color definitions are applied to that box, while the fonts inside stay frozen. So those are slightly different mechanisms; fonts are a rather unique mechanism in that sense.

So at some point we thought: well, it would be handy to have something more generic than fonts, and then we came up with attributes. Each node can have attributes; there can be basically an enormous number of attributes and an enormous number of attribute values, and they are carried along with these nodes, with these glyphs and kerns and whatever, and they behave like fonts. Okay, this was very promising, so I started with redoing the color mechanism, because it's a nice candidate for that, and it works quite well, because you get rid of the interference problems. On the other hand, you must realize that this introduces an incompatibility. As I told you, if you have something like a special or whatever saying "go red" and you copy a box, then the box content really comes out red. But with attributes, just as with fonts, it's basically frozen; it won't become red, because the attribute saying that the stuff should be red is not applied to it. In practice this is not a problem, but it is one of the places where a macro package writer might have to redo some things that used to be done for efficiency. Like: I have a box, I copy it ten times, or I reuse it later on; well, maybe it's better now to regenerate that content. So this is really, I think, the only functional extension to TeX: attributes. And it's really handy; I use it all over the place.
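The way attributes replace interfering color whatsits can be sketched like this: nodes carry a color attribute along, and only in a final pass are actual color directives injected at the points where the attribute value changes. Again the nodes are mocked as plain tables; the real code works on LuaTeX nodes and backend literals:

```lua
-- resolve a carried-along color attribute into injected directives;
-- node shape and attribute names here are mocked, not the LuaTeX API
local function inject_colors(list)
  local out, current = {}, nil
  for _, n in ipairs(list) do
    if n.color ~= current then
      out[#out + 1] = "[color=" .. (n.color or "black") .. "]"
      current = n.color
    end
    out[#out + 1] = n.char
  end
  return table.concat(out)
end

local typeset = inject_colors {
  { char = "a", color = "red" },
  { char = "b", color = "red" },
  { char = "c" },               -- no attribute: falls back to black
}
print(typeset)  -- [color=red]ab[color=black]c
```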
For instance, one of the things that has been bothering me, and it seems not only me, for a long while is of course vertical spacing. The only nice thing about XSL-FO is that it has a rather advanced vertical spacing model, combined with penalties and weighted spacing and things like that. So I had a model for this lying around for a long time already, and I've now, on and off, started playing with it. But this is again one of the areas where, just as hyphenation, kerning and ligature building were split off in the par builder, we are now in the process of splitting off all the other things in TeX into clearly separated stages. Because if you want to implement this kind of stuff, well, something is added to the vertical list, and what happens exactly? Maybe TeX has some secret pointer pointing at something, and you change something, and you almost certainly end up in a loop. That's the kind of nasty stuff we have now, and we are now at the stage where we are going to deal with it. This is again something that may have quite an impact, because it might give better output, but output incompatible with the old stuff.
Just to give some examples of where ConTeXt underwent some fundamental changes, or not so much changes as extensions: take for instance handling XML. In traditional TeX you can only handle XML by making characters active and treating it as a stream. One of the first things that I implemented in Lua, just for fun, as an exercise, was an XML parser, and I found that it was so fast that I said: I have to do something with this. So now in MkIV we have tree-based XML access, and we also added things like path searching in the tree, so I can really load a document, access each element in such a tree, do things with it, shuffle it, manipulate it, and say, for instance, that the table of contents now becomes something like: give me all the titles under the section elements; things like that. The interesting thing about this, and I can't show it because I have no running machine here, is that it has some mind-boggling aspects. Because what happens there? Imagine you have the XML tree, and at some point you say: okay, now I trigger flushing this whole thing. What we do is associate actions with elements, and an action can be that something is printed to TeX, which itself can trigger an action like "filter this and that from the tree". Because the Lua command is completely expandable, what you basically get is one really big expansion going on there: going back to TeX, calling Lua, piping back to TeX, calling Lua again. And then sometimes you have this situation where you wonder what's actually going on there; you can get really interesting side effects. This is, for me personally, one of the reasons why I can spend time on Lua, because this is the kind of stuff that we can actually use in projects. So I gain back some time there that I can then waste on Lua.

One of the things that has been re-implemented using this trickery is MathML. We have MathML 3 support, or at least we are working on it, and it is a way cleaner solution than what we had. The image library was another important
thing, where we really kicked out lots of code from the ConTeXt kernel and replaced it by image library code. Not so much the scaling in this case, because we have rather advanced scaling models of our own, but things like searching files, checking if file names are correct, cleaning up all kinds of mess; that's way easier to do in Lua, plus managing resources.

On my agenda currently is a complete re-implementation of everything that deals with sectioning, numbering and lists. We already carry around what we call multi-pass data, the stuff that goes into auxiliary files; basically there is one auxiliary file in ConTeXt. Everything is kept in Lua tables, and then we can carry around a bit more information, and it is faster, and things like that. And this is again one of the things where you really replace lots of code. Traditional code which has been implemented in, well, Frank mentioned it long ago, you have to be really resourceful with that, implemented in ways that sometimes make you think: yeah, I know it works, and it does something, but what is it doing? Well, we build upon, build upon, build upon stuff. That also means that it is quite a job to rewrite this kind of stuff in such a way that it is still compatible. But at the same time it becomes way more powerful.

I'm also replacing, and that's actually starting now, so we are a few years down the road and only now I start doing the nice things, the typography-related things that have been bothering me for long. If we have time I can give a short example of that later on.

So let's summarise what Lua means for the macro package writer, because that's what I'm basically talking about. Well, you can just use Lua as a language that can do some calculations for you: hey, you want the number π in your document, and you
can use Lua to print π in your document, or the value of a square root or whatever. That is actually, I think, how most users will use the Lua stuff in LuaTeX. You can also get a bit of information about what TeX is doing, registers and things like that, and act upon that. I can imagine that users will use some of that stuff, using Lua counters next to TeX counters for doing certain things; I leave that open.

You can pass data to Lua. The XML stuff is an example of this: you just pipe stuff to TeX that you have manipulated before, and manipulating is easier in Lua. Another good example is comma-separated lists. It was one of the first things ConTeXt users started asking for: can we now finally have good, trustworthy comma-separated lists? Because you always had this problem with nested quoting and things like that. And it's easy to parse in Lua: pipe it to TeX and do something with it.

You can replace components; that's what I was telling you about: the whole file searching, opening, reading, everything related to files, even logging. So we re-implemented the logging stuff. It means that, if you set a switch, you won't get a TeX error message; you will get an HTML page popped up showing you the message and the complete environment of where things happened, things like that. Or your whole log: there's another switch that says I want the whole log as a well-formed XML file, so that you can really parse it and do things with it. So that's also done.
More fundamental changes, as I mentioned, were the font changes; there it really gets visible that Lua is being used. Well, I can skip this one: the whole bidirectional thing and the complex OpenType stuff. One of the driving forces behind the LuaTeX project is the Oriental TeX project. Actually, the most demanding users are those who want to do Arabic typesetting, because there, I think, more or less all the elements come together, including the directional stuff. So quite some time goes into that.

The next step is attributes, as I mentioned: we re-implemented things like color, but you can use attributes for basically anything. Replacing real components of the TeX system itself is just another step further, and that's the stage we are at now. Nobody else is harassing me about that issue; though, I think, if Don says something, you listen. And then there is the stage that is, I think, three or four years down the road, when we really start adding stuff to the core engine, like dealing with multi-column stuff, et cetera. It also takes some time to get into the thinking of what we can do; it could be that a different mind comes up with a completely different thing that I hadn't thought of today.

So this is more or less a summary of the current state of what I do. Just to give
you an idea, and I don't have my machine here, but this is my directory listing showing how many Lua files I have. Here is the amount of code; okay, cheating a bit, because there is the big character definition file in there. You can imagine that loading all that code at run time is not really an option. This is why quite early in the project a mechanism was added that can load and store Lua byte code, which is then kept in the format. But even then, loading data: you might think 2.7 megabytes is a big file; well, the OpenType table of, say, a font like Zapfino is easily, in just plain Lua table format, 10 to 16 megabytes. In byte code it is only a few, and this is the kind of stuff that we cache; we keep a cache of things, and then loading becomes really fast. Anyhow, this is the current amount of Lua code. The MkIV-specific code, that's not that much, 272, and that replaces 498. So what you basically see is that you get less TeX code and more Lua code.
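The byte-code trick mentioned a moment ago boils down to standard Lua functionality: compile a chunk once, dump it, and later load the dump instead of re-parsing megabytes of table source. A minimal sketch:

```lua
-- precompile a data chunk and reload it from byte code; for big
-- character and font tables this skips parsing and loads much faster
local source   = "return { chars = 42 }"
local compiled = string.dump(load(source))  -- binary byte code
local data     = load(compiled)()           -- load() accepts byte code too
print(data.chars)  -- 42
```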
Eventually, because you want to do more, you get more of both, I think. I want to show two files. Now, if you know ConTeXt, and I hope not that many people here know the really internal details, one of the features that has been around for a long time belongs to one of the really core commands, the command that frames something, because you can hook all kinds of code into it. One of the things it can do is automatically calculate the width: you can say, okay, there's some content, make this area that we are dealing with as small as possible. One way to implement that is: you have something in a box and you start decomposing it. And then, of course, you have the problem of all the interfering stuff that's in there, so that in the worst case it stops halfway and you cannot go on. In practice it's not that bad, but, well, it's not perfect. So one of the things I was thinking: wait a minute, I can do this in Lua. In Lua, at a certain moment I have a box, and I can look into that box and see what's there.

So, wait a minute, I've opened the same file twice. There was a command called \doreshapeframedbox, which was actually calling (I don't show the other code) some TeX macros to do it. And now the same macro is a call to Lua. So what you actually do, in ConTeXt at least, if you implement something in Lua, is put the stuff in a Lua file and just call a function. This is an example: we pass the box number as an argument, and I want to know the minimum width, actually the smallest possible width, of this box. These are just some speed-ups. Here I loop over all the horizontal lists in this box and check the width, and I do that by actually repacking each box to its natural size; that's what node.hpack does. In the end I know the minimum width I need to take, and in the next loop I'm going to set the width of all the already typeset lists in this thing.
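The logic just described can be sketched like this, with the horizontal lists mocked as tables carrying a natural width; the real function would call node.hpack on actual hlist nodes:

```lua
-- find the widest natural line and then give every line that width;
-- a stand-in for the real code, which repacks hlist nodes via node.hpack
local function reshape(lines)
  local max = 0
  for _, l in ipairs(lines) do          -- first pass: find the needed width
    if l.natural > max then max = l.natural end
  end
  for _, l in ipairs(lines) do          -- second pass: set all the widths
    l.width = max
  end
  return max, #lines                    -- width plus line count as a bonus
end

local boxed = { { natural = 10 }, { natural = 25 }, { natural = 15 } }
local width, n = reshape(boxed)
print(width, n)  -- 25	3
```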
And as a bonus, I can report back the number of lines and the last line's length. The amount of code that does this is really small, as you can see, and it's way more readable and definitely more robust than the TeX code, where you start looking back and back, basically in reverse order, and hope that nothing bad happens.

This is what I mean by the more typographically oriented extensions. Well, unfortunately I haven't had that much time to spend on this kind of stuff yet, but eventually we will get there. Imagine that you are going to do column balancing. You can do it by just looping over a list, collecting data, trying to find the best point, taking this big thing and more or less splitting it by trial and error at places that work out well. But you can also do some analysis around it, maybe even move stuff around: if you use attributes to tag something as being a graphic, you can basically say, okay, I can move this graphic around, because I know it's a graphic.

So, well, I hope this gives a bit of an idea of where we are heading, and of what will eventually be, in this case with ConTeXt MkIV, a mixture of TeX and Lua things. And I think I can safely say, although you sometimes hear people say, oh, this will all become different, that in some respects nothing changes. Users can still mess around: you can look at the code, change something, and pray that it works. That will not change. But what will change is that the more complex kinds of macro package writing become more convenient, I hope at least.