
Do we need a Cork math font encoding?


Formal Metadata

Title
Do we need a Cork math font encoding?
Part Number
11
Number of Parts
33
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Place
Cork, Ireland

Content Metadata

Abstract
The city of Cork has become well-known in the TeX community, ever since it gave name to an encoding developed at the European TeX conference of 1990. The Cork encoding, as it became known, was the first example of an 8-bit text font encoding that appeared after the release of TeX 3.0, which was later followed by a number of other encodings based on similar design principles. As of today, the Cork encoding represents one of several possible choices of 8-bit subsets from a much larger repertoire of glyphs provided in font projects such as Latin Modern or TeX Gyre. Moreover, recent developments of new TeX engines are making it possible to take advantage of OpenType font technology directly, largely eliminating the need for 8-bit font encodings altogether. During all the time since 1990, math fonts have always been lagging behind the developments in text fonts. While the need for new math font encodings was recognized early on and while several encoding proposals have been discussed, none of them ever reached production quality and became widely used.
Transcript: English (auto-generated)
When I proposed this talk, I had the idea of taking the return to Cork as an opportunity to review the situation of math fonts. As most of you will know, at the previous conference in Cork, the Cork encoding was developed, which had an enormous impact on the development of text fonts.
On the other hand, there hasn't been quite such a success in the development of math fonts since then. There have been some projects working on it that haven't really succeeded,
and there have been some other developments going on. So what I did was write a sort of review paper: review a bit of the history, what has been done and what has happened; review some of the recent developments; and also look at what they could mean for the future.
So I will be talking about font encodings, particularly about Unicode and Unicode math, and also a bit about font technology and the impact of OpenType and OpenType math.
The basic theme could be summarized as: new math fonts for new TeX engines. So let's start with history. What was the situation back in 1990 at the conference in Cork?
At that time, TeX was undergoing a transition phase. As Frank told us yesterday, they convinced Don Knuth to extend TeX, and TeX version 3 was released at the beginning of 1990.
It brought with it the introduction of support for multiple languages and 8-bit font encodings. At the same time, some other developments were going on: many European user groups were founded around then.
And the European users were really interested in making use of the new features. They wanted better hyphenation for their languages, and they wanted better font support for their languages. So they actually started to work on fonts and font encodings. And somehow it happened that at the conference in Cork, they came together and worked on a font encoding.
It was named after the conference site, and in this way, Cork became famous in the TeX community. So it had some success.
First of all, it was the first example of a new 8-bit font encoding. It became the official model adopted by LaTeX for the organization of similar 8-bit text font encodings.
It provided support for many European languages. It was the beginning of many developments in font encodings over the last two decades. It certainly did a few things right: it included the complete 7-bit subset of ASCII as a common denominator.
It used a consistent encoding for all the font shapes, so no weirdness like a dollar sign becoming a pound sign when you switch to italic. It had a consistent way of organizing the uppercase and lowercase codes, which is important for hyphenation.
It had no interdependencies between text and math fonts, but that was only because there really weren't any matching math fonts at that time. On the other hand, there were also some shortcomings.
The Cork encoding did not really follow any other standards; it did everything its own way. It thereby created a discrepancy between the input and output encodings, which we had to deal with in 8-bit TeX systems for many, many years.
That was only solved a few years later, when LaTeX2e introduced the inputenc package. Until then it was quite a messy situation. There was also the problem that Cork tried to support many or most European languages, but it couldn't support all of them in one 8-bit font table.
Although the original idea might have been to create one new standard font encoding, we ended up with many local encodings, such as the Polish QX, being used besides the Cork encoding.
This wasn't really solved until a few years ago, when the Latin Modern fonts were created, which at least made it possible to support all the various encodings with the same set of fonts.
Also, there was no room for text symbols in the font encoding, so there was a need for a supplementary encoding. And the Cork encoding didn't consider the glyphs that are commonly available in PostScript fonts, so when you take
a PostScript font and want to make it available, you still have to fake some glyphs with a virtual font. You then need two virtual fonts on top of one real font to get at all the characters that are in the font, so alternatives like the LY1 encoding were proposed.
So in the end, we've ended up with a big mess of font encodings. The Cork encoding was in part adopted by LaTeX as the T1 encoding, supplemented by the TS1 text symbol encoding, and many other local encodings were used besides it.
We learned over the decades that no single 8-bit font encoding can serve all needs. We either need several different 8-bit font encodings, or we move beyond the
8-bit limit towards Unicode and OpenType technology, and then it all gets better. So, with the Latin Modern fonts and the TeX Gyre fonts developed in the past few years, we have a consistent implementation:
a full glyph set provided using the Unicode encoding and OpenType font technology, and many subsets created from the big fonts using 8-bit encodings and Type 1 technology.
So, what's the situation today? I would say TeX is going through another transition phase. In recent years, we've seen PDF largely replace DVI and PostScript output, and scalable fonts replace the bitmap fonts, and now we are starting to adopt Unicode and OpenType to replace the 8-bit fonts.
We have the new font projects of the past few years, and we have the development of new TeX engines. For text fonts, we've already made quite a lot of progress, but unfortunately for math, we're
still missing everything. We're still basically using 7-bit encodings that were created 30 years ago.
So, what about math fonts? When the 7-bit fonts were created, the text fonts and the math fonts were created at the same time, because there was a real need for them: Don Knuth needed them to typeset The Art of Computer Programming. He couldn't do without math fonts.
When the 8-bit fonts were developed, it was driven by the needs of European languages. The users needed them. It was recognized that there could also be a need for 8-bit math fonts, but that was not driven by user needs: users could still go on using the 7-bit fonts and typeset math perfectly well.
So, the reasons for developing new math font encodings were implementers' reasons: you want to remove interdependencies, improve the organization, maybe support more symbols in the same number of fonts. But these are not reasons that the user sees.
It's more for us and not for the end users. So, there have been several projects going on since the time of the first Cork
conference. As Frank told us yesterday, the LaTeX project sponsored some research work: a student working on math fonts for several months during 1993 developed a proposal for how
math fonts could be organized, but unfortunately, after a few months of activity, nothing more happened. It then took several years until the proposal was taken up again.
At that time, when somebody restarted it, I joined in, and we worked on it for several months. We had some prototype work: some METAFONT work to create new glyphs, some fontinst work to re-encode and put together various pieces.
We presented the results at EuroTeX '98 in St. Malo and, well, as John said, he criticized it at that time: why are you doing 8-bit font encodings at all?
We probably weren't ready for Unicode yet. Then it happened that after the conference, all the activities were sidetracked by other developments. These were the efforts to bring math into Unicode; Barbara has given some talks about this at previous conferences.
Basically, there was, first of all, research work to collect what constitutes a math glyph, or which symbols should be encoded at all. Then came the committee work of getting the symbols accepted.
I think it was starting from Unicode 3.2 that math was included, and it has been in there since then. There was a related project to develop a set of reference fonts based on the Unicode math encoding: the STIX fonts.
This project was driven by a group of scientific publishers, the STIX group. They finally released the first beta at the end of last year.
A second beta is expected sometime this year. So they now have all the building blocks ready, all the glyphs that are in Unicode math. There is still the question of how the fonts should be organized to make them usable, and how to provide TeX support for them.
Then it happened that during all the time the TeX community was waiting for the STIX fonts to be finished, outside developments moved on. Microsoft put math support into Office 2007, and they did so by extending OpenType font technology.
Microsoft is one of the companies controlling the OpenType specification.
So, they basically created a new table which holds all the information about math, and they commissioned the development of a font, Cambria Math. I think we will hear about it in the following talk. It is quite impressive what they have done.
It turns out that many of the concepts they used were based on ideas from TeX; they followed the model of TeX in many respects. Although the standard is still considered experimental and not officially published,
it is already a de facto standard, widely deployed in installations of Microsoft Office. There have been independent font developments like the Asana Math font.
It is supported by the FontForge font editor. It is also supported by one of the new TeX engines, namely XeTeX. So, what is the current situation? XeTeX has supported OpenType math since, I think, the end of last year.
It will become widely available pretty soon with TeX Live 2008. XeTeX was already available in TeX Live 2007, but without the math support at that time.
LuaTeX is still under development, but it is likely to grow some support for OpenType math as well. So we will probably get support for OpenType math in the new TeX engines pretty soon. The question now is: what about support in fonts?
There seems to be agreement that OpenType math is the way to go for the new TeX engines and for the new fonts. So it turns out that font encodings become a non-issue: if we adopt Unicode math and OpenType, we do not need any new font encodings anymore.
It is just a question of dealing with the font technology and understanding what it all means to develop an OpenType math font. So, let's go into some of the details.
The OpenType font format: probably most of you know something about it. It was developed jointly by Adobe and Microsoft. The name might suggest that it is open, but it is actually a vendor-controlled specification, not much different from the previous font formats, TrueType and PostScript Type 1.
It is based on many of the concepts of PostScript and TrueType fonts: it uses the table structure from TrueType, and the encoding is based on Unicode.
The interesting thing is the addition of a new table, the MATH table. For software which does not know about it, it is just another optional table; it becomes meaningful only with software which knows how to interpret it. So, what do we need?
There are some global parameters, which are comparable to the font dimension registers in the TeX math fonts: things like the spacing of big operators, fractions, subscripts and superscripts.
Some of the OpenType parameters have a direct correspondence to the TeX parameters. Some of them are a generalization: where TeX has a hard-wired rule, such as enforcing a minimum gap of three times the rule thickness, OpenType has an explicit parameter that specifies how much is used in that case. Unfortunately, there are also a few cases where a TeX parameter has no direct correspondence in OpenType.
So the new TeX engines, which do an internal mapping, still require some workarounds for those.
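To make the correspondence concrete, here is a minimal plain TeX sketch of where these global parameters traditionally live; the register numbers are the standard ones for families 2 and 3, and the OpenType MATH constant names in the comments are rough counterparts, not exact equivalents:

    % Plain TeX: global math parameters are \fontdimen registers of the
    % fonts loaded in family 2 (symbols) and family 3 (extension).
    \showthe\fontdimen22\textfont2  % axis_height (cf. AxisHeight)
    \showthe\fontdimen16\textfont2  % sub1, subscript shift-down (cf. SubscriptShiftDown)
    \showthe\fontdimen8\textfont3   % default_rule_thickness (cf. FractionRuleThickness)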
In addition to the global parameters, there is also glyph-specific information. In the TeX math fonts, in the TFM files, this was previously encoded in a rather complicated way, by overloading the meaning of some fields: the width is the position where the subscript goes,
and the width plus the italic correction is the actual width, which is where the superscript goes, and things like that. This has been cleaned up: OpenType provides a structure where all these things can be described explicitly. There is also some generalization of the concept. In TeX, you have the superscript and the subscript only on the right-hand side,
with a sort of horizontal displacement. In OpenType, you can also define a cut-in position on the left side, saying that for certain characters the attachment is moved in this much or moved out that much.
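As a rough illustration of that TFM overloading at the macro level, this plain TeX fragment measures the italic correction that separates the two attachment points (shown here on a text italic f, since the mechanism is the same):

    % The width alone is where a subscript would attach; width plus
    % italic correction is where a superscript would attach.
    \setbox0=\hbox{\it f}    % natural width
    \setbox1=\hbox{\it f\/}  % width + italic correction
    \dimen0=\wd1 \advance\dimen0 by -\wd0
    \message{italic correction of italic f: \the\dimen0}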
So, that is one part of it. Another part is the horizontal and vertical constructions. This is like the extensible glyphs in a math extension font. In TeX, you have one code point, and it points to a series of next-larger glyphs,
and at the end, you optionally have a recipe for how to construct an extensible symbol. The TFM metrics, in theory, support these fields quite generally,
but they're only used in very restricted contexts: TeX expects two sizes of big operators, and multiple sizes of big delimiters with an optional extensible version,
while for wide accents, it has several horizontal sizes but no extensible version. In OpenType, this is generalized a bit: there are vertical variants and constructions, and horizontal variants and constructions, and these are applicable also to things like an overbrace.
You can define an extendable overbrace or underbrace as a glyph which has several sizes and an extensible construction, or you can construct long arrows this way.
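For comparison, this is roughly what such a chain looks like in the classic TFM world, written in property-list (PL) notation; all the slot numbers here are invented for illustration:

    (CHARACTER O 10
       (COMMENT a big-delimiter slot pointing to the next-larger size)
       (NEXTLARGER O 20)
       )
    (CHARACTER O 20
       (COMMENT the largest size carries the recipe for arbitrary sizes)
       (VARCHAR
          (TOP O 30)
          (BOT O 32)
          (REP O 34)
          )
       )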
From the technical point of view, in the TeX fonts, all the variants used to have slots in the font table. In OpenType, each variant has a glyph name which is accessed indirectly, so it doesn't need a slot in the font table, although it may be mapped to the Private Use Area for technical reasons.
Another topic: the math alphabets. In traditional TeX engines, they are organized into 16 families of 8-bit fonts. You usually have one alphabet per family,
and the rest of the slots are filled with geometric symbols. And then you have the math code, which controls how a symbol behaves. For the geometric symbols, you have a fixed code in a fixed family,
and for the alphabetic letters, you have a fixed code relative to the font, but the family is changeable. So if you switch from a roman to an italic to a script alphabet, it is a font switch.
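At the macro level, this packing of class, family, and slot into one math code can be seen directly in plain TeX; both assignments below are the standard plain.tex definitions:

    % A math character code packs class, family and slot as "cfss (hex).
    \mathchardef\alpha="010B   % class 0 (ordinary), family 1, slot 0B
    \mathcode`\<="313C         % class 3 (relation), family 1, slot 3C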
In Unicode, the basic idea is that symbols with a different meaning get different slots. So many slots have been reserved in Plane 1 for the alphabets in various shapes.
The geometric symbols are in the base plane, the alphabetic letters have slots in Plane 1, and switching math alphabets now becomes switching code positions within one big font,
which requires a lot of work at the macro level. The unicode-math package by Will Robertson already does that.
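A minimal sketch of what this looks like from the user side, using today's unicode-math package under XeLaTeX or LuaLaTeX; the font named here is just one example of an OpenType math font:

    \documentclass{article}
    \usepackage{unicode-math}
    \setmathfont{STIX Two Math}
    \begin{document}
    % Alphabet switches are code-point switches inside one font:
    %   A          ->  U+1D434 MATHEMATICAL ITALIC CAPITAL A
    %   \symbf{A}  ->  U+1D400 MATHEMATICAL BOLD CAPITAL A
    $A \ne \symbf{A} \ne \symsf{A}$
    \end{document}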
Then there is the question of optical sizes. In TeX, a math family consists of three fonts loaded at three sizes. Fonts coming from a METAFONT design usually have three optical sizes, adjusted for readability at the smaller sizes, whereas a PostScript font usually has one design size,
which is then scaled down and becomes problematic to read at the smaller sizes. In OpenType math, this is all packaged within one big font, and it is handled as a font substitution triggered by a feature tag.
The 10-point, the 7-point, and the 5-point designs are all packaged in one font, and it is just an optical replacement. So if you want to build something like that, it becomes a bit complicated for the font designer.
But from the TeX point of view, you basically still load three fonts; it is just a question of whether you load three different font instances,
or you load the same font with three different feature sets. So, yeah. Okay, OpenType features, I can skip this one too. And the summary, okay.
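A sketch of the latter approach in XeTeX; the font name is just an example, and the exact ssty feature values and their support are assumptions about the font at hand:

    % Load one OpenType math font at three sizes, requesting the
    % script and scriptscript alternates through the ssty feature.
    \font\tenmath  ="Cambria Math:script=math"         at 10pt
    \font\sevenmath="Cambria Math:script=math;+ssty=1" at 7pt
    \font\fivemath ="Cambria Math:script=math;+ssty=2" at 5pt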
The question is: what will be the future of math fonts? Unicode math has already removed the need for new font encodings, and OpenType seems to be the best candidate for the font technology.
XeTeX already supports it, and LuaTeX is likely to follow. The question is what happens with the development of new fonts. I think there are some indications that people want to do math fonts for Latin Modern and TeX Gyre, but it is not yet specific when it will happen and how long it will take.
There are certainly many challenges involved for the font developers. First of all, there is the scope of the project: you have many new symbols to design and many different math alphabets to package into one font.
The font will be very big; it will extend across multiple 16-bit planes, so even just going to 16 bits is not enough, you need more than that. You have to package all the various designs into one font,
you have to package the size variants of the extensible glyphs in the same font, you have to package the optical variants, and then you have to figure out which tools you need: how to create the parameters in the MATH table and how they correspond to the font dimension parameters,
how to include all the glyph-specific metrics to replace the things that were previously hidden in the TFM fields, and how to develop the vertical and horizontal constructions and the font substitutions for the optical sizes.
If we want to do OpenType math fonts for Latin Modern and TeX Gyre, there is a lot of work ahead, and it will probably take a lot of work even to figure out how it all fits together, how to do it.
We can say that the previous conference in Cork in 1990 was the beginning of many developments in the area of text fonts, so maybe, with a bit of luck, we may see the beginning of new developments in the area of math fonts this time.
I don't know if it will work out; we shall see.
Okay.
Which one are we?
The guys at Microsoft reckon that, over the ten to fifteen years they have been doing this work, probably between 10 and 20 person-years went into the math font stuff.
That's not including designing the font itself as a font. So there is a lot of work to do. And they don't claim that the MATH table is generic.
If you come up with a different font design, you may need more parameters or different parameters; it looks like they already needed different ones than originally planned, so it remains an ongoing and open-ended task.
You missed one big point in this summary table, which is interfaces: it's useful to have one big font, it's useful to have a Unicode map, but you might still want to have higher-level interfaces.
You mean at the macro level? Yeah, sure, we'll probably need them, but that's another topic.