
Parallel Typesetting


Formal Metadata

Title
Parallel Typesetting
Series Title
Part
20
Number of Parts
33
Author
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and copy, distribute and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Publication Year
Language
Production Location
Cork, Ireland

Content Metadata

Subject Area
Genre
Abstract
We present the general mechanism by which logical content, arranged in multiple interacting containers, can be typeset onto a set of visual substrates. The overall algorithm is iterative, with successive iterations refining a multi-dimensional context that parameterises the behaviour of the algorithm. Each iteration consists of three parts. First, each visual substrate is informed which parts of which logical containers are to be placed on it. Second, the content placed in the substrates is typeset in parallel. Third, the resulting layout in each substrate is assessed for goodness, which in turn refines the overall context. In the talk, we will present the theory and the practice behind this algorithm.
Transcript: English (automatically generated)
Yeah, so now we have the different internal representations, so we've got the galleys and we've got the iterators between them as well in the AVMs. So now we need to actually take that and create the document somehow, and we want to do it fast because we're all busy people, and we also all just bought expensive dual-core processors and we want to make use of that as well.
So I'm going to describe an algorithm for taking that representation and, in parallel, converting it into the actual representation of the document. There are two levels of parallelism: a more obvious, easy one, and a more tricky one.
The obvious one is that you can usually split a document up into fairly discrete chunks that don't interact at all. A book can be split up into chapters, and they can be typeset at the same time without having to worry about how they interact with each other: one chapter can be typeset one way, and the other chapters don't need to know anything about it. So that's the easiest way to start threading: just split the document up into different parts and typeset them all at the same time.
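As a rough illustration of this chapter-level parallelism (a sketch only, not the system described in the talk; the Chapter type, its layout() method, and the two-worker default are assumptions), the idea maps naturally onto a small worker pool:

```python
# Sketch of the "easy" level of parallelism: chapters are assumed to be
# independent, so each one can be typeset by its own worker.
from concurrent.futures import ThreadPoolExecutor

def typeset_chapter(chapter):
    # Placeholder: run an ordinary sequential typesetter on one chapter.
    return chapter.layout()

def typeset_book(chapters, workers=2):
    # Two workers match the dual-core machines mentioned in the talk; the
    # chapters never need to coordinate, so no locking is required here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(typeset_chapter, chapters))
```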
Within these chapters, though, you'll have things that do interact, the different elements. You've got a paragraph and an image, say, and they do need to worry about how they are placed relative to each other: you typeset a paragraph, it comes out a certain size, and that affects how the image is placed, where it sits on the page, and so on.
So what we do is split everything into its visual elements, and they each basically argue about where they're going to be placed on the page. We split the content into visual elements and start by placing them in a very rough approximation of where they'll end up in the document. It's quite arbitrary: we try to get something half decent, but the point is to do it quickly. Then, based on some basic heuristics, we apply forces between them.
That way a paragraph can try to force a picture down because the paragraph needs space, or the picture could need more space and try to force the paragraph up, and so on. It's based on the n-body problem: you've got all these different elements applying forces to each other, and the forces depend on what makes up a good document, what makes a document look nice. And you can do that in parallel by having each visual element on the page handle itself in its own thread, looking at the other elements around it.
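A minimal sketch of that n-body analogy, assuming a purely vertical layout and a made-up Element type; a real system would use richer geometry and could run each element's update in its own thread:

```python
from dataclasses import dataclass

@dataclass
class Element:
    y: float        # top of the element on the page (y grows downwards)
    height: float   # vertical extent

def overlap(a, b):
    # Amount by which two elements overlap vertically (0 if they don't).
    top = max(a.y, b.y)
    bottom = min(a.y + a.height, b.y + b.height)
    return max(0.0, bottom - top)

def relaxation_step(elements, damping=0.5):
    # Each element looks at the others and accumulates a force; overlapping
    # elements push each other apart, as in one step of an n-body relaxation.
    forces = [0.0] * len(elements)
    for i, a in enumerate(elements):
        for j, b in enumerate(elements):
            if i == j:
                continue
            ov = overlap(a, b)
            if ov > 0.0:
                forces[i] += -ov if a.y <= b.y else ov
    for e, f in zip(elements, forces):
        e.y += damping * f   # damped update so the layout settles gradually
```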
So it's basically three stages. We run a really rough typesetting pass, just to get something quickly, and chuck the elements down on the page. Then we go through and look at how it looks: if two elements are overlapping, that's obviously bad, and there should be a force between them that pushes them apart. If you have, say, a paragraph running over the end of a page with only one word on the next page, there should be a force pushing that paragraph back up again. So you have all these interacting forces, some cancelling each other out, and the elements move around the page depending on them, arguing about where they want to be. Then you go through iterations: each time the elements move, you apply new forces depending on the quality of the layout, and you keep iterating until you reach some state where you're happy with the document. That might be after a certain number of iterations, or it might be once it stabilises.
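Putting the three stages together, the outer loop might look roughly like this; rough_layout, assess and apply_forces stand in for whatever concrete heuristics a real system would supply:

```python
def iterate_layout(elements, rough_layout, assess, apply_forces,
                   max_iters=100, tolerance=1e-3):
    # Stage 1: quick, deliberately rough placement of every element.
    rough_layout(elements)
    for _ in range(max_iters):
        # Stage 2: rate the current layout and derive a force per element.
        forces = assess(elements)
        if max((abs(f) for f in forces), default=0.0) < tolerance:
            break  # the layout has stabilised
        # Stage 3: let the elements "argue" and move according to the forces.
        apply_forces(elements, forces)
    return elements
```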
There are hundreds of different ways you can calculate these forces, depending on the requirements of your document. Examples are overlaps, as I said before, or content running over the end of the page, and also the user's settings: you could apply different weights to the forces depending on user settings and so on.
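For instance, the per-element force could be a weighted sum of such heuristics, with the weights exposed as user settings. The penalty callables and default weights below are illustrative assumptions, not part of the system described:

```python
def element_force(elem, page, others, overlap_penalty, break_penalty,
                  weights=None):
    # Combine several heuristics into one force; the weights would come
    # from user settings (or sensible defaults) rather than being fixed.
    weights = weights or {"overlap": 1.0, "page_break": 0.5}
    force = 0.0
    for other in others:
        force += weights["overlap"] * overlap_penalty(elem, other)
    force += weights["page_break"] * break_penalty(elem, page)
    return force
```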
So once you have this algorithm, there's some nice theory and then some optimisation as well. Generally speaking, you don't actually want to create a million threads: if you've got a dual-core processor, you usually only want to create two threads, because any more is just extra overhead. And when you introduce these forces you are actually doing more work overall; it's just that you're doing it at the same time, and that's what makes it faster. So you usually only want to parallelise down to a certain depth of the document and do everything sequentially from there on. At the moment we only have dual-core processors, so you might only split on the chapters, but as we move into the future and get processors with 32 cores and so on, you start getting speed-ups of 32 times by splitting further down into the document.
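One way to read that advice as code (a sketch; the document tree, its children attribute and typeset_sequentially are placeholders): recurse over the document, but only spawn parallel work while the depth is shallow and the pool matches the hardware:

```python
import os
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_DEPTH = 1            # e.g. only the chapter level on a dual core
WORKERS = os.cpu_count() or 2     # match the hardware, not the document size

def typeset_tree(node, typeset_sequentially, depth=0):
    # Below the cutoff (or at a leaf) fall back to plain sequential work.
    if depth >= MAX_PARALLEL_DEPTH or not node.children:
        return typeset_sequentially(node)
    # Above the cutoff, hand the children to a small, hardware-sized pool.
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        return list(pool.map(
            lambda child: typeset_tree(child, typeset_sequentially, depth + 1),
            node.children))
```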
As for distributing this over a network, these are just rough ideas. The two obvious approaches are master-slave and peer-to-peer. In master-slave mode you have a master that collects all the results and calculates the forces, which are then spread back out to the slaves, each of which is typesetting its own part of the document. In peer-to-peer there is no master: each peer looks at its own part of the document and, once it has finished typesetting, sends its results to every other peer, which then calculates the forces itself. So there's some duplication of work, in that they're all calculating the forces again, and there's a lot more communication in peer-to-peer. Both approaches have different advantages and disadvantages.
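A very rough sketch of the master-slave variant; receive, send and compute_forces stand in for a real transport layer. A peer-to-peer version would instead have every peer broadcast its partial layout and recompute the forces locally, trading duplicated work and extra messages for the missing central node:

```python
def master_loop(slaves, receive, send, compute_forces, iterations=10):
    # The master gathers every slave's partial layout, computes the global
    # forces once, and sends them back so the slaves can typeset again.
    for _ in range(iterations):
        partial_layouts = [receive(slave) for slave in slaves]
        forces = compute_forces(partial_layouts)
        for slave in slaves:
            send(slave, forces)
```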
In comparison to the sequential case, more work is done overall, but it's all done at the same time, so it should be faster in the end. If you've got 32 processors you might be doing more work, but you're doing 32 jobs at once, so it ends up being faster. Roughly speaking, the sequential time complexity goes with the number of elements; in parallel it goes with the depth of the tree, or the number of chapters, plus the number of iterations.
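One loose way to formalise that comparison (an interpretation, not a formula given in the talk): with n elements, p parallel workers and I refinement iterations,

```latex
% Rough cost comparison; constants and per-iteration synchronisation
% are folded into c and s respectively.
T_{\mathrm{seq}} \approx c\,n, \qquad
T_{\mathrm{par}} \approx c\,\frac{n}{p}\,I + s\,I
```

so the parallel version does more total work (the extra factor of I and the synchronisation term), but the elapsed time can still drop sharply as p grows.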
Memory usage is a bit higher, of course, because you're doing everything at the same time. So the general idea is just that you split the document up really roughly and then, as you go through these iterations, fix it up in parallel. You split it up into pages and then apply, well, here I gave the example of forces, but it could really be any algorithm that fixes up that page: it goes through an iteration, rates the page, and keeps going through these iterations, rating the pages, until it gets something it's happy with. So yeah, that's it.
Do you have any samples?
No, not really. You'd also probably have to have higher-level iterations as well: you'd do a whole chapter in parallel with the algorithm I was just talking about, and then you'd have to have another iteration that does the cross-referencing.
It's theoretical, and not only in the sense that there is no implementation yet; it's also theoretical in the sense that there is no plan at present to develop and create new user interfaces. Right now we're saying, yes, we have specific examples: for example, when we typeset in LaTeX, the footnotes are typeset in line; there are standard forms of input. It's not as if it's complete. How most of the world does it is what John is saying: don't worry about it. In other words, this is a common framework that sort of subsumes a lot of stuff that's already been done, and one can easily imagine adapting existing interfaces to it.
I actually want to pick up on what John said about Paul Svoboda's work. It seemed to me that what he'd done there was a very nice implementation of a non-existent mathematical model; I'd say this is maybe the opposite way around, with the theory.
Just for that particular thing, that was quite unusual: we didn't have a model. I think the model...
It's interesting, but it's also... Johannes just asked about those kinds of interactions between the parts that you parallelised, which do exist: it's not that chapters have no relations to other chapters. That might be solved with those preferences. What I'm getting worried about is that you seem to think that those kinds of parallel processes, in the way you indicated, are actually working towards a single goal.
I just mean I don't have the answer.
Well, what I see as a potentially really big problem in this approach, especially for what I would call impossible documents, is whenever you start imposing constraints of some sort. The simplest example is cross-references with Roman numerals: whenever you typeset your document, the number changes to a longer sequence, and therefore the cross-reference jumps to a different page; you have to re-typeset that, and it jumps back, because Roman numerals don't grow linearly in terms of space. Now, with this kind of parallelism, you will encounter more of these kinds of jumps and problems. The suspicion here is that if you try to do this, you will find extreme jumps from the last current state of your solution to something in a completely different state. So you end up with potentially very large cycles, not just an oscillation between two strategies where you say, okay, we give up, we take one of those, but large ones. And there I wonder how you would be able to identify those kinds of things as being what I call impossible documents, where you need to change your requirements because otherwise you can't typeset them.
There are ways in physics to deal with this. You just add some very small random force to it,
and if you're caught in a cycle attractor, this random force might drive you to another attractor, which would be a nicer point. It means that if you think you're in a cycle, you just add some random perturbation to your forces, and then you might end up with some nice…
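A tiny sketch of that jitter idea, purely illustrative:

```python
import random

def jitter(forces, scale=0.01):
    # Add a small random perturbation to every force so a layout stuck in a
    # limit cycle can escape towards a different (hopefully better) attractor.
    return [f + random.uniform(-scale, scale) for f in forces]
```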
The problem I'm thinking of is more actually recognising these large cycles. You might need to have…
If it doesn't converge in half an hour, you add them. Right.
In extreme situations it might be that, to analyse a specific page, the only way you can do it properly is an exhaustive search.
If you're only looking at a specific page, probably not; you can kernelise your problem. The stabilising function might have to be made very customisable as well, so it could be adapted to different and more complex problems.
The plain fact is that Knuth and Plass doesn't produce proof. I am in favour of research into pagination, both automated and semi-automated, and, for that matter, by hand; we need to develop an aesthetic in order to be able to tune the software. I'd like to say something from my own experience, from the books I presented very briefly in my talk earlier, which is that chapters very often can't be dealt with by themselves.
Whether a chapter opens on a recto or a verso has a big influence, because it affects the spreads. That's the first thing.
Well, chapters here are just an example; documents often can be split up into different discrete entities, they might not actually always...
There's an important point here, which is that when you've got a chapter, you can start it on a recto, you can start it on a verso, it might be ten pages, it might be eleven: those are the only choices you've got. That wouldn't be a chapter in this sense; we'd only be talking about chapters like that with cheapo publications. Leave that to one side. And I view it in all...
Each element of the page, each element of the web, even if you don't know where to start...
What I want to say, which I think is the same as what Frank said, is that, again, in my experience, I found that certain chapters would give me two solutions, neither of which was completely satisfactory, but I had to choose between them. And what I would want in that situation is: here are the solutions I've got that aren't so bad, and you choose.
That could be one of the options.
If you cannot stabilise, it can come down to a user choice. What I'm worried about is when it does stabilise, but it stabilises on a really bad solution rather than a not-very-bad solution. There may be two very big figures that really fight each other over a distance of several pages: each one of them wants to be properly placed, and you can't do it. A very nice example of that: in earlier versions of the program you would find that outcome, and you still have the problem with something, but that was in the pre-82 version of TeX, the solution that it volunteered. So any automatic means meant that it resolved to an arbitrarily bad...
The point is, if you end up stabilising on a single solution... it's often the case that you actually have to stabilise in the sense of: here are four solutions, I can't do better than that, we have to have some intervention. That's one way to do it; the other is to go back and say, maybe you can try tweaking it, but my suspicion is that tweaking it on a small scale is not good. Well, at least it would break the... Most probably not, in that you'd almost get the same result. You're guaranteed to get different results, but what can you do?
What I see is that in your parallel typesetting, with the forces, you are currently doing a kind of constraint-based programming. Are you aware that in the 90s this was also used quite often for user interface design, where layouts were generated automatically from constraints, and a theory was developed of which kinds of constraints can be solved in reasonable time and which cannot? So if you go further with this work, it might be sensible to check out the work of Myers at Carnegie Mellon University; they put a lot of work into this and also used it for typesetting. They had constraint-based systems for typesetting where you just specify, how do you say, how the forces work together in terms of constraints, and the system then takes care of the rest. And it was also used for, have you ever thought about programming by demonstration? So that you can, so to say, demonstrate how a force should behave, and try to infer the force from the demonstration.
Yeah, yeah, I mean, that could be interesting, yeah.