
RinohType, a document processor inspired by LaTeX



Formal Metadata

Title RinohType, a document processor inspired by LaTeX
Title of Series EuroPython 2015
Part Number 116
Number of Parts 173
Author Machiels, Brecht
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI 10.5446/20091
Publisher EuroPython
Release Date 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Computer Science
Abstract Brecht Machiels - RinohType, a document processor inspired by LaTeX. RinohType is a document processor inspired by [LaTeX] and written in Python. It renders [reStructuredText] and [Sphinx] documents to PDF based on a document template and a style sheet. RinohType already implements many of the features that make LaTeX so great. Not stopping there, RinohType also tries to fix LaTeX's weaknesses; it should not only be easy to use, but easy to customize and extend as well. To minimize frustration when things go wrong, care is taken to provide descriptive warning and error messages. The powerful layout engine makes it easy to define custom page layouts. And the CSS-inspired style sheets simplify the styling of document elements. At a lower level, Python makes the writing of extensions much more accessible when compared to TeX's rather arcane macro language. In the talk, I would like to introduce RinohType to the Python community. No special prerequisite knowledge is required. I will start off by discussing my motivation for starting RinohType development, its design goals and the currently available features. This will be followed by an example of how you can use RinohType to render a reStructuredText document to a neat PDF document, highlighting some of the features along the way. Next, we'll explore some of RinohType's internals such as the page layout engine and the style sheet system. We will explore how these can be used in a Python application to create a document from scratch. A first RinohType release was recently created. While this preview release is of alpha quality, it should be able to render most reStructuredText documents. It also includes a preliminary Sphinx builder. Please find more details in the package's description at [PyPI].
Keywords EuroPython Conference
EP 2015
EuroPython 2015
My name is Brecht Machiels. A little bit more about me: I'm from Belgium and I'm about
30 years old, and I've been enjoying computers for a very long time, as you can see in the picture with the Commodore 64. I have obtained a PhD degree in microelectronics; however, I haven't really been practicing electronics anymore since I graduated. I've been programming in C and C++, in Python and similar languages, both professionally and for fun, although in the case of C++ perhaps the fun part doesn't really apply. So what is
RinohType? It's a document typesetting system like LaTeX. Who here does not know LaTeX or has not used it? OK,
a couple of people, but it's not necessary to understand this talk, so you should be fine. RinohType basically takes an input document in a structured text format; for now this is limited to reStructuredText and Sphinx documents. The difference with LaTeX here is that style and content are more strictly separated; I'll come back to that in a couple of slides. Taking this structured input file, you can style the document elements using a style sheet, which is similar to CSS, and you can choose to typeset the document in a number of formats, such as a book or an article, based on a document template you choose or provide yourself. The output is in PDF format. It's also possible to add other backends so that other formats are supported in the future; I'm thinking of Scalable Vector Graphics, which might be interesting to display in a browser. Then on to my motivation for starting RinohType. Like you, I have used LaTeX before, and generally it's a pretty good experience: you provide it with content and it takes care of formatting it rather nicely. However, there's also quite a number of problems
with LaTeX. This is due to its age mostly, I think, because it doesn't rely on more modern technologies. A typical complaint about LaTeX is its warning and error messages, like those you can see over there: "Undefined control sequence", followed by the name of some weird macro, and you have no hope of finding out what the problem is. So it takes a
lot of time to fix this. Also, the TeX macro language is not very accessible; it's very different from modern programming languages, so if you want to
extend or customize a style in LaTeX, you need to be quite the expert. And lastly, it's also a very large and complex system. This is very apparent if you want to install LaTeX: the TeX Live distribution, which is the default distribution nowadays, is a very large download, and once installed it takes up quite a bit more. I think there are also some smaller distributions, but then you often run into problems with missing packages, so that's not really a good solution. Maybe I should ask if any of the people
here kind of hate LaTeX even though they have to use it now and then? OK, that's great; that gives me some confidence here. So my goal in the development of RinohType is for it to be as capable as LaTeX, or eventually even more capable, but it's very important for it to be very easy to use, and the problems I mentioned on the last slide, I want to try to fix these. So it should be easy to style documents, and I hope to achieve this by providing these CSS-like style sheets; I think they are a good match for styling documents in general. Document templates should also be easy to modify, or to configure at least, and it should also be easy to write new ones from scratch.
RinohType is written in pure Python; currently it's written in Python 3. I might do a backport for Python 2, but I'm not sure about that yet. I try to minimize the dependencies, so currently it depends only on docutils for parsing reStructuredText; if you're using Sphinx you have that dependency installed anyway. It also depends on a pure-Python PNG library if you want to include PNG images in your document. The current status: I've already shown in this
slide in the lightning talks that there was a first release a couple of weeks ago. It's a kind of early beta, you could call it. It contains most of the features that are also offered by LaTeX; one major exception there is equations, but these are definitely on the to-do list. Also important is that there's very little documentation on it; however, there's a README, so if you want to try playing with RinohType I recommend you start with the README, and that should get you started, along with this presentation of course. Furthermore, I need to do lots of testing, like writing unit
tests, and then fix bugs. I will release a new version soon; that will probably
be early next week. It features nearly complete Sphinx support; for example, it renders Sphinx's own documentation, which is a large document and uses most of Sphinx's features. It also comes with a style sheet. So the first couple of slides will present how
you can use RinohType at an end-user level, and afterwards we'll dive a little bit more into the details, looking at the style sheet mechanism for example. So if you look at RinohType
and see it as an engine, we feed it an input file that is a
structured text document; for now that's reStructuredText or a Sphinx documentation project. Other frontends could be added; it just takes some work. I suppose in the case of Markdown that work is limited, but DocBook, I think, is a very broad specification. I believe it's fairly widely used by publishers though, so I think that's a good frontend to add. The structured text can refer to images, which can be in PDF format or in PNG or JPEG format. These bitmaps are more or less included as-is in the PDF, so they don't require any long processing time. So these get sent to RinohType, and then RinohType looks for a style sheet and a document template to determine the style of the document. These are both basically Python source files. I might provide a text-based format for style sheets in the future to ensure security, but that is not yet present. And of course we need some fonts to render this text, and all the currently widely supported font technologies are supported. And out comes a
nice PDF, if everything goes well.
So who is not familiar with reStructuredText? OK, so that
makes things easier. Here's an example on the right. So
it's basically plain ASCII text, and it's structured, as the name says. You can see two sections, and then there's an enumerated list with another list nested inside of it. A very important feature of reStructuredText is that it is extensible, so you can add new roles and directives. You can contrast this to LaTeX: in LaTeX, the input file is the source file, and you can basically program anything in it. In RinohType, or I should say with this reStructuredText format, you cannot program in the document, but you can still extend it, and on the RinohType side you can implement a corresponding part, so in some sense it's still programmable. Content and style are more separated; you're kind of forced to make this separation, and I believe this will lead to cleaner documents. Even if that sounds like more work than in LaTeX, I'm sure it will be much easier to do this in RinohType than trying to figure out how to do something in LaTeX. Like I said, there is also a Sphinx frontend provided. Anyone who is not familiar with Sphinx? OK, so you all know this. It's used for larger documentation projects, like for example API documentation, but you can also use it for books and manuals that are not related to source code. You can render to HTML, and to LaTeX and from there to PDF, already, but now RinohType can render directly to PDF using nothing but Python. So I'll
show a little demonstration. First I'll render a small reStructuredText file using the command-line tool that is included with RinohType, and we'll
also look at how you can compile a Sphinx documentation project using RinohType. So this is just a small reStructuredText file; there are some images in the images directory. Let me open the reStructuredText file. This should look familiar to most of you. Here I have defined a custom role for acronyms; this will be matched in the RinohType style sheet, so these will be displayed in small capitals. We have a paragraph with some inline styling, an inline image
also. Maybe I should make this a little smaller. So this is a
paragraph with a custom class assigned using the class directive; we will also use this to apply special formatting in RinohType. This is of course something you normally would not do in a normal document; it's just to display the styling features of RinohType: right-aligned paragraphs, and some more of these here. Then we have some typical body elements like lists, field lists, option lists, a code block, indented paragraphs, a table and some images. Let's see how this gets rendered. So the rinoh tool is available; at this moment it only accepts a single argument, but I will add some more so you can choose the style sheet to apply, and perhaps also choose the paper size and page orientation, and maybe configure page margins if you want. We can ignore this line for
now; I will explain this.
This line shows that RinohType also supports warnings. It shows the input file that is the cause of the warning and the line number, so it points you to the exact location of the cause of the problem. In this case there's a very long line in a code block; because code blocks are not line-wrapped, it flows into the margin of the page. It also tells you that this happens on page 3 of the rendered document. I believe this is a nice improvement over LaTeX. The second line indicates that the first rendering pass has finished, but page references or cross-references have not yet converged, because in the first pass you don't know how many pages the document will
consist of, so we need to do another pass if these references are included somewhere. After the second pass the output is written, and we can have a look at the result.
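The multi-pass rendering described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not RinohType's actual code: `render_pass`, `render`, the `LINES_PER_PAGE` constant and the document model (sections given as title and line count) are all hypothetical names invented for this illustration.

```python
import math

LINES_PER_PAGE = 40  # assumed page capacity for this toy model


def render_pass(sections, known_pages):
    """Lay out the document once.

    `sections` is a list of (title, line_count) pairs. `known_pages` maps
    titles to the page numbers found in the previous pass; page references
    to titles missing from it are rendered as "??".
    Returns the rendered table of contents and the page numbers found now.
    """
    toc = [f"{title} ... {known_pages.get(title, '??')}"
           for title, _ in sections]
    toc_pages = math.ceil(len(toc) / LINES_PER_PAGE)
    pages = {}
    line = toc_pages * LINES_PER_PAGE  # the body starts on a fresh page
    for title, line_count in sections:
        pages[title] = line // LINES_PER_PAGE + 1
        line += line_count
    return toc, pages


def render(sections, max_passes=10):
    """Repeat rendering passes until the page references stop changing."""
    known_pages = {}
    for pass_number in range(1, max_passes + 1):
        toc, pages = render_pass(sections, known_pages)
        if pages == known_pages:  # references have converged
            return toc, pass_number
        known_pages = pages
    raise RuntimeError("page references did not converge")


toc, passes = render([("Introduction", 30), ("Usage", 100), ("Internals", 55)])
```

In this model the first pass can only print "??" for every page reference, because no page numbers are known yet; the second pass fills them in and confirms that nothing moved, which mirrors the two passes seen in the demonstration.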
We have the title and subtitle, and the author is there. There is the first section, which is automatically numbered by RinohType; you can choose to number in different ways, using Roman numerals or just a, b, c, d. Then we have some inline styles, such as italic for emphasized text, bold for strongly emphasized text, some monospaced text, subscripts, superscripts, and inline images as here. Hyphenation is also supported automatically. You can even set this on an almost per-paragraph basis, so for parts of paragraphs you can set the language and the minimum number of characters to keep together when hyphenating words. It uses OpenOffice hyphenation dictionaries to perform this hyphenation. Cross-references are transformed into links in the PDF document. It's interesting to note that reStructuredText supports Unicode, so you can write in many languages, and since RinohType is written in Python 3, it's internally all Unicode.
I have to say I haven't tested this thoroughly, but yesterday I added
this Vietnamese text to my example, which you see here, and it seems to work. We have a left-aligned paragraph, so it's ragged on the right side; this is a right-aligned paragraph; these are center-aligned. And there's support for kerning and ligatures, which can be turned on and off, as in here. If you don't know what that means: it basically makes text more readable. In the case of "AV" here, by default there's a rather large space in between, and kerning information in the fonts makes these get placed closer together so it's easier to read. In a title, for instance, it could appear that these are two words while in fact it's one word, so this helps readability. Similarly we have ligatures, so "ff", for example, is contracted into a single glyph that is supposedly easier to read. In the top paragraph there are also footnote references, and the footnotes can be seen at the bottom of the page. We have a local table of contents with page references on the right, and they are also hyperlinks. And we have the typical body elements like lists, field lists, option lists, and some block-level elements. Here is the code block that generated the warning, so the text is flowing into the margin. Then indented paragraphs, and then tables. We can do row spanning and column spanning in cells, as in HTML, and the width of the table and of the individual columns is sized automatically based on their contents. This is something that is missing from the Sphinx LaTeX builder, I believe; I suppose that there's a LaTeX package that fixes this, but I'm not sure if it's possible to use it with Sphinx. Then we have a section with some images: this image is simply an inline image, and after this paragraph there's a floating image or figure inserted, and this gets floated to the top of the page, as you can see here, along with the caption. Then finally we have some admonitions, which are styled differently based on their type.
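The effect of kerning and ligatures mentioned above can be sketched in a few lines. This is a simplified illustration with made-up metrics, not RinohType's implementation: the `ADVANCE` and `KERNING` tables are hypothetical values in thousandths of an em, and real fonts store this information in their own kerning and substitution tables.

```python
# Hypothetical glyph advance widths and kerning adjustments (1/1000 em).
ADVANCE = {"A": 667, "V": 667, "T": 611, "o": 556}
KERNING = {("A", "V"): -74, ("V", "A"): -74, ("T", "o"): -80}

# Unicode code points for common Latin ligatures; longest sequences first.
LIGATURES = {"ffi": "\ufb03", "ff": "\ufb00", "fi": "\ufb01", "fl": "\ufb02"}


def text_width(text, font_size, kerning=True):
    """Return the width of `text` in points at the given font size."""
    width = sum(ADVANCE[char] for char in text)
    if kerning:
        # Each pair of adjacent glyphs may pull the second glyph closer.
        width += sum(KERNING.get(pair, 0) for pair in zip(text, text[1:]))
    return width * font_size / 1000


def apply_ligatures(text):
    """Replace letter sequences with their single ligature glyph."""
    for sequence, glyph in LIGATURES.items():
        text = text.replace(sequence, glyph)
    return text


unkerned = text_width("AVA", 10, kerning=False)
kerned = text_width("AVA", 10)
```

With these invented numbers the kerned "AVA" comes out narrower than the unkerned one, which is the tightening of the "AV" pair shown in the demonstration.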
Now on to the Sphinx demo. This is basically the Sphinx Git repository checked out, and I move into the doc directory. There I've already made some changes to conf.py to configure RinohType. First we need to include the Sphinx builder that's included with RinohType in the extensions. We need to specify this variable to instruct RinohType what to render exactly; this is very similar to the configuration variables for other backends of Sphinx. We will render the contents document, which contains basically all of the documentation; the output will be sphinx.pdf, by specifying "sphinx" here; and we specify the document title and author. The following bit configures custom headers and footers for the document. The headers and footers each consist of three tab stops: one on the left, one in the center, one on the right. So if we add a tab element, this moves the cursor to the middle tab stop, so this text in the header will be centered. For the footer, on the left we insert the page number, followed by two tabs so we end up on the right of the footer, and that's where we insert the chapter description, which is simply the section number followed by the section title of the level-1, or top-level, section. We can also configure the style sheet, page size and orientation, and configure the margins of the page for this Sphinx project, and we can also configure the number of columns to use to typeset the document. I will not render this in real time, because it takes a little bit too long, at least for this presentation.
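The three tab stops used for headers and footers can be modelled with a small function. This is a minimal sketch that assumes a fixed-width character grid; `layout_line` is a hypothetical helper, not part of RinohType, which positions text using real font metrics.

```python
def layout_line(text, width):
    """Place tab-separated fields at the left, center and right tab stops.

    The first field is left-aligned, the field after one tab is centered
    and the field after two tabs is right-aligned within `width` columns.
    """
    fields = text.split("\t")
    if len(fields) > 3:
        raise ValueError("at most two tabs (three tab stops) are supported")
    left, center, right = fields + [""] * (3 - len(fields))
    line = [" "] * width

    def place(string, start):
        for offset, char in enumerate(string):
            if 0 <= start + offset < width:
                line[start + offset] = char

    place(left, 0)
    place(center, (width - len(center)) // 2)
    place(right, width - len(right))
    return "".join(line)


# One tab moves to the center stop; two tabs move to the right stop.
header = layout_line("\tSphinx Documentation", 40)
footer = layout_line("7\t\t1 Introduction", 40)
```

Here the header text ends up centered, while the footer carries the page number on the left and the section description on the right, matching the configuration described in the talk.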
So this is using the book template, so it comes with a title page followed by
a blank page and then the table of contents, similar to the one in the previous document, and this goes down two levels in the section tree. You can see the page number at the bottom; the page numbers are in lowercase Roman numerals. Then for the body part of the text, you can see the custom header and footer that we have defined, so with the page number on the left and the section description on the right. And the
style was more or less copied from the LaTeX one used by Sphinx. So continuing with the
presentation, I'll
quickly mention citeproc-py. This is a sister project to RinohType. It's basically a CSL processor, and CSL is a standardized XML format to describe the formatting of citations and bibliographies. A lot of styles are available, so you probably don't need to write your own style if you are writing a document that has citations. It parses BibTeX databases, so you can use these, and it can output the formatted citations and bibliographies as HTML, as reStructuredText, or using the internal RinohType representation. This is not yet usable from within reStructuredText or Sphinx, but that should be a very small step.
So this is an example of how citations and references could be formatted by citeproc-py. So this extends
this scheme, so we can add references and citations and control their formatting. Now, diving a
little bit deeper into the internals, let's look at the style sheets. To understand this, you need to know how document elements are represented. They are basically instances of flowable classes; we have flowables such as a paragraph or an image, and then we have inline elements, which make up a paragraph. After
that I will discuss the style sheets and how flowables are linked to a style definition. So as I said, a flowable can be for example a paragraph, and a paragraph adapts
to the available width, which is provided by the document template, and flowables are flowed onto the page; that's a term that is often used in this context. Images, for example, generally don't adapt to the available width, but at least they can horizontally align themselves within the available space. Flowables form a tree that together form a document tree, such as is the case in HTML also, so that should
be intuitive. Here we have a title paragraph, which has a special type of style, so this paragraph functions as the title of the document. We have two top-level sections, and the first top-level section has two subsections with some additional flowables inside of them. Inline
elements, for example this short paragraph with nested
styling, are also represented as a tree. I will skip over that
quickly. So, style sheets: I said that they are very similar to CSS, in that you first select the document elements you want to style. You select them based on their place in the document tree, on their style attribute, which is similar to the ID and class attributes in CSS, or on any other attributes, or on any combination of these three. Style sheets are Python source files, as I mentioned before. Let's have a look at
an example. Now suppose we
want to select the paragraphs that are part of a list item here, but we don't want to select this one. We make use of context selection, so we only select paragraphs that are a direct child of a list item. We can also use the ellipsis keyword in selectors; it represents any number of levels of flowables. We can match based on the style attribute, for the title paragraph, using the like class method, and we can match on arbitrary attributes: for example, for level-2 section headings, we limit the selection of sections by passing the level argument to the like method. I'll skip the next one, since I'm running out of time.
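The selection rules described above (direct-child context, the ellipsis for any number of levels, and matching on style or other attributes) can be sketched as follows. The classes `Element` and `Like` and the function `selects` are hypothetical stand-ins written for this illustration; they only borrow the idea of a "like" selector from the talk and do not reproduce RinohType's API.

```python
class Element:
    """A node in a toy document tree (not RinohType's real classes)."""

    def __init__(self, kind, style=None, children=(), **attributes):
        self.kind = kind
        self.style = style
        self.attributes = attributes
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Like:
    """Selects elements by kind, optionally by style name and attributes."""

    def __init__(self, kind, style=None, **attributes):
        self.kind = kind
        self.style = style
        self.attributes = attributes

    def matches(self, element):
        return (element.kind == self.kind
                and (self.style is None or element.style == self.style)
                and all(element.attributes.get(name) == value
                        for name, value in self.attributes.items()))


def selects(selector, element):
    """Check a context selector, listed from outermost to innermost item.

    Adjacent items must be direct parent and child; a literal `...` between
    two items allows any number of levels (including none) in between.
    """
    def match(index, node):
        if index < 0:
            return True  # every selector item has been matched
        item = selector[index]
        if item is ...:
            while True:
                if match(index - 1, node):
                    return True
                if node is None:
                    return False
                node = node.parent  # skip one more level
        if node is None or not item.matches(node):
            return False
        return match(index - 1, node.parent)

    return match(len(selector) - 1, element)


title = Element("paragraph", style="title")
direct = Element("paragraph")
nested = Element("paragraph")
item = Element("list_item",
               children=[direct, Element("block_quote", children=[nested])])
section = Element("section", level=2,
                  children=[Element("list", children=[item])])
document = Element("document", children=[title, section])
```

In this tree, `[Like("list_item"), Like("paragraph")]` selects only `direct`, while `[Like("list_item"), ..., Like("paragraph")]` also selects `nested`, which sits one level deeper inside a block quote.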
There are some limitations to CSS, I think, so I added another level of indirection: I split up style sheets into a style matcher and a style sheet. The style matcher basically collects all the selectors and maps these to style names, and the style names can then be mapped to a definition in a style sheet. Because the selectors are often the same for multiple documents, that means a single style matcher can be reused by multiple style sheets. Maybe this makes less sense for HTML,
but I think for a document processor this is a good fit. RinohType also supports variables in the style sheets to avoid duplication; this is a missing feature of CSS that is often complained about. And we can also inherit from one style to another, again to avoid duplication. So let's have
a look at the style matchers and style sheets. We create a new style matcher. For example, we select the styled text with the emphasis style and we assign it to the "emphasis" style name, and the same goes for the nested line blocks, which should be more or less self-explanatory. We use this matcher and refer to it in the style sheet we create over there, where we map style definitions to each of the different style names. So emphasized text is represented using italicized text, and nested line blocks are indented on the left. Variables can be used to
reuse a single
value for multiple styles; fonts, for example, are a good example of that. We define a variable here for the font family, which collects a number of fonts, and we can use this as follows in a style definition. Note that you can also refer to an attribute of the variable's value. Then, inheritance: suppose we
have defined this style for headings, for numbered headings. We often also have an unnumbered heading style, for the table of contents title for example. In this case we want to base this on the default heading style, and we can simply override one or more of the attributes, as is the case here; all the other attributes are simply the same as for the standard heading style. You can also
extend style sheets. If for example you use a standard style sheet that is shipped with RinohType and you want to make some changes, you can simply
inherit from it. Additionally, you can add new matches: for example, for the acronym role that was visible in the demonstration, you can define a new matcher, and then you can create a style sheet that is based on the previous style sheet and also refers to the new matcher. Now we can define a style for the acronym reStructuredText role, so we set small capitals to true in this case. We can also override styles that were present in the base style sheet, like this. Additionally, we can change variables, which will also have an effect on all styles that use this variable in the base style sheet. So here we
replace all fonts with some others.
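The split between a style matcher and a style sheet, together with variables and inheritance, can be illustrated with a small model. Everything here is a hypothetical sketch: `Var`, `StyleSheet`, `style_for` and the dictionary-based matcher are invented for this example and are not RinohType's classes; the attribute names are likewise assumptions.

```python
class Var:
    """Placeholder for a style sheet variable, resolved at lookup time."""

    def __init__(self, name):
        self.name = name


class StyleSheet:
    """Maps style names to attribute definitions; may extend a base sheet."""

    def __init__(self, base=None, variables=None):
        self.base = base
        self.variables = dict(variables or {})
        self.styles = {}  # style name -> (base style name, attributes)

    def define(self, style_name, base=None, **attributes):
        self.styles[style_name] = (base, attributes)

    def _lookup(self, table_name, key):
        # Search this sheet first, then the sheets it is based on.
        sheet = self
        while sheet is not None:
            table = getattr(sheet, table_name)
            if key in table:
                return table[key]
            sheet = sheet.base
        raise KeyError(key)

    def resolve(self, style_name):
        """Return the effective attributes of a style as a plain dict."""
        base_style, attributes = self._lookup("styles", style_name)
        resolved = self.resolve(base_style) if base_style else {}
        for key, value in attributes.items():
            if isinstance(value, Var):
                value = self._lookup("variables", value.name)
            resolved[key] = value
        return resolved


def style_for(element, matcher, sheet):
    """Find the style name for an element, then look up its definition."""
    for style_name, selects in matcher.items():
        if selects(element):
            return sheet.resolve(style_name)
    return {}


# The matcher only maps selectors to style names; it can be shared.
matcher = {
    "emphasis": lambda el: el.get("style") == "emphasis",
    "acronym": lambda el: el.get("role") == "acronym",
}

base = StyleSheet(variables={"font": "Times"})
base.define("heading", font=Var("font"), weight="bold", number_format="1")
base.define("unnumbered heading", base="heading", number_format=None)
base.define("emphasis", font=Var("font"), slant="italic")

# An extended sheet: adds a style and changes a variable of the base sheet.
custom = StyleSheet(base=base, variables={"font": "Helvetica"})
custom.define("acronym", small_caps=True)
```

Because variables are resolved starting from the sheet on which `resolve` is called, changing `font` in `custom` also affects the styles that were defined in `base`, which is the behavior described in the talk.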
A small note about performance: this documentation, and I forgot to update this, but with the new style sheet it's about 230 pages, takes about 70 seconds to render from scratch. If the cache file is in place, so the document has been rendered already, it takes only about 40 seconds. I think this is not too bad for a Python document processor. However, I still looked into speeding it up a little bit; that would be nice. So I tried Cython, but unfortunately I didn't get a big speedup. I think this is due to the fact that I'm basically doing most things with lists and dictionaries instead of tight loops and number crunching. I also tried PyPy; unfortunately, it was twice as slow as CPython. The PyPy developers are not sure what the cause of this is, but I talked to one of them yesterday about it, and we'll have
a look at it during the sprints. I should mention the license: my intention is to keep RinohType free for
non-commercial use, so open-source projects can use it to generate documentation. I'm thinking about offering a separate license for commercial use, but I have to determine the details; this is a very complicated matter, I understand. So for more
information, please note the spelling of RinohType; it may help to remember that it contains "Python" spelled backwards. Please look at these URLs for more information. Maybe a
bit of time for some questions.
Thank you. I have several questions, but I will only ask about math formulas. As you mentioned in the talk, there is no support for formulas, for mathematical equations? Yes, that's not supported at the moment. So
there's a project in the JavaScript world called KaTeX. Do you know it? Which one? KaTeX. Sorry, I only know about MathJax. It's explicitly dedicated to mathematical equations and everything related, but it's JavaScript, so it's a different thing. Thanks.
Have you tried to compile the same document and content with regular LaTeX? Because it seemed that compilation was horribly slow compared to LaTeX. It is quite slow compared to LaTeX;
I think that's why I tried to look at PyPy for speeding it up. However, for small documents this is not a problem. But I'll definitely look into speeding up the rendering a little. And did you
compare it with Sphinx? Does it have the same speed as the Sphinx LaTeX builder?
Did you compare it with Sphinx, to see which one is faster at making the PDF? No, I haven't compared.
I would like to know how far along you are with the idea of rendering to SVG. I haven't started on that yet. But
which text processing library are you using internally to do all the paragraph formatting and the typesetting, effectively? Which library are you using, or did you implement everything yourself? So
everything is pure Python; I do not rely on external libraries except for docutils and the PNG library. Anyone else?
Hi. Do you support absolute placement constraints? For example, an infobox that needs to be placed at an exact position on the page, or a table where the cells may not be broken if it spans multiple pages? So there's nothing in
place for absolute placement as of yet. This could be added, but I wonder how that would interact with the rest of the elements. As for tables, you can set a constraint on how many rows to keep together at the minimum; so if only two rows would fit on a page, it will skip to the next page if you specify, like, four.
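The keep-together constraint for table rows mentioned in the answer can be sketched as a small pagination function. This is a simplified model measured in whole rows rather than real heights, and `split_table` with its parameters is a hypothetical helper, not RinohType's implementation.

```python
def split_table(row_count, space_left, page_capacity, min_rows=1):
    """Distribute table rows over pages.

    `space_left` is the number of rows that still fit on the current page,
    `page_capacity` the number of rows that fit on an empty page. At least
    `min_rows` rows are kept together at the start of the table on a page;
    if fewer would fit, the rest of the page is left empty.
    Returns the number of rows placed on each successive page.
    """
    if page_capacity < 1:
        raise ValueError("page_capacity must be at least 1")
    chunks = []
    remaining = row_count
    space = space_left
    while remaining > 0:
        fits = min(remaining, space)
        if fits < min(min_rows, remaining) and space < page_capacity:
            chunks.append(0)  # too few rows fit: skip to the next page
        else:
            chunks.append(fits)
            remaining -= fits
        space = page_capacity
    return chunks


skipped = split_table(10, space_left=2, page_capacity=30, min_rows=4)
started = split_table(10, space_left=2, page_capacity=30, min_rows=2)
```

With room for only two rows left and a minimum of four, the table starts on the next page; with a minimum of two, the first two rows stay on the current page.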
Another question? I have a question about testing. Since it's all related to the visuals, how do you test such things? I've done very
little testing up to now. I've been constantly refactoring also, so that's a good excuse. I'm also a bit worried about how to do this testing. I guess I will have to mock the backend, or maybe it will just be limited to regression testing, where I know which output is good and just make sure it doesn't change. Thanks for the talk.
I tried it with Japanese text, and that caused an exception. Do you have
any idea? Using Unicode should be a good part of that, but I'm sure there is more to do there. Of course you also need to make sure that you have fonts that contain the necessary glyphs. But this is something I could probably use some help with, since I do not read or write Japanese or many other languages. Any other questions? Hi, how
do you handle images? Is there some sort of scaling in RinohType? You
can specify
to scale to the total available width, or specify an absolute size in, like, points or centimeters. OK.
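The two image sizing options from this answer, scaling to the available width or giving an absolute size, can be sketched like this. The function names and parameters are assumptions made for this example rather than RinohType's options; the only fixed facts used are that a point is 1/72 inch and an inch is 2.54 cm.

```python
POINTS_PER_INCH = 72
CM_PER_INCH = 2.54


def cm(value):
    """Convert centimeters to points."""
    return value * POINTS_PER_INCH / CM_PER_INCH


def scaled_size(natural, available_width, width=None, fit_width=False):
    """Return the (width, height) in points at which to place an image.

    `natural` is the image's own (width, height) in points. With
    `fit_width`, the image is scaled to the total available width; with
    `width`, it is scaled to that absolute width; otherwise it keeps its
    natural size. The aspect ratio is always preserved.
    """
    natural_width, natural_height = natural
    if fit_width:
        target = available_width
    elif width is not None:
        target = width
    else:
        target = natural_width
    scale = target / natural_width
    return target, natural_height * scale


full = scaled_size((200, 100), available_width=400, fit_width=True)
fixed = scaled_size((200, 100), available_width=400, width=cm(5))
```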