mandoc: becoming the main BSD manual toolbox - BSDCan 2015 presentation


Formal Metadata

Title
mandoc: becoming the main BSD manual toolbox - BSDCan 2015 presentation
Title of Series
Author
Schwarze, Ingo
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Berkeley System Distribution (BSD), Andrea Ross
Release Date
2015
Language
English

Content Metadata

Subject Area
Abstract
The original audio stream of my presentation at BSDCan 2015 in Ottawa (except for the first 30 seconds and the last four minutes; those two chunks failed to record in Ottawa, so I had to re-record them). The associated video stream contains the presentation slides captured off the beamer input by the conference organizers, so video and audio are in sync. Topics are the new man(1), man.conf(5) and man.cgi(8); eqn(7) HTML5 and MathML output; UTF-8 improvements; the afl(1) audit; -Wunsupp; pod2mdoc(1); a status summary in various operating systems; and possible future directions.
Welcome to the presentation on mandoc. The topic this year is becoming the main BSD manual toolbox. I am Ingo Schwarze, and I have been an OpenBSD developer for the last six years. I have also contributed to a few other areas, but it happened that my main focus became documentation tools. One thing that's always nice for starting a talk is reminding everybody that we're usually standing on the shoulders of giants, and the pioneer in this area is really Cynthia Livingston, whom you're seeing here on this picture. She designed the manual documentation language we're still using today, in 1989 and 1990: the mdoc language. She implemented that language herself, she translated the whole corpus of BSD manuals from the old man language to the new mdoc language, and in the process she also rewrote all the text that was still encumbered by AT&T copyrights. So all that was done by one single person. When I talked about mandoc four years ago, when I first
presented the mandoc toolbox, my focus was: this is a completely new tool, and we have to train it to do real work. And you see how the
beast has matured, and so on. OK, so the main, the key
point, since we're talking about documentation tools, the key point from my point of view about system documentation is really that all documentation should be in one place and in one form. So not a part on the web and a part in HTML and a part in /usr/share/doc and whatnot: one place, one format. That makes it easy to find, easy to read and easy to write. Only if it's easy to write is there any chance that it will be correct, complete and concise. That of course puts a particular focus on which system to use, if you want to use one system. The basic markup syntax we're still using today goes back more than 50 years: Jerry Saltzer started the RUNOFF markup in 1964. It is unobtrusive, it is diff-friendly, it is easy to hand-edit, and there are simple tools to produce high-quality output in various formats from it. The basic manual structure goes back to the very first version of Unix, to Thompson and Ritchie. The man language still used on Linux today comes from the famous Version 7 Unix, the last Research Unix version that is publicly available. But the real revolution in documentation languages really was the invention of semantic markup, of the mdoc language, in 1989 and 1990 by Livingston, which then got out to the world with 4.4BSD. At about the same time, James Clark wrote groff, the new implementation of roff, which, even though it's GPL software, dominated the toolbox world in the BSDs for more than two decades, until finally, around 2010, we started to introduce the BSD-licensed mandoc toolbox step by step into the various BSDs. By now practically all of the BSDs use it: OpenBSD, FreeBSD, NetBSD, and even illumos switched to it. But, you know, what is
in that toolbox we are talking about? From the user perspective, it has become really simple. From the user perspective, you basically have one userland program that you are calling: man, the manual viewer. That thing, when you call it, does three things. It finds one manual page in the file system or using a database; you can decide which file to find either by giving its name or by giving a search query. Then, in the second step, man will format the manual page, and in the third step it will display it, usually using a pager. Even in this very elementary area there's quite some progress since I presented here last year. Last year I said: man, the manual viewer, is out of scope, that's not part of the toolbox. Now we have one unified interface for the viewer, man, for the formatter, mandoc, and for the search tool, apropos: they all take the same options. And very new this year, 2015, we have a unified and very much simplified configuration file format that I will show. And the toolbox
also contains a few auxiliary components: you have a database generation tool, you have a syntax checker, you have the preprocessors provided, you have the format converter (from man to mdoc, that is not built in, but the part from mdoc to man, that is built in), and you have output front ends for various formats like HTML, PostScript and PDF. So before really
starting with the individual topics, I'll give you a very brief overview of which topics I'll talk about. I guess I never talked about so many different topics in a single public talk; that is just because so much happened in the last year, and I hope I won't get mired in all of these topics. A unified user interface and a new configuration file format is the first thing. The second is the same for the web. The third is improved formatting of mathematical equations, then improved UTF-8 support, hunting bugs with a fuzzer program, detecting use of unsupported features, and converting manual pages from another language, from the Perl documentation language, to our common mdoc language. These are the main things that happened during this year, and I'll wrap up by providing a status in various operating systems and hinting at a few possible future directions. By the way, the pictures I'm using for illustration are pictures taken by other people along the route of the bicycle tour I did around Southern Ontario just after last year's conference. OK.
So the first thing we did: OpenBSD no longer uses the traditional BSD man program, but an implementation integrated into mandoc, into the formatter. The traditional setup was that man would fork and execute twice: once to call the formatter and then again to page the output. Right now, the program finding the files in the file system and the program formatting them is the same program. Now what is the
point in doing that? The point really is to have a unified interface for all the three main front ends. That means that when you call man, when you type man and some page, you can use command-line options that were traditionally only available to the formatter, like saying which warning level you want to see or saying which output format you want. You can now do things like that in OpenBSD, not yet in FreeBSD; there it will only become available with a later release. You can say things like man -T html and just the name of the manual, and then pipe it directly to links or something. The other way around, the search tool now has access to all options that traditionally you only had in the viewer. The search tool, apropos, normally just lists a number of title lines from manuals. You now have options to say: I want to see the file names instead, or I want to see the command synopsis, or even: I give a search query, but I want to see all the full manual pages in less right away. So that's quite flexible, and you don't need to remember different options for different programs. Besides, it allows a simpler configuration file format, plus a few minor things that I'm not going to name individually, except that one of them is quite nice: we have to maintain one less userland program, and the traditional BSD man is quite old code, so it's nice to no longer need to maintain that. There are two other things that we can gain in the future that we have not yet exploited. In particular, many library manuals document not only one, but two, three, twenty functions, and those typically have multiple entries in the file system, hard links. With the new database, we can get rid of all those hard links and so get rid of thousands of files in the installation. And another nice thing that can be done with this, it is already implemented, it is only not integrated yet in OpenBSD: we can have an interactive chooser. So you say apropos something, it
comes up with a list of all the programs that match, and then you can just choose one of them and open the manual directly, without exiting apropos first and typing a new command, and a lot of things like that. OK. Any
change to a program, or any new program, will come at a cost. In this case, the cost is that database lookup is slightly slower than file system lookup. Then again, on my notebook the additional delay for displaying a manual page is on the order of 10 ms. There is another cost: when you install a new manual page on the system and you want to find that new manual page with the search tools, you have to update the database. But at least the OpenBSD package tools run the required commands automatically for you, so you don't need to do it manually. And even if you forget it, the man tool itself will still work: you will still see the manuals when you explicitly call for them, you just won't find them in the search tool until the weekly makewhatis is run. So there are very few downsides, if any, compared to the additional features we get out of this. That is a nice one.
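The trade-off just described, where viewing by name keeps working but searching only sees what is in the database, can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not mandoc's actual database format or lookup code: the names `build_index`, `search`, `view` and `demo` are hypothetical, and the "database" is just a dictionary built by scanning `manN/` directories.

```python
import os
import tempfile


def build_index(manpath):
    """Toy stand-in for makewhatis: map page names to files under manN/ directories."""
    index = {}
    for section_dir in sorted(os.listdir(manpath)):
        full_dir = os.path.join(manpath, section_dir)
        if not section_dir.startswith("man") or not os.path.isdir(full_dir):
            continue
        for filename in sorted(os.listdir(full_dir)):
            name = filename.rpartition(".")[0]  # "ls.1" -> "ls"
            if name:
                index.setdefault(name, []).append(os.path.join(full_dir, filename))
    return index


def search(index, substring):
    """Searching consults only the index, so it misses pages installed after the last rebuild."""
    return sorted(name for name in index if substring in name)


def view(manpath, index, name):
    """Viewing tries the index first, then falls back to scanning the file system."""
    if name in index:
        return index[name][0]
    return build_index(manpath).get(name, [None])[0]


def demo():
    """Install a page without rebuilding the index and compare search with view."""
    with tempfile.TemporaryDirectory() as manpath:
        os.mkdir(os.path.join(manpath, "man1"))
        open(os.path.join(manpath, "man1", "ls.1"), "w").close()
        index = build_index(manpath)
        # A new page arrives, but the index is not rebuilt yet.
        open(os.path.join(manpath, "man1", "afl.1"), "w").close()
        found_by_search = search(index, "afl")
        found_by_view = view(manpath, index, "afl") is not None
        found_after_rebuild = search(build_index(manpath), "afl")
    return found_by_search, found_by_view, found_after_rebuild
```

In this sketch, `demo()` shows the stale index hiding the new page from `search` while `view` still finds it, and a rebuild making it searchable again.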
The old configuration file format of the manual viewer: this is the list of all the features I identified in the old man.conf, the old configuration file format, that are completely useless. I'm not going to bore you by reading this list to you; I'll just pick out one. You can configure decompression filters, like: if I have a gzipped man page, I want to use gunzip on it. Well, the old configuration file format allowed you to configure different decompression filters in different sections of the manuals. So if I have a kernel manual that is compressed, I want to use this decompression filter; if I have a userland manual that is compressed, I prefer the other decompression filter; and thousands of things like that. So the format was so complicated that really, consistently, everybody hated it, or nobody used it, even though people have good uses for some things you want to configure. So I came up with
a new configuration file format that basically has just two directives, the things that people actually need. One is: you can specify manpath and a directory name. It takes that directory as a complete tree of manual pages and consistently uses that tree across all the tools. The other directive specifies output options: for example, for the terminal, saying how wide the terminal is; for HTML, saying which style sheet you want to link from the generated files; for PostScript and PDF, specifying the paper format; and things like that. That's very easy to use and needs almost no learning. A few
pieces of the namespace in the new configuration file format have been reserved. So once people ask for it, I'm planning to implement the following features: an alias directive, which makes it easier for people using languages like Tcl to make their trees more easily accessible with the -s option; then a sections directive, which allows people to configure custom sections and to change the search order of sections; and a filter directive, in case any operating systems are using other compression formats besides gz. However, so far nobody asked for these features, and as long as people don't need them, I say KISS, keep it stupidly simple. We shouldn't implement anything that people don't actually want. So be aware: I really reduced the functionality inside OpenBSD, and people didn't yell at me. So it's not always bad to make things simpler. Good so far.
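To make the two-directive design concrete, here is a minimal Python sketch of a parser for such a file. It is an illustration only, not mandoc's parser (which is written in C): the function name `parse_man_conf`, the sample paths and values, the option names `width`, `style` and `paper`, and the treatment of `#` as a comment character are all assumptions made for this example.

```python
def parse_man_conf(text):
    """Parse 'manpath DIR' and 'output OPTION [VALUE]' lines into Python structures."""
    manpaths = []
    output = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        fields = line.split(None, 2)
        keyword = fields[0]
        if keyword == "manpath" and len(fields) == 2:
            manpaths.append(fields[1])
        elif keyword == "output" and len(fields) >= 2:
            # An option without a value is stored as None.
            output[fields[1]] = fields[2] if len(fields) == 3 else None
        else:
            # Anything else, including reserved but unimplemented keywords, is rejected.
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
    return manpaths, output


# Hypothetical example configuration, made up for this sketch.
SAMPLE = """\
# manual trees, searched in this order
manpath /usr/share/man
manpath /usr/local/man
output width 100
output style /mandoc.css
output paper a4
"""
```

Calling `parse_man_conf(SAMPLE)` yields the list of manual trees and a dictionary of output options. With only two directives, the whole grammar fits in one short loop, which is the kind of simplicity argued for above.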
I talked about viewing manual pages on the command line, on the terminal. We also have a CGI for viewing them on the web, which has basically identical functionality. On the www.openbsd.org site, OpenBSD no longer uses the traditional CGI Perl script by Wolfram Schneider from FreeBSD, but the man.cgi implementation included in the mandoc toolkit. The traditional setup was that man.cgi would fork and execute the system's man command; that command would fork and execute groff, or later mandoc, but not in HTML mode, in terminal mode. Then the CGI script, in Perl, with regular expressions, not even with a library, would parse that terminal output and manually convert it to HTML. Incredibly ugly. The new man.cgi is one single C program (yes, some people do write CGI programs in C), and it links in just the components needed: the manual parsers, the database client code and the HTML formatter code. It directly generates clean HTML code, with the benefit of providing full semantic search capabilities. So just like at the command line, in the web interface you can use the search for all the semantic units.
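What searching by semantic unit means can be sketched in a few lines of Python. This is a toy model loosely patterned on apropos-style terms of the form `key=substring` and `key~regex`; the sample records, the function name `search`, and the exact matching rules (a bare word searching names and descriptions, substring matches ignoring case) are assumptions for illustration, not a description of the real database code.

```python
import re

# Made-up sample records: each macro key maps to the strings marked up with that macro.
PAGES = [
    {"file": "man1/ls.1", "Nm": ["ls"], "Nd": ["list directory contents"]},
    {"file": "man2/nanosleep.2", "Nm": ["nanosleep"], "Nd": ["high resolution sleep"],
     "Fa": ["const struct timespec *timeout"]},
]

# A term is either "key=substring", "key~regex", or a bare word.
TERM = re.compile(r"([A-Za-z]+)([=~])(.*)")


def search(pages, term):
    """Return the files whose strings under the requested macro key match the term."""
    m = TERM.fullmatch(term)
    if m:
        keys, op, value = [m.group(1)], m.group(2), m.group(3)
    else:
        # Assumption: a bare word is a substring search in names and descriptions.
        keys, op, value = ["Nm", "Nd"], "=", term
    hits = []
    for page in pages:
        strings = [s for key in keys for s in page.get(key, [])]
        if op == "=":
            matched = any(value.lower() in s.lower() for s in strings)
        else:
            matched = any(re.search(value, s) for s in strings)
        if matched:
            hits.append(page["file"])
    return hits
```

For example, `search(PAGES, "Fa=timespec")` looks only at function arguments, which a plain full-text search over formatted terminal output could not do.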
One thing that was surprising about this man.cgi was that even though, when I started at last year's hackathon in Ljubljana, all the underlying functionality was already quite mature and ready, there were about a dozen different components to tweak. So the configuration, the syntax for this thing had to be tweaked. So
what I am saying is that even though we had a good, mature code base to really exchange man.cgi completely, a surprisingly large number of small things had to be fixed and adapted slightly. So even in such a seemingly small project, you should sometimes be prepared to have to do a lot of work. But in the end it paid off, because quite a few of the features we originally implemented for the web manual viewer turned out to be useful for the command line, too. One of the new command-line options derives from that. The code that was originally developed for the web was then used for the implementation of the new man command I talked about before. Even the way the apropos command orders its results was originally developed for the web viewer. So there were quite a few benefits. One thing we completely overlooked at first is that even if you're not doing anything with HTTPS or authentication or limited access, whenever you do anything on the web, you at once get into security issues, simply because you're taking untrusted data off the net and are processing it. The processing itself could be harmful, even if it only induces load on your server, and the output you throw at the user could be harmful, too: XSS or whatever. So at some point we decided that we had to audit the man.cgi code for security issues, and we did that in three ways. First, starting with all the untrusted input, tracing forward and looking: what is this input used for? On the other hand, starting from the other end, locating all the places where the CGI is printing output to the user and tracing back: where is that data coming from, is it coming from untrusted sources, could it be clobbered somewhere? And given that there are two modules that need to be audited, the steering program and the formatter, we also identified all the places where data is transferred from one module to the other and started auditing from that interface in both directions. Of course, all
these tracings hopefully end up in the same code paths, but you really don't want to miss any code path, so it is good to have a bit of redundancy in such an audit. Initially, almost all the security issues found were reported by Sébastien Marie, who by the way has become an OpenBSD developer in the meantime. But I redid the whole audit to make sure that nothing was missed, and I am now reasonably confident that we have found most things that shouldn't be in a program run on the web. Baptiste Daroussin is planning to replace the FreeBSD man.cgi, too, and he is planning to use this exact
code. Here is an overview of the security issues we actually found. One important class was unvalidated input, in particular in the URI, both in the path provided in the URI and in the query string. That led to two kinds of problems: first, reading unrelated files from the file system on the server and possibly disclosing the content of files that were never intended for display, and second, information disclosure in error messages. So even when the program realized, OK, this is strange, I shouldn't be doing that, the error messages might reveal stuff that the attacker shouldn't know. Of course the fixes were rejecting absolute paths, rejecting ascension to parent directories, validating stuff up front, and paying attention to what we display in error messages. The other type of problems were mostly cross-site scripting issues, partly due to invalid characters embedded in query strings and partly due to ugly stuff embedded in manual pages. I mean, when you run a manual page server, OpenBSD for example also serves manual pages from ports, and you don't really know what people put into manual pages in ports, so that should not be able to trigger cross-site scripting effects in your CGI front end. Basically all these cross-site scripting fixes require getting the encoding right, which turned out to be quite tricky, because some of the output needs HTML encoding, some needs URI encoding, and some even needs both. So it wasn't only a question of doing the encoding at the right places, but of doing the right number of encodings and choosing the right encoding at the right places. But I guess we figured it out in the end. One thing that is almost impossible to fix correctly is regular expression DoS attacks. On the command line, we allow people to search manual pages using regular expressions, and we wanted to have the same functionality on the web, so people can enter regular expressions into that thing. Regular expressions are so powerful that you can't really prevent them from clobbering server resources, so the only mitigation we came up with is just limiting the total time a CGI process can run. So far nobody has brought the server down, and I hope it can stay like that. If it doesn't, we might have to stop offering regular expressions; there is probably no better way. OK.
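The two classes of fixes described above, validating the requested path up front and applying the right number of encodings to output, can be sketched as follows. This is a minimal illustration in Python, not the man.cgi code (which is written in C); the function names and the `query` and `sec` parameter names are assumptions made for the example.

```python
import html
from urllib.parse import quote


def validate_manpath(path):
    """Return path unchanged if it looks safe, else raise ValueError.

    The error text is deliberately generic and never echoes the input,
    so error messages do not disclose anything to an attacker.
    """
    if not path or "\0" in path:
        raise ValueError("invalid path")
    if path.startswith("/"):
        # Reject absolute paths.
        raise ValueError("invalid path")
    if ".." in path.split("/"):
        # Reject ascension to parent directories.
        raise ValueError("invalid path")
    return path


def search_link(query, section, label):
    """Build an HTML link whose target needs both kinds of encoding.

    Each value placed into the query string is URI-encoded first; the
    finished URI is then HTML-encoded because it ends up inside an
    attribute. The visible label only needs HTML encoding.
    """
    href = "?query=" + quote(query, safe="") + "&sec=" + quote(section, safe="")
    return '<a href="{}">{}</a>'.format(
        html.escape(href, quote=True), html.escape(label))
```

Swapping the order of the two encodings, or applying one of them twice, produces links that either break or let markup through, which is the "right number of encodings" problem mentioned in the talk.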
So much about the various display tools. Yes, you're welcome to [inaudible].
It's extended regular expressions, handled by the standard routines contained in the C library. I must admit I'm not really up to date on whether those are less easy to exploit [inaudible]. That might make sense. On the other hand, I'm getting on people's nerves with restricted [inaudible]. Yeah, it might be better to switch it; if we really suffer from attacks, then we should probably consider that.
OK, so much about the viewers. Now let's get to some things about parsing and actual formatting. The main progress last year was made in the area of mathematical equations in manual pages. Now I admit that the eqn language is not used as much as mdoc, man, and tbl, but there are some manual pages containing mathematical equations, particularly in X.Org and in the math libraries. The parsing works quite well; Kristaps Dzonsons finished that in 2011. But the formatting was really ugly, and we had to apply some paint to that.
For HTML output, Kristaps rewrote the output module to generate MathML, and in that context he also switched the output to HTML5. That was actually quite straightforward: the parse tree falling out of the eqn parser can be translated one-to-one to MathML, and it's less than 200 lines of code. The output looks quite beautiful in a graphical browser; just look at the online manuals on the OpenBSD website, look at a few X manuals containing matrices, and it works quite well. Terminal output at first sight seems hard: how do you format mathematical equations on an ASCII terminal? What GNU eqn and groff do is, they try to move elements half a line up or down and try to draw lines from minus signs and such stuff, and the result is just unintelligible; it doesn't work at all. So I chose a different approach and rewrote the terminal output as a linear text representation. Here you have a few examples of what fractions and matrices and functions look like. That's not pretty, but at least you can figure out what it means, and that is the main thing for manuals. So the status now is that mandoc actually formats equations much better than GNU eqn, both on the terminal and for HTML, while PostScript and PDF are still the domain of the full thing, GNU eqn.
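The idea of a linear text representation can be illustrated with a small tree walker. This is only a sketch in Python: the node classes and the exact notation (parenthesized fractions, bracketed matrix rows) are assumptions for the example and are not taken from mandoc's actual terminal output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frac:
    num: object
    den: object


@dataclass
class Call:
    name: str
    arg: object


@dataclass
class Matrix:
    rows: List[list]


def linear(node):
    """Render an equation tree on one line instead of stacking elements."""
    if isinstance(node, str):
        return node
    if isinstance(node, Frac):
        return "(" + linear(node.num) + ")/(" + linear(node.den) + ")"
    if isinstance(node, Call):
        return node.name + "(" + linear(node.arg) + ")"
    if isinstance(node, Matrix):
        rows = ["[" + ", ".join(linear(c) for c in row) + "]" for row in node.rows]
        return "[" + ", ".join(rows) + "]"
    raise TypeError("unknown node type")
```

For example, `linear(Frac("a+b", Call("sqrt", "x")))` gives `(a+b)/(sqrt(x))`: not pretty, but unambiguous on a terminal.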
Another thing about processing and formatting is internationalised manuals and multibyte characters. Now admittedly, non-English manuals have a lot of problems. They are hard to maintain; even if you try to maintain them, they tend to get outdated, and once they are outdated, they are arguably worse than nothing. However, it doesn't help at all if, in addition to all these problems, the tools are in the way of reading them. For that reason, Kristaps quite early implemented basic UTF-8 support, but in the same way as it is done in groff: it required a preprocessor to transform the UTF-8 input into roff escape sequences, and it required a specific output option to tell the thing, OK, I want UTF-8 output. So in addition to all those problems, when actually viewing the manual pages, you had to take care of special options. Now this year, I integrated the preprocessor right into mandoc, so that the input encoding is automatically detected, and I switched the default output mode from -T ascii to -T locale, which means that as long as you use the POSIX locale, it doesn't make any difference for you, but if you have your stuff set up for UTF-8 output anyway, then it just works. So right now, viewing a Japanese or a Russian manual is no longer more difficult than viewing an English manual, and I think that is how it should be.
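Automatic detection of the input encoding can be sketched as "try UTF-8 first, fall back to a single-byte encoding". This Python sketch is a simplification under stated assumptions: the function name is made up, ISO-8859-1 is assumed as the fallback, and the real detection rules in mandoc may differ in detail.

```python
import codecs
from typing import Tuple


def decode_manual(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding) for a manual page source file."""
    # A UTF-8 byte order mark settles the question immediately.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    try:
        # Plain ASCII and well-formed UTF-8 both decode here.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"
```

The point of doing this inside the formatter is that the reader no longer has to know, or tell the tool, how a particular page was encoded.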
So much for functionality. Now let's come to the things that don't work, because, as you probably all know, programming is kind of about getting things wrong. This year, afl found a lot of bugs. A fuzzer is a program that runs another program, trying to feed it varying input and trying to crash or hang it. The specific things advertised about the American Fuzzy Lop program are that it does compile-time instrumentation of the tested code and has a genetic algorithm, such that it can discover the test cases itself and can execute as many functional code paths as possible, with the goal of full functional coverage. Now, getting full functional coverage for terminal output in mandoc on modern PC hardware takes several days of round-the-clock running, but that is exactly what Jonathan Gray, an OpenBSD developer from Australia, did repeatedly since the end of last year, and I was a bit surprised that he found more than 40 issues, grand total.
What were these? About a third of them were cases where we assumed that our data structures had certain invariants, and these were actually violated. There were cases of general invariants violated that way and cases of macro-specific ones. Another third were logic errors, just arising from excessive complexity of the code, in part unavoidable because the language design is so complex. I mean, the topic here is badly nested blocks in the mdoc language; I talked about that four years ago and can't repeat it because of time constraints. And partly it was because of complexity in the implementation: the specific thing that caused bugs was macro rewinding. So you open a block and then don't close it again, or you close a block that is not even open, and all that has to be handled, and if you do that in creative ways, it might crash the program. So these were two thirds, invariants and complexity. The remaining third were the things you expect to cause security issues, like missing input validation, buffer overflows, use after free. So it's not
only interesting to look at the causes of bugs, but also at the severity of
[inaudible] Well, the routines in the OpenBSD libc are mostly routines for concatenating and copying strings, like strlcat and strlcpy, but I am not aware of any OpenBSD-specific string parsing routines. And that is the other thing: the top five issues found were read buffer overruns, where a string parser parsed a string that was broken in such a creative way that it skipped over the final NUL. And then, if you return that pointer to the calling code and the calling code modifies the buffer, it can even end up as a write buffer overrun. One lesson learned here is: this code was written by moderately experienced people in the context of OpenBSD, so you should assume they were aware that there are dangers, and still these things were in there. And what is particularly interesting is... wait, I'm just losing power. The particularly interesting thing was that the easier the stuff would have been to avoid, the more dire the consequences. So if you pay attention from the beginning and then, after a certain time, audit the code once again, looking for the simplest things, like being careful when passing around parse pointers, making sure you don't pass the NUL, watching out for use after free, not forgetting to validate input, being careful with arithmetic operations, then you will probably still find stuff and find things to
fix. And keep in mind, all this was in code written with security in mind from the start. It cannot be stressed enough that the same well-known things come up again and again. The largest number of bugs in absolute numbers was in the most complex code, so yes, it does pay off to avoid complexity if you can. The distribution of the bugs across the various modules was more or less proportional to the size of the module, which is not a big surprise, so yes, it does pay off to keep code small. In this case we had about 1.5 serious bugs per thousand lines of code. Something between half and three serious bugs per thousand lines of code is a range that you might expect. In this case we had a few aggravating factors; in particular, the languages we are parsing have no formal definitions, nothing has ever really been written down cleanly, and they were not designed according to any strict paradigms but rather evolved historically. So part of the requirements and design goals were not known from the start but discovered piecemeal, so again and again we had to break existing logic and change existing invariants, and that might have contributed to this. One thing that would have helped tremendously, though I didn't find the time yet: if you are working on an important project and have the time to spend the effort, I would recommend to explicitly specify, for all your main data structures, which invariants you intend to guarantee, then audit the code for whether all places changing these data structures actually respect the invariants, and also audit the code for whether all places reading from the data structures don't assume anything except the invariants you have explicitly specified. In this case, that would have caught about a third of all those issues, which is quite a substantial fraction.
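The recommendation to write invariants down explicitly can be made concrete with a checker that is run in tests or under a fuzzer. The following Python sketch uses a made-up syntax tree and two made-up invariants (parent pointers are consistent, and no node is reachable twice); it illustrates the technique and does not describe mandoc's data structures.

```python
class InvariantError(Exception):
    """Raised when a documented invariant of the tree does not hold."""


class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        # The only sanctioned way to link nodes; it maintains both invariants.
        if child.parent is not None:
            raise InvariantError("node already has a parent: " + child.name)
        child.parent = self
        self.children.append(child)


def check_invariants(root):
    """Verify the documented invariants for the whole tree below root.

    1. Every child's parent pointer refers to the node listing it.
    2. No node is reachable by more than one path.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            raise InvariantError("node reachable twice: " + node.name)
        seen.add(id(node))
        for child in node.children:
            if child.parent is not node:
                raise InvariantError("bad parent pointer: " + child.name)
            stack.append(child)
```

Code that reads the tree may then rely on exactly these two properties and nothing else, which is the second half of the audit described above.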
Broken stuff: not only code is broken, manuals are broken, too. What do we do about it? Well, we tell the authors and the porters. For that, we have three message levels. The lowest one is warning. A warning means the author should be aware that the quality of his code could be improved; it's still clear what he means, but it might cause portability problems with other tools. An error means the author has written something where we don't really know what it means; it's inconsistent, it's likely that information gets lost and the user doesn't see the full text he is intended to see, or the formatting might be completely clobbered. That's an error. This year I introduced a third level, which is called unsupp, for unsupported. That's not so much for manual authors but for porters. It tells them that mandoc has the impression, OK, this is probably valid code, but I know I can't handle it yet, so don't use mandoc for formatting this particular manual. Then there was historically a fourth level, and I'm quite proud of finally, after five years, having gotten rid of it. It was called fatal. That means if you threw a manual at mandoc, it replied, no, that's so weird, I don't give you any output. But there is some text in it, so it should display that text, and we have finally reached the point that whatever you throw at mandoc, you get some output, even if it's empty; it no longer dies. Good.
The problem of broken manual pages is not really hard in the base system: if you find a broken manual there, you fix it and are done with it. But in ports, that's not really an option. You can try sending patches upstream, but it's not likely that anything will happen. The good news is that by now, after several years of development of mandoc, about 95 percent of ports manuals just work. But what about the remaining 5 percent? In OpenBSD, we mark those ports where the manuals don't work with mandoc with the USE_GROFF variable in the Makefile, and there are still about 200 such ports. These manuals are preformatted at port build time, and the formatted versions are packaged. The advantage obviously is that end users get perfect manuals for every port, well, from the formatting perspective; the content may still be off. The inconvenience is that you need support in the ports infrastructure for such a thing, which Marc Espie wrote years ago and which works, and the porters need to maintain this USE_GROFF variable for every single port. For that reason, to avoid this
work, FreeBSD has chosen a different way. What they do is, the man program doesn't run mandoc right away but first asks mandoc: what do you think about this manual page, can you deal with that? If mandoc says, yeah, looks good, it is run again, that time for real, and if mandoc says, no, I don't like that, then groff is run instead. The inconveniences are considerable, of course, because at the time you run the man command, the page has to be parsed twice, and that costs time. It is particularly bad in case mandoc doesn't realize it is unfit for the job, if it is too confident, with the user getting completely misformatted output. On the other hand, if mandoc is too shy and says, no, I don't want that, even though it could handle it, then time is wasted on running groff. So it's kind of a trade-off which way you do it. NetBSD has a very creative way to handle that: if I understand correctly, they just ignore the whole problem, and it seems to be good enough for them. I don't hear complaints about those probably about 5 percent of broken ports manuals in NetBSD.
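The ask-first, fall-back-second logic described for FreeBSD can be sketched as a small decision function. This is an illustration in Python, not the FreeBSD man implementation; the function names are made up, the probe assumes that `mandoc -T lint -W unsupp` exits non-zero when it meets something unsupported, and the groff command line is a placeholder.

```python
import subprocess
from typing import Callable, List


def mandoc_accepts(path: str) -> bool:
    """First pass: ask mandoc whether it believes it can format the page."""
    result = subprocess.run(
        ["mandoc", "-T", "lint", "-W", "unsupp", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def formatter_command(path: str,
                      probe: Callable[[str], bool] = mandoc_accepts) -> List[str]:
    """Second pass: pick the command that actually formats the page."""
    if probe(path):
        # The page is parsed a second time here; that is the cost of this design.
        return ["mandoc", path]
    # Placeholder fallback; a real setup would use its own groff pipeline.
    return ["groff", "-mandoc", "-Tascii", path]
```

Both failure modes from the talk are visible here: a probe that wrongly returns true yields misformatted output, and one that wrongly returns false spends time on the slower fallback.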
In the future we might improve it in two ways. One way would be to improve low-level roff support in mandoc and to remove USE_GROFF from various OpenBSD ports. Another way would be to improve the -Wunsupp logic such that the number of problems in FreeBSD is reduced. At least [inaudible] and myself hope that these two ways will ultimately converge, so that we can go the NetBSD way and everything just works with mandoc, but that will still take some time.
OK, at this point I'm through with the mandoc toolbox in the strict sense. Now I'm talking about one of the companion tools, a converter from the Perl POD format to mdoc, because in that area we have made quite some progress during the last few months. Why is POD relevant? Well, after the mdoc format used in the BSDs and the man format used in Linux, I guess that's the third most used format for manuals. It is used by Perl, it is used by OpenSSL, it is used by ffmpeg, by various projects. Usually these pages are converted to the old man format by the pod2man program, which is itself written in Perl. The downsides are that you get no semantic searching and that the developers have to learn another formatting language, the POD language, which is less powerful. So learning two languages, where one is even less powerful than the other, doesn't make a lot of sense. So we have decided to convert the LibreSSL manuals from POD to mdoc. Anthony Bentley has done half that work already last year, and it is committed, so the libssl manuals are done, and I am currently working on the libcrypto manuals [inaudible].
And for Anthony it was still quite hard, because he only had a prototype of the pod2mdoc tool, and it required a lot of manual postprocessing. I've
now improved a lot of details, in particular things like white space, closing punctuation, and quoting. You might say, well, that all seems quite minor, but you need to keep in mind that the goal is to commit the converted manuals, so the generated code must be clean and maintainable, because after that, developers are going to hand-edit it. Fixing it up by hand is quite tedious; we're talking about hundreds of manuals here.
But admittedly, when talking about improvements of this pod2mdoc converter, the conceptual things are even more interesting. Keep in mind that the POD format has no semantic markup; it basically only says things about the physical formatting, bold, italic, and so on. But in the mdoc output, you want semantic markup. So I have written some heuristics that really look at the text, where there are parentheses and commas and blanks, and figure out, oh, this might be a function declaration, in particular in the SYNOPSIS, and add the missing markup on the fly for function types, function names, function arguments, and so on. And not only that, but it also uses hash tables, using the ohash library written by Marc Espie, and remembers the names, such that when function names or argument names reoccur later in the DESCRIPTION, they can be found in the hash tables and the correct macros can be inserted in the text, which considerably reduces the amount of manual postprocessing that needs to be done. I released all that last month, so it's quite new. One thing that is interesting about this is that it is somewhat similar to stuff Eric Raymond has done in the context of the tool he is using for converting man(7) manuals to DocBook; what is it called, I've forgotten the name. So the dream for the future is that we might use logic similar to the one developed here to extract semantic information from man(7) manuals and enable the semantic searching that we have for mdoc documents for man(7), too. But there is no clear concept yet of how to do that; it's just an observation that the algorithms needed for extracting this information from various formats, be it POD or man or whatever, are quite similar. Good.
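The two-step heuristic described above, recognising function declarations in the SYNOPSIS and then remembering the names so they can be marked up when they reoccur, can be sketched as follows. This is a simplified Python illustration, not the pod2mdoc implementation (which is written in C and uses ohash); the regular expression, the function names, and the use of a plain dictionary in place of the hash table are assumptions for the example. The emitted macros `.Ft`, `.Fo`, `.Fa`, `.Fc`, and `.Fn` are standard mdoc(7) macros.

```python
import re
from typing import Dict, List, Optional

# Return type, function name, and a parenthesized argument list ending in ';'.
DECL = re.compile(
    r"^\s*(?P<type>[\w\s\*]+?)\s*(?P<name>\w+)\s*\((?P<args>[^()]*)\)\s*;\s*$")


def convert_declaration(line: str, names: Dict[str, str]) -> Optional[List[str]]:
    """Turn one C prototype into mdoc lines and remember the names seen."""
    m = DECL.match(line)
    if m is None:
        return None
    names[m.group("name")] = "Fn"
    out = [".Ft " + m.group("type").strip(), ".Fo " + m.group("name")]
    for arg in m.group("args").split(","):
        arg = arg.strip()
        if not arg or arg == "void":
            continue
        out.append('.Fa "{}"'.format(arg))
        words = re.findall(r"\w+", arg)
        if len(words) > 1:
            # The last identifier of "type name" is the argument name.
            names[words[-1]] = "Fa"
    out.append(".Fc")
    return out


def mark_up(text: str, names: Dict[str, str]) -> List[str]:
    """Insert macros for remembered names reoccurring in running text."""
    out: List[str] = []
    plain: List[str] = []
    for word in text.split():
        m = re.match(r"^(\w+)(?:\(\))?([.,;:]?)$", word)
        macro = names.get(m.group(1)) if m else None
        if macro is None:
            plain.append(word)
            continue
        if plain:
            out.append(" ".join(plain))
            plain = []
        line = ".{} {}".format(macro, m.group(1))
        if m.group(2):
            # Closing punctuation becomes a separate macro argument.
            line += " " + m.group(2)
        out.append(line)
    if plain:
        out.append(" ".join(plain))
    return out
```

With the prototype `int SSL_read(SSL *ssl, void *buf, int num);` processed first, the sentence "SSL_read() tries to read num bytes into buf." comes out with `.Fn SSL_read`, `.Fa num`, and `.Fa buf .` on their own lines, which is the kind of markup that would otherwise have to be added by hand.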
Let's get to the status in OpenBSD. Most of the work was already done last year, so there is a lot that was completed by 2014. In particular, mdoc has been the only documentation format in the OpenBSD base system for almost five years now, and the search tools had been switched before last year's conference. The main progress here since last year is the new unified interface for the formatter and search tools and the switch of the manual viewer to the new implementation, and all that was released last month with OpenBSD 5.7. So even if you install OpenBSD-stable now, you get all that. In FreeBSD, part of that will only be in, not the next release, but the release after that, FreeBSD 12.
FreeBSD has made even more progress than OpenBSD this year. Last year I could only say, OK, they have it in base but don't use it. Now, Baptiste Daroussin has done tremendous work here. He switched the default formatter at the end of last year, and he has been using the unsupp option I explained for ports manuals since March this year. He switched the search tools a week ago. The code in FreeBSD is completely up to date with the latest stable mandoc release. All that is going to be released with FreeBSD 11, and the only thing postponed until FreeBSD 12 is switching the man implementation, because they are sensible enough to say, we shouldn't change everything at the same time; that might alienate users, and people might get upset if everything changes at the same time. But very impressive progress in FreeBSD.
Unfortunately, in NetBSD and DragonFly, almost nothing happened. NetBSD is using mandoc as the default formatter, along with FreeBSD, but they don't have the semantic search tools; they have their own search implementation, which is doing full-text search but lacking semantic search. And DragonFly is still having
it but not using it. Another system that made impressive progress is illumos. In previous years, we often cited Solaris-based systems as examples of very old, very traditional stuff that didn't even have any mdoc implementation. illumos has decided to translate all their manuals, bit by bit, from man to mdoc, so the same thing Cynthia Livingston did for BSD in 1990, they are now doing that, too, and it will be interesting to see whether they complete it in one year like she did; I guess not. Garrett D'Amore switched the default formatter in the same commit as he imported mandoc into the base system, so now they have mandoc and they are using it, not the newest version, but it seems to just work for them. That is the first system, grand total, that did the switch that way, and the first non-BSD system. In Linux
there are, surprisingly, two distributions that are completely relying on mandoc. Both are very small ones, Alpine Linux and Void Linux, but they have everything: they have the search tools, they use the manual viewer, they have the latest release. Alpine Linux was the first non-BSD system ever to use the mandoc-based man. There are a few other ports: [inaudible] Linux has an official port, Slackware and CRUX have unofficial ones. But none of the major Linux distributions is really doing anything so far, even though ports have been available for all of them, unofficially [inaudible]. We shall see what happens there. [inaudible] It would be nice; it would somehow fit the Linux philosophy more or less well. But let's just be patient, and if they need help, they should just come to us, because Kristaps and myself will try to help them if they need any help. Other
operating systems: Minix has it in base, but has somehow been completely apathetic for five years or something. There are several user communities on OS X, and there are ports in both Homebrew and MacPorts. There is even a halfway up-to-date port for Windows. All such information is periodically updated on our website, so you can see the status also
between conferences. So the status summary is: fully integrated in OpenBSD, Alpine Linux, and Void Linux, and also in FreeBSD-current; default formatter in NetBSD and illumos; at least in the base system in FreeBSD 10, DragonFly, and Minix; official packages exist for [inaudible], Arch Linux, and pkgsrc; and then there are a few having unofficial packages or outdated packages.
I am regularly announcing goals for future conferences. Of those that were announced for this year, we reached the replacement of man.cgi, the integration of preconv, the switch to locale output by default, and replacing man in OpenBSD. Several
things are in progress. We are working on the libcrypto manuals in LibreSSL and improving pod2mdoc to facilitate that. I am unifying the parsers, aiming for better roff support in the future; that's a very complicated subject that I really couldn't cover in this talk; if I were to talk about that, I would have to give a full talk [inaudible]. The -Wunsupp logic is still being improved. We want at some point to delete all those redundant hard links in the file system. One thing FreeBSD already did two months ago, but I only learned about it yesterday, so it's not on the slide here, is that texinfo is no longer in FreeBSD base: they used the texi2mdoc utility that Kristaps wrote to convert all the texinfo documentation to mdoc, and I'd like to do that in OpenBSD, too. This is a very nice example of FreeBSD leading the way. What's a bit stalled is providing help with man to mdoc conversions, but that can be picked up again; that's basically with Kristaps, starting with the docbook2mdoc tool. Let me mention two things that have not yet been started. At some point, I dream of using pod2mdoc not for conversion with manual postprocessing, but inside build systems: instead of running pod2man on the Perl manuals, we could run pod2mdoc, convert all the POD to the mdoc format, and then we would get semantic searching in Perl manuals in the BSD base systems for free. And another thing: one thing that is good about info, not talking about all the problems that people have with it, is internal linking within manual pages. So as long as we don't change the basic way manual pages are built, self-contained, linear, easy to navigate by hand, if we get additional options for linking inside manual pages, that would help a bit. One idea to do that was functionality similar to ctags that could be integrated into less(1), but that really needs to be
worked out. To conclude, I would like to thank a few people: Kristaps Dzonsons, of course, the original author of mandoc, who again contributed quite some code this year, for example the new eqn parser and the HTML5 and MathML formatter; then of course Jonathan Gray, who did extensive testing with afl, reporting more than 40 important bugs;
Baptiste Daroussin for tremendous work on FreeBSD system integration and also for sending source code patches; [inaudible] of OpenBSD for removing USE_GROFF from many ports and helping with OpenBSD porting [inaudible]; Thomas Klausner of NetBSD for pkgsrc maintenance, and while NetBSD is lagging a bit in the base system, Thomas is doing exemplary work in pkgsrc; and Natanael Copa of Alpine Linux, who did the system integration there and proved that mandoc can be used as the system manual formatter on Linux, too, and who also suggested the implementation of man(1). He said, so why don't you implement man, and I said, well, that's a bad idea, and last year at BSDCan I said that's out of scope, but then I stepped back and thought, well, actually, why not, we already have almost all the code that's needed, it just needs to be shuffled around. This year there were quite a few people contributing patches, and of course even more people who reported bugs or suggested features.