Everybody wants (someone else to do) it: Writing documentation for open source software
Formal Metadata
Number of Parts | 183
License | CC Attribution - NonCommercial - ShareAlike 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/32151 (DOI)
Production Year | 2015
Production Place | Seoul, South Korea
FOSS4G Seoul 2015, part 112 / 183
Transcript: English(auto-generated)
00:03
You're not going to get off that easily, so I have a question for you. What exactly is innovative about writing documentation for open source projects? I was hoping you could fill us in on that one, Jody. You've got 20 minutes to do it. Well, I'll give it a go. So, welcome to the last afternoon of FOSS4G before we
00:24
start the glorious code sprint tomorrow. Today, I'm going to present a talk that's actually not my talk. This talk is put together by Michael Pumphrey. He's our community advocate at Boundless, and by that I mean that he's really pro-user. He helps tame the wily developer,
00:43
he beats English sentences out of them, and he acts as a buffer or translator between all the technical stuff that goes on and real people who just want to get stuff done. So, Michael Pumphrey's a really nice man. He's unfortunately not here, so you have me.
01:03
This is kind of what I look like. I look a bit like a bandit in this picture because I'm at the Philadelphia code sprint, and it's rather cold outside, so I'm rugged up. I work on a number of projects, including GeoTools and GeoServer. I also work with the Eclipse
01:21
Foundation, LocationTech, and the OSGeo Foundation. So, just in terms of getting started here, I want to give us a rough idea of what we're up against as a software project. If you had spent $30,000 on a software project, you might spend a week learning
01:42
it, you might take a training course, you'd kind of care a little bit. If you spend zero dollars on an open source project, how much time are you going to spend? Five minutes? Ten minutes? Fifteen minutes? We really need to focus on people that have downloaded and are trying out our software. We really need them to see visual results right away,
02:03
or they're going to go on to the next project. So we've got really stiff competition in the open source world. In terms of documentation priorities, I want to keep that perspective in mind. I really want to focus, as a matter of life and death for an open source
02:22
project, on making installation easy. So we really want to ensure potential users can actually install your software. I know it sounds obvious, but it's a hurdle that a lot of projects stumble over. The other thing we want is some kind of quick start. We want to make sure that users can install the software
02:40
and see their own data in it. There's a lovely project that the Open Source Geospatial Foundation has put together to help with both of these aspects. It's called the OSGeo Live DVD. It's not actually a live DVD anymore, is it? It's a little USB stick, so I think it's just called OSGeo Live now. That USB stick includes a lot of open source software installed and
03:04
ready to go, and it crucially provides quick starts that you can follow to see these projects on screen. Just to go over what we want out of installation, we really want to get the software installed. We want to show each step. We want to show a picture
03:21
of each page of your installation wizard. People are really going to be comparing your instructions with what they're seeing on their own screen, because they're desperately nervous they're going to do something wrong. So it's really a sanity check to make sure things are going okay. The other thing that's really difficult if you're a
03:40
software developer: please don't get distracted explaining all the wonderful options that probably only one or two people in the world care about. Just focus on getting the software installed. The one or two people can come and talk to you later. One thing I've seen recently is the use of installation videos. Who here has watched a how-to-install PostGIS
04:04
video or something like that online? Anyone? A couple of people. They're very friendly. People get to hear your voice and follow along on screen. It is good because it actually shows the software working, so that can provide a sense of hope. There are a couple of
04:20
things to keep in mind. It can be really hard to find an installation video that matches the version of the software you're using. It can also be really hard to follow the steps that are being taken. You end up scrubbing the video back and forth, and it can be very frustrating. I do recommend using video to showcase new features and so forth, but
04:40
please do stick to documentation for installation instructions. In terms of a quick start, we're once again looking to show rather than tell. We're going to try and focus on the key value of your software. For GeoServer, we really want to get GeoServer on screen and show it off as a web map server. Not too many people are interested
05:03
in WPS (Web Processing Service), WCS (Web Coverage Service), or how to configure the latest performance options. We really just want to get the project on screen and show how amazing it is. Another thing that people run into is the dreaded Stack Exchange. This is where we all
05:21
wave at Ian. Hi, Ian. Ian's been a very responsible member of our community, lurking on Stack Exchange and helping answer people's questions. It's a question-and-answer site. Now, this is more of an alternative to a user email list than it is to documentation, so it's really good at helping answer questions that get asked
05:41
repeatedly on the user list. Once again, it can be hard if you're working with an older version of GeoTools or an older version of GeoServer to hunt down exactly what you need to do. Just in terms of writing tips and tricks, I just want to once again remind everyone that these slides are from Mike Pumphrey. I'm part of the problem
06:03
he's trying to warn against in terms of these writing tips and tricks. I'm absolutely terrible at spelling, and I depend on people like Mike to clean things up. The first tip I want to give is: when you're writing instructions, please don't use the word easy. The only thing that can happen here is that users feel frustrated because
06:22
it's not easy. It might have been easy when you wrote the software two years ago, but chances are you're writing for the future. Maybe you're referring to the Start menu, and the Start menu is gone in Windows 8. Or maybe you're talking about the Windows, I don't know, Start screen, and then all of a sudden the Start menu is
06:41
back in the next one, Windows 10. So please don't use the word easy. It can only serve to frustrate people. The other word to be careful of is simple or simply. A lot of people are coming to geospatial software from different backgrounds, so something that is simple for you might
07:04
not be simple for everyone else out there on the internet. So the only thing that can happen here is that you make people feel stupid if they don't understand something you're describing as simple. Just one piece of software on its own is pretty complex. Once you start putting these
07:21
pieces of software together, it gets more and more complex. Sometimes people will be using this software in the future in an operating system you've never heard of. So yeah, just keep in mind, stay away from these words. Now with that in mind, I'm going to be a little bit mean to the GeoTools and GeoServer community. So here's a sanity check. How does GeoTools do? We
07:42
found 46 cases where someone's used the word easy. In GeoServer we found 40, and for Boundless OpenGeo Suite, we're doing a little bit better at 19. In terms of checking for simple, we're doing a lot worse. For GeoTools we're getting over 100, same for GeoServer, and same for OpenGeo Suite. Now in our
08:03
defense, the OGC has defined something called Simple Features for SQL, or just Simple Features, so a lot of these hits are false positives. So I'm going to have to cut the team a little bit of slack. One thing I want to stress is that, when you're thinking about writing, you should focus on getting any content at
08:25
all from your developer community. We've all heard "don't let perfect be the enemy of the good." In the open source world, I want to soften that a bit: don't let good be the enemy of nothing. So often if
08:40
you complain or hassle people too much, they might walk away. So really focus on getting any content at all; that's a good thing. And we also have to relax a little bit about things like grammar. This is, once again, Michael Pumphrey talking. I relaxed about grammar a long time ago. Just
09:01
going to talk a little bit about curation of content. Both GeoTools and GeoServer are very long-lived projects. When did GeoTools start, 1996? Something like that. GeoTools version 1 actually died due to lack of documentation. The developers found that they were
09:22
spending so much time answering user questions that they didn't have time to actually fix bugs. So they did the only thing possible. They fired the user list and started GeoTools 2. So here's just an example of what we're looking at in terms of curation. This is a page in the GeoServer 2.7 docs about the GeoExt Styler. This is a feature
09:45
that hasn't existed in GeoServer for how many years? GeoServer one at least, right? Yeah, so I deleted this a couple of weeks ago. Here's one of the latest features of GeoServer. We've got this wonderful security model called GeoFence, and there are no docs for the GUI. So
10:04
we've got some work to do. And here's another interesting one, for complex features. They're the opposite of simple features. And you can see that after many years of being a popular part of the library, there's still no example code for new developers. So we certainly have work to do. So how did this happen? It can be very
10:23
easy to forget or let slide some of the maintenance work involved in projects. Documentation and quality assurance can be some of the first things to go. For projects that have been going a long time, developers often hope to get back to something later. But often,
10:41
without a champion, a feature such as that GeoExt Styler can be removed and no one thinks to clean up the documentation. The other thing that happens, especially in our modern commercial open source world, is that features get produced under contract. So there can be a great deal of social
11:01
pressure to get a new feature in so that a development team can get paid. But we risk the development team moving on to their next bit of work, hoping or assuming a volunteer will eventually write the docs. So we've got a real tension there. One of the things that did not work for us is using a wiki. Using
11:24
a wiki was something we tried in 2004 or so, and the hope was that our user community could help us write docs. GeoTools eventually had to give up on using a wiki. After acting as editor for a while, I found that I had actually written 90% of the wiki. Code examples especially were really out of
11:41
date, and the library was getting a bad reputation: the examples were hard to follow, so people assumed the library was hard to use. And for GeoServer, OpenGeo, the forerunner to Boundless, migrated to a fresh set of docs. They did a complete rewrite rather than start with the
12:02
wiki as one of their initiatives when they set up OpenGeo Suite. So what has worked? At Boundless, we have a policy (especially important when we do professional services work) that a feature is not done until it's documented. And I really encourage other
12:20
organizations working with open source to take this attitude. As for what Mike has found works, he asks developers to write low-level notes, and he translates these for the world to understand. The other thing he's done on occasion is interview developers to ask them about a feature, and then from that interview he can
12:41
write up the documentation. Now, I'm one of the people that Mike would be interviewing. What I've found works is including the cost up front in a GeoTools proposal: we have a little section where you write down the scope of work, and I make sure to include documentation and QA as part of that scope of work. The other thing I do as a volunteer is, when a feature
13:02
is likely to get cut or not included, I might step forward and write a couple of paragraphs of documentation just so the functionality gets in. We've also made a policy change in GeoServer: to get included in the project, documentation and QA test
13:22
coverage are now a requirement. This really enforces the idea that if consultants want to get paid, they have to provide docs. We've actually seen some success with this: our docs now do a much better job of documenting new features, and consultants know what's required up front. But this is a
13:42
barrier to the casual drive-by GitHub commit. People often balk if we ask them to write docs. Now, GeoServer documentation was originally written in HTML. We migrated it from HTML to a wiki as a bit of a community outreach attempt, but
14:01
gradually this became out of date, and as I said, OpenGeo moved from the wiki to having the docs versioned with the code base. This has really been helpful because a matching version of the docs gets released every time the code gets released, so people can find the docs that match what they're using. The other thing that's happened is that
14:22
Boundless has volunteered Mike to act as an editor. As for GeoTools, its docs actually started in DocBook of all things, which was a terrible XML thing. We moved to the Codehaus wiki, and as I said, after a number of years I found I'd written most of the wiki. The
14:41
other thing that happened to us was vandalism. Wikis were very prone to vandalism, and that's just because it's the internet and we can't have nice things. So we've moved the docs into the code base, much like GeoServer. It's a little bit harder for community volunteers to help out with, but I'll show you how they can do that in a bit, and the benefit is it's
15:01
now versioned alongside the code base, and we can also include live code examples, which is really nice. In terms of the effort it took, it took about, I don't know, a couple of days with pandoc to convert the material, and then about six months to clean up the mess, so it was one of my bigger volunteer projects. We use a toolchain called
15:25
RST (reStructuredText) and Sphinx. To install Sphinx to work on docs, you install Python, then use the easy_install command line to install Sphinx, and then you can run make to build our docs. This is kind of what our docs look like.
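For readers new to this toolchain: sphinx-quickstart generates a conf.py settings file plus a Makefile, and "make html" then calls sphinx-build, which reads conf.py. Below is a minimal sketch of such a conf.py, assuming a plain reStructuredText project. The project name, author, and version are placeholders, not the GeoTools or GeoServer settings. (easy_install has since been deprecated; "pip install sphinx" is the usual route today.)

```python
# conf.py: minimal Sphinx configuration sketch.
# All values are placeholders for illustration, not any real project's settings.

project = "Example Project"              # hypothetical project name
author = "Example Project contributors"  # hypothetical author string
release = "1.0"                          # hypothetical full version string
version = "1.0"                          # short X.Y version

# Sources are reStructuredText files; the table of contents starts at index.rst.
source_suffix = ".rst"
master_doc = "index"  # Sphinx 4+ also accepts the newer name root_doc

# No extensions are needed for plain reStructuredText pages.
extensions = []

# Keep build output out of the source scan.
exclude_patterns = ["_build"]

# Sphinx's default HTML theme and syntax-highlighting style.
html_theme = "alabaster"
pygments_style = "sphinx"
```

With this file next to an index.rst, running "sphinx-build -b html . _build/html" (or "make html" with the generated Makefile) should produce the HTML pages.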
15:42
They look like text files that have been formatted, so it's a wiki-like format with structured text, and the killer feature for GeoTools here is the ability to include live code examples that don't go out of date. In terms of how to help casually update our docs,
16:02
in case you see a typo, you can browse to the source directly on GitHub, find the file that you want to edit, and click the little edit button. Then you can make the change directly in the file, scroll to the bottom, and provide a little commit message, and that will be sent to the
16:21
development team as a pull request, and it'll get reviewed, probably by Mike. I would do that as a live example, but I'll keep motoring. In terms of a writing guideline, GeoServer is a little bit more formal than I'm used to. They actually have a writing guideline, so this really helps the community provide a consistent voice when they're writing
16:41
documentation, and it also allows Mike as an editor to be fair and consistent when he's reviewing documentation. We found that often developers don't understand all of Michael's fancy English grammar words, so I really lean on his good and bad examples. So one of the things we do try to do
17:01
is be concise; we don't want developers providing a brain dump of everything they've ever thought. Reference material should be short and scannable, because people are usually just looking for a few specific facts. Tutorials can be longer, as you would imagine. Often we find developers like to have an opinion. We are happy for them to have an opinion,
17:22
but we want it in a blog post, not in our documentation. The other thing is, even though people are really excited about their work, we don't want to hear the excitement when they're describing the feature. If they want to showcase or do marketing, then please write a blog post. So here's a bad example.
17:40
Super overlays are a great way to publish super cool data sets awesomely in Google Earth. That's a little bit over the top. Instead, we want to take a technical angle if there is something positive to say. Super overlays allow you to efficiently publish data via Google Earth. We also try to avoid the use of slang. Slang differs across the world,
18:01
and can be very opaque, especially if you're coming to it with English as a second language, making it hard to understand what people are referring to. So we try to be very succinct and professional in our communication. We also try to provide direct commands. So rather than say it a little bit backward,
18:22
"now let's add a shapefile by...", we're going to be very direct: "add a shapefile." We also follow Wikipedia naming conventions. I don't mind what naming conventions you use, as long as you're consistent. GeoServer is also an international application, so we make use of a service called Transifex
18:43
for translations, and this is a website where you can log in and grab the text files out of GeoServer and start to translate them. Any questions on that? I'm probably going pretty quick.
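The "easy" and "simple" counts quoted earlier in the talk presumably come from a text search over the documentation sources. Here is a minimal Python sketch of that kind of check, assuming the docs are .rst files under one folder. It counts the discouraged words as whole words, case-insensitively, and first strips the OGC term "Simple Features", the false positive mentioned in the talk. The word list and function names are illustrative, not the script Boundless used.

```python
import re
from pathlib import Path

# Words the talk recommends keeping out of instructions (illustrative list).
DISCOURAGED = ("easy", "easily", "simple", "simply")

# "Simple Features" is an OGC standard name, so those hits are false positives.
SIMPLE_FEATURES = re.compile(r"\bsimple\s+features?\b", re.IGNORECASE)


def count_discouraged(text: str) -> dict[str, int]:
    """Count whole-word, case-insensitive hits for each discouraged word."""
    cleaned = SIMPLE_FEATURES.sub(" ", text)
    return {
        word: len(re.findall(rf"\b{word}\b", cleaned, re.IGNORECASE))
        for word in DISCOURAGED
    }


def scan_docs(root: str) -> dict[str, int]:
    """Total the counts over every .rst file below root."""
    totals = dict.fromkeys(DISCOURAGED, 0)
    for path in Path(root).rglob("*.rst"):
        counts = count_discouraged(path.read_text(encoding="utf-8"))
        for word, n in counts.items():
            totals[word] += n
    return totals
```

For example, scan_docs("docs") would total the hits under a hypothetical docs folder. Because the search uses word boundaries, a command name such as easy_install is not counted as "easy".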
19:02
Someone must have a question, or a horror story of documentation that wasted their time, come on. Oh, it is too.
19:22
He beat me to it. Hey, what I wanted to ask: do you somehow track what's documented or not? Do you have some kind of a map? For instance, for test cases there's the code coverage metric, so you can say whether your test cases cover your code or not. Do you have something like that for documentation,
19:41
so you know what's documented, what's not, and what to document next? We don't have that. We've got a bug tracker, and we use that to find holes in our docs, so it's something we depend on our user community for. But I was talking to someone from the QGIS project earlier, and they've taken the stance of having their documentation right next to the source code in the source tree,
20:03
so when the feature gets deleted, the documentation gets deleted at the same time. We don't have any kind of failsafe like that in the GeoServer or GeoTools docs. It is probably a good idea. Thank you.
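One partial failsafe, not something the talk describes, is to check that every file a page pulls in with literalinclude still exists, so deleting example code surfaces stale docs. Sphinx itself warns when an include file cannot be read, and "sphinx-build -W" turns warnings into errors. The sketch below does the same existence check standalone. It assumes every include path resolves against one docs root, which simplifies Sphinx's real rules (relative paths resolve against the including document, "/"-prefixed ones against the source directory); the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches directive lines such as ".. literalinclude:: src/Example.java".
LITERALINCLUDE = re.compile(
    r"^[ \t]*\.\.[ \t]+literalinclude::[ \t]+(\S+)[ \t]*$", re.MULTILINE
)


def included_paths(rst_text: str) -> list[str]:
    """Return every file path referenced by a literalinclude directive."""
    return LITERALINCLUDE.findall(rst_text)


def missing_includes(rst_text: str, docs_root: str) -> list[str]:
    """Return the referenced paths that no longer exist under docs_root.

    Simplification: every path, including Sphinx's '/'-prefixed
    source-root form, is resolved against docs_root.
    """
    root = Path(docs_root)
    return [
        p for p in included_paths(rst_text)
        if not (root / p.lstrip("/")).is_file()
    ]
```

Run over each .rst page in a build script, a non-empty result could fail the build in the same way a failing test would.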
20:21
You went over the code blocks in reStructuredText pretty fast. Can you show them again? You had an example of the really awesome, pretty code blocks in reStructuredText. Could you elaborate on that, and on how the reStructuredText gets rendered through Python or whatever format?
20:41
Just trying to see how much time I got. How much time do I got? Two minutes. Two minutes? Okay. I haven't done this all the time, so I'm going to quickly try and do an example. Of course, whenever you type live at something,
21:02
it's kind of terrible. People can't see it. So here's our GeoTools docs. Here's our fabulous quick start tutorial. If I can find it. Can anyone read that at all? A little bit.
21:24
So here we've got a little literalinclude directive. This is a reference to another file. It's a build file, and that's going to be slurped into the documentation at that point. Here's another part where we're including
21:42
part of the POM (Project Object Model) that we're trying to talk about. And if I quickly go to docs.geotools.org and go to our happy little tutorial user guide thing, Quickstart, Eclipse, we should be able to find these little code examples.
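Sphinx's literalinclude directive can pull in just part of a file through its :start-after: and :end-before: options, which is one way to keep a tutorial excerpt tied to real, compiled source. The Python sketch below imitates that line selection so the idea is easy to see. It is not Sphinx's implementation, and the marker comments in the example are hypothetical.

```python
from pathlib import Path
from typing import Optional


def select_lines(text: str,
                 start_after: Optional[str] = None,
                 end_before: Optional[str] = None) -> str:
    """Keep the lines after the first line containing start_after and
    before the next line containing end_before (marker lines excluded)."""
    lines = text.splitlines(keepends=True)
    if start_after is not None:
        idx = next((i for i, line in enumerate(lines) if start_after in line), None)
        if idx is None:
            raise ValueError(f"start-after marker {start_after!r} not found")
        lines = lines[idx + 1:]
    if end_before is not None:
        idx = next((i for i, line in enumerate(lines) if end_before in line), None)
        if idx is None:
            raise ValueError(f"end-before marker {end_before!r} not found")
        lines = lines[:idx]
    return "".join(lines)


def literal_include(path: str,
                    start_after: Optional[str] = None,
                    end_before: Optional[str] = None) -> str:
    """Read a source file and return the selected excerpt."""
    text = Path(path).read_text(encoding="utf-8")
    return select_lines(text, start_after, end_before)
```

Given a Java file containing "// start example" and "// end example" comment lines, literal_include(path, "start example", "end example") returns only the code between them. In a real .rst page, the equivalent is a literalinclude directive with :language:, :start-after: and :end-before: options. Raising an error on a missing marker means a renamed or deleted example breaks the build instead of silently producing an empty excerpt.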
22:03
Here we go. So here's a little bit of that XML file that we included. And one thing that's really nice here is that this will stay up to date as the code base is updated. So you can see that we haven't released GeoTools 15 yet,
22:21
but because these are the latest docs, this file has already been brought up to date. And more importantly, if I scroll down here, there's actually source code, somewhere, source code. And that is a live code example that gets compiled as part of our documentation build
22:40
before Sphinx is run. Does that help? That helps a lot, thank you very much. Okay. You'll also find that all the OSGeo Live documentation is done with this toolchain, so it is something you will see at different OSGeo projects. Thanks everyone for your time,
23:02
and documentation is hard to maintain, so please be kind to the developers and pitch in and help out. Thanks.