
The Toolmaker’s Guide


Formal Metadata

Title: The Toolmaker's Guide
Number of parts: 188
License: CC Attribution 3.0 Germany — You may use, modify, and reproduce the work or its contents in changed or unchanged form, distribute it, and make it publicly available for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Production year: 2014
Production place: Portland, Oregon, United States of America

Content Metadata

Abstract: Opening Keynote, FOSS4G 2014, Portland, Oregon
Transcript: English (automatically generated)
All right, so hello, everyone. The purpose of this talk today is to really just convey my personal approach to tool development, and if I can, sort of distill some guiding principles from my own experience on how to design effective tools but maybe, you know, principles is kind of
overselling it a little bit. It's hard to define these things concretely, so maybe I'm just gonna give some hand-wavy advice instead, but I wanna articulate my thought process and motivations behind some of the tools that you may have used, such as D3 and TopoJSON, and although this talk was conceived primarily as a talk for tool makers
on how to make better tools, I hope that it also speaks to people that use tools as well, which is really all humans, but especially people that are attending this conference, and so, you know, I hope it makes you consider how the tool that you've chosen to use affects your work
and whether, you know, maybe you could switch to a different tool that suits your needs more precisely or even whether you want to become a tool maker yourself to build the perfect tool. Now, before I can really convey how to design a tool effectively, I think we kinda have to step back a little bit
and ask some more basic questions about how tools affect the creative process, and I think the role of tools is often taken for granted. I mean, they help us a lot. We use them. I mean, obviously, we use them, so we don't really think about the role that they play so much, so I'm gonna ask some perhaps obvious questions
or basic questions now, but I think it's a useful starting point, like what is a tool anyway, and why do we use them, and what role do they play? And my answer is that tools are really, the tools exist to increase our efficiency, where efficiency is a measure of the value produced
per the effort expended, and value can mean the quality of the good that you're creating or the quantity, the number of things, and effort can mean the amount of time it takes or the level of exertion required or even mental anguish, right, because your emotional state
when you're doing something obviously matters, and so for a physical tool, that might mean applying leverage to pull a nail out of a wall, or for a virtual tool, it might mean reusing a library in order to build a map more quickly, but the point here is that tools have extrinsic value rather than intrinsic value,
which means they have no value inherent onto themselves, and I think you understand this, right? You can't eat a hammer. It doesn't feed your family directly. You have to use it in order to produce value, and so even though we're asking a fairly obvious question here, I think there's sort of a non-obvious implication, which is that as a tool maker,
you have to remember that your tool has only extrinsic value, so you can't get too enamored with the idea of the tool and the abstract. You instead have to care about how people actually use your tool in practice, and another important consideration is that tools do not increase our efficiency uniformly.
A tool makes some tasks easier and some tasks harder, and the reason for that is that tools are designed by people, or in the worst case, by a committee, to solve particular tasks, and it's tasks within that intended set that are made easier,
and tasks outside that intended set are often made harder, and that's simply because tools don't make tasks easier by chance. They have to be designed intentionally to address those specific problems, so as a result of that, tools then influence how we solve these problems, so all creative processes require
sort of choosing between different competing approaches, and as rational actors, we want to choose the sort of sequence of operations that maximizes the total produced value relative to the amount of effort, so that might mean we want to make the best graphic before deadline, right, we're maximizing value,
or it might mean that we want to balance multiple simultaneous projects, like minimize the amount of effort, or it could even mean that we want to do a reasonably good job, and we still want to spend some time with our families, so tools are really changing the inputs into this cost-benefit analysis that we do when we decide how to solve a particular problem,
and to me this is related to the adage, which you probably heard, Maslow's hammer, if all you have is a hammer, everything looks like a nail, and to me that also speaks to this influence is sometimes subconscious, it's not always overt or obvious how a tool is sort of changing
our decision-making process as we create something. Now one of the sort of canonical examples I think of in my field at least is Excel, if you want to make a scatterplot in Excel, Excel makes a bunch of choices about how that scatterplot should appear, right, so it chooses that there should be a legend and a title,
which is based on the header row here, Y, not very helpfully, it chooses the number of ticks to draw on the X axis and the Y axis, and the color and the font of those ticks, and then of course most importantly it chooses how the actual data glyphs, the dots in the scatterplot, appear,
and so we should really admire the attention to detail that went into these dots, right, there's a diamond with this vertical linear gradient that goes from a light blue to a medium blue, there's a dark blue edge with nice rounded corners, a gorgeous drop shadow, it's really stunning the amount of effort
that went into this dot, and I think they're hoping that everyone will really be impressed with your scatterplot as a result of all of this design work, but I mean obviously I'm being sarcastic, I don't think that this is really the right dot that you should use in most applications, particularly because all these little details
sort of contribute a lot of visual noise, and really you want to understand the pattern in your data and having a lot of noise in these glyphs just ends up being distracting, so this is the type of glyph that maybe it works if you only have 12 points in your scatterplot, in which case you maybe should be just drawing a table anyway and just printing the raw values.
Now of course Excel doesn't force you to make a scatterplot this way, it has an extensive user interface for customizing the appearance of that scatterplot, but the issue is like even if it's possible to customize that, it's still pushing you towards that default appearance, right,
just because something is possible does not mean that people will actually do it, they will actually use this functionality, and that's especially true if the interface for customizing it is sort of an afterthought, it's partially neglected, it's hard to use, like it's this complicated sequence of menus and it's not always clear what these things are doing,
so we have to think about those trade-offs and what the interface is pushing us towards. Now this is a slightly more subtle example, but it's something I think about as well, so you may recognize this, this is a detail of a liquid crystal display, and these are the sort of red, green, and blue elements that comprise a pixel.
So each of these little lights are individually addressable as sort of the color of each pixel, and this type of display really is the basis of the most common color space that we use with computers, RGB, I mean prior to LCDs of course, phosphor displays had similar red, green, and blue dots.
And so that RGB color space is basically direct control over the hardware, and that's not entirely true because there is a slight transformation that's done in terms of your color profile, and the temperature, and the gamma, but relatively speaking it's a fairly low-level interface in order to color.
Now RGB is a perfectly fine color space, particularly for things like photography, you don't really care, it can reproduce a large gamut fairly easily, but it's not a great color space for visualization, and that's because it lacks this property of perceptual uniformity, which means if you increase the intensity
of let's say the green channel by a fixed amount, you perceive that differently than if you increase the intensity of the blue channel by the same amount, right? So the distances in color space don't correspond to our perceived differences, and that means that this color space is basically distorting your data
when you use RGB in order to encode data. Now of course we use RGB everywhere, so this is an example that I made, it's just a toy choropleth map, it doesn't even have real data here, the color is actually the area of each of these counties, and I did that just because I was lazy,
and if you do a random choropleth it's completely useless, like area is actually a fairly reasonable proxy because there is a geographic pattern that's easier than picking a data set. But of course this is a pretty terrible choropleth, both because the RGB color space is not a great way of encoding data,
and because the particular colors that I picked here of steel blue and brown are just kind of muddled, so there's not a lot of contrast here. But I'm including this example mainly to show how, in D3, if we don't do anything, we're encouraging people to use this RGB color space,
which is not a good choice for visualization. And furthermore, if we build these examples and we don't do a good job of picking colors, we're encouraging people to use these bad color scales. So one of the things that I've tried to do since then is, well first we added support for these perceptual color spaces in D3, so this is the same choropleth, but now using the HCL color space.
There's two color spaces that sort of correspond to RGB and HSL: one is Lab color and the other is HCL. HCL is a cylindrical color space, so you can specify the hue, the chroma, and the luminance. Anyway, I hope you can see that this is obviously a better color scale for this choropleth,
has better contrast, and also the changes in this color actually correspond proportionally to the data. And then similarly, another thing I've been trying to do is encourage people to use ColorBrewer. Like ColorBrewer is an excellent set of off-the-shelf color scales, and there is a perfectly fine interface on colorbrewer2.org for picking out your color scales,
but I think there's still more that we can do to make those easier to use. So that includes including examples with D3 that use ColorBrewer scales. The D3 repo has a JavaScript and CSS interface to ColorBrewer to make it easier to apply those scales to your maps or to your visualizations.
And then I've also made just a simple example that lets you point and click in order to get the string of colors that you can then plug into any scale. So really it's about sort of identifying how you want people to do things the right way, and then making those ways sort of more discoverable and easier for people to adopt. You're lowering the barrier to entry.
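To make that "plug the string of colors into any scale" step concrete, here is a minimal sketch in plain JavaScript. The `quantize` helper is a simplified stand-in for D3's scale API (an assumption for illustration, not D3's actual implementation); the hex values are the off-the-shelf 5-class "Blues" palette from colorbrewer2.org:

```javascript
// 5-class "Blues" palette, copied off the shelf from colorbrewer2.org.
const blues5 = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"];

// Simplified stand-in for a quantize scale: splits a continuous numeric
// domain into as many equal-width bins as there are colors in the range.
function quantize(min, max, range) {
  return x => {
    const i = Math.floor(((x - min) / (max - min)) * range.length);
    return range[Math.max(0, Math.min(range.length - 1, i))];
  };
}

const color = quantize(0, 100, blues5);
// color(10) falls in the lightest bin, color(50) in the middle blue.
```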
Now this year I also discovered this really interesting color scheme called Cube Helix by Dave Green, sort of more popular in astronomy for applying to astronomical images. And it has this nice property that it has a continuous increase in perceived intensity, you know, from light to dark.
But it also does a rotation through hue as well to try to increase contrast. And really this is intended to replace sort of that ubiquitous rainbow color scale that you see a lot in scientific visualization that has a lot of really bad properties in terms of perception. But the thing that I especially love about this color scale, I mean,
so this particular one is sometimes called the ugly watermelon. It's not maybe the most aesthetically pleasing color scale. But in fact, the Cube Helix is really more of a color space than just a single color scale. And so you can create new color scales by simply drawing a different line through that color space. And so this is an example which is actually
fairly similar to some of the other color brewer scales. And so you can create your own color scales that have that nice property of perceptual uniformity but still have sort of control over the aesthetics as well. Now this is what the code looks like. Obviously you're just sort of saying
these are the start color and the end color and then we're gonna interpolate between those two colors. And I'm only including this really to just show that there's no real difference in the API when you switch from one color space to another color space, right? It's just a matter of replacing interpolateRgb with interpolateHcl or interpolateCubehelix. And so from an API designer's perspective,
what I find so compelling about this is that this is making a dramatic improvement in terms of the quality of the visualization, right? We're switching from a color space that distorts the data to a color space that is higher fidelity. And along with that, there's really no cost to that change because you're just sort of changing one word
to another word. You're not really increasing the API surface area. There's a little bit of implementation complexity which was sort of my initial objection to adding this feature. But really what you care about is the API weight. And this has minimal API weight and a huge increase in value.
So I include the choropleth example as well to illustrate that when we're thinking about how our tools are influencing the creative process, it's not just the tools themselves that matter, but the examples and the documentation, the tutorials, the related materials that we provide with that tool as well. And examples are particularly powerful in this regard
because they're sort of the starting point. It's easy for people to pick up an example and just sort of ignore all the rest of the documentation and then just hack on that. And so if your example, you made it fairly quickly and maybe you didn't consider all of these subtle aspects, examples can also be dangerous in that regard
in terms of making it easier for people to replicate a bad practice like the steel blue to brown color scale. So when you're making an example, it's often the case that you're trying to demonstrate one particular good practice and you inadvertently demonstrate a bad practice at the same time
because it wasn't really the point you were trying to make. So when I'm creating examples, I try to be careful about that and also to articulate sort of what the intent of that example is so that if I do do something silly because it's fun or because I'm lazy and I just wanna do this quickly, as long as you're making it clear to people
that they shouldn't do this or that this is a bad practice and here's why, then at least you're mitigating some of that risk. And then just as a personal example, like I think the example that I'm most ashamed of is the D3 showreel which we made when we published the paper in IEEE InfoVis.
And the showreel was trying to showcase like all of the different transitions that were possible within D3 and it sort of went through eight different chart types and I'm not gonna show it because I'm embarrassed by it now, but even though it was a cool demo of all of the different functionality of D3, the problem was people saw that since
and they were like, okay, how can I get my data into the showreel? I mean, I've gotten numerous emails from companies, from people that want to replicate the showreel with their data and it's just like really frightening because it's a horrible interface to that visualization, but I didn't make it obvious when I published that example
that the purpose was sort of just to demonstrate functionality and not be a usable interface. Now, I've tried to do a better job since then, as I said, with my examples and hopefully I've started to drown out some of those bad examples with more good examples as well. So I said previously that tools can make things harder,
which seems somewhat counterintuitive, right? Like don't tools just make things easier and if a tool isn't intended for a particular task, can I just switch to a different tool that is intended for that task? But the problem is you can't do that and there's this concept of viscosity which comes from a framework called the cognitive dimensions of notations
and the framework is really designed to evaluate sort of how easy or hard to use software is and this concept of viscosity represents sort of this resistance to change, right? Like if you have a piece of code, how easy is it for you to edit this piece of code to do something else? And even though this framework is really intended to analyze software,
I think it applies equally well to sort of humans and the creative process using tools in the sense that we develop these learned behaviors, we become familiar with particular tools, we become experts in those tools and that makes us resistant to then not use those tools in the future if we have our favorite tool. Even if we know that that tool
is not really the right tool for the task, if it's the tool that we're most familiar with, we're gonna have a tendency to use that as well and to some degree this is rational, right? Like you don't want to be constantly learning a new tool every time you do something. You want to leverage that learned behavior. But at the same time there's a risk that you become over specialized
and then you become inflexible and unwilling to change. So you have to balance those two things. And so even though examples can be dangerous in terms of like propagating bad behaviors and they can of course be good in propagating good behavior, one of the things that I think is universally good about examples is that they are lowering viscosity, right? Because they're making it easier
for people to pick up the tool. So, you know, and that's true of documentation as well, right, you're lowering the barrier to entry. And if you think back to this concept of extrinsic value I feel like documentation, tutorials, examples are really one of the best ways
that you can spend your time as a tool maker. Because you're not, you know, maybe you're not addressing like the particular functionality of your tool, but you are making your tool accessible to more people and you're making the tool, or you're making those people use your tool more effectively. So you're able to produce more extrinsic value even though you're not working on the tool directly.
Okay, so here's a shameless plug of a tutorial that I wrote recently that I encourage you to check out if you're interested in D3 and TopoJSON. It's about making a bubble map for US counties. So just to recap the first part. So all tools have bias because they're designed
with a particular intended set of tasks in mind. And there's no single tool that does everything equally well, right? There's no sonic screwdriver from those of you that are fans of Doctor Who. And we can't, or there's a cost associated with switching tools as well. So we can't sort of constantly just switch to the right tool for the particular task at hand.
We're always going to be resistant to change to some degree. And thus the tool that we have chosen is going to affect our end result. And we therefore have to keep that in mind as we're designing something, try to make that sort of subconscious influence more explicit. So now I wanna look at sort of a more specific example,
hopefully relevant to this audience, about map projections. Now, often when we think of map projections, it's sort of tempting to think of them as these point transformations, right? You have a function, it takes a point in longitude and latitude, and it returns a point in x and y.
And it's a very deceptively easy approach. It matches the mathematical definition, right? Like if you go to Wikipedia and you look up a definition, it's gonna say like x is some function of lambda and phi, and same thing for y. And so often you might see code like this, right? This is an implementation of the spherical Mercator projection.
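The slide code itself isn't captured in the transcript; here is a sketch of what such an implementation typically looks like — the forward spherical Mercator projection on a unit sphere, with angles in radians:

```javascript
// Forward spherical Mercator on a unit-radius sphere.
// lambda = longitude, phi = latitude, both in radians.
// x is just the longitude; y stretches latitude toward the poles.
function mercator(lambda, phi) {
  return [lambda, Math.log(Math.tan(Math.PI / 4 + phi / 2))];
}
```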
It's simplified a tiny bit and assumes a unit radius for the sphere. But this is really all that code does. Now, unfortunately, when you think of projections this way, it ignores a critical step, which is that in order to project a sphere
down to the plane, you have to make a topological change. Those two objects are topologically distinct. And so you actually have to cut the sphere in order to flatten it down to the plane. And this is often explained to newcomers by using the analogy with an orange rind and squishing that down onto a table.
But math can kind of get away with this continuous representation because you can just think of it as an infinite number of points, and you're just mapping one point from the sphere down to the plane. But that's not true with computers because we're not dealing with continuous representations, at least in most cases, right? We have discrete geometry, we have a set of points, we have lines, we have polygons.
And so that adds a level of complexity to the projection process, and we have to change our implementations accordingly. So this is what it looks like when we're cutting the sphere and flattening it down to the plane. And so any geometry that crosses this orange line here, either it crosses the anti-meridian
or it goes around the pole, that has to be cut in order to faithfully reproduce it on the plane. Now I said people often think of projections as point transformations. But so how is it that they even work? And the answer is that they rely on the geometry being pre-cut.
So very often the sources of data that we have already have this cut baked into the geometry. And if you download data from natural earth, it's already gonna be cut along the poles and the anti-meridian for you. And so basically someone else solved the problem for you, and so if you're just implementing your own map projection system, it's very easy to ignore this critical problem.
But this ends up being a source of bias. You can no longer change your central meridian because if you do that, you end up with this. And this sort of reminds me a little bit of the VCR tracking problem. For those of you old enough to remember what a VCR is.
Okay, so this is an equirectangular projection. Of course you get the same problem with other projections, so this is a conic projection, which obviously you don't use very often for doing the whole world. But I'm including it because it's a particularly
bad example because what you have here is Antarctica, which goes around the south pole, is then basically inverting the entire map because the polygon ends up sort of wrapping around the entire top. So what we do in D3 is that we apply this cut dynamically, so after rotating the geometry to the desired central meridian,
we then apply that cut. And that means that you can now take any of these projections and just freely choose what you want to use as your central meridian. So same idea here with the Winkel tripel. Actually, this one's a little bit easier to see sort of what's going on because basically as the polygons sort of hit one edge of the screen, you can see the line
going back to the corresponding edge on the other side of the screen. Like in the case of Antarctica, it's always on there. Same thing fixed. Now this is true not just of sort of conic and cylindrical or pseudo-cylindrical projections. It's also true of azimuthal projections as well,
so this is an orthographic projection. And so the naive solution would just be to cull the points that are 90 degrees away from the origin. And this actually works sort of okay, but as you can see, there are some artifacts here because we're not able to sort of fill in the gap the part of the polygon that got culled.
And so instead, it's just drawing a straight line between the two points that were closest to the horizon there. So D3 actually supports two different types of geographic clipping right now. One is the anti-meridian cut, just along any meridian.
And then the second is a small circle, so relative to the origin. And in this case, we're doing a 90 degree radius on that circle. And Jason Davies is actually working on some other types of geographic clipping to give us more flexibility in the future and some other really cool operations as well, like Boolean operations on spherical geometry.
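Those two clipping modes can be sketched in plain JavaScript. This is an illustrative sketch, not D3's actual implementation, and the function names (`crossesAntimeridian`, `visible`) are my own. The small-circle test uses the angular distance between two points on the sphere, computed from the dot product of their unit vectors.

```javascript
// Convert [longitude, latitude] in degrees to a 3D unit vector on the sphere.
function cartesian(lonLat) {
  const lon = lonLat[0] * Math.PI / 180;
  const lat = lonLat[1] * Math.PI / 180;
  return [
    Math.cos(lat) * Math.cos(lon),
    Math.cos(lat) * Math.sin(lon),
    Math.sin(lat)
  ];
}

// Anti-meridian cut: does the segment from lon0 to lon1 cross the ±180° line?
// True when the short way around passes through the anti-meridian.
function crossesAntimeridian(lon0, lon1) {
  return Math.abs(lon1 - lon0) > 180;
}

// Small-circle clipping (e.g. a 90-degree clip angle for the orthographic
// projection): a point is visible when its angular distance from the clip
// origin is less than the clip radius. The angular distance is the arccosine
// of the dot product of the two unit vectors.
function visible(point, origin, radiusDegrees) {
  const a = cartesian(point);
  const b = cartesian(origin);
  const dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  return Math.acos(Math.max(-1, Math.min(1, dot))) < radiusDegrees * Math.PI / 180;
}
```

With a clip origin of [0, 0] and a 90-degree radius, as in the orthographic example above, a point at [80, 0] is on the near hemisphere while a point at [100, 0] is culled.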
But the point of this is you can't think of projections as point transformations. You instead have to think of them as geometry transformations. And so rather than just being a function that maps a point to a point, you have to think about a mapping from a point to zero or more points. For example, if that point is culled or you may have multiple points,
if you have a composite projection with all these different inset views, like it's very common to show a map of the US and then you also have Alaska and Hawaii. And in some cases, it may be possible to have the same thing represented twice. And so your projection system needs to be able to handle those scenarios. And similarly, like a line can be cut or clipped
and it appears as multiple lines, and a polygon can be cut and clipped the same way. Now, let's say we build a general projection system that solves all these problems. It turns out we still have sort of a residual influence. So can anybody identify what is wrong with this map?
There's probably multiple things wrong. Right, so I think somebody pointed it out. There is a cut here, which is not a country border. It's this blinking red line. And so this is a line along the anti-meridian, a line through sort of the eastern part of Russia.
And the problem is, even if we are applying this cut dynamically, we still have the old cut that was there. And so if we're drawing a stroke around the geometry that we got, in this case from natural earth, it's gonna have this residual effect of the old cut. And so in addition to applying this cut dynamically, we also have to undo the effect of the cut
that was originally baked into the geometry. So this is functionality that TopoJSON provides. And it's actually a harder problem than it appears. It's not just a case of doing a little bit of pattern matching and then removing those coordinates. You can actually get cases, for example, where you have two sort of U-shaped polygons
that are cut in half, and then the outer part of the polygon now becomes the exterior ring, and the inner part now becomes a hole. And Jason has made some really cool examples of these elaborate spherical spirals in order to stress test our different clipping approaches. So again, we're sort of trying to identify here
bias in terms of limiting our expressiveness that are built into these sort of simpler existing approaches, and then remove those and increase people's flexibility in how they apply these tools. Now I've talked about sort of rotating the central meridian, or just rotating longitude in order to change the central meridian.
But if we wanna do other aspects of projections as well, like transverse or oblique aspects, it actually raises another problem, which is that very often the geometry that we get, even if it's in sort of spherical coordinates, and by that I mean like WGS 84, like EPSG 4326.
So you think of it as being in longitude and latitude, and that somehow represents spherical coordinates. But in fact, a better way of thinking of that is that it's more like plate carrée or equirectangular in the sense that each of those points there, it's assuming a straight line on the plane between those two points. So you're just interpolating X and Y linearly
in your polygonal geometry. And the problem is, if you rotate not just longitude, but if you rotate the latitude, or if you rotate gamma as well, then you actually can change the shape if you're applying a spherical rotation to these coordinates that are in plate carrée. And so instead, what we need is to represent
our geometry in true spherical coordinates, which means that the lines become great arcs, right? So if we wanted to do something like this, where we're just rotating an orthographic projection to show different countries, now every time I see this, I think of somebody forked this example and added Yakko's countries-of-the-world song from Animaniacs. So I have that song going in my head. Or similarly, if we want to finally remove the northern hemispherical bias of the Mercator projection, obviously you wouldn't really do this, but it's kind of cool to watch, we need to have true spherical coordinates.
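A spherical rotation like this can be sketched in plain JavaScript. This is a minimal illustration of the idea, not D3's actual rotation code, and the axis conventions here are my own choice: convert to 3D unit vectors, rotate about the polar axis for longitude and then about a horizontal axis for latitude, and convert back. Because it is a rigid rotation of the sphere, the angular distance between any two points is unchanged, which is exactly the rotational invariance that representing lines as great arcs buys you.

```javascript
const radians = Math.PI / 180;

// [lon, lat] in degrees -> unit vector on the sphere.
function cartesian(lonLat) {
  const lon = lonLat[0] * radians, lat = lonLat[1] * radians;
  return [Math.cos(lat) * Math.cos(lon), Math.cos(lat) * Math.sin(lon), Math.sin(lat)];
}

// Unit vector -> [lon, lat] in degrees.
function spherical(v) {
  return [
    Math.atan2(v[1], v[0]) / radians,
    Math.asin(Math.max(-1, Math.min(1, v[2]))) / radians
  ];
}

// Rigid rotation of the sphere: about the polar (z) axis by deltaLon,
// then about the y axis by deltaLat.
function rotate(point, deltaLon, deltaLat) {
  const a = deltaLon * radians, b = deltaLat * radians;
  let [x, y, z] = cartesian(point);
  [x, y] = [x * Math.cos(a) - y * Math.sin(a), x * Math.sin(a) + y * Math.cos(a)];
  [x, z] = [x * Math.cos(b) + z * Math.sin(b), -x * Math.sin(b) + z * Math.cos(b)];
  return spherical([x, y, z]);
}

// Angular distance in degrees between two points on the sphere.
function distance(p, q) {
  const a = cartesian(p), c = cartesian(q);
  const dot = a[0] * c[0] + a[1] * c[1] + a[2] * c[2];
  return Math.acos(Math.max(-1, Math.min(1, dot))) / radians;
}
```

Rotating [0, 0] by 90 degrees of longitude lands on [90, 0], and the distance between any two rotated points matches the distance between the originals, which is the invariance that linear interpolation in plate carrée coordinates does not have.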
We need to represent lines not as straight lines on the plane, but as great arcs so that we can just rotate them and they don't change. So they have true rotational invariance. But of course, this then introduces another problem in our projection system, or another level of complexity, which is that we're no longer, even if we ignore the issues of clipping and topology,
we're no longer just projecting or doing a linear transformation on a line, right? We can't just take one point, or the start point and the end point, and then transfer them to a different location. Because a great arc on the sphere very often becomes a curve when it's projected to the plane. So the way that we've solved that in D3 is that we have this concept of adaptive sampling.
And what adaptive sampling does is it basically samples these intermediate points along the line, and then projects those points. And it has a quality metric, which we use the Douglas-Peucker metric, same thing from line simplification, just now applied to projection. So this is how it works.
Basically, in this case, we're taking the equator, and we have sort of a slightly off transverse projection. And we're bisecting each segment of the equator here. It starts out as being like 90 degree segments. We take the midpoint of those segments, and then we project that point. And we just measure the perpendicular distance, which is the white line between the new sample
and the existing line. And that white line basically is our quality metric. It represents sort of how much better the line is if we add that sample. And so then we recursively subdivide, and we keep doing that until the line is short enough that we don't really need to do it anymore. And that threshold is specified just in pixels on screen.
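That recursive bisection can be sketched as follows. This is an illustrative simplification, not D3's actual resampling code (which also works in rotated spherical space and applies further checks); `project` here is any function from [lon, lat] to [x, y] in pixels, and the threshold is the on-screen distance in pixels described above.

```javascript
// Adaptively resample the segment from a to b under the given projection.
// We bisect in spherical coordinates, project the midpoint, and measure its
// perpendicular distance from the straight line between the projected
// endpoints; if that distance exceeds the threshold, we keep the sample and
// recurse on both halves. The depth cap guards against runaway recursion.
function resample(project, a, b, thresholdPx, maxDepth = 16) {
  const pa = project(a), pb = project(b);
  const points = [pa];
  subdivide(a, pa, b, pb, maxDepth);
  points.push(pb);
  return points;

  function subdivide(a, pa, b, pb, depth) {
    if (depth <= 0) return;
    const mid = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2];
    const pm = project(mid);
    // Perpendicular distance from pm to the line through pa and pb:
    // the quality metric (the "white line").
    const dx = pb[0] - pa[0], dy = pb[1] - pa[1];
    const len = Math.hypot(dx, dy) || 1;
    const dist = Math.abs(dx * (pa[1] - pm[1]) - dy * (pa[0] - pm[0])) / len;
    if (dist > thresholdPx) {
      subdivide(a, pa, mid, pm, depth - 1);
      points.push(pm);
      subdivide(mid, pm, b, pb, depth - 1);
    }
  }
}
```

Under an identity projection no samples are added, because every midpoint lands exactly on the chord; under a projection with curvature, samples concentrate where the deviation exceeds the pixel threshold.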
So in this case, it might be two pixel distance or something like that. Now, there are different approaches that we could take. I mean, obviously, we can't do nothing, because then we get what you have on the left, and we just see these horrible polygonal artifacts. But another common approach would be just to do uniform sampling, which is a bit like, I think,
the ST_Segmentize operation in PostGIS. So in this case, we're taking the equator, and we're sampling it every four degrees rather than every 90 degrees. But the problem is, if you do uniform sampling, it doesn't understand anything about the projection. And so those samples don't end up being used efficiently. So you have a lot of samples that are used in these flat horizontal sections,
which are areas of sort of low curvature. But even though you have so many samples here, you still have some polygonal artifacts, because along vertical edges here, you don't have a sort of sufficiently high density of samples. And so by taking the projection into account, when we're doing sampling, we can then identify these areas of cutting,
but we still have this sort of connection on Antarctica, where it goes around the pole, or sorry, from minus 180 to plus 180.
And if we don't do any adaptive sampling here, then we get just a straight line. In this case, it's a 90 degree line. That's sort of an artifact of how we do the anti-meridian cut. But once we apply adaptive sampling to that, it can automatically determine sort of the right number of points that we need to add,
in order to get a beautiful curve along the bottom of this map. And of course, it works with any projection, even these like silly outdated projections, like the Larrivée projection here. Okay, so just to recap a little bit again, what we've tried to do is identify
a bias in map projection systems that are based primarily on these point transformations, because that approach, although easy to do, ends up limiting our expressiveness in the types of projections that we can make. And it's particularly bad because it's a little bit insidious in terms of how it depends on these cuts being baked into our geometry,
and how it differs from how people often think of these projections in terms of mathematical functions. And so what we've tried to do with D3 is rather than just focus on sort of building a library of these fixed point transformations, is instead to build a system that makes a general projection from any point transformation, so that we fix these problems,
we increase expressiveness, but also we make it easy for anybody to create a new point transformation and automatically inherit these nice properties. So I wanna quickly show just some of the more interesting projections that Jason Davies and I have implemented. So this is a recreation of Allen Philbrick's interrupted sinu-Mollweide projection.
And it ends up dividing the world, so first by latitude, so the northern hemisphere I think is done, what does it say? Yeah, is a Mollweide projection, and then below a given latitude, it switches to the pointy sinusoidal projection. But also, the northern hemisphere is divided up into two lobes which have different central meridians. And then similarly, the southern hemisphere
is divided up into three lobes, which can each specify their own central meridian. Sort of a more elaborate example is this recreation of Bartholomew's regional projection by Jason Davies. So this is also interrupted, and above the Tropic of Cancer, it uses, let's see, the equidistant conic projection,
and then below that, it uses an interrupted Bonne projection. And to do this, Jason implemented what's called d3.geo.polyhedron. And the nice thing is it basically lets you specify any polyhedron. You see, you map the sphere, or different parts of the sphere, to polyhedron faces, and then you can define whatever projection you want
for those faces. So in this case, he's able to recreate this projection, but there's another example. Actually, I'll skip this one just so I can show you. So this is, of course, Steve Waterman's butterfly map that uses the same polyhedron projection, but in this case, using sort of a more elaborate W5 polyhedron,
I think it is, and a gnomonic projection for those faces. But then another example, now this isn't really sort of interesting from a projection perspective, because this is just a fairly simple transverse sinusoidal projection, but it's using replication, like drawing the Earth multiple times,
and then using clipping to recreate, this is called a shoreline map by Spilhaus, and he did a bunch of these different maps, and sort of, Jason was able to recreate them just by using this clipping technique. And they have this nice property that they only interrupt, they don't interrupt any of the oceans or the continents, so they're only interrupted along the shoreline.
So I think it's a nice example of sort of how flexible this projection system is. Okay, so lastly, I want to now sort of describe a concept that I keep in mind that I think is sort of at the heart of effective tool design. And I'm gonna describe it really by example,
sort of talking a little bit about the history of D3 and its predecessor, Protovis. And so I think we have a tendency when we are designing a tool to really think about sort of its more superficial qualities, right? Like its feature set, or for a physical tool, like its weight distribution, how it feels in your hand.
And so we similarly have a tendency then to forget the problem that we were trying to solve in the first place. And I think to some degree, that's inevitable, right? Like you kind of have to take the problem for granted a little bit in order to design a solution. But at the same time, if you stop questioning that entirely, it sort of sets a limit on your effectiveness.
And so particularly when you're starting out, before you sort of think about what the API for your library should be or whatever, you really have to define the problem that you're trying to solve as clearly as you can. And for me, what I look for, I call the smallest interesting problem.
The smaller the problem, the easier it is to solve. Obviously, that's a good thing if you're lazy, like me. But also, smaller problems are nice because you make fewer assumptions about how the tool is going to be used. And so potentially, your tool can be more broadly applicable. But obviously, you have to still pick interesting problems
because if you solve a problem that's too small, it's not really gonna be any more useful than just doing everything by hand. So Protovis, as I said, was the predecessor to D3. And it was a JavaScript library for doing visualization. And Protovis was really a response to sort of the most prevalent approach to doing visualization, which is these chart typologies, right?
So like I showed you Excel already. But there are lots of other charting libraries, like Highcharts, for example, is really popular, for doing visualization. And the way that these work is they just have, they've enumerated sort of the six or eight different popular chart types, and you just say, here's my data, and here's the chart type that I want,
and then it creates the chart for you. And so the atomic operation for those tools is picking the chart type or creating the chart. And you can do a little bit of customization after that, but I mean, that's sort of the basic operation. And in Protovis, in order to recreate it, right?
So that's an example of sort of a lower level approach where you're then composing those operations to do something more complicated. But unfortunately, Protovis was limited by its mark vocabulary. It seems like a fairly easy thing to do in order to just sort of define the basic shapes
that you need in order to create visualizations, right? So this is sort of the list of different mark types that Protovis supported. You've got areas and bars and dots, and then an image and lines, et cetera. And it seems like an easy thing to do, right? There aren't that many different geometric shapes that you need in order to create charts,
and you can just compose them to do more complicated things. And I think we were fairly successful in terms of recreating some of these older visualizations, historical visualizations by Playfair, for example, or Menard. But the problem was, even with that relatively high level of expressiveness, it still wasn't enough.
And that was because you knew that your browser was capable of doing these things, but Protovis could not do them. So for example, like clipping or gradients or dashed lines or masking or transitions or things like that. And so as the toolmaker, I knew that I could sort of add each one of these features to Protovis,
but it was extremely tedious to try to wrap each of these features in the library. And as a toolmaker, as I said, like emotional state matters, right? Like you don't wanna work on tedious tasks indefinitely, especially because people are always adding new functionality to browsers, and it would just be a constant game of catch-up.
Now another problem that Protovis had is that it was slow to do interaction and animation. And I think Protovis was sort of primarily conceived for doing static charts, excuse me, sort of having a concise definition of those charts. But if you wanted to do any sort of interaction
or dynamic changes to the scene graph, it was pretty slow to do it because it had to redraw everything. And it didn't have any understanding of sort of the dependencies between the data and different parts of the representation in order to do that more efficiently. And then lastly, a big problem that Protovis had is that it was hard to debug, right? There was a lot of internal control flow
in how rendering was implemented. And so if you had a small error or a typo in your chart definition, maybe it would work, but then you'd make some change on interaction and it would fail. And when it failed, you would end up sort of deep within the bowels of Protovis, in sort of a very deep stack trace. And it also meant, because it had this specialized representation,
you couldn't just inspect the DOM and see what was going on. Like, you had very little visibility. And I think this is one of those properties that's easy to overlook when you're designing a tool or when you are sort of evaluating a tool, when you're sort of reading about it abstractly. Sort of as toolmakers, we tend to assume or maybe hope that people use our tools
in the way that they're intended. But obviously, people don't do that, right? People aren't perfect. People make mistakes or the documentation isn't clear or a variety of other conditions mean that people don't end up using the tool the way it was intended and so designing a tool that fails in a way that is more usable or considering the usability
of your tool under duress or in these sort of more harsh environments I think is extremely important as well. So I'm sort of harshing on my old project here, but Protovis I think also was a very good idea, at least in one particular regard, which was this concise mapping from data
to visual representation. Like you could say, I think I have some code examples here. You know, you could take an array of data and you could say, okay, it corresponds to this area shape and then you could then inherit a bunch of related shapes from that. So you could add a line to the top of your area. You could add dots to that. And so it was a fairly concise specification. It was more declarative rather than
having a lot of for loops, for example, if you were to do this type of thing by hand. But what we wanted to do with D3 was to try to address some of those limitations and we addressed them by sort of removing functionality rather than sort of taking a different approach.
We redefined the problem in the sense that we didn't want to provide this specialized representation anymore. We just wanted to use the standard DOM instead of marks. So we no longer had to define, like, these are the six different mark types that you can use. Instead, you can use any element that's available in your web browser, which could mean HTML, it could mean SVG,
or it could mean sort of CSS properties. All of those things were available to you. And instead of having sort of instantiating a Mark and then having it automatically sort of maintain that relationship between the data and the representation, you now have this concept of a data join, which was sort of a transformation of the document rather than just instantiating Marks.
So just briefly, like this is some code examples. So this is a pie chart in Protovis where you're creating a panel, you're associating it with some data, you're setting some properties and then adding a wedge to that. And then this is the corresponding code in D3
where you're adding an SVG element to the body. Again, you're associating it with an array of numbers. But in this case, rather than creating that pv.Wedge, which was one of the specialized mark types, we're just creating a path element, which is a basic element that's supported by SVG. And then we're using these other two components here.
One is called d3.layout.pie. Basically, all that does is take the array of numbers and compute the angles, the start angle and the end angle for each arc. And then we have a thing called d3.svg.arc, which is able to take that definition and turn that into the sequence of path operations that you need to actually render the arc.
So we took a larger problem, which was how do we specify a graphical mark? And we've broken it up into a bunch of smaller problems. One is how do we actually create, update or destroy these elements? How do we set attributes on these elements? And then how do we compute the layout, like the angles that we're using for these pie charts?
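The angle computation that d3.layout.pie performs can be sketched as a standalone function. This is modeled on that component but is not its actual source, and it ignores d3's default value sorting: it just turns an array of numbers into start and end angles in radians, proportional to each value.

```javascript
// Compute pie arcs: each value gets a slice of the full circle (2π radians)
// proportional to its share of the total, laid end to end.
function pieLayout(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let angle = 0;
  return values.map(v => {
    const arc = {
      value: v,
      startAngle: angle,
      endAngle: angle + 2 * Math.PI * v / total
    };
    angle = arc.endAngle;
    return arc;
  });
}
```

An arc generator like d3.svg.arc then takes each {startAngle, endAngle} object and emits the path commands for the wedge, which is the decomposition into smaller problems described here.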
Now, I think there was a lot of resistance when D3 first came out. Like some people found this to be upsetting. And I think primarily, the objections overlooked sort of the benefits of adopting these standards. And I think that really comes from sort of considering the tools in isolation rather than within the context of the ecosystem
of related tools and technologies and standards that the tools exist in. There were a whole bunch of benefits that came from adopting the standard representation of using the DOM and SVG. All right, so this is an example made by Derek Watkins, my coworker.
And this is all done just in SVG. This is 2D graphics, but it looks very much 3D because you've got this sort of radial gradient here being applied to the sphere. There's actually, I think, two different radial gradients, one for the water and the other for the land. You've got a little drop shadow underneath, and then you've got these arcs. And these arcs are actually implemented using two different orthographic projections
with a different radius so that you can just draw a curve between two points on the surface and then another point out in space. And then he just sort of layered these arcs and did a nice bit of fading when the points end up getting occluded on the back hemisphere. And then there's also, you probably can't see it on the projector,
but there's a small shadow so that you can see where that arc travels along the surface of the Earth. And it seems very realistic, and it's all just done with a simple composition of these 2D graphics primitives. And then similarly, this is another thing that I made. It's a bit of a silly map, but I think it demonstrates sort of some of the crazier things that you can do.
And so in this case, what I've done is take the land and sort of pop it out from the sphere a little bit. And then there's actually three instances of the land being drawn here. There's the sort of light beige color. There's the underside of the land, which is sort of clipped at a different angle that's sort of with ambient occlusion so it looks a little bit darker underneath.
And then there's a third layer, which is the drop shadow, and that's simply done by applying a blur filter to give it the appearance of a shadow on the Earth underneath. The standard representation also means you can now take advantage of your developer tools and all this additional functionality that's built into your browser. So this is another example of epicyclic gearing.
And so I can go in here and I can say, oops, inspect element. And so now I can just see this rotating gear and I can see the properties that are defined on it and I can even edit these different properties, whatever, I can change the fill color.
And this helps you better understand the structure of the DOM that you've created. It's about removing this abstraction between what you're specifying in code and what your browser is doing. If you more closely match the representation that your browser is using,
then you can then leverage all of these tools, not to mention all of the additional documentation and materials and tutorials that apply to these standards as well. Another nice feature is that you can take documents that are created by other tools like Inkscape or even hand-coded HTML and you can transform those as well.
And as I sort of hinted at, you can take knowledge that you learned in D3 and then apply that to other tools as well. And so often we think about, or I mentioned this concept of viscosity before, like how much effort it requires to learn a tool or to switch a tool. But if you choose a tool that's based on standards,
then very often you're not learning information that is specific just to that tool, but you're learning information or you're learning skills that you can then take with you if you choose to use a different tool in the future. And that really makes it much easier for people to do, both because they can leverage sort of existing resources or training materials, but it's also a more sensible investment of their time
because they know that they're not really committing that time just to your tool, they're committing it to the broader ecosystem of standards that are going to stick around for a long time. So then just to sum up, I have this concept that D3, or to get back to the idea of this sort of smallest interesting problem, so I sometimes refer to D3 as a visualization kernel,
and the idea is to try to identify that smallest problem in visualization like that always occurs, and no matter what tool you're going to build, I mean, you could build other tools on top of this visualization kernel. And the kernel, the only thing that it really does for you is it takes an existing scene graph, so a document or a set of elements,
and it transforms those elements so that they correspond to data, so that could mean creating elements or updating them or destroying them. And so it's really just that minimal thing that D3 is trying to do for you, and everything else that D3 does is just sort of these smaller decoupled operations that you then compose on top of that.
And that also means that D3 doesn't sort of bake in these best practices into the tool itself, right? It's not deciding how you should do, how you should make your visualization. And so of course there's a risk that you will make a visualization that is terrible with D3, and I'm sure I've done that, and I've seen other people do that.
And so in a sense, you now have an additional responsibility, which is in your training and in your examples, and then the other work that you do is to try to communicate those best practices. But in my view, it's better to do that separately and explicitly to teach people those things rather than trying to abstract those best practices by baking them into your tool.
Because if people understand those principles, they'll be able to apply them more generally in whatever they're doing, even if they're not using your tool. And I believe it's this approach that has really enabled D3 to flourish, both in terms of adoption and also in terms of the diversity of beautiful examples that people have made. Okay, so just to wrap up then,
the toolmaker's guide. So reduce bias by making smaller, more flexible tools. And if you can't avoid bias, then at least favor good bias over bad bias. And above all else, teach users how to be effective. All right, thank you. Thank you.