Notebooks in (geo)datascience
Formal Metadata
Number of Parts | 266
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/66491 (DOI)
FOSS4G Prizren Kosovo 2023 (110 / 266)
Transcript: English (auto-generated)
00:08
Thanks, Ian. Welcome to this presentation. I will be speaking about notebooks in geodata science, obviously, because I'm a GIS engineer. So let me introduce myself to introduce the
00:22
subject. I'm Nicolas Roland. I'm a GIS engineer at Gustave Eiffel University, and I work for researchers who have trouble spatializing their data or who are not cartographers, so I make their spatial analysis problems mine. And to do that, I use a lot of notebooks. Pretty
00:43
much all of my work is done using notebooks. I'm also part of the notebook working group in France, where we investigate the notebook thing, the object itself: what it is, what you can do with it, and what the limitations of notebooks are, actually. And I'm in this group
01:07
and also part of the OSGeo community, and in previous editions and even in this edition people use notebooks, but nobody speaks about notebooks, the actual thing. So I said,
01:22
I'm jumping in. That's my subject. So to define what a notebook is, I will quote Stephen Wolfram. He said that the idea of a notebook is to have an interactive document that freely mixes code, results, graphics, text, and everything else. And I think the important
01:49
thing in that quote is that it's an interactive document, so you can interact freely with it, and that it mixes code and text, and then after that you get results.
02:03
It might be more text, it may be a graph, or another output. And the core idea comes from literate programming. It's something that was thought up by Donald Knuth, you might know him,
02:26
and he tells us that, rather than only instructing the computer what to do, we should tell people, other people, other human beings, what we want the computer to do. Not just say "do that" to a computer, but "I ask the computer to do that because I want that."
02:46
And I get this result. So that's the core concept of literate programming. You give instructions to the computer, and you give information to the reader. It might be yourself, or it might be another reader. So a notebook is a computational
03:04
document that is interactive, that mixes code and text, and should be readable by a computer and a human. So let's quickly go back many, many years. In 1978, Donald Knuth created TeX
03:24
to create a document with a computer. Later on, LaTeX was created to make
03:43
WEB. That was a mix of LaTeX and Pascal, so you can have a computational document. And there was also CWEB, with LaTeX and the C language. Around that time, the first
04:03
actual notebook was created in 1988, so it's pretty old, actually, by Stephen Wolfram. It's called Mathematica, so obviously it was a notebook for math, with its own language, a math language. In 1992, Knuth
04:26
coordinated the literate programming thingy. In 1992, there was the creation of Noweb,
04:44
which was not a fork, but another thing similar to WEB, but not limited to Pascal. It also had HTML output in addition to LaTeX output, so we start to have multiple languages
05:02
and multiple outputs just from one document. A little bit closer to us, in 2001, the IPython interactive shell was released. You can write Python in your command line.
05:24
In 2002, there was Sweave. It was derived from the Noweb idea, but you can mix text and R code and get HTML or LaTeX output. In 2005, SageMath,
05:48
again, it quickly became a notebook for math. In 2011, the IPython tool became browser-based and interactive. It was the first one to do that, and they got some knowledge from SageMath. So
06:07
the IPython community and the SageMath community were pretty close, and they benefited from each other. In 2012, there was the knitr engine, which, from the R world,
06:22
improved on what Sweave did and takes care of the whole compilation of the document with LaTeX. So you press a button, and it compiles the document, taking into account cross-references and citations, and at the end, you have your PDF or your HTML page.
06:44
Sweave didn't do that. You had to cross-compile things beforehand, so knitr is an all-in-one engine. In 2014, R Markdown and the Jupyter notebook were released. I will speak more about
07:01
that later, so I will get there. In 2015, for the people who like Java, there was actually a notebook written in Java, Apache Zeppelin. I've never used it, so I won't make any comments on that. In 2018, the Observable notebook was released, and in 2022, it was
07:31
Quarto. I will speak only about those four tools, because they are trending in data science. I won't speak about Org Mode, because it would be hard to explain. I don't have a full session,
07:49
I just have 20 minutes. So there is a lot more to notebooks, but those four are the ones I think are trendiest and, for some of them, fun to use. And I tested them for geospatial, so
08:08
I can speak about that. So let's talk about Jupyter notebooks. They are very popular. Anyone know them? Has anybody ever used Jupyter notebooks? Yeah?
08:23
Yeah, pretty much everyone. So they are very well known. At first, it was only Python, R, and Julia, but they quickly added new kernels to connect to other languages. So now you can connect to more than 100 languages and things, and we can do a lot of stuff in Jupyter notebooks.
08:46
A notebook is made of Markdown cells and code cells. The cells are very distinct, and the file format is JSON. For me, I think that's the big issue.
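To make the JSON point concrete, here is a minimal sketch in Python (the talk itself shows no code for this, and the cell contents and ids below are made up for illustration). It assumes the nbformat 4 layout, in which a notebook is a JSON object holding a list of Markdown and code cells, and builds one using only the standard library.

```python
import json

# Minimal notebook following the nbformat 4 layout (assumed here):
# top-level metadata plus a list of cells, each with a type and its source lines.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "id": "intro",
            "metadata": {},
            "source": ["# Buffer example\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "id": "compute",
            "metadata": {},
            "execution_count": None,  # None means the cell has not been run
            "outputs": [],
            "source": ["radius = 10\n", "print(radius * 2)"],
        },
    ],
}


def cell_summary(nb):
    """Return (cell_type, first source line) for every cell in the notebook."""
    return [(cell["cell_type"], cell["source"][0].strip()) for cell in nb["cells"]]


# This string is what would be written to the .ipynb file.
serialized = json.dumps(notebook, indent=1)
restored = json.loads(serialized)
print(cell_summary(restored))
```

Because the file is JSON, every line of a cell ends up as a quoted string inside a list, which is part of what makes notebooks awkward to edit by hand or to diff as plain text.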
09:02
There is good integration with IDEs, especially in the Python world. There is very good integration with VS Code or PyCharm, which matters because Jupyter notebooks are actually not really good IDEs. They try to help you write Python code, but they are not as good as VS Code
09:26
or PyCharm might be. So what does it look like? You have Markdown, which everybody more or less knows already. This is a notebook I stole from the GRASS community, and you have cells: Markdown cells, and then code cells
09:45
that you can run. You can run each cell individually, by itself, or you can run the whole document if you want. So you can access Python or other geospatial libraries. So
10:03
the notebook itself doesn't provide any geospatial capability. It's provided by the language you use. The GRASS community has created several Jupyter notebooks, so you might want
10:23
to look at them. But you need a Jupyter server to run the notebook if you want to edit it. So you can forget about just using Notepad++ or VS Code. You have to run Jupyter.
10:41
And, this is my personal opinion, Markdown editing in Jupyter notebooks is sometimes a bit tedious. Let's speak about R Markdown. It's not a JSON file. From the beginning, it's a Markdown file with code cells in it. So you can edit it with any text editor you
11:06
want. It might be vi or Emacs or whatever, or even Notepad on Windows. It should be okay. It has great integration with IDEs, especially RStudio, because it's the same company that
11:23
builds R Markdown and RStudio. So they make sure everything works well. It's mostly used in the R world, but not only. And it's not limited to R. As you can see, there are more than 150 engines in knitr that let you connect to other languages, like C, Python, Fortran, for example,
11:54
or SQL. So you're not limited to R. R Markdown is just a syntax, pretty much.
12:02
So let's have a look. We can compare it a bit with Jupyter notebooks. We have a YAML header, where you can provide metadata, and you have your code blocks, which are fenced by backticks. A block starts with backticks and ends with backticks.
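As a rough illustration of that layout, the following Python sketch splits a small R Markdown-style text into its YAML header and its fenced code chunks using only regular expressions. The sample document and the split_rmd helper are hypothetical, and this is not how knitr parses files; it only shows that the format is plain text that ordinary tools can read.

```python
import re

FENCE = "`" * 3  # three backticks, built here so this example does not nest fences

# Hypothetical document: YAML header between '---' lines, then text and one chunk.
DOC = "\n".join([
    "---",
    "title: Demo",
    "output: html_document",
    "---",
    "",
    "Some narrative text.",
    "",
    FENCE + "{r}",
    "x <- 1 + 1",
    "x",
    FENCE,
    "",
    "More text.",
])


def split_rmd(text):
    """Return (header_lines, chunks); each chunk is an (engine, code) pair."""
    header = []
    body = text
    # The header is whatever sits between the first two '---' lines.
    match = re.match(r"---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match:
        header = match.group(1).splitlines()
        body = text[match.end():]
    # A chunk opens with the fence plus {engine ...} and closes with a bare fence.
    chunk_re = re.compile(
        r"^" + FENCE + r"\{(\w+)[^}]*\}\n(.*?)\n" + FENCE + r"[ \t]*$",
        flags=re.DOTALL | re.MULTILINE,
    )
    chunks = [(m.group(1), m.group(2)) for m in chunk_re.finditer(body)]
    return header, chunks


header, chunks = split_rmd(DOC)
print(header)
print(chunks)
```

The same idea is why any text editor, diff tool, or version control system handles such a file comfortably.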
12:24
But everything is text. It's plain text. So you can write your R Markdown text in RStudio. And when you render your document, you get the whole document with the outputs inside it, but you can also run specific cells,
12:43
if you want. And you also have the option to run all the cells above. Now the fun one: Observable. It's a JavaScript notebook made for data visualization.
13:01
It was created by the creator of D3.js. The current Observable platform is closed source, but the runtime is open source. And the libraries are also open source.
13:21
Actually, libraries are often notebooks too. Public notebooks are free to use and re-share, so you can load a notebook to access functions from it. So it's pretty fun to use. You can easily integrate it into a website. For example,
13:41
you create a map with a notebook, put just the iframe in your website, and add a link back to the notebook so people can see how you created the map. Let's talk about the geospatial ecosystem. It's quite young, maybe one or two years
14:00
old for some libraries. You can make graphs and maps with Plot, but you also have access to Bertin.js for thematic mapping. It was created by a French cartographer, and it's an opinionated mapping tool dedicated to thematic mapping.
14:22
You can also access spatial operations with Geotoolbox, and you can do some projections. You can also access data formats other than JSON with GDAL, through a port of GDAL to WebAssembly. So in Geotoolbox, for example,
14:46
you can do buffers, centroids, you can clip, you can compute the bbox. It's not a full-featured GIS like we are used to, but there are still some tools for basic operations.
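For readers who want to see what such basic operations compute, here is a minimal sketch in Python rather than JavaScript. It does not use or reproduce the Geotoolbox API; ring_bbox and ring_centroid are hypothetical helpers that work on a GeoJSON-style polygon ring in planar coordinates (no projection or geodesic handling is assumed).

```python
def ring_bbox(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of [x, y] positions."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def ring_centroid(ring):
    """Area-weighted centroid of a closed ring, using the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring with zero area")
    # Centroid = sum / (6 * area), and 6 * area equals 3 * area2.
    return (cx / (3 * area2), cy / (3 * area2))


# Hypothetical GeoJSON feature: a 4 x 2 rectangle, first position repeated last.
feature = {
    "type": "Feature",
    "properties": {"name": "demo"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [4, 0], [4, 2], [0, 2], [0, 0]]],
    },
}

outer_ring = feature["geometry"]["coordinates"][0]
print(ring_bbox(outer_ring))
print(ring_centroid(outer_ring))
```

A real library also has to deal with holes, multi-part geometries, and coordinate systems, which is where the gap with a full GIS shows.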
15:01
So this is actually not live, but I will show you the rendered code afterward. In this block of code, for example, I get the world GeoJSON from, I think it's Natural Earth, Africa. Actually it's not my map, it's Nicolas Lambert's map,
15:25
I just stole the code. We all do that, copy and paste. It's how we work, obviously. And then we draw a map. I won't go through the parameters. If you want to see more,
15:41
you can access the notebook. And so actually, yeah, this is what the code rendered. So I can have that. I could have done the same for Python and R. Actually, what you saw there is code that has been running. I just put the line,
16:06
the line of code, and it ran. Okay, so that's working. The last tool is Quarto. It's pretty new, from last year. It's a tool made for scientific and technical publishing.
16:23
It benefits from all the experience of R Markdown. Actually, I think the syntax is a little bit better for some things than plain R Markdown, the old R Markdown. And you can have various outputs: this presentation is a Quarto document, actually.
16:43
But you can also have reports in Word or PDF. You can also have websites and wikis. It works with Pandoc, so you can get all the outputs Pandoc supports. But it's limited to four languages: Python, R, Julia, and Observable JS. That's why I was
17:05
able to create a map within this Quarto document. With Quarto, you have to have your own Python, R, and Julia installations, but Observable is shipped with Quarto. The runtime is shipped
17:27
with Quarto. So let's talk about the limitations. Cell execution is not necessarily linear. If you run the whole document, it will be linear. But if you run individual cells, move around, change things, and go back to another cell, at some point,
17:43
you might lose track of the current state of all the variables, because the variable state might be hidden. So the tip is, from time to time, to run the whole document to get a clean state of everything. And it will be okay. Notebooks can become very messy. Actually, I had an intern
18:11
who wrote 6,000 lines of code in R Markdown. I opened the document and said, oh, I won't run it.
18:20
Never. And I told him, okay, you have to split it into maybe five or six parts to make it work, because it was actually several analyses. So notebooks are a source of bad habits, actually. And a notebook doesn't take care of the software environment. You have to take care of the software environment.
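The hidden-state problem described above can be sketched in a few lines of Python. This is a toy model, not how Jupyter or knitr actually execute cells: each hypothetical cell is a string run with exec against a shared namespace, which is enough to show that re-running a cell out of order gives a different result than a clean top-to-bottom run.

```python
# Three hypothetical notebook cells, in document order.
cells = {
    "a": "scale = 2",
    "b": "values = [v * scale for v in (1, 2, 3)]",
    "c": "scale = 10",
}


def run(order, namespace=None):
    """Execute the named cells in the given order against one shared namespace."""
    ns = {} if namespace is None else namespace
    for name in order:
        exec(cells[name], ns)
    return ns


# Clean run: the whole document, top to bottom, in a fresh namespace.
clean = run(["a", "b", "c"])

# Interactive session: same first pass, then cell "b" is re-run on its own.
# It now silently picks up the value of scale left behind by cell "c".
session = run(["a", "b", "c"])
run(["b"], session)

print(clean["values"])
print(session["values"])
```

Running the whole document from time to time, as suggested in the talk, is what brings the session back to the clean result.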
18:42
The notebook won't do that for you. So if, like me, you are into reproducible science, you have to take care of that yourself and think about it. A notebook helps you understand what people are doing, what they want, and how they process the data to get their results. So it's good for reproducibility, but it's not perfect. You have to take care of the software
19:05
environment. There is this presentation at JupyterCon by Joel Grus called I Don't Like Notebooks. You should look at it. There are lots of memes and actually very good points
19:20
on the issues. It's more about Jupyter notebooks, so all those issues might not be present in other software. Let me conclude. Most trending notebooks that I showed you today can do geospatial stuff. Most are multi-language, except Observable, which is only JavaScript.
19:42
The capabilities are those of the language you use. It's a great tool to interact and play with, but there are some caveats: it can be a messy document, and it doesn't encourage modularity. So you have to take that into account and maybe rework things afterwards. You work in draft mode and
20:05
then you export your functions and create modules from them. The reproducibility is quite good, but you have to take care of the software environment.
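One low-tech way to start taking care of the software environment is to record it next to the results. The sketch below uses only the Python standard library; environment_snapshot is a hypothetical helper, the package names are placeholders, and it complements rather than replaces a proper lock file or container.

```python
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return the interpreter version, the OS, and the installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Placeholder package names; replace them with what the notebook really imports.
snapshot = environment_snapshot(["numpy", "surely-not-a-real-package-xyz"])
print(snapshot)
```

Printing such a snapshot in the last cell of a notebook leaves a trace of the versions that produced the outputs, which helps when someone tries to rerun the analysis later.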
20:23
If you have any questions, I think I'm on time.