
Jupyter notebooks for teaching and learning


Formal Metadata

Title
Jupyter notebooks for teaching and learning
Number of Parts
160
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
Jupyter notebooks for teaching and learning [EuroPython 2017 - Talk - 2017-07-10 - Anfiteatro 1] [Rimini, Italy] The Jupyter Notebook (formerly IPython Notebook) has been used to support learning in different scenarios, including taught courses, self-directed learning and reference material such as software documentation. People have used it to learn how to program, and to learn about diverse subjects where computer code is important to human understanding. The aim of this talk is to dive into where and how notebooks can be used most effectively for education. I will first describe notebook-based learning material created by a variety of people for different purposes, deliberately taking a broad definition of ‘education’, along with my own experiences using notebooks to teach Software Carpentry sessions and conference tutorials. I’ll pull out both strengths and limitations of notebooks as an educational tool to explore how they can be used most effectively. In the second part of the talk, I’ll talk about several extra software tools which can make the notebook more valuable in educational settings, including Jupyterhub, with which a teacher can provide notebook servers for a group of students, nbgrader, which allows notebooks to be used as assignments, and cite2c, which can insert academic citations into notebooks. I’ll also touch on commercial offerings integrating the notebook, such as SageMathCloud
Transcript: English(auto-generated)
Thank you, Stefan, and thank you all for coming. Thank you for inviting me to speak here at EuroPython. But before I get underway, I want to emphasize that this talk is largely not about my own experience. I am a Jupyter software developer, and I've done little bits of teaching for things like Software Carpentry.
But in order to make this talk, I went to the Jupyter in Education mailing list and asked other people about their experience. And all of these wonderful people came back to me and told me about the courses that they've taught with notebooks, and the advantages, the disadvantages, the things that they use, things like this.
And so largely this talk is a condensation and a summary of the experience of a lot of these people who responded to my email. So I'm very grateful to them. So in the course of this talk, I'm going to be trying to answer these four questions
from the perspective of somebody who's preparing to teach some sort of material and is thinking of using notebooks. So should I use notebooks? Are they the right thing for any particular course?
What pitfalls are there? What should I avoid doing with notebooks? What other software tools are there that you can use along with the notebook to enhance the course? And should I host the notebooks on a server where my students can run them?
Or should I encourage them to do local installations on their own machines? So I'm going to dive into the first question. I have notes.
Should I use notebooks? So no. You weren't expecting me to say that. There's a variety of technical issues. Things like using notebooks in version control can be awkward. Projecting them if you've got low resolution projectors, which many places still do, can be difficult.
And if some code produces a massive amount of output, it can slow your browser down or lock it up, which can be a pain. And there are also pedagogical issues that people raised.
A lot of them have to do with confusion over the unfamiliar notebook interface. In Southampton, where I'm based, they teach a first-year computing course for engineering students, and they actually avoid using the notebook in the first section of that course.
They introduce it later on because of the point on the right of the slide: they want people to learn common software engineering practices like running code from the command line and writing tests for their modules, and these are more difficult to do with the notebook. But on the other hand, on the left there, there are people saying that the notebook can cause them problems
because it introduces an extra cognitive load on their students: they're having to learn to program or learn Markdown at the same time as they're trying to learn the subject matter. So there are two opposing ends of this argument. There are people saying the bad thing is that the students have to know too much about the computers.
And there are people saying the bad thing is that the students don't have to know enough about the computers. So this is a hint that maybe there's a niche in the middle that notebooks can fill, and indeed there does seem to be. So this is Lorena Barba's CFD Python course, often called 12 Steps to Navier-Stokes.
So this is a fluid dynamics series of lectures. This is Andrew Dawes, who uses notebooks to teach quantum mechanics, in particular around a Python library called QuTiP for doing quantum calculations.
This is the UC Berkeley Data 8 course. So this is a data science module that they're now requiring all incoming undergraduates to take, regardless of what subject they're majoring in.
This is Lex Nederbragt in Oslo, whose name I'm sure I've just mispronounced. He is going to start teaching a Python course for biologists this autumn, and he even gets to design a new classroom to teach it in, and he's going to use notebooks for it.
This classroom, I should say, is not his classroom. This is one of the examples that he's looking at for inspiration. And a bit of a different example. This is Mike Bright, who has given a variety of container tutorials using Docker and Kubernetes and things,
using notebooks with the Bash kernel. So this is running Bash code in the notebook rather than Python code, and he delivers these tutorials at conferences like this one. In fact, I think he was at EuroPython last year doing one of these tutorials.
So why do people use notebooks, given all of the problems that we pointed out before? So the key value proposition really of a notebook interface is that you're combining computing and writing and mathematics, so you can combine explaining the steps that you have to do for something with illustrating those steps.
And because increasingly just about every discipline involves some measure of computing, this is a very valuable format to describe the computing process. These are all reasons that people on the mailing list gave.
I just wanted to pull out a couple of them in particular. A couple of people said that using notebooks, as opposed to pen and paper, allows their students to tackle harder problems than they could by writing things down.
In particular, somebody who's teaching chemistry said that their students can tackle problems that don't have a straightforward analytical solution. The textbooks for this subject often limit themselves to problems where you can use algebra to reach a straightforward solution, but with a notebook you can go into more complicated problems.
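As a small, hypothetical illustration of that chemistry point (my own example, not one from the mailing list): the pH of a weak acid, once water's own ionization is included, comes from a cubic equation in [H+], which textbooks usually simplify to [H+] ≈ sqrt(Ka * C). A few lines of numerical bisection in a notebook solve the full charge balance instead. The function name and the acetic-acid numbers are illustrative assumptions.

```python
import math


def weak_acid_ph(ka, conc, kw=1.0e-14):
    """pH of a weak acid HA at concentration conc (mol/L), found numerically.

    Solves the charge balance [H+] = [A-] + [OH-], where
    [A-] = ka * conc / (ka + [H+]) and [OH-] = kw / [H+],
    without the usual textbook simplifications.
    """
    def residual(h):
        return h - ka * conc / (ka + h) - kw / h

    # residual(h) increases with h: it is negative for tiny h and positive
    # once h exceeds anything the acid and water together could supply.
    lo, hi = 1e-14, conc + 1.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect in log space; [H+] spans many decades
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return -math.log10(math.sqrt(lo * hi))


# 0.1 mol/L acetic acid (Ka about 1.8e-5): a pH of roughly 2.9
print(f"pH = {weak_acid_ph(1.8e-5, 0.1):.3f}")
```

The same function also handles the awkward limits, such as a very dilute acid where water's contribution dominates, that the simplified textbook formula gets wrong.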
Somebody else suggested that when they do their exercises in notebooks as opposed to asking their students to submit plain Python code files, then because you have the markdown notes and you have the results that the student has saved in the notebook
as they were executing it, you get a better idea of their thought process than you do just from the code that they've written by itself. And these were a couple of interesting things that I hadn't thought of.
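To make that concrete, here is a minimal sketch, assuming submissions are standard nbformat v4 `.ipynb` files (which are JSON), that walks a notebook and lists its Markdown notes, code and saved outputs in order, so a marker can skim how a student reached their answer. The function names are hypothetical; the real `nbformat` library offers richer reading and validation.

```python
import json


def outline(nb):
    """List a v4 notebook's notes, code and saved outputs in order, one line each."""
    def text_of(value):
        # nbformat stores multi-line strings either as one string or a list of lines
        return "".join(value) if isinstance(value, list) else value

    lines = []
    for i, cell in enumerate(nb["cells"]):
        source = text_of(cell["source"]).strip()
        first_line = source.splitlines()[0] if source else ""
        lines.append(f"[{i}] {cell['cell_type']}: {first_line}")
        for out in cell.get("outputs", []):
            kind = out["output_type"]
            if kind == "stream":
                text = text_of(out["text"])
            elif kind in ("execute_result", "display_data"):
                text = text_of(out["data"].get("text/plain", "<non-text output>"))
            elif kind == "error":
                text = f"{out['ename']}: {out['evalue']}"
            else:
                continue
            lines.append(f"    -> {text.strip()}")
    return lines


def outline_file(path):
    """Read a submitted .ipynb file and return its outline."""
    with open(path, encoding="utf-8") as f:
        return outline(json.load(f))


# Usage (the file name is a placeholder):
# print("\n".join(outline_file("submission.ipynb")))
```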
So to summarize when a notebook is a good idea: I think notebooks are a good idea when a problem has computational steps and you want to combine the explanation and the illustration of those steps. But if you're thinking about teaching with notebooks, then you should consider
what it is that you want your students to learn. Do you want them to be more shielded from the computational stuff, the programming side of things? Do you want them to learn more of the software engineering skills like using the command line and writing tests and things?
And you might decide to use notebooks for part of the course and use sort of regular coding and text files for other parts of the course. Moving on to the second question, what can go wrong with teaching in notebooks?
So the big one is: slow down. It's really easy, when you've got a notebook with lots of examples in there explaining all of your computational steps, to go through it going, and as you can see now, we do this, execute.
And as you can see, we do this and this and this. And it's easy to go through it much quicker than the people watching that can take it in. So you have to, especially if you have all of the code already there in the notebook, you have to really force yourself to go through it at a measured pace
so that people have a chance to pick up what you're doing with it. One way that you can do this is to leave blank bits that you have to fill in as you're talking. That's a very effective way of forcing yourself to slow down. You can also leave blanks if you're distributing these notebooks to the students,
then the blanks are bits that they have to fill in. And this is sort of a standard part of teaching, is that anything that involves people doing some sort of active process in their learning, so even if they just have to translate a formula into a bit of Python code,
that sort of thing sticks in the mind much more effectively than just reading and watching passively. So you can consider doing this. And one thing that we do, for instance, in Software Carpentry
is we leave blank bits for the exercises that students are expected to do, but then there is a solutions notebook in the same directory that they have. So if they're behind and need to catch up with the exercises, they can go and look at the solutions. That works for Software Carpentry because it's not assessed or anything,
the students are just there to learn, so we trust them not to cheat because there's really no point in cheating. So moving on, again, extra software tools that you can use with the notebook.
So this is nbgrader, which is a system for using notebooks as assignments. Students fill them in, and then there's a system for bringing them back and marking them. This screencast is illustrating the process of creating a notebook assignment.
So you can see there's a sort of extra cell toolbar which is provided by a plug-in, and you can select these cells as being automatically graded answers or manually graded answers, and you give them IDs and you assign numbers of points for them,
and it's pointing out that there's a total number of points up at the top. And then from the student's point of view, this is what they see. So there's another plug-in that you're running on the student's version of the notebooks. They get a list of assignments that they can go and download,
and that assignment is now a collection of notebooks. And for each of those, the student has the option to run the automatic parts of the marking before they submit it so they can see how many of the tests it passes or fails before they send it in to you.
So in this case, the student hadn't done anything yet, so when they clicked validate, it failed. Now this example student is going to go and find the necessary bit of code and fill in the code to do it, and they'll save that and go back to the overview screen,
and when they validate it again, it now says this has passed. The student then has a button that they can press to submit this whole assignment, so all of these notebooks.
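The workflow above, together with the earlier idea of leaving blanks for students to fill in, can be sketched in plain Python. This is a simplified, hypothetical stand-in, not nbgrader's code or API: the teacher writes solutions between `### BEGIN SOLUTION` and `### END SOLUTION` comments (the markers nbgrader itself uses), `student_version` swaps each region for a placeholder similar to nbgrader's, and `autograde` runs the cells in order, awarding a test cell's points only if its assertions pass. The metadata keys (`grade`, `grade_id`, `points`) are modelled loosely on nbgrader's; check its documentation for the real schema.

```python
import copy
import re

# Solution regions sit between marker comments; the leading indentation is
# captured so that a region inside a function body stays valid Python.
SOLUTION_RE = re.compile(
    r"^([ \t]*)### BEGIN SOLUTION\n.*?^[ \t]*### END SOLUTION[ \t]*(?:\n|$)",
    re.DOTALL | re.MULTILINE,
)


def code_cell(source, **grading):
    """Build an nbformat-v4-style code cell; grading info goes in hypothetical metadata."""
    cell = {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": source}
    if grading:
        cell["metadata"]["nbgrader"] = grading
    return cell


def student_version(nb):
    """Return a copy of a teacher notebook with each solution region blanked out."""
    released = copy.deepcopy(nb)
    for cell in released["cells"]:
        if cell["cell_type"] == "code":
            cell["source"] = SOLUTION_RE.sub(
                lambda m: f"{m.group(1)}# YOUR CODE HERE\n"
                          f"{m.group(1)}raise NotImplementedError()\n",
                cell["source"],
            )
    return released


def autograde(nb):
    """Run code cells in order; a graded cell scores its points only if it runs cleanly."""
    namespace = {}
    scores = {}
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        try:
            exec(cell["source"], namespace)
            passed = True
        except Exception:
            passed = False
        grading = cell["metadata"].get("nbgrader", {})
        if grading.get("grade"):
            scores[grading["grade_id"]] = grading["points"] if passed else 0
    return scores


teacher = {
    "cells": [
        code_cell(
            "def squares(n):\n"
            "    ### BEGIN SOLUTION\n"
            "    return [i * i for i in range(n)]\n"
            "    ### END SOLUTION\n",
            solution=True, grade=False, grade_id="squares_answer",
        ),
        code_cell(
            "assert squares(3) == [0, 1, 4]\n"
            "assert squares(0) == []\n",
            grade=True, grade_id="squares_test", points=2,
        ),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

print(autograde(teacher))                   # {'squares_test': 2}
print(autograde(student_version(teacher)))  # {'squares_test': 0}
```

Running untrusted student code with `exec` in the grader's own process is only acceptable in a toy like this; a real grading setup executes each notebook in a separate, isolated kernel.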
And then from the teacher's or the marker's point of view, nbgrader can take care of the automatic part: it runs the student's code, it runs the test cells, and it automatically assigns marks if the code passes the tests in those cells,
and then there is an interface called formgrader where the marker can go through and manually adjust those marks and add marks for written answers and things, so you're not limited to just questions that have an automatic mark,
which is very important because you need to check that people actually understand things as well as that their code works. nbgrader also includes tools for collecting those marks and exporting them into different formats, and for giving students feedback, so you can make notes on their answer and tell them, you know, this is what you did wrong in this place,
so it's not just about the score that they get, it's about how they can improve as well. Another tool for the same kind of thing is OKPy.
Both of these autograding systems were actually built at UC Berkeley, and this is the one that they're using on the data science course that I showed before. As you can see, it has a very slick web interface from the teacher's point of view.
It's not specific to notebooks, so I would guess that it doesn't have the same level of integration with the notebook interface for creating assignments and things that nbgrader does, but if you want to mark notebooks and other kinds of code submissions,
then this is a very neat interface and at the moment I think they even provide this as a hosted service for free. I don't know how long that will last. So another group of tools here are for hosting notebooks,
and we're going to discuss whether or not that's a good idea in a couple of minutes. There are two main options that people use here. JupyterHub is our open source DIY solution: this is Python and JavaScript software that you can install on a server,
either a server that your institution already maintains, or a cloud server on Rackspace or Amazon or Google or Microsoft or whatever you use. And with JupyterHub, you can set up different sizes of machine,
so if your students need to solve problems that require a GPU, then you can ensure that this is running on computers that have access to the necessary GPU. And it can be integrated with different login options, so you can plug JupyterHub into your university's single sign-on system,
or you can integrate it with GitHub logins using OAuth, or if you don't want to do any of that, you can just use a standalone login and give students a new username and password to access it.
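As a hedged illustration of those login options, a minimal `jupyterhub_config.py` for GitHub logins might look like the fragment below. It assumes the separate `oauthenticator` package is installed and a GitHub OAuth application has been registered; the hub URL and the environment-variable names are placeholders, and option names can differ between JupyterHub versions, so check the documentation for the version you deploy.

```python
# jupyterhub_config.py: a minimal sketch, not a complete deployment.
import os

c = get_config()  # noqa: F821  (get_config is provided when JupyterHub loads this file)

# Log students in with their GitHub accounts via OAuth.
c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"
c.GitHubOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GitHubOAuthenticator.client_id = os.environ["GITHUB_CLIENT_ID"]
c.GitHubOAuthenticator.client_secret = os.environ["GITHUB_CLIENT_SECRET"]

# Alternative: JupyterHub's default PAM authenticator, which checks ordinary
# usernames and passwords for accounts created on the server itself.
# c.JupyterHub.authenticator_class = "jupyterhub.auth.PAMAuthenticator"
```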
The other main option is CoCalc, which was formerly SageMath Cloud. This is what you do if you want somebody else to take care of it for you. The somebody else is William Stein from the University of Washington, who is doing this as a startup. CoCalc costs between $4 and $20 per student,
depending on how many students you have and how long you want the course to last for. So you can choose a four-month course or a full year. And it has its own integrated set of course tools, which include some really fancy things,
like letting the instructor go in live and collaborate remotely with a student, giving them pointers as they're working on it. So before we come back to the hosted or local install question, there's a handful of other tools that people said they were using.
So one family of tools is for converting to and from notebooks. nbconvert is a standard part of Jupyter for converting notebooks to other formats, but there are other tools that you can use. Some people like to write their notebooks in reStructuredText or in Markdown
because they're big fans of their editor; this is often people who are big into Emacs or Vi. There are tools that let you do that and then convert the result into a notebook file. There are also tools if you want to, for instance,
have a collection of notebooks and then convert that whole collection to one big PDF handbook using LaTeX, then you can do that. There are ways to use notebooks as slideshow material.
So nbconvert has an option to convert to slides. There's also a plugin called RISE (Reveal.js IPython Slideshow Extension), which gives you very similar-looking slides, but the code in those slides is still editable and runnable,
so you can be changing things on the fly while you're doing your slideshow. And finally, it's possible to programmatically generate notebooks. So I don't have an example to point you to, but one of the people who responded said that he's randomizing questions for assignments and the notebook file format and the nbformat Python library
make it quite easy to generate notebooks if that's what you want to do. So then coming back to the hosted or local question, so there's really a range of possibilities here.
Going from the left, CoCalc is in the cloud: somebody else deals with all of the technical details, you just give them some money, put your students' email addresses in, and everything is set up for you.
There's JupyterHub where you run the server yourself, but the experience from the student's point of view is broadly the same, they just go to a URL and do everything in the browser. If your IT department is cooperative on this sort of thing,
then you may be able to get it installed on the institution's managed desktop systems, so you can go and do a computer lab with the students using university computers, or you can get the students to install it themselves on their own computers.
So if we simplify this to the two main possibilities of either it's done for the students or the students have to do things themselves, there are a few obvious advantages on either side. So with the hosted solution, there's nothing to install, installation can often be a pain, so this is a big plus for a lot of people,
and students can use it from their tablets, they don't have to have a laptop to set it up. Anecdotally, one of the people who does their course like this said they tell students they can bring a laptop or a tablet,
and some of the students do bring a tablet. We suspect that if they told the students you need a laptop, then most of the students have probably got a laptop, but some students prefer to work on their tablet. On the other hand, installing it on students' computers is free,
at least assuming that students already have computers, which I think in most western countries is probably the case, and you're not at the mercy of any part of the network connection, whether that's the network hardware on your laptop,
or the institutional Wi-Fi, or the broadband backbone over to whatever server you're using, any of that can go down and that can interrupt your use of a hosted service, and this is a problem that quite a few of the people
who are using local installations pointed out. Anaconda, from Continuum Analytics, has made all of this a lot easier to set up, and I would say almost everybody who asks students to install it themselves asks them to use Anaconda,
because it's made it so much simpler. There are a few people who disagree. These are a handful of perhaps less obvious things that you might want to consider, so most of the automatic assignment and grading tools
only work or work best with hosted solutions. There is, I believe, work underway in nbgrader to integrate support for local installations as well, but it's also trickier because platform differences may mean that if somebody has got a slightly newer version of NumPy
and they use a method that's only in their new version and not on your grading server, then it works for them, but when they submit it, then the tests fail, so there's problems like that to be aware of.
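One hedged way to catch that problem early is a check cell at the top of the assignment that compares the student's installed package versions with the grading server's. The sketch below is hypothetical: the pinned versions are placeholders, and treating a different major.minor version as a mismatch is a heuristic assumption, since new functions usually arrive in minor releases. It uses `importlib.metadata` (Python 3.8+) to read installed versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.14.0rc1' into (1, 14, 0) by keeping the leading dotted numbers."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def installed_versions(names):
    """Map each installed distribution name to its version string; skip missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found


def mismatches(pinned, installed):
    """List (name, installed, pinned) where major.minor differs from the grading server."""
    problems = []
    for name, wanted in pinned.items():
        have = installed.get(name)
        if have is None:
            problems.append((name, "missing", wanted))
        elif parse_version(have)[:2] != parse_version(wanted)[:2]:
            problems.append((name, have, wanted))
    return problems


# Hypothetical pins copied from the grading server's environment.
GRADER_PINS = {"numpy": "1.13.1"}

for name, have, wanted in mismatches(GRADER_PINS, installed_versions(GRADER_PINS)):
    print(f"warning: {name} is {have} here but {wanted} on the grading server")
```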
People also suggested that there are equity issues, so using a local installation may privilege people who have nicer computers to do it on. It may privilege people who have already got the technical know-how to easily install it and need less help with that.
On the other hand, the advantage of having a local install is that the software tools and the materials from the course are readily available after the course is finished, and I should say that William Stein, who makes CoCalc,
vigorously disputes this and says that people are no more likely to keep using the tools after a local install than after a hosted installation. I suppose it depends on how powerful
the free tier of the hosted service is, and whether students are willing to keep paying for the non-free tier. They're students, so they're probably not going to pay unless they absolutely have to.
So I think my kind of overview of this would be, are the computing skills a key part of what you want students to get from the course? So if you see the computing skills as something incidental that the students just need to use a computer to learn about this really important material,
then a hosted solution probably makes everybody's life easier. If you want them to come away with those computing skills, then doing a local installation is probably worth the trouble, and there are people doing local installations for classes of up to like 500 students, and they say it is doable.
It's also possible to combine these: some people said that they primarily do local installations but ask students to fall back to the cloud solution if that isn't working. Others have the cloud option as primary, but encourage students to install it locally as well.
So I'd like to thank once again everybody from the Jupyter in Education mailing list, and these three foundations, the Moore Foundation, the Sloan Foundation, and the Helmsley Trust, which fund our work on Jupyter. I think we have a couple of minutes still for questions. Thank you.
I don't know. Oh, we've got a microphone coming. Thanks.
Thanks. That's really interesting. In your introduction, most of the examples you gave were of scientific kinds of teaching, but you also mentioned using containers and bash.
So how well adapted is this to that kind of work? Because I'd be really interested in knowing about that and the practicalities of it. So from a technical point of view, Jupyter has first-class support for the notion
of plugging different kernels into Jupyter so you can have different languages running inside a notebook. It all came from Python, but then we generalized the idea to support other languages. The bash kernel is actually something that I made and it was initially supposed to be just an example
of how to make a kernel for Jupyter. I didn't really think it would be something that anybody was interested in using. Then it turned out that people were interested in using it. It works pretty well from the point of view of using bash. What is tricky in all Jupyter kernels
but becomes more of a problem in bash is that if a sub-process wants to do interactive input, so like if you do conda install whatever, then conda will say, you know, these are the packages that I need, do you want to continue? Yes, no. Jupyter and the kernel can't tell
that a process is waiting for input, so you will see the output of that cell will say yes, no, but then there's no way to actually send the yes, no back to that process. So that's kind of a limitation is you have to write all of the commands with the flag
to say like, don't prompt me please. I have been using Python in the notebooks only, but can you mix Python and bash in the same notebook?
So yes, but it's not part of the Jupyter kernel system. So Jupyter's sort of conception of notebook is that each notebook just has one language, but the IPython kernel for notebooks has some of its own support for different languages,
so if you start a cell with %%bash, then all of the rest of that cell will be sent to bash to run, and you can mix languages like that. Okay, thank you.
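Roughly speaking, a `%%bash` cell hands its text to a bash subprocess. The sketch below imitates that with `subprocess.run`; it is not how IPython implements the magic, and `run_bash` is a hypothetical name. It also connects stdin to `/dev/null`, which illustrates the interactive-input limitation described in the previous answer: a command that asks a question gets end-of-file and fails immediately rather than hanging, which is why commands in notebooks should be given non-interactive flags such as conda's `--yes`.

```python
import subprocess


def run_bash(script, timeout=60):
    """Run cell text through bash and capture its output, roughly like a %%bash cell.

    stdin is connected to /dev/null, so a command that stops to ask a question
    reads end-of-file and fails straight away, instead of hanging while it
    waits for an answer that the notebook has no way to send.
    """
    return subprocess.run(
        ["bash", "-c", script],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


ok = run_bash("echo hello from bash")
print(ok.returncode, ok.stdout, end="")

# A prompt with no way to answer it: `read` hits end-of-file, so the
# `&& echo` never runs and the script exits with a non-zero status.
prompt = run_bash('read -p "Continue? [y/n] " answer && echo "got $answer"')
print(prompt.returncode, repr(prompt.stdout))
```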
How difficult is it to install and manage JupyterHub, for example on an Ubuntu server? I would say Ubuntu server is probably the most common target platform,
just to guess. We're not aiming at complete technical novices, but if your system administrator is that postdoc or PhD student
in the group who's good with computers, then we aim for JupyterHub to be practical for them to manage. It gets more complicated if the class is large enough that you want to spread the users over multiple servers, but there are people who do that,
and I think there's pretty good documentation on how to do that, so it should be feasible for somebody who's familiar with Python and Docker and things like this. Hi, sorry. I've used notebooks before while teaching at university,
so I didn't know about the nbgrader plugin; that sounds quite interesting. But the problem I found with just using vanilla notebooks, without any plugins, and getting students to submit them, is the file size.
We tried to tell them to clear all the outputs, but many of them didn't, so we got very, very large notebooks. Is there any automated script to strip them out, or any suggestions for that in the future? Yeah, there are some scripts that will clear the outputs of a notebook.
They're primarily aimed at version control, because outputs are a pain there too. I think one of them is called nbstripout, if you search for that. Okay, thanks.
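For a course that collects plain notebooks, a minimal stand-in for such a script needs only the standard library, because `.ipynb` files are JSON. This sketch is not nbstripout's implementation; it assumes nbformat v4 notebooks and simply empties each code cell's `outputs` and resets its `execution_count`, leaving Markdown cells alone. The paths in the usage comment are placeholders.

```python
import json
import sys


def strip_outputs(nb):
    """Clear outputs and execution counts from every code cell (modifies nb in place)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_file(path):
    """Rewrite a .ipynb file on disk without its outputs."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    strip_outputs(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    # e.g. python strip_outputs.py submissions/*.ipynb
    for notebook_path in sys.argv[1:]:
        strip_file(notebook_path)
```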
Thank you for the wonderful talk. My apologies for not replying to your email on the thread, so I want to add a couple of points here.
One thing you mentioned is slowing down, right? I teach Python professionally and I use notebooks heavily, and one thing I realized is that using a pre-populated notebook, yes, makes me go faster,
so what I usually do is live coding in the notebook during the class, and then share the notebook after the class. That gives people a chance to go through the thinking process of how I'm approaching the solution, which is usually absent when you give them a notebook with everything filled in.
I think that's an interesting observation. The other thing I think you haven't touched on is the duration of the course when you're using notebooks. Some things depend on how long the course is, whether you're doing a month-long course or a one- or two-day course.
For example, installation and setup can take a long time if it's a month-long course, but on a single-day course you can't afford to do all those things. And I also realized that nbgrader works well for long courses,
whereas it becomes very tedious for teaching one-day or two-day courses. I don't know if you want to share something on that. I haven't used nbgrader myself, but I could imagine that it's probably too much work to set up and get people familiar with for a short course. In terms of installation, I've taught Software Carpentry.
That's typically an intensive two-day workshop, and we get students set up with Anaconda, plus things like Git Bash for Windows users,
on the morning of the first day, before the teaching starts. So I think it is practical, even for short courses, to get the software installed if that's what you think the students should be taking away from it, and for Software Carpentry it is.
I think we're going for lunch now.