Through the lens of Haskell: exploring new ideas for library design
Formal Metadata
Part Number | 135
Number of Parts | 173
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/20116 (DOI)
Production Place | Bilbao, Euskadi, Spain
EuroPython 2015, part 135 / 173
Transcript: English (auto-generated)
00:04
Okay, hi everyone, let's talk a little bit about the kind of ideas we can steal from Haskell to make better libraries. So, I'm George, you can find me on Twitter, and as a way to procrastinate, I like learning new stuff, learning new languages.
00:27
This is a picture of me during the Django Girls workshop in Paris, where I was coaching, and when I wasn't answering questions, I was working on a bit of Haskell code, so this is my face when I work on Haskell.
00:43
I learnt a few interesting things, and I'm here to talk to you about some of the ideas I discovered. So, last year, at EuroPython, there was a keynote about Haskell, called What Python Can Learn from Haskell. It was about Haskell the language, the strongly typed language with the huge compiler.
01:04
And, well, Python has learnt from Haskell in the meantime: we now have type annotations, and Guido talked about them yesterday, so we have learnt from Haskell. But talking about Haskell the language is very last year, it's a bit outdated, and we're not going to talk about the language.
01:24
We're even going to suppose that it's the exact same language as Python. Let's say Haskell is just a dialect of Python with a slightly different syntax, but we don't care at all. What is interesting to us today is the Haskell ecosystem, the community and the kind of libraries they produce.
01:44
On the left, you have the Python package index, which you probably know, and on the right, you have the Haskell equivalent, Hackage, the Haskell package index. So, let's see how the community differs, how the mindset differs, and what kind of ideas we can get from that.
02:03
So, one of the ideas you'll find very often in the Haskell community is the idea of design space. The idea of design space is the idea that for a given problem, you have a lot of solutions, you have a full space of solutions for one single problem. Some of those solutions we know, and some we don't know yet.
02:24
And one idea that is very present in the Haskell community is that it is worth exploring the design space in search of new solutions that we don't know yet, and that might be more interesting than the ones we have because, well,
02:41
maybe those are different solutions with different trade-offs that work better in some situations, or are more general, or are faster to run, or are just easier to learn and to use. So, it's really the idea that there are areas, there are spaces we have yet to discover. In the Zen of Python, we say there should be one, and preferably only one, obvious way to do it.
03:07
And it's like Haskell adds: OK, let's keep looking for that obvious way. And so, you'll often find new libraries for stuff that already exists, but which are different, make different assumptions, and choose different trade-offs.
03:23
And each time, the new library tries to compare itself as much as possible with the existing libraries. So, it's not just reinventing the wheel or doing something new because the other stuff was not invented here. It's really searching for something new, better, more interesting, or just trying.
03:42
And that is not completely foreign to the Python ecosystem, because a few years ago, Kenneth Reitz released the Requests library, which is an HTTP client library. There were already many HTTP client libraries at the time, but this was a new one that tried to be as easy to use as possible.
04:04
He has since launched a movement called Python for Humans, where he tries to encourage others to follow the idea of making libraries that are as simple and easy to use as possible. And in a way, it's kind of trying to explore some new parts of the design space, some part where the ease of use is the most important.
04:29
So, we are now going to take a look at a few Haskell libraries to see some parts of the design space we don't usually see in Python, but that are explored in the Haskell community.
04:41
So, here you have on the left the top downloaded libraries in the Python package index, and on the right the top downloaded libraries in the Haskell package index. When I said Haskell and Python are similar, I wasn't lying. You see the same packages are downloaded in both cases.
05:04
The most downloaded package is the JSON package, and in both languages you have multiple attempts to solve the packaging problem, and you have some HTTP client libraries. But on the Haskell side, you'll find a few libraries that do not have a Python equivalent.
05:26
Those are Lens, attoparsec, and Conduit, and I'm going to talk a little bit about them. So, the first one is attoparsec. I had to use attoparsec in a little project of mine, which is a Slack bot that answers movie quotes.
05:41
Someone says a line from a movie, any line, and the bot answers with the next line. My girlfriend insisted that it's not a real-life use case, so let's just call it a real-life use case. The input of the bot is subtitle files, the ones you download when you download a movie.
06:02
So I needed a way to parse those subtitles, and there is actually a library in Haskell that does that, which is called subtitle parser, and this is the full definition of the package. When you open the documentation, this is all you will see. So just a little word about how to read this.
06:20
When you have something with two colons, like parseSRT :: Parser Subtitles, it means there's something called parseSRT, which has type Parser Subtitles. Subtitles is defined on the right: it's just a list of lines, and a line is an object with the following properties. And all that interests me in this package is the parser of subtitles.
06:42
So I just told you what a subtitle is, and a parser is a kind of object that is defined in the attoparsec package. And the way to use it is through a function which is defined in the attoparsec package, which is not the package where the subtitle parser lives. That function is called parseOnly, and it takes a Parser Subtitles and a ByteString, which is
07:07
just a string of bytes, and it returns either an error message or the subtitles I actually want. And that's it. I use the parser from this library and the function from the other library, and I'm done.
07:21
To be fair, this is not exactly what the function looks like, because it's polymorphic in the type of object I want to parse. So it takes any parser and a byte string and returns either the error message or what the parser is supposed to parse. And that is not the only way to use the parser. The attoparsec package gives me a few other ways, like incremental parsing.
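The shape of that function can be sketched in Python. This is only an illustration of the idea, not attoparsec's API: the names `digits` and `parse_only` are hypothetical, a parser is modelled as a plain function from input to `(value, rest)` or `None`, and a tagged tuple stands in for Haskell's "either an error message or a value".

```python
def digits(data: bytes):
    """Hypothetical building-block parser: one or more ASCII digits as an int.

    Returns (value, remaining_input) on success, None on failure.
    """
    n = 0
    while n < len(data) and data[n:n + 1].isdigit():
        n += 1
    if n == 0:
        return None
    return int(data[:n]), data[n:]


def parse_only(parser, data: bytes):
    """Run any parser on a byte string.

    Returns ("error", message) or ("ok", value); leftover input is ignored.
    """
    result = parser(data)
    if result is None:
        return ("error", "parse failed")
    value, _rest = result
    return ("ok", value)


print(parse_only(digits, b"42 is the answer"))
print(parse_only(digits, b"no digits here"))
```

The point of the design is that `parse_only` knows nothing about digits or subtitles: any function with the same shape can be passed in.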
07:46
Incremental parsing is when you have a very big set of data to parse, and you don't want to put it all in memory before parsing it, so you feed it to the parser a little bit at a time. And I can do that with the same parser I used in the previous function.
08:04
And yet another way to use it is by using that parser as a building block for a bigger parser. There are a lot of functions I can use to do that, and here are a few examples. I can use many to take a parser and get a parser that will parse many occurrences of what the single parser parses.
08:25
Or I can use or with two parsers to get a parser that will parse either one thing or another, trying both and giving me what matches. And all those ways to use the parser are given to me by attoparsec, and the person who wrote the subtitle parser didn't have to care about any of that.
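The two combinators described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Haskell library: `char`, `many` and `either` are hypothetical names, and a parser is again a function returning `(value, rest)` or `None`.

```python
def char(c):
    """Parser that matches exactly the string c at the start of the input."""
    def run(text):
        if text.startswith(c):
            return c, text[len(c):]
        return None
    return run


def many(parser):
    """Turn a parser into one that applies it zero or more times."""
    def run(text):
        values = []
        while True:
            result = parser(text)
            # Stop on failure, or if the parser consumed nothing (avoids looping forever).
            if result is None or result[1] == text:
                return values, text
            value, text = result
            values.append(value)
    return run


def either(first, second):
    """Try the first parser; if it fails, try the second on the same input."""
    def run(text):
        result = first(text)
        return result if result is not None else second(text)
    return run


a_or_b = many(either(char("a"), char("b")))
print(a_or_b("abba!"))
```

Because the combinators only rely on the shared parser shape, they work with any parser written by anyone, which is the reuse the talk is pointing at.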
08:45
That person wrote a 40-line library to parse subtitles, and I get a lot of ways to use it for free. And there is a whole lot of other parsers built on the same principle. You'll find in the package repository a lot of packages to deal with CSV, so you get a parser of
09:02
CSV, or a parser of JSON values, or a parser of crontab if you want to read crontab files. Or a parser of email addresses that will return you the parsed string only if it matches an email address. Or if you need to work with TOML files, you'll find a parser for that. And those are different libraries made by different people, and in every single one of those libraries there is only
09:25
one thing that is useful, and one thing that is defined: the parser of the thing you want. And you can use each of those libraries with each of the ways I just showed you.
09:40
So in every language a good library simplifies the implementation and helps you write fewer lines of code. But in this situation a good library helps you simplify the interface, and helps you write less documentation and less interface, and in the end have fewer things to know before using the library.
10:02
So we have a generic solution which is a parser, the parsing solution, and a lot of specific building blocks that you just can plug and unplug and use without knowing any new things. The second library I would like to show in our exploration of that part of the design space is Conduit.
10:25
Conduit is a streaming library. Streaming means we have a long stream, a sequence of objects, of something, and we want to process them, handle them, but we don't want to load them all into memory when we do so. And there are three concepts in Conduit. The producers will just produce a stream of
10:45
values, and the consumers will just consume a stream of values and do something with it. And conduits, which both consume on one side and produce something else on the other side. And once again you will find a lot of those Conduit constructs in a lot of different libraries.
11:02
For example you can find somewhere in the Haskell package repository a conduit that gives you, sorry, a function that gives you a producer from a socket, a normal standard Unix socket. And you'll find a conduit that is used to decompress a stream of bytes.
11:25
And a sink, meaning a consumer that will just put what it gets into a file. And Conduit provides an operator which is a pipe plugging operator, and you just plug the different parts you have found in different libraries.
11:45
You plug them together and you get this program that actually reads from the socket, gunzips it, and writes it to a file. And every single one of those parts was written by different people and can be found in a different library.
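The producer, conduit and consumer roles map fairly naturally onto Python generators. The sketch below is an analogy, not Conduit's API: `source_chunks`, `decompress` and `sink_file` are hypothetical names, an in-memory byte string stands in for the socket, an `io.BytesIO` stands in for the file, and function nesting stands in for the plugging operator.

```python
import io
import zlib


def source_chunks(data: bytes, size: int):
    """Producer: yields the data in small chunks (stand-in for reading a socket)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def decompress(chunks):
    """Conduit: consumes compressed chunks and produces decompressed chunks."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


def sink_file(chunks, fileobj):
    """Consumer: writes every chunk it receives to a file-like object."""
    for chunk in chunks:
        fileobj.write(chunk)


payload = b"hello conduit " * 100
out = io.BytesIO()
# Plug the three parts together; only one small chunk is in flight at a time.
sink_file(decompress(source_chunks(zlib.compress(payload), 16)), out)
print(len(out.getvalue()))
```

Each stage only agrees on "an iterable of byte chunks", so any stage could be swapped for one written by somebody else.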
12:00
And another way, another Conduit construct you can find is a function that just takes the parser we've seen five minutes ago and turns it into a Conduit. So now with the 40 line little subtitle parsing library I have a high performance subtitle streaming library, which is not very useful but still.
12:22
And I can read from a subtitle file and parse every line as it goes and post it straight to IRC. There is actually an IRC consumer in the package repository. Don't do that, you're going to get banned from any IRC server you try it on, but you could, you could try.
12:42
So once again we find the same thing, you have a general solution for streaming, which is Conduit, and a lot of building blocks that all have the same interface and are exchangeable and reusable, built by a lot of different people. And the last library I'm going to show still into this exploration is Lens.
13:03
The problem Lens tries to solve is data manipulation. So here is some data we can manipulate. It's a blog post, because everyone loves examples with blog posts, whose title is Made-up examples considered harmful, and it has some comments.
13:21
And what Lens gives you is the abstraction of the idea of manipulating a piece of the data. In this case we have title, which is a Lens that points to the title of the blog post. And I can use the view function with Lens title and the blog post, and it's just a getter, it just returns the value pointed by the Lens.
13:46
But I can also combine the Lens with the dot, so the syntax is quite familiar. I have a Lens that points to the author of the blog post, and a Lens that points to the name of an author. I combine them with the dot and pass them to the view function, and I get the name of the author.
14:04
And I can use them as a setter, just use the set function, pass it a Lens, a new value, and it returns a new blog post with the value changed. And where it gets interesting is that Lenses can also be used as getter setters for multiple values.
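A getter and setter packaged as a single value can be sketched in Python as follows. This is a toy model of the idea, assuming dictionaries as the data and using hypothetical names (`Lens`, `field`, `compose`, `view`, `set_`); the Haskell Lens library is built quite differently, and the author name below is made up for the example.

```python
from collections import namedtuple

# A lens bundles a getter and a non-mutating setter for one place in a structure.
Lens = namedtuple("Lens", ["get", "put"])


def field(name):
    """Lens pointing at one key of a dict; put returns a modified copy."""
    return Lens(
        get=lambda d: d[name],
        put=lambda d, value: {**d, name: value},
    )


def compose(outer, inner):
    """Lens pointing at the place inner designates inside what outer designates."""
    return Lens(
        get=lambda obj: inner.get(outer.get(obj)),
        put=lambda obj, value: outer.put(obj, inner.put(outer.get(obj), value)),
    )


def view(lens, obj):
    return lens.get(obj)


def set_(lens, value, obj):
    return lens.put(obj, value)


post = {
    "title": "Made-up examples considered harmful",
    "author": {"name": "Alice"},  # hypothetical author, not from the talk
}
author_name = compose(field("author"), field("name"))
print(view(author_name, post))
renamed = set_(field("title"), "A new title", post)
print(renamed["title"], "|", post["title"])
```

Note that `set_` returns a new blog post and leaves the original untouched, matching the behaviour described above.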
14:25
In that case, we have comments, which is a Lens that points to the list of comments. But each is not exactly a Lens, it's a traversal, and that means the resulting comments.each points to every single comment in that list.
14:41
And I can still combine it with a lens, with author, to get something, which is called a traversal, that points to every single author of a comment in the blog post object. I use the toListOf function to get that list. The interesting part in that is that those getters and setters, lenses and traversals, are values, so
15:05
if they are values, I can store them in a variable that I call comment contents. So here is a traversal that points to every single content of a comment in the blog post. And I can use toListOf on that value to get the list of comment contents, and I can use
15:23
set to change the content of every single comment to blah blah blah, because you shouldn't read the comments anyway. And if this is a value, I can put it in a variable, I can have a function return it, and I can have a library return it. So I can put it in a library and have an abstraction of data manipulation in a library.
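The same trick extends to pointing at many places at once. The Python sketch below models a traversal as a pair of functions, one listing the targets and one mapping over them; all names (`Traversal`, `field`, `each`, `compose`, `set_all`) and the sample comments are assumptions made for illustration, not the Lens library's definitions.

```python
from collections import namedtuple

# A traversal points at zero or more places: list them, or map a function over them.
Traversal = namedtuple("Traversal", ["to_list", "over"])


def field(name):
    """Traversal with exactly one target: the given key of a dict."""
    return Traversal(
        to_list=lambda d: [d[name]],
        over=lambda f, d: {**d, name: f(d[name])},
    )


# Traversal whose targets are all the elements of a list.
each = Traversal(
    to_list=lambda xs: list(xs),
    over=lambda f, xs: [f(x) for x in xs],
)


def compose(outer, inner):
    """Targets of inner, inside every target of outer."""
    return Traversal(
        to_list=lambda obj: [y for x in outer.to_list(obj) for y in inner.to_list(x)],
        over=lambda f, obj: outer.over(lambda x: inner.over(f, x), obj),
    )


def set_all(traversal, value, obj):
    """Replace every target with the same value, returning a new structure."""
    return traversal.over(lambda _old: value, obj)


post = {"comments": [{"author": "ann", "content": "First!"},
                     {"author": "bob", "content": "Nice post"}]}
# The traversal is an ordinary value, so it can be stored and reused.
comment_contents = compose(compose(field("comments"), each), field("content"))
print(comment_contents.to_list(post))
cleaned = set_all(comment_contents, "blah blah blah", post)
print(comment_contents.to_list(cleaned))
```

Since `comment_contents` is just a value, a library could export it, which is the abstraction of data manipulation the talk describes.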
15:47
And for example, there is one such library which lets you manipulate JSON with lenses. It provides just a handful of lenses, with values, which points to every single value in a JSON list,
16:02
and key, which takes an argument which is an attribute name and points to that attribute of the JSON object. So with the same manipulation functions provided by Lens, you can now manipulate JSON directly. Another example, another library which is provided, is for HTML. There are lenses to manipulate HTML.
16:27
So we see HTML as this tree with a lot of nodes, and we now have abstract data manipulation functions, objects, those Lenses, which let us look into the HTML.
16:43
So simply, we have one traversal, allNamed, which points to every single node in the HTML tree that has a given name, and we combine it with the lens contents, which provides the contents. And so this line is all we need to define something that can get every single title in an HTML document.
17:07
And so one last time, you have a general solution of data manipulation and a lot of building blocks in different libraries. So that's really a theme. That's something we can see a lot in Haskell libraries,
17:21
that we don't really see in Python: one single interface for a lot of different libraries. So why am I telling you that? Not just because it's nice, but also because I think borrowing, stealing ideas from other languages is a good way to produce excellent results.
17:41
For example, I talked about Requests a little earlier. Requests provides a very simple interface to make HTTP requests. It provides a get function, which takes a string, which is the address of what you want, and gives you a response object, which is exactly what you need.
18:02
And Wreq is a Haskell library which is heavily inspired by Requests. It provides the exact same get function, which has the same interface. It combines it with Lens for the data manipulation part, so instead of reading the header from the response object r in Requests,
18:21
you use the responseHeader lens to get the content type, but that's exactly the same principle. And this is the best Haskell library for making HTTP requests. And it's stolen shamelessly from Python, because the Python library is an excellent idea.
18:42
So taking ideas from other languages helps produce awesome libraries. And in the other direction, there is Hypothesis, which is a library for property-based testing. It is inspired by a library in Haskell called QuickCheck. And the main idea is a bit like unit tests, except instead of making one example and checking your code works on that example,
19:10
you just state the property you want to check. In this case, if I have a decode and an encode function, I can expect them to... Well, if I encode and then decode something, I want to get the initial stuff back.
19:25
And Hypothesis will make sure this is true for every single text it can generate. And it is really good at generating them, and it can find the corner cases you have forgotten.
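The round-trip property can be illustrated with nothing but the standard library. To be clear, this is a hand-rolled stand-in and not Hypothesis itself: `random_text` and `check_roundtrip` are hypothetical helpers, UTF-8 is an assumed choice of encoding, and the real library adds much smarter generation plus shrinking of failing examples.

```python
import random


def encode(text: str) -> bytes:
    return text.encode("utf-8")


def decode(data: bytes) -> str:
    return data.decode("utf-8")


def random_text(rng, max_len=20):
    """Generate a random string of arbitrary code points.

    Surrogates (U+D800..U+DFFF) are skipped because they cannot be UTF-8 encoded.
    """
    chars = []
    for _ in range(rng.randint(0, max_len)):
        code_point = rng.randint(0, 0x10FFFF)
        while 0xD800 <= code_point <= 0xDFFF:
            code_point = rng.randint(0, 0x10FFFF)
        chars.append(chr(code_point))
    return "".join(chars)


def check_roundtrip(trials=500, seed=0):
    """Check the property decode(encode(text)) == text on many random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        text = random_text(rng)
        assert decode(encode(text)) == text, repr(text)
    return trials


print(check_roundtrip(), "random texts survived the round trip")
```

The test states a property rather than a single example, and the inputs are produced by the machine instead of being chosen by hand.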
19:40
And this is an excellent testing library in Python. You can use it right now, it's awesome. And it was inspired by looking into what Haskell does. So as a conclusion, it was a bit fast, but more time for questions, I encourage you to explore the design space.
20:01
Look at what exists and what doesn't exist, and make new stuff to solve existing problems but differently. And every time explain what you did differently and why you did it differently from what already exists. And one direction you can look into that we don't usually explore in Python is factorized library interfaces.
20:24
Make libraries that use other libraries for the interface, so that a new library is just a single little thing that you don't have to learn. You can just use it with what you already know.
20:44
And the bonus conclusion is the do-it-yourself conclusion. Go learn something new, something else. Anything that's unusual, and see what part of the design space it explores and whether it could fit into Python.
21:00
And come back to EuroPython next year and make a presentation about what you learned in that new language. Thank you. Thank you, great talk. Does anybody have any questions?
21:31
It seems to me that in Python, when there are several libraries that have the same interface, a PEP comes up. We have a PEP that sort of standardizes the interface, like for databases we have a PEP that standardizes an interface.
21:45
For HTTP servers we have WSGI, now there's asyncio which kind of standardizes all the async stuff. Do you have any comments on that? Do you think the Haskell way is better?
22:05
The problem with that is you can't just create a new library that will be that common interface, you have to go through the PEP process. Which means that I don't think I can create a PEP, but I can create a library for a common interface.
22:28
The main reason why we do it that way in Python is because Haskell has a compiler that is able to check that the interface is respected by the library. In Python if you get a library that doesn't have the exact same interface you would expect, you'll just get an error at runtime.
22:46
So we need a PEP because we need a strong source of definition, a specification of how the interface should behave.
23:03
Thank you. Anybody else? I was thinking about the same thing without a possibility to define interfaces or protocols or however you call them.
23:28
Do you even see a possibility to do something like this in Python, or is it just a strong suit of Haskell and those types of languages? Python can't do it, or can it maybe do it?
23:44
The compiler checking types surely helps but it's not necessary. In fact we already have a few of those common interfaces in Python. You mentioned databases, or WSGI, or some plugins for web frameworks, which all conform to one single interface, and that works.
24:09
So I think it is completely possible with what we currently have in Python to have one single interface that can be reused as long as that interface is specified and doesn't change.
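WSGI is a handy illustration of such a specified interface: an application is any callable taking `environ` and `start_response` and returning an iterable of byte strings. The sketch below is a minimal example under stated assumptions: `app` and `call_app` are hypothetical names, and `call_app` is a bare-bones stand-in for a real server that supplies only two `environ` keys.

```python
def app(environ, start_response):
    """A minimal WSGI application: a callable with the agreed-upon signature."""
    body = b"Hello from " + environ.get("PATH_INFO", "/").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(application, path="/"):
    """Toy caller standing in for a server; works with any conforming app."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(application(environ, start_response))
    return captured["status"], body


print(call_app(app, "/talks"))
```

Nothing checks the interface ahead of time here; it works because both sides follow the written specification, which is the point made in the answer above.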
24:24
My question is kind of asking you to validate my opinion, and it's somewhat related. I thought that Haskell has this a lot: a design pattern in software is encoded in a type, where it's usually just a best practice.
24:48
So monads are one example, lenses are another example. Do you think the Python world would benefit more from having a standardized way to do things, standardized patterns, always writing classes in this way?
25:02
I think that's a strong side of Haskell. So what do you think about this? I'm not sure I completely got what you said. Sorry?
25:26
It's a little bit of the same. One of the ideas hidden behind what I talked about is finding common patterns in different stuff and giving them a common interface. Which is not something we do that often in Python or maybe we do it but it emerges more organically.
25:48
Multiple libraries converging to a common, similar set of interfaces rather than one single formalized interface.
26:02
I like the theme you draw about these generic libraries that provide the general solution and also the means of composition for the specific libraries. I'm wondering, do you think that there's something in Haskell that makes that more natural to do or easy to do? Is there something, you know, are there like language features of Python that we should be using more or less in order to make that happen?
26:26
As I said, I think the compiler telling you right away if you have the right interface helps. But I think it's also in big part because of the mindset of, well, that's how people do it.
26:42
The tools help, sure, but it's also the community and the way to do it. Any more questions?
27:07
Problems at Haskell? A project nowadays because I don't work either with Python or Haskell.
27:23
Most of my projects are personal projects, though it's more take one technology and build something on it. So I tend to alternate. Still I always use Python for small little things I need done in half an hour. And nowadays I try to use Haskell to build bigger projects or projects I think will
27:45
get bigger, because the way it works kind of helps to make something small and then grow it. Is the practice in Python of being Pythonic holding the Python community back from looking
28:08
into other communities for solutions to common problems that all programming languages and communities have? Can you repeat your question please? So in Python there's this notion of code being Pythonic.
28:25
Well, good Python code is Pythonic, as people say. So is this wish to have a Pythonic solution holding us back in terms of looking for solutions in other languages that we can import back to Python?
28:43
I don't know. You have two meanings of Pythonic. You have the low level: is this single line Pythonic? And you have the higher level: is this solution, this architecture, Pythonic? And I guess, no, you can find in other languages solutions that we would call Pythonic, elegant structures and elegant solutions to problems.
29:07
So maybe the fact that the name is Pythonic prevents us in some way from looking elsewhere, but no, I think you can find a solution in another language and find it just as Pythonic as if it was written in Python.
29:25
One more quick question, coming from me: did you like the talk?