
PyPy status talk (a.k.a.: no no, PyPy is not dead)


Formal Metadata

Title
PyPy status talk (a.k.a.: no no, PyPy is not dead)
Title of Series
Part Number
73
Number of Parts
119
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place: Berlin

Content Metadata

Subject Area
Genre
Abstract
Armin Rigo/Romain Guillebert - PyPy status talk (a.k.a.: no no, PyPy is not dead) The current status of PyPy, with a particular focus on what happened in the last two years, since the last EuroPython PyPy talk. We will give a brief overview of the current speed and the on-going development efforts on the JIT, the GC, NumPy, Python 3 compatibility, CFFI, STM... ----- In this talk we will present the current status of PyPy, with a particular focus on what happened in the last two years, since the last EuroPython PyPy talk. We will give an overview of the current speed and the on-going development efforts, including but not limited to: - the status of the Just-in-Time Compiler (JIT) and PyPy performance in general; - the improvements on the Garbage Collector (GC); - the status of the NumPy and Python 3 compatibility subprojects; - CFFI, which aims to be a general C interface mechanism for both CPython and PyPy; - a quick overview of the STM (Software Transactional Memory) research project, which aims at solving the GIL problem. This is the "general PyPy status talk" that we give every year at EuroPython (except last year; hence the "no no, PyPy is not dead" part of the title of this talk).
Keywords
Transcript: English (auto-generated)
OK, so I'm Armin Rigo, and I'm here to let Romain speak.
Hi. So it's the biggest crowd I've ever spoken in front of, so sorry.
So yes, you can find me on the internet. I've done PyPy work and also a bit of Cython work. I tried to make Cython and PyPy work together, and the approach was wrong, but it was interesting to see,
because now we know it's wrong. And I've worked on Python 3 and NumPy support mostly. So yes, last year we didn't give a PyPy talk, and then people asked us if PyPy was dead. So no, it's fine, don't worry.
Yes. Well, we would have made ten years of EuroPython talks in a row, but we broke the streak. So yes, what is PyPy? PyPy is built on top of the RPython toolchain; RPython is a subset of Python in which you can write interpreters for dynamic languages.
So the main advantage over C or C++ is that you get the JIT for free, basically, and also a good garbage collector. And so, on top of RPython, the main interpreter we've built
is PyPy, which is the fastest Python interpreter around. So yes, over the last two years, we've made a lot of progress in small steps: things like ARM support, CFFI, eventlet, JVM support,
an incremental garbage collector. So if you're interested in games or low latency, it's better than previously. Fast JSON as well: so if you're doing web stuff, this can be useful. And yes, NumPy and a faster JIT and stuff like that.
We also have uWSGI support, thanks to the C API for embedding. So this is not the same API as the CPython C API, but you can use it to embed PyPy.
So yes, RPython is a framework, so we also have multiple languages on top. So we have a Ruby implementation we've built called Topaz, and Hippy, which is a PHP VM.
So yes, to do stuff faster, we need money. And so, Python 3: I mean, we managed to do a lot of stuff with not a lot of money, I think. So the Python 3 funding is 50% spent,
and we've done about 80% of the work, probably. And as well, for STM, we made a second call for donations to have a more production-ready STM. So hopefully, if you have too much money, I mean.
And then, well, if you've donated before, then thank you. So yes, we also have done commercial support for PyPy. So if you're a big company and are afraid to use something that you don't know how to hack,
then, well, you can hire us. And also, if you have performance issues, if you're open source, usually we'll help you for free. But if you're closed source, then, well, sorry. So yes, if you have Python code,
I mean, we're very compatible. I mean, aside from implementation-specific stuff, it just works. And C code, well, we've worked a lot on being able to communicate with C.
So we have cpyext, which is the compatibility layer for the C extension API. Well, the thing is, the C extension API is hard to support if you're not CPython, basically. So we've also built CFFI, which is as fast as the C API, well, a lot faster on PyPy.
And so we also have the embedding API, as I said. And we can also talk with C++. psycopg2cffi is like the best database driver on PyPy,
for example. And we've also built CFFI-based lxml and Pygame. And also, we are slowly but steadily improving NumPyPy. So yes, PyPy, I mean, it's just fast.
And this is a lot of benchmarks that we have that represent real-world usage. Some of those benchmarks were contributed by Unladen Swallow, so we didn't write them. So we didn't write benchmarks to show how fast we are; we wrote them to help us get faster.
Well, we didn't write them, but we used them. And so, ARM: thanks to the Raspberry Pi Foundation, we have production-level ARM support. It's faster: the speed difference between CPython and PyPy on ARM is bigger than on x86, because the ARM CPU is not as smart.
And PyPy just produces very nice code for ARM, and CPython doesn't. And so it's in the standard distribution shipped with the Raspberry Pi.
So that's very cool. NumPy support is in progress. We pass that many tests, but it doesn't mean much. But it shows that, well, we've done stuff.
And well, it's hit and miss right now. So if you can just try it and tell us what you need, then we'll work on that before working on something else. And we don't have SciPy support yet, but we have an idea of how to make it work.
So hopefully, this will pan out. And no, we won't rewrite SciPy from scratch. Py3k: so, we released 3.2 support not so long ago, and we started the 3.3 branch.
If you want to get started with PyPy, this is the moment to get started, because once we've caught up with CPython, there won't be any entry-level tasks left. So you can find us at the sprint,
and Py3k is a good way of getting started in PyPy. There are also a few missing optimizations on Python 3, but we are working on bringing them back. So yes, CFFI is a way of interfacing with C. It works at the API as well as the ABI level.
So unlike ctypes, it's more type-safe: it doesn't segfault as much. Unlike the C API and Cython, it runs on PyPy, which is good. And it's super fast on PyPy. It's almost as fast as just calling C from C.
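The ABI level mentioned here can be sketched in a few lines. This is a minimal illustration, assuming a POSIX system with the `cffi` package installed; the `strlen` declaration is copied verbatim from C, and no compilation step is needed.

```python
# Minimal sketch of CFFI's ABI mode: declare a C function and call it
# directly, with no compilation step.
from cffi import FFI

ffi = FFI()
ffi.cdef("size_t strlen(const char *s);")  # plain C declaration, as in the man page
libc = ffi.dlopen(None)                    # None = symbols already loaded in this process

print(libc.strlen(b"EuroPython"))          # -> 10
```

The API level works the same way from the caller's side, but compiles a small C extension once, which lets you use real headers instead of hand-copied declarations.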
And well, STM. Well, the GIL is kind of a very hard problem, but the advantage of the GIL is that it hides a lot of concurrency problems.
So software transactional memory allows us to have the same GIL semantics without the GIL, so you can run on multiple threads. And it's also a mechanism for expressing concurrency.
So threads and locks are horrible, so hopefully this will be better. And yes, we have a release of PyPy STM with a JIT, so you can find it on the PyPy blog. It wasn't released so long ago, so you can try that. And now we're talking about having a production-ready
PyPy STM, maybe. And if you're more interested in removing the GIL, then you can see our talks tomorrow morning. So yes, you can find us on IRC, on our blog, and on pypy.org.
And well, if you have any questions, then you can ask on IRC or right now.
20 minutes left. Yes. Did you benchmark PyPy compared to Swift? I didn't. So Apple was so nice to basically show it: so we have Swift, and then there is Python,
and look at these differences. But I would be really interested to turn it around. Repeat the question. Yes, so the question was about Swift and how much Python sucks compared to Swift, according to Apple. And well, if you look at Alex Gaynor's Twitter,
he wrote the same algorithm Apple used for their benchmark, and he got a very good performance improvement on PyPy. So I think it was just marketing stuff, basically. Yeah, of course. Hey, coming back to maybe mobile, with Swift as an example:
my experience in Python was that, call for call, ARM would be about 10 times slower than x86 when running C code, which I put down to very small or no cache and not much other speed, which
makes sense, because cache takes a lot of power. How much is this penalty for PyPy? Is it less of an impact or more of an impact, or is it about the same? So the question was about ARM, and CPython on ARM compared to PyPy. So I think the main performance difference
is in branch prediction. And PyPy does better because, well, it's a tracing JIT, so it generates just one linear trace. Do you think it's more due to code or the data model? Code, I would say.
And finally, a quick one. What do you think of PyPy becoming, possibly in the ideal world, a de facto programming environment for Android? So PyPy on Android. And I don't know. I mean, Google owns the platform.
So ask Google. Why don't you skip 3.3 completely? So, why aren't we going straight from 3.2 to 3.4? Well, 3.3 is a subset of 3.4.
So if we do 3.3, then we get part of 3.4. I don't think there are any language changes in 3.4; it's just the library. Well, then we will get 3.4 for free.
OK, so SciPy support. We want to embed CPython, basically. And then you can pass stuff around; the way we've written NumPyPy, we use the same storage. Like, the storage is done in the same way, so you can just pass stuff around.
I mean, you can pass stuff around between CPython instances, so you can do the same with PyPy, and you can use that. And SciPy is tons of C, which PyPy won't make any faster, so there's no point in porting that. Yes?
So creating .exe files, right? Standalone .exe files?
We don't really have anything planned on this. But if you use the embedding API, you can probably... well, you will probably have to write the compiler part yourself to make it into one binary. But there's the embedding API.
So I think it would be potentially doable. How well can you currently constrain resources used by the JIT, things like the JIT's maximum memory usage? Can you basically say, I only want to have 50 or 80 megabytes used by the JIT?
Is it going to actually keep to it? So the question was about resource management in the JIT. So you can limit the size of your traces, that sort of stuff. So you can, yes, restrict memory.
Yes, you can as well, I think. No, no, you can't. The NumPy problems that are still left, and the NumPy bugs that are still left: is it a few really hard problems that need to be fixed? Or is it a bunch of little problems, or a bunch of hard problems?
Or what's the nature of them? So yes, what's not done in NumPy, basically? It's a lot of very small problems. And yes: more testing, more things like interfacing with C libraries, like for doing FFTs, that sort of stuff.
So we're working on that. And it's also a good way to get started if you're interested. Well, for example, the bridge between C libraries and PyPy: you can do it in pure Python, you don't have to learn RPython. So it's a great way to get started, I think.
Between standard CPython and PyPy: can you write a program that runs OK on one implementation of Python, and it
will run on the other without any change? So the question was about differences between PyPy and CPython. So the garbage collector is different, so your objects aren't guaranteed to be cleared as soon as they go out of scope.
So if you rely on destructors to free resources, it can cause problems. For example, file descriptors: if you open a lot of files in a loop and you don't close them, then you will run out of file descriptors. That's one of the most, well, that's
one of the main differences, I would say. Yes? Is the number on your T-shirt related to PyPy? No, it's not. Well, is the number on my shirt related to PyPy? No, it's related to Python. How so?
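The file-descriptor caveat described above can be sketched as follows; this is plain portable Python, and the explicit `with` form is the one that behaves identically on CPython and PyPy.

```python
# On PyPy, objects are not freed as soon as they go out of scope, so a
# file's descriptor may stay open until a later GC cycle. Closing
# explicitly (here via a with-block) is deterministic on any interpreter.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Risky on PyPy: open(path, "w").write("hello") leaves close() to the GC.

with open(path, "w") as f:   # closed as soon as the block exits
    f.write("hello")

with open(path) as f:
    print(f.read())          # -> hello
```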
Yes? I have a question about the garbage collector. What about weakref callbacks? Are they executed with the same semantics as in CPython? So the question is about weakrefs on PyPy.
Do you know of any differences? No, no differences. Well, I mean, you wrote the garbage collector, right? So he knows best. So, PyPy used to have quite a large memory footprint
when it was running, and it has incrementally improved. Where does it stand now, relative to the C implementation, for a real-world system? How much memory does it take? So the question is about memory consumption. So it depends on your system. I would say, for PyPy, well, it depends on your amount of code relative to your amount of data.
If you have small code but a lot of data, then the difference won't be that big. If you have a lot of code that you run very often, and you don't have that much data, then it will take more memory. And we've also fixed a bug where file descriptors weren't,
well, we fixed the leak, basically. So it can be better as well. If I could continue that: is there any concept of memory pressure on the system that is related to the garbage collector? Is there a threshold that you can set on the garbage collector? Meaning that? No, I was just repeating your question.
Yeah, meaning that the garbage collector will take into account the other processes in the system, in terms of how frequently it's run? I don't think so. Yes, we're not sure. But you have various thresholds that you can set,
like running the garbage collector more often, that sort of thing. So you can do that. And then, if you use memory in your program and you run out of memory, I mean, there's nothing we can do; we can just free some objects. Is it a fixed percentage of overhead that you've observed?
No, I think it depends on how big your heap is. So, is the code generated by the JIT shared?
So yes, it's shared between threads, but not between processes. What goes into pycache? CFFI-based extensions?
No. No, you can't cache traces. Well, you can't write them to disk: traces have memory addresses hard-coded in them, so if you reload them, then your entire memory space is different, and you can't reuse them.
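Going back to the garbage-collector thresholds discussed a moment ago: a hedged sketch of the in-process knobs that exist on both interpreters. On PyPy, coarser heap limits are set from outside the process, via environment variables such as `PYPY_GC_MAX` and `PYPY_GC_NURSERY`, before startup.

```python
# Portable in-process GC control: works on CPython and PyPy alike.
import gc

unreachable = gc.collect()   # force a full collection right now
print(unreachable >= 0)      # gc.collect() reports unreachable objects found

# gc.disable() / gc.enable() pause and resume automatic collection.
# On PyPy, overall heap limits come from PYPY_GC_MAX / PYPY_GC_NURSERY
# environment variables, not from this module.
```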
So Bob Ippolito made a strong argument, or some would say a strong argument, for type annotations in Python as a way to increase performance and so forth. Is PyPy going to make that irrelevant, or is PyPy going to use those in fancy ways? What's the plan? So, type annotations for PyPy?
PyPy doesn't care. You mean in the sense that they will make no difference? Yes, it will make no difference. I mean, you can use them for checking your code or whatever, but PyPy is built for dynamic languages,
so by making the language less dynamic, you don't get that much. Either you turn your language completely static, or, if you keep it in between, I don't think you can get much performance out of annotations. Yes?
RPython, which PyPy is translated with: is that useful for your own programs, or is it only something you would use for a project like PyPy? So, RPython as a general-purpose language: we don't recommend it.
Well, you shouldn't use RPython as a regular language. If you're writing VMs, I think it's a great language, but if you're writing general-purpose applications, I think it's horrible. You can do it, but it's your own problem. In case I have, for example,
I have to think about my own language: could I use PyPy to statically compile it, or should I use something else? So the question was about implementing statically-typed languages using PyPy. I don't think you would get much more out of it; I think you should use other tools.
I mean, LLVM is just about code generation; you still have to write tons of stuff to target LLVM. But I don't know. At some point, PyPy is also just a VM, so if you target that VM for your language, then that's not wrong. I mean, lots of languages target LLVM, for example.
Yes. So the question was about other languages targeting PyPy, and in particular statically-typed languages. I mean, I don't think you can get as much performance as you could elsewhere,
but I know that Hy, the Pythonic Lisp that compiles to Python bytecode, runs pretty well on PyPy, but it's a dynamic language. Yes?
Okay, so the question is about storing homogeneous types in lists
and how that affects memory usage and performance. So, we have a thing called list strategies, and dict strategies and set strategies, yes. So, it works on basic types, basically: integers, strings, floating point numbers,
and, for example, in a list, you just store unboxed ints, so it's just an array of integers like you would get in C, and you can do optimizations based on that. I mean, if you pass a list of integers to a loop
and the loop is jitted, then the JIT specializes on the fact that the list contains only integers, and so you don't need to spend time unboxing the integers, for example, and if you put a value back into the list,
then, well, you don't pay for the boxing part either, so it's also good for performance. So, what happens if you make it a non-homogeneous list? So, it turns back into a regular list of objects, and then, well, if you pass that list to the code,
then it will compile a different path. In the Python 3 version of PyPy, are you still able to unbox integers, given that the integer is now arbitrary-size? So, Python 3 integers: how are they optimized? So, that's one of the optimizations we removed,
but we want to reintroduce it. I don't know if it has been already reintroduced. Yes, it may have been already done. Yes? I remember a very old blog post
introducing the software transactional memory, saying the initial implementation would use a pure software implementation, and that in the future new Intel processors would introduce hardware support to accelerate it. Has that happened?
So, the question is about STM versus HTM, mostly. So, right now we have a pure STM. It would be possible to have a hardware-assisted STM, but right now I don't think the CPUs are ready.
The Haswell CPU has HTM support, but the limits are too restrictive for us to use it.
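The list strategies described earlier can be illustrated on any Python with the standard `array` module, which stores raw machine integers much like PyPy's unboxed int-list strategy; the sizes here are indicative, not exact, and this is only an analogy, not PyPy's actual mechanism.

```python
# A regular list holds one boxed int object per element; an array stores
# raw machine words, roughly what PyPy's int-list strategy does internally.
import array
import sys

boxed = list(range(1000, 2000))                # 1000 distinct int objects
unboxed = array.array("l", range(1000, 2000))  # 1000 raw machine integers

size_boxed = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)
size_unboxed = sys.getsizeof(unboxed)

print(size_unboxed < size_boxed)  # the unboxed form is much smaller
```

On PyPy the same compact representation is applied automatically to a plain `list` as long as it stays homogeneous; appending a non-integer silently falls back to the boxed-object representation, as the answer above explains.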
So, can we use the PyPy JIT to generate a binary? No. No, it's... I don't think so. Well, the PyPy JIT generates linear traces, so it uses the types it sees to generate the traces,
and so if you wanted to compile statically, then you would need to compile for every type possible in your program, which is, I don't know, what they did for PHP, and they ended up with a 10-gig binary, I think.
So, we don't want that. Yes.
Yes, so C++ support. So, I think the project has moved from using GCC to using Clang, which is much nicer to work with; well, it's easier to write plugins, so now it's using Clang for that.
Yes. I didn't hear your question. Did I understand correctly that for SciPy support you can embed CPython within PyPy? Yes, we are going to embed CPython inside PyPy.
At least that's the approach we are working on. Yes, you could potentially move objects between...
Yes, I mean, it uses CFFI, so you can do what you want. It uses the Python C API, so basically you can embed whatever you want. Well, I think it would be possible, maybe. I would have to try.
I would like to try to embed Python 3 inside Python 2 or something. That could be fun. Yes. What's the biggest installation of PyPy in production?
I didn't have the authorization to say it publicly, so if you see me outside, then I'll tell you, if you don't repeat it to anyone. Can I follow up on that? Was it a rollout where they chose PyPy over CPython,
one that uses parallel computing, character recognition libraries, and all sorts of complex pieces, and was the rollout experience very positive?
So yes, I can't repeat that, but it seems that someone is using it for something complicated and interesting. Any questions here?
So another round of applause?