Anna Herlihy - Wrestling Python into LLVM Intermediate Representation
The LLVM Project provides an intermediate representation (LLVM-IR)
that can be compiled on many platforms. LLVM-IR is used by analytical
frameworks to achieve language and platform independence. What if we
could add Python to the long list of languages that can be translated
to LLVM-IR? This talk will go through the steps of wrestling Python
into LLVM-IR with a simple, static one-pass compiler.
-----
What is LLVM-IR?
The LLVM Compiler Infrastructure Project provides a portable
intermediate representation (LLVM-IR) that can be compiled and linked
into assembly code for many different targets. What is great about
LLVM-IR is that you can take any language and distill it into a form
that can be run on many different machines. Once the code gets into IR
it doesn’t matter what platform it was originally written on, and it
doesn’t matter that Python can be slow. It doesn’t matter if you have
weird CPUs: if they’re supported by LLVM, the code will run.
What is Tupleware?
Tupleware is an analytical framework built at Brown University that
allows users to compile functions into distributed programs that are
automatically deployed. Tupleware is unique because it uses LLVM-IR to
be language and platform independent.
What is PyLLVM?
This is the heart of the talk. PyLLVM is a simple, easy-to-extend,
one-pass static compiler that accepts the subset of Python most
likely to be used by Tupleware. PyLLVM is based on an existing project
called py2llvm that was abandoned around 2011.
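To make "one-pass static compiler" concrete, here is a minimal sketch of
the general idea. It is not PyLLVM's actual code. It walks a Python AST
once with the standard library's ast module and emits LLVM-IR text as it
goes. The names ToyEmitter and compile_to_ir are hypothetical, and the
sketch assumes every value is a 64-bit integer and handles only +, - and
* inside a single return statement.

```python
import ast


class ToyEmitter(ast.NodeVisitor):
    """Hypothetical one-pass emitter: visits each AST node once and
    appends LLVM-IR text. Assumes every value is an i64."""

    # Map Python operator nodes to LLVM integer instructions.
    OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}

    def __init__(self):
        self.lines = []
        self.counter = 0

    def fresh(self):
        # Produce a new named temporary such as %t1, %t2, ...
        self.counter += 1
        return f"%t{self.counter}"

    def visit_FunctionDef(self, node):
        params = ", ".join(f"i64 %{a.arg}" for a in node.args.args)
        self.lines.append(f"define i64 @{node.name}({params}) {{")
        for stmt in node.body:
            self.visit(stmt)
        self.lines.append("}")

    def visit_Return(self, node):
        value = self.visit(node.value)
        self.lines.append(f"  ret i64 {value}")

    def visit_BinOp(self, node):
        # Operands are emitted first, so the IR appears in evaluation order.
        left = self.visit(node.left)
        right = self.visit(node.right)
        tmp = self.fresh()
        op = self.OPS[type(node.op)]
        self.lines.append(f"  {tmp} = {op} i64 {left}, {right}")
        return tmp

    def visit_Name(self, node):
        # Function parameters are referenced directly by name.
        return f"%{node.id}"

    def visit_Constant(self, node):
        # Integer literals only in this sketch.
        return str(node.value)


def compile_to_ir(source):
    emitter = ToyEmitter()
    emitter.visit(ast.parse(source))
    return "\n".join(emitter.lines)


print(compile_to_ir("def f(x, y):\n    return x * y + 2\n"))
# define i64 @f(i64 %x, i64 %y) {
#   %t1 = mul i64 %x, %y
#   %t2 = add i64 %t1, 2
#   ret i64 %t2
# }
```

A real compiler for a Python subset also has to infer or declare types,
manage scopes, and allocate memory, which this sketch sidesteps by fixing
everything to i64.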
This talk will go through some basic compiler design and talk about
how some LLVM-IR features make our lives easier, and some much harder.
It will cover types, scoping, memory management, and other
implementation details. To conclude, it will compare PyLLVM to Numba,
a Python-to-LLVM compiler from Continuum Analytics, and touch on what
the future has in store for PyLLVM.