
Serverless Computing: FaaSter, Better, Cheaper and More Pythonic


Formal Metadata

Title
Serverless Computing: FaaSter, Better, Cheaper and More Pythonic
Series title
Number of parts
9
Author
License
CC Attribution 4.0 International:
You may use, change and reproduce the work or its contents, and distribute and make them publicly accessible in unchanged or changed form, for any legal purpose, provided the author/rights holder is credited in the manner they specify.
Identifiers
Publisher
Year of publication
Language

Content Metadata

Subject area
Genre
Abstract
"Function-as-a-Service (FaaS) is the consequent code-level implementation of the microservices concept in which each function or method is separately instantiated, measured, accounted and billed. As a programming and deployment model, it has become popular for discrete event processing. Several public commercial services offer FaaS hosting, but almost always in silos with arbitrary limits, incompatible tooling for each provider, and no convenient sharing of functions. Snake Functions (Snafu) contrasts these constraints. It is a novel free software tool to fetch, execute, test and host functions implemented in Python and (with slight performance overhead) in other languages, too."
Transcript: English (auto-generated)
Thank you very much. We see a nice trilemma here: we have three potentially positive terms, and I think we heard today, in the second talk by Tim, that you can choose any two out of the three, so we will see which two of the three will be chosen.
The background to the talk is that we currently see a trend in industry that everything is moving towards the cloud and towards a new environment which is called the serverless computing environment; nobody really knows what it means, but everybody tries to sell it, right?
And so as a university of applied sciences, where I am working, it is our task first of all to explore and analyse what is available, but second of all also to provide the means to make use of the new facilities without falling into some of the traps.
So I am working at the university as a lecturer; I teach Python, we graduate I think around 270 Python students per year, but most of the time I am actually involved in research, and I have been using the language for most of our prototyping needs as well.
My use of Python dates back to way before I joined the university in Zurich; I probably started around 15 years ago, and as you grow older and more experienced you probably become
more conservative, and values are important in life, so I have two core values, two constants that never change: first of all, my code is still bad, and second of all, my slides are still bad, so you will have to live with that. Maybe a bit of introduction: what's serverless computing?
It's essentially a marketing term around a service offering called function-as-a-service, which in the traditional cloud computing stacks would be located mostly on the platform-as-a-service layer. So you can say it's a refinement of the platform-as-a-service idea: you can run individual functions in the cloud and you really only pay per invocation,
so when the function is not used you don't pay anything, which is a very intriguing concept, obviously, and for a lot of applications, not all of them but a lot, it makes perfect sense to host your application in such a model. If your application is not extremely popular, you are going to save a lot of money if you are the person operating
or providing the application. The term serverless is used because it is seemingly serverless: you no longer interact with the servers, the infrastructure, the resources directly, but of course you are still exposed to some of the configuration options which concern the resources,
for example how much memory you allocate to each function you want to run; if the function fails, you will find out that you probably did not allocate a sufficient amount of memory. What are those functions? Are they really like, let's say, Python functions? Well, typically what is offered as a function, called a function app or a function unit,
can be a function in the programming sense, but it can also be a container or an application package, which are the more traditional PaaS and container platform options. Of course, this being a Python event, we will look at the first option, where the functions
we are going to run are actual Python functions. So, a developer's vision: what do developers want? Together with two other research institutes, we are currently conducting a survey among developers, asking them: what do you want in such serverless environments?
What do you need? What are the pain points? Because our mission, in the end, is to combine technological expertise and scientific excellence with a neutral view on all the developments, in order to be able to support companies.
And if we take a very sober look, we see that there is still a huge gap between the marketing message and the reality. It's probably not a good idea to include this picture, it's from Aleppo, it's actually very sad, but if we look into the world of cloud computing, we have tried to map it somehow,
because we need a vendor-neutral, cross-vendor approach to figuring this out. We see in the upper left corner the major cloud providers, obviously they are all active in the field; we see a couple of specialised providers shown below them; and we see a couple
of open source tools which work mostly complementary to the commercial services; and on the right side, a couple of runtimes which are increasingly emerging to emulate essentially the behaviour of the functional runtimes of the commercial providers.
If we look from a Python perspective at which of these services we can run Python in, it's not too bad: Python is essentially the number two supported language. It's not supported by all of the services among the big players, I think Google is best known
for supporting only JavaScript in their runtime, and some of the other runtimes do not support Python either, but in the majority of frameworks it is supported. The underlined frameworks you see on the right side are the ones which come from our research lab; I will introduce them during the talk, and you can see that at least two of them also
support Python very well. If you take a more concrete look at the runtimes, we see that there are about 15 or 20 runtimes that we can distinguish by now, and they all have a varying amount of language
support. Whenever I show this table at a conference it is immediately outdated because there are changes going on, mostly additions, but then again the actual support for each language ranges from it barely runs to it is well supported, with SDKs, debugging tools and so forth.
When we look from a Python perspective, we see that maybe around eight of the runtimes are very important to analyse when we want to run our Python applications in such serverless environments: which path should we take, should we go to AWS, should we go to OVH, should we go to Fission, which runs on top of Kubernetes?
That remains to be seen. From a historical point of view, the first Python implementations were initially still Python 2, at a time when Python 3 was already out there and very mature, so for some reason
Python 2 was chosen; basically all of the providers and all of the framework developers then switched to Python 3 around last year. So you can see that, apart from OpenLambda, which is more of a research prototype, basically all the runtimes now support Python 3. It's a safe choice, and we can conclude that in general we have very good support
for open source tools in the serverless environments, very good support for Python 3, so everything is fine, right? But we do have some bad news as well. First of all, the dominant large cloud providers, and they are often the ones used in application projects, do not open-source
their runtimes, or at least not all of it in any case. The barrier to first use is still high: it's again a new set of APIs you will have to learn; you cannot just open your editor, start writing functions and be done with it.
And of course there is a lot of heterogeneity, there are no standards, every provider will require their own format. I will skip the other points because these are already the main objections that we can have, so let's look in detail at some of the functions that we can deploy at the providers.
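As a concrete reference for what such a deployable function looks like, here is a minimal Python handler in the AWS Lambda style, including the remaining-time check discussed in this talk. Note that `FakeContext` is purely a local stand-in invented for this sketch; only `get_remaining_time_in_millis` matches the real runtime's context object.

```python
def handler(event, context):
    # event: a caller-defined dictionary carrying the input data
    # context: injected by the runtime; carries execution metadata
    if context.get_remaining_time_in_millis() < 1000:
        # near the timeout: return early instead of being killed mid-work
        return {"error": "not enough time left"}
    name = event.get("name", "world")
    return {"greeting": f"Hello, {name}!"}


class FakeContext:
    """Local stand-in for the runtime-provided context object (illustrative)."""

    def __init__(self, millis_left):
        self.millis_left = millis_left

    def get_remaining_time_in_millis(self):
        return self.millis_left


print(handler({"name": "Zurich"}, FakeContext(300_000)))
print(handler({}, FakeContext(500)))
```

Locally, the fake context lets you exercise both the normal path and the timeout path without deploying anything.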
If we look at AWS Lambda, which was the first major such service, you see that every function essentially requires two arguments: an event, which is your own data structure, a dictionary in which you define what data you want to bring into the function, and a context object. The context object is instantiated by the
runtime itself and provides information such as where the function is actually executing, because you can hardly choose that yourself, you are no longer dealing with servers, or how many seconds you have left in your function. You should periodically query this object because there is typically a timeout associated with each function: typically after
five minutes, or nine minutes in the case of Google Cloud, the execution is terminated and no state is saved, so you had better check from time to time whether you have sufficient time left to complete the job or should move on to the next one. That brings
us to one of the key characteristics of these functions: they are completely stateless, you cannot save any data except by binding to external services. The other formats are all different: if you go to the IBM Cloud you suddenly deal with dictionary in, dictionary out; with Functions you deal with dictionary
in but string out; if you go to Fission you deal with nothing in, you have to request the data manually from a Flask object, so if you don't use Flask you are pretty much in trouble, and string out; and if you go to Azure, I won't comment on that one, you can
as well write everything by yourself. Things are improving now: there are a couple of abstraction frameworks which at least make deployment easier. You can now deploy to several of those providers using something like the Serverless Framework, or there is lambda-uploader, written
in Python. They are very interesting frameworks; however, there are still issues when it comes to the programming itself, and these frameworks do not solve the differences in the syntax and synopsis of the functions. If you look at a few Python-specific approaches,
there is PyWren, which has been one of the first academic approaches, targeting especially scientists who do not have much programming experience. You essentially set up a PyWren executor, the pwex object, and you can just give it a local function; the function will
be serialised, uploaded to Amazon S3 and executed in Lambda. All of that happens behind the scenes, so even if you are not familiar with the whole technology, you just add those two additional lines of code below the function and it will just work. One shortcoming, obviously, is that this tool is very much bound to AWS, you cannot change
providers easily. Another approach is the one called Lambada from Carson Gee; it is essentially a bit like Flask for serverless computing. Again you
have your function, you set up a tune object, which is an object of class Lambada, you set the configuration parameters, you want it to be hosted in a certain Amazon region and you set the memory, and you can essentially use that to create wrapper objects, as
is shown below, using the decorator dancer, which is essentially their word for worker. Again it is bound to AWS; essentially it creates zip packages which you can upload yourself, or it uploads them automatically. It bypasses the need to go through S3, so it is already
a bit less dependent on Amazon, but it still depends on it. Now, when I came across that framework I was very confused, because I had written a package called Lambada myself, at the time, to make this easier to handle. So I compared the first commit times, and we can see that my Lambada was started around five months before Carson's Lambada,
but maybe I did not advertise it enough, and secondly I failed to register the namespace on PyPI, the Python Package Index, in time, so kudos to Carson, he got the lambada project name.
If you look for my Lambada, I will just show you the tool from Git; it is listed as a Lambada transformer on PyPI. There are a couple of even older Lambadas, so neither of us was the first to think of this name, but none of them are Python-related; they are all Java-related.
My Lambada, what does it do? It allows you to add a decorator to a function. Sounds familiar, looks like the other Lambada, but the difference is that it works on the source code transformation level: Lambada produces a rewritten source file so that you can upload
the function to, at the moment, AWS Lambda. That would not be a big difference to the other Lambada, except that we are currently working on a converter which makes it possible to also transform into the other formats using the same tool.
And I think the best way to show the tool is to try it out, right? Let me use a scratch space, let's call it SPS18, and first of all clone it
from GitLab or GitHub. In our research lab we have the mandate that every tool must be usable directly after the Git checkout; there should be no requirement to configure anything. Let's see if this works. Let's write our test function. I think everybody
wants to win a Pi board, right? In order to win the Pi board you need to name the mayor of Rapperswil-Jona. Let's make it a constant function, and of course
everybody knows that the mayor is the mayor, right? You should all put this on your sheet. You will not increase your chances, but you will increase my chances if you do so. Then let's run Lambada on this function; what it does is figure out
my Amazon configuration, create a zip file and upload it to Lambda. Now we go into the Lambda console, we check under functions that we have this function, and we can just execute it by running a test. We need to give it a test name. There is some
input data which we absolutely don't need; we will nevertheless leave it there. We create the test, and if we click test again we get the output. And it must be true if
Lambda says so, right? It's the mayor of Rapperswil-Jona. Now that tool is certainly useful from a developer perspective. As I mentioned, we are currently working on an associated tool which
you can already, I think, find a preliminary version of in the Git repository, to make portable cloud functions possible, which is quite important in order to avoid vendor lock-in. But we also need to cover the runtime side, and that is probably the more important tool in the long
term. We do not want to run our cloud functions, our cloud applications, always at any of the, let's say, four or five big providers out there. So we need a tool which also makes it possible to deploy and execute in a private cloud environment, on our servers,
on our developer notebooks and our research notebooks. And we have come up with the idea of a Swiss Army knife tool for serverless computing. It should have all the qualities of a Swiss Army knife. I typically have one with me, not always, because of flight restrictions, so then I have to leave it at home. But it should be robust, it should be small, it should
be versatile, it should be useful in all sorts of daily life situations that you never imagined would happen. And so that triggered the design of the tool. I'm not sure where to start in the architecture diagram but of course what we have seen so far are the functions, right?
So they are the central part. You write a function and then you want to execute it. Even though the tool is written in Python, we are also friendly to the other languages; you should be able to write your functions in other
languages as well. So we need a lot of parsers, down here, right? The parsers parse the Python source files
and extract all sorts of functions that they may want to execute. That is already an advantage over the runtimes you typically find at commercial providers, where you can only have one entry point in each source file and you need to jump through some hoops to make it possible
to have multiple entry points, whereas in our tool, which is called Snafu, you can basically execute any function and any method found in the source file. The execution is then triggered by an event, and the event, you see the connectors down here, can be an
explicit HTTP request, a cron job, a change in the file system, an XMPP message, or of course the user himself entering the function name in the CLI. Once a function is specified and the arguments are given, it is executed by an executor. We have a couple of Python 3 executors. In the commercial clouds it's
all about isolation and making sure everything is secure, which is all fine, but for research purposes we also need the raw Python performance from time to time. So we have a Python 3 in-process executor: essentially Snafu will just execute the function, and if the function
calls exit then Snafu will exit, use at your own risk. Python 3 isolated does not have this problem, it communicates with an external Python interpreter; Python 3 external is very similar; Python 3 tracing has been contributed by students, it traces the function and
does some benchmarking: where are the bottlenecks in terms of networking, CPU, memory and so forth. And, very interesting, LXC: you can virtualise your function execution and still have a bit of isolation around it by transparently running through an LXC (Linux containers) context; there is a very nice Python binding, so the implementation of that took 10 minutes. We
do have a little bit of Python 2 support, and Java, C and Node.js as well, in case you need it. What is very interesting is that even though you can use it as a developer tool or for research purposes, you can also run it as a daemon, as a server process, and
when you do that it needs to offer a certain API. I was thinking about which API it should offer, and then I told myself it should not offer yet another API, but rather all of the APIs of the large cloud providers, so that all of the tools that already exist in this ecosystem
can be used with Snafu, and it can become a drop-in replacement. Interestingly, the namespaces of the APIs of Amazon, Google, IBM and so forth don't overlap; they have proper namespaces, which means that within one daemon you can implement all of them at once,
and that is what is done. So you can use the Amazon tools to upload functions to Snafu, you can use the Microsoft tools, you can use the Google tools, I'm fine with whatever tool you use as long as you use Snafu in the back end. There are a couple of security packages involved: Amazon came up with their own request-signing scheme, which is
called AWS4 and which thankfully was properly documented, so that one is now also supported by Snafu. And if you leave the main architecture and go to the top of the picture here, you
will see that we also have integration with a tool called Snafu Import, which actually
started as a pure import tool but is now an import/export tool. You can pull your functions from Lambda, modify them, test them locally, upload them to IBM Cloud Functions, and you can have workflows where you no longer really depend on those commercial environments
for executing the functions. A couple of use cases: you can directly clone Snafu from GitLab, and we will do just that in a minute; if you like working with Python packages, you can obviously install the Python package, the name snafu thankfully was not yet taken by anybody
else. You can obviously run the entire project in Docker, there is a prepared Docker image, and we have even set up a multi-tenant mode where each registered user gets his or her own instance, very much isolated from the other users, and
we have implemented that on APPUiO, the Swiss container platform, which means that essentially by applying an OpenShift deployment descriptor you can install all of Snafu, including the multi-tenancy configuration, into your OpenShift or APPUiO account. We also have raw
Kubernetes support; there is for example the European Grid Initiative, essentially a research grid or research cloud for researchers all across Europe, they now offer Kubernetes as a service and we also have Snafu integration with them.
You can have it in public clouds, in private clouds, on your own notebook; I think it's a versatile tool which you can use to test your functions even if you may want to run them in another environment in production in the future, because
you probably do not want to pay for the testing and the debugging. Some examples of how to do that: let's say you want to import a script, you call Snafu Import, you give the source provider, let's say ibm.functions, and you give the target provider, let's say
U plus, because you want to run a serverless framework on top of a locally installed Kubernetes. Then you have to redirect the provider tools; in the case of AWS, with the AWS CLI for example, you just have to set the endpoint to the local installation. Normally, if you don't specify
a different port, it will listen on port 10000, which is probably not the best choice, but I was too lazy to think of a different port number. In the case of OpenWhisk, which is the open source runtime by IBM, you just need to set the endpoint property once. In the case of Google, well, Google uses some very obscure authentication-slash-security
scheme, so you cannot use an endpoint different from the Google Cloud with their tools unless you patch the Google Cloud SDK; the patch is included in Snafu, so you just need to run the script and your Google Cloud SDK is patched so that you can use a cloud provider which
is not called Google. A couple of additional examples; I think we will run this, we still have a couple of minutes, from Git, in this case from GitHub. When you run it for the first time, since it will also execute Java and C functions, it will need to compile
them first. The default Git repository has a couple of example functions, and you
will see that the modules are compiled first, and then we get a very nice error message, which is probably due to me not having pushed the latest version to GitHub, so let me quickly
cheat. I will fix this after the talk, so let me use the version in my development directory. What happens is that it activates the parsers, it parses all the files it can find in its own directory tree, you can also point it to a different directory tree, and it
sets up the executors. Then you give the function name, helloworld.helloworld for example, and if you execute it, it gives you the timing information, and that is typically what a function-as-a-service environment does: it executes a function, in itself nothing spectacular, right? There are a couple of further ways you
can invoke the functions; I will skip over that part, it's pretty much documented in a step-by-step tutorial which you can find when you go to the PyPI page and from there to the Read the Docs documentation, the same obviously for the daemon mode. And as
the last slide, I want to introduce some of the research we are working on. One of the research projects is a fully decentralised marketplace for functions, so that we do not rely on the AWS Marketplace or GitHub or any of the centralised systems, which are all in the
end somehow fuelled by venture capital, right? We want to have a fully decentralised system; we are currently in the design stage, but we already have some prototypes, so if you are interested in some of our tools, such as our function hub where you can upload and download
functions in a sort of community style, then feel free to contact me and my team. And that concludes the talk. There is a lot coming up, obviously, in terms of the directions we are heading, and if anybody is interested in the topic of serverless specifically, we
also have a community event at the end of the year in the Tonia Real in Zurich. Thank you very much. (Applause)