Tips for Writing a Web Server and Beyond
Formal Metadata
Title: Tips for Writing a Web Server and Beyond
Number of Parts: 15
License: CC Attribution 3.0 Unported: You may use, change, and reproduce this work in unchanged or changed form, distribute it, and make it publicly available for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/52189 (DOI)
Transcript: English (automatically generated)
00:05
Hi there, guys. Thanks for allowing me to speak today. My name's Porv. My handle on the internet is ToyDew. That's how it's pronounced. And today I want to talk about tips on writing a web server and beyond. And the "beyond" is, hopefully, some of these tips can be
00:22
used beyond just writing a web server. So this is what I plan to talk about today: designing with lifetimes, database management, logging, and code hardening tips. And this URL is actually, I don't know if you can see that, actually.
00:41
Ah, there we go, perfect. All right, so these are the things that we're gonna be talking about. This is my information. It'll be presented later as well. And this URL, GitHub repo, is actually what this talk is sort of based on. And it's working code, front end and back end. So if you want to play around with it, look at more details, please feel free to do so.
01:02
You can also contact me later on. Okay, so designing with lifetimes. One of the problems I've heard people run into is they still don't know how to, like, sort of take the lifetime concept and design systems with it. It's still kind of a weird, maybe new problem. So I wanted to take a stab at looking at how I did it.
01:22
And I didn't come to this solution at first, so we'll sort of work through the problem and then see how it goes. So this is a sample request in my system. And what it's doing is it's taking something from the internet, it's parsing some query parameters,
01:40
it's doing some middleware stuff, just looking at headers, authenticating. It's then constructing some state that it needs. And it's doing that by hitting the database and some external API. And then it's coming back with that and using some sort of algorithm to compute something useful. And then returning that back to the server or the client.
02:04
So the way I thought about this is that let's just sort of go back out all the way, take an overview, and then dive in, and then hopefully that can help us sort of figure out what's going on. So this essentially is your web server in a black box.
02:20
I actually like to sort of organize my code in such a way. So you have a server layer, you have a backend layer, and you have a data layer. The server layer is, again, gonna just be responsible for parsing your query parameters. The backend layer is going to be where your business logic should go. And the data layer should be talking to your database or external API and stuff.
02:42
So next, let's add boundaries between these things. And the dotted lines here represent things that are borrowed and the solid lines here represent things that are owned. So if you look at this, it should become a little bit clearer why this is the case. So the server layer is spawning a function
03:01
in the backend layer, so that lives. The data layer is being spawned on the backend, so that lives. But when the data layer is returning the data back to the backend layer, it sort of goes out of existence, so it needs to pass the data back as an owned object. Similarly, the backend layer needs to pass it back as an owned object when it's passing it back
03:22
to the server layer. So according to this, it becomes kind of obvious that going down the flow, you may want to pass things as borrowed, but going back up, you need to pass them as owned objects. So if you add a couple more components to this, the middleware and algorithm and the external API,
03:42
based on this, it might be obvious what these lines should look like. And that's what they look like. Because the middleware is in the same sort of segment as a server layer, it can use borrowed data. The algorithm, same way, it can just borrow data, possibly mutably. And the external API, because it is sort of
04:01
crossing these interface boundaries, or these system boundaries, it needs to at least return data as an owned object. And then if we apply this to our request we saw previously, it ends up, we draw our boundaries, and it ends up being that's the case.
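The borrowed-going-down, owned-coming-up rule can be sketched roughly like this (a minimal sketch; the layer and type names here are hypothetical, not from the talk's repo):

```rust
struct UserData {
    name: String,
}

// Data layer: borrows its input from the caller, but must hand the result
// back as an owned value, because its own scope ends when it returns.
fn data_layer(user_id: &str) -> UserData {
    UserData { name: format!("user-{}", user_id) }
}

// Backend layer: borrows from the server layer, owns what it returns up.
fn backend_layer(user_id: &str) -> UserData {
    data_layer(user_id)
}

// Server layer: owns the request state and lends it downward.
fn server_layer() -> UserData {
    let user_id = String::from("42"); // owned here
    backend_layer(&user_id)           // passed down as a borrow
}
```

Because each outer layer outlives the layer it calls, borrows flowing downward always have a living owner; only the return path needs ownership transfer.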
04:22
And you don't have to read all this now, you can look at the slides later. But this is definitely not how I designed my system. I didn't go out and draw the diagram and figure out this is what it needs to be. It sort of came about after a lot of teasing at it. But then after looking at it, after it was done,
04:42
it seemed like this is probably a good approach to doing something if you're trying something new. Hopefully it works. And I think it's probably a better approach than just going at it and hoping it works. Cool, so next I want to talk about database management.
05:01
So in my project I decided to use Postgres. And one of the first things you should be doing, in my opinion, is setting up migrations. So people might be familiar with Diesel. Diesel is a great ORM. There's actually a CLI component to it that lets you manage migrations. And migrations, if you guys aren't familiar with them,
05:21
are essentially a way to manage schemas for your database. Your schema can also evolve over time, maybe adding a new table or new columns. Also, since these migrations are code, you can then code review them and hopefully also use them for testing.
05:40
And we'll actually be doing that later, you'll see. And all it's really doing is, in the actual database, it's creating a table where it's keeping track of which migrations have been applied or which have been reverted. And that lets you sort of see, you can list them out and see where you are in the world of your migration.
06:04
Cool, and the next question you might want to ask is, should you be using an ORM or just opting for raw SQL? And in terms of ORMs, I'm gonna be speaking primarily about Diesel. So Diesel is fantastic.
06:20
It will actually validate and type check your queries at compile time. That's huge, and as a Rust developer, that's probably what you want. What that does is get rid of a whole bunch of runtime errors that you would otherwise have if you have a mistyped query. A con with that, however, in my opinion,
06:42
and this doesn't apply to Diesel necessarily, is that an ORM is just another framework that you have to learn. It's complex, it can have quite a learning curve, and in the past I've run into this unnecessarily with Diesel: if you try to do something very, very complex
07:01
that's maybe not supported by the ORM, you are going to spend a lot of time sort of churning and maybe hacking away at it, and possibly finding a solution. On the other hand, with raw SQL, you can obviously express the entire dialect there, so there's no sort of abstraction that you need to work against or through.
07:21
So it's extremely easy to express what you want here. A huge con with this is that you're now responsible for manually checking that your queries are actually sane and won't error at runtime. And as an example, this is what my code used to look like. Maybe you can't see that.
07:41
But basically what I'm doing at the select is that I'm actually listing all the column names myself. I'm also listing the table name, and at the bottom here, I'm actually having to extract data by manually indexing into it. And those indexes need to match up exactly with where I'm querying them.
08:02
So if they're not aligned, this is going to just blow up in your face and cause a runtime error. And additionally, beyond just getting the initial code to work, it's really hard to maintain. So imagine you are maybe adding a new table, or column, and you need to query
08:21
for additional parameters. You now need to change your code in at least two places, and since you're extracting this into a struct, you're probably gonna need to change it in a third place. That just means it's gonna be extremely error prone and very, very difficult to maintain over time. So ideally, maybe we should be writing something like this.
08:40
And this essentially is a UserData struct, and there are just two functions on it, sql_fields and sql_table. And hopefully that just populates everything. And then on UserData there's a function exposed on it, from_postgres_row, and hopefully that just extracts what it needs to and things work.
09:02
So luckily, this is actually possible today. There's a fantastic crate called postgres-mapper, and it works via procedural macros. So if you attach a derive procedural macro, it'll actually expose a function called from_postgres_row, and that simply takes a postgres Row object.
09:21
When you query, it returns a Row object. It will try to convert that to the UserData. And instead of just panicking, it'll actually return a Result for you. So you are forced to deal with any errors that happen, which is actually great. So it's not gonna just panic out of nowhere.
09:42
And additionally, there's another attribute you can add. This is something I actually added to the crate, so I'm quite proud of this. It will provide to you the field names and table name. So the table name, it just extracts from that attribute, and the field names, it'll just take all the field names and concat them into a string.
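What the derive generates can be pictured roughly like this. This is a hand-written sketch of the idea, not the crate's actual output, and the struct and table names are made up:

```rust
#[allow(dead_code)]
struct UserData {
    id: i32,
    email: String,
}

// What a derive macro like postgres-mapper's can generate for you;
// written out by hand here as an illustration.
impl UserData {
    fn sql_table() -> &'static str {
        "user_data"
    }

    // All field names, concatenated into one comma-separated string,
    // ready to drop into a SELECT.
    fn sql_fields() -> &'static str {
        "id, email"
    }
}

fn build_query() -> String {
    format!("SELECT {} FROM {}", UserData::sql_fields(), UserData::sql_table())
}
```

The win is that adding a field to the struct automatically updates both the field list and the row extraction, so the query and the struct can no longer drift apart.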
10:01
So it is possible to write code such as this and have this be maintained over time, because if you add new fields to your struct, these functions are automatically going to be updated, and you don't have this crazy maintenance issue over time. If you find yourself, a caveat here,
10:21
if you find yourself querying for maybe fewer fields than what's in your struct, I suggest you make a new struct, so almost a struct per query sort of model, rather than trying to hack something together. Cool, so since we have some benefits
10:41
of the procedural macro helping us out, there's still probably more testing we can do, and I actually found that I needed to do that when I was changing my schemas for my databases. So I decided to write integration tests, and this is actually inspired by a blog post. I'm not sure which one it is, so if anyone knows,
11:03
I'd love to credit them. But all this is doing, it's setting up the database, it's running a test, and it's tearing it down. And it's using the panic::catch_unwind function to actually check if the test that you're running panics,
11:22
and it'll catch that and convert it to a Result. And at the end of this, you can just assert whether that result was okay or not, and this allows you to do the setup and teardown around it. There are a lot of caveats to catch_unwind. As the docs say, you should probably not be using it the way you'd use a try catch in
11:42
other languages. In tests, however, I believe this should be okay. Additionally, catch_unwind will not catch all kinds of panics. You should, again, read the docs, but I believe in this code, it should be okay. And you can see, maybe you can see below, I'm essentially testing one of my functions,
12:02
which is the actual function, the actual query I wrote. And so I can verify that it actually returns something that I expect it does. Cool, so let's take a look at what the setup function does, because I think actually this is pretty interesting. So I'll sort of read through this if it's hard to read.
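The setup / run / teardown shape with panic::catch_unwind looks roughly like this. This is a sketch: setup and teardown are stubbed out here instead of creating and dropping a real test database:

```rust
use std::panic;

fn setup() { /* create a fresh, uniquely named test database here */ }
fn teardown() { /* drop the test database here */ }

// Run a test closure between setup and teardown, converting any panic
// (e.g. a failed assert!) into an Err so teardown always runs first
// and we can assert on the outcome afterwards.
fn run_test<T>(test: T) -> std::thread::Result<()>
where
    T: FnOnce() + panic::UnwindSafe,
{
    setup();
    let result = panic::catch_unwind(test);
    teardown();
    result
}
```

A test then becomes `assert!(run_test(|| { /* query and assert */ }).is_ok());`, and because each test gets its own database in the talk's setup, they can all run in parallel.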
12:23
Essentially, we are trying to connect to an actual Postgres database, so we need a real Postgres instance to connect to. This means that we're actually going to be testing against something that hopefully we're going to be seeing in production, so it mirrors our production environment quite nicely. We then create a database,
12:41
and the reason for that is that if you don't create a new database per test, you're going to have to wait for each test to complete. Run your test in single thread, and that's going to take a very, very long time. However, since these are sort of for just tests, I decided that it was okay to create a database
13:00
and just run all the tests in parallel, and runs pretty quickly that way. And then tear down script is simply drop database, so the cleaning process also becomes very nice and easy. And then finally, remember the migration scripts we had before? Well, we can use those same exact migration scripts to actually populate and create our database schema.
13:24
I also have a migration script called fake data that I was using for development. You don't need that as a migration script, but you can use those same sort of migration scripts to sort of make sure that you're testing against the same database that you assume you're going to have in production.
13:40
And you can actually use this kind of model to test migrations themselves. So set up all your migrations, apply the new migration, make sure things are okay, things don't break. So this probably has applications beyond just testing your SQL queries. One caveat here, again, is that ideally,
14:00
you probably want an in-memory Postgres instance, so you're not dependent on a third-party connection or a network connection. If someone knows how to set that up, I would love to talk to you. Cool. Next, I want to take a look at logging. This deserves a talk of its own,
14:22
so I'm going to actually touch upon a couple of concepts you should be thinking about when approaching this. And for that, I'm gonna be using the slog crate. slog is a fantastic crate. It's actually, I would say, more of a framework than a single implementation. And it does that by exposing a couple of traits.
14:41
And one of the traits that is useful for formatting how you output and store your data is the Drain trait. And you can use this to specify, okay, I want a JSON formatter or I want a plain text formatter. And this might be hard to read, but I'm checking at the top
15:02
that with cfg(debug_assertions), which essentially means in debug or development mode, just output this as plain text. But in release mode, actually format this as JSON, and hopefully the JSON is gonna be a lot more useful for the logging backend you guys are using.
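The idea of swapping formatters behind one trait can be sketched without the crate itself. This is a toy Drain, not slog's real trait (which also deals with levels and key-value serializers), and the names are hypothetical:

```rust
// A toy version of the drain idea: one trait, multiple output formats.
trait Drain {
    fn log(&self, msg: &str, kv: &[(&str, &str)]) -> String;
}

struct PlainDrain;
impl Drain for PlainDrain {
    fn log(&self, msg: &str, kv: &[(&str, &str)]) -> String {
        let pairs: Vec<String> = kv.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
        format!("{} {}", msg, pairs.join(" "))
    }
}

struct JsonDrain;
impl Drain for JsonDrain {
    fn log(&self, msg: &str, kv: &[(&str, &str)]) -> String {
        let pairs: Vec<String> = kv.iter().map(|(k, v)| format!("\"{}\":\"{}\"", k, v)).collect();
        format!("{{\"msg\":\"{}\",{}}}", msg, pairs.join(","))
    }
}

// Pick the drain by build mode: plain text in debug, JSON in release,
// like the debug-versus-release split the talk describes.
fn pick_drain() -> Box<dyn Drain> {
    if cfg!(debug_assertions) {
        Box::new(PlainDrain)
    } else {
        Box::new(JsonDrain)
    }
}
```

Since the output target only ever sees the trait, swapping plain text for JSON, or sync for async, is a one-line change at setup time.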
15:21
Similarly, you can specify async versus sync. And you can also specify file output versus shipping these logs off over the network. And all these things are just backed by the Drain trait. So it's also possible to write your own implementation for whatever you might need.
15:42
Cool, so next thing I wanna say is that you should probably be making your logs structured. And what structured really means is the log should be machine searchable. So rather than having a whole glob of plain text logs that you now have to maybe write a complex regex query to like filter and search on something,
16:01
you should have them as key value pairs. And the key value pairs can then be used to filter and search your logs. So an example might be, give me all errors that are associated with release 102. Or give me all errors that are associated with HTTP code error like 500.
16:21
Or even, like, give me all errors associated with a particular request ID, if you are tagging that. And then lastly, logs should be contextual, which means that you should be adding tags and context to the code path that you're hitting. What this means is that as you're passing these,
16:44
your logger, your logger instance is what you're gonna be passing through your code. These tags and context actually end up being additive. So you have a trace of, okay, it hit my server path and it hit my backend path and then it hit my data path. And this is where we're at.
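The additive-context idea, where a child logger inherits its parent's key-value pairs, can be sketched like this. This is a toy logger, not slog's API; the names are made up:

```rust
#[derive(Clone)]
struct Logger {
    kv: Vec<(String, String)>,
}

impl Logger {
    fn root() -> Logger {
        Logger { kv: Vec::new() }
    }

    // Clone the parent and add one more tag; each layer calls this as the
    // request passes through it, so context accumulates along the path.
    fn child(&self, key: &str, value: &str) -> Logger {
        let mut logger = self.clone();
        logger.kv.push((key.to_string(), value.to_string()));
        logger
    }

    fn log(&self, msg: &str) -> String {
        let tags: Vec<String> = self.kv.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
        format!("{} [{}]", msg, tags.join(" "))
    }
}
```

An error logged deep in the data layer then carries the whole trace of layers it passed through, which is exactly what makes it easy to tell a data-layer failure from a server-layer one.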
17:00
So I now know I got a network error and it was in my data layer, not my server layer. It's going to make it easier to debug. And slog out of the box supports that, because the logger instance supposedly is quite cheap to clone. Looking at the implementation,
17:22
it looks like a clone on all the key values, so I don't know how light that is. You maybe want to benchmark that yourself. But according to the documentation, they're relatively cheap to clone and are additive. I do want to mention that when I wrote this, a new framework called tracing was not available.
17:41
Tracing also looks like a fantastic option. Tracing out of the box is meant for distributed tracing or distributed logging. And it's not necessarily only for microservices. Distributed tracing is also really good for asynchronous code because you can't predict when your execution,
18:00
what the next execution is going to be, and your execution contexts sort of interleave. It attaches request IDs and allows you to do that in an asynchronous application as well as a microservice application. So take a look at that as well. And lastly, I was really, really confused
18:24
why there weren't line numbers in my logs. So I decided to use the line! macro and create a simple wrapper macro around it. And all this is doing is taking the error! macro
18:42
and wrapping it so it attaches the line! macro's output. So my logs now have line numbers and I can easily tell which line in my code actually caused the error. And this works because the macro is expanded into code at compile time.
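A sketch of such a line-number wrapper, using only std's file! and line! macros. The talk wraps slog's error! macro; here we just build a string, and log_error! is a hypothetical name:

```rust
// A wrapper that stamps file and line onto a message. Because macros are
// expanded at compile time into the calling code, line!() reports the
// call site, not the macro definition.
macro_rules! log_error {
    ($($arg:tt)*) => {
        format!("[{}:{}] ERROR: {}", file!(), line!(), format!($($arg)*))
    };
}

fn fetch_user() -> String {
    // line!() expands right here, at the use site.
    log_error!("user {} not found", 42)
}
```

Calling `fetch_user()` yields something like `[src/main.rs:13] ERROR: user 42 not found`, with the line number pointing at the log_error! call itself.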
19:01
So the line actually ends up matching with where this is used. All right, so let's look at some code hardening tips. This is more about just error handling. And this might be general knowledge, but these are some things that I found are actually helping me make saner code.
19:25
So one of the things is that you should probably be declaring a global app error enum. And within this enum, you should be declaring all the different types of errors you can possibly see in your app. So for me, that's included something like not logged in or database error or not found error.
19:42
Secondly, you should be creating a type alias for your error result. So the success can be returning anything and the error should be forced to return your app error, the fin error in my case. And then lastly, you should be enforcing that anything that returns a result returns the app result.
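The three steps above might look like this. A minimal sketch with hypothetical names (the talk's own type is its Fin error); ParseIntError stands in here for some library's error type:

```rust
// 1. One global error enum for the whole app.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum AppError {
    NotLoggedIn,
    DatabaseError(String),
    NotFound,
}

// 2. A type alias so every fallible function returns the same error type.
type AppResult<T> = Result<T, AppError>;

// 3. Convert library errors into AppError at the boundary, so callers
// never see the underlying crate's error types.
impl From<std::num::ParseIntError> for AppError {
    fn from(e: std::num::ParseIntError) -> Self {
        AppError::DatabaseError(e.to_string())
    }
}

fn load_user(raw_id: &str) -> AppResult<u32> {
    let id: u32 = raw_id.parse()?; // `?` converts via From into AppError
    Ok(id)
}
```

With a From impl per underlying error type, the `?` operator does the conversion for free, and every caller only ever matches on AppError.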
20:02
And what this is going to do is that it's going to significantly reduce the API, the error API that you need to deal with. So you no longer have to worry about all the different kinds of errors that different crates and different libraries that you're using are going to create. You only need to worry about the fin
20:22
or the app error that you are dealing with in your app. So, for example, say a function is doing a SQL query. It could produce maybe a malformed SQL exception, or maybe it times out, causing a network exception. You no longer have to worry about that. The function is responsible for converting that
20:41
into a fin error. And in this case, maybe a database error is what's most relevant, and you only have to deal with those errors now. So the point of this is that you're reducing the surface area of your errors that you're dealing with. And lastly, I found this to be good practice,
21:01
sort of wrapping your errors, so the errors you return to your users, wrapping that into a struct, and adding two fields at the very least. And there's a balance here. There's a code part to it, and there's a message part to it. The code part is what you're gonna be using for a developer.
21:20
It's information for the developer, so things you can use to extract what actually happened, and the message is information for the user. The balance is because for the user, you probably want to provide them just enough information to tell them what's going on and how to resolve it. You don't want to dive into deeper of like, this is what happened because of this happened,
21:41
and this failed. With the code part of this, you probably want to express as much information as possible, so it becomes easy to debug when they come back to you and report to customer service that this went wrong and this is the code. An example of this: you can see at the bottom, maybe, or not, there are two types of errors that cause the same error message.
22:02
So a database error or a server error will cause the message an error occurred with the service. It's pretty generic, and the user shouldn't have to care about what happened because there's no way to recover this really. This is a system error that is going on with your system, so there's really not much they can do with it. This should probably return a 500.
22:22
However, the code of these two things are different. So the server error is 20, the database error 25, and these are just values I picked, but with this code, it's easy to tell, okay, so the server error happened, something's wrong with my system, or a database error happened, okay, my database might be down. I should go check that out.
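The code/message split might look like this, with codes 20 and 25 as in the talk. The struct and field names are hypothetical, and AppError is cut down to just the two variants discussed:

```rust
// What goes back to the client: a code for the developer, a message
// for the user.
struct ErrorResponse {
    code: u32,
    message: String,
}

enum AppError {
    ServerError,
    DatabaseError,
}

fn to_response(err: &AppError) -> ErrorResponse {
    // Different codes for debugging, one generic message for the user,
    // since there's nothing the user can do about either failure.
    let code = match err {
        AppError::ServerError => 20,
        AppError::DatabaseError => 25,
    };
    ErrorResponse {
        code,
        message: String::from("An error occurred with the service"),
    }
}
```

Both errors would still map to an HTTP 500; only the code field tells the developer whether to look at the server or the database.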
22:44
And that's it for my talk, but let's see if we have some more time. I don't know if we have time for, we have three minutes. I don't know if we have any more questions, but again, you can't see that URL, but this is my information,
23:01
and I'm pretty sure these slides are gonna be available online, so feel free to contact me.