
TCP servers in .NET done right


Formal Metadata

Title
TCP servers in .NET done right
Title of Series
Number of Parts
170
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Many people say that the arrival of good garbage collected languages means you don't have to worry about things like memory management any more. This might be the case for line of business software, but what about when you want to write a TCP server capable of dealing with a decent number of connections? In this talk we'll look at the challenges of TCP servers in C# by converting a synchronous, thread-per-client server to use hipster-compliant asynchronous evented IO and then optimising it not to die from GC pressure.
Transcript: English(auto-generated)
So, I guess we might as well start. Hi everyone, my name is James Nugent, I work for a company called Event Store. I'm actually quite surprised and impressed with the turnout for a talk that's up against Scott Hanselman and Douglas Crockford and James Newton-King, so thank you for coming. So what we're going to be talking about, so the title of this talk is TCP Servers Done Right.
What I'm going to qualify that with is for some definition of right. We're going to be looking at .NET code, we're not going to get it up to the performance that you could conceivably get out of native code, mostly because I'm not convinced that's possible, but specifically we're going to be looking at stuff that works well on Windows,
and I'll point you towards some stuff that works well on Mono as well. And secondly we're going to spend most of the time looking at async TCP, which may or may not be the best choice for you. There's a debate going on, or that's been going on for quite a while, it's basically settled I think,
about whether blocking calls with a thread per client, or asynchronous IO, is the best way of dealing with things. And it turns out that if you want the best performance for one client, and you have a relatively small number of clients,
you may actually be better off with the thread per client. I'm not going to show an example of that, but it's a fairly trivial extension to one of the examples we will look at. If people are interested in the code that we look through, then I can stick that up on GitHub, and the big thing we're going to look at at the end is open source anyway, and that's already on GitHub,
so I can point you towards that. So basically what we're going to go through is refactoring the same code basically, from being a basic synchronous server which will only deal with probably one client at a time, and then taking it through, making it deal with multiple clients, and then looking through the memory implications of what we're doing,
and considering how that will scale up to multiple clients, or lots of clients very well. And then we'll try and wrap things up in a slightly nicer abstraction for the purposes that we need. And then we're going to skip away from sample code and go and look through some actual real-life code
of stuff that's in a database server capable of doing 60, 70,000 transactions a second over TCP, which is Event Store. This is going to be very code heavy, so in fact I'm going to lose the slides for now and switch to code. Actually, there's one more. So, the basic abstraction in .NET for all this stuff is the Socket class.
It covers most of what we need. There are various people who've tried to wrap native things like libuv, etc. in managed code, and call it directly. They've had variable results. It seems that there is some mileage in that, but we're not going to really look at that today.
We're going to stick with the native .NET stuff, which you can get to a reasonable throughput if that's what you need. So, let's switch over to code because that's more fun. The first example I have here is just a... So the program is just calling this server class and running it on the main thread.
And what we're going to do is use the socket class. We're going to bind to this endpoint, so we're going to bind to 127.0.0.1 on port 11000.
And we're going to bind for IP, and we're going to use the stream socket type. There are various other types here, so we could go and use, for example, datagrams and that kind of thing. But we're going to stick to stream stuff for all of our demos today. And we're going to use raw TCP.
So, running synchronously, there's these two methods. Bind is going to set up the socket to bind to the given endpoint that we've defined up here. And then Listen is going to start listening for connections.
And then, a little bit of error handling, we'll talk more about error handling later on. And then we're going to go into this main loop, which is going to keep running until we terminate the server. And all that's going to do is accept a connection. So there's a difference between listening for a connection, and then when one appears, we need to accept it to get a hold of a socket that we can talk to.
And at that point, what we're going to start doing is listening for data that the client sends, and immediately sending it back to them. This is going to be a pretty horrible echo server, because it's going to do things literally as we type. And then we're going to, we're eventually going to close the client socket if they send nothing.
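As a reference, here is a minimal sketch of that synchronous version; the class and method names are mine, not necessarily what was on screen:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class SyncEchoServer
{
    public static Socket StartListening(IPEndPoint endpoint)
    {
        var listener = new Socket(AddressFamily.InterNetwork,
                                  SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(endpoint);   // claim the endpoint
        listener.Listen(10);       // start queueing incoming connections
        return listener;
    }

    // Accept one client and echo bytes back until it disconnects.
    // Accept, Receive and Send all block the calling thread, which is
    // why this server can deal with exactly one client at a time.
    public static void EchoOnce(Socket listener)
    {
        using (Socket client = listener.Accept())
        {
            var buffer = new byte[1024];
            while (true)
            {
                int read = client.Receive(buffer);
                if (read == 0) break;   // zero bytes means the client closed
                client.Send(buffer, 0, read, SocketFlags.None);
            }
        }
    }
}
```

The real server wraps the accept-and-echo part in a loop, which is why the second telnet client in the demo eventually gets serviced once the first one disconnects.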
So if we run this for a few minutes, we can see what some of the characteristics are. So I'm going to use telnet as the client for all of this stuff, because it's nice and straightforward. So if we start a session, then we can see we're handling this client at the moment,
so we're somewhere here in the code. And if we start typing away, hello world, then we start to see that our output, exactly what we type, is being echoed back to us, which is kind of what we'd expect. If we fire up a second one of these, then, as we're probably expecting, nothing's going to happen.
We're not going to see we're handling another client, and if we type in here, nothing's going to happen. In fact, if we go and terminate this client, so if we use, what is it, control, right bracket, and then Q,
we will actually see that this one is eventually connected, because it's sat in a queue. And the telnet client is going to buffer all the input that we've given, and it's going to send it across when it finally does connect, which is nice, so we get back our whatever random stuff that I typed into that box.
But we can see that this is fundamentally one client at a time, which might be occasionally what you want. If you're doing thread per client, then that's basically exactly what we want, and this is the exact model you'd use for that, but for spawning up a new thread on every accept.
But as it is at the moment, we can only really handle one client at a time, and everything's being done on our main thread, which is probably less than ideal. So, in order to get around that, control C out of it to get rid of it. So, to get around that, we need to start somewhere up here.
So as with most methods on this class, there are two different variants of it. There's Accept, and there's also AcceptAsync. And AcceptAsync is going to take, in fact let me open up the project that's got that in.
I decided that live coding this stuff was a very bad idea, so most of the examples are canned. So, what I've done here is just split it out into, oh it's still in the same kind of thing. So, when we listen, after that we're going to start accepting for some number of concurrent accepts,
and that's just a constant that's defined up here, so I think we're on one at the moment. You can make it any number that you like, there's a trade-off between how long your queue is,
and whether you want clients to wait for a long time, or just disconnect. But when we start accepting, what we're going to do is create this class which gets given as a parameter to AcceptAsync. It has various things that we can define on it.
One of the things that it defines is this Completed event handler. And what that's going to do is call our delegate when we get a client connect.
So, what we're going to do here is just hook it up to something that's going to handle it, which we'll look at in a second. And then, because it's an async call, it's possible that it will return immediately. For example, if our queue is backed up with clients trying to connect, the chances are this is going to return immediately, and we'll never get the Completed event fired,
because that's how the pattern works. So, if it doesn't go async, then we're just going to call the handler synchronously. It also could be the case that a client opens a connection and then disconnects before we have time to service it, in which case there are various conditions that can lead to these object-disposed exceptions.
There's not really much you can do about it if this throws when we try and close the socket down. So, in this case, I've just got a little helper method called eat, which will ignore that without having to appease ReSharper everywhere I try and use an empty catch.
So, when we get a connection, what we're going to do is close it immediately, basically. We're going to see if it was successful accept, then we're just going to log the fact that we accepted it,
and then we're going to close down the server. And we're going to look at where the connection came from as well. And then what we're going to do is start accepting again immediately. So, this is what will cause the thing to loop and continue to allow us to service clients. So, if we run this, it will be the world's most irritating server, because you'll connect to it and then it'll immediately disconnect.
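The shape of that accept loop, including the synchronous-completion case just described, looks roughly like this. This is my sketch rather than the code on screen, and it allocates a fresh SocketAsyncEventArgs per accept, which is exactly the allocation the talk goes on to eliminate with pooling:

```csharp
using System;
using System.Net.Sockets;

class AsyncAcceptor
{
    private readonly Socket _listener;
    private readonly Action<Socket> _onAccept;

    public AsyncAcceptor(Socket listener, Action<Socket> onAccept)
    {
        _listener = listener;
        _onAccept = onAccept;
    }

    public void StartAccepting()
    {
        var args = new SocketAsyncEventArgs();
        args.Completed += (s, a) => HandleAccept(a);  // fires only if the call went async
        try
        {
            // AcceptAsync returns false when it completed synchronously,
            // e.g. because a client was already waiting in the queue. The
            // Completed event is NOT raised in that case, so we have to
            // invoke the handler ourselves.
            if (!_listener.AcceptAsync(args))
                HandleAccept(args);
        }
        catch (ObjectDisposedException)
        {
            // The listener was closed under us, typically during shutdown.
            // Nothing useful to do, which is the "eat" helper in the talk.
        }
    }

    private void HandleAccept(SocketAsyncEventArgs args)
    {
        if (args.SocketError == SocketError.Success)
            _onAccept(args.AcceptSocket);
        args.AcceptSocket = null;  // don't keep the client socket reachable
        StartAccepting();          // post the next accept immediately
    }
}
```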
I occasionally suspect that that's how they implemented iMessage, by the way. But, if we do this, then we'll see telnet 127.0.0.1 11000. What we should see is that we started processing an accept from this, and then we immediately closed it.
So, what's this? So, this is one of the ports that gets used for accepting client connections. You can configure the range of ports which will be used to service client connections, and in fact how many ports are available for that, which determines in part how many concurrent clients you can deal with.
You can do that, there's Windows settings for doing that, which .NET will, as far as I'm aware, obey. And, we immediately closed the thing anyway, so the telnet client has detected that and it's just shut itself down. If we could keep it open for long enough, we should be able to, if I hadn't blocked the server by highlighting it,
then we should be able to keep doing that. And, if we were staying open for long enough, we should actually be able to have multiple clients connect to this, because we're not blocking the thread as it's waiting to receive.
So, that leads us to a few things here. The first is, if you imagine that we were expecting quite a few clients to be accepting, then it would be pretty poor to be allocating these things every time that we...
Every time we want to accept a new client, we have to allocate one of these on the heap. And, as far as I'm aware, that's a class. Maybe a struct. I don't think it is. It's a class. Goodbye, ReSharper. So, we probably don't want to be doing that.
So, one of the common patterns that we're going to see here is the need to do some manual memory management. Who here works with .NET all the time? Who remembers being told, like, when .NET came out that garbage collection meant that you don't have to deal with memory yourself, ever? And, you don't have to do... Right, okay. And, that's like the most dangerous thing ever, right?
All the normal memory management patterns still apply, it's just that you don't have to keep track of things yourself all the time. It turns out, if you're into this kind of stuff, and in particular if you're doing a lot of native calls, then actually you do have to worry about this stuff. So, I'm going to pull up the next thing here, which has a slightly different version of that.
And, what we've introduced here is the idea of a pool. And, what that's going to do is determine some number of these objects that we think we're going to need over time. And, it's going to allocate a whole load of them in advance.
And, then it's going to make them available via a pair of methods. One is get, which you use when you want to get one of them. And, the other one is return, which you call when you're done with it. So, in other words, you say, I want to use this object temporarily. And, then you use it for the duration of the period that you need. And, then you return it to the pool and somebody else can use it.
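A sketch of such a pool, assuming roughly the design described; the talk's real class lives in the Event Store codebase, so this is a reconstruction, and the fall-back-to-allocate behaviour when the pool runs dry is my assumption:

```csharp
using System;
using System.Collections.Concurrent;

// A pool of reusable objects sitting on a ConcurrentStack. The factory
// thunk lets the consumer decide how each instance gets created, e.g.
// wiring up a Completed handler on a SocketAsyncEventArgs before it
// ever enters the pool.
class ObjectPool<T>
{
    private readonly ConcurrentStack<T> _items = new ConcurrentStack<T>();
    private readonly Func<T> _factory;

    public ObjectPool(int initialCount, Func<T> factory)
    {
        _factory = factory;
        for (int i = 0; i < initialCount; i++)
            _items.Push(factory());
    }

    public T Get()
    {
        // If the pool is empty, allocate a fresh instance rather than
        // blocking the caller (an assumption on my part).
        T item;
        return _items.TryPop(out item) ? item : _factory();
    }

    public void Return(T item)
    {
        _items.Push(item);
    }
}
```

A stack rather than a queue means the most recently returned instance is handed out next, which is the cache-friendly reuse pattern mentioned below.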
It's implemented in a fairly straightforward manner. They're sitting on a concurrent stack, which is, if you look into the memory usage profiles, a concurrent stack is marginally better than a concurrent queue for it, just because you tend to reuse the same ones more often.
And, when we create the pool, what we're going to do is go through and just create a whole load of them. And, we're actually not going to create them directly in here. We're going to allow the consumer of this class to pass in a thunk, which will determine how the args get created, which we'll see is going to be useful in a second.
And, then when you get one, you take, you know, you pop the last one off the stack and when you return it, you push it back on. It's a fairly straightforward class. If you're using Mono, this is one of the areas where you need to pay a bit of attention, because the concurrent stack is not very concurrent. This may be fixed now, but certainly as of 3.2.8, I think it was, it was still not very good.
I can show, I can post some ways of getting around that that we use in the event store. So, what we can see if we go back to our server is when we start up, we're going to create one of these pools.
And, the initial number that we're going to use is basically the number of concurrent accepts, the number of concurrent accepts we're going to allow, times two, on the basis that that way there should always be enough on the stack. You know, even if everyone was taken by an accept that's running, by the time we want another one, there should always be one there, because it'll be returned pretty quickly.
And, the func we're going to use, it's actually a method rather than a func, but what we're going to do is hook our own delegate into this Completed event. Other than that, the code is pretty much the same.
We've just factored out the exception handling code. It's very important to make sure that we actually return the thing to the pool. So, even in the case of errors, we really want these objects to get returned to the pool and not lost in the depths of time. So, whether we're successful, or whether we are in an error condition,
we need to make sure that we haven't left any of our own data in these objects which are going to be reused, and we need to return them to the pool. So, in this particular case, when we use AcceptAsync, that's going to set the AcceptSocket so that we can do further work with it. Now, when we return this to the pool, we don't really want it with somebody else's AcceptSocket on it,
because apart from anything else, it means that will never get garbage collected. So, we need to be very careful to set that to null wherever we return these things to a pool. Apart from that, we've changed the server a bit so that it doesn't just disconnect, because that was really annoying. So, what we've done instead is in our little server,
when we construct this thing, when we start listening, sorry, we're going to take an action, which is what should we do when the socket gets accepted. And just for convenience's sake, we're going to give the action both the remote socket, so that we have some easy way of identifying it just temporarily,
and we're also going to give it the actual socket object itself so that it can make calls down to it. So, if we go back to our program, which is the eventual consumer of this class, what we're going to see is that we... I'm running it on a background thread, just so that I can exit it more easily.
We're going to create this, we're going to create the server, and when we start, we're going to pass it this handleConnection delegate. And for now, all that's going to do is, well, it's not going to do anything, it's just going to sit there with the connection open. And this should prove to us that we're actually capable of accepting multiple clients at the same time.
So if we run this, then... Oh, telnet, I don't know why these come up so big. So we're going to sit there, we can see we've got one client that's been accepted from that ephemeral IP address, ephemeral port,
probably got to start another one, and if I'm not mistaken, it should connect. So we're capable of now handling multiple connections at the same time, which is an improvement on our synchronous version. And we're also not, we're not causing horrific memory problems yet.
So, there's actually an interesting tidbit that, if you ever get hit by it, and the chances of you ever getting hit by it are fairly minimal, it will troll you for an entire week. It's entirely possible that if you're making lots of connections and breaking them again in a very fast cycle, you can see it in netstat,
that you can end up connecting to your own TCP port, which will connect, and it will look okay, and nothing will work. We literally spent about a week looking for this in the Event Store codebase. It was one of the worst trolls we've had in the entire development process. Anyway, so we've got multiple connections.
So if we go through and break some of those, now we can exit our server properly as well. So, let's go make it do something a bit more useful than this. Fortunately, because we've got the socket,
we can start to use something like client, sorry, not client endpoint, we can start to use client socket. And then, okay, so what do we do? So, there are a few options here. Basically what we want to do is when a client starts, we want to start listening for data, because they're, you know, the whole point of an echo server is they're going to send us something,
we're going to send it back. And I'm going to use echo all the way until the end of this talk because it's a nice easy protocol that everybody understands. So, we have a few options here. We have Receive, which is fine. That's going to block though, so we don't really want to use that because we're still running single-threaded here, remember.
We've got ReceiveAsync, which is going to take one of the SocketAsyncEventArgs objects again. That's a valid option, but that's using the asynchronous pattern from .NET. I think that was introduced with .NET 2 or something like that. Instead what we're going to look at is BeginReceive.
And this is generally a lot more useful. If we look at the signature for this, if it will stay up during my zoom, which it won't, then basically what we've got to give this thing is a buffer to write into.
So, we're going to say to the socket, start receiving data from this client and put it here. And we're going to give it somewhere to put the data. So, I'm going to not implement that here, I'm going to go on to the next one
because that way I don't have to do all that. So, let's look at number 4.
This one now has a lot more files in it. So, the problem with using the BeginReceive thing, in fact if I go and bring it up we can actually go and type some of that code just to see the motivation behind wrapping this up in a slightly nicer model.
If we actually go and do this, we could say here that we're going to allocate a byte buffer of, say, var buffer = new byte[1024],
so we're giving it a kilobyte buffer to write into. And then we're going to call clientSocket.BeginReceive, and we're going to pass in our buffer and give it an offset into our buffer, so we're going to say 0 in this case. And we're going to say we're going to receive 1024 bytes with no flags.
And when we're finished, we're going to have a ReceiveCompleted callback, and we don't need any state. Actually, we do need some state, we need the socket. So, if we go and generate out this method,
this is just using the standard pattern that's been there since .NET 1.1 or 1.0. It's been there for a long time anyway, and a lot of stuff implements it. The point is when we get this, we can get our socket back via (Socket)ar.AsyncState.
And then, you know, we could go do something with it. The problem is what we don't have is our buffer. So, you know, we could go and write a tuple here or we could go and write our own custom object to put into the state. The point is, once you start implementing this pattern,
you end up with writing quite a lot of code that's quite convoluted. Instead, it would be nice if we could interact with the thing using, basically registering our own callbacks that are for complete messages being received over TCP. And where we could enqueue sends rather than having to block waiting for them, etc. So, this model, I've basically pulled some of this code out of the event store code base.
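The custom state object just mentioned might look like this; `ReceiveState` and the method names are my own, and a real server would do rather more error handling:

```csharp
using System;
using System.Net.Sockets;

// With BeginReceive, the callback only receives an IAsyncResult, so
// everything it needs - the socket AND the buffer the socket was told to
// write into - has to travel through the state argument. Bundling them
// in one object avoids tuples or closures.
class ReceiveState
{
    public Socket Socket;
    public byte[] Buffer;
}

static class BeginReceiveEcho
{
    public static void StartReceiving(Socket clientSocket)
    {
        var state = new ReceiveState { Socket = clientSocket, Buffer = new byte[1024] };
        clientSocket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                  SocketFlags.None, ReceiveCompleted, state);
    }

    private static void ReceiveCompleted(IAsyncResult ar)
    {
        var state = (ReceiveState)ar.AsyncState;
        int read;
        try { read = state.Socket.EndReceive(ar); }
        catch (SocketException) { return; }          // client went away mid-receive
        catch (ObjectDisposedException) { return; }  // socket closed under us

        if (read == 0) { state.Socket.Close(); return; }  // orderly shutdown

        // Echo the bytes straight back, then post the next receive,
        // reusing the same state object and buffer.
        state.Socket.Send(state.Buffer, 0, read, SocketFlags.None);
        state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                  SocketFlags.None, ReceiveCompleted, state);
    }
}
```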
The central abstraction we've got is this thing called a TCP connection. And what that represents is one client. And it has some methods that we can use. So we can use EnqueueSend if we want to send something.
Or we could do a TrySend if we think it might fail for some reason. And we have our own version of ReceiveAsync that's going to take our own callback so that we don't have to bother passing around these buffers. Internally, that's exactly what it's going to do.
But let's see how we'd use this thing. I'm not going to go into the implementation details of that. It's actually a lot more complicated than it needs to be for our simple purposes here because it does a lot of monitoring on the connection. So it will track, for example, the total number of bytes written and read to and from a connection. And it will monitor various other aspects of it that we needed for other things.
And pulling out that code was just too painful, so I left it in. But if we go and look at how our server actually uses this, what we can do is write these nice callbacks instead. We can say, once our connection has been accepted, which is exactly what was happening before,
we're going to create a little abstraction over this client. And we're going to give it an identifier so that we can get back to it easily later on. And we're going to give it the endpoint and we're going to give it the socket. And we're going to use verbose logging for now, so that we can see what's going on underneath.
And at that point, when we do ReceiveAsync, we're calling ReceiveAsync on our abstraction rather than on the actual socket, which is a bit of a clumsy API. So what we're going to be able to do is say, is this. So when we receive data, we're going to get back an IEnumerable of ArraySegment<byte>.
Who's seen the ArraySegment class before? Oh, sorry, the ArraySegment struct before? Okay, cool. So the ArraySegment struct is basically a flyweight over an array. It's a way of passing around a reference to part of an array, which you can iterate over. Let's go look at it.
It's basically a way of allowing you to iterate over the contents of an array, or over a part of an array, and treat the whole thing as if you had an actual slice of an array, if you like. So, because we want to be nice and efficient underneath,
we're going to be writing into pre-allocated arrays, which we'll look at in a second. So we're going to get some data back, which is going to be a reference to, you know, we don't know where it is. It's not in a buffer that we've allocated anymore. And all we're going to do is send it back and start receiving again. This is basically the pattern of most servers, right?
They're going to wait for data, they're going to wait for a request, they're going to process a request, they're going to send a response back. That's the case for most protocols. The protocol that we're going to look at right at the end is actually not like that. It's fully asynchronous. So it sends responses out of order with correlation IDs on them instead.
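The ArraySegment behaviour described above is easy to demonstrate: a segment is just an (array, offset, count) triple, so writes to the backing array remain visible through it. A small demo (my own, not from the talk):

```csharp
using System;
using System.Text;

static class ArraySegmentDemo
{
    // Read the bytes a segment covers without copying the backing array.
    public static string Read(ArraySegment<byte> segment)
    {
        return Encoding.ASCII.GetString(segment.Array, segment.Offset, segment.Count);
    }

    public static void Run()
    {
        byte[] backing = Encoding.ASCII.GetBytes("hello world");

        // A flyweight over part of the array: no allocation beyond the
        // struct itself, and no copying of the data.
        var slice = new ArraySegment<byte>(backing, 6, 5);
        Console.WriteLine(Read(slice));   // world

        // Mutating the backing array is visible through the segment, which
        // is why a receive callback can hand out segments into buffers it
        // owns and reuses, rather than allocating per receive.
        backing[6] = (byte)'W';
        Console.WriteLine(Read(slice));   // World
    }
}
```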
But now we're basically back to our echo server. And we've added this extra thing which will hook up a connection close handler and just log when the client drops. So if we go look at this, and run it for a second, we should be able to get a client up, and we should be back where we were with our synchronous version, on the same port number.
So now if we type, we should see our data echoed back at us. And then if we close, we should see some information about our socket.
We've been monitoring various things about it, like the number of send calls. This can be useful during debugging. So, how does that help us? Well, one of the things that we need to be very careful of,
I'm going to go back to slides very briefly, and this is the last time, I promise. One of the things we need to be really careful of is heap fragmentation. We called a lot of BeginReceives. Now, BeginReceive is an interesting thing, because we have to pass it a buffer. And when we started this, we were allocating our own buffers.
We were saying, here's a new byte array that we're going to declare, and it's going to be allocated on the heap, and we're going to pass it to this method. And we were passing relatively small buffers, you know, they were 1,024 bytes. So they're probably going to be in generation 0, or in the nursery if you're on the Mono garbage collector.
So the problem with that is, if we go and look at our heap for a second, let's imagine we have a whole load of objects. And the red ones have been passed to a BeginReceive call. Oh, I'm using the terms from the Mono garbage collector there, rather than the .NET one, but yeah, same principle.
The problem is, BeginReceive is basically a wrapper around a native call, right? It's going to do some interop, and the actual network stack is going to have to be writing into this memory. So what happens is the garbage collector pins that memory, which basically says you're not allowed to move this when you do the garbage collection.
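This pinning can also be done explicitly with GCHandle; BeginReceive causes the equivalent implicitly for overlapped I/O. A sketch of the concept, not something the server code does by hand:

```csharp
using System;
using System.Runtime.InteropServices;

// An explicit version of the pinning that BeginReceive causes implicitly:
// GCHandle.Alloc with GCHandleType.Pinned tells the GC it may not relocate
// the object while the handle is alive.
public static class PinningDemo
{
    public static long PinAndGetAddress(byte[] buffer)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // While pinned, a compacting collection has to work around this
            // object, which is exactly what causes the fragmentation described.
            return handle.AddrOfPinnedObject().ToInt64();
        }
        finally
        {
            handle.Free(); // always unpin, or the object can never move again
        }
    }
}
```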
Who's heard of the concept of pinning memory before? So, the problem is, if you've got pinned memory, then you can end up with really badly fragmented memory. If you imagine that we go through, you know, all these other objects, which are grey, are basically free to be collected,
then when we go through our next garbage collection cycle, our heap's going to look like this. Now what happens when we want to allocate an object of this size? You know, it won't fit in any of those gaps. You know, there's plenty of free space for it, but there's no continuous space big enough for this object to fit, because we've got a load of things that we're not allowed to move,
because the operating system's expecting them to be in a certain place. Unfortunately we're not allowed to do this, and just squeeze some memory into a box that's just about the right size. So, how can we deal with that? Basically the abstraction that we want to use
is very similar to the SocketAsyncEventArgs pool that we looked at earlier. What we want to do is pre-allocate a whole load of buffers that we can use and distribute out amongst calls. Ideally we want to be allocating a lot of memory, because if we do that it'll end up on the large object heap, where it's less likely to cause us problems with transient objects.
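A minimal sketch of that pre-allocation scheme; the names (BufferManager, CheckOut, CheckIn) are illustrative rather than the actual Event Store API:

```csharp
using System;
using System.Collections.Concurrent;

// One large slab lands on the large object heap, gets chunked into
// fixed-size ArraySegments, and the segments are reused via a stack.
public class BufferManager
{
    private readonly int _chunkSize;
    private readonly int _chunksPerSlab;
    private readonly ConcurrentStack<ArraySegment<byte>> _free =
        new ConcurrentStack<ArraySegment<byte>>();

    public BufferManager(int chunkSize, int chunksPerSlab)
    {
        _chunkSize = chunkSize;
        _chunksPerSlab = chunksPerSlab;
        AllocateSlab();
    }

    private void AllocateSlab()
    {
        // With, say, 1024 * 128 = 131,072 bytes this exceeds the CLR's
        // 85,000-byte threshold, so the slab goes on the large object heap,
        // which isn't compacted; pinning slices of it is then harmless.
        var slab = new byte[_chunkSize * _chunksPerSlab];
        for (int i = 0; i < _chunksPerSlab; i++)
            _free.Push(new ArraySegment<byte>(slab, i * _chunkSize, _chunkSize));
    }

    public ArraySegment<byte> CheckOut()
    {
        ArraySegment<byte> segment;
        while (!_free.TryPop(out segment))
            AllocateSlab(); // ran out of the initial allocation: grow
        return segment;
    }

    public void CheckIn(ArraySegment<byte> segment)
    {
        // Callers must return every buffer when finished, or the pool drains.
        _free.Push(segment);
    }
}
```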
So, let's go and look at the implementation of that. We have this thing called buffer manager, and what that's going to do, when I get down to the meat of it, is go through allocating chunks of memory,
and they're going to be chunks that are on the large object heap, so it's basically going to declare byte arrays bigger than the 85,000-byte threshold on the CLR, big enough to be stuck in the large object heap, and then it's going to chunk that up into, well, it's going to create an array segment over this,
which is big enough to be useful. Basically, when we do these BeginReceive calls, we always want the same buffer size, right? We're going to decide what a useful size for our particular use cases is, and we're going to start,
and you could do it slightly differently, so that you can have variable size buffers as well, but for the most part, we're going to want fixed size buffers, and we're going to stick them onto a stack, so that we end up reusing them. And in this particular case, this implementation also allows it to allocate further memory if necessary,
so if we're doing so many calls that we've run out of the initial amount that we allocated, then we can allocate new slabs further down the heap, but this array segment thing allows us to transparently give out slices of the big pool of memory we've allocated to all these begin receive calls. So now when they pin it, it doesn't matter,
we're not going to get heap fragmentation as a result of that. Unfortunately, it's quite a pain to, it's another motivation for having this TCP connection abstraction, is it's quite a pain to go through and make sure you return all the buffers when you're finished with them. Quite often you're going to be holding onto them for a while
until you've got enough to satisfy what you're actually listening for. When you go through implementing protocols, for example, the AMQP protocol is sent on the wire with frames which are delimited by, you know, you send the length of the frame and then you send the actual data, and you wait until you've got a whole message
before you start processing the message further downstream. So this is quite a useful place to start buffering that. So if we go and look at a protocol a little bit like that,
I think this is the last one of these I have, what we're going to do is make our echo server a little bit more user friendly to people typing into telnet. And rather than sending back every character, what we're going to do is use a horrible protocol which is really open to denial of service attacks. And we're going to wait for new lines in our data before we send it back.
The reason that that's a horrible protocol is because it's unbounded in the amount of data we're going to buffer before we send anything back, meaning a particularly malicious client could just send us a load of data and fill our memory up. So what we have instead is this interface called a message framer.
And what that's going to be responsible for is looking into the data we've received through our calls and deciding when we've got a message that's complete enough to send it on to the processing part of our application. And our particular implementation of this, what have I called it? I've even called it crappy temporary framer.
So what that's going to do is try and parse every segment as it comes in. I'll look at where it's hooked up in a second. But basically we're going to copy all the data into our own array. This one isn't going to be pinned, so it's not such a problem that we're allocating it.
It might still be better to not do that for other performance reasons. And then we're going to assume that we have a string and we're going to see whether it's got a new line in it. And if it has, we're going to call the handler that we've registered with this class to say, here's a message that we've got, this is good enough that our application downstream is going to start processing this instead of,
instead of keeping it internal to the TCP transport. And I haven't implemented a proper frame type in this case, so our callback is just, you know, here's all the data, go do what you want with it. If you're implementing a protocol that has a known wire format, you probably have some kind of thing that's capable of deserialising the messages
or translating the message into some internal structure that the rest of your program can deal with. So let's look at where this is hooked up. It's actually in our basic echo server class again. Where are we? It's in this on data received.
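The framer idea can be sketched like this; the interface shape and names are illustrative, modelled on the talk's description rather than copied from the code base:

```csharp
using System;
using System.Text;

public interface IMessageFramer
{
    void RegisterMessageArrivedCallback(Action<string> handler);
    void UnFrameData(ArraySegment<byte> data);
}

// Accumulate received segments and fire the callback once a complete
// newline-terminated line has arrived. Naively assumes segment boundaries
// never split a multi-byte UTF-8 character.
public class NewLineMessageFramer : IMessageFramer
{
    private readonly StringBuilder _pending = new StringBuilder();
    private Action<string> _handler = delegate { };

    public void RegisterMessageArrivedCallback(Action<string> handler)
    {
        _handler = handler;
    }

    public void UnFrameData(ArraySegment<byte> data)
    {
        // Copy out of the pooled buffer: the segment will be reused for the
        // next receive, so we can't hold a reference to it.
        _pending.Append(Encoding.UTF8.GetString(data.Array, data.Offset, data.Count));

        // Note the denial-of-service hole mentioned above: a client that
        // never sends '\n' makes _pending grow without bound.
        string buffered = _pending.ToString();
        int newline;
        while ((newline = buffered.IndexOf('\n')) >= 0)
        {
            _handler(buffered.Substring(0, newline + 1));
            buffered = buffered.Substring(newline + 1);
        }
        _pending.Clear();
        _pending.Append(buffered);
    }
}
```

Feeding it "hel" and then "lo\nworld\npar" across two receives fires the callback twice, with "hello\n" and "world\n", and keeps "par" buffered for the next call.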
The first time that we get some data on a connection, what we're going to do is decide whether we've already seen data for that connection before. This is a really good reason why we gave the connection an ID earlier. We need to be able to track things across callbacks if we're going to do this asynchronous re-entrant type stuff
where we're not guaranteed we're going to get a whole message in one callback because apart from anything else, we're only receiving 1,024 bytes at a time, it could be over many calls. We need some way of correlating the messages together by socket. So by giving the connection an ID,
we've come up with a key that we can use for that and that we can easily get back. So as part of our little protocol here, we're going to, as part of our little protocol implementation, we're going to have a concurrent dictionary of GUID, which is what our connection ID was, and then we're going to link back to the actual connection and we're going to link back to the current framer,
which is going to be responsible for storing the state of the messages being sent over that particular socket at the moment, or rather, that have been received over that particular socket at the moment. So when we actually get some data, we can go and look up in that dictionary
whether or not we already have a framer for it, and if we haven't, we're just going to register a new one. Now I've stuck a nasty little hack in here, because our echo protocol doesn't actually have any kind of way of correlating back to a client, because we're just receiving ASCII over it, or UTF-8, or whatever we're sending over it.
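That per-connection lookup might look something like this sketch; PerConnectionState is a stand-in for whatever needs to survive across receive callbacks (the real code holds the connection and its message framer):

```csharp
using System;
using System.Collections.Concurrent;

public class PerConnectionState
{
    // Stand-in for the framer's buffered, not-yet-complete data.
    public string PendingData = "";
}

public class ConnectionRegistry
{
    private readonly ConcurrentDictionary<Guid, PerConnectionState> _states =
        new ConcurrentDictionary<Guid, PerConnectionState>();

    // Called from the data-received callback: fetch the state for this
    // connection, or register a fresh one the first time we see data on it.
    public PerConnectionState GetOrRegister(Guid connectionId)
    {
        return _states.GetOrAdd(connectionId, _ => new PerConnectionState());
    }

    // Called when the connection closes, so state doesn't leak.
    public void Remove(Guid connectionId)
    {
        PerConnectionState ignored;
        _states.TryRemove(connectionId, out ignored);
    }
}
```

ConcurrentDictionary.GetOrAdd makes the register-on-first-sight step safe even when receives for different connections complete concurrently.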
So I've chosen to make it that whenever we start a new one, we just serialize that little client ID back into the message so that the downstream provider can get it when they need to. Otherwise, all we're going to do is go through and unframe the data every time we get it,
which in our case was going through looking for new lines and deciding whether or not we've got a good enough message that we can pass it on downstream. And then carry on, go back to receiving so that we can get more if we need to. So when we get a complete one,
this callback will be called by the message framer. And what that's going to do in our particular case, we know that we can get the connection ID back. All we're going to get is the data, right? We're not going to get some kind of tuple of the data in the connection, although we could do that as well. It's just a bit harder to go through and implement.
So when we've got a complete message, we can go and get the socket back. And then just enqueue the data, we can go and enqueue a send of whatever we were sent in the first place. And then our abstraction in the background will go and send that over the socket and make sure that everything's good there.
So if we run this version of it, we should now be able to deal with multiple clients. So if I do telnet 127.0.0.1 11000, then we should be able to type, and we won't necessarily see anything at the moment
because it's not actually echoing yet. But if we press enter, we should get our text back. Now we see it. I don't know why the Microsoft telnet client does that, but apparently it does. And again. And we should be able to deal with multiple clients. So this one.
Yep. And our messages end up being correlated back to the right connection because of that little lookup dictionary we had. And if we quit, we'll see the same stats about each connection. So we've gone from having something that's entirely blocking
only dealing with one client at a time and doing horrible things for our memory, to being able to deal with multiple clients without doing anything nasty to our memory profile. All still single-threaded, right? All of this is still running on one thread.
So let's go and look at an actual protocol, an actual use of this sort of stack. And I'm going to go into the event store code base here. This stuff is all open source. It's all on GitHub. Oh, hold on. Is that the right thing?
I know it is. Most of the classes that we were just looking at came straight out of this code base. They've been tested reasonably heavily. So in this particular case,
rather than having a basic echo service or something like that, what we have is a TCP service. And it's capable of listening to a few messages. We're messaging all around this thing internally. But it's capable of listening to a few messages which tell it when the system should be starting,
when it should be shutting down, that kind of thing. But the important thing here is when we get a connection accepted, we start up with one of these TCP connection managers, which is basically the same abstraction we had earlier, and we start receiving on it.
In our particular case, our protocol, our on-the-wire format, is very similar to the AMQP one. We send frames of messages, or sorry, we send delimited messages where we send the length of the message first, and then we send the actual data of it. The data happens to be a serialized protobuf. Protobuf was perfectly fast enough for us,
and there are plenty of other alternative formats for that. So in our case, the message framer looks slightly easier. It's this length prefixed one. And what that does is exactly what we were doing earlier, but rather than looking for new lines,
it's looking for us to have enough data. And it's also looking for the data to be valid in some ways. We build into the header the fact that the first n bytes are actually not going to be a serialized protobuf. It's going to be a message ID, a correlation ID, some authentication information
and that kind of thing. So we deal with all that here. But then, the complete thing that we deal with is actually, where is that method used? That's the easiest way of getting to it. The on message arrived callback is going to look very different.
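The length-prefixed framing pattern itself, without the header fields that are specific to the real protocol, can be sketched as follows. The class name is illustrative, and a real implementation should bound the length field against malicious values:

```csharp
using System;
using System.Collections.Generic;

// Each message on the wire is a 4-byte little-endian length followed by
// that many bytes of payload; fire the callback only once a whole message
// has been buffered.
public class LengthPrefixedFramer
{
    private readonly List<byte> _pending = new List<byte>();
    private readonly Action<byte[]> _onMessage;

    public LengthPrefixedFramer(Action<byte[]> onMessage)
    {
        _onMessage = onMessage;
    }

    public void UnFrameData(ArraySegment<byte> data)
    {
        for (int i = 0; i < data.Count; i++)
            _pending.Add(data.Array[data.Offset + i]);

        // Keep emitting messages while we have a complete header + body.
        // (ToArray per iteration is wasteful; fine for a sketch.)
        while (_pending.Count >= 4)
        {
            int length = BitConverter.ToInt32(_pending.ToArray(), 0);
            if (_pending.Count < 4 + length)
                break; // header arrived, body still incomplete

            var message = _pending.GetRange(4, length).ToArray();
            _pending.RemoveRange(0, 4 + length);
            _onMessage(message);
        }
    }
}
```

Because the loop only fires on complete frames, it doesn't matter how the operating system happens to split the stream across receive callbacks.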
What we're going to do is package it up into this message type here, which is basically responsible for transporting all of the information about the particular message that we've just got,
which we've termed to be a package. And that's going to go on to the next stage in the process which deserializes the protobufs and then sends the message on to the next processor. So it will enqueue it for something else to deal with, depending on the type of the message that we see.
The same thing happens in reverse for sending. So there are a number of TCP send services, which are basically responsible for handling other parts of the application
saying we need to send this thing over TCP, and it will deal with getting it down the right socket based on the client's ID. So, this model tends to work quite well if you're trying to deal with long running clients
and you're trying to deal with lots of them. It's probably not so great if you're trying to deal with lots of transient clients, so it's probably not for example the best way of writing an HTTP server that's supposed to deal with lots of clients. It's not bad if you're trying to build a database client, it works well for us.
With that, and kind of out of demos, does anybody have any questions about all this kind of stuff? Or, if not, then... Oh, sorry, go ahead.
Okay, so the question is, if you compare it to WCF's net TCP binding, then what might be the advantages of doing this? So the first advantage is that you're not tied to the SOAP protocol, which WCF is. So, it may be that... Huh? Okay.
So, my understanding of the .net TCP binding was that it would basically be dealing with your... The only advantage you're going to get from it is for it to marshall your messages for you, and deal with the SOAP headers, right?
Maybe I'm misunderstanding that. Okay, so if it's doing a thread per request, then what you may end up seeing is actually a lower latency per request, but the chances are you'll be able to deal with a lot fewer concurrent clients well, and you'll degrade worse under high load, because there will be a whole lot of context switching
between client threads. So I'm not entirely familiar with the underlying implementation of the WCF bindings. Last time I looked at it, it didn't appear to be that much use for just implementing general protocols. That may not be the case anymore. Oh, sure, if you have control over the client and the server,
and you want them to be communicating like that, then it may be a perfectly good choice for you. You probably have to measure it in your specific use case. I'm not familiar enough with the internals of that to be able to tell you. Are there any others? Great. Oh, yeah.
So actually one of the other things that I should go through and point out, there is a type in the event store called a buffer pool stream, which will use those pre-allocated chunks
in an implementation of stream. So if you're doing, for example, lots of XML manipulation, so you're deserializing a large XML document, it may be useful to do that to avoid the same kind of memory issues that you see with that.
Potentially. So the array segments themselves aren't, because they're structs, and I think the concurrent stack is going to use a pre-allocated, array-backed thing, right? So if you know the size roughly
that you're after up front, then you could pass that to the stack constructor, and it should allocate enough space to not be churning too much, I guess. Cool. With that, time for a beer.