
Transmission Control Protocol


Formal Metadata

Title
Transmission Control Protocol
Subtitle
TCP/IP basics
Title of Series
Number of Parts
165
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
TCP/IP is the most widely used protocol on the Internet for transmitting data. But how does it work in detail? This talk will explain the TCP protocol, from handshake over established to teardown in detail - and elaborate a bit on protocol adjustments over time and congestion control.
Transcript: English(auto-generated)
The next talk is by Hannes Mehnert; you can see him here already.
It's called Transmission Control Protocol, also known as TCP. Hannes Mehnert works at a non-profit organization in Berlin called the Center for the Cultivation of Technology. And he also works on an open... No, what is it?
It's called MirageOS. If you don't know it, maybe you can find out what it is. He does research in several engineering areas such as programming languages, network protocols, security protocols and many, many more. So give him a warm round of applause for his talk.
Thank you. Yes. Today I want to talk a bit about the Transmission Control Protocol and the Internet Protocol Suite. So what is it all about?
It's a foundation talk. So if you already know TCP/IP by heart, then maybe only the last five minutes will be of interest to you. Otherwise: say you want to browse to a website somewhere from your laptop, and you want to read that website.
The client on your laptop, the web browser, sends an HTTP request to the web server host, as specified by the HTTP protocol. GET / is a common way of requesting the main page of a website. But how is this information actually transmitted to the server? That is the question and the motivation for this talk, and I want to go deep into the answer. So let's look a bit at the network topology.
On the left-hand side, we have the laptop, which sends a GET request to some server. You can see that by the dashed arrow. The laptop itself is likely connected to the Internet via a wireless network. But what actually is the Internet? Well, the Internet is a collection of computers, and your laptop or anyone's mobile phone is likely connected to a router. A router is just a normal computer which has some knowledge about the network. That router is likely connected via fiber, a satellite, or any other link (it can also be an add-on cable) to another router or to several routers.
In this picture, you can only see two routers, router A and router B, but there may be nearly any number of routers in between you and the server. Here router B is connected to the server via Ethernet, which is just a physical cable; Ethernet is the protocol spoken over that cable.
I won't go into the physical network connectivity, like fiber, satellite, Ethernet and copper cables, in this talk at all. Instead I will start with the layer on top of the physical medium. The first one is the data link layer.
So what is the data link layer? Its scope is a single network: it only spans the local network to which a host is connected. In this picture, the laptop and router A share one data link layer.
Router A and router B share another one, and router B and the server share a third. What is the task of the data link layer? Well, it's pretty easy. It just moves internet layer packets between two different hosts on the same link.
Its only purpose is to provide an abstraction over the physical medium and over how many bytes you can transport across the link. The next layer is the internet layer, whose task is to transport packets across multiple networks. As you have seen in the diagram, router A and router B are both connected to several data links, and they use the internet layer in order to transport packets across them.
The internet layer solves the issue of addressing by giving every host an IP address, that is, an Internet Protocol address. It also solves another task, which is routing.
It forwards packets to the next router, which is hopefully closer to the final destination. The internet layer also has support for fragmentation: if a higher layer sends something which is too big for the data link layer, the internet layer can fragment it, and the other side has to reassemble it. On top of the internet layer is the transport layer. The transport layer establishes host-to-host connectivity. It does multiplexing, usually using source and destination ports.
There are two widely used transport layer protocols, which I will go into in more detail in this talk: the User Datagram Protocol and the Transmission Control Protocol, UDP and TCP. They have different properties.
UDP is unreliable and unordered. It is only an abstraction over datagrams. On the advantage side, it has very low overhead. TCP, in contrast, provides a reliable and ordered byte stream.
So you have a reliable byte stream which you can work on. The downside of TCP is that its connection establishment and teardown are slightly more complex. In UDP you don't have to establish or tear down a connection at all.
But in TCP you have to synchronize the two hosts. Then on top of the transport layer, we have the application layer. The application layer just exchanges application data over the transport layer. Some examples of application layer protocols are HTTP, TLS and DNS.
In the first example we saw HTTP being used to send the GET request. That is all application layer, which I won't focus on in this talk at all. For the lower layers, the application layer is just payload.
It's just some arbitrary data. If we look again at that picture and draw the layers which are used by the different devices, we end up with a diagram similar to this one. On the left we have the laptop again, which has all four layers.
Then we have the routers in the middle, which only use the data link and internet layers. And on the right-hand side we have the server, which also has all four layers. So the transport layer is really host to host.
TCP, as we saw earlier, establishes a connection from the laptop to the server. On top of TCP, so on top of the transport layer, there is the process-to-process communication: the application layer, which is the web browser talking to the web server.
Only on the highest layer do we have the GET request. The routers in the middle don't have to inspect or use information from the transport or application layer of the laptop or the server.
Using just the internet layer, the routers forward packets to the next router or to the final destination. So the laptop first sends the whole TCP segment to router A, and router A decides: I will forward it to router B, because router B is closer to the final destination than I am. And router B says: I'm connected via Ethernet to the final destination, so I will just forward it to the server.
That's what the data flow of such a connection looks like. What does the packet actually look like? At the application layer we have the application data, shown here in blue, which is just the GET request. The transport layer prefixes the application data with a header that encodes some data. We will look into the TCP header in more detail soon. Then the internet layer also adds a header as a prefix, the IP header, which is just put in front of the TCP header.
Then comes the data link layer, which is the lowest layer we actually care about. That one will likely prepend a header and append a footer in order to make sure that the physical wire only sees a single packet at a time.
As you can see from the layering in those two pictures, when you go down from the application to the transport, to the internet, to the data link layer, every layer basically adds some header information.
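The nesting described here can be pictured with a few lines of Python. This is only a sketch: the bracketed header strings are placeholders invented for illustration, not real wire formats, and the function name is hypothetical.

```python
# Sketch of layered encapsulation. The header contents are placeholders,
# not real wire formats.
def encapsulate(app_data: bytes) -> bytes:
    tcp_segment = b"[TCP hdr]" + app_data              # transport layer prefixes its header
    ip_packet = b"[IP hdr]" + tcp_segment              # internet layer treats the segment as payload
    frame = b"[link hdr]" + ip_packet + b"[link ftr]"  # data link layer adds header and footer
    return frame


frame = encapsulate(b"GET / HTTP/1.1\r\n\r\n")
print(frame)
```

Each layer only adds bytes around what it was handed and never looks inside it, which is why the internet layer can carry UDP just as well as TCP.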
The internet layer, for example, takes the TCP header (the transport layer) and the application layer as payload. It doesn't care that it is TCP; it could as well be UDP in this case. I will not go into the data link layer details at all, but here is the header of an IP version 4 packet. It is at least 20 bytes long and consists of various fields.
The first one is a 4-bit version, which is usually version 4 in our current world. Then there is a 4-bit header length, counted in words, so in multiples of 32 bits. Then there are some fields which are not really used, or which I won't deal with in this talk.
Next is the total length field, which is 16 bits and describes how long the entire IP packet is. Then there is an identification field, which is a 16-bit unique number, and 16 bits for fragmentation flags and offset.
And that is crucial: if the IP layer decides that the data handed to it is too big for this data link and needs to be fragmented, it reuses the very same identification number in every fragment and uses the 16 bits of fragmentation flags and offset in order to split that data into multiple IP fragments.
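A minimal sketch of that fragmentation step follows. It relies on one detail the talk does not spell out: in IPv4 the fragment offset is counted in units of 8 bytes, so every fragment except the last must carry a multiple of 8 bytes. The class and function names are hypothetical, and real stacks handle many more cases (overlaps, timeouts, the "don't fragment" flag).

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    identification: int   # same value in every fragment of one packet
    more_fragments: bool  # set on all fragments except the last
    offset_units: int     # fragment offset, counted in units of 8 bytes
    payload: bytes


def fragment(payload: bytes, identification: int, max_payload: int) -> list[Fragment]:
    # Round the chunk size down to a multiple of 8, because the
    # offset field counts 8-byte units.
    chunk = (max_payload // 8) * 8
    if chunk == 0:
        raise ValueError("max_payload must be at least 8 bytes")
    fragments = []
    for start in range(0, len(payload), chunk):
        piece = payload[start:start + chunk]
        last = start + chunk >= len(payload)
        fragments.append(Fragment(identification, not last, start // 8, piece))
    return fragments


def reassemble(fragments: list[Fragment]) -> bytes:
    # The receiver sorts by offset, since fragments may arrive in any order.
    ordered = sorted(fragments, key=lambda f: f.offset_units)
    return b"".join(f.payload for f in ordered)


data = bytes(range(256)) * 10   # 2560 bytes of example data
frags = fragment(data, identification=0x1234, max_payload=1480)
for f in frags:
    print(f.identification, f.more_fragments, f.offset_units, len(f.payload))
```

The value 1480 is an assumption chosen for the example: a 1500-byte link payload minus a 20-byte IP header.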
Then there is an 8-bit field, so one entire byte: the time to live. It's actually not a timestamp, but only a count of how many routers this packet may pass, and every router decreases the time to live by one. Then there is a 1-byte protocol field, which specifies the type of the payload carried by this IP version 4 packet. Then there is a 16-bit header checksum, a ones' complement sum over the header, to detect bits that got flipped in transit. Then we can see the source IP address and the destination IP address: the source IP address is the IP address of my laptop and the destination IP address is the IP address of the server. After those 20 bytes, you have either IP options, if the header length was more than 20 bytes, or directly the payload. For the protocol field, various values are predefined. One is ICMP, the Internet Control Message Protocol, which I will talk a bit about; for it the protocol field is set to 1.
For TCP it's set to 6, and for UDP it's 17. There are other protocols which can be carried in an IP packet, but I won't go into the details here. As you can see, there is room for up to 256 numbers in the protocol field.
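To make the field layout concrete, here is a hedged sketch that builds and parses a 20-byte IPv4 header without options using Python's struct module. The function names and the example addresses (taken from the documentation ranges) are assumptions for illustration; the checksum routine is the usual ones' complement sum of 16-bit words.

```python
import socket
import struct

IPV4_FORMAT = "!BBHHHBBH4s4s"   # 20 bytes, network byte order


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, protocol: int,
                      payload_len: int, ttl: int = 64, ident: int = 0) -> bytes:
    version_ihl = (4 << 4) | 5               # version 4, header length 5 words
    header = struct.pack(IPV4_FORMAT, version_ihl, 0, 20 + payload_len,
                         ident, 0, ttl, protocol, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    checksum = internet_checksum(header)     # computed with the field set to 0
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def parse_ipv4_header(header: bytes) -> dict:
    (version_ihl, _tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(IPV4_FORMAT, header[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,   # words to bytes
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,                        # 1 ICMP, 6 TCP, 17 UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


hdr = build_ipv4_header("192.0.2.1", "198.51.100.7", protocol=6, payload_len=100)
fields = parse_ipv4_header(hdr)
print(fields)
print("checksum valid:", internet_checksum(hdr) == 0)
```

A receiver can verify the header by summing all 20 bytes including the checksum field; a result of zero from `internet_checksum` means no error was detected.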
Because the field is 8 bits long, it can hold 256 different values. ICMP is a protocol I haven't talked about so far; it is the Internet Control Message Protocol. It sits on top of IP, and its purpose is on the one hand to deliver error messages, such as destination host unreachable or time to live exceeded, and on the other hand to carry operational information like diagnostics. There's one program which you may know, which is called ping.
The purpose of ping is to send an ICMP echo request to a remote host. The remote host is then supposed to send the very same packet back to you with only a single bit flipped, and that is an ICMP echo reply.
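As a rough illustration, the sketch below builds an ICMP echo request in memory and derives the matching reply. It does not send anything, since raw sockets usually need elevated privileges. The type values 8 (echo request) and 0 (echo reply) differ in a single bit, which is presumably the "single bit" meant here; note that the checksum field has to be recomputed as well. Function names are hypothetical.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(icmp_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    # Header: type, code, checksum, identifier, sequence number (8 bytes).
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_type, 0, checksum, ident, seq) + payload


def echo_reply_for(request: bytes) -> bytes:
    icmp_type, _code, _checksum, ident, seq = struct.unpack("!BBHHH", request[:8])
    if icmp_type != ICMP_ECHO_REQUEST:
        raise ValueError("not an echo request")
    # Same identifier, sequence number and payload; only the type changes
    # (and with it the checksum).
    return build_echo(ICMP_ECHO_REPLY, ident, seq, request[8:])


request = build_echo(ICMP_ECHO_REQUEST, ident=0x1234, seq=1, payload=b"ping-data")
reply = echo_reply_for(request)
print(request.hex())
print(reply.hex())
```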
If you can successfully ping another host, you have verified that the other host is online and has at least IP connectivity. Okay, let's look into the next layer, which is the transport layer. First we will look at the UDP header.
A UDP header is only 8 bytes. It consists of a source port, a destination port, the length of the entire UDP datagram, and a checksum. The checksum is, again, a 16-bit field. It's computed over the header, the entire payload, and an IP pseudo-header.
So the checksum also covers the source and destination IP addresses. UDP, as I mentioned, is unreliable and unordered, and its advantage is that it provides low-overhead datagrams.
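The pseudo-header idea can be sketched as follows for UDP over IPv4. The helper names and example addresses are assumptions; the pseudo-header consists of source address, destination address, a zero byte, the protocol number 17 and the UDP length, and it is only used for the checksum computation, never transmitted.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # Not sent on the wire; it only ties the checksum to the IP addresses.
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_datagram(src_ip: str, dst_ip: str,
                 src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)                 # header plus payload
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length)
                                 + header + payload)
    if checksum == 0:
        checksum = 0xFFFF                     # 0 means "no checksum" in UDP over IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


dgram = udp_datagram("192.0.2.1", "198.51.100.7", 40000, 53, b"hello")
print(dgram.hex())
```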
As you can see, it adds 8 bytes to the payload, whereas IP already added 20 bytes. Here's a simple Unix program which is a UDP client. This program does not compile because I left out some bits, but it shows how you actually use the IP stack. The TCP/IP stack is usually embedded in the kernel, and as a programmer you use the API provided by the Unix sockets API, which usually consists of the very same five or seven functions. The first one is socket. It creates a file descriptor, and you specify the address family and the socket type. Here the address family is internet (AF_INET), and the socket is a datagram socket.
It's called SOCK_DGRAM in Unix. Once that is created, a UDP client just uses the function sendto, which takes a socket file descriptor, some data and a destination address, and sends the data to the other side. Since UDP is unreliable, it is just fire and forget. Afterwards, we close the socket file descriptor, because we try to be nice. On the other side, if you want to implement a UDP server or a UDP listener, you again create a socket. Then there is the function called bind, which binds the socket to a specific IP address and port on your server or network stack. Then you call recvfrom.
recvfrom takes the socket file descriptor, a buffer and some maximum size, and it only returns once you have actually received a UDP datagram on that IP address and port.
Then we print out that we received a packet and close the socket file descriptor. So that's UDP. UDP is used for a variety of protocols, and it's crucial to have it. TCP, on the other hand, is a bit bigger.
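The C programs shown on the slides are not part of this transcript. As a stand-in, here is a minimal Python sketch of the same calls (socket, bind, sendto, recvfrom, close), with both sides running in one process over the loopback interface. The address, the timeout and the buffer size are assumptions chosen for the example.

```python
import socket

# Listener side: socket, bind, then recvfrom.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))        # port 0: let the kernel pick a free port
server.settimeout(2.0)               # do not block forever if nothing arrives
server_addr = server.getsockname()

# Client side: socket, sendto, close. No connection setup is needed.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello over UDP", server_addr)
client.close()

# recvfrom returns the datagram together with the sender's address.
data, peer = server.recvfrom(1500)
print(f"received {data!r} from {peer}")
server.close()
```

Because the listener is bound before the client sends, the kernel queues the datagram until `recvfrom` picks it up; across a real network nothing guarantees that it arrives.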
Instead of an 8-byte header, TCP adds at least 20 bytes of header. What does the TCP header contain? Well, similar to UDP, it contains the source port and destination port. Both are, again, 16 bits. Then it contains two sequence numbers. One is the sequence number itself.
It's a 32-bit number. The other is the acknowledgement number, which is the next sequence number we expect from the other side. Then TCP contains a data offset, which is similar to the IP header length field, because a TCP segment may also contain options.
The header may contain options before the payload. That's why we need a data offset field, in order to find out where the actual payload starts. Then TCP has certain flags, which are single-bit values.
Some of them are mentioned down here, and I will go into more detail on them later: acknowledgement or ACK, synchronize or SYN, and finish or FIN. There's also reset and some urgent stuff, which I will not go into. Then we have a 16-bit field, the window size, which is the space available in the receive buffer. Then we have, again, a 16-bit checksum field. And then we have some space for the urgent stuff, which I will not go into either. A TCP client, if you program it in the Unix way, uses a very similar API to the one we have seen for UDP.
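Before the client code, the TCP header layout just described can be sketched in the same struct-based style. The function names are hypothetical, the checksum is left at zero for brevity (it would be computed with a pseudo-header, as for UDP), and options are not handled.

```python
import struct

TCP_FORMAT = "!HHIIHHHH"   # 20 bytes without options
FLAG_NAMES = {0x01: "FIN", 0x02: "SYN", 0x04: "RST",
              0x08: "PSH", 0x10: "ACK", 0x20: "URG"}


def build_tcp_header(src_port: int, dst_port: int, seq: int, ack: int,
                     flags: int, window: int) -> bytes:
    # Upper 4 bits: data offset in 32-bit words (5 words = 20 bytes).
    offset_and_flags = (5 << 12) | flags
    # Checksum and urgent pointer are left at 0 in this sketch.
    return struct.pack(TCP_FORMAT, src_port, dst_port, seq, ack,
                       offset_and_flags, window, 0, 0)


def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack, offset_and_flags,
     window, checksum, urgent) = struct.unpack(TCP_FORMAT, segment[:20])
    data_offset = (offset_and_flags >> 12) * 4       # words to bytes
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "data_offset": data_offset,                  # where the payload starts
        "flags": [name for bit, name in FLAG_NAMES.items()
                  if offset_and_flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "payload": segment[data_offset:],
    }


syn_ack = build_tcp_header(80, 49152, seq=2000, ack=1001,
                           flags=0x02 | 0x10, window=65535)
parsed = parse_tcp_header(syn_ack + b"")
print(parsed)
```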
We first create a file descriptor using the socket system call, to which we again pass the address family AF_INET, and this time SOCK_STREAM, which, since TCP is stream-oriented, is the name of the TCP socket type. Then, as a TCP client, we call connect with the socket file descriptor and a remote host. connect only returns once the TCP session has been established. Then we call recv here, so we receive data on the socket file descriptor into a specific buffer. Then we print it, and then we close the socket file descriptor again.
The TCP listener is very similar. First we create a socket, then we bind it; bind specifies the IP address and also the port number. Then we call the function listen on the socket file descriptor.
Then we enter a loop and wait for client connections, which appear at some point. For every client connection we call accept, and accept returns whenever a client has successfully established a TCP connection.
What accept returns is a new file descriptor, not the same as the listening socket file descriptor, on which we call accept again at a later point. Usually you handle any work for the client connection on this new file descriptor in a separate process, thread or task, in order to enable the server to accept another connection while you are handling the first client. Then we just do some printf output and send a hello world to the client connection, so to this new file descriptor. Then we close it, go back to the start of the while (1) loop, and accept a new client socket. That is a TCP listener as you will see it in any network program.
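Again as a stand-in for the C slides, here is a minimal Python sketch of the listener and the client over the loopback interface. To keep the example finite, the listener accepts exactly one connection in a helper thread instead of looping forever; the function name `serve_once` and the port choice are assumptions.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    # accept returns a NEW socket for this client; the listening socket
    # stays available for further accept calls.
    conn, _peer = listener.accept()
    with conn:
        conn.sendall(b"hello world\n")


# Listener: socket, bind, listen, accept.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
listener.listen()
addr = listener.getsockname()
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client: socket, connect, recv, close.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(addr)                 # returns once the handshake is done
received = b""
while chunk := client.recv(1024):    # empty bytes means the peer closed
    received += chunk
client.close()

worker.join()
listener.close()
print(received)
```

The loop around `recv` matters because TCP is a byte stream: a single call may return only part of what the other side sent.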
Now TCP, as I mentioned, has to do some work in order to establish a session and to tear it down. The main work which needs to be done is to synchronize the initial sequence numbers: we have seen the sequence number in the header, and somehow we need to transport that information to the other side. Here is the TCP state machine, which was initially part of the RFC that specifies TCP and has also been reproduced in books on the design and implementation of TCP/IP such as Stevens' TCP/IP Illustrated. You can see it has one specific state called LISTEN: as we've seen in the server implementation, if you call listen, you are in the LISTEN state. And you always end up in the CLOSED state after you've called close, basically. I will now go into more detail on connection establishment and teardown.
For connection establishment on the client side, we start with a socket in the CLOSED state. Then we make the Unix call connect on that socket, and connect sends an initial TCP segment to the server which has the synchronize (SYN) flag set to one, and whose sequence number is some random 32-bit integer. I just call it A here.
The state of the file descriptor goes from CLOSED to SYN-SENT: we have just sent out the SYN segment, a TCP segment which doesn't carry any data but only the TCP header. On the server side we had prepared previously: we started in the CLOSED state, then we called listen and ended up in the LISTEN state. In the LISTEN state we call accept, and accept blocks until the SYN is received. Once the SYN is received, a new file descriptor is spawned, and that one ends up in the SYN-RECEIVED state. The server sends out a TCP segment, again without any data, but with the SYN and ACK flags set. Its sequence number is set to some B and the acknowledgement number is set to A plus one. So the acknowledgement number confirms that the SYN with sequence number A was received, and that A plus one is expected next. Upon receiving that SYN-ACK, the client is in the ESTABLISHED state and sends out an acknowledgement segment, so that the server knows that its segment has been received. That one is sent with sequence number A plus one, because A was already used and the SYN flag consumes one number in the sequence number range.
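The sequence number bookkeeping of the three-way handshake can be traced with a small simulation. This is a sketch only: no packets are sent, the class and function names are hypothetical, and the final ACK acknowledges B plus one by the same rule as the SYN-ACK acknowledges A plus one. Arithmetic wraps around at 2^32 because the fields are 32 bits wide.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

MOD = 2 ** 32   # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    seq: int
    ack: Optional[int]
    flags: frozenset


def three_way_handshake(a: Optional[int] = None, b: Optional[int] = None):
    # Both sides pick a random initial sequence number unless one is given.
    a = secrets.randbits(32) if a is None else a
    b = secrets.randbits(32) if b is None else b
    client_state, server_state = "CLOSED", "LISTEN"

    # 1. Client calls connect: SYN with sequence number A.
    syn = Segment(seq=a, ack=None, flags=frozenset({"SYN"}))
    client_state = "SYN-SENT"

    # 2. Server receives the SYN: replies SYN+ACK with seq B, ack A+1.
    server_state = "SYN-RECEIVED"
    syn_ack = Segment(seq=b, ack=(syn.seq + 1) % MOD,
                      flags=frozenset({"SYN", "ACK"}))

    # 3. Client receives the SYN+ACK: replies ACK with seq A+1, ack B+1.
    client_state = "ESTABLISHED"
    ack = Segment(seq=(a + 1) % MOD, ack=(syn_ack.seq + 1) % MOD,
                  flags=frozenset({"ACK"}))

    # 4. Server receives the ACK.
    server_state = "ESTABLISHED"
    return [syn, syn_ack, ack], client_state, server_state


segments, client_state, server_state = three_way_handshake(a=100, b=0xFFFFFFFF)
for segment in segments:
    print(sorted(segment.flags), segment.seq, segment.ack)
```

With B chosen as 0xFFFFFFFF in the example, the final acknowledgement number wraps to 0, which shows why the arithmetic has to be done modulo 2^32.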
The acknowledgement number is set to B plus one, so the server's sequence number plus one. Once that is received, the server ends up in the ESTABLISHED state. Sequence numbers: it's a good idea for both hosts to pick a random initial sequence number for each connection, otherwise we can get into some nasty attacks. The acknowledgement number is the next sequence number expected from the other host. Sequence numbers always increase by one for each byte of data and for the SYN and FIN flags, even though those are only single bits. Each sequence number must be acknowledged, and each sent segment is retransmitted if it has not been acknowledged after a certain timeout. After trying several times, at some point the TCP stack gives up. Teardown: since I'm a bit short on time, I will skip that. TCP provides us with flow control. What does that mean? Well, every network stack, so the kernel, has a receive buffer for each TCP connection, and that buffer is limited in size to avoid kernel memory exhaustion. That means that whenever the application, so the web server or the web browser, reads data, some buffer space is reclaimed, and when TCP segments arrive, some of that buffer is consumed. It's a sliding window. We've seen in the TCP header that there's a 16-bit field called window size, which specifies how many more bytes my TCP stack can receive from the other side. To avoid deadlocks there's also a timer called the persist timer, which is started when the window size is zero. When it times out, a TCP segment is sent in order to get information about the new window size from the other side. Congestion control I will also skip a bit, but the main idea is to control the rate of data entering the network, because if you're going through multiple routers, at some point you may saturate some of the network links. That is avoided in TCP by applying congestion control, which measures, for example, the time between a segment being sent and its acknowledgement being received. It also has to do with slow start and how your window grows. Acknowledgements: well, there are some strategies.
The basic one is that every segment is acknowledged individually. There's delayed ACK, where you collect multiple segments and acknowledge them together at a certain time. Then you also have selective acknowledgements, where you can acknowledge discontinuous segments, which helps to lower the number of retransmissions. TCP also carries a maximum segment size in order to avoid fragmentation on the IP layer. There's also some struggle because you have simultaneous open: what if both parties want to open a connection at the very same time? Then you have a flag called reset in order to terminate a connection. There are some extensions like window scaling and fast open to improve the throughput and also to lower the delay.
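Returning to flow control: the receive buffer and the advertised window can be modelled in a few lines. This is a deliberately simplified sketch with hypothetical names; a real stack tracks sequence numbers, and a well-behaved sender would not send more than the advertised window in the first place. The cap at 0xFFFF reflects the 16-bit window field without window scaling.

```python
class ReceiveWindow:
    """Toy model of a per-connection receive buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffered = bytearray()

    @property
    def window(self) -> int:
        # Free space, capped by the 16-bit window field (no window scaling).
        return min(self.capacity - len(self.buffered), 0xFFFF)

    def on_segment(self, payload: bytes) -> int:
        # Arriving data consumes buffer space; anything beyond the free
        # space is dropped here. Returns the window to advertise next.
        free = self.capacity - len(self.buffered)
        self.buffered += payload[:free]
        return self.window

    def app_read(self, n: int) -> bytes:
        # The application reading data reclaims buffer space.
        data = bytes(self.buffered[:n])
        del self.buffered[:n]
        return data


rw = ReceiveWindow(capacity=4096)
print(rw.on_segment(b"x" * 3000))   # window shrinks as data arrives
print(rw.on_segment(b"y" * 2000))   # buffer is full, window is zero
rw.app_read(1000)
print(rw.window)                    # reading opens the window again
```

The zero-window moment in the middle is the situation the persist timer exists for: the sender has to probe later to learn that the window has opened again.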
There are some attacks, like denial of service: if your server implementation accepts a connection attempt and allocates a lot of memory for a client which doesn't do much beyond sending a SYN frame, that is bad and leads to denial of service. Connection hijacking: if you can predict the sequence numbers,
then you can hijack an established connection and inject data into it. There have been some blind in-window attacks. What does that mean? That even without knowing the sequence number you can do something to an established TCP connection, such as sending a reset
or sending a SYN frame and tearing that connection down. The specification for TCP is written in English prose in a collection of RFCs, and there are some widely deployed implementations. During some research work in Cambridge over the last years,
me and various colleagues developed a formal model in the interactive theorem prover HOL4, which is a precise specification with implementation looseness. As input we really use the sockets API, an interface for getting
the TCP control block, which is the host-internal state of TCP, and then the wire interface, which is the data received and sent on that. We validated that formal model itself, and we used actual implementations to do that.
We use it to draw some diagrams where you can see on the left-hand side the rules which fire when something happens: for example, connect was called, and then the logical rule connect_1 was used in the labelled transition system.
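The idea of rules firing in a labelled transition system, and of a specification that is loose about which rule an implementation takes, can be sketched as follows. This is a minimal illustration, not the HOL4 model: only the rule name `connect_1` comes from the talk, while the other rule names, the state names, the labels and the function `check_trace` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

State = str
Label = str
# A rule maps a state to the set of states it may lead to.
# An empty set means the rule is not enabled in that state.
Rule = Callable[[State], Set[State]]

# Hypothetical rule table: each label may be explained by several rules.
RULES: Dict[Label, List[Tuple[str, Rule]]] = {
    "connect()": [
        # Success case: the connection attempt starts.
        ("connect_1", lambda s: {"SYN_SENT"} if s == "CLOSED" else set()),
        # Looseness: the same call may also fail and leave the state unchanged.
        ("hypothetical_connect_fail", lambda s: {"CLOSED"} if s == "CLOSED" else set()),
    ],
    "recv SYN,ACK": [
        ("hypothetical_synack_in", lambda s: {"ESTABLISHED"} if s == "SYN_SENT" else set()),
    ],
}


def check_trace(start: State, trace: List[Label]) -> Optional[List[List[str]]]:
    """Return, per step, the rules that could have fired, or None if the
    trace is not admitted by the rule table."""
    possible: Set[State] = {start}
    fired: List[List[str]] = []
    for label in trace:
        next_states: Set[State] = set()
        names: Set[str] = set()
        for name, rule in RULES.get(label, []):
            for state in possible:
                out = rule(state)
                if out:
                    names.add(name)
                    next_states |= out
        if not next_states:
            return None  # no rule explains this observation
        possible = next_states
        fired.append(sorted(names))
    return fired


result = check_trace("CLOSED", ["connect()", "recv SYN,ACK"])
```

Because the checker tracks a set of possible states rather than a single one, it accepts any recorded trace that some combination of enabled rules can explain, which is one way to read "implementation looseness".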
Then we see here as well some TCP segments which are going out and in. What are the contributions of the network semantics work? We checked the model. We validated the model by recording traces and executing them. We published a paper
called "Engineering with Logic: Rigorous Test-Oracle Specification and Validation for TCP/IP and the Sockets API". The specification itself is typeset in 384 pages; that's all the transitions you basically need. It's roughly 10,000 lines of HOL4 code and a lot of comments, where we embedded a lot of LaTeX code.
And a Unix TCP/IP stack usually has around 15,000 lines of code. The TCP state machine we saw earlier is here in this diagram, and we tried to draw a more correct TCP state machine,
which led us to this picture, which is a bit more complicated. We have this state non-existing up here, and we have many more transitions due to timers and so on. So the state machine used in common literature is actually not complete or not precise, and we have a revision for that.
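To make the difference between the two diagrams concrete, here is a rough sketch that stores a state machine as a transition table. The entries in `EXTRA` are illustrative assumptions about what a non-existing state and timer-driven transitions could look like; they are not copied from the revised diagram or from the specification, and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Transitions = Dict[Tuple[str, str], str]

# Small excerpt of the commonly drawn diagram: (state, event) -> next state.
TEXTBOOK: Transitions = {
    ("CLOSED", "active open"): "SYN_SENT",
    ("SYN_SENT", "recv SYN,ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}

# Illustrative additions (assumptions): a state for a socket that does not
# exist yet, and transitions caused by timers instead of segments or calls.
EXTRA: Transitions = {
    ("NONEXISTING", "socket()"): "CLOSED",
    ("CLOSED", "close"): "NONEXISTING",
    ("SYN_SENT", "connection timeout"): "CLOSED",
    ("ESTABLISHED", "retransmit limit reached"): "CLOSED",
}

REVISED: Transitions = {**TEXTBOOK, **EXTRA}


def step(table: Transitions, state: str, event: str) -> Optional[str]:
    """Look up the next state, or None if the table has no such transition."""
    return table.get((state, event))


# The excerpt of the common diagram has no answer for a timeout in SYN_SENT.
missing = step(TEXTBOOK, "SYN_SENT", "connection timeout")
covered = step(REVISED, "SYN_SENT", "connection timeout")
```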
In conclusion, TCP/IP is widely deployed. I hope I managed to give you some insight into how TCP/IP actually works. It's a layered architecture which is agnostic of the underlying layers, and in the network semantics
work we had an executable specification. That's all I have to say, and I welcome you to ask any questions, either now or offline.
Thank you. So if you have any questions, just go to the microphones. We have two here and two on the right side. And do we have some questions from the internet? No. Questions? No. No questions?
Yeah, one question. Come on. Don't be shy. Hi. Thanks. That was a very interesting talk. So your model, does it allow synthesizing an implementation from the specification, or is it used mostly for validating? It's at the moment
used for validation, because we have the specification looseness, so we have implementation looseness: at some point in the implementation you have to choose whether you take one transition or the other one, whether you go into a failure state, or into a success state, or transmit some piece of data and go into a success state.
So we haven't synthesized any implementation, but there's ongoing work to use it as a base for an implementation. Ok, and do you think that if such an implementation can be made, can it be made efficient as well when synthesized?
Ok, thanks. Yeah, your question please. Thank you. How independent is TCP from IP? I mean, can you run TCP over different protocols like Bluetooth or something like that?
Since TCP requires a bit of ICMP for error messages, I haven't seen any TCP implementation on top of any medium other than IP, so I don't know, but I can imagine it.
Could work. Ok, your question please. Thank you, hello. So you used HOL4 for the specification part. Did you actually need to use the higher-order logic part of HOL, or would it be possible to just use predicate logic?
I would have to reread. I think we actually need some higher-order logic for it, for the whole state and the transitions. It would be interesting to meet and tell.
Yes, well, the paper has been published in the Journal of the ACM, and luckily skyhub.is is available and you can download it for free from there. Ok, thanks. Any more questions? No? Then, thank you Hannes.
A warm applause for Hannes please.