
Headscale: How we are using integration testing to reimplement Tailscale


Formal Metadata

Title
Headscale: How we are using integration testing to reimplement Tailscale
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
We use Go, containers, and integration testing to reimplement Tailscale's (https://tailscale.com/) control server, the closed-source SaaS product that coordinates their open source WireGuard-based client. This talk tells the story of how, through an extensive focus on integration tests, we managed to build a stable version of the control server that now has almost 10k stars on GitHub and users all over the world.

Tailscale is a modern mesh VPN built on top of WireGuard (https://wireguard.com/). It works like an overlay network between the computers of your networks, using NAT traversal. Tailscale's core client, Android client, and supporting libraries and software are open source. The SaaS control server and the client UIs for proprietary OSes (macOS, iOS, and Windows) are closed source. Headscale implements this control server and allows self-hosters and open source aficionados to use Tailscale's clients without having to use a closed-source product.

In this talk we tell the story of how we started reimplementing the control server based on the code from Tailscale's official client. We went through stages of "man, this works surprisingly well" to "how does this even work" after we started to use aggressive integration testing to improve the "correctness" of our implementation. For every test we added, we found tons of issues to fix, and eventually we got to a stage where we could improve Headscale with great confidence. Today, integration testing is potentially the most vital part of our development cycle, and we use it both to fix the "old and broken" and to add the "new and shiny". We have managed to take what started out as a learning project to understand how Tailscale worked and build it into a healthy open source project and community.
Transcript: English (auto-generated)
Next up, we have two speakers for the price of one. They are gonna talk about everything that has to do with a more open, open source version of Tailscale, so let's give an applause to Christopher and Juan.
Hello, hello, okay, this is cool. Hello, my name is Christopher, and I'm gonna, together with Juan there, talk a bit about how we use integration testing to kind of re-implement the control panel or the control server of Tailscale.
So first, a little bit about us. Juan Font Alonso is the creator of Headscale. He works for the European Space Agency on cloud and DevOps and infrastructure. He claims to have been my first manager, but I think that's incorrect, and he has the attention span of a goldfish. Which makes the whole collaboration very fun.
And I'm here with Christopher. He's the top contributor of Headscale and one of the other maintainers alongside me. He's part of the technical staff at Tailscale, and part of his time, paid by Tailscale, is to work on improving Headscale. I was his manager, at least from a hierarchical point of view,
and one of the challenges we have is that he always finds these kinds of super niche languages like OCaml or things like that, where he tries to re-implement Headscale. But first of all, how many people here know Tailscale? And Headscale? Good.
That's pretty good. So for the people who don't know, we'll do like a quick recap: what is Tailscale? Tailscale tries to solve this problem where you basically sit and you want to connect your organization or home or something like this, and you have an old-school or legacy VPN concentrator where you connect into your kind of perimeter, you have access to absolutely everything, and there's a single point of failure and a massive bottleneck. It tries to do this by creating a mesh VPN that uses direct connections, WireGuard, and kind of facilitates this for you using techniques like NAT traversal, and it has a very, very powerful client that will make sure that you always reach what you're trying to get to, and it offers a lot of different kinds of granular access, and you get a lot more power
compared to your old-school bottleneck, single-point-of-failure VPN. And in Tailscale, the clients are open source, at least for the open platforms, and what they have is a closed SaaS. But still, they are quite open when it comes to explaining how the whole thing works. And in March 2020, they published a blog post basically explaining how the whole thing worked, how they use these NAT traversal techniques so you don't have to open the ports in your router. And there was a phrase in this blog post that caught my attention for a little bit, which was basically talking about a coordination server, that the clients talk to a coordination server,
the core of this service offering, which is essentially a shared dropbox for these WireGuard public keys. So I was puzzled by that, and basically took that open source client and started reverse engineering it. Basically, a lot of printfs to see what kind of payloads they were sending, what kind of endpoints or protocol they were using. And yeah, this was around April 2020. I mean, I had a lot of free time at that time. And in June, I did the initial release.
I talked to my friend Christopher about Tailscale, and he was very happy distributing WireGuard keys with Ansible, which, yeah. So I kept doing my own thing for a while. Headscale gained a little bit of traction, and around mid-2021, he joined because he was quite curious about the whole thing.
But he was afraid about breaking stuff. And that's why kinda we are here. Although he was not afraid of making a logo that I think is super nice. So what I learned during this reverse engineering exercise is that the Tailscale clients talk to what is basically a web service.
This web service receives metadata from the clients like the endpoints or the WireGuard public keys that they use and assigns them IP addresses like you would have in a classic traditional VPN service. Once everybody knows about everything,
you can establish this mesh network across the clients, because the data doesn't go through the web service. So we arrived at the initial stage of Headscale: the illusion that everything works. And it kinda worked, until it stopped doing so. So we had this web service, we implemented the web service, a series of endpoints that we found in the reverse engineering exercise, and we were assigning an IP address when a node arrived. And what happens when a second node arrives? Hey, I want to tell everyone that I am here, and I want to find my friends, and I want to communicate with them.
So in order to handle that, and to handle all the metadata that you need to establish the connections, we developed a little bit of a state machine to handle: a new node has arrived, there has been a change in the map of the network, and we need to distribute the updated metadata
that we have. However, at that time, I was kinda learning Go, and we followed a little bit of a weird approach when handling concurrency, which was basically adding more mutexes every time we needed them. And this is a problem, because at the end, we ended up with one giant mutex for this state machine, and this is a very big problem. The Python track is tomorrow, so the GIL locks are over there. So what ended up happening inside the state machine, or what didn't end up happening, was that basically some of the failure modes we saw
was that a new node tried to register, and then we burned a couple of CPU cycles trying to calculate some stuff, and then we did nothing. So no updates were sent out or anything. Sometimes we would have a new node joining, and we would compute everything and send some actual traffic, but we just omitted the new information that was kinda crucial for everyone to know, so it ended up not working. And sometimes a new node joined, nothing really happened, but then eventually something happened and it sent out an update to everyone, and that was useful. And sometimes, on the individual update channels for each node, some of these aforementioned mutexes kinda deadlocked the whole thread, or the goroutine, and then we just never sent updates to particular nodes. And sometimes we just deadlocked the whole server, and you kinda had to kick it to make it come back to life. But still, there was kinda this notion that it did work pretty well eventually, most of the time,
and it gave us kinda like this illusion of working. Because what you often saw was that you had like three nodes and only two of them actually talked together, and as long as those had received the updates they needed, the user was happy and you're just like, ah, it works, so I'm gonna press the star sign on GitHub and share it with my friends. So, but we figured that eventually this would catch up with us, and we were trying to get past this stage where it, you know, works most of the time. So what we did have was a fair amount of unit tests. But the problem with unit tests is that we were trying to reverse engineer something that we were also learning how it works, and we spent a lot of time misunderstanding how it was supposed to work, writing unit tests that would pass but were wrong. So you kinda have a passing test and an entirely wrong implementation. And 90% of what we were actually trying to do
was to integrate with third-party software, and this is where we get to actual integration tests. So what I started doing was, I found this dockertest framework, which basically allows you to programmatically create Docker containers. So we started making tests that spun up a Headscale container. It created a bunch of tailscaled instances, also running in Docker, associated them with a couple of users, and tried to emulate the entire environment so you can test everyone to everyone. We had them join the Headscale server, and since it takes a little bit of time for everyone to catch up with each other and, you know, send the updates and stuff, we put a sleep of two minutes in front of the test, which is a terrible idea, but you know, you learn. And then after that sleep runs out, presumably everyone is now up to date and can talk to each other. So we had a test, and the most basic test is:
is my network working? Can everyone ping everyone? So we tried to do that. And of course that didn't work, because of all of the errors we actually had in the code. And I tried to make some statistics on my laptop, and out of like 100 test runs we had 70 failures.
That's pretty bad. But at this point we're starting to approach, like, an actual goal that we can measure, so we can improve on this. And quite rapidly we figured out that the two big blocks of problems that we had were associated with two things. One of them was being able to reliably send updates to all of our clients, which is the kind of deadlock problem where the update channels were just locking up and didn't really work. So we made a massive, massive rewrite PR that redid the whole logic and made sure that we were always able to send an update to a client as long as it was connected. And then the other problem was this state machine that was very broken.
And then we kinda figured out that we could make a global state, and we tried to simplify it initially and optimize later. So basically a global state: how can we determine if everyone is up to date, and make sure that we know when you last received a successful update? And if not, we have to reissue one to make sure that you know everything. However, changing the Rambo culture takes a little bit of time. We kept merging stuff without proper integration testing. But as Christopher said, we didn't have the incentive. We didn't have the pressure, because the thing really worked.
It's not the same when you are in your home lab and you join a node as when you are joining 100 nodes within one second. So if you were slowly joining machines to your tailnet, things were working. However, the project was gaining popularity, and we were getting more and more contributions in external PRs. And this was around August 2021 or September, something like that. But it was great. We were getting to a point where we could improve Headscale with confidence,
in three aspects. Technical: from a technical point of view, given that the project started as a reverse engineering effort, we had a lot of stuff that was not that great. We could improve or maintain compatibility with these third-party external clients that we are using, and we could improve from a community point of view. I'm going to talk a little bit about this now. For starters, we could improve from a technical point of view. We could do massive refactorings within the project, or implement the second version of the Tailscale protocol, without breaking the existing users.
The only thing that breaks is probably the mental health of the reviewer who has to deal with 3,000 lines of code. But that's a different thing. Then, as I said, we have this minor small detail that we completely depend on a third-party client, because we are using exactly the same official clients as Tailscale. However, I mean, we have a very good working relationship with them, and every time they change something, we get a heads-up. Still, we keep within our integration tests quite a big commitment to supporting this client. So we target the head of the repository, we target the unstable releases, and we target nine minor releases of the client, to make sure that nothing breaks from their side or from ours, because it can happen. And then I think integration testing
can also help the community, because we, as maintainers, can trust in a better way those random PRs from random unknown people that appear on GitHub, which is something that is not a given. And in theory, or that's what one would think, by having integration tests, contributors, those external people that we don't know, should also feel more confident when submitting a PR. But that's a theory. So it does still come with some challenges.
So one of the things that we see occasionally is that a PR comes in and it doesn't have a test, and then we ask nicely if they can add tests, and then the contributor disappears. So some of the time, we're trying to improve on this and still get the changes in. What we try to do, if they truly disappear, is pick it up ourselves, if it's a feature that we really want and we have the bandwidth to do so. Sometimes we try to reach out and sit down and help them write the test, and onboard them in these kinds of things.
One of the tests, actually, is for the SSH feature we have. I knew the developer, and he was also in Norway, so once when I was dropping by Oslo, we sat down for an afternoon, worked on the tests together, and paired on them. That's not available for everyone, sadly. But we always try to get this test message out there in a way. But there are a couple of other challenges as well, and that is that adding tests raises some sort of learning curve. You need to know go test, you need to understand our test framework, you need to have Docker and all of these kinds of things, versus not writing tests, which is a lot less code. And it's hard to convince people how awesome tests actually are, that they're not really a chore, and that you really, really thank yourself later for doing them. So some of the things we're trying to do to make this barrier even lower, since we're so heavily dependent on this for compatibility and everything, is that we're making our own test framework v2. Because the old one depended on a lot of repeated and copied code, there was a really high bar for adding new tests, it was really hard to update and change, and it depended on time.Sleep, which haunted me so many times, and it couldn't really be run in parallel for many of the previous reasons. And the documentation wasn't really good. I knew how to use it, Juan knew how to use it, and that was about it. So a couple of other people figured it out. So what we're trying to do is abstract things a bit away. So we have this concept called a control server,
which is essentially what Headscale is, and the Tailscale product, the software as a service. And it's implemented as Headscale in a container, and it exposes convenience functions that now have godoc support and all of these things, to make it easier for developers to actually use it. And then we have the Tailscale client, which is implemented as tailscaled in a container, and it has the same type of convenience functions. And what this allows us to do: previously, the two files on the right here, sorry, on the left, are two different versions of the setup code for the tests, because when you needed something that was slightly special, you had to copy the whole thing and make a new file to be able to write a test case like you see on the other side here. But now, after abstracting that away and making it a lot more configurable, we allow people to write more or less regular test cases. You just set up what we call a scenario, which is a Headscale server with a given number of Tailscale nodes, and then you let them ping each other
or something like this. So what do we test right now? We kept all of the original tests, so basically we make all nodes join the network and we make them ping each other, to verify that we have a fully functioning network, both by IP and by MagicDNS. MagicDNS is Tailscale's DNS system.
We test Taildrop, which is a file sharing feature a bit like Apple's AirDrop, and we send files from every node to every node to make sure that they work. We test all our registration flows, because we've broken them a couple of times, so it was better to do it that way: pre-auth keys, web plus a command-line flow, and we even currently have tests for OpenID. We try to isolate all of our network from the internet and test with our own embedded relay server, because Tailscale depends on relay servers, which we also embed in our binary. And we have a preliminary test for the SSH features that we support, which are authenticated by Headscale, so you can SSH into your machine. We test SSH all-to-all, and we try to do negative tests. And we also test our CLI, because if you change something, you don't want to sit and type in every single command
in a structured way manually, because that's just painful. So in the future, we also want to improve this granular access control that Tailscale offers. Currently, this is a very good example of where we have added a lot of unit tests and they all pass, but they're all wrong. So, well, they're mostly wrong, so we have to kind of redo most of this as integration tests first, and then kind of backfill the unit tests once we know how the implementation is actually supposed to work. And one of the things we've been dabbling with, especially for this ACL feature, is to use that control server abstraction we had before and use the Tailscale product to test our tests, because if they pass against the public server, we know they're correct, and then we can use them to verify our thing. And then maybe run Tailscale in a VM instead of Docker, to test it properly, but that's more of a benefit for Tailscale than it is for us. So if you're just here waiting for the next talk,
a little bit of a TLDR is that, I mean, we cannot overstate how important having this integration testing, when we depend on an external party, has been for the development of Headscale. I reckon the name is also excellent.
Point tailscale would have been worse. We have, I mean, with the integration testing, we are able to maintain this compatibility with the client, and we are able to take contributions from third-party developers. Otherwise it's a little bit more difficult to develop this trust across the internet, right?
And even though the tests are not perfect, and we still have to migrate unit tests towards integration tests, I think this is one of the keys to the success of the project. And yeah. So some extra things: Tailscale is hosting a happy hour at the BrewDog by the station. This QR code takes you to a signup form. I'll quickly switch back to this slide at the end, but I have a questions slide as well, so you know, we'll go through this. Basically, this is how to reach us: GitHub, and we have a Discord community. And we're very happy to talk to anyone who wants to talk to us here at FOSDEM,
so please feel free to reach out. And I'll leave it at this one, if anyone has any questions. We have some minutes, I think. Thank you. While I have your attention, we have an announcement about a lost wallet.
Look to the left, look to the right, front and back. If you see a wallet that is not yours, please come bring it to the front. It will help this person a lot. Thank you. Let's switch things. After you look for the wallet, if you have a question, raise your hand and I'll try to come over with this microphone.
Always in the front, thank you. How come the Tailscale guys are not mad at you? And not only are they not mad at you, they hired you afterwards. I mean, part of it... is it working? No?
Okay. I think part of it is that they are quite chill. I mean, they are quite chill. They could have taken this way worse than they have. And I don't think we are competition. We are focused on self-hosters, on home labs, perhaps a little bit on small companies. And what usually happens is that people that use Headscale at home then go to their companies and they talk about Tailscale. And when you're in a company, you actually prefer to pay for the service. So it's like a way of selling Tailscale also. Thank you very much.
Okay, thank you very much. Last round of applause. If you have any questions you can catch them in the hallway track.