FOSDEM infrastructure review


Formal Metadata

Title: FOSDEM infrastructure review
Number of Parts: 542
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Transcript: English (auto-generated)
Okay, hello everyone, and welcome to the last lightning talk of the conference. I hope you've enjoyed yourselves. This will be the FOSDEM infrastructure review, same as every year, presented by Richard and Basti.
So normally I do this thing, but Basti has been helping a ton, and the ball of spaghetti, spit and duct tape which I left him has turned into something usable. So I'm just going to sit here on the side, and I'm here for the Q&A.
But for the rest, it's Basti, and it's his first public talk for real. So give him a big round of applause. Well, thank you. I hope I will not screw this one up. So we'll have about 15 minutes: 10 minutes of talk and five minutes for Q&A, and I hope
it's somewhat interesting to you. First, the facts. The core infrastructure hasn't changed that much since the last FOSDEM in 2020. We're still running on a Cisco ASR10K for routing, ACLs, NAT64 and DHCP.
We reused several switches that were already here from the last FOSDEM; they're owned by FOSDEM. These are Cisco C3750 switches. We also had our old servers, which are turning 10 this year.
They were still here, and they will be replaced next year. Like all the years before, we've done everything with Prometheus, Loki and Grafana for monitoring our infrastructure, because that's what helps us run the whole conference here. And we've built some public dashboards, which we put on a VM outside of the ULB
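As an illustration of what those dashboards are built on: Prometheus exposes a small HTTP API that Grafana, or any script, can query. A minimal sketch in Python, where the Prometheus URL and the metric name are placeholders rather than the actual FOSDEM setup:

```python
import requests

# Placeholder URL; not the real FOSDEM Prometheus instance.
PROMETHEUS = "http://prometheus.example.org:9090"

def instant_query(expr: str):
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": expr})
    resp.raise_for_status()
    return resp.json()["data"]["result"]

if __name__ == "__main__":
    # Example: per-interface transmit throughput, averaged over 5 minutes,
    # assuming node_exporter metrics are being scraped.
    for series in instant_query("rate(node_network_transmit_bytes_total[5m])"):
        print(series["metric"].get("device"), series["value"][1], "bytes/s")
```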
because we were running out of bandwidth like in the years before. I'll come back to that later. We have a quite beefy video infrastructure. You might have seen this one here; it's a video capturing device.
It's called a video box here at FOSDEM. It's all public, all open source, except for one piece that's in there. You can find it on GitHub if you want to build it yourself; go ahead, just grab the GitHub repo and clone it. There are two of these devices, one at the camera and one here for the presenter's laptop.
They send their streams to a big render farm that we have over in the K building, where, like every year, our render farm is running on some laptops. The laptops send the streams off to the cloud at Hetzner, and from there we distribute
them to the world so everyone at home can see the talks. We have a sort of semi-automated review and cutting process. Those of you who have given talks here may have known SReview for years. This is the first time it's running on Kubernetes.
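The talk doesn't go into the Kubernetes manifests, but for a feel of how you would poke at such a cluster, here is a minimal sketch using the official Kubernetes Python client. The namespace and label selector are assumptions for illustration, not FOSDEM's real configuration:

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use load_incluster_config() inside a pod).
config.load_kube_config()
v1 = client.CoreV1Api()

# Hypothetical namespace/labels; the real SReview deployment may be organized differently.
pods = v1.list_namespaced_pod(namespace="sreview", label_selector="app=sreview")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
```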
So we are trying to go cloud-native as well with our infrastructure. Just to show how it's all held together: these are our video boxes, I don't know if you can see it. We've got those Blackmagic encoders here that turn the signals that we get, like
SDI and HDMI, into a useful signal that we can process with the Banana Pi that we have in there. Everything is wired up to a dumb switch here, and then we go out like here, and we have our own switching infrastructure inside those boxes.
There is an SSD below here where, in case of a network failure, we dump everything to the SSD as well. So hopefully everything that has been talked about at the conference is still captured and available in case of a network breakdown. Those boxes also have a nice display for the speaker, so we can see if everything is
running or not, which makes it easy for people to operate these boxes here. You don't have to be a video pro. You just wire yourself up to the box, you see a nice FOSDEM logo, you see that everything is working, and you're done, and everything gets sent out.
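The real capture software lives in the video box repository on GitHub; purely as an illustration of the idea of streaming upstream while also dumping to the local SSD, here is a sketch built around ffmpeg's tee muxer. The device paths, RTMP target and SSD mount point are made-up examples, not the actual FOSDEM pipeline:

```python
import subprocess

# One encode, two outputs: the upstream ingest and a local backup file, so a
# network outage does not lose the recording.
cmd = [
    "ffmpeg",
    "-f", "v4l2", "-i", "/dev/video0",   # captured video (e.g. from an HDMI/SDI grabber)
    "-f", "alsa", "-i", "hw:1",          # room audio
    "-c:v", "libx264", "-preset", "veryfast",
    "-c:a", "aac",
    "-map", "0:v", "-map", "1:a",
    "-f", "tee",
    "[f=flv]rtmp://ingest.example.org/live/room1|[f=matroska]/mnt/ssd/room1-backup.mkv",
]
subprocess.run(cmd, check=True)
```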
This is how the video system actually works. All of this can be found on GitHub, so you don't have to take screenshots. If you'd like to see it: we will tear down this room afterwards, so everyone can have a look at the infrastructure we're using, because it's not
being used after this talk. You see, there are quite some interesting things to do. These are the instructions that all our volunteers get when they wire up the whole buildings here in one day, on Friday. They're not here, but they should be given a round of applause, because they're
volunteers that are really doing the hard work and building up the complete FOSDEM in one day. Maybe it's time for a round of applause for them.
Here we have another thing. This is also in the GitHub repo, where you can see where everything is coming from. We have the room sound system, which is what you're hearing me through, and we have a camera with audio gear and the speakers' laptops, and it's all getting pushed down until it reaches your device down here.
There's a ton of services processing it in between, and this is almost all done with open source software, except for the encoder that's running in there, which is from Blackmagic Design. So how is it processed? We have a rendering farm.
These are the laptops; there are 27 of them this year. For those of you who don't know, those laptops are sold after FOSDEM. So if you want one, you can grab one. This year they're already gone, but for next year, maybe you want a cheap device. You can have them with everything that's on them, because we literally don't care
about that. You can have it, because everything has been processed after FOSDEM. You can see some racks where we just put them in stacks of four, and we have 27 of them. We have some switch infrastructure that is used for processing all that stuff.
And this one is not running out of bandwidth; we're coming back to what is running out of bandwidth. You might see this mess over here. This is our internet, and it looks like every common internet on the planet.
And this is our safety net. We have a big box here where all the streams go, and this will be sent out to Bulgaria, to the video team, right after FOSDEM, so we have a real off-site copy of everything. Now, the challenges for this year. DNS64:
we had been running BIND9 for ages, and we switched to CoreDNS, first just testing it on the Sunday of FOSDEM 2020. We saw a significant reduction in CPU usage, and that's why we've stuck with CoreDNS since then.
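For context, DNS64 is what lets the IPv6-only FOSDEM network reach IPv4-only hosts: when a name has only an A record, the resolver synthesizes an AAAA record by embedding the IPv4 address in a /96 NAT64 prefix. A minimal sketch of that mapping in Python, assuming the well-known 64:ff9b::/96 prefix from RFC 6052 (the prefix actually used on site may differ):

```python
import ipaddress

# Well-known NAT64 prefix; a deployment may use its own /96 instead.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize_aaaa(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the NAT64 prefix, giving the synthetic AAAA."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))

# Example: a v4-only host at 93.184.216.34 is reached as 64:ff9b::5db8:d822.
print(synthesize_aaaa("93.184.216.34"))
```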
And this year we also replaced the remaining BIND installations that we used for all the internal DNS and all the other recursive stuff that provides you with internet access here. Richie always used to give you some timelines, and that's what I'm trying to do as well.
There were times when it was mentally challenging for the people building up FOSDEM. We got better year by year by doing some automation, by getting people to know what to do, and by having everything set up beforehand.
We installed routers. You see that there's a slight... it's getting better year after year. This year we thought it would be okay from what we knew. We just set it up in January and everything worked. We came here on the 5th of January, I think,
put everything up, and it just worked, which is great. That gives you some things not to care about, because there were other things to care about. Getting the network up and running here took us a bit longer this year than last year,
because we were playing around with the second uplink that we got. We used to have one one-gigabit uplink. Last week we got a 10-gigabit uplink, and we thought, okay, just enable that and play with it. It turned out to be not that easy to get both of the BGP sessions up and running
and to do it properly. That's why it took us a bit longer this year. The monitoring is also one thing which really helps us understand whether FOSDEM is ready to go or whether someone has to stay very, very late here. In the last years we've been very, very good at that.
Basically everything used to be done by the last days of January, but that's still January. This time, in the first half of January, everything was set up and running, and it worked. It was really great, because some people actually got some sleep at FOSDEM.
They didn't need to stay here very long, because everything was pre-made; you just go and look at the dashboard: okay, this is missing, that is missing, and have it all checked off. The video buildup took a bit longer this year, because we're getting old and rusty
at it. There were also many new faces who had never built up such a big conference. That's why it took us a bit longer, and the video team, yeah, I think they got the least amount of sleep of everyone running the conference.
This was the story so far. We closed FOSDEM 2020; I was also there in 2020. 2020 was really one of the best ones we ever had from a technical perspective.
We had everything running via Ansible: just one command, then wait an hour till everything is deployed and you're done. Cool. Have some beer, some mate in between, and everything was cool. Then we had this pandemic. For me, like a week after FOSDEM, everything went down.
And, you know, we had FOSDEM 2021 and 2022, but there was no conference here at the ULB, so we had no infrastructure to manage, which was quite okay. We had some other things to do: as most of you have learned, we have a big Matrix installation to run, to accompany the FOSDEM conference and help you with communicating during the conference.
Then there was this bad thing that the maintainer of the infrastructure left FOSDEM between those years. So Richie searched for someone who was dumb enough to do that. Yeah, that's me.
So this year we're back again in person. Sorry? Found it. Yeah, thanks. So after almost two years, we came back to look at the two machines.
No one had touched them. They had rebooted once or twice due to power outages in the server cabinet, but we had a working SSH key. We had tons of updates to install after literally three years. I wonder why nobody broke into the machines, because they were publicly exposed on the internet,
but it was only SSH and, I think, a three or three-and-a-half-year-old Prometheus installation, which was full of bugs. We noticed that the battery packs of the RAID controllers had been depleted. That was the only thing that actually happened in the three years:
the batteries went to zero and didn't set themselves on fire. So everything was okay. The machines worked, just with a bit of performance degradation, but everything seemed to be okay. And then we tried to run this Ansible setup from the previous years.
Three years later, Ansible has changed a lot in that time, and if you want to use a current version of Ansible with that old stuff, you end up like this. This is me. Start from scratch, or fix all the Ansible roles. You can have a look at them; they're also on GitHub.
So we thought, okay, how do we do this? And we said, okay, then Ansible will just be gone for now; we'll fix it after FOSDEM, because we will have to renew the servers anyway, and everything will change.
So, the server timeline. We had the servers alive on the 8th of January, and the services, DNS64 and all the rest, by the middle of January. We had centralized all our logs. This was something Richie had been looking for for ages: easily accessible log files for everything that's running here at FOSDEM.
It was good that we had them, because we could see things like: oh, the internet line that was supposed to be there actually came up. Nobody told us, but it came up. You see that? Thanks to the centralized logging, we were aware of things like that. And then we could go and fire up our BGP sessions.
Two days later, we noticed that firing up the BGP sessions wasn't such a good idea, because we lost almost all connectivity. It says stop, but I don't care; yeah, I'll just keep talking.
We lost all our connectivity and said, okay, damn it, we're in some sort of panic mode. Because the reason for looking at the servers in the first place was this BIND security issue: I read the mail on the morning of January 28th and said, okay, we have to fix the BIND installations, and then suddenly we can't reach the servers anymore.
Okay, are they already hacked, or what's going on? Doing some back and forth with our centralized logging, you see that? This is Grafana Loki that we leveraged for that. It's been really nice to debug things like that.
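Loki exposes a small HTTP API that makes that kind of back-and-forth scriptable as well as browsable in Grafana. A minimal sketch of pulling the last hour of logs for one service, where the Loki URL and the label selector are placeholders rather than the real FOSDEM labels:

```python
import time
import requests

LOKI = "http://loki.example.org:3100"  # placeholder address

now = int(time.time() * 1e9)           # Loki expects nanosecond timestamps
params = {
    "query": '{job="bind", host="dns1"} |= "error"',  # hypothetical labels
    "start": now - 3600 * 10**9,
    "end": now,
    "limit": 100,
}
resp = requests.get(f"{LOKI}/loki/api/v1/query_range", params=params)
resp.raise_for_status()
for stream in resp.json()["data"]["result"]:
    for ts, line in stream["values"]:
        print(ts, line)
```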
We also noticed that there was an interface to our backbone constantly flapping, which we could also fix within that session. After that we said, okay, there are some MTU problems, we have to restart BGP, and so on, back and forth.
Then we finally agreed to just throw away the BGP sessions and go with the one-gigabit line, and yesterday evening we switched to the 10-gigabit line, because we had had a congested uplink since 11 in the morning.
Too many people using too much bandwidth. Since yesterday evening everything is okay; it's better, and we're on the 10-gigabit link. Due to the fact that there are not so many people here today (yesterday there were quite a bit more), the link is not fully saturated, but you can tell this is the place where we could
use some more bandwidth. This is usually the time for something to eat, but at 3:30 we could actually use some of the new bandwidth that we had available. So if you want to look at all of these things, we have a dashboard put out there
publicly, so you can have a look at the infrastructure, and an Ansible repo, which will be fixed to work with current Ansible versions within the next few days. Just clone our infrastructure, clone everything. And if you have any questions, I'll be glad to take them. Yeah, fire away.
As I don't see any questions yet: we are about to tear down this room after this.
So please don't leave anything in here, because it will be cleaned and everything will be torn out. If anyone else has a question, just ask. Yes, we use laptops.
The question is: why do we use laptops for rendering? Because they have a built-in UPS called a battery, so in case of a power outage we can easily keep running with them. Also, they're very cheap for us. We can just use the computing power and sell them to the people here at the same price that we bought them for.
You get a cheap laptop; we get some computing time on them beforehand. That's the main reason for running it on laptops. Well, actually, the question was why we were using the Banana Pi.
That's a good question. The thing is that the capabilities of the Banana Pi were a bit better than the Raspberry Pi at the time the decision was made. You see, there's a big LCD screen on the front of the boxes. I think it was about driving those LCD panels and also the computing power available on
the Banana Pi. But actually, we'd have to look that up in the repo; everything is documented there. Okay. Yeah, there's another one in the front.
So the question was whether there are any public dashboards out there. Yes, we've put some public dashboards on dashboard.grafana.org. Oh, dashboard.fosdem.org, sorry.
There you can have a look at the infrastructure. We used to have some more dashboards, like the t-shirts that have been sold, but due to the fact that we changed the shop (we converted from something that we bought to an open source solution), we totally forgot to monitor that.
But there are some dashboards out there to monitor it. And if you want to see something more, just come to me after the talk and I'll show you more here on the laptop. Okay. Yeah, another one.
The biggest one standing here? No, actually the biggest issue we had was running all that stuff after three years; not having everything set up properly was quite challenging.
On Saturday morning we had to run and redo the whole video installation in the K building because, you see those transmitters here, they were not plugged in properly, and so we had no audio on the stream. That was one thing. Another very challenging thing was when we played around with the BGP sessions,
which we had not engineered properly. It was not clear how long it would take till things propagated to the whole net, and we were literally just trying to get information: is it working, is it not? And until that BGP information propagates from here to the rest of the planet, to somewhere like Brazil,
it takes quite some time. So you can't be sure when setting up a BGP session that everything works, because shit will hit the fan after 10, 20, 30 minutes and not instantly. It's quite a problem not to have instant recognition of whether things are going well or not.
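One way to watch that propagation from the outside, instead of waiting for things to break, is to ask a public route collector whether it already sees the announcement. A sketch against RIPEstat's looking-glass API (the RIPE RIS collectors); the prefix here is just a placeholder, not necessarily one of FOSDEM's:

```python
import requests

PREFIX = "193.0.0.0/21"  # placeholder prefix for illustration

resp = requests.get(
    "https://stat.ripe.net/data/looking-glass/data.json",
    params={"resource": PREFIX},
    timeout=30,
)
resp.raise_for_status()
for rrc in resp.json()["data"]["rrcs"]:
    # Each RRC is a route collector; if peers there carry a route for the
    # prefix, the announcement has reached that part of the internet.
    print(rrc["rrc"], rrc["location"], "peers seeing it:", len(rrc["peers"]))
```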
So the question was whether the problems with the Wi-Fi that we had here on site were due
to our BGP playing, or due to something else, solar flares or so. The thing is that we had some issues. We've been given access to the WLCs, the wireless LAN controllers; you see those boxes over there, they're centrally controlled, and we have to dig into that.
We have some visibility into the infrastructure that's owned by the ULB. They've given us access to it so we can engineer that. But we're not quite sure what it was. Most of the time the FOSDEM network, which is IPv6-only, was working quite well, except for
some Apple devices that tend to just set up an IPv4 address even if there is no proper IPv4, and then things get complicated. The FOSDEM dual-stack network usually worked for most of the Apple devices.
But we're not very certain. Yeah, you will see that. There's another one. So the question is whether the live streams will be made rewindable or not.
Honestly, I can't tell you that. I don't know. I can ask the video guys if they're planning that for next year, but there's no plan for that as far as I know. The biggest challenge was to redo things with HDMI over VGA, which we had last
year. But there's another one. Yeah. So the question is what we're planning to do about the servers.
Do we know what, and what's planned for next year? We'll have a talk about that next week, I think, when we go through the postmortem, which is usually a week after FOSDEM. Then we decide on things to be bought for next year, because the switches are old, and the routers are also old, I think.
With one more year for the router to go, that should be fine for next year. But what about after that? We have to make some decisions and some investments for next year to run this stuff, and that will be done next week, when we're all a bit cooled down and refreshed after this FOSDEM.
Anyone else? Yeah, come. So the question was: what parts of the infrastructure are being reused,
and what do we bring for the event? Well, in numbers, I think it was three truckloads of stuff... no, three, because the video arrived as well. We bring mainly cameras and those boxes here.
The switches stay at the ULB; most of them stay here. But the ones that didn't stay here won't be here next year, because the ULB is planning to do some tidying up and to give us some ports for our VLANs here.
They're very, very good at working with us. We get access to most of the infrastructure. We just tell them what we'd like to use, and they throw it onto their controllers and bridge it to our servers, and we can use it and have fun with it.
And they will be replacing part of the network infrastructure next year, so we will then have to bring even less gear here. Which one first?
So the question was: what about all the other stuff that FOSDEM is doing throughout the year? Do we host it on our own hardware, is it in the cloud, or somewhere else? We have a company called Tigron; it's a Belgian provider. Most of the stuff is running at Tigron during the year.
During FOSDEM we also spin up some VMs at Hetzner in Germany, and those are only for during the event and a short time after the event, for things like cutting videos in the cloud. They will be turned off after two or three weeks, and then everything is running
on Tigron, on our own hardware there, as well. So there was another question. The question was: what is being used for the communication between volunteers? We have our Matrix setup. I don't know who's aware of Matrix;
it's a real-time communication tool, like Slack or something like that. We've used Matrix since 2020, internally, for our video team to communicate. Then we expanded that, and with the pandemic we opened it up for all of the people.
And now the volunteers are being coordinated through that. We also have our own terrestrial trunked radio set up here, especially for this event, and the volunteers can also be reached via those radios.
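For volunteer coordination over Matrix, everything ultimately goes through the standard Matrix client-server API, which any bot or script can use. A minimal sketch of posting a message to a room; the homeserver URL, room ID and access token are placeholders, not the actual FOSDEM deployment:

```python
import uuid
from urllib.parse import quote
import requests

HOMESERVER = "https://matrix.example.org"      # placeholder homeserver
ROOM_ID = "!volunteers:example.org"            # placeholder room ID
ACCESS_TOKEN = "syt_example_token"             # obtained via login; keep out of source

def send_message(text: str) -> str:
    txn_id = uuid.uuid4().hex                  # transaction ID makes the send idempotent
    url = (f"{HOMESERVER}/_matrix/client/v3/rooms/{quote(ROOM_ID, safe='')}"
           f"/send/m.room.message/{txn_id}")
    resp = requests.put(
        url,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"msgtype": "m.text", "body": text},
    )
    resp.raise_for_status()
    return resp.json()["event_id"]

print(send_message("Room K.1.105: one more volunteer needed for teardown"))
```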
Am I correct, volunteers? Yes. Okay, we have two volunteers here. Is there anything else you want to know? Where's the money? The question is: where's the money, Lebowski?
That's the real phrase from the film. I don't actually know; I'm not yet a member of the FOSDEM staff, so you have to ask someone in a yellow shirt. There happens to be one next to me; just throw him the microphone. We have a money box and a bank account.
Anyone else? Three, two, one. Thank you very much.