Impact of zstd
Formal Metadata
Title | Impact of zstd
Number of Parts | 44
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/46120 (DOI)
All Systems Go! 2019, 21 / 44
Transcript: English (auto-generated)
00:05
Hi, my name is Wille Dainio, I'm a software engineer at Aiven. This talk was originally supposed to be given by our CEO, Oskari Saarenmaa, but unfortunately he couldn't make it, so you're stuck with me. So let's get to it. Zstandard. It's a lossless data compression algorithm developed by Yann Collet of Facebook, it's
00:23
based on the LZ77 family of algorithms, it has a really cool logo, and we think that it's a pretty important piece of technology. And just a quick shout-out to Yann Collet: he's a data compression expert working at Facebook, he has authored many, many cool projects, he has a very cool blog, go read it, and he has made a really big difference in the industry, so thank you very much.
00:47
So compression algorithms, what are they and how do we classify them? Well, there are the lossy ones, where you lose some data as you compress, which sort of works out for things like video, images, and audio, as long as you don't
01:04
go too crazy with it, like with that cat picture. But it doesn't really work that well, for example, for compressing backups, there you really want to make sure that every single bit is restored afterwards. So for that we need a lossless compression algorithm. And zlib used to be the standard thing to use for a long time, and then there are
01:23
some more specialized things that are either faster but don't compress that well, or compress really well but are super slow to use. And Zstandard is pretty much the best of both worlds: it offers really fast compression, even faster decompression, and a really great compression ratio.
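The lossless round-trip guarantee the speaker describes can be sketched in a few lines. Zstandard's Python bindings live in the third-party `zstandard` package, so this minimal sketch uses the standard library's zlib instead (the same compress/decompress pattern applies); the sample payload and the chosen levels are illustrative assumptions, not from the talk.

```python
import zlib

# Illustrative payload: repetitive text compresses well and must
# survive a lossless round trip bit-for-bit.
data = b"All Systems Go! 2019 " * 1000

# Higher levels trade speed for ratio (zlib exposes 1..9; zstd
# exposes a wider range, roughly 1..22, with the same trade-off).
for level in (1, 6, 9):
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    assert restored == data  # lossless: every single bit is restored
    print(f"level {level}: {len(data)} -> {len(compressed)} bytes")
```

With the `zstandard` package installed, `zstandard.ZstdCompressor(level=...)` and `ZstdDecompressor` would replace the `zlib` calls in the same shape.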
01:44
So if we look at a few examples: on the right-hand side, if you take the git tree of systemd and compress it, you can see that Zstandard does pretty well. It compressed slightly better than gzip, but at much closer to lz4's speed.
02:03
And on the left-hand side, we have the null-filled PostgreSQL WAL segments. It might not make too much sense intuitively to compress nulls, but when we run tens of thousands of databases that have very infrequent writes, we end up archiving a lot of log segments that
02:22
contain very little data. So as you can see, gzip is just too slow to use, and lz4 turns even nulls into pretty big objects, so Zstandard was clearly the winner here. And if we look at some of the data from our production system, this was taken this
02:44
morning: it's PostgreSQL base backup sizes side by side, or rather the mean value of those, running with Snappy, which is what we used before, and with Zstandard, which is what we use now, and the results are pretty good.
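The point about null-heavy WAL segments is easy to reproduce: a buffer of zeros collapses by several orders of magnitude under any LZ-family compressor. A small stand-in sketch, again with stdlib compressors rather than zstd, and with a made-up 16 MiB segment size standing in for a WAL segment:

```python
import zlib
import lzma

# A 16 MiB all-null buffer as a stand-in for a mostly-empty
# PostgreSQL WAL segment (size chosen for illustration).
segment = bytes(16 * 1024 * 1024)

gz = zlib.compress(segment, 6)
xz = lzma.compress(segment)

# Both collapse megabytes of nulls to a few kilobytes; the talk's
# benchmark is about doing this fast enough at archive volume.
print(f"zlib: {len(segment)} -> {len(gz)} bytes")
print(f"lzma: {len(segment)} -> {len(xz)} bytes")
```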
03:05
And it's been adopted by quite a lot of projects over the years, so other people have also noticed that it's kind of cool. Just a few gotchas when you start using it: don't do anything stupid, like
03:21
trying to decompress too much data into a too-small buffer, or introducing a new algorithm to legacy clients that don't know how to deal with it. Yeah, anyway, a bit about us. So we're Aiven, we run open source data technologies in public clouds, and we're using
03:43
Zstandard for transport and storage compression in various places. We're just setting up an office here in Berlin, and we're hiring, obviously. And I have a very limited supply of Aiven-branded socks, so if your feet feel cold, come and talk to me. Yeah, so thank you.
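The earlier gotcha about decompressing too much data into a too-small buffer is the classic decompression-bomb problem. A hedged sketch of one defense using stdlib zlib's `max_length` cap (the limit value and the "bomb" payload are invented for illustration; the `zstandard` package offers a similar guard via `max_output_size`):

```python
import zlib

# A small "bomb": 10 MiB of zeros shrinks to a few kilobytes,
# so the compressed size says nothing about the inflated size.
bomb = zlib.compress(bytes(10 * 1024 * 1024))

LIMIT = 1 * 1024 * 1024  # refuse to inflate past 1 MiB (illustrative)

d = zlib.decompressobj()
out = d.decompress(bomb, LIMIT)  # max_length caps the output size
if d.unconsumed_tail:
    # Input remains but the cap was hit: treat it as hostile.
    print(f"aborting: output would exceed {LIMIT} bytes")
else:
    print(f"safe: decompressed {len(out)} bytes")
```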