Impact of zstd
Formal Metadata
Title | Impact of zstd
Number of Parts | 44
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/46120 (DOI)
All Systems Go! 2019, 21 / 44
Transcript: English (auto-generated)
00:05
Hi, my name is Wille Dainio, I'm a software engineer at Aiven. This talk was originally supposed to be given by our CEO, Oskari Saarenmaa, but unfortunately he couldn't make it, so you're stuck with me. So let's get to it. Zstandard. It's a lossless data compression algorithm developed by Yann Collet of Facebook, it's
00:23
based on the LZ77 family of algorithms, it has a really cool logo, and we think that it's a pretty important piece of technology. And just a quick shout-out to Yann Collet: he's a data compression expert working at Facebook, he has authored many, many cool projects, he has a very cool blog, go read it, and he has made a really big difference in the industry, so thank you very much.
00:47
So compression algorithms, what are they and how do we classify them? Well, there are the lossy ones, where you lose some data as you compress, which sort of works out for things like video, images, and audio, as long as you don't
01:04
go too crazy with it, like with that cat picture. But it doesn't really work that well, for example, for compressing backups, there you really want to make sure that every single bit is restored afterwards. So for that we need a lossless compression algorithm. And zlib used to be the standard thing to use for a long time, and then there are
01:23
some more specialized things that are either faster but don't compress that well, or compress really well but are super slow to use. And Zstandard is pretty much the best of both worlds: it offers really fast compression, even faster decompression, and a really great compression ratio.
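The lossless round-trip guarantee the speaker describes can be sketched in a few lines. Zstandard's Python bindings live in the third-party `zstandard` package, so this minimal sketch uses the standard library's zlib instead (the same compress/decompress pattern applies); the sample payload and the chosen levels are illustrative assumptions, not from the talk.

```python
import zlib

# Illustrative payload: repetitive text compresses well and must
# survive a lossless round trip bit-for-bit.
data = b"All Systems Go! 2019 " * 1000

# Higher levels trade speed for ratio (zlib exposes 1..9; zstd
# exposes a wider range, roughly 1..22, with the same trade-off).
for level in (1, 6, 9):
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    assert restored == data  # lossless: every single bit is restored
    print(f"level {level}: {len(data)} -> {len(compressed)} bytes")
```

With the `zstandard` package installed, `zstandard.ZstdCompressor(level=...)` and `ZstdDecompressor` would replace the `zlib` calls in the same shape.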
01:44
So if we look at a few examples: on the right-hand side, if you take the git tree of systemd and compress it, you can see that Zstandard does pretty well. It compressed slightly better than gzip, but at much closer to lz4's speed.
02:03
And on the left-hand side, we have the null-filled PostgreSQL WAL segments. It might not make too much sense intuitively to compress nulls, but when we run tens of thousands of databases that have very infrequent writes, we end up archiving a lot of log segments that
02:22
contain very little data. So as you can see, gzip is just too slow to use, and lz4 turns even nulls into pretty big objects, so Zstandard was clearly the winner here. And if we look at some of the data from our production system, this was taken this
02:44
morning: it's PostgreSQL base backup sizes side by side, or rather the mean value of those, running with Snappy, which is what we used before, and with Zstandard, which is what we use now, and the results are pretty good.
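The point about null-heavy WAL segments is easy to reproduce: a buffer of zeros collapses by several orders of magnitude under any LZ-family compressor. A small stand-in sketch, again with stdlib compressors rather than zstd, and with a made-up 16 MiB segment size standing in for a WAL segment:

```python
import zlib
import lzma

# A 16 MiB all-null buffer as a stand-in for a mostly-empty
# PostgreSQL WAL segment (size chosen for illustration).
segment = bytes(16 * 1024 * 1024)

gz = zlib.compress(segment, 6)
xz = lzma.compress(segment)

# Both collapse megabytes of nulls to a few kilobytes; the talk's
# benchmark is about doing this fast enough at archive volume.
print(f"zlib: {len(segment)} -> {len(gz)} bytes")
print(f"lzma: {len(segment)} -> {len(xz)} bytes")
```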
03:05
And it's been adopted by quite a lot of projects over the years, so other people have also noticed that it's kind of cool. Just a few gotchas when you start using it: don't do anything stupid, like
03:21
trying to decompress too much data into a too-small buffer, or introducing a new algorithm to legacy clients that don't know how to deal with it. Yeah, anyway, a bit about us. So we're Aiven, we run open source data technologies in public clouds, and we're using
03:43
Zstandard for transport and storage compression in various places. We're just setting up an office here in Berlin, and we're hiring, obviously. And I have a very limited supply of Aiven-branded socks, so if your feet feel cold, come and talk to me. Yeah, so thank you.
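The earlier gotcha about decompressing too much data into a too-small buffer is the classic decompression-bomb problem. A hedged sketch of one defense using stdlib zlib's `max_length` cap (the limit value and the "bomb" payload are invented for illustration; the `zstandard` package offers a similar guard via `max_output_size`):

```python
import zlib

# A small "bomb": 10 MiB of zeros shrinks to a few kilobytes,
# so the compressed size says nothing about the inflated size.
bomb = zlib.compress(bytes(10 * 1024 * 1024))

LIMIT = 1 * 1024 * 1024  # refuse to inflate past 1 MiB (illustrative)

d = zlib.decompressobj()
out = d.decompress(bomb, LIMIT)  # max_length caps the output size
if d.unconsumed_tail:
    # Input remains but the cap was hit: treat it as hostile.
    print(f"aborting: output would exceed {LIMIT} bytes")
else:
    print(f"safe: decompressed {len(out)} bytes")
```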