How We Gained Observability Into Our CI/CD Pipeline
Formal metadata
Title: How We Gained Observability Into Our CI/CD Pipeline
License: CC Attribution 2.0 Belgium: You may use, modify and reproduce the work in unmodified or modified form, distribute it and make it publicly available for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifier: 10.5446/61539 (DOI)
FOSDEM 2023, talk 217 of 542
Transcript: English (auto-generated)
00:08
So, I hope it will be fun enough for you to wake up at the end of the day. Very excited to be here at FOSDEM and specifically the CI/CD devroom. And today I'd like to share with you
00:21
how we gained observability into our CI/CD pipeline, and how you can too. So let's start with a day in the life of a DoD, a developer on duty, at least in my company. And it goes like that: the first thing the DoD does in the morning, at least it used to be before we did this exercise,
00:43
is going into Jenkins. We worked with Jenkins, but the takeaways, by the way, will be very applicable to any other system you work with, so nothing too specific here. Getting into Jenkins at the beginning of the morning, we're looking at
01:01
the status there, the pipelines for the last few hours over the night, and of course checking if anything is red, and most importantly if there's a red master, and whether you can finish your coffee or have to jump straight into the investigation. And to be honest, sometimes people actually forgot to go into Jenkins and check this, so that's another topic we'll maybe touch upon.
01:29
So you go in, and then, let's say you see a failure or see something red, you need to start going one by one over the different runs and start figuring out what failed, where it failed, why it failed and so on.
01:46
Importantly, you actually needed to go one by one over the different runs, and we have several runs: we have the backend, we have the app, we have smoke tests, several of these, and start getting the picture, getting the pattern across, and understanding,
02:01
across runs, across branches, what's going on. And on top of all of that, it was very difficult to compare with historical behavior, with past behavior, to understand what's an anomaly, what's the steady state for these days, and so on. Just to give you a few examples of questions that we found
02:23
difficult or time-consuming to answer, things such as: Did all runs fail on the same step? Did all runs fail for the same reason? Is it on a specific branch? Is it on a specific machine? If something's taking longer, is that normal? Is that anomalous? What's the benchmark?
02:46
These sorts of questions took us too long to answer, and we realized we needed to improve. A word about myself: my name is Dotan Horovits, I'm the principal developer advocate at a company called Logz.io.
03:04
Logz.io provides a cloud-native observability platform that's built on popular open source tools you probably know, such as Prometheus, OpenSearch, OpenTelemetry, Jaeger and others. I come from a background as a developer,
03:22
a solutions architect, even a product manager, and most importantly I'm an advocate of open source and communities. I run a podcast called OpenObservability Talks about open source, DevOps and observability, so if you're interested in these topics and you like podcasts, do check it out. I also run,
03:43
organize, co-organize several communities: the local chapter of the CNCF, the Cloud Native Computing Foundation, in Tel Aviv, Kubernetes Community Days, DevOpsDays, etc. And you can find me everywhere at @horovits, so if you tweet something interesting, feel free to tag me.
04:02
So before I get into how we improved our CI/CD pipeline, or capabilities, let's first understand what we want to improve on. Actually, I see very often that people jump into solving before really understanding the metric, the KPI, that they want to improve.
04:24
Very basically, there are four primary metrics for, let's say, DevOps performance, and you can see them there on the screen: there's deployment frequency, lead time for changes, change failure rate, and MTTR, mean time to recovery. I
04:44
don't have time to go over all of these, but they're very important. So if you're new to this and you want to read a bit more about it, I left a QR code and a short link for you at the bottom for a 101 on the DORA metrics. Do check it out, I think it's priceless.
05:02
In our case we needed to improve on the lead time for changes, sometimes called cycle time, which is the amount of time it takes a commit to get into production, which in our case was too long, too high, and was holding us back.
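As a quick aside on the metric itself: lead time for changes is simply the elapsed time from a commit to that commit running in production. A minimal sketch in Python, with made-up timestamps that are not from the talk:

```python
from datetime import datetime, timezone


def lead_time_for_changes(commit_time: datetime, deploy_time: datetime) -> float:
    """Hours from a commit landing to that commit running in production."""
    return (deploy_time - commit_time).total_seconds() / 3600.0


# Hypothetical example: a commit made Monday 09:00 UTC that reached production
# on Wednesday 15:00 UTC has a lead time of 54 hours.
print(lead_time_for_changes(
    datetime(2023, 1, 30, 9, 0, tzinfo=timezone.utc),
    datetime(2023, 2, 1, 15, 0, tzinfo=timezone.utc),
))
```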
05:21
We are experts at observability in my engineering team, you know, that's what we do for a living. So it was very clear to us that what we were missing in our case was observability into our CI/CD pipeline. To be fair with Jenkins, and there are lots of things to complain about in Jenkins,
05:42
there are some capabilities within Jenkins. You can go into a specific pipeline run, you can see the different steps, you can see how much time an individual step took; using some plugins you can also visualize the graph, and we even wired Jenkins to get alerts on Slack.
06:02
But that wasn't good enough for us, and the reason is that we wanted to find a way to monitor, aggregate and filter the information according to our own time scale, according to our own filters, obviously to see things across branches, across runs,
06:20
to compare with historical data, with our own filtering. So that's what we aimed at, and we launched this internal project with these requirements, four requirements. One, first and foremost: we need a dashboard, a dashboard with aggregated views, to be able to see the aggregated
06:40
data across pipelines, across runs, across branches, as we talked about. Secondly, we wanted to have access to historical data, to be able to compare, to understand trends, to identify patterns, anomalies and so on. Thirdly, we wanted reports and alerts, to be able to automate as much as possible. And
07:04
lastly, we wanted some ability to view flaky tests and test performance, and to be able to understand their impact on the pipeline. So those were the project requirements. And how we did that essentially takes four steps:
07:23
collect, store, visualize and report. And I'll show you exactly how it's done and what each step entails. In terms of the tech stack, we were very versed with the ELK Stack, Elasticsearch and Kibana; then we also switched over to OpenSearch and OpenSearch Dashboards after Elastic
07:44
relicensed and it was no longer open source. So that was our natural point to start our observability journey, and I'll show you how we did these four steps with this tech stack. So the first step is collect, and for that we instrumented the pipeline
08:02
to collect all the relevant information and put it in environment variables. Which information? You can see some examples here on the screen: the branch, the commit SHA,
08:21
failed step step duration build number anything essentially that you find useful for investigation later my recommendation collected and Persisted so that's the collect phase And after collect comes store and for that we created a new summary step at the end of the pipeline one
08:43
where we ran a command to collect all that information from the first step, created a JSON, and persisted it to Elasticsearch; as I mentioned, we then moved to OpenSearch. And it's important to say again, in fairness to Jenkins and for the Jenkins experts here: Jenkins does have some built-in
09:06
persistency capabilities, and we tried them out, but it wasn't good enough for us. The reason is that by default Jenkins essentially keeps all the builds and stores them on the Jenkins machine,
09:20
which burdens these machines, of course, and then you start needing to limit the number of builds and the duration, how many days, and so on and so forth. So that wasn't good enough for us. We needed more powerful access to historical data. We wanted to persist historical data under our own
09:40
control, the duration, the retention, and most importantly off of the Jenkins servers, so as not to risk overloading the critical path. So that's about store.
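As a rough sketch of what such an end-of-pipeline summary step could look like (not the team's actual script): GIT_BRANCH, GIT_COMMIT and BUILD_NUMBER are standard Jenkins environment variables, while the other variable names, the OpenSearch endpoint and the index name are placeholders.

```python
import json
import os
from datetime import datetime, timezone

import requests

OPENSEARCH_URL = "https://opensearch.internal:9200"  # placeholder endpoint
INDEX = "ci-pipelines"                               # placeholder index name

# Gather whatever the pipeline steps exported into environment variables.
summary = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "branch": os.getenv("GIT_BRANCH"),
    "commit_sha": os.getenv("GIT_COMMIT"),
    "build_number": os.getenv("BUILD_NUMBER"),
    "machine_ip": os.getenv("NODE_IP"),              # hypothetical variable
    "run_type": os.getenv("RUN_TYPE", "unknown"),    # hypothetical: scheduled / merge-to-master / ...
    "status": os.getenv("BUILD_STATUS", "UNKNOWN"),  # hypothetical variable
    "failed_step": os.getenv("FAILED_STEP"),         # hypothetical variable
    "step_durations": json.loads(os.getenv("STEP_DURATIONS_JSON", "{}")),  # hypothetical variable
}

# Persist the run summary as one document, off the Jenkins machines.
requests.post(f"{OPENSEARCH_URL}/{INDEX}/_doc", json=summary, timeout=10).raise_for_status()
```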
10:07
And after store, once we have all the data in Elasticsearch or OpenSearch, it's very easy to build Kibana dashboards or OpenSearch Dashboards and visualizations on top of that. Then comes the question: okay, so which visualizations should I build? For that, and that's a tip, take it with you: go back to the pains, go back to the questions that you found it
10:22
hard to answer, and this would be the starting point. So if you remember, before we mentioned things such as: did all runs fail on the same step? Did all runs fail for the same reason? How many failures on a specific branch, on a specific machine, and so on. These are the questions that will guide you to choose the right
10:42
visualizations for your dashboard, and I'll give you some examples here. Let's start with the top-line view: you want to understand how healthy, how stable your pipeline is, so visualize the success and failure rates. You can do that overall, in general, or in a specific time window,
11:02
on a graph, very easy to see at first glance what the health status of your pipeline is. You want to find problematic steps? Then visualize failures segmented by pipeline step; again, very easy to see the spiking step there.
11:22
You want to detect problematic build machines? Visualize failures segmented by machine. And that, by the way, saved us a lot of wasted time going and checking for bugs in the released code. When we saw such a thing, we'd just go and kill the machine, let the autoscaler spin up a new instance,
11:42
and you start clean, and in many cases that solves the problem. So, lots of time saved. In general, this aspect of code-based versus environment-based issues is definitely a challenge, I'm assuming not just for me, so I'll get back to that soon.
12:02
Another example: duration per step; again, very easy to see where the time is spent. So that's the visualize part, and after visualize comes the reporting and alerting phase. If you remember, before, the DoD, the developer on duty, needed to go manually and check
12:22
Jenkins and do the health check; now the DoD gets a start-of-day report directly in Slack. And actually, as you can see, the report already contains the link to the dashboard and even a snapshot of the dashboard embedded within the
12:41
Slack message, so that at first glance, even without going into the dashboard, you can see whether you can finish your coffee or whether there's something alarming that requires you to click that link and go start investigating. And of course, it doesn't have to be a scheduled report: you can also define triggered alerts on any of the fields, the data that we collected in the first phase, the collect phase.
13:05
And you can do any complex queries or conditions that you want; you want to do something like: if the sum of failures goes above X, or the average duration goes above Y, trigger an alert. So essentially, anything that you can formalize as a Lucene query you can automate as an alert.
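A minimal sketch of that kind of alert, reusing the hypothetical index and fields from the summary-step sketch above, with a placeholder Slack incoming-webhook URL; the team's actual alerting layer sits on top of Elasticsearch/OpenSearch, so this only illustrates the idea:

```python
import requests

OPENSEARCH_URL = "https://opensearch.internal:9200"               # placeholder
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXX"  # placeholder
FAILURE_THRESHOLD = 5                                             # the "X" from the talk

# Anything you can phrase as a Lucene query can drive an alert.
query = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {"query": 'status:FAILURE AND branch:"origin/master"'}},
                {"range": {"@timestamp": {"gte": "now-24h"}}},
            ]
        }
    }
}

failures = requests.post(
    f"{OPENSEARCH_URL}/ci-pipelines/_count", json=query, timeout=10
).json()["count"]

if failures > FAILURE_THRESHOLD:
    requests.post(SLACK_WEBHOOK, timeout=10, json={
        "text": f"CI alert: {failures} failed pipeline runs on master in the last 24h"
    })
```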
13:22
That's an alerting layer that we built on top of Elasticsearch and OpenSearch for that. One last note: I'm giving the examples with Slack because that's what we use in our environment, but you're obviously not limited to Slack; you have
13:40
support for many notification endpoints, depending on your systems: PagerDuty, VictorOps, Opsgenie, MS Teams, whatever. We personally work with Slack, so the examples are with Slack. So that's how we built observability into the Jenkins pipelines. But as we all know, especially here in the CI/CD devroom,
14:03
CI/CD is much more than just Jenkins. So what else? We wanted to analyze, if you remember the original requirements, flaky tests and test performance, following the same process: collecting all the
14:22
relevant information from the test runs and storing it in Elasticsearch or OpenSearch, and then creating Kibana dashboards or OpenSearch Dashboards. And as you can see, all the relevant usual suspects that you'd expect: test duration, failed tests, flaky tests,
14:41
failure count and rate, moving averages, failed tests by branch over time, all the things that you would need in order to analyze and understand the impact of your tests and the flaky tests in your system.
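For example, the failure rate per test behind such a flaky-test view could be pulled with a simple aggregation like the one below; the index and field names are hypothetical, and the real dashboards were built in Kibana/OpenSearch Dashboards rather than in code.

```python
import requests

OPENSEARCH_URL = "https://opensearch.internal:9200"  # placeholder

# Failure rate per test over the last 7 days, assuming each test result is a
# document with a `test_name` keyword field and a numeric `failed` field (0 or 1).
agg_query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-7d"}}},
    "aggs": {
        "by_test": {
            "terms": {"field": "test_name.keyword", "size": 20},
            "aggs": {"failure_rate": {"avg": {"field": "failed"}}},
        }
    },
}

buckets = requests.post(
    f"{OPENSEARCH_URL}/ci-tests/_search", json=agg_query, timeout=10
).json()["aggregations"]["by_test"]["buckets"]

for bucket in buckets:
    # Tests that neither always pass nor always fail are the flaky-test suspects.
    print(bucket["key"], round(bucket["failure_rate"]["value"], 3))
```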
15:01
And similarly, after visualize you can also report: we created reports to Slack, following the same pattern, and we have a dedicated channel for that. One important point is about openness: once you have the data in OpenSearch or Elasticsearch, it's very easy for different teams to create different visualizations on top of that same data.
15:21
So I took another extreme: a different team that didn't like the graphs and preferred table views and counters to visualize, again very similarly, test stats and so on. And that's the beauty of it.
15:40
So just to summarize: we instrumented the Jenkins pipeline to collect relevant data and put it in environment variables; then, at the end of the pipeline, we created a JSON with all this data and persisted it to Elasticsearch/OpenSearch; then we created Kibana dashboards on top of that data; and lastly we created reports and alerts on that data. So, four steps: collect, store,
16:07
visualize and report. So that was our first step in the journey, but we didn't stop there. The next step: we asked ourselves, what can we do in order to investigate the performance of
16:23
specific pipeline runs? You have a run that takes a lot of time, you want to optimize, but where is the problem? That's actually what distributed tracing is ideal for. How many people know what distributed tracing is, with a show of hands?
16:40
Okay, I see most of us; there are a few that know, so maybe I'll say a word about that soon. Very importantly, know that Jenkins has the capability to emit trace data, spans, just like it does for logs. So it's already built in, so we decided to visualize jobs and pipeline executions as distributed traces. That was the next
17:04
step. For those who don't know: distributed tracing essentially helps pinpoint where issues occur and where latency is in production environments, in distributed systems. It's not specific to CI/CD.
17:21
If you think about a microservices architecture and a request coming in and flowing through a chain of interacting microservices, then when something goes wrong you get an error on that request, and you want to know where the error is within this chain; or if there's latency, you want to know where the latency is. That's distributed tracing in a nutshell. And the way it works
17:41
is that each step in this call chain, or, in our case, each step in the pipeline, creates and emits a span. You can think about a span as a structured log that also contains the trace ID, the start time, the duration and some other context. Then there is a backend that collects all these spans,
18:01
reconstructs the trace and then visualizes it, typically in the timeline view or Gantt chart that you can see on the right-hand side.
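Conceptually, a span is just such a structured record; the fields below are illustrative, loosely following common tracing conventions, not an exact wire format.

```python
# An illustrative span for one pipeline step; all values are made up.
example_span = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",  # shared by all spans of the same pipeline run
    "span_id": "00f067aa0ba902b7",
    "parent_span_id": "53995c3f42cd8ad8",            # ties the step to its parent stage
    "name": "backend-smoke-tests",
    "start_time": "2023-02-04T09:12:41Z",
    "duration_ms": 184000,
    "attributes": {"ci.branch": "master", "ci.build.number": 1234, "ci.machine.ip": "10.0.3.17"},
}
```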
18:21
So now that we understand distributed tracing, let's see how we add this type of pipeline-performance tracing into a CI/CD pipeline. Same process: the first step is collect, and for the collect step we decided to use the OpenTelemetry Collector. Who doesn't know about OpenTelemetry, who doesn't know the project, just so I know what background to give? Okay,
18:43
I have a few, so I'll say a word about that. Anyway, I added the link, you see a QR code and the link at the lower corner there, for a beginner's guide to OpenTelemetry that I wrote; I also gave a talk about OpenTelemetry at KubeCon Europe, so you'll find it useful. But very briefly:
19:04
it's an observability platform for collecting logs, metrics and traces, so it's not specific only to traces, in an open, unified, standard manner. It's an open source project under the CNCF, the Cloud Native Computing Foundation.
19:22
It's a fairly young project, but at the time the tracing piece of OpenTelemetry was already GA, generally available, so we decided to go with that. Today, by the way, metrics is also soon to be GA, it's already a release candidate, and logging is still not there.
19:41
So what do you need to do if you choose OpenTelemetry? You need to set up the OpenTelemetry Collector, which is sort of an agent to send the data to. You need to install the Jenkins OpenTelemetry plugin, very easy to do in the UI, and then you need to configure the Jenkins OpenTelemetry plugin to send to the OpenTelemetry Collector endpoint, over the OTLP over gRPC protocol. That's the collect phase.
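The Jenkins OpenTelemetry plugin emits these spans for you; purely to illustrate what that looks like, here is a hand-rolled sketch using the OpenTelemetry Python SDK, assuming a collector listening on localhost:4317.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Send spans to an OpenTelemetry Collector over OTLP/gRPC (default port 4317).
provider = TracerProvider(resource=Resource.create({"service.name": "jenkins-pipeline"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("ci.pipeline")

# One span per pipeline run, with child spans per step (e.g. a Maven build step).
with tracer.start_as_current_span("pipeline-run", attributes={"ci.branch": "master"}):
    with tracer.start_as_current_span("build-backend"):
        pass  # ... run the build ...
    with tracer.start_as_current_span("smoke-tests"):
        pass  # ... run the tests ...

provider.shutdown()  # flush remaining spans before the process exits
```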
20:02
OTL P over gRPC protocol, that's the collect phase and after collect comes store For the back end we used Jaeger Jaeger is a also a very popular open source under the CNCF for specifically for distributed tracing
20:21
We use Jaeger to monitor our own production environment, so that was our natural choice for this as well. We also have a Jaeger-based service, so we just used that, but anything that I show here you can actually use with any Jaeger distro, whichever one you use, managed or self-served.
20:42
And if you do run your own, by the way, I added the link on how to deploy Jaeger on Kubernetes in production; you have a short link there that I added, a very useful guide. So what you need to do is configure the OpenTelemetry Collector to send, to export in OpenTelemetry Collector terms, to Jaeger in the right format,
21:05
all the aggregated information. And once you have that, then you can visualize. The visualize part is much easier in this case because you have the Jaeger UI with a predefined dashboard, so you don't need to start composing visuals. Essentially, what you can see here, on the
21:22
left-hand side, is this indented tree structure, and then on the right the Gantt chart; each line here is a span. It's very easy to see the pipeline sequence. The text is a bit small, but for each step of the pipeline you can see the duration, how much it took, and you can see which ones ran in parallel and which ones ran sequentially. If you have a very long latency
21:48
overall, you can see where most of the time is being spent, where the critical path is, where you'd best optimize, and so on. And by the way, Jaeger also offers other views, like
22:00
the recently added flame graph, and you have trace statistics and a graph view and so on, but this is what people are used to, so I'm showing the timeline view. So that's Jaeger. And of course, as we said before, CI/CD is more than just Jenkins, so what can we do beyond just Jenkins?
22:21
What you can do is actually instrument additional pieces, like Maven, Ansible and other elements, to get finer granularity into your traces and steps. So, for example, here the things that you see in yellow are Maven build steps. What before used to be one black-box span in the trace, suddenly
22:42
you can now click open and see the different build steps, each one with its own duration, each one with its own context, and so on. So that's, in a nutshell, how we added tracing to our CI/CD pipeline. The next step: as I mentioned before, many of the pipelines actually failed not because of the released code
23:03
but because of the CI/CD environment. So we decided to monitor metrics from the Jenkins servers and the environment; that goes for the system, the containers, the JVM, essentially anything that could break irrespective of the released code, following the same flow. So, the first step, collect: we
23:24
used Telegraf; we use that in production, so we used it here as well. That's an open source tool by InfluxData. And essentially you need two steps: you first need to enable, configure sorry, Jenkins to expose metrics in Prometheus format (we work a lot with Prometheus for metrics, so that was our natural choice),
23:48
and that's a simple configuration in the Jenkins web UI. And then you need to install Telegraf, if you don't already have it, and configure it to scrape the metrics off of the Jenkins server using the Prometheus
24:03
input plugin. Okay, so that's the first step. The second step is on the store side: as I mentioned, we use Prometheus for metrics, so we used that here as well (we even have our own managed Prometheus, so we used that), but anything that I show here is
24:20
identical whether you use Prometheus or any Prometheus-compatible backend. Essentially, you need to configure Telegraf to send the metrics to Prometheus, and you have two ways to do that: pull mode or push mode. Pull mode is the default for Prometheus: essentially, you configure Telegraf to expose a
24:42
/metrics endpoint, which can then be exposed for Prometheus to scrape from; if you want to do that, you use the Prometheus client output plugin. Or, if you want to do it in push mode, then you use the HTTP output plugin; just an important note there, make sure that you set the data format to Prometheus remote write.
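The Telegraf side is plain configuration (its Prometheus input and output plugins), so there is nothing to code there; purely to illustrate the pull model being described, here is a minimal Python process exposing Prometheus-format metrics on /metrics for a scraper such as Prometheus or Telegraf's prometheus input to pull. The metric names are made up.

```python
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Hypothetical CI metrics; Jenkins' own Prometheus endpoint exposes the real ones.
failed_builds = Counter("ci_failed_builds_total", "Failed builds", ["branch"])
last_build_seconds = Gauge("ci_last_build_duration_seconds", "Duration of the last build", ["branch"])

if __name__ == "__main__":
    start_http_server(9188)  # serves /metrics for the scraper to pull
    while True:
        # In reality these would be updated by the CI system, not random numbers.
        last_build_seconds.labels(branch="master").set(random.uniform(300, 900))
        if random.random() < 0.1:
            failed_builds.labels(branch="master").inc()
        time.sleep(15)
```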
25:00
So that's the store phase. And then, once you have all the data in Prometheus, it's very easy to create Grafana dashboards on top of that, and I gave some examples here: you can filter, of course, by build type, by branch, machine ID, build number and so on. And
25:20
you can monitor, in this example it's system monitoring, so CPU, memory, disk usage, load and so on. You can monitor the Docker containers: the CPU, inbound/outbound IO, disk usage, and obviously the running, stopped and paused containers by Jenkins machine, everything that you'd expect. And
25:44
JVM metrics, Jenkins being a Java implementation: thread count, heap memory, garbage collection duration, things like that. You can even, of course, monitor the Jenkins nodes, queues and executors themselves. So again, you have an example dashboard here: you can see the queue size, the breakdown of the Jenkins jobs, the count executed over time,
26:06
broken down by job status, and so on. So these are the types; there are obviously lots of other visualizations you can create, and you can also create alerts, but I won't show that for lack of time. So, just to summarize what we've seen:
26:23
treat your CI/CD the same as you treat your production. For your production you use whatever, Elasticsearch, OpenSearch, Grafana, to monitor, to create observability; do the same with your CI/CD pipeline, and preferably leverage the same stack, the same
26:41
toolchain for that, and don't reinvent the wheel. That was our journey. As I mentioned, we wanted dashboards and aggregated views to see across pipelines, across different runs and branches, over time and so on. We wanted historical data and controlled persistence, off of the Jenkins servers, to determine the duration, the retention of that data.
27:05
We wanted reports and alerts, to automate as much as possible. And lastly, we wanted test performance, flaky tests and so on; you saw how we achieved that. Four steps; if there's one thing to take out of this talk, take this one: collect, store,
27:21
visualize, and report and alert. And what we gained, just to summarize: a significant improvement in our lead time for changes, in our cycle time, if you remember the DORA metrics from the beginning. Along the way we also got an improved developer-on-duty experience,
27:42
much less suffering there. It's based on open source, very important, we're here at FOSDEM, so: based on OpenSearch, OpenTelemetry, Jaeger, Prometheus, Telegraf. You saw the stack. If you want more information, you have here a QR code for a guide to CI/CD observability that I wrote, so you're welcome to take the
28:03
Bitly short link and read more about this. But this was very much it, in a nutshell. Thank you very much for listening, I'm Dotan Horovits, and enjoy the rest of the conference. I don't know if we have time for questions.
28:20
No? So I'm here if you have questions, or if you want a sticker, and may the open source be with you. Thank you. We do have time for questions if there are any; we can take a few minutes. Is that a question?
28:46
Thanks. So, have you considered persistence, how long do you store your metrics and your traces? Have you wondered about that, like for how long at a time you store your metrics? So, that was part of the original challenge when we used the Jenkins persistence, because when you persist it on the nodes
29:04
themselves, you're obviously very limited; there's the plugin that you can configure per days or per number of builds and so on. When you do it off of the critical path, you have much more room to maneuver. And then it depends on the amount of data you collect: we started small, so we collected it for longer periods.
29:23
The more it went on, the more the appetite grew, and people wanted more and more types of metrics and time series data, so we needed to be a bit more conservative. But it's very much dependent on, you know, your practices in terms of the data. And the other question was more about the process: so it's iterative, as you explained it?
29:44
Iterative is the best, because it really depends: you need to learn the patterns of your data consumption, the telemetry, and then you can optimize the balance between having the observability and not overloading or overpaying in cost. Right, thank you, very interesting. Thank you. There was another question in the back. Yeah.
30:00
Thank you. So, what was the most surprising insight that you've learned, good or bad, and how did you react? I think I was personally most surprised by the amount of failures that occur because of the environment, and what kinds of things, and how simple it is to just kill the machine, kill the instance, let the autoscaler spin it back up, and you save yourself a lot of hassle and a lot of waking people up at night.
30:25
So that was astonishing, how many things are irrespective of the code and just environmental. And we took a lot of learnings out of there, to make the environment more robust, to get people to clean up after themselves, to automate the cleanups and things like that. That, for me, was insightful. Thank you.
30:41
Any other questions? Then I have one last one, sorry. My question is: who are usually the people looking at the dashboard? Because I maintained a lot of dashboards in the past, and sometimes I had the feeling that I was the only one looking at those at work. So I'm just wondering if you identified the type of people that really benefit from those dashboards.
31:00
So it's a very interesting question, because we also learned, and we changed the org structure several times, so it moved between dev and DevOps. We now have a release engineering team, so they are the main stakeholders looking at that. But this dashboard is the go-to, as I said, for the developer on duty, so everyone that is now on call needs to see that, that's for sure. And
31:25
there's the tier 2, tier 3, let's say the chain for that. It's also used at a high level by the team leads on the developer side of things. So these are the main stakeholders, and it depends on whether it's the critical path of the developer on duty and the tiers, or if it's
31:41
the overall thing, the health state in general, by the release engineers. Thank you very much, everyone.