Consuming Open Source Configuration
Formal Metadata

Title: Consuming Open Source Configuration
Title of Series: FOSDEM 2015 (4 / 150)
Author: Spencer Krum
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/34486 (DOI)
Production Year: 2015
Transcript: English (auto-generated)
00:05
Just put a thumbs up or something if you need it a little louder, and I'll try to get it going. OK. I can't tell if we're closing the doors or not; anyways, we can just roll with it. This talk is called Consuming Open Source Infrastructure.
00:25
So who am I? Well, my name is Spencer Krum. I work at HP. My primary open source contributions are to the Puppet ecosystem and to the OpenStack ecosystem. What we do is we run a CI system inside HP.
00:40
And it turns out that it's a clone of an upstream OpenStack CI system. And Liz will be talking about that directly after this talk. So running somebody else's infrastructure as your own, it turns out, is not super trivial. I thought it would be a lot easier than it ended up actually being. And so this talk is going to cover sort of why that was
01:03
and what we've gone through, and stuff like that. So the target audience here is really operations people. There are three main groups I want to talk to today. The first are people who are currently pulling down someone else's configuration, consuming someone else's open source configuration. The second are people who are considering doing that, maybe making plans.
01:22
And the third group are people who are producing something in the open source and want to learn how to do that a little bit better; as somebody who's actually consuming it, I can give you some feedback. So this is very much a "what are we doing" talk. It will talk about where we actually are as we canoe across the river, which does not
01:40
mean that I'm going to give you 10 steps to do it easily. It means I'm going to give you a few key takeaways. And that's kind of what you're going to get out of this talk. And I'll try to slow the narrative down a little bit when we get to those takeaways so that we can really focus on them. And also, we're going to be fine on time. So feel free to get loud and ask a question immediately
02:02
when you think of it. So we always define our terms when we start up something. So open source. Open source to me means something that has an OSI approved license on it. But it also means a little bit more than that. So when I first learned about open source, I was taught by Bart Massey, who's a professor at PSU.
02:22
And what he said is that you can write a piece of open source software, and then put it on a hard drive, and then put that hard drive in the basement. And it's just as open source as it was before. And so there's this element of being able to share it. And that is kind of required. So I consider it only really an open source infrastructure
02:42
if you're actively doing your best to share that code with people, not just licensing it and hiding it somewhere. And so that means you put it somewhere visible. There's another concept called open development. And what that means is that it's not really open source unless it's developed in the open. So for instance, if everybody at a company
03:00
goes into the company from 9 to 5, and they have meetings inside the company, they have internal emails in the company, all the code review is internal to the company, and then a cron job runs every Friday that pushes it to GitHub: that's not openly developed, because there's no forum which anybody can join and communicate in. So, infrastructure.
03:20
Well, this is kind of the beginning of an infrastructure. It's some kind of a plan. And we can take that plan, that diagram of things connected to each other, and using configuration management tools like Puppet, we can put that into code. And that code can be committed to some kind of version control repository, and then it can be shared. And now that it's code, it can be OSI licensed,
03:40
and we can start talking about it: infrastructure as code. So consuming. What does consuming mean? Consuming literally means to eat. But it basically means using something you yourself did not write, that your organization did not write; something you got from somewhere else to get started. I'm also going to use words like upstream and downstream. I'm going to say you, me, us, them, we.
04:03
And what I mean by that is that I consider myself a downstream. We at HP, we are a downstream. We consume from upstream. And upstream is OpenStack's continuous integration team. And when I say we, I generally mean downstream. And when I say they, I generally mean upstream.
04:20
But we'll talk a little bit more about that later. So it's not just OpenStack that's doing this, not by a long shot. The Wikimedia Foundation has an open source infrastructure. Oh, my bad. Jenkins, R. Tyler's running around here. He's always looking for people to help
04:41
with the Jenkins infrastructure. And Mozilla's infrastructure is also open source. So there's some takeaways immediately on why this is good. And the simplest thing to say is that we benefit from consuming infrastructure because we don't have to do all the work ourselves. Just right away, the fact that they wrote a 10 line shell
05:01
script and we get to use it means we have to do significantly less work than we used to. It's the inverse of the classic not-invented-here syndrome. When I was on a robotics team at the end of my college career, we were building a quadcopter. And one of our experienced mentors said, you know, you should just start with the Parrot AR drone and then add your features on top of that. We're like, no, we need to write our own. And then 10 months later,
05:21
we finally had something that would fly and we could start working on the real project. So we don't have to do all the work ourselves. This is a good thing. But I think there's a more important lesson. We benefit from consuming the infrastructure because we have confidence that the architecture is viable.
05:41
And so what that means is when you send out a team of architects, a team of DevOps people, to go build something, they're gonna spend some time planning it. They're gonna spend some time mapping it out. They're gonna spend some time building it. And all that time is essentially risk. There's projects that are waiting for them to complete. There's developer time you're spending on it. And if it doesn't come to something that's useful to you,
06:01
that's all lost. But if you know that someone else has used this implementation successfully, that gives you some confidence that you're not gonna end up losing that much, which basically reduces the risk. So the first thing we said was an advantage was that we didn't have to write everything ourself.
06:21
And there's a corollary to that, which is that any departure we take from the reference implementation is immediately technical debt. That's something we have to think about when we consume someone else's infrastructure. So, about us: our team at HP is called the Ghostbusters,
06:43
and our project is called Gozer. And we consume the OpenStack CI infrastructure. And this is kind of where the rubber meets the road. So there's some tools they use. They use Puppet for configuration management. They use Ansible for orchestration. They use Git for version control. And they run all their infrastructure on OpenStack.
07:01
And so they have two OpenStack clouds. They have a Rackspace cloud and HP cloud, and they kind of split their servers between them. So this is kind of what operators do with Git these days.
07:23
Just put the Git on everything. Which is a good thing. Back when I was at PSU, at Portland State University, we used to tell other people that a directory was managed by Git by touching a file called, in all caps, USE_THE_GIT_LUKE. It's a good thing, it's a good thing. It means everything's in Git,
07:40
which means it's shareable, which is actually really cool. So if you look at the goals for both of these teams, both teams are building CI infrastructure for OpenStack developers. Upstream is building it for OpenStack, or upstream, developers. And downstream at HP, we're building it for HP developers. So what HP has is an OpenStack product. It's called Helion. You should buy it.
08:00
And when we take standard OpenStack, we apply a few HP internal patches, and then we test those HP internal patches with HP internal CI. And it attempts to be a pretty good mirror of what happens upstream. And so that means we look like this. But if you take a little further look,
08:21
our goals are actually somewhat divergent. So the upstream OpenStack team has to run a lot of daemons that we don't have to run. They have to run a bunch of infrastructure for publishing packages to PyPI. They have to run a lot of infrastructure for voting and other technical or governance needs of the OpenStack foundation. They have to run a bug tracker. We don't have to run those things,
08:41
because in many cases, those don't apply. And in terms of bug tracking, HP uses JIRA, so that choice is already made for us. But it also sucks downstream, because downstream, we started out as a team that would provide CI infrastructure to OpenStack developers within HP. But they got so excited about the idea that now we have developers all over HP coming out
09:01
of the woodwork saying, run our tests. And so we have .NET tests, and we have Docker tests, and we have VMware tests, and we have Java tests. And some of those can map to the models that we learned from upstream. But some of those can't. And every time we roll some custom solution ourselves, we've incurred technical debt.
09:21
And I really don't know how to make a Venn diagram with open source software, by the way. This was like 45 minutes of my time. It's just unreasonable. It's pretty bad. So what are some takeaways?
09:41
So for a downstream, what you can do as you get started with this project, early in the project, is look at (remember that diagram of everything connected to everything else) what you do not need to replicate. Resist the urge to replicate all the things, because that's just work, and maybe you don't have to do it. For upstreams, keep track of what you're coupling
10:02
to what, and where your assumptions are. And obviously it's really hard to figure out what your assumptions are, but here's some ground rules. Don't imply colocation: if you have two services that are colocated on the same host, talk over a localhost socket; don't just go directly to file system access. That way a downstream can split them. Don't assume user ID and group membership:
10:21
if you SSH as one user to another user on a different host, don't assume that you have group write permissions in some directory somewhere. And this is kind of a good one: firewall rules and network proximity slash layout. It's really easy to think, well, our network is set up like this. And maybe you can't change that, maybe that's just the way it is, but you can document it, which will help your downstreams figure out what to do.
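To make the colocation rule concrete, here's a minimal Puppet sketch (the class, parameters, and file path are hypothetical, not from the upstream modules): the location of the collaborating service is a parameter that defaults to localhost, so colocation stays a default instead of a baked-in assumption.

```puppet
# Hypothetical sketch: talk to the collector over a socket instead of
# reading its spool directory off the local filesystem, so a downstream
# can split the two services onto separate hosts.
class logclient (
  $collector_host = 'localhost',  # a downstream can point this elsewhere
  $collector_port = 4730,
) {
  file { '/etc/logclient.conf':
    content => "collector = ${collector_host}:${collector_port}\n",
  }
}
```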
10:40
So let's talk about the network. Upstream, the kind of reference implementation of this CI system has two OpenStack clouds they work with. One is Rackspace, one is HP Cloud. And so every server they spin up has a public IP address, and they firewall it all the way off,
11:01
and then whitelist select hosts. And so what that means is that any host on the network, or any host in the infrastructure that wants to make a connection to any other host can do so as long as it's plus one by the administrators. When we went to do that at HP, we made a mistake of not respecting that network
11:21
topology. And so we built it out with two data centers. Well, one data center: a physical data center in Fort Collins, Colorado, an HP data center. And we put some of our infrastructure on HP Cloud. And that represented a different network topology, because connections from the data center to the cloud were fine. That works. But the cloud can't make connections
11:41
into the data center. And so the reference implementation has push, push, push, push, push, push, push. And we had to hack it to be push, push, pull, push, push, pull, which is hard enough to begin with. But we had to somehow trigger pulls. And so it ended up looking kind of like this.
12:07
And there was another issue. So anybody who's ever worked in a corporate data center knows about the http_proxy environment variable. And so what that meant is that anywhere inside HP's network, if you want to make an outgoing network request, you're allowed to do that as long as you
12:20
go through the HTTP proxy. And what that meant was we had to make these small little changes to add http_proxy support: to add that to the environment, to add that to the daemons, or whatever. And so that meant that we had to go upstream. So let's talk about upstream contribution. Downstreams must contribute upstream. This is probably the central thesis of my presentation
12:42
today. Every single member of the Ghostbusters team contributes upstream. And we don't do it 50% of our time; there are some days it's 100% of our time, there are some days it's 10%, and there are some days it's not at all. There's practical and social reasons why you have to do that. The practical reasons are stuff like this. This is the Puppet templating language,
13:02
by the way. You just have to sneak in there and add a little "if http_proxy, do something". And if you're a downstream, this is your bread and butter. You're going to submit this kind of change to upstream a lot. It's a small little change. It changes behavior slightly, not in a big way. It's off by default. And you can turn it on when it's consumed downstream.
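The kind of patch he's describing might look like this ERB sketch (the file and variable wiring are illustrative, not the actual upstream change): nothing is rendered unless the consumer sets the proxy variable, so upstream behavior is unchanged by default.

```erb
<%# Sketch of a daemon's environment-defaults template. -%>
<% if @http_proxy -%>
export http_proxy=<%= @http_proxy %>
export https_proxy=<%= @http_proxy %>
<% end -%>
```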
13:20
But there's also social reasons. And the social reasons are important. And you can think of the social reasons as an exercise in social capital. And how do you get social capital? Well, you get social capital through trust. It's essentially just a trust exercise.
13:42
And so these are the things that we get from upstream. We get testing. They test our stuff for us. They write a bunch of code that we consume from them. They review our code. They support us when we have issues. And we get to read their docs. And so when I first started writing the slide,
14:01
I had a long list of things we provide back to them. But then I realized it's the exact same list: what we push upstream is what we get from upstream. And the reason we have to do all of those things is that when we have an issue with one of them, we need it to be important to the upstream core developers that our issue gets solved.
14:22
And if you show up in an IRC channel or on a mailing list and say, it's all broken for me, and you're someone who is not known to that community, they will spend 10 seconds trying to help you and then get on with their lives. But if you are a consistent contributor, if you're solving problems for them, they're going to go that extra mile for you. And it becomes a two-way street. And it's a trust relationship.
14:43
And so the takeaway from this is: context matters. We got bit by the network. And I think after this presentation, if any of you sat down to try to emulate something, you'd look really hard at the network. But there might be other assumptions that you're not thinking of when you're trying to consume like that. And some of the things I have in my mind that
15:01
could matter are geographic location, network latency, and maybe virtualization versus bare metal. But those are just examples. But this is really what comes out: upstream contribution is not an option. You have to contribute upstream. If you don't contribute upstream, you might as well just stay at home.
15:20
There's really no point. So there's different ways to contribute upstream, different levels of upstream contribution. And you can kind of describe it in two camps. You can say successful upstream contribution and unsuccessful contribution upstream.
15:42
And these are the same rules for contributing to any public open source project, except this one happens to be infrastructure related. And it's simple things. It's being consistent. It's being reliable and frequent in your reviews. It's submitting small patches instead
16:00
of big giant dump patches. It's having a regular cadence. It's showing up to the meetings. And it's doing things that are so froufrou, like active listening. And I think we all know what an unsuccessful contribution looks like, right? It's like somebody showing up with a 3,000 line patch saying, hey, I solved all the problems.
16:20
Just merge it, and we're good. And the developers are like, hold on, no. And the developers say, I have all these technical complaints with your patch. And the downstream contributor says, no, man, it works on our end. Just merge it. And then they get hostile. And they say, no, it really works.
16:40
This doesn't work, blah, blah, blah, blah, blah. And that's the kind of person you don't want to be. And if you are that kind of person, you're not going to be successful in this model of consuming and giving back. And in fact, consuming is really the wrong word, because it's a two-way street. OK, so now we can talk a little bit more technical. And I know this is the configuration management room, not the puppet room.
17:01
But we're going to talk a little bit about the Puppet situation in OpenStack Infra right now. So when I first got there, about seven or eight months ago, this was the situation. Upstream used Hiera for secrets only. They were on Puppet 2.7, which is really bad, even for seven months ago. They used a site.pp, so they didn't have an ENC (external node classifier).
17:20
They had a monolithic repository, which meant all of their kind of modules, all the code they wrote, was in one big Git repository. And they had a modules dir in there, and then 60 or so modules. They were stuck on Apache module version 0.0.4, which is from like 2012.
17:41
And you bring that up, so: the Apache module 0.0.4 is very old. It's not that there are features missing from it; you can get anything you want done with Apache module version 0.0.4. But because there's an API change and you can't upgrade to a modern Apache module version,
18:02
that means you can't use any module that depends on a modern Apache module version. So that only sends you further down into the system of consuming old crap and writing your own crap. They had a forked vcsrepo module. In fact, they still do, which means that there's openstackci-vcsrepo instead of puppetlabs-vcsrepo,
18:23
which pisses off the module installer like nobody's business. They had a decent role system; we'll talk about that next, and it turned out to be really good. And they had a shell script for installing Puppet modules. So what that means is that rather than using r10k or something, it's just a shell script that wraps puppet module install.
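As an aside, the site.pp setup mentioned a moment ago might look like this minimal sketch (node and class names are illustrative): with no ENC, node classification lives directly in the manifest.

```puppet
# site.pp sketch: each host gets a node block that includes its role.
node 'jenkins.openstack.org' {
  include openstack_project::jenkins
}

node 'zuul.openstack.org' {
  include openstack_project::zuul
}
```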
18:45
So for the roles, they had a pretty good role system. There was an entire module called openstack_project, and it had classes like openstack_project::jenkins. And that's where they tried to put most of their OpenStack-
19:01
specific stuff. And then they had a module, jenkins, that they wrote themselves, that was more or less free of hardcoded variables. Of course, there's hardcoded variables all over the place, but at least there's a way forward here, right? And the same with Zuul, which is a daemon we run, which Playa will probably tell you about. And so when we were bootstrapping this,
19:23
when we were first getting things ready to go, we did a deep copy of openstack_project to a new module called hp. And then we did a find and replace: everywhere it said openstack_project, we replaced it with hp. So now we had our own hp::jenkins that could go use the kind of public jenkins module. And what we actually did is two Git repositories.
19:42
So we pulled down our config repo, which basically just had the hp module in it, and theirs, which had openstack_project as well as a bunch of other modules. And we put both of those directories in the module path. And so openstack_project was still available, but it wasn't referenced anywhere in our site.pp.
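A rough sketch of that layering (the class names follow the talk; the parameter and values are made up for illustration):

```puppet
# Upstream's role class wires site-specific data into a generic module.
class openstack_project::jenkins {
  class { '::jenkins':
    vhost_name => 'jenkins.openstack.org',  # illustrative parameter
  }
}

# The downstream deep copy: same shape, HP-specific data, consuming the
# very same generic jenkins module.
class hp::jenkins {
  class { '::jenkins':
    vhost_name => 'jenkins.example.hp.com',  # illustrative parameter
  }
}
```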
20:02
So it actually worked pretty well. And that gave us the opportunity to do some really neat stuff. We can change anything we actually need to change in hp without touching upstream. And downstream, we can just run git pull or git cherry-pick to pull changes from upstream. So that's a really great way of describing
20:20
something that actually went pretty terribly. So we fork every repo we use, and you can argue about that decision or not, but we need to be able to control what's in our environment. So we kind of need to control it, so we kind of need to fork it. And we very rarely sync those forks from upstream. We are 1,000 commits behind, like no joke.
20:41
Which isn't actually as bad as it sounds. But the worst part is that we're also several patches ahead. So there's places where we needed to change things, and now that we have changes of our own, we can't just pull anymore; it's going to be merge conflicts. It's pretty rough. And even though OpenStack Infra upstream
21:02
has moved to 3.7, we're still stuck on 2.7 because we can't move fast enough to pull. So what are the takeaways here? Find your intermediary between upstream and downstream. And I'm not saying that you make a module called
21:21
openstack_project. I'm saying that you find someplace that both you, as downstream, and they, as upstream, can hardcode variables without stepping on each other's toes. And I think the traditional way, if you were greenfielding this, is that you would probably do that today with Hiera.
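For instance, a minimal sketch of such an intermediary with Hiera (the key name and values are hypothetical): upstream ships a default in code, and a downstream overrides it in its own data file without patching anything.

```puppet
# Look the value up in Hiera, falling back to the upstream default.
# A downstream sets jenkins_vhost_name in its own Hiera data instead
# of carrying a patch.
$vhost_name = hiera('jenkins_vhost_name', 'jenkins.openstack.org')

class { '::jenkins':
  vhost_name => $vhost_name,
}
```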
21:40
There's an interesting story about that, which is that the first time I tried to do this, I suggested Hiera. And I wrote some patches and I sent them up, and they weren't merged. And you can argue about why that wasn't happening, but I'll tell you why I think it didn't work. And I think it was because I was a relatively new contributor who had not built that trust relationship, that back-
22:01
and-forth trust, codependence relationship, with the upstream cores. And I was proposing a patch that was very complex. It was ripping out a lot of things, changing things that were hardcoded into variables, doing Hiera lookups. And so if you consider those two factors together, it's a new contributor we don't trust
22:21
and a big bite of complexity. And that's an obvious no. I think today, now that I have a fair amount of social capital with these people, I could say, look guys, I think we need to go this Hiera route. And I know a lot of them and I know how to address their concerns. I know how to talk to these people. And as a result, I could probably get that merged now.
22:40
And it's just one example of how you have to be serious about contributing upstream; otherwise this model can't work. The other thing is: use small syncs. If you go upstream and you drop a 3,000 line patch bomb, you're going to be told no.
23:00
You're going to be sent away. They're going to laugh you out of the building. Similarly, if you wait three months before you sync from upstream, you're going to have the equivalent of a 3,000 line patch that you're trying to dump into your infrastructure. And you're going to be like, no, I don't want to do that either. So the only way forward is to sync often, with small little syncs. And what that means is you need some kind of testing framework.
23:21
And one of the errors we made is we didn't set up some minimal acceptance test framework at the beginning. And that has put us very far behind. And that's what I mean by being a reflection of upstream: if it wouldn't be accepted upstream, why would you accept it downstream?
23:44
So this is my model for keeping up with upstream. I don't know if you guys are familiar with the story of Sisyphus. If I recall correctly, Sisyphus was a Greek who pissed off a god. And so he was sentenced for life to push a boulder up a hill. And as soon as he pushed it up the hill,
24:01
it would roll down again, and he'd have to push it up again. Or he could be a kitten on a sled. So let's talk about what has happened upstream since then. And I am pretty serious when I say that the Ghostbusters are a big part of these changes. We've upgraded from Puppet 2.7 to Puppet 3. Also note that I use the word we: I totally consider myself a part
24:20
of the OpenStack upstream team. And I think they consider me a part of the team, too. And that's your goal. We've split the data into a separate Git repository. And interestingly enough, we did not actually use Hiera. So there was a lot of hardcoded data. There's some config files that Playa will tell you about later. But there are these giant blobs of data, like YAML: which jobs to run,
24:41
and how many projects there are, and who's allowed to merge what where. And as kind of a Puppet fanboy, my inclination is to put that all in Hiera. But what the team figured out is that they can put these flat config files that configure a daemon in a Git repository, and just tell the vcsrepo type
25:00
to make sure it's over there, point Zuul at it, point the daemon at the config file, and you're done. You don't actually need the complexity that Hiera provides, which turned out to be a really cool result.
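A minimal sketch of that pattern (the path and repo URL are illustrative):

```puppet
# Keep the daemon's flat config files in their own git repo and let
# Puppet ensure the checkout is present and current; the daemon just
# reads the files out of the working copy.
vcsrepo { '/etc/project-config':
  ensure   => latest,
  provider => git,
  source   => 'https://git.example.org/project-config.git',
  revision => 'master',
}
```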
25:20
It also means that when someone pulls down our configuration and tries to use it downstream, they just have to add their own Git repository full of config files. It's pretty slick. We've split the monolithic repo into repos for modules. So like yesterday or something, Jeremy sent out an email that said: we announce that we have 61 new modules. And they're all pretty OpenStack-specific, that's totally true. But now that they're all split out individually and they can be published to the Forge,
25:41
we're hoping to get folks like you involved in maintaining and using them. There's machinery to release modules to the Forge using Puppet Blacksmith, more or less the same way we release PyPI packages right now. So if we want to release a new version of, say, OpenStack Nova, what somebody who's core on Nova does is make a signed Git tag,
26:02
and push that Git tag into our code review system. And then machinery fires, and it ends up on PyPI. So now you push a signed Git tag of a Puppet module, some machinery fires, and a new version ends up on the Forge, which is a great pipeline. So, I said earlier that the Apache 0.0.4 module was pretty good.
26:27
OK, well, somebody on our team is really excited about the Apache 0.0.4 module. And that's because the Apache 0.0.4 module is weakly modeled.
26:40
So in a modern Apache module, you have defined types for mod_rewrite rules, for example. And so you have to not only understand your Apache configuration, but you have to translate it into the Puppet DSL. What the Apache 0.0.4 module lets you do is just say: here's a template, put that in my vhost. And so that means you, as an administrator,
27:02
can write native Apache config, and Puppet will figure it out. Puppet won't have any visibility into the resource it's creating, but it can get the job done. And so some people think that that's feature complete. It's weakly modeled versus strongly modeled. It's just as good.
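A sketch of the weakly modeled style, assuming the old define's template parameter (the site name and template path are made up):

```puppet
# Old-style apache::vhost: hand Puppet a raw vhost template written in
# native Apache syntax. Puppet has no visibility into the individual
# directives; it just lays the rendered file down.
apache::vhost { 'ci.example.org':
  port     => 80,
  docroot  => '/srv/www/ci',
  template => 'mysite/vhost-ci.conf.erb',
}
```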
27:20
And so what we've actually done is another deep copy and another find and replace: we've created an OpenStack httpd module, which means that anybody who's a stickler and wants to use the old one can use that one, and it can coexist in the same module path with a modern Puppet Labs Apache module. That means we can pull in a new one and there's an actual upgrade path. It also means that anybody who wants a weakly modeled Apache module can use ours. We have made our install_modules.sh script super smart.
27:44
It now has three different types of modules: modules to install from the Forge, modules to install from Git, and modules to install during integration testing. And that ties into the last thing, which is that we've built an integration testing system upstream, so that when you propose a change to the Jenkins module,
28:00
it tests that with all the other modules and the site.pp upstream, to make sure that everything works. Now, it's not the best testing. It's kind of weak testing, because all it really does is make sure that Puppet compiles, but that's better than where we were before.
28:24
So what's planned upstream? We're going to put semver on all the modules, and we're probably going to switch away from using the Forge and go to straight Git clones, because we're tired of the transitive dependency problem in Puppet. And we're going to figure out some kind of a fix for our vcsrepo. There's a couple of ways we can go on this.
28:41
We can either change our module to provide a new provider for the Puppet Labs vcsrepo, or we can try to merge the behaviors of ours into the Git provider of the Puppet Labs vcsrepo module. We haven't really figured that out yet. Yeah.
29:06
So when you use puppet module install, you sometimes have a transitive dependency problem. And so, okay, the question was: why do we want to go to Git instead of using puppet module install? And the answer is that there's a transitive dependency
29:23
problem where you ask for, say, puppetlabs-postgresql, right? And a dependency of that is the concat module and the NTP module. And so it'll pull down the NTP module, but if you already have an NTP module installed, it will fail to install the NTP module,
29:40
and you'll get a version mismatch, and maybe the Postgres module won't work. The Puppet module tool is also kind of bad at identifying that this problem has occurred and reporting it to the user in a way that's detectable by script. Does that answer your question?
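The "straight Git" plan mentioned a minute ago might look like this minimal Puppetfile sketch (as consumed by tools like r10k; the URLs and refs are illustrative): every module is pinned explicitly, so the Forge tool's dependency resolution never runs at all.

```
# Puppetfile (sketch): pin each module to an explicit git ref instead
# of letting `puppet module install` resolve transitive dependencies.
mod 'ntp',
  :git => 'https://github.com/puppetlabs/puppetlabs-ntp',
  :ref => '3.0.4'

mod 'postgresql',
  :git => 'https://github.com/puppetlabs/puppetlabs-postgresql',
  :ref => '4.1.0'
```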
30:02
So what's planned downstream is that we're gonna start dogfooding the OpenStack product that HP develops, which means that we're gonna change from a network that does not reflect the reference architecture well to a network that does. That's gonna be huge. We're planning to upgrade to Puppet 3. That's in progress, but we're operators, right?
30:21
So we're fixing the machine while it's driving, and that's scary. We have minimal, kind of weak, but effective testing upstream that we wanna bring downstream. And now that all the modules are split out individually, we're 1,000 commits behind across 61 modules, right? But remember, we're only using a subset
30:41
of OpenStack Infra's tooling. So we're not actually 1,000 commits behind. We're more like 80, and 80 is doable, especially when you can take the commits that went to the Zuul module, and the commits that went to the Jenkins module, one module at a time. It's also worth noting that OpenStack Infra sets up a lot of daemons.
31:01
Have you ever been so mad at Google Hangouts that you set up an Asterisk server? These people were, but I love them. So, some closing thoughts. I was hanging out with Nigel Kersten from Puppet Labs the other day, and he was talking about the days of Puppet 0.24.
31:21
And he said at that time, everybody wrote code. And what he meant by that was that everybody who was using the product at that time was reading the source code and changing it to get the results they wanted. And so that means there's no
31:41
users who aren't reading code. With LibreOffice, for instance, a user can install it, run it, and click on buttons without reading any source code. That's kinda where these open source infrastructures are right now: you have to consider yourself a developer. You're gonna have to go in there and twiddle some bits. There isn't a "git clone; ./install".
32:02
It's not there yet. Maybe one day it'll become polished enough that there'll be infrastructures-in-a-box and you can download them and run them. And I think some organizations, Forj is one of them, are trying to build that. But right now, with OpenStack Infra, and probably the others that I mentioned at the beginning, you're gonna have to roll up your sleeves
32:21
and get in there and write some code. This is a bunch of links that you can use when you look at these slides on the internet. And that's my contact information. Are there any questions? Okay, well thank you for, oh yeah.
32:43
Yeah, yeah, yeah.
33:03
That is absolutely one of the paths we want our development to take, which is that the modules right now, despite best efforts, have cross dependencies, right? Hidden cross dependencies, the worst kind, right? So what we wanna do over the next several weeks is identify where those are and nip them in the bud.
33:20
And if we need to have a clear dependency, we need to call that out in the documentation and such. Yeah? Yeah, so usually there's, like, unit testing versus kind of acceptance testing, so, yeah.
33:46
I've done some work, so: Beaker was brought up. There's some work right now in OpenStack Infra that I sort of spearheaded, so that we can, using the OpenStack Infra nodepool system, spin up a machine and run Beaker tests on that machine. That was actually kind of difficult to do,
34:00
and the hack is kind of janky, because Beaker is an opinionated tool. Beaker wants to be in charge. It wants to decide when the VM spins up and when it spins down. It wants to be in charge of reporting results. And we already have a lot of tooling for that stuff that's considerably better. So two daemons that wanna be in charge, or rather a set of daemons and a run-on-command tool that
34:23
is kind of meant to be run on a laptop, don't really get along that well. But I'd be happy to talk about that after this.
35:41
So one of the things I did is I wrote a module called PuppetBoard, right? So there was a question (it was kind of long) whose idea was that you don't have real abstraction layers in Puppet. You still have to ask for what you want, even if you're at a high level. And so you can't ask for "web server".
36:00
You have to basically be specific about Apache versus NGINX, right? So in the PuppetBoard module, because I too got tired of forcing dependencies on people, there's a class you include to set up the PuppetBoard WSGI application, and then you include either a puppetboard::apache or a puppetboard::nginx.
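Roughly this shape, as a sketch (class names follow the talk; the published module may nest them differently, so check its docs for the exact names):

```puppet
include ::puppetboard          # sets up the PuppetBoard WSGI application
include ::puppetboard::apache  # ...or ::puppetboard::nginx: pick exactly one
```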
36:23
So that's provided by the public module and you just pick one. That's better than nothing, I guess. Okay, well thank you for your attention, everybody.