
Making Komplett big by going small


Formal Metadata

Title
Making Komplett big by going small
Subtitle
Making every mistake count
Number of Parts
96
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is also shared, in adapted form, only under the conditions of this license.

Content Metadata

Abstract
How we're scaling the architecture, e-commerce platform and business of one of Europe's largest e-commerce providers, and changing with the business as we go. Growth problems are good problems: they mean your business is doing well. How can we build an organisation and an architecture that give us room to grow and change, while still keeping our customers happy? At Komplett Group we are growing as an organisation, both in size and in scope. In this talk we dive into how the development team at Komplett is breaking up a ten-year-old platform. By making every mistake in the book, we're building a platform for our future! We'd like to share our experiences in scaling the architecture and the development team. We are breaking a monolith into microservices to match change within our organisation, and at the same time we have scaled our web development team from 5 to 50, and we're not stopping there!
Transcript: English(auto-generated)
So, hi everyone! Welcome! Thanks for being here at the very last talk, and our talk. Are we ready to start? Yeah, I think so. I just want to start, Thomas.
Okay, so thanks for coming to our talk, titled Making Komplett big by going small. Who are we? So, I'm Pavneet. I'm a web developer at Komplett. I'm also a squad lead. We'll get into that a little later.
And my name is Thomas. I'm a lead software architect on the web team in Komplett. And what are we going to talk about? Feedback? We are going to talk about how we've scaled our team and our group as we've changed.
We've gone from a small group to a rather large one. And we've had to make changes both to our architecture and our organization in this process. And that's what we're going to talk about today. So it's kind of an experience talk.
Right. So Komplett group. For those of you who aren't familiar with us, we are a group of web stores. We have 16 web shops active now. We have around 7.3 billion crowns revenue. Just under a billion dollars. 1.8 million active customers. 800 employees. And around 20 million uniques.
We suspect that means unique sessions in 2015. So these are some of our stores. May not be that easy to see.
But we're traditionally an e-commerce electronics store. But you may be able to see that we've now branched off into travel, insurance, car parts, baby equipment.
Amongst other things. And just last week we started the mobile phone company as well. So things are changing. But we've not always been as big as we were yesterday or today. We started in the mid 90s and then we had a very simple architecture.
So we had an online catalog and as a customer you contacted the sales department. So you actually phoned them. I know it sounds quaint but that's how it was done. You phoned them and they called out into the office and said everybody get off the system because I have a customer.
It was a one user system where we placed orders directly into the system when we had the customer on the line. That didn't scale very well. So we went on and created what would become our future web platform, Chrome.
We called it Chrome in the early 2000s. So we were before the browser. If anybody mentions Chrome, it's us. It's our name, okay? At least for this talk. So the Chrome to be, it wasn't called Chrome at that point.
But it was one of the first e-commerce stores in Europe at least that actually showed you an online stock status. So you could go to it and see if we had the product in stock. The customer connected to it directly and the orders were placed in the order database.
Still a quite simple architecture. Then we got a bit, I don't know, we got a bit ahead of ourselves and we bought an ERP system, a German three-letter acronym.
I'll shout at you. So from this point on it's actually called Chrome. Now we had to have a system that kept better uptime than this German system, because apparently such systems go down. So now we had to have a message queue talking to the system.
And Chrome placed orders through messages and the ERP system got them from there and populated our database directly. So that was our main architecture up until, well now really.
Pretty recently. So we're really still here. But we have done some changes and we are working to change it. But what Chrome did at this time was, and this will blow your minds: we connect to the database, get XML back, read that in, and do an XSLT transformation on that XML,
producing HTML with a DTD. So this is actual running production code from May 2008. Not today. It's much better today, right?
No, it's not. So this does three things that I want to point out. First we have a static method returning a string. This string will go directly onto the response. No controllers, no nothing. It's very efficient.
And what it does is that it goes directly to the database. Yep. It goes directly to the database, populating parameters or arguments? I don't know. I think the ones on the left are parameters and the ones on the right are arguments. Well, as you can see it shouts at the database because the database talks SAP.
And we populate the stored procedure call directly. We actually make an XML header down there. And what we get back, if you go to the next one, is then run through an XSL transformation and that's put directly onto the response.
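To make that flow a little more concrete, here is a minimal sketch of what such a legacy rendering path could look like. This is not the actual slide code; the class, method, stored-procedure and stylesheet names are invented for illustration, and only the shape of the flow (static method, direct stored-procedure call, hand-rolled XML, XSLT straight onto the response) follows the description above.

```csharp
// Hypothetical reconstruction of the kind of code described above: a static method
// that calls a stored procedure directly, wraps the result in XML, runs it through
// an XSLT stylesheet and returns raw HTML that the caller writes onto the response.
// Names such as GetProductPageHtml, SP_GET_PRODUCT and product.xslt are invented.
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ProductPageRenderer
{
    public static string GetProductPageHtml(int productId, string languageCode)
    {
        // Straight to the database: no controller, no repository, no nothing.
        using var connection = new SqlConnection("...connection string...");
        using var command = new SqlCommand("SP_GET_PRODUCT", connection)
        {
            CommandType = CommandType.StoredProcedure
        };
        command.Parameters.AddWithValue("@PRODUCT_ID", productId);       // "shouting" at the database
        command.Parameters.AddWithValue("@LANGUAGE_CODE", languageCode);

        connection.Open();
        var innerXml = (string)command.ExecuteScalar();                  // stored proc returns an XML fragment

        // Hand-rolled XML header around whatever the database gave us.
        var xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" + innerXml;

        // Transform the XML straight into HTML and hand it back as a string,
        // which goes directly onto the HTTP response.
        var xslt = new XslCompiledTransform();
        xslt.Load("product.xslt");

        using var reader = XmlReader.Create(new StringReader(xml));
        using var writer = new StringWriter();
        xslt.Transform(reader, null, writer);
        return writer.ToString();
    }
}
```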
This is how you create a million dollar industry, people. So this is high tech. Okay? So if you were wondering, we weren't really very happy with this.
It's nasty and it's error prone and it's global state all over the place. And every time we did something, we ended up breaking something else somewhere on our site. Usually gift cards. They always break.
So we'd add a button somewhere and gift card stops working and we remove a button and then gift card stops working. It's horrible, horrible. So what do we do? Well, like Thomas mentioned, we are at a bad place here.
What does that mean, a bad place? This platform has served us for many, many years. But the team, if you look at the team, we were five developers at this point. We were sitting in each of our corners in our room. Yes, we had a pentagonal room, yeah.
Doing each of our own things based on business needs. Which sounds fair enough, right? The thing is, we prioritized things based on the person that shouted the loudest. And whoever shouted the loudest got their thing through.
So as we did things, based on what you just saw, we started to grind to a halt in regards to our speed and our velocity. So we stopped delivering features, or slowed down delivering features, and started delivering more bug fixes.
Usually on gift cards. And when we didn't fix gift cards, we fixed something else and broke gift cards. So if you ever get a gift card, you can hope that it works. Sometimes it does. It's been fine, hasn't it?
No. This is fine, right? So basically, we were struggling. We were at a point where we needed to do something. We weren't really feeling inspired. And along came a business need. Yeah, so this was when mobile phones started to get popular.
I think you may remember that time. The heady days of the first iPhone. So we needed to be able to show a mobile site that worked on a mobile phone. So either we could produce an app, or we could produce a website that actually worked on mobile phones.
But we needed a mobile presence. It was a growing market, and business came to us with this need. And we said, sure, we can do that. We just need to rewrite everything.
Because if you've ever done XSLT transformations, taking XML and making DTD for HTML out of it, it's not really conducive to mobile phones rendering. It's nasty.
It's challenging. Challenging, yes. So what did we do? Well, then let's talk about Project K2. The K is for Komplett, the 2 is for 2. Yes, we're good at naming. There's more on that later.
So Project K2 is our response to the business need. And the business said, we need a mobile presence. We translated that to be, we need a mobile first web store. So at this time, responsive web, or adaptive web as we like to call it,
well, we lost that battle a long time ago. Responsive web was in its, I wouldn't say infancy, but it was starting up. We didn't have cross-browser support, and it was a little unsure. Another approach at this time was building standalone websites, which is what we went for.
And that's also what we proposed. It was a good thing because it meant we didn't have to touch the old code, which meant we could do a File, New Project. And we thought, okay, this is great. Now we can finally focus on the good stuff.
Do it right. Do it right, yeah. So, buzzwords. Anyone recognize these? We did all of them. Yes. So that's it. We went for code quality. We went for the SOLID principles. We were going to be test-driven. We pulled in CQRS, DDD, event sourcing.
That's the technical side. Then obviously on the process side, we introduced Agile and Scrum. We did pair programming, and with all this came new admin pages, a new infrastructure, a new architecture, as you see, and obviously a new deployment pipeline.
It will be easy. Yeah. And what this meant was we're taking our old monolith, turning it around, and creating a new one. And it was actually pretty good. We learned a lot. We learned to work as a team. We learned to write better code.
We learned a lot about Agile processes. And we delivered m.komplett.no, the first mobile version of our store, within around six months. So we remade it. We did it. And we made it work. It was good.
But what did it mean? Six months. But it didn't have gift cards. So we couldn't break anything. That's why it worked. All five of us had gone over to be working 100% on this platform, which meant six months without any new features on the existing desktop platform.
Or bug fixes. To gift cards. And the business owners had to wait, right? So they were fine. Let's wait for this. We delivered this, and then they were eager. Cool. Now we can focus on fixing gift cards.
Couldn't leave gift cards. It's a nightmare. And we thought, okay. We got the feature requests in. We started looking at what was involved. And we realized we didn't want to go back to this XSLT. We don't want to go back there. So we thought, we've more or less created a new web store.
Let's do a little more. Let's create a desktop version of this on the new platform. It should be easy. It should be easy. And what happened is the business never asked for this. But we got an initial accept. We said it'll be fast.
We did the last one in six months; we'll do this one in another six months, and we'll be ready. And we ended up building it out. And what we had to do was gather requirements, because we didn't know what was happening in the old system. And the thing is, neither did the business.
They didn't really know all the quirks. I mean, when you have a system running for ten years, there are so many small quirks, right? So the spec became the old system: make it like that one. That's what they told us, that's what we did, and that's where we ended up. So how should this work? We can do this. We can do something nice and shiny.
We could just not implement gift cards. No, that was an option. But they said, look at the old system. Do what that does. And this led us to a feature parity death march. Yeah, so we spent, I think, one and a half years trying to re-implement the entire Chrome platform in K2.
Just building functionality we already had. And remember, while we were doing this, the old desktop that actually was earning us money, the one that was important, it had now not been touched in two years.
And the mobile platform that some very, very few users actually used, because they don't like to go to M sites for some reason, and it had very, very few features. That also wasn't being worked on, because we were working on the new shiny stuff
that hadn't been deployed and hadn't been activated in any way yet. So after two years of this, the organization came to us and they asked, hey guys, we kind of need to do stuff. Are you finished soon?
We said, we don't know. We didn't even know which features we were missing, because nobody knew. So they did the only sane thing. They killed it. Yeah. So that's like two or three years of your life down the drain.
So if you learn one thing from this, rewrites are bad. I'm not saying that they can't work, but it's really, really hard. And if you can find a different way of doing it,
please consider it. So this was a very dark time for us in the web team at Komplett. We had five developers when the project was killed and we lost half-ish of those. So one of the largest e-commerce sites in Norway
suddenly didn't have a lot of web developers. We'd lost our self-confidence. We'd lost our respect in the organization. So the organization then embarked on a year-long search
for some platform, right? We still want to be a web e-commerce thing. There must be some platform that we can buy. And there are. There are platforms out there. Most of them have three-letter acronyms. So we launched a rather large project
to figure out which one is best for us. We got proposals in where we played them against each other and we saw who had the features we needed, who had the support we needed.
And after a year of this, I think it was three finalists and the web team was asked to just submit a proposal of our own for our Chrome platform just to have a baseline because we knew that nothing could be worse than that.
Everything else would have to be above that. And then, I think it was a Wednesday, we actually got the call that we won. We have it on great authority that we are the best in the world at this. Do you remember what time it was?
I don't remember the exact time. So what does that mean? What that meant was they still trusted in us. Or they put their trust in us, rather. But what that also meant is there were new business domains that they wanted to go into.
They wanted to do new things. Well, actually not they, we. And we still had a great heap of code. We hadn't touched this in actually three years now. And we still didn't have a mobile platform.
So we're three years behind the rest of the field. And we're asking ourselves, can we do this? I mean, can we do this? We were actually really unsure. Three years earlier we just said we couldn't do it on Chrome. And now we had to.
So how could the same people get trust and deliver? That's what we have to ask ourselves. So they trust us. Meaning the business and the organization trust us. But do we trust us? Can we trust in ourselves? Can we get to a place where we can make this work?
So there's a saying that culture eats process for breakfast. So that's where we started. We started with culture. We started with values. We started with defining who we are as developers and what it meant to be a developer in Scandinavia
or Northern Europe's largest e-commerce provider. So we did this. We spent time. This was one of our workshops. And we end up with 10, 11, I can't count, a number of values.
These values are things that we believed in, things that we embodied and things we felt that we already had, some of them, and other things were things we aspired to be. This is for ourselves.
And this is also for new developers coming on because this is the point, right? We can't scale with us five. We needed to increase and we needed to expand. So? Yeah.
So then we did the next thing. We just added lots of developers, right? So business came to us with needs. So the entire organization had needs. Our customers have needs. And we weren't able to, we didn't have the manpower, the muscle to do it. So we needed more people.
And we hired a lot more people. The good thing was that we'd already thought about culture, so we didn't just fall apart on that. But just adding people or developers has its own problems.
So at this time we were like five developers and we had a QA guy and we had two ops and that was the web team. So pretty small, right? And we needed to scale. So we looked around and if you've ever searched about scaling agile,
there's a lot of talk about that. But there's a kind of local firm that seems to do this pretty well. So we looked to them. They're not Norwegian, they're Swedish. Spotify, you might have heard of them. So around this time they published about their Spotify model.
Have you heard of it? Raise of hands. And we can go quickly through this. Yeah. So this is a part of the Spotify model which we've cut out. Just to go quickly through this. It's basically built around autonomous squads
who deliver to a specific business need with their dedicated product owner. They're autonomous, meaning they have control over their build pipeline. They have control over everything deployment, all the testing. And direct access to whoever is the representative on the business side.
These squads are then grouped together in tribes. And these tribes usually represent a logical business area, often crossing business domains. Then, across squads, we have something called chapters.
Chapters basically meaning it could be interests, right? Front end interests. It could be DevOps, it could be something domain specific. And then we have multiple tribes. So this is how Spotify did it. One of the core features or one of the most important aspects of the Spotify model
is that each team is co-located. So each team has to be at the same area, same place, same office, and work together. So we took this and this is awesome. This is a blueprint. This is something we could use for ourselves. And we take this and we start with our major project called the Minion Project.
Which is a... We can get to that. So this is Bob. He's in our office. He runs around all the time. Fun guy. So the Minion Project is to not do a big rewrite.
It is to change Chrome while it is living and make it mobile friendly. It's nearing its end now actually. So if you go to our sites they're actually quite good I think on mobiles. That's what the Minion Project is.
And we had said we couldn't do this years earlier and then we couldn't. But that was three years ago which is an eternity in Internet years. So now we actually could. Now we had the technology. Browsers were sufficiently advanced and we might have learned a thing or two.
But still we needed to produce at a rate that we couldn't with so few people. And with the Spotify model we thought we were able to scale our team. Ramp up. So we started ramping up. This is an illustration of how we ramped up over time.
The timeline here begins in mid-2014. So we started scaling across multiple offices. And we scaled by adding consultants, basically.
Because it's really hard to get good developers locally. And added smart developers, good people. They became part of the team. But something happens when you throw many people into a monolith. So even with the Spotify model and starting to split up people into squads and talking about business domains.
We still had one code base. Yeah. And now we had many many many developers working in that code base. And if you've ever worked in a shared code base I know Facebook can do it and I know Google can do it.
But really we are not Facebook or Google. We're not that clever. So what happens is that one developer does something to make some new feature or fix some bug, and then gift card breaks. And another developer does something else, and then that doesn't merge with his thing.
And suddenly you have long-lived feature branches that are weeks out of master and you need to rebase them. And you can't rebase them because things have changed horribly. And it's just not working at all. So what do we do about this? Well, our problem was that when we actually managed to merge something and deliver some deployable thing, we put it on our test systems and it never worked. Gift card always breaks and everything else always breaks. So we had things living in our test systems for weeks before we dared to put them in production, and then we'd figure out what breaks only in production. It's a bad place to be in, because with such a large monolith, with many people working on it, actually stabilizing it is a non-trivial task.
So we figured out that we need to change. We can't work like this. Instead, just because we're not good enough, I guess, to make such a thing work, we need to split it up into smaller deployables, because we need to get something to production quickly.
And we sat down and we made an architectural vision for Komplett. It's Komplett's vision; I'm not saying this is what everyone has to do. But we believe that for our domain and our organization we need to have a landscape of services: many services that live in our environments and connect to each other.
We need those services to be stable in their contracts. We are not allowed to change, do breaking changes on services. We actually go as far to say if you are breaking your contract then you are making a new service.
You should not override the old service. You should deploy a new service beside it and tell your clients to start using that instead. We needed a consistent deployment and we had to do this while our system was running. So how do you do that?
Well, you take whatever new part of the monolith you need to split out and you make a service for it, because you can't do it in the old one (gift card will break), and you put a feature switch around it. So you deploy your new service and you make sure it works. And you can actually make it work, because it's smaller, so you are able to understand it.
Once we are happy with it, we've seen that it actually works in the development environment. Then we turn it on by the feature switch in the test environment and check that it works and fix whatever doesn't work in test because there is always something.
And then we do the same in production. So we separate the actual deployment of the services from the activation of the services through feature switches. So it's a strategic choice. It is done at a different level than the technical level. As developers, we don't decide when something turns on. That's up to the guys who are responsible for that part of the site.
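As a rough illustration of separating deployment from activation, here is a minimal feature-switch sketch. The flag name, the IFeatureSwitches interface and the gift-card types are all invented for the example; the point is only that the new service is deployed everywhere, but traffic is routed to it when someone responsible for that part of the site flips the switch.

```csharp
// Minimal sketch of a feature switch guarding a newly split-out service.
// Everything here (flag name, interfaces) is hypothetical.
public interface IFeatureSwitches
{
    // Typically backed by configuration that the business owners control per environment.
    bool IsEnabled(string flagName);
}

public interface ILegacyGiftCards       { decimal GetBalance(string cardNumber); }  // code path inside the monolith
public interface IGiftCardServiceClient { decimal GetBalance(string cardNumber); }  // client for the new service

public class GiftCardBalanceProvider
{
    private readonly IFeatureSwitches _switches;
    private readonly ILegacyGiftCards _legacy;
    private readonly IGiftCardServiceClient _service;

    public GiftCardBalanceProvider(IFeatureSwitches switches,
                                   ILegacyGiftCards legacy,
                                   IGiftCardServiceClient service)
    {
        _switches = switches;
        _legacy = legacy;
        _service = service;
    }

    public decimal GetBalance(string cardNumber)
    {
        // Deployment and activation are separate: the new service is already deployed,
        // but it only handles traffic once the switch is on, first in test, then in production.
        return _switches.IsEnabled("UseNewGiftCardService")
            ? _service.GetBalance(cardNumber)
            : _legacy.GetBalance(cardNumber);
    }
}
```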
So we have people who are responsible for the checkout, and they decide when we turn stuff on and off in the checkout. So that's how we decided to do it, and it's been working relatively well. But the problem is creating a new service.
It sounds easy, right? But it's not. Because you need to make a new repository, you need to set up the build jobs, the deployment pipeline. You need to manage the servers and make sure they are up and running and ready to be deployed to.
You may even need to create databases and all of that stuff. And that meant that when we told the developers, please don't build this in the old one, make a new thing, the answer was: I don't want to do that, because then I can't start actually solving the problem for a week or two, because that's how long it takes to get it up and running. And often they kind of shortcut it, so they got it running in the development environment, but nobody really cared about the test environment, because it's not going live for a month, right?
And then you have to take it live and it doesn't work in tests. So what we had to do is we had to make something that can make a service. So we debated the naming of this. At one time it was the service maker, service maker, but it's now just the service maker.
I think we also called it foundry at one time, but it doesn't matter. So it's an interesting little thing that's been very, very important for us. So we have a developer, he or she needs to create a service, so they go to the service maker.
And the service maker is actually a website on our internal network where the developer enters the name of the service and whether or not that service is going to need a database and that's all. Then they press the create button and the service maker goes out, connects to Bitbucket in our case and creates the Git repo.
It connects to our build machine which is Jenkins for now and it connects to our deployment machines which is Octopus for now and it sets it up on all of those. It also goes to the database service and creates new databases because no service is allowed to use another service's database.
So if you need a database, you get a new one. Once it's done that, it actually then commits a template of a service to the Git repo.
That's picked up by the build server, which builds the service, because that's what build servers do. Then it pushes that to our deployment server, Octopus, and that then deploys the service to all environments. So about 30 seconds after pressing create, they have an actual running service, all the way out.
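A very rough sketch of that orchestration is below. All the interfaces and method names are invented; the real service maker talks to Bitbucket, Jenkins, Octopus and the database servers through their own APIs, but the sequence of steps is the one described above.

```csharp
// Hypothetical outline of what a "service maker" does when a developer presses Create.
public record Repository(string Name, string CloneUrl);

public interface ISourceControl
{
    Repository CreateRepository(string serviceName);
    void CommitTemplate(Repository repo, string templateName);
}
public interface IBuildServer      { void CreateBuildJob(string serviceName, string cloneUrl); }
public interface IDeploymentServer { void CreateDeploymentProject(string serviceName); }
public interface IDatabaseServer   { void CreateDatabaseFor(string serviceName); }

public class ServiceMaker
{
    private readonly ISourceControl _sourceControl;
    private readonly IBuildServer _buildServer;
    private readonly IDeploymentServer _deploymentServer;
    private readonly IDatabaseServer _databaseServer;

    public ServiceMaker(ISourceControl sourceControl, IBuildServer buildServer,
                        IDeploymentServer deploymentServer, IDatabaseServer databaseServer)
    {
        _sourceControl = sourceControl;
        _buildServer = buildServer;
        _deploymentServer = deploymentServer;
        _databaseServer = databaseServer;
    }

    public void CreateService(string serviceName, bool needsDatabase)
    {
        // 1. Source control: every service gets its own repository.
        var repo = _sourceControl.CreateRepository(serviceName);

        // 2. Build job and deployment project, set up before any code exists.
        _buildServer.CreateBuildJob(serviceName, repo.CloneUrl);
        _deploymentServer.CreateDeploymentProject(serviceName);

        // 3. No service may use another service's database: if you need one, you get a new one.
        if (needsDatabase)
            _databaseServer.CreateDatabaseFor(serviceName);

        // 4. Commit a service template (known health endpoints, the shared logging library, ...).
        //    The build server picks it up, builds it, and the deployment server pushes it to all
        //    environments, so about 30 seconds after "create" there is a running service.
        _sourceControl.CommitTemplate(repo, "standard-service-template");
    }
}
```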
This has made it immensely easier to convince developers that you should create a new service instead of doing it in the old one. So what we've learnt is that if you are going to have a service architecture, creating services must be cheap.
So we've talked about how we do this technically but how do we actually do this from a process and business standpoint? How do we actually push work through our system?
So we mentioned that the Spotify model brought us this far but we were feeling pain. And being strictly location-based didn't really work for us. We didn't have that many developers or it just didn't work for us.
It's not how our organization grew together. So we went back and looked at our model and we thought, we don't like those squares and things. We created circles. Circles are much better. This is what we would like to call the Komplett model.
It's circular. This is actually a snapshot of how the team is basically screwed together today. And just to walk you through this: each small white circle is a person.
Each colored circle is a squad. And each squad has a color which identifies them. Not a name, not a feature, not a business domain.
Because you're not guaranteed to work on the same thing forever. So each squad has a color and we keep them as stable as possible. We want the same people working on the same parts of the domain. So a squad was built up of 4 to 6 developers with their own dedicated QA.
So someone who's expert at testing. How they solve QA within the squad, that's up to them. But that's what they have. We have two dedicated DevOps and they're outside the squads.
And then we have the orange squad up there, which is our infrastructure team. Each of these squads are actually focusing on delivering business value. And they were working within a feature set.
So as an example, the red squad here is solely responsible for the checkout and nothing else. The black squad is solely responsible for the shopping cart and what's involved there. And these are now different parts of the code. This is something we split up. We split up the code into a different deployment unit and that squad is responsible for delivering the value there.
And each of these groupings, these larger circles, they are a logical grouping across multiple domains. So in the Spotify model, this would be similar to a tribe.
So within this tribe, we then have one or more people responsible for UX. People who speak to the business experts and gather requirements and work with them directly. And we also have a delivery manager.
So this is basically how we're set up now. And the infrastructure squad is responsible for making the lives of each other squad a lot easier. So we make the service maker. I'm in the orange squad. We make the service maker and stuff like that.
So you're the service maker maker. So what we've really done here is we like to call it the reverse Conway. Because Conway's law says that any architecture will kind of reflect the organization and the partitions in the organization.
So we've tried to partition the organization the way we want the architecture to be. If that makes sense. But again, we've talked about people, we've talked about scaling, we've talked about everything. But how do we actually work? The real key here, what we've learned from experience, is communication.
And none of these squads are co-located. So each squad has always got members from at least two locations. And one I think actually has four, which is almost as many as there are people in the squad.
Not counting people working from home. So this is where we really break from the Spotify model. There's a reason for this, right? The reason is that when you bring aboard new members, especially in remote locations, it's really hard for them to actually get the domain and get the culture. And by having a squad which is spread out, with representatives in the main office in Norway and in the other offices,
it's easier to actually spread knowledge, bring people aboard, and the onboarding process becomes easier. For everyone. But communication is still hard, right? We still need to have some way of bringing new developers that aren't nearby closer to us.
So what we have done, one of the things we've done is basically set up a window into each office. And we use something called a peer-in for this. That's B. And that's Thomas.
So we have multiple of these screens where we can peer into each other's offices. We're not kind of watching what people do. This is just about glancing and seeing who's there, seeing what's happening. Are they having cake? And that's all fun having cake because then we're standing on the side and looking at the camera.
We want cake. Please, send cake here. We dance sometimes. We have a lot of fun. So this gives us kind of a window into each other. We're there all the time. And the beginning was awkward but now it's natural.
And again, continuing on communication, we have something called Flowdock. It's a lot like Slack, except this works. For us. Please remember, green notes only. That's just a major difference between this and anything else.
It's not as fancy but we have something called Threads or Flows which makes communication a lot easier. This is a shot from our water cooler. You see a very beautiful man over there. He's known as Pretty Thomas. I'm the other one.
Thank you. I'm not sure if you can see this, but if you see the rooms on the side: the rooms are basically whatever we need in regards to squads and structure. We have other rooms, like interest rooms, like the checkout room over there. And then we have something called Kodepanelet.
That's the code panel. It's something we do on YouTube. Then we have something called KDC, which is our Komplett Developer Conference. This is where we gather together yearly and do things together. So basically we create channels based on whatever we need. And we have GIF wars.
Yes, I said GIF wars. On water cooler. And we also use a lot of Skype and TeamViewer for pair programming. So the main learning here is, it should be obvious, but organization matters. But we tend to not care sometimes.
So just to emphasize, organization really does matter. So we're making a landscape of services. That should be easy, right? Each service is a small thing. It's not hard to make a service. We've made sure of that.
And then let's just do it, right? Well, it turns out there are different problems with services than with monoliths. So how do we make sure that our services are actually up? How do we make them perform?
How do we make them scale in ways that are relevant to these services? Well, each service has an SLA to its clients. We use a lot of caching to make sure that they perform at that level.
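Just to illustrate the kind of caching that helps a small service meet its SLA, here is a minimal sketch; the service name, the 30-second lifetime and the backend delegate are assumptions for the example, not Komplett's actual code.

```csharp
// Minimal caching sketch: cache an expensive lookup for a short time so that
// most requests are served from memory and the service can keep its SLA.
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class StockStatusService
{
    private readonly IMemoryCache _cache = new MemoryCache(new MemoryCacheOptions());
    private readonly Func<int, Task<int>> _loadFromBackend;   // e.g. a call to the ERP or a database

    public StockStatusService(Func<int, Task<int>> loadFromBackend)
        => _loadFromBackend = loadFromBackend;

    public async Task<int> GetStockCountAsync(int productId)
        => await _cache.GetOrCreateAsync($"stock:{productId}", entry =>
        {
            // Short lifetime: slightly stale stock numbers are acceptable, slow pages are not.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30);
            return _loadFromBackend(productId);
        });
}
```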
We are very, very into performance testing our solutions. So whenever we launch something new, it has been performance tested quite, quite strenuously. But we also need to have the services have some common characteristics. And we make that easier through the service maker,
but they need to do stuff like they need to have known health endpoints so that our infrastructure can make sure that they're up and say that it's safe to actually send traffic there because services go up and down all the time. And we need them to log to a centralized logging service in a specific way
so that we can recognize requests throughout different services. So if you actually go to our front end and do something, that might be five services, ten services involved in serving your request in some way.
In a monolith, that's quite easy to log because you just log and it's all from the same place. But when you are in a service landscape, it gets really hard to figure out what actually happened and why didn't it work. So we've instituted standards on that, ways we log,
and to make this easier, we've made that a library that we have on our internal NuGet server, which is actually put into each new service by the service maker. So we use the service maker to drive the service behavior that we want.
If you make a new service now, it already has the endpoints that we expected to have and it already has all the logging mechanisms. You can just log as normal and it will log everything that we expect to find.
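As a rough sketch of what those two conventions might look like in an ASP.NET Core service (a known health endpoint, plus a correlation id that follows a request across services), assuming an invented header name and a very simplified middleware:

```csharp
// Sketch only: a /health endpoint the infrastructure can probe, and a correlation id
// that is reused from the caller (or created) and attached to every log line,
// so one customer request can be followed across the five or ten services it touches.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var app = WebApplication.CreateBuilder(args).Build();

// Known health endpoint, so the infrastructure can decide whether it is safe to send traffic here.
app.MapGet("/health", () => Results.Ok("OK"));

// Correlation id middleware: reuse the caller's id if present, otherwise create one,
// and open a logging scope so every log line in this request carries it.
app.Use(async (context, next) =>
{
    var correlationId = context.Request.Headers["X-Correlation-Id"].FirstOrDefault()
                        ?? Guid.NewGuid().ToString();
    context.Response.Headers["X-Correlation-Id"] = correlationId;

    var logger = context.RequestServices.GetRequiredService<ILogger<Program>>();
    using (logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = correlationId }))
    {
        await next();
    }
});

app.MapGet("/", () => "Hello from a small service");
app.Run();
```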
So we've had to do quite a few of these things, and that's what the orange squad does. We figure out what turns out to be the best practice that we want our systems to follow, then we pull that out of the projects that started doing it, usually as a NuGet package or something, and we put it in the service maker
so that from now on, all new services do that. So the learning here is that services aren't monoliths. They are different and they have different needs and they have different characteristics
and you need to be aware of that. So where do we go from here? One thing we haven't really spoken about is what we do within each squad, technology-wise. We are kind of connected to an existing technology stack. We are used to working with .NET, we're used to working with Knockout,
which is what we have on the main store now. But because we've now started splitting things into different autonomous units, we now have the opportunity to work on completely new technologies,
which is what you touched on. Usually learnings come through squads doing new things within their domain, within their solutions, and talking to other squads, who say: that's a good idea, we need that too. And they might start implementing it, and then this orange squad comes in. So what would be cool, though, is having fully mandated squads,
or completely autonomous squads, where we actually then pull in our operations, we pull in the business users, there's no people in between. So we're really interested in trying to have complete business units, including developers.
There will be more microservices, but how small they will be is something we're unsure of. Because we're doing our best to stick to the business domains in regards to where we're spread out.
We think about business context, we think about our domain, but we have to take some baby steps along the way. So we're not religious about the size of our microservices. Size shouldn't matter, right? But we need to have them smaller than our huge monolith. That we know.
We're also looking into exposing some more APIs, right? We want to expose public APIs. I hope we get there soon. We should get there. We don't think we can make everything, but we have a platform of lots of things and it would be really cool if someone made Android apps or iOS apps even.
Or the other way around. And obviously more business areas. So just to summarize, I mean, scaling is hard. And we have to find our own way.
We're still finding our own way. We're not there yet, but we're getting there. Culture is crucial, right? This is number one in regards to getting people on board and actually building a team. So we also want to really embrace and learn from failures.
And this is really important, especially across cultures. Failing should be something everyone does by default. It should be the default way of approaching things. And that's actually really hard when you are working in several different cultures because different cultures have different views on failure.
So that's been one of the things we had to work on. So we now call them fabulous failures just to try to make it a bit better. It's a good thing to fail. Then you've learned something. And we're going to continue failing.
And hopefully we're also going to continue sharing as we go. We started doing things in public; this is one of those things, and we'll continue as much as we can. One of the greatest success factors, which you can feel on a day-to-day basis, is whether your developers, whether everyone, is happy.
It's because happy people make great stuff. So make them happy. And then they will probably make good stuff. So, let Denise take a picture. I can send you afterwards.
So we'd like to get feedback on this. We want to hear more from you guys. We want to hear if this thing makes sense, right? Yeah. Contact us on Twitter. Talk to us now. You can also follow us on Kodepanelet, which is a YouTube stream we do.
We try to do it at least once a month, perhaps twice a month. It's in Norwegian, sorry, for our non-Norwegian friends. You really should learn it. It's a great language. So easy. But we try to interview people and do stuff there.
And Paul has a blog that you really should read. It's not Norwegian. That's in English. And I'm just on Twitter. So I hope you guys have learned something or got some ideas or inspired to do something or have feedback for us. That would be awesome.
We want to go more into detail moving forward. And we hope you guys leave green notes over there or press the ring button. As many as possible. And if nobody wants to go home, we can have lots of questions today.
It's in the tens. It's not in the hundreds yet. But we believe we will reach at least a hundred by the end of the year. Mostly we are, as I said, we have this service maker which makes it really easy to make services with the technology that we usually use.
We have a few Java services that we are phasing out, actually. Because C# is cool. And I have a bet going on. So I want to be the first one to actually produce an F# service.
But there is no requirement on that. Yes, we are. But the easy path right now is .NET. So that's the thing most of our developers do.
We are moving towards deploying our services to Service Fabric. That's what we're now looking at. Not quite sure whether it will work or not.
But in that you can run anything you want. As long as it's executable. It can run, they say. I'm sorry? Yeah.
That's one of the things with smaller services. One of the kind of goals is that they should be so small that if there's a bug you can delete it and start anew. And that's not a huge cost. I'm not sure that many of our services are that small yet.
But it's certainly something we're hoping to get towards. But as I said, we don't think there is a magical "this is the right size for a microservice". We have of course a catalog service.
That is our catalog. We have services for actually placing orders. We have services for getting users for doing all these small things. We have some services that are more like verticals. So as I said, our checkout is a separate site. It looks like you're still on the same site, but actually you're not.
And these days we are... Our my page or your customer page is also a separate site? Yeah. And we have separate services underneath that that handle the storage. Right now we are actually in the process of deploying a new checkout for our B2B customers that is on a different technology from our other checkouts.
We have two different checkouts on different technologies. Two different, but different. And that's kind of what we want. We want to be able to test out new things.
So the question is how many different services, different versions of services do we have deployed?
Since we say that we don't want to deploy... I'm sorry? We don't want to deploy over. We want to deploy side by side. So this is not something we are very good at yet. So we are working on that.
And it's also something that becomes more important as things stabilize. In effect, when people are moving quickly, the people on both sides of that communication are usually talking to each other, and then it's not that important to us. So the things that are stable and in use are the things we don't overwrite.
But we have some things that are in different versions, but not a lot of them. Because we go into different kinds of problems there with the data storage for instance. Since they are not allowed to use the same data, it's not as easy as it sounds.
I think our time is running out. We have more questions though. So we can take one over here now.
So the question is basically, if I understand correctly, is this representative of how we are actually organized today, right?
So yes, we have two tribes. We don't call them tribes, but we have two tribes, which kind of represent the two areas of our domain. How we see it now, based on how the business is also kind of structured. So we have a before buy. It's everything about getting a customer into the store.
It's about where we do a lot of scaling, where we do a lot of work on presenting content. And we have a kind of during slash after buy, where we do a lot of interactive pages. We do the checkout, you do your cart, you do your my page, that type of stuff. And yeah, the orange squad is something.
As for how many people we have on each squad, it's around four to seven, basically, depending on how we do things. And when you start something new, we do a kind of task-force approach, where you may split off two or three people.
They start working on it, just to get the thing, the service, up and running. And then the rest finish up what they were doing and join them. So they stay within their tribe and do different things.
No, each squad has at least one dedicated QA and the rest are developers. There's one lead who is more responsible for the delivery, but he or she is also usually a developer. And how they solve this role-wise is up to the squad, right?
Is there a question there? So the question is how do microservices communicate with each other, right? Mainly by REST, so they call each other. This is one of the things we are working on changing now, because we think that to be able to
scale, we need to push information out instead of having them get it when the customer is actually coming in. So we are reworking to get a push architecture where when something happens somewhere in the system, a message is sent.
And then the services that are interested in that, they do their work, so that when the customer actually knocks on the door, we don't have to build the store. It's ready for them. This is also one of the things we are now discovering as
you move to a service landscape: if you do a lot of requests, you end up in a situation where, when you deploy something new and one service is down or unstable, that kind of bubbles all the way out. So instead of that, we want it to be so that when some event happens in the system,
it propagates all the way out, so that if something goes down in that pipeline, the site is still up as far as the user can tell. They might not get the latest updates, so you might see somewhat stale data, but at least you will have data.
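A minimal sketch of that push idea, with the message bus, event type and handler all invented for illustration: when something happens, an event is published, and the services that care update their own local copy ahead of time, so a customer request can still be answered even if an upstream service is down (possibly with slightly stale data).

```csharp
// Hypothetical sketch of the push architecture described above.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public record ProductPriceChanged(int ProductId, decimal NewPrice);

public interface IMessageBus
{
    Task PublishAsync<T>(T message);
    void Subscribe<T>(Func<T, Task> handler);
}

// One service publishes an event when something happens...
public class PricingService
{
    private readonly IMessageBus _bus;
    public PricingService(IMessageBus bus) => _bus = bus;

    public Task ChangePriceAsync(int productId, decimal newPrice)
        => _bus.PublishAsync(new ProductPriceChanged(productId, newPrice));
}

// ...and the service that renders product pages keeps its own copy up to date,
// so when the customer "knocks on the door" the data is already there,
// even if the pricing service happens to be down at that moment.
public class ProductPageService
{
    private readonly ConcurrentDictionary<int, decimal> _localPrices = new();

    public ProductPageService(IMessageBus bus)
        => bus.Subscribe<ProductPriceChanged>(e =>
        {
            _localPrices[e.ProductId] = e.NewPrice;
            return Task.CompletedTask;
        });

    public decimal? GetPrice(int productId)
        => _localPrices.TryGetValue(productId, out var price) ? price : (decimal?)null;
}
```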
OK. How much is left of the monolith? The question is, how much is left of the monolith? I should have checked that. Quite a lot. Quite a bit. We still have XSLT, actually.
There are pages where you will be served by the XSLT. But mostly you can avoid them, right? Most of the shopping experience has actually been moved over. Yeah. It's things like the feeds and stuff that's not really important. But the main things that a normal user encounters,
those are, well, on the new technology. So? Yeah, we have one more.
So the question was, when a squad develops a service, do they have to maintain it for the lifetime of that service, or does it move to a different squad? And the answer is that it's there. So there are certain cases where, for instance, if one squad has an area of responsibility
and they don't have time to develop something that has to be developed, then a different squad can do that and then transfer it. But usually, they own the service as long as it's alive. But since we've elected to do it by business area, it might be that the people who
are responsible for master data will have to maintain something someone else made. OK? Thank you, guys. Thanks so much for being here. Have a nice weekend. Green.