Spin Me a Yarn

Formal Metadata

Title
Spin Me a Yarn
Title of Series
Number of Parts
28
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Transcript: English (auto-generated)
So now let's spin you a yarn today about two famous package managers in the JavaScript world, namely npm and Yarn. Now, I gave this talk two days ago at the bonus conference session,
so whoever listened to it then, feel free, you can have a longer break. I added two or three jokes. They might or might not be worth staying for. You can complain afterwards. So just a question for the audience: who of you uses npm in their day-to-day work? Okay, quite a bit. And who uses Yarn?
Who migrated over? Very good. So in this talk, we're gonna look into how they actually work under the hood, and into which shortcomings of npm Yarn tries to address and how it addresses them. So first, let's talk just very quickly about me. I'm from a country in which the chancellor,
and I'm a fan of her, has the same name as a famous JavaScript framework: Angular Merkel. I actually moved to Ireland, though,
to work on my humor, with questionable results. If you ask my colleagues and friends, they might not agree with my puns. And actually, I used to work in a company where we did Angular, and at that time it was Angular 1.13, and there was the breaking change between Angular 1.13 and Angular 2,
and I gave a talk at Intercom, my current company, at some stage, and one of the engineers asked me, well, what do you do to migrate your apps over? And I had absolutely no idea. You know what I did? I migrated over to Intercom and now work in Ember. That was my solution to this problem.
So if you don't know Intercom, Intercom is a customer communication platform, and it's one of the larger Ember applications out there. So with a lot of code comes a lot of responsibilities and also dependencies, and up until last year, we used NPM to manage all our dependencies,
but we migrated over to Yarn. But wait, wait, wait, no. This is meant to be a story, right? So we have to start properly here, guys. So let's start. Once upon a time, and think about any Star Wars theme or whatever comes to your mind in this moment. So once upon a time, there was NPM, and NPM was actually first released in January 2010 by Isaac Schlueter,
and he addressed shortcomings of package managers at that time. And NPM came with two things: a registry, where you could upload code and download packages, and a client, a toolkit, if you want to call it that, with which you can manage your dependencies.
So the NPM registry became super successful. There are over 300,000 packages out there, according to the source you see below, and per week, there are 11,000 packages published. So it wouldn't be a story, though, if there weren't a twist. So this is a tweet of one of our lead engineers, Gavin,
and he found out that if you disable the progress bar, NPM actually installs twice as fast. Now, as I said, you saw the graph of Intercom, and this is not negligible for a large app like ours. And this even made it into an issue
for the NPM folks. While I was reading up on the issue, and talking to my own colleagues at work and to other engineers online, it seemed like there was this common theme in how people perceived the NPM install phase:
it was really slow, and you could go for coffee in the meantime. So there were actually two points people complained about, two shortcomings. One was non-determinism, which means the result of an install was not deterministic, and the second was performance, which we saw already. In October 2016, engineers from a couple of companies,
like Google, Exponent, Tilde and Facebook, got together and built and released YARN. YARN actually builds upon the good parts of NPM, so it uses the NPM registry, but it tries to address these shortcomings:
it promises consistent and reliable dependency resolution, and it promises improved performance. So while I was reading up on these issues and on the documentation of YARN, in my head it was like: dependency resolution, performance, I have no idea what this actually means. I felt like this dog, I felt like internet dog.
How do they actually work? What happens if I type NPM install or YARN? No idea, and that's what we're gonna find out. So we're gonna look now into what actually happens when you type NPM install or YARN. Let's start at the beginning. So a story always introduces the characters, so in our case it's a couple of definitions
we have to go through. So there's two disclaimers in this talk, they're very quick. One is, I assume you all use a package manager, most of you raised your hands when I asked, and you are familiar with the NPM ecosystem. The other is, we won't touch on the NPM registry; we will only look into the clients, so the YARN client and the NPM client.
So then the question is what are packages? Well, packages are just pieces of software that can be downloaded, easy, right, that's okay. And packages may depend on other packages. What are dependencies? So we see here an app for example that has two dependencies on A and B. Well, they are specified inside a manifest file,
which in NPM's and YARN's case is usually the package.json, and they follow semantic versioning. Semantic versioning reflects the type of change in the version number, so it can be a patch or a major, API-breaking change.
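To make the semantic-versioning idea concrete, here is a minimal TypeScript sketch that classifies the change between two version numbers. The helper names `parseVersion` and `changeType` are invented for this illustration and are not part of npm or Yarn; the "minor" level comes from the semantic versioning scheme itself rather than from the talk, and version ranges such as `^1.2.3` are deliberately left out.

```typescript
// Illustrative only: these helpers are not part of npm or Yarn.
type ChangeType = "major" | "minor" | "patch" | "none";

interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Parse a plain "MAJOR.MINOR.PATCH" string; ranges such as "^1.2.3" are out of scope here.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    throw new Error("not a plain MAJOR.MINOR.PATCH version: " + text);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Report which part of the version number changed first, reading left to right.
function changeType(from: string, to: string): ChangeType {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return "major";
  if (a.minor !== b.minor) return "minor";
  if (a.patch !== b.patch) return "patch";
  return "none";
}

console.log(changeType("1.4.2", "1.4.3")); // patch
console.log(changeType("1.4.2", "2.0.0")); // major
```

A major change is the one that signals an API break, which is why a package manager may not silently swap it in for an older version.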
So there's one last thing, which we have to just very quickly address. So you see here in this graph for example, we have two packages of the same name but different versions, S in this case, and this is possible, so NPM and YARN allow packages with the same name, different versions,
but they can only allow it because they use nested dependencies. You can think of it like this: nesting gives each dependency its own namespace. We will get back to this in a while with a graph as well. So again, nested dependencies make sure we don't end up in dependency hell, and they allow us to install packages with different versions.
So let's start under the hood. So this is a schematic graph where we start out with our project code, that's fine, and then we have a human readable and human writable manifest file, right? That's our package.json. And we have also the dependency code,
which is all the dependencies, installed and laid out in a way that they can be picked up by the compiler or interpreter. In our case, that's the node_modules folder. This diagram is actually taken from a very, very brilliant Medium article by Sam Boyer, "So you want to write a package manager".
If you ever have a free Saturday morning, read that article; it's really well written and just blows your mind. So okay, science talk now, we're not internet talk anymore, we're science talk now, we're starting to go deeper. So to make it easier and simpler for us, we assume that this is a first-time installation,
so we never installed that app before, never installed these dependencies, so we have no pre-cached packages, and we have an empty node_modules folder. There are three phases in the install. The first one is the dependency resolution. So dependency resolution actually determines
which packages are installed where in this dependency folder, and in this phase, we make requests to the registry and look the packages up recursively. The second one is fetching packages. We have to fetch the actual packages in a compressed format and place them in a global cache. And the last part is linking these packages,
so we're copying the files from the global cache into the local node_modules folder. All right, so that's all well and good. Let's start with an example, because examples are always good. I'm using Ashley G. Williams' NPM sandbox example here. Ashley Williams is, I think, the community manager of NPM. Definitely check out that GitHub repo.
So in this case, we have the manifest file, our package.json, and this describes a very simple app with two direct dependencies, A and B, and these dependencies each have a child dependency on S, but in different versions. So A has it in version one, and B has it in version two.
Okay, that's all good. And this is the resulting node_modules folder, which you see on the right-hand side. So you see here the nested structure. We have A and B, which were our direct dependencies, on the top level, but NPM and YARN try to flatten dependencies as much as possible. So actually one of the S child dependencies can be installed
into the top level here, and the other one, because we allow nested dependencies, and we said we allow packages of the same name in different versions, will be installed under its parent directory. So in this case, B here. We are making this a little bit simpler for us,
just for now, for clarification, so our dependency graph looks like A1, B1, S1, S2. We don't really care about the semantic versioning here, and our node_modules folder will look a little bit simpler as well, so that you can follow and don't have to read the terminal output. Okay, so let's start with NPM. What does NPM do? The first phase, as we know, is dependency resolution.
Okay, so the first thing we do is load the existing node_modules folder from disk. Well, in this case, because we never installed this, we have an empty node_modules folder. The second thing is we clone the current tree. There's no current tree at the moment, so that's all good. Now comes the exciting phase.
We're actually building the ideal tree, and building the ideal tree means we're trying to resolve all these dependencies. So let's go and do it. So first we have A1, and that's a top-level dependency. That's fine, we can put this into our node_modules folder at the top level. Then we have S1, and because, remember, we said we want
to have this as flat as possible, we can actually resolve this at the top level as well. Then B1, top level as well. And now S2: because there is already an S in our node_modules folder with a different version, we have to resolve it as a child dependency of B1.
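The walk-through above can be sketched in a few lines of TypeScript. This is a simplified model of the flattening rule as described in the talk, not npm's actual implementation: the types `Pkg` and `TreeNode` and the functions `place`, `buildIdealTree` and `render` are invented for illustration, versions are plain integers, and the error branch for clashing top-level packages is an assumption of this sketch.

```typescript
// Simplified model of "flatten where possible, nest on conflict". Not npm's real code.
interface Pkg {
  name: string;
  version: number;
  deps: Pkg[];
}

interface TreeNode {
  version: number;
  children: Map<string, TreeNode>;
}

// Place one package: prefer the top level, fall back to nesting under its parent.
function place(top: Map<string, TreeNode>, pkg: Pkg, parent: TreeNode | null): void {
  const hoisted = top.get(pkg.name);
  let node: TreeNode;
  if (hoisted === undefined) {
    node = { version: pkg.version, children: new Map() };
    top.set(pkg.name, node); // free slot at the top level
  } else if (hoisted.version === pkg.version) {
    node = hoisted; // same version is already there, reuse it
  } else if (parent !== null) {
    node = { version: pkg.version, children: new Map() };
    parent.children.set(pkg.name, node); // name clash, nest under the parent
  } else {
    throw new Error("cannot place " + pkg.name + " at the top level");
  }
  for (const dep of pkg.deps) {
    place(top, dep, node);
  }
}

function buildIdealTree(direct: Pkg[]): Map<string, TreeNode> {
  const top = new Map<string, TreeNode>();
  for (const pkg of direct) {
    place(top, pkg, null);
  }
  return top;
}

// Render the tree as indented "name@version" lines, in insertion order.
function render(nodes: Map<string, TreeNode>, indent: string = ""): string[] {
  const lines: string[] = [];
  nodes.forEach((node, name) => {
    lines.push(indent + name + "@" + node.version);
    lines.push(...render(node.children, indent + "  "));
  });
  return lines;
}

const s1: Pkg = { name: "S", version: 1, deps: [] };
const s2: Pkg = { name: "S", version: 2, deps: [] };
const idealTree = buildIdealTree([
  { name: "A", version: 1, deps: [s1] },
  { name: "B", version: 1, deps: [s2] },
]);
console.log(render(idealTree).join("\n"));
```

For the talk's example this prints A@1, S@1 and B@1 at the top level, with S@2 indented under B@1.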
Okay, so that's the dependency resolution done for NPM. The last bit NPM does is generate the actions to take. So it has the current tree and an ideal tree, and now it compares them and says, okay, what do I actually need to do to get from the current tree to the ideal tree? And in this case, it's easy because we had nothing and we have four new nodes, so we're gonna add
these four new nodes. In the next step, we're gonna fetch the packages from the registry, and we're gonna link them. The registry comes back with these packages, we store them in the global cache, and we link them into our local node_modules folder. So it will look exactly like what we saw at the start.
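The comparison of current tree and ideal tree can be sketched as a small diff. This is a hedged illustration, not npm's internals: the tree is flattened to a map from path inside node_modules to version, and the `Action` type and `diffTrees` function are hypothetical names. The talk only mentions "add" actions; "update" and "remove" are assumptions added for completeness.

```typescript
// Hypothetical model: a tree is a map from path inside node_modules to installed version.
type Layout = Map<string, number>;

interface Action {
  kind: "add" | "update" | "remove";
  path: string;
  version: number;
}

// Compare the current tree with the ideal tree and list the steps to get from one to the other.
function diffTrees(current: Layout, ideal: Layout): Action[] {
  const actions: Action[] = [];
  ideal.forEach((version, path) => {
    const installed = current.get(path);
    if (installed === undefined) {
      actions.push({ kind: "add", path, version });
    } else if (installed !== version) {
      actions.push({ kind: "update", path, version });
    }
  });
  current.forEach((version, path) => {
    if (!ideal.has(path)) {
      actions.push({ kind: "remove", path, version });
    }
  });
  return actions;
}

// First-time install from the talk: nothing on disk, four nodes in the ideal tree.
const currentTree: Layout = new Map();
const idealLayout: Layout = new Map([
  ["A", 1],
  ["S", 1],
  ["B", 1],
  ["B/S", 2],
]);
const actions = diffTrees(currentTree, idealLayout);
console.log(actions.map((a) => a.kind + " " + a.path + "@" + a.version).join(", "));
// add A@1, add S@1, add B@1, add B/S@2
```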
So Yarn works a little bit differently. Yarn doesn't build an ideal tree from the node_modules folder; it first creates a list of package requests. So a package request, think about it as just an object, and it contains what they call a pattern. A pattern is the name of the package
and the version number. So in our case, we start, just as an example, with A1. The next thing is Yarn tries to find this version on the registry, and while it does this, it also checks locally: did I have this already, is there any matching version range which I could use?
In our case, because we never installed this package, no. So the registry then comes back with much more information: we have a URL, we have a hash, and so on and so forth, and we also know the child dependencies now. So we can do this whole thing again for S1. So we're resolving S1, and like NPM,
Yarn tries to flatten dependencies. Next one, B1, top level, and S2, again, we can't resolve it to the top level because there's already an S package there, so it needs to go under B1. After that, Yarn does the same thing as NPM: we're gonna fetch the packages,
and once we have the packages fetched, we link them into our node_modules folder. The node_modules folder looks exactly the same in our very simple example where we have only four packages. Now, there's one step I kind of hid from you, and that's a very important step and a major difference between Yarn and NPM.
Yarn will actually then write a lockfile. So this lockfile contains pretty much that graph which you see on the left-hand side, and it contains all the versions Yarn has installed. So for example, in the words of Dan Abramov, to make it more clear, your package.json states what I want for the project, whereas your lockfile says
what I had in terms of dependencies. So it's actually almost like a snapshot of the current node_modules folder. So now, we don't feel like internet dog anymore, right? I think we have a pretty good idea how this works. So now, let's have a look at the shortcomings we mentioned and see whether we can explain them a little bit better.
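As a rough illustration of the "what I want" versus "what I had" distinction for the lockfile, the following TypeScript sketch pins each pattern to the exact version that was installed. It is a toy model, not the yarn.lock format: `Lockfile`, `resolveVersions` and `registryPick` are hypothetical names, and `registryPick` simply stands in for whatever version the registry would choose for a pattern at that moment.

```typescript
// Toy model only; a real yarn.lock stores more fields per entry.
type Lockfile = Map<string, string>; // pattern ("name@range") -> exact installed version

// `registryPick` stands in for "the version the registry would pick for this pattern today".
function resolveVersions(
  patterns: string[],
  registryPick: Map<string, string>,
  lockfile: Lockfile
): Lockfile {
  const resolved: Lockfile = new Map();
  for (const pattern of patterns) {
    const locked = lockfile.get(pattern);
    // A locked entry always wins over whatever the registry offers now.
    const version = locked !== undefined ? locked : registryPick.get(pattern);
    if (version === undefined) {
      throw new Error("cannot resolve " + pattern);
    }
    resolved.set(pattern, version);
  }
  return resolved;
}

const wanted = ["s@^1.0.0"];
// The first install happens without a lockfile and produces one.
const firstLock = resolveVersions(wanted, new Map([["s@^1.0.0", "1.0.1"]]), new Map());
// Later, the registry has a newer release that still matches the range.
const newerRegistry = new Map([["s@^1.0.0", "1.0.2"]]);
const withLock = resolveVersions(wanted, newerRegistry, firstLock);
const withoutLock = resolveVersions(wanted, newerRegistry, new Map());

console.log(withLock.get("s@^1.0.0")); // 1.0.1
console.log(withoutLock.get("s@^1.0.0")); // 1.0.2
```

With the committed lockfile, the later install reproduces the earlier one; without it, the result depends on when the install runs.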
So first, we had this thing with the non-determinism, and what does this actually mean? Well, non-determinism is this famous "it works on my machine" problem. Like, I install something, and a couple of minutes later, my colleague installs it, and it doesn't work. I will explain to you now why NPM actually
is inherently non-deterministic. There's actually a brilliant post on this in the NPM documentation. It's actually really good documentation. So let's use, as an example, a bit of a more complex dependency graph. In this case, we have three packages, direct packages, A1, B1, C1, and three child dependencies, S1, S2, and S1.
So this is our existing node modules folder, which you see on the right-hand side. There's no magic here. You know this all, right? There's no open questions. So let's assume, though, that we wanna upgrade our package A1 from version one to version two. And this means also this results
in an upgrade of the S package from version one to version two. Now, let's assume Tomster does this. And Tomster already has this existing node_modules folder you see on the right-hand side. Remember how our dependency resolution went: we load the current tree from disk,
we clone the current tree, and then we try to build up the ideal tree. Well, what will our ideal tree look like for Tomster? Let's see. So A2 is our first dependency. And we can actually just exchange A1 for A2. S2 is our next dependency.
And because there's already a version of S in the top-level dependencies here, but it's not S2, unfortunately, well, it has to be a child dependency of A, right? So we put it there. The next one is B1. B1 is there, fine, nothing to worry about. S2: that's already there as a child dependency of B1.
C1, C1, and S1 is down there. So we're all good. Let's assume now Zoey comes along. She has a shiny new laptop, and she's gonna install our app. What will this resolution look like for her? Well, okay, so again, we clone the current tree, nothing in it. We're building the ideal tree.
Let's see what the ideal tree looks like. A2, that's fine. We can resolve this directly at our top level. S2, interesting: because we have no S there, we can actually put it at the top level. B1, put it there, directly at the top level.
S2 is already there. C1, top level, you see it gets repetitive, right? And then S1: now, because there is already an S in a different version, S1 has to be a child dependency of C1. Hmm, we're ending up with two different node_modules trees, right, and that's a problem we run into with NPM.
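The two outcomes can be reproduced with a small simulation. This is a sketch under simplifying assumptions, not npm's algorithm: the layout is a map from path inside node_modules to version, `installInto` and `describeLayout` are invented helpers, only one level of child dependencies is handled, and stale entries are never pruned. The point it shows is that the same manifest yields different layouts depending on what was already on disk.

```typescript
// Sketch: the same manifest installed on top of different existing folders.
type Layout = Map<string, number>; // path inside node_modules -> installed version

interface Dep {
  name: string;
  version: number;
  deps: Dep[];
}

// Start from what is already on disk, then place direct and child dependencies.
function installInto(existing: Layout, direct: Dep[]): Layout {
  const layout: Layout = new Map();
  existing.forEach((version, path) => layout.set(path, version));
  for (const pkg of direct) {
    layout.set(pkg.name, pkg.version); // direct dependencies live at the top level
    for (const child of pkg.deps) {
      const hoisted = layout.get(child.name);
      if (hoisted === undefined) {
        layout.set(child.name, child.version); // free slot, flatten
      } else if (hoisted !== child.version) {
        layout.set(pkg.name + "/" + child.name, child.version); // clash, nest
      }
    }
  }
  return layout;
}

function describeLayout(layout: Layout): string {
  const parts: string[] = [];
  layout.forEach((version, path) => parts.push(path + "@" + version));
  return parts.sort().join(" ");
}

const depS1: Dep = { name: "S", version: 1, deps: [] };
const depS2: Dep = { name: "S", version: 2, deps: [] };
// The manifest after upgrading A from version 1 to version 2.
const manifest: Dep[] = [
  { name: "A", version: 2, deps: [depS2] },
  { name: "B", version: 1, deps: [depS2] },
  { name: "C", version: 1, deps: [depS1] },
];

// Tomster upgrades on top of his existing folder; Zoey installs on an empty laptop.
const tomsterBefore: Layout = new Map([["A", 1], ["S", 1], ["B", 1], ["B/S", 2], ["C", 1]]);
const tomster = describeLayout(installInto(tomsterBefore, manifest));
const zoey = describeLayout(installInto(new Map(), manifest));

console.log(tomster); // A/S@2 A@2 B/S@2 B@1 C@1 S@1
console.log(zoey); // A@2 B@1 C/S@1 C@1 S@2
```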
Now, in our case, this was really a tiny example app. It gets worse with semver changes in the packages or in the child dependencies, little version changes. So, is there anything we can do? Well, the Intercom animals say,
when in doubt, clear node_modules out. But this takes ages; we're back to the comic from the beginning, remember. So, NPM actually had a solution, and this is NPM shrinkwrap. So, this is actually a lockfile for NPM, where they write down
the current node_modules with all the dependencies. But the problem is, by default, it's turned off, and it's the responsibility of the developers to actually update it, and it easily gets out of sync. Imagine Intercom: we are 70 engineers, all working on something together. It can easily get chaotic with this NPM shrinkwrap file.
Now, with YARN, for example, if we do an upgrade of package A, and assume we start with Tomster again, we don't care about the existing node_modules folder in this case. We have a different resolution algorithm with YARN. So we create a package request for A2,
put it at the top level when it comes back, and have a look at S2 as the next one, the child dependency, and put it at the top level here. Have a look at B1, can put it at the top level. S2, ah, there we have it already. C1, the same, and here we have S1,
which we can't resolve directly at the top level; we have to put it as a child dependency here. But remember that at the end we always write that lockfile, and that lockfile is our single source of truth. So, if Zoey comes along now, she knows what the node_modules folder of Tomster looked like, and she will get the same result
based on this. So, that's actually one of the advantages of YARN. We always get the same node_modules structure. So, remember our graph from the beginning? We had the project code and the manifest file. We actually have another thing, which is computer-written, driven by the YARN algorithm,
and that's the lockfile. So, that's actually the holy trinity of package management. I just made that up, guys. So, performance, let's have a look, very quickly, at performance. Now, bear with me, this is a little bit hard. So there is actually, sorry,
let me start first with this: there was a great issue opened by Sam Saccone in November 2015, and he found out that there is a regression between the NPM 2 client and the NPM 3 client, and that issue is brilliant to read through, because I learned so much about flame graphs, and even how hard it really is to debug performance.
I'm a nerd about this issue. I really recommend it; it's actually a fantastic write-up of a performance issue. I want to try to explain this. This is my own example, and it could be a bit contrived, but I hope it shows you why yarn is faster than NPM.
So, NPM is actually a multistage installation. Remember where we said, okay, we do all these things, we clone the current tree, and we build up the ideal tree, and then we actually carry out the actions. In this case, we do them in a serialized fashion, so we have these four packages we need to fetch, and we can only fetch one after the other,
after the other, after the other. So, in this case, for example, remember we had our very simple example: we get A1, we get B1, we get S1, and we get S2. So, that's quite a serialized fashion. Whereas yarn has this built-in parallelism,
sorry, hard word for me. When we said we created a package request, what it actually did in reality is it created package requests for the two top-level dependencies, for A1 and B1, at the same time. And the same thing for child dependencies when they come back. So, in this example, let's assume
we're creating these two package requests directly, and when one of them comes back faster, we can directly move on. So, that's why you see here A1 and B1 highlighted at the same time. In reality, they are resolved at the same time in yarn. So, let's assume they go off to the registry, they come back, and B1 is a little bit faster.
Then, we can fetch S2, and then we can fetch S1. So, this is the tiny advantage of yarn over NPM. And you see here a benchmark that the yarn guys actually did with warm caches and cold caches. So, it's always faster than NPM, in their case.
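To see why overlapping requests help, here is a small TypeScript calculation that follows the talk's simplified model rather than either client's real scheduler. The latencies are made-up numbers, and `PackageRequest`, `serialTime` and `parallelTime` are hypothetical names. In the serial model every request waits for the previous one; in the parallel model siblings start together and a child request starts as soon as its parent's response is back.

```typescript
// Toy timing model with invented latencies; it does not call a registry.
interface PackageRequest {
  name: string;
  latencyMs: number;
  children: PackageRequest[];
}

// Serial: every request waits for the previous one, so the latencies add up.
function serialTime(requests: PackageRequest[]): number {
  let total = 0;
  for (const request of requests) {
    total += request.latencyMs + serialTime(request.children);
  }
  return total;
}

// Parallel: siblings start together; children start once their parent has responded.
function parallelTime(requests: PackageRequest[], startedAt: number = 0): number {
  let finishedAt = startedAt;
  for (const request of requests) {
    const respondedAt = startedAt + request.latencyMs;
    finishedAt = Math.max(finishedAt, parallelTime(request.children, respondedAt));
  }
  return finishedAt;
}

// The talk's example graph: A1 -> S1 and B1 -> S2, with B1 answering a bit faster than A1.
const requests: PackageRequest[] = [
  { name: "A1", latencyMs: 30, children: [{ name: "S1", latencyMs: 10, children: [] }] },
  { name: "B1", latencyMs: 20, children: [{ name: "S2", latencyMs: 10, children: [] }] },
];

console.log(serialTime(requests)); // 70
console.log(parallelTime(requests)); // 40
```

In this model the parallel total equals the slowest chain, A1 followed by S1, whereas the serial total is the sum of all four requests.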
So, let's have a look at the last chapter of our story. If you want to try out yarn, Ember CLI 2.13 onwards is actually yarn-aware. So, if you type ember install, it will create a lockfile for you, and you commit this lockfile.
Also, I forgot, you always commit your lockfile. You can also migrate your own projects by typing yarn. The yarn documentation is fantastic. Have a look. They explain to you how you can migrate your own projects. And you can actually contribute. It's an open-source project.
It follows the same governance model as Rust and Ember. So, if you have a feature request, open an RFC, let people look at it, and they will comment on it. But, let's not forget, yarn is really built on the shoulders of giants. And the folks from NPM constantly try to improve the stability
and performance of NPM. So, definitely follow the blog and watch that space as well. I would say, as a conclusion, whether you use NPM or yarn, in both cases you use a powerful package manager that drives all your applications. And I also wanna give a big shout-out
to all the contributors of both NPM and the yarn project, because it's fantastic software. It's super complex, and it's used by every one of us, everyone in this room, every day. So, that's actually a wombat, which is the mascot of NPM, and a cat. It's not really clear, but I explained it there.
If you have any questions, tweet at me at sarifridge or serina at intercom.io. Thank you very much. You were awesome. Thank you.