Really unconventional migrations
Formal Metadata
Title | Really unconventional migrations
Title of Series | Plone Conference 2017 (58/61)
Number of Parts | 61
Author | Pisa, Alessandro
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/54071 (DOI)
Language | English
Transcript: English (auto-generated)
00:15
Hello, everybody. I will try to be fast, to regain some of the time we lost here.
00:21
I'm Alessandro Pisa. I studied as a physicist, but something went wrong and I started working with Plone. Sorry? Can you hear me? Okay, great.
00:41
So I was saying that I started working with Plone something like 10 years ago, and for the past years I have been working with Syslab, where I have quite often been the migration guy for migrating websites. So a migration is just a matter of state,
01:01
and you go from state A to state B, where you have different servers, different machines, different software versions or different software applications. And of course the customer does not care about how you get from A to B, but you do, because all the cost of the migration is on you
01:22
and your company. Sometimes the migration proceeds on a well-defined path and the cost can be predicted, and some other times things happen and you have to go off the beaten path. And of course, I have
01:45
a story to tell you about a particular migration where I did some weird things to move my site from state A to state B. State A was a Plone 4.1.3 site, and we had to reach version 5.0.6.
02:03
We had a custom UI, and we wanted to upgrade Plone and install Quaive on top of the latest Plone 5 version. And we had a different set of add-ons in the two states. One of the important things was that we wanted
02:21
to get rid of Archetypes and switch completely to Dexterity. We also had Solr 3 in the original site, and we wanted to upgrade to Solr 5. So what does the standard theory say about that? That you should first upgrade to the latest Plone 4.3 version, then to Plone 5, then install Quaive.
02:43
Then call the ATCT migrator to change your Archetypes content to the Dexterity equivalents. This is first-class traveling, because Plone out of the box gives you great tools to do that. But then we have some tricky parts. Of course, we have to upgrade Solr.
03:01
We also have to make the add-ons compatible with Quaive and install the new ones. And maybe there are some compatibility issues. So this is quite tricky, and your first-class traveling starts to become a low-cost flight.
03:23
But there is another part. The Data.fs was not that big, 10 gigabytes, but we had 300 gigabytes of blobs. And this is tricky, because we had 200,000 Archetypes files, each one with previews. So for every file, for every PDF,
03:46
we had a preview of the pages. We also had versions of those files. And yeah, also 5,000 users that had been using that site in nasty ways for years had infected the Data.fs
04:02
with nasty things. So we made a rough time analysis of what could happen if we just followed the standard path. Just getting fresh data would take hours, merely to rsync the Data.fs and the blobs.
04:21
Upgrading Plone would also take hours. Upgrading the products would be tricky because of compatibility problems and whatever. And migrating from Archetypes to Dexterity would take days. And of course, you cannot proceed on this path anymore.
04:43
So why will it not work? Because already in the development phase, while you develop the migration, you will suffer, because it will be too slow. Also, we agreed with the customer that the migration should take one weekend, and in this weekend we wanted to do the migration
05:02
plus quality assurance, and maybe fix some last-minute problems. And of course, if this did not work, we would have to roll back, because we could not block 5,000 users that were using this as a central tool for their work. So we needed to cut the migration time down
05:22
to less than one day, at least. So we define a strategy. First, we had to define what are our environments, then making the migration procedure convenient and automatable and identify the bottlenecks and remove them.
05:41
We had three environments available. Of course, the production cluster: several virtual machines hosting the production data and production instances. A staging environment, where we and the customer could test that the application was working right, before and after the migration.
06:02
And we had just one server, one virtual server, dedicated to the migration. The server dedicated to the migration was not that powerful: just eight gigabytes of RAM, two CPUs, and quite some space to store the migration data.
06:21
And we decided to perform the migration using a view. Why a view? Because it helped us organize the code. Each function called by this view would be a convergent step, so we could reach the important states
06:42
while doing the migration. And it allowed fast development, because when developing we just called plone.reload, reran the migration view, and the modifications were applied immediately. Another thing is that if you want to call this view
07:01
from a script, from a bash script, you can just make a curl query to your Plone site, and curl will call the view with the proper cookies for authentication or whatever. And this is better, in my opinion,
07:21
than using an instance-run script, because every time you modify that, you have to kill the instance and rerun it, and that takes time. So, what does the migration view look like? The __call__ method is just made of several calls
07:42
to different steps. Each step should be convergent: it checks whether it should run at all, and if it should, it makes some modifications to the Data.fs, then logs what happened, and then commits. So after each step there was a commit.
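A minimal sketch of the pattern, with invented step and view names (this is not the original code):

```python
# Sketch of a migration view made of convergent, committing steps.
import logging
import transaction
from Products.Five.browser import BrowserView

logger = logging.getLogger('migration')


class MigrationView(BrowserView):

    def step_clear_catalog(self):
        # each step first checks whether it still needs to run,
        # then modifies the Data.fs
        ...

    def run(self, step):
        step()
        logger.info('Finished %s', step.__name__)
        transaction.commit()  # one commit after every convergent step

    def __call__(self):
        # can be triggered from a bash script, e.g.:
        #   curl --user admin:secret http://localhost:8080/Plone/@@migrate
        for step in (self.step_clear_catalog,):
            self.run(step)
        return 'DONE'
```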
08:01
What did we achieve with this? We had a well-defined, reliable and clean upgrade path for our Data.fs. That was already a big step, because every time you make a migration you probably have to do many things, and you don't want to keep a checklist on paper and remember: okay, I have to do this,
08:22
and then you forget, and you screw it up, and you have to start again. We also minimized the need for manual operations. And we had another problem: just rsyncing 300 gigabytes of blobs is quite a big task.
08:42
Even if you have already synchronized them once, when you run the migration you want the latest blobs, and rsyncing them will take a lot of time. So the question is: can we start the migration without the blobs? And the answer is yes, because using this package,
09:02
experimental.gracefulblobmissing, you can replace your missing blobs with placeholders that you can later overwrite with the original data, and this allowed us to start our migration fast. So what we do here
09:21
is prepare the migration environment once, and copy the Data.fs when we need it, for the final migration or for a test before the final migration, while the blobs rsync in the background. Of course, we also decided to disable Solr,
09:42
because we would take care of Solr later. What did we achieve? Now we can start the migration in minutes, because copying one single file of 10 gigabytes is quite fast. Can you do this for your own migrations? It depends, because maybe your migration procedure
10:05
must have Solr or the blobs or whatever; if you can live without the missing parts, you must be able to fix them later, when they are available.
10:22
And as you will see later, we were able to do exactly that. Now there is another problem: just upgrading Plone through the portal migration tool takes quite some time, and there are reasons for that. The first reason is the portal_catalog,
10:41
because sometimes the plone.app.upgrade steps add an index or a metadata column, and you have to take all your brains, get the objects, and reindex them to fill the new brain, the new index or the new metadata. And okay, this takes quite some time, say 45 minutes.
11:04
Then there is another index and another metadata column, and you have to repeat the same thing again: take all the brains, all the objects and whatever, and this will take another 45 minutes, and then again and again. So this was quite a huge blocker,
11:21
and at the end you had huge transactions, and you had to monkey-patch this procedure to make intermediate commits, because otherwise your virtual machine would die. And so we have been bold, really bold: we decided to go brainless.
11:40
So one of the first steps of our migration view was to wipe out the catalog. We took the catalog and completely cleared it, and then we ran the migration, and the result was that the Plone upgrade, instead of running for four hours, took one minute. This was quite an achievement, because we didn't have to get all the objects
12:02
from our database. It was probably just fixing some schemas, setting some Plone registry entries, and that was all. So can you do that at home? For us it worked, because we planned to modify all our objects anyway when changing them
12:22
from Archetypes to Dexterity, so they would be reindexed at a later stage. But another problem could be that some of your steps depend on the catalog being there, so you have to check whether you can adopt this solution.
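The clearing itself is tiny; a sketch using the standard catalog tool:

```python
from plone import api

# Wipe all the brains; the objects themselves stay untouched and will be
# reindexed later, during the Archetypes-to-Dexterity migration.
catalog = api.portal.get_tool('portal_catalog')
catalog.manage_catalogClear()
```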
12:40
But anyway, if you do not have those concerns, you will be amazed by the advantages of going brainless. Then we had another problem with the add-ons. Of course, if you decide to go first to the latest release of Plone 4.3 and then to Plone 5, you have to run the buildout twice,
13:03
you have to prepare two environments and whatever, and you also have problems with the changing versions of the add-ons themselves. Also, maybe you have to make these add-ons compatible
13:20
just to be able to uninstall them, and uninstallation is always a problem, because you never know whether the uninstall procedure is good enough to clear everything you don't want anymore. So again, we have been bold and went straight to the goal: we decided to provide one single buildout, skipping the latest-Plone-4.3 step,
13:43
and we just started with 5.0.6, and our buildout no longer contained the unwanted packages. And yeah, of course, now the instance will not even start, so we have to heal and clean up the Data.fs to cope with the fact that we no longer have
14:03
some classes or interfaces. What solution did we adopt? We decided to use the alias_module function, which allows you to replace code that you no longer have with new code. This allowed us to remove all the unwanted add-ons
14:25
from the buildout and start directly with the final buildout. And how does it work? You basically provide a missing-classes module,
14:45
you import the alias_module function from plone.app.upgrade, and you say: okay, for every class in my missing package, or for every interface in my removed package,
15:01
alias those modules with the missing-classes one. And what does the missing-classes module look like? Like this: you can import whatever you want, specify your own interfaces, or define persistent tools that are not there anymore,
15:20
and you can even provide some functions or attributes there. Another option was to use wildcard.fixpersistentutilities, but then you don't have the ability to define the missing functions that you see here for this missing persistent tool.
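A sketch of the two pieces; the removed package and class names below are invented placeholders:

```python
# missingclasses.py -- stand-ins for code that no longer exists.
from persistent import Persistent
from zope.interface import Interface


class IRemovedAddonLayer(Interface):
    """Replaces an interface from the uninstalled add-on."""


class RemovedTool(Persistent):
    """Replaces a persistent tool; you can even give it the methods
    that the cleanup code still needs."""

    def getId(self):
        return 'removed_tool'
```

```python
# At startup, alias the module paths that old pickles still reference.
from plone.app.upgrade.utils import alias_module
from my.migration import missingclasses  # hypothetical package

alias_module('removed.addon.interfaces', missingclasses)
alias_module('removed.addon.tool', missingclasses)
```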
15:42
So it is a big advantage to use this alias_module function. Once you have that in place, your instance is running and you can start removing the enemies. The first enemies are all the broken persistent utilities, and this is the code that we used to remove them.
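The slide code itself is not in the transcript; this is only a hedged reconstruction of the usual pattern for dropping utility registrations that point into a removed package:

```python
# Unregister utilities whose provided interface lives in a removed package.
def remove_broken_utilities(portal, dotted_prefix):
    sm = portal.getSiteManager()
    for reg in list(sm.registeredUtilities()):
        if reg.provided.__module__.startswith(dotted_prefix):
            sm.unregisterUtility(
                component=reg.component,
                provided=reg.provided,
                name=reg.name,
            )
```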
16:04
Okay, you can read it later when I share the slides. Another thing that we fixed was the portal_setup tool, because removing the packages abruptly left portal_setup in not very good shape.
16:21
It was even quite easy to fix. We also cleaned up the skins. Instead of running the GenericSetup XML, we just decided to take the portal_skins tool and fix it directly, removing all the selections
16:42
we didn't want and setting the good layers. This is something that is good to have, because after years of usage of a site the skin layers can get quite messy.
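A sketch of that cleanup; the list of selections to keep is an assumption:

```python
from plone import api

def clean_skins(keep=('Plone Default',)):
    skins = api.portal.get_tool('portal_skins')
    for name in skins.getSkinSelections():
        if name not in keep:
            # drop the whole skin selection from the tool
            skins.manage_skinLayers(chosen=(name,), del_skin=True)
```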
17:00
And okay, there may be other things to fix: for example portal_actions, old portal transforms, old portal types; some site properties may have to be migrated to the portal registry. This is also the point where you want to get rid of obsolete content that you don't need anymore, which is always a good choice when you perform a migration:
17:22
remove everything you can. What are our achievements? We never needed intermediate buildouts: we focused on the goal and not on something that we would disregard at a later stage. And providing and later getting rid of a patch was
17:43
easy, because the code is quite self-contained and it is really simple to provide a new missing interface or to remove the alias_module trick.
18:01
Can you do that? Probably yes, but you have to know what is happening in your Data.fs, and you have to know how your Data.fs should look at the end of this process. But the biggest problem for our migration was the time needed by the ATCT migrator.
18:24
The big problem with the standard migration is that it wants to create new instances for your objects, and then the new instances replace your old instances. With 300 gigabytes of blobs this has a huge cost, and also I'm not even sure
18:43
that all the old versions are kept. So you get a duplication of your data, and maybe you will die before you either reach the desired state or the procedure completes.
19:03
Also, this versioning problem, I think, was not solved at the time. Well, what was our solution? Instead of replacing content, we went for cosmetic surgery. We were very careful and decided to avoid
19:21
creating new instances for our content; we decided instead to modify the existing content in place, to reflect the new state that we wanted. We also didn't want to create new blobs, because duplicating 300 gigabytes of blobs is a nightmare, and they are already there.
19:42
Given that the catalog is now empty, we decided to take the ZopeFindAndApply method, run it on the portal, and apply a migrate function that basically does this: it takes an object and its path, because this is the way it is called
20:03
by ZopeFindAndApply, and it looks up the correct migrator for the object, based on the portal type and on where the object lives; once you have the migrator, you call it.
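A minimal sketch of that driver; get_migrator is an invented stand-in for the portal-type based lookup:

```python
def migrate(obj, path):
    # called by ZopeFindAndApply with each object and its path
    migrator_class = get_migrator(obj)  # assumed: picks by portal_type/path
    if migrator_class is not None:
        migrator_class(obj)()

portal.ZopeFindAndApply(portal, search_sub=1, apply_func=migrate)
```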
20:22
The migrator does something like this. It is a really simple thing: it has a __call__ method that changes the base class, migrates all the fields to adapt to the new class, reindexes the object, and resets the modified date, because reindexing the object also resets the modified date to the moment you call reindexObject.
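Sketched out, with invented helper names:

```python
class Migrator(object):

    def __init__(self, obj):
        self.obj = obj
        self.old_modified = obj.modified()  # saved before reindexing

    def __call__(self):
        self.change_class()    # swap the base class, see below
        self.migrate_fields()  # rename, initialize and convert fields
        self.obj.reindexObject()
        # reindexing bumped the modification date; put the old one back
        self.obj.setModificationDate(self.old_modified)
```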
20:42
How does changing the class work? You just import the new class, detach your object from its parent in the ZODB,
21:00
change the __class__ attribute to the new class, and reattach the object to its parent. Why do you have to detach and reattach? This is to outwit some ZODB caches. I'm not really aware of everything that is happening there, but I know that this works most of the time.
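A sketch of the detach/swap/reattach dance, using the low-level _delOb/_setOb so that no events fire; NewClass is a placeholder:

```python
from Acquisition import aq_base, aq_parent

def change_class(obj, NewClass):
    parent = aq_parent(obj)
    obj_id = obj.getId()
    obj = aq_base(obj)
    parent._delOb(obj_id)       # detach from the parent in the ZODB
    obj.__class__ = NewClass    # swap the class on the live object
    obj._p_changed = True       # make sure the change is persisted
    parent._setOb(obj_id, obj)  # reattach
```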
21:23
You can try the class switch yourself in the Python interpreter. Define a class A with a class attribute a equal to one, and the same for B with b; then instantiate A, and you can even define an instance attribute c equal to one,
21:43
and then switch the class of the instance from A to B. When you access a.a, you see that the class attribute is not there anymore, because the lowercase a is now an instance of B; it has an attribute b instead, which is a class attribute.
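The experiment, spelled out:

```python
class A:
    a = 1

class B:
    b = 1

obj = A()
obj.c = 1           # instance attribute
obj.__class__ = B   # switch the class

obj.b  # -> 1, B's class attribute
obj.c  # -> 1, the instance attribute survives
obj.a  # -> AttributeError: it belonged to A
```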
22:04
It keeps the c instance attribute, and both are equal to one. So this works perfectly, and I really like that Python works like that; it's amazing. Now, of course, besides changing the base class,
22:21
we also want to adapt the fields, because maybe they have been renamed, you have to initialize new ones, and you have to update them: before, they were the camel-case DateTime objects, and now you want the regular Python datetime ones. And this is an example.
22:44
We import the old blob class from plone.app.blob, the new one from plone.namedfile, and we have a function to migrate the file field. You pass the field name, which will be 'file' most of the time. You take the old file,
23:00
which is an instance of BlobWrapper. You instantiate the new file and start copying the attributes that you want to migrate, and the most important attribute is the blob. The blob is unchanged between the two implementations: it is always a ZODB blob, and this means that we are not touching the blob
23:20
on the file system. We are just moving the ZODB blob object, which contains a reference to the path on the file system. Then we set the filename, and we set the attribute on our migrated object.
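A hedged sketch of that function; how the old value is read back depends on the Archetypes storage, so treat this as an illustration only:

```python
from plone.namedfile.file import NamedBlobFile

def migrate_file_field(obj, fieldname='file'):
    old_file = obj.__dict__.get(fieldname)  # the stored BlobWrapper (assumed)
    new_file = NamedBlobFile()
    new_file._blob = old_file.getBlob()     # reuse the very same ZODB blob:
                                            # nothing is copied on disk
    new_file.filename = old_file.getFilename()
    new_file.contentType = old_file.getContentType()
    setattr(obj, fieldname, new_file)
```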
23:41
The final step is to reset the modification date: of course, when you init your migrator, you store the old modification date so you can reset it at a later stage, after you reindex your object. Can you do that? This is hard, and you must know your content perfectly, because every migrator should put all the bits
24:04
in the right place. You also have to take care of marker interfaces, so you probably need some more methods to fix those as well, and you have to take care of the ZODB cache, because it may happen that even if you change the __class__ attribute,
24:22
the object after some time goes back to the original class. What are our achievements? Now, with this trick, the Data.fs is ready in something like 12 hours. This means that we can launch the migration
24:41
before going to bed, or at 6 p.m., and wake up the day after with everything in place, and this is a really nice achievement. We never needed to actually touch the blobs on the file system, and we never needed Solr so far. And now it is time to fix the missing parts.
25:03
What are we missing? Basically Solr, which can be migrated in parallel, because it only concerns the Solr application, and migrating Solr is quite fast: it can be done in half an hour, while you brush your teeth, for example.
25:21
At the end, when the Data.fs has reached the state that we want, we can just make atomic updates, which basically reindex everything with the new Data.fs data without touching the SearchableText, the index that takes the most time to compute,
25:46
because you may have to take all the PDFs and office documents and convert them to text before indexing.
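A sketch of the idea, assuming collective.solr's atomic updates are enabled and '/Plone' is the site path: reindex everything with an explicit list of indexes that leaves SearchableText out:

```python
from plone import api

catalog = api.portal.get_tool('portal_catalog')
idxs = [name for name in catalog.indexes() if name != 'SearchableText']
for brain in catalog.unrestrictedSearchResults(path='/Plone'):
    brain.getObject().reindexObject(idxs=idxs)  # SearchableText untouched
```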
26:01
Now it is time to copy the data to staging and production, because we want to keep the migrated data untouched by humans; only the migration procedure should touch it. Copying the Data.fs is fast, copying the Solr data is fast, and we can rsync the few new blobs
26:20
that the migration created (installing new products creates really few blobs), and it's okay: you can wait for them to appear even while the QA phase is running. So this is not a big problem. This was our site before,
26:41
and this is the state of the site that we reached after installing Quaive. It was quite a nice achievement, and what we got was a big thank you, because the biggest achievement that you can have in these cases is a thank you from your customer
27:01
who is happy because they really understood the value that you delivered and the professional way in which you delivered it. So, just a small recap. We made a nice migration plan, in my opinion,
27:23
and we always did the worst part, syncing the blobs and touching them, in the background; they were never needed in any step. The catalog was cleared before starting, and this allowed a nice, fast migration,
27:40
and we used the module-aliasing trick to start directly with the final buildout. We used the in-place, cosmetic-surgery migration to speed up the Archetypes-to-Dexterity step, and we used atomic updates to avoid reading the blobs and recalculating the SearchableText.
28:02
Should you try this? Maybe you can decide to apply some of these tips to your particular use case. In many cases the standard migration tools are perfect, I really like them; and in many other cases you may decide to go with other approaches
28:23
that fit the case better: for example transmogrifier, or whatever fancy export-and-import procedure. And okay, there are also some things that I did not tell you. You can use the __class__ trick,
28:40
but you could also use alias_module or the zodbupdate package, which does something similar. Also, syncing the blobs is something that can be improved, and we did improve it. And in a migration you may also need to take care of other catalogs,
29:04
for example the reference_catalog, or of link integrity. And of course we made some other minor modifications to the Data.fs and some other tweaks during the migration. Can we improve this further?
29:20
Yes, of course we could. We could speed it up even more, but basically the migration started at six in the afternoon, and I didn't want to improve it only to wake up at four in the morning. So it was enough: 12 hours was the perfect time. So I actually got some sleep, I didn't tell you that.
29:43
So thank you very much. Thank you, Syslab, for making this possible, and thank you all, because it's great to be here. Okay, I think I rushed. I don't know how much time I have. Okay, so perfect, thank you.
30:06
Yes, David?
30:25
Yes?
30:41
Yeah. Another option that I had was taking all the indexes from the catalog. I don't know, should I repeat the question for the video, for the people at home? The idea is to speed up the migration further by reindexing only the indexes
31:01
that I know should change, and this is a smart idea; but in the end, the only index that takes a significant amount of time to calculate is the SearchableText. One thing that I even tried, and it worked, was to take all the indexes and remove the modified index,
31:21
so I did not need to reset the modified date at a later stage; but in the end it would not substantially change your migration time. Yeah, it's a nice point, but then I would have needed to wake up half an hour earlier.
31:47
Okay, so, yeah.
33:41
Yes, yeah, yeah, yeah.
34:01
I did the same with every content type. And yeah, I will share this presentation later, and there is actually a link to the talk that you gave in Bucharest, because it was really amazing and it completely describes the standard path
34:20
that you should follow; everybody should be aware of your talk. And I'm really glad that things are changing and that you have improved this migration so much. Yeah.
34:42
You also have to thank that guy in the red shirt, because I was inspired by a blog post from David, and I think it was also Martin who was discussing this approach for changing Dexterity classes.
35:00
Yeah, but we know that David is a great guy. Okay, thank you. So, I was too fast. You put too much pressure on me with these technical problems at the beginning.
35:24
Sorry? Okay, so I hope I was not too fast and you were able to follow anyway. This is a Google Drive presentation and I will share it online; it has quite some notes in it.
35:42
So maybe even when you are at home, you will have something that is still usable, okay? So, I think that's it. Any other questions? Thank you again. Thank you.