
Why storing files for the web is not as straightforward as you might think.

Video in TIB AV-Portal: Why storing files for the web is not as straightforward as you might think.

Formal Metadata

Title: Why storing files for the web is not as straightforward as you might think.
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place: Bilbao, Euskadi, Spain

Content Metadata

Abstract: Alessandro Molina - Why storing files for the web is not as straightforward as you might think. DEPOT is a file storage framework born from the experience on a project that saved a lot of files on disk, until the day it went online and the customer's system engineering team decided to switch to Heroku, which doesn't support storing files on disk. The talk will cover the facets of a feature, "saving files", which has always been considered straightforward but can become complex in the era of cloud deployment and when infrastructure migration happens. After exposing the major drawbacks and issues that big projects might face in the short and long term with file storage, the talk will introduce DEPOT and how it tries to solve most of these issues while providing a super-easy-to-use interface for developers. We will see how to use DEPOT to provide attachments on SQLAlchemy or MongoDB and how to handle problems like migration to a different storage backend and long-term evolution. Just as SQLAlchemy makes it possible to switch your storage on the fly without touching code, DEPOT aims to make the same possible for files, and even to use multiple different storages together.
Keywords: EuroPython Conference, EP 2015, EuroPython 2015
[...] we use, of course, Objective-C and Java for mobile applications, but for everything which relies on a server we use Python. I've been a member of the TurboGears2 framework development team for the last few years. If you don't know it, it's one of the oldest web frameworks, together with Django. I have contributed to various Python web libraries, like Ming, the MongoDB object document mapper, and I work on most of the libraries related to validation and forms for the web. So most of my work has been related to the web for the past two years.

What I'm going to talk about is a project that really happened, one we had at our company. It started as just a plain proof of technology. The customer came and said, "I want to try my idea, see if it really works properly, if people can use it, and that it's not a huge mess", something like that. So we started out with a very simple code base, which then became the final product the customer sold. It always happens like this: the customer tells you something is just an idea to test, and then it goes to production. The core part of this was that it saved a lot of files, mostly images in this case. As it was just a proof of concept and we were really on a budget (it had to be done in something like two days), we decided not to rely on cloud storage, because it would have involved more time to bring in a library to store the files, and more money to actually pay for the storage itself. So we just decided to go for storing files on disk and letting nginx serve them: the simplest solution, and for a proof of concept it was good enough.

The issue is that the customer had a technical guy on their side, and this guy was in charge of deciding which solutions to use, which servers, which infrastructure and so on. Here the real problem started, because the customer told us where the software was going to run just three days before going live. So we didn't know where the software was going to run until three days before the public launch. And as they were obviously short on budget, because at the beginning it was just a proof of technology, they decided not to rent anything, and this was actually my face when they told me, because they decided to go for the cheapest possible solution: in this case they went for a free plan on Heroku, and Heroku doesn't support storing files on disk. You can store files there, but they just disappear whenever the application restarts. So we actually couldn't deploy the software on Heroku, because we stored files on disk and we knew that whenever the application restarted the files would just be gone. So this was a huge problem, right before the launch. Remember that we had something like three days before going live with the whole software.
We decided to rewrite from scratch everything we had related to storing files: generating thumbnails, uploading them, making them available, serving them. Everything we had was just plain: we relied on nginx, so we just saved the file on disk and let nginx serve it. We had to switch everything to another solution which could work even on Heroku. In this case we decided to go with GridFS, which is the file system storage of MongoDB, I don't know if any of you knows what it is. The application already relied on MongoDB for its database, and MongoDB has support for storing files inside MongoDB itself. It's actually really good support, because it scales with MongoDB and it's pretty fast at serving files, because it's just a key-value storage: you just put your file in MongoDB with its ID, and it's really fast because it is going to serve it from memory if the file is able to stay in memory. The issue is that what we did was just a huge hack. Maybe we could have had time to write it properly, but as we were in total panic we just started to look for the fastest solution to make everything work. So we monkey-patched all the classes that were going to save files to disk, and replaced them with something that saved them on GridFS, and then we monkey-patched our WSGI application so that, whenever a specific path was requested, it went to GridFS, read the file and served it back.

So it was actually a huge mess, and it went online with practically no testing, because we finished it something like the day before. We tried it on our testing environment, but we didn't try it on the real-world deployment, so we didn't have time to try it on Heroku with the real application. So we went online with this solution. After we went live, luckily everything worked, and we didn't have any major failure. What we did afterwards was come together and agree that we actually needed a better solution. It was obvious to everyone in the team that this kind of thing should not happen anymore. We knew that the customer had changed their mind, and we knew that we had done the best possible thing with the budget, time and knowledge we had at the time, but still we had an issue, still we had made the wrong choice. So we wanted to find a solution that would work independently of budget constraints and of the customer's changes of requirements and ideas. We decided that the solution should be a tool that our developers could use, just relying on the tool without caring about how and where the files are going. Everything related to storing files should be moved to the deployment phase, with configuration, and not to the coding phase. So that's how DEPOT was actually born: we created DEPOT to make our life easier when storing files, and to be able to just say: DEPOT, store this file; I don't care where you are going to store it, I just want you to be able to give it back to me when I need to serve it
to the client. Actually, we wanted it not only to be robust, but also fast enough for most web application use cases, handling thousands of requests. So we started thinking about what was the best way to design a framework that would be used in a web application environment and was related to storing files.

There are a few things I learned by working on TurboGears2 for a few years. TurboGears2 has been around since something like 2007, if I'm not wrong, so it has evolved a lot and we saw a lot of changes. We started with a template engine which was named Kid, then we moved forward to Genshi, and now Genshi is not supported anymore, so we are going to move forward to Kajiki. And of course every one of our users needs to be able to keep running their applications: for example, some of our users use Kid, some use Genshi, some use Jinja2, some use Mako and so on. We need to be able to support all of them and let users work with all of them. So what I found is that web applications, at least on the side of developing them, are much like a little kid: they want things the way they want them to be, and they might change their mind every five seconds. Whenever you are working as a developer on the web, the web evolves really fast, so your infrastructure might change at any time: you might start small, then have something like ten thousand users the next day, and you need to scale and change everything. You start with a specific technology, you decide to go with storing files on disk, and then the next day you need to change to MongoDB for storing files because you need to scale. Or your developers just don't like the previous idea anymore, or maybe the library you are using changes, like in the case of Kid when we switched to Genshi. So everything you do requires that the framework is able to change, even in real time while in production, because on the web the environment changes for various reasons, and not all of them are good: sometimes you change just because it's cool to switch to a new technology, things like that. But whatever you use, you want to be able to change it and keep working.

The other point is that automatic testing is actually something which is done for almost all web applications, because it's easy to simulate the environment: it's easy to perform a request and check the response. So most web applications want to be able to provide automatic tests and a test suite. So whenever you write a framework for the web, you should make it really easy, not to monkey-patch the framework, because that approach is the wrong one, but to drive the framework in a way that makes it easy to write tests, so as to simulate the production application without needing the whole production infrastructure. To make an example, SQLAlchemy is really good, and one of the reasons why it's really good is that it is able to run on SQLite: when you write tests you don't need to set up a whole SQL environment, you just run the tests against SQLite, and you can even go with SQLite in memory, which doesn't even need to store your database on disk. When we decided to choose the MongoDB library for TurboGears2 (because whenever you start a new project in TurboGears, users can choose to go for a SQL database or for MongoDB), we decided to go for Ming, because Ming has a feature which is called the Mongo In Memory implementation, which makes it possible to write unit tests without needing MongoDB at all: it simulates a whole MongoDB server in memory, so you can create records, check them and so on without needing to even start the database. And DEPOT should be
able to do the same thing: I want to be able to save files without needing to actually start the file storage itself, or without needing to actually upload them to S3 if I'm going to use Amazon Web Services.

And the last point I learned is that making things really simple and easy to use wins over providing a huge amount of features. Providing a huge amount of features requires a serious investment in trying to keep them together, moving them forward, keeping them in shape and so on, and usually you are not able to cover all the use cases with all the features anyway, because maybe you're going to use just 20 percent of the features, but there will be one of your users who relies on the other 80 percent. So just focus on the really important features and let users write extensions on top of them. If the foundation is good and solid, then people will start relying on it for writing their own extensions. This is one of the reasons why, for example, DEPOT doesn't have a file system structure: it doesn't have directories, it doesn't have the concept of collections of files. You just store a file. If you want a directory hierarchy, write it yourself: it's not hard to store pointers to the files somewhere, so that you can have a hierarchy. And in fact there is a guy who wrote an extension for DEPOT that provides support for file system hierarchies. DEPOT is like this because it also works on things like GridFS, which do not provide a hierarchy model: you can just save a file, and you cannot say that you want to group files in any way.

So the first thing we focused on was to allow for storage changes, because that was the first problem we had. We had faced that problem, so we knew pretty well what we needed to check and what we needed to do. The first three things we decided to do were these. First, to allow configuring multiple storage engines: whenever you use DEPOT you can say, I want to save something here, something there and something else over there, so I want three different storage engines, because I want to use the local file system, and also GridFS, and also Amazon Web Services S3. Second, we wanted to be able to switch storage engines at runtime, with a graceful restart of course (not that you can actually switch your configuration without restarting the server, unless you properly design for it), and the old files should keep working on the previous storage: you can say, from now on upload files to GridFS, but everything I uploaded on disk should continue to work, and DEPOT is able to do that. And the third task was to be able to rely on multiple storages concurrently: not only could you choose between GridFS, S3 or whatever, but you could also use them in your application at the same time. And this is because
it actually happened that one of our users came and said: DEPOT is really cool, but I want to store my avatars here, the items uploaded on my social network there, and whatever is a temporary file for my own use should be on disk, so how can I use three different storage engines at the same time? This was something like the second question we had on DEPOT, so it has been a real need from one of our users to be able to use multiple storage engines concurrently.

So whenever you upload a file, if you don't specify anything, the file goes to the default storage; if you specify something, you can direct the file to be uploaded on a specific storage. Storages are identified by a name, so a storage can right now be on GridFS, but if you configure a new one which is named the same but is on S3, your old files continue to be served from GridFS and whatever you upload from then on will be served from S3, because DEPOT knows that the old files are on GridFS and the new files are on S3, while you are still using the same storage, which is named, say, "avatars" in the case of user images. And then you can of course use multiple of them during runtime. That's made possible because DEPOT, as I told you, has no concept of a file hierarchy, so it's able to identify files by an ID, and the ID is bound to the storage: every file is uniquely identified by an ID, and so as long as the storage has the same name and the file has the same ID, you will be able to look up that file even if the underlying storage changed.

The other thing we wanted to do is provide a really easy way to use everything. So we provide something which is called the DepotManager, which is in charge of actually doing all the configuration, so that it will work with practically any web framework. We are not bound, for example, to using .ini files, which is what we use in TurboGears for configuration: you can use YAML or whatever you want for storing the configuration, or you can even write the configuration in Python itself, because the DepotManager is the one in charge of keeping your configuration, and it is able to load it from .ini files, from dictionaries or from whatever, and it keeps track of what is currently active and configured. So whenever you need something, you go to the manager and say:
DepotManager, give me the storage; I don't care where it is, how it's configured or which technology it uses, just give it to me and I will save the file there. And if you don't want any specific storage, you just ask for a storage and it will provide you the default one.

This is an example from the documentation of DEPOT, which is the simplest case: we are just configuring a storage, getting the storage itself and storing a file on the storage. You can see that the configuration in this case is made through a dictionary, and we are configuring the default storage, in this case named "default". The storage uses the GridFS backend, and provides some additional options which are related to the backend itself; in this case it provides the MongoDB URL. Then we get the storage itself; in this case we don't specify any specific storage, so we are actually getting the default one. And now we just create the file. Whenever we create a file on the storage we get back the file ID, and we can get back the file through the get method of the storage. So you see that the interface is pretty simple: you just create something and you get it back by ID, nothing more and nothing less. This is the core foundation of DEPOT. The core foundation does not try to do anything more complex: we focused on providing a solid foundation on which we could actually implement more advanced features.

One of these features is the support for database systems; in this case we have support for SQLAlchemy. If you want to store a file which is somehow related to your model, like in the case of a user where you have the avatar and you want to store the avatar inside the user, you just declare a column which is of type UploadedFileField, and you can specify the upload type, which in this case is an image with thumbnail, so whenever you upload the image you will also get the thumbnail too. Then, whenever you save your document or user, you just pass the file to it, and DEPOT will upload it on the local file system, or on whatever storage you want if you specify one other than the default, and will link it to the actual model itself.

I told you that one of the things we learned is that web applications change often: maybe the developers change, maybe the technology improves, whatever. So it should be easy to support different technologies. So in DEPOT we focused on making everything a layer over the others. For example, we have support for SQLAlchemy attachments, we have support for Ming attachments, we have support for storing files on S3, on local files and on GridFS, and we have implemented everything as plug-ins. So if you want to support storing files on your own system, or on whatever you invented yourself, you just write the plug-in, and everything else in DEPOT continues to work: the SQLAlchemy support will continue to work even if it's running on your own storage, because you just need to implement the storage engine and nothing else. Serving files is done by a WSGI middleware, so you can use it with practically any web framework. We use it with TurboGears, but if you are a Flask user you can just attach the middleware to Flask. Actually, most of our users are Flask users, because currently it's what is most commonly used for web APIs, so it is supported.
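The layering described above, where application code asks a manager for a storage by name and only ever uses a small create/get interface, can be illustrated with a minimal sketch. This is not DEPOT's actual code or API: `MemoryStorage` and `StorageRegistry` are hypothetical names invented for the illustration, and a real backend would talk to the local disk, GridFS or S3 instead of a dictionary.

```python
import uuid


class MemoryStorage:
    """Hypothetical in-memory backend: keeps file contents in a dict."""

    def __init__(self):
        self._files = {}

    def create(self, content):
        # Every stored file gets a fresh unique ID; there is no hierarchy.
        file_id = uuid.uuid4().hex
        self._files[file_id] = bytes(content)
        return file_id

    def get(self, file_id):
        return self._files[file_id]

    def exists(self, file_id):
        return file_id in self._files


class StorageRegistry:
    """Hypothetical registry of named storages with a default one."""

    def __init__(self):
        self._storages = {}
        self._default = None

    def configure(self, name, storage):
        # In this sketch the first configured storage becomes the default.
        self._storages[name] = storage
        if self._default is None:
            self._default = name

    def get(self, name=None):
        # Asking without a name returns the default storage.
        return self._storages[name or self._default]


registry = StorageRegistry()
registry.configure("default", MemoryStorage())
registry.configure("avatars", MemoryStorage())

# Application code only knows storage names and the create/get interface.
doc_id = registry.get().create(b"report contents")
avatar_id = registry.get("avatars").create(b"png bytes")
```

Because the calling code depends only on `create` and `get`, a backend with the same two methods could be swapped in behind a name without touching it, and an in-memory backend of this kind is also what lets tests run without any real storage infrastructure.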
DEPOT plays well together with your database. I don't know if you all know this one: it's actually a very long query, it's called the "query of despair", a really, really long SQL statement.
What it means that it plays well with your database is that it copes with your transactions. For example, you upload the avatar of a user while saving the user: if updating the user fails, your transaction gets rolled back, as long as you have a transaction manager properly working, and DEPOT detects that your transaction rolled back and will recover the previous state of the file. So if you are saving the state of the user, and the state includes the avatar and a new name and surname, and storing the name and surname fails for whatever reason, maybe a deadlock or something wrong in your query or whatever, DEPOT detects that there was an error and recovers the previous state of the avatar too. So you won't end up with only the avatar changed and not the name and surname; the model changes in a proper way. Whenever you delete an item, it actually deletes the attachments only if the deletion of the item properly worked in the database. If you fail to delete the item, you don't end up with an entry which is in your database but doesn't have the avatar anymore: DEPOT detects that the operation failed and recovers the files that were about to be deleted.

And the last thing is that it should be really easy to extend. So we focused on two types of extensions to provide additional features on top of DEPOT. One is the attachments themselves: whenever you declare an UploadedFileField you can provide an upload type. Attachments are actually in charge of changing the file itself, so whenever you want to replace the file with a new file, you want to go for an attachment, and there can be only one attachment type. You can also provide filters. Filters are not meant to replace the file itself, and they are not able to change the content itself, but they can add additional information to the content, which might be additional metadata or additional files. And you can of course apply multiple filters: for example, you might have a filter which generates thumbnails and you might apply multiple of them because
you want small, medium and big thumbnails: you just apply the filter three times with different construction options and you will end up with three different thumbnails.

Let me show you a real example of an attachment, which is taken from the documentation of DEPOT. The interesting part of attachments is that they not only can change the content itself, but they can also add additional behaviours to the files. What does it mean? It means that whenever you get the file back from your file system it will be converted to that upload type, so if your upload type provides additional methods, like for example, I don't know, computing the histogram of the image, you can call them on your stored file. DEPOT knows that the file was of this type of upload, and will be able to recover its state and provide all the additional features. You can also use attachments not only to adjust or change the file itself but, for example, to add additional information, if you want to store not only the file but also, for example, its primary colour: if you want to look for the images which are red, you can store that inside the file as metadata, because DEPOT keeps track of the files and of all the metadata of the files, so you can add additional details to your files.

This is the example of a custom attachment. In this case it's an uploaded image which checks whether it is bigger than a specific resolution; in case the image is bigger than that resolution, it gets shrunk to that size. The first thing we do is getting the content itself and its metadata, and this is done through two utility functions, because we don't know what the content is. We know that DEPOT is going to save files, but we don't know what the user is going to provide us: for example, they might provide a file, they might provide bytes in memory, or they might provide a cgi.FieldStorage if it is something uploaded from the web. And we have this pretty convenient function, file_from_content, which, whatever the content is, will convert it to a proper file, and it's pretty efficient because it uses in-memory storage for files which are smaller than a given size, and stores them on disk only if the size is bigger than that maximum. Then we open the image and check its size: if the size is bigger than the specified maximum, we create a new thumbnail of the image of the maximum size and we replace the content. You see that in this case we replace the content variable with a SpooledTemporaryFile, which is that kind of temporary file which stores everything in memory unless you make it bigger than the maximum size specified. Then you save the image itself inside the new temporary file, and then go on and call the parent's process_content with the replaced content. So you just call your parent implementation with the new content, and in the middle you can do whatever you want, because the logic for saving the files is inside the parent implementation.
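The two ideas in this attachment example, normalising whatever the user provides into a file object and writing the replaced content into a temporary file that only spills to disk past a size threshold, can be sketched with the standard library alone. This is an illustration under stated assumptions, not DEPOT's implementation: `as_file`, `shrink` and `INMEMORY_LIMIT` are hypothetical names, and truncating bytes merely stands in for the image resizing, which would need an imaging library.

```python
import io
from tempfile import SpooledTemporaryFile

# Hypothetical threshold: contents up to this many bytes stay in memory,
# larger contents are transparently moved to a real temporary file on disk.
INMEMORY_LIMIT = 1024


def as_file(content, max_memory=INMEMORY_LIMIT):
    """Return a readable binary file object for bytes or file-like content."""
    if isinstance(content, (bytes, bytearray)):
        spooled = SpooledTemporaryFile(max_size=max_memory)
        spooled.write(content)
        spooled.seek(0)
        return spooled
    # Assumption: anything else is already a seekable binary file object.
    content.seek(0)
    return content


def shrink(content, max_length):
    """Stand-in for the resize step: cut content longer than max_length."""
    source = as_file(content)
    data = source.read()
    if len(data) <= max_length:
        # Small enough: hand back the original content, rewound.
        source.seek(0)
        return source
    # Too big: replace the content with a new spooled temporary file.
    replaced = SpooledTemporaryFile(max_size=INMEMORY_LIMIT)
    replaced.write(data[:max_length])
    replaced.seek(0)
    return replaced


# Usage: a file-like upload that exceeds the limit gets replaced.
resized = shrink(io.BytesIO(b"0123456789"), max_length=4)
```

In a subclass following the pattern from the talk, the object returned by a step like `shrink` would then be passed on to the parent implementation, which keeps the actual saving logic in one place.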
Moving on to filters: we already know that, unlike attachments, you can have more than one of them, and we already know that they run after upload, while attachments run during upload, before the file itself is uploaded. This is by design, because if we fail in generating the thumbnail we don't want to go on and store the data in the database, for example, and end up with a user without an avatar again. So if uploading the avatar of the user fails, the whole thing crashes and you won't have the user created. So not only does DEPOT recover the files if writing to the database fails, but also, if creating the file itself fails, you get a proper exception before saving the data to the database. So we try to do the best we can to keep the two things in sync: if either of the two phases fails, you haven't done anything in the other.

In the case of filters, instead, they don't run before uploading the files but after. Why? Because filters usually provide additional behaviours or information. So in case a filter fails, we just go on and report that the filter failed, but you already have the file, so you can recover the additional information from the existing file. So even if your secondary thumbnail generation fails, say the medium-size thumbnail fails, it's not a huge issue, because you can re-create that medium-size thumbnail from the original one. And as I told you, attachments can add behaviours to your files, but in the case of filters this is not possible: you cannot add additional methods to an object through a filter.

Here is a simple example of a filter, which actually saves a thumbnail at a specific resolution in a specific format. You see that we just receive the on_save event and we have the uploaded file, and then at the end of the code, which mostly just creates the thumbnail, we just add to the uploaded file any information we want; in this case we add things like the thumbnail ID, the thumbnail path and the thumbnail URL to the uploaded file. So uploaded files work like dictionaries: you can add anything you want to them, and you have the file itself, so the content, and all the metadata you add to the file. When you read back the file, so you read it back from your database,
you just have the thumb_url property, because we added it here at the end of the code. So you just get it back and look for that property: if the thumbnail URL is None, probably the filter failed, and you can re-create the thumbnail from the original.

And one of the core parts of DEPOT is that it is meant for the web, it is specific to the web, so we wanted to make it easy to use content delivery networks, and we wanted to make it easy for people to rely on DEPOT for serving files to the web. So everything which is needed for serving the files themselves is provided by DEPOT itself: when you store a file, DEPOT already keeps the content type, the last modified time, the content itself and the file name, so when you serve it you can properly provide all the HTTP headers for the file without having to work on them yourself. And we already know that whenever you want to serve files you just rely on the WSGI middleware: you just make the middleware wrap your application, and DEPOT will do the proper thing to serve the files. If the backend where you are storing the files supports HTTP itself, for example in the case of S3, the middleware will not serve the file itself but will redirect the user to the storage itself, so in the case of a content delivery network you will end up serving the files from your content delivery network.

So please try it. If you have questions, or anything, let me know; if you find bugs or anything, I will be more than glad to fix them. Everything is supported from Python 2.6 to Python 3.4; we haven't tested it on 3.5, but it should work. Everything is fully documented, so if you find something missing in the documentation let me know and we will fix it. And everything is 100 percent covered by tests, so you can be pretty sure that it works, and we are already using it in production in various environments. So try it and
learn new things few thank you questions do you have the microphones them but
104 demos that yet
this and related at the the OK
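As a rough illustration of the serving behavior described in the talk (response headers taken from the metadata stored with the file, and a redirect when the storage can serve the file over HTTP itself), here is a minimal standard-library sketch. It is not DEPOT's middleware: `StoredFile`, `InMemoryStorage`, `FileServingMiddleware`, the `/depot` mount point and the choice of a 302 redirect are all assumptions made for the example.

```python
from collections import namedtuple
from email.utils import formatdate

# Hypothetical record: file content plus the metadata kept at upload time.
StoredFile = namedtuple(
    "StoredFile",
    "content content_type filename last_modified public_url",
)


class InMemoryStorage:
    """Stand-in storage backend (an assumption, not a DEPOT class)."""

    def __init__(self):
        self._files = {}

    def save(self, file_id, content, content_type, filename,
             last_modified, public_url=None):
        self._files[file_id] = StoredFile(
            content, content_type, filename, last_modified, public_url)

    def get(self, file_id):
        return self._files.get(file_id)


class FileServingMiddleware:
    """WSGI middleware that serves stored files under a mount point."""

    def __init__(self, app, storage, mountpoint="/depot"):
        self.app = app
        self.storage = storage
        self.prefix = mountpoint.rstrip("/") + "/"

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(self.prefix):
            # Not a file request: hand over to the wrapped application.
            return self.app(environ, start_response)

        stored = self.storage.get(path[len(self.prefix):])
        if stored is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]

        if stored.public_url is not None:
            # The storage (S3, a CDN, ...) can serve the file over HTTP itself,
            # so redirect the client there instead of streaming the content.
            start_response("302 Found", [("Location", stored.public_url)])
            return [b""]

        # Build the response headers from the metadata saved with the file.
        headers = [
            ("Content-Type", stored.content_type),
            ("Content-Length", str(len(stored.content))),
            ("Last-Modified", formatdate(stored.last_modified, usegmt=True)),
            ("Content-Disposition",
             'inline; filename="%s"' % stored.filename),
        ]
        start_response("200 OK", headers)
        return [stored.content]
```

The part that actually serves a file is only a few lines; the rest is routing and the stand-in storage. A real implementation would also need to escape the file name in `Content-Disposition` and handle conditional requests, which this sketch leaves out.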
OK, you asked how much effort would be required to make it work on an asynchronous framework. We use it in production on gevent, but gevent is not really an async framework; well, it is async, but it is far different from asyncio, for example, because it is implicit async and not explicit async. So I am not sure how much it would take to port the middleware itself to something like that, which would require moving away from plain functions and so on. But it should be fairly easy, actually, because it just gets the file and sends it back to the client. So it is a pretty good use case for an async framework, and the middleware itself is just about a hundred lines of source code. [inaudible] OK, so the middleware is already divided into utility functions, so the core of the middleware that serves the file is like 10 lines of code, which you can probably move to asyncio or something like that. But I haven't tested it on anything other than gevent. [inaudible]

So, you mentioned that in case of a rollback the files are deleted. So do you need some sort of temporary storage, or did you [inaudible]?

No. Actually, what happens is that DEPOT generates a unique ID for each file, so if you create a new version of the file you end up with a different ID, and the old ID gets deleted, and you keep the new one, when the transaction gets committed. So for the time while the transaction is running you have both files, and they have two different identifiers. If the transaction goes on and SQLAlchemy successfully commits it, we say: hey, this new one is the proper one, delete the old one. If instead the transaction is rolled back, we say: hey, the old one was the proper one, delete the new one. So it just keeps both files available at the same time and then decides which one to keep at the end of the transaction.

Then, you mentioned it is transparent to switch from one type of storage to another. So when you get a request for a file, how do you know whether you need to serve it from the old storage system or the new one?

That is actually stored in the metadata itself. Every storage engine needs to provide support for some kind of metadata. In the case of GridFS, it stores the metadata together with the file in the database; in the case of S3, it stores the metadata as HTTP headers of the file itself; in the case of the file system, it saves a JSON file with the metadata; and so on. The storage engine is in charge of providing a way to attach metadata to the file, and the upper layers rely on the metadata to know where to serve the file from.

But when you get the request from the user, you only know the file name, so how do you know [inaudible]?

Not really, because when you store the file at the low level we only know the file ID, but if you bind the file to a column of SQLAlchemy, or Ming, or whatever, inside the column we save a JSON, so we have various information about that file, including [inaudible]. So if you use DEPOT at the low level, yes, you have to provide the fallback yourself; if you rely on the high-level APIs, they already provide it for you.

Does it support, out of the box, uploading to [inaudible]? Something like S3... sorry, like Swift, and I think S3 as well, can provide a temporary URL to upload directly from the client. Does DEPOT support that?

OK, I understand. Not currently, no, as most of the logic happens on the server itself. The client needs to upload the file to your server, which processes it and then uploads it to S3 for you. You cannot upload directly to S3, as otherwise you would lose all of the metadata that DEPOT calculates for you. We would need to provide some kind of DEPOT support in JavaScript itself, so you can get the metadata before uploading.

I don't know if we have more time; you can ask me outside. Thank you.
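The commit and rollback behavior described in the answers above (each version of a file gets its own unique ID, both versions coexist while the transaction is open, and one of them is deleted when it ends) can be sketched as follows. This is an illustration only, not DEPOT's implementation: `InMemoryStorage` and `FileTransaction` are hypothetical names, and a real integration would call `commit()` and `rollback()` from the database session's own hooks.

```python
import uuid


class InMemoryStorage:
    """Stand-in storage that assigns a fresh unique ID to every file."""

    def __init__(self):
        self.files = {}

    def create(self, content):
        file_id = str(uuid.uuid4())
        self.files[file_id] = content
        return file_id

    def delete(self, file_id):
        self.files.pop(file_id, None)


class FileTransaction:
    """Remembers which files to delete once the outcome is known."""

    def __init__(self, storage):
        self.storage = storage
        self._delete_on_commit = []    # old versions, obsolete on success
        self._delete_on_rollback = []  # new versions, orphaned on failure

    def replace(self, old_file_id, new_content):
        # The new version is stored right away under a different ID;
        # the old version stays untouched until the transaction ends.
        new_file_id = self.storage.create(new_content)
        self._delete_on_rollback.append(new_file_id)
        if old_file_id is not None:
            self._delete_on_commit.append(old_file_id)
        return new_file_id

    def commit(self):
        # The new files are the proper ones: drop the old versions.
        for file_id in self._delete_on_commit:
            self.storage.delete(file_id)
        self._reset()

    def rollback(self):
        # The old files were the proper ones: drop the new versions.
        for file_id in self._delete_on_rollback:
            self.storage.delete(file_id)
        self._reset()

    def _reset(self):
        self._delete_on_commit = []
        self._delete_on_rollback = []
```

Because nothing is overwritten in place, no separate temporary storage is needed: the price is that two copies of the file exist for the duration of the transaction.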