
Don't Copy Data! Instead, Share it at Web-Scale

My name's Mark, and I'm with Amazon Web Services. I'm part of the public sector team at Amazon; that's why it says solutions architect up there. I work a lot with state and local government, and because our customers in the education space are so active, with so much going on, I also spend a lot of time working with our higher-ed customers. I'm also the mapping guy on the team. My background has to do with maps on the web, and I've been doing that for some years now. This is interesting for me, because this is the one conference I've wanted to come to for about 10 years, so I'm very happy to be here and to be invited to speak. This is the preeminent conference about open source and mapping, and I have a real sense of gratitude toward the whole open source community, because it allowed me to run a business. Most of my business was actually in Tokyo, Japan, where we did a lot of projects that had open source components at the core. We did things like the first double-byte implementation of MapServer, back in 2000 or so. So I'm very happy to be here, and to get to talk about something that's really close to my heart.
So much for the history; let me try to move forward into what I hope is the future. As you can see from my title, a lot of this presentation was actually prepared to help explain best practice around open data. I gave the first version of it at a symposium in Washington, DC, I think in June, and since then I've given it to smaller groups. This is the second time I'm doing it for an audience like this, but I have a much geekier, more technical crowd here, so it's more fun. The core idea, and this is the one thing I shouldn't have to explain too much here, is that it shouldn't be about copying data anymore. We live in a linked world. Here, especially with mapping, we've been at this for many years: we know what REST endpoints are, and we know about web services. Today I'm representing a company that provides you IT on the fly, as the result of a REST call. On top of that, people are building systems that are themselves RESTful, for example, and doing all kinds of interesting things on top of infrastructure that you can build and destroy within minutes, via code of your choice. I work with a lot of different use cases. One day it might be big genome analysis; the next day it might be Alzheimer's research at large universities that have to share data at scale, big data for scientific analysis. At the core of many of those use cases, and I would say the larger the data the more this holds, the object store becomes a central feature of the system. So the best practice is to work not in terms of traditional file systems but in terms of object endpoints. In this case I'm specifically talking about what we call Simple Storage Service, S3, which I'm sure many of you are aware of; I heard other people at this conference talking about it yesterday. So I'll focus on that.
What I'm going to show you is a test dataset, and I need to give a call-out to the folks at Mapbox. A little bit of background: Mapbox asked the USDA for the most recent NAIP dataset. This is the 1-meter-per-pixel, coast-to-coast imagery of the lower 48 states, and it was delivered to Mapbox on 24 disks of 2 terabytes each. They then contacted me (Mapbox is a customer of ours on AWS), and the idea was: "Mark, this is essentially a public dataset. Can you help us out here? Can we get this into your public datasets program?" It's not quite in the public dataset program yet; it might be. We would rather it be the data owner's dataset, in their own bucket, but more on that later. What you're going to see is best practice around building a tiling system that's focused on delivery of aerial image data. In this case it's not OpenStreetMap data itself; it's 1-meter-class aerial imagery. If you have data sitting in the AWS cloud in one of our regions, how can you leverage our services with the least amount of custom code, by leveraging open source projects, to get to market most quickly? So you'll see what we call an auto-scaling application, with very little code, that can essentially give any number of people out there access to 40 terabytes of data that you don't have to pre-cache.
There are a couple of slides ahead, I think four or five, but I don't intend to do a slide deck today; it's mostly going to be a real-time demo. I want to make a couple of points before we start. In a sense, we're trying to correct for the problem of what in the mapping world we call "clip and ship". Typically you go to some website, maybe a federal or national data website, to find the data; you pick whatever portion of the world you want and somehow download it. If you've done this before, you know that you go to the site, draw a bounding box, and you might get an e-mail saying the file will be available to you later, not right now. This whole manual process has been with us for many years. One of the earliest projects I worked on was actually for Japan Space Imaging: we built the shopping cart for satellite images. It's the same idea: you go through this manual process, and if you're lucky you get an e-mail after a few hours and the data is available via FTP or something. That's less than ideal, and I still see it out there. In the world of mapping especially, when you talk about things like uncompressed image data or lidar data, there might be some data out there, but the whole exercise you engage in is essentially making another copy of it and putting it somewhere on-prem. If it's a large dataset, you then have the same sort of problems: where do you put it, and is it close enough to performant services that you can actually use it after you download it? So there are copies all over the world. When the USDA comes out with the new 2015 NAIP data, what happens? Copies all over again, and everybody has a storage problem. But in the interconnected world of the cloud and web-based services, there should be just one copy. Why should we need more than one? Let's have one copy that's a definitive source, well maintained, well curated, with all the metadata, and nobody moving that thing around.
When we received the 24 disks from the USDA, of course there were errors. There's a whole other week where you try to figure out where the errors are and correct for them. So just shipping the whole thing is problematic, because it's a lot of files: close to half a million files, including the metadata, for this particular set. So there's a storage cost, there's a network cost, there's a computational cost, and then, once the data is distributed, every time there's any kind of minor update you have this huge cost of updating those distributed copies. We bear that every day in our world. We're also used to doing this, and we think it's some kind of normative pattern. It shouldn't be anymore. So what makes cloud storage different?
For one, it's available as an endpoint that you can either make completely public or secure in a very, very granular way; it's up to you. It's not siloed in some data center behind some firewall. That's one. Number two is that you can provision real-time, granular access to it. The best way to think about this: many of you have phones in your pocket right now. If you've got an application that takes pictures or gathers some kind of sensor data and uploads it, there's a very good chance that you're uploading that data with what we call a signed link, directly to the object store, not through a server. You're uploading directly to a storage system that, for example, Amazon Web Services provides to that particular application vendor. But the thing is, and this is now on the cost side, which is not simply a technical thing: you can offload the network egress cost. This is by far the most important point. When you store data in the cloud, whether it's our object store or another service provider's, you're basically paying for how much data you have in there, and typically you're also charged for how much network bandwidth is used for that data going out the door. That's the variable cost. Right now with us I think storage is less than 3 cents per gigabyte-month, so one gigabyte costs something on the order of three pennies a month to store. But depending on how often that data goes out the door, or in technical terms goes out of one of our regions, there's a variable cost associated with that. So when I say you can offload the egress cost, what that means is that you can have somebody else pay the network charge. You continue to pay for storage, but you can set it up such that somebody else pays for data going out the door. And that's very important, because it allows you to actually release really large public datasets without getting your network hammered and without having your FTP servers all go down. It's not a problem anymore: you've given us the job of doing the heavy lifting around allowing access to the data.
So in the cloud you pay for storage; it's your data and you control it, but there should be just one copy of that data. Just by storing it in S3 you have eleven nines of durability. It looks like one endpoint, one file (I'll be showing this to you), but in the background we are of course making multiple copies of it for you. You can't see them; that's not your problem, it's our problem. We need to make sure we satisfy the SLA around your data. And because you can offload the network cost, you don't have to worry about provisioning network on your end just because somebody might come today to get the data. You don't have to worry about getting maxed out because somebody decides to download the whole thing in one hour; that's our problem. You also don't have to worry about compute cost, because you're not standing up, for example, traditional FTP servers or putting the data on some website. And because you maintain just one definitive copy of the data, you don't have the cost of updating all those distributed copies out there; there's no need for that.
I shouldn't have to tell this group, but we need to think in terms of links rather than in terms of copies of the data.
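To make the cost argument concrete, here is a minimal sketch of who pays what with and without requester pays. The storage price is the "less than 3 cents a gigabyte-month" figure from the talk used as an upper bound; the egress price, the dataset size in decimal gigabytes, and the function name are assumptions for illustration only, not quoted pricing.

```python
# Illustrative prices only. The storage figure is the talk's upper bound;
# the egress figure is a made-up placeholder, not a quoted AWS price.
STORAGE_USD_PER_GB_MONTH = 0.03
EGRESS_USD_PER_GB = 0.09  # assumption for this sketch


def monthly_cost(stored_gb, egress_gb, requester_pays):
    """Split one month's bill between the bucket owner and the downloaders."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    egress = egress_gb * EGRESS_USD_PER_GB
    if requester_pays:
        # Owner keeps paying for storage; whoever pulls the data pays egress.
        return {"owner": storage, "requesters": egress}
    # Default behaviour: the owner pays for both.
    return {"owner": storage + egress, "requesters": 0.0}


if __name__ == "__main__":
    # 40 TB stored (taken as 40,000 GB), 100 TB downloaded in the month.
    print(monthly_cost(40_000, 100_000, requester_pays=False))
    print(monthly_cost(40_000, 100_000, requester_pays=True))
```

With requester pays switched on, the owner's bill stops depending on how popular the dataset is, which is the property the talk relies on.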
We have over 40 services now. Much of my role is to act as a guide to those 40 services, but to be frank, they grow at such a phenomenal rate that even the solutions architects on the team, the technical people, can barely keep up. Sometimes something came out two weeks ago and you already have to explain it to customers. I've had many years of experience building tiling systems on S3. S3 is one of the original three services on AWS, alongside SQS, the queueing system, and EC2, the virtual machine service. There were just three in the beginning, but those three allowed a customer of Amazon Web Services to build pretty much anything. Those are the most core parts: queueing, storage and compute. Now we have 40. I'll also spend some time explaining some lesser-known features of S3. The main one, and if you walk out the door with anything, I just want you to remember two words: requester pays. Requester pays is a feature of S3 that allows you to offload the network egress cost. That's the key to today's talk, and it's key to best practice around a government open data strategy that uses the cloud wisely. S3 has many clients, as it has
been around for a long time. There are command-line clients and GUI clients, and for pretty much any language you can imagine there's a client. We have full SDKs that we support on our side, from Ruby and PHP to Node, that natively support S3. Today, because I'm running Windows, I'll be showing you a client called CloudBerry. There are others, including Mac clients. In the Java world, for example, there's a project that has been around for a long time called JetS3t, which is used in a lot of projects. So it's very mature; it has been helping our customers for a long time. You can imagine that our larger customers, like Netflix or Shell, are all using S3 somewhere at the core of their architecture. And
so this is what I really want you to remember: this idea of a requester-pays bucket. I'll show you how this works, and I'll use the picture
here. It maps to not just how this is technically set up; the most salient thing is this one thing here. This is a bucket. A bucket is just a top-level name for an S3 container. Every AWS account is allowed to create 100 S3 buckets. The reason it's limited to 100 is that bucket names live in a global namespace. I have to be careful about naming customers, but there are, for example, some large university customers out there that have whole university-named sites sitting in S3. There are others, emergency sites for example, that use WordPress to generate static content, which is pushed to S3; S3 then takes care of the surges, and that scales massively. These are very simple architectures. What I want to show with this area here is one particular AWS account. It only has 100 buckets, although you can have any number of accounts. But in a bucket you can have the data of the world. The bucket has no limits; it's just an object store. You keep pushing data into the bucket. You can have as many tiles as you want; you can have as many world basemaps as you want. The only limits are that keys must be unique and that each object is limited to 5 terabytes. Keep pumping 5-terabyte objects into that bucket as fast as you want, for as long as you want, and you won't run out of space. So from the map tile cache perspective it's perfect, and I think that has come up in many talks here. So this is one account and its bucket, and here is a virtual machine; we call them EC2 servers. Pricing is structured so that when this virtual machine, living in a region, moves data from one of its buckets, this transfer here is free. To back up a little bit (I'll show this to you in the console in a second): when you
have an AWS account,
you get access to each of the regions, the global footprint, and you can fire up systems in any of them. For example, if you're a US government customer, you're typically running in the Virginia region, the Oregon region, or our GovCloud region, which is actually in the
Portland area. But just as easily you can go to Tokyo or Singapore or the EU and do the same thing; it's just a drop-down. The point I'm trying to make is that this colored area here is one of those regions. If you get data from a bucket to a virtual machine in the same region, a GET operation against the bucket, that data transfer is free. Putting data in is free all the time; transfer into S3 is always free. Now, if you take the data out of the bucket and out of the region, and basically put it out on the Internet somewhere, then
there is a charge; that is the egress component. When you turn the requester-pays flag on, in combination with marking whatever objects you want as public or as open to authenticated access, it makes it such that this other account, account B over here, pays the data egress charge. OK? And that's the key
point. So why does that matter? It matters because the Web has
made it possible for everybody in the room to publish. I can write a paper and link it to some other paper, or I can make some dataset available. But with just that, I might not be able to operate at web scale, and even if I could, I would have to pay the network cost. With requester pays, you can actually offload that to whoever wants to get the data.
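As a rough sketch of what requester pays looks like from code, the snippet below uses the boto3 SDK, which is not mentioned in the talk. The bucket and key names in the usage note are hypothetical placeholders, and the helper names are invented for this example. `RequestPayer="requester"` on `get_object` and `put_bucket_request_payment` are existing boto3 S3 client parameters and calls; credentials and error handling are left out.

```python
def requester_pays_get_kwargs(bucket, key):
    """Arguments for an S3 GetObject call in which the caller accepts the transfer charge."""
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}


def fetch_object(bucket, key):
    """Download one object from a requester-pays bucket (needs AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the helper above stays dependency-free

    s3 = boto3.client("s3")
    response = s3.get_object(**requester_pays_get_kwargs(bucket, key))
    return response["Body"].read()


def enable_requester_pays(bucket):
    """Owner side: switch a bucket so that downloaders pay for data transfer."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_request_payment(
        Bucket=bucket,
        RequestPaymentConfiguration={"Payer": "Requester"},
    )
```

A call such as `fetch_object("example-imagery-bucket", "ca/2014/some_tile.tif")` (both names made up here) would bill the transfer to the caller's account rather than to the bucket owner.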
I'm about to show you that there can be many views onto the same data. Going back to my point: this is just a bunch of GeoTIFFs, essentially.
These GeoTIFFs are what the prime contractors produced, the ones that flew the planes with the Leica sensors, the ADS40 or ADS80 or whatever they were. They did a bunch of QA work, then the data got copied to some hard disks, and the USDA received those copies. Then there's a bunch more QA work, and after many weeks we can get access to it. But the way it should be is that there is one definitive copy and the data lives in the cloud. It might be in multiple regions, it might be with multiple cloud vendors, but there should be a lot fewer copies.
So I'm going to jump into the demo.
What we're looking at here is Leaflet. For a long time I've been more of an OpenLayers guy, but I took this chance to learn a little bit of Leaflet; not that I'm doing anything complex here. So it's just Leaflet, and the endpoint here is this system. If I back off a little bit, those of you from Oakland will be able to tell that this is
actually the development dataset. Let me back
off a little bit more: we're looking at the USDA's NAIP imagery. What's important here is that I'm using one client to look at the data. These are tiles, obviously; that's how we get the slippy map thing going. I'm pretty sure these tiles have been built already, so they're already on S3. But in a second you'll see that if I move to another part of the USA, the tiles take a second to come up, because they're being generated on the fly. The important thing here is that these images, which are just tiles, the 256-by-256 tiles we're all familiar with, are based
on content that's living in another account's bucket. In this case it's the Amazon Web Services public dataset account, not my working account. That account has a bunch of stuff in it, including biomedical data and genomic data, all kinds of cool public datasets. Up here at the top is aws-naip, and you'll see the states; there's California. I rearranged this a little bit to simplify it. It's not actually a directory system; these are all object keys, made to look like a directory system, so that next year, when we receive the 2015 data, it can just slide in here and we maintain that one-copy aspect. So it's rearranged a little bit, but it's basically the same data. It's 1-meter resolution data. I think Idaho has half a meter; it's the only state that has half a meter. If we look at Idaho, it'll say 0.5 meters. The original data was delivered with a shapefile that defines the boundaries, an index, as you'd expect. The other thing is that it's all four-band. So it's
RGB plus infrared. Then there's a bunch of metadata here. If you look at the original data, these are FIPS codes, and in here is a bunch of files of almost 200 megabytes each. If you have an account with us, and you use a tool such as CloudBerry, or any tool that can do requester pays, you all have access to this data. All you need to know is aws-naip, and you have access to 40 terabytes of data right now. But I need to caution you: this is a test dataset, so I can't guarantee it will even be there next week. But if you fired up a client right now that can do requester-pays requests, you would see this data, and you could download it as quickly as you want. I think that's the important point. At this point, those of you who have done the exercise of using some clip-and-ship system are probably realizing: "Oh, all I need is a client. It could be on my notebook here, or it could be in my workspace, a VDI, or a container in the cloud, and I can quickly copy all this data into my own account before Mark stops talking, because I want this stuff." If you've ordered NAIP data from, for example, the USDA, you can see that this is a much faster way to get access to the data. You can do all that, but I'm suggesting it would be a mistake. You don't need to do that; you shouldn't have to copy, going back to what I said earlier. So the right-hand side is somebody else's account. This would probably be the account of whoever owns the data. From my perspective that would be a national agency, or a state or local government that has maybe banded together with other counties in a group buy, and they're exposing new high-resolution aerial imagery, because it is
public data anyway. As long as they don't bear a cost in disseminating the information, why not do this? There's no cost associated with making it public. On the left-hand side you can see my working account. I've got a bunch of stuff in here; I apologize, there's a whole bunch of badly named buckets. But down here is the one that matters, and you can think of it as a level-1 cache. Rather than the cache being on the server or servers that generated the tiles, or in whatever you're using for caching, it's just in S3; I'm using S3 as a cache. How real-time that cache is, how long the cache lasts, is up to you. Whether it lasts for one day or stays in S3 for a year is all configurable, and you don't actually have to write code; you just change the lifecycle policy. So here it is: this is just a TMS cache of Mercator data, and it's exactly what you'd expect. There are the layers; drill down here and eventually you see JPEGs. And these JPEGs here are exactly these
tiles. Technically I can do things like just delete all of these and they would be built again. So, going back to
the map, I'll show you a little bit more about how this works. Let me turn Firebug on,
as you all like to do
with somebody else's mapping systems, so we can explore how this works.
Let me clear it all, and you can see what's going on here as I move
this thing around. It's throwing a
303 when it can't find a tile. Right here I have these DNS names, a, b and c, under my own domain. These all use our content distribution network: the DNS names point to a content distribution that I've created, which again leverages AWS infrastructure. So this is another layer of cache that's closer to us; that's a way to think about it. And if I go over here,
you can see that I've got a couple of test layers. I'm borrowing the
MapQuest OSM tiles here, and here I have a direct-to-the-S3-bucket layer. So right now I'm looking directly at the object store; there's nothing in between, it's just the object store.
For most use cases this is just
fine: a simple architecture.
And over here
I'm getting access to exactly the same data, but through CloudFront, our content
distribution network. So there's the content distribution network, and the content distribution network then goes to S3. You'll notice that as
I move this thing around, it's going to
the same thing called "tiler", because neither the CDN nor the object store has the data. S3 has an interesting feature where, if it throws a certain kind of error, you can essentially provide a filter and redirect. In this case I redirect to the system that makes the tiles, this thing called tiler. If I take one of these tiler requests and open it in a new tab,
you can see it just made one for us. You can see that all it's really doing is taking this tile name, which is just the TMS naming scheme, and under the hood what it's doing is exercising an autoscaling WMS service running on a completely separate system, which is the definitive source for the tiles in this case. You can see this in practice by just chopping this part off, and it
goes into its test mode.
In test mode you can see the actual WMS request here. If I copy this, all of a sudden you're looking
at a WMS. This here is the load balancer, our Elastic Load Balancer. Right now I have it set up using two University of Minnesota MapServers. I'm familiar with MapServer, so I tend to use it, especially for imagery-type stuff. So I'm running two EC2 instances that know how to deliver WMS. And this little bit of code here, all it's doing is saying, "OK,
this is the tile this person wants," and it translates that into the appropriate WMS request. Behind the scenes, within a region (typically this wouldn't be exposed to the public like this), it has an autoscaling system that is easily tweakable: I have two instances now, but if I need 20 tomorrow, it's a simple change, and it just makes copies of one of these parts. It does a couple of things. It services the request, to
make sure that the client is happy, of
course, and gets its data. But as soon as it delivers the data, it fires off another thread and copies the same data to the object store, so that any subsequent request is satisfied from S3 rather than from the tiler. It's a very simple architecture. The core feature is that you're not trying to do the caching on the server itself; you're just leveraging services that are available in the cloud, such as S3, to do the caching. Another system does the caching for you.
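The two steps the tiler performs, translating a TMS tile name into a WMS request and writing the rendered tile back to the object store in a background thread, can be sketched roughly as follows. This is not the speaker's code: the function and class names are invented, a plain dictionary stands in for the S3 bucket, the renderer is passed in as a function instead of calling MapServer, and the WMS parameters assume version 1.1.1 with EPSG:3857 tiles. In the talk the cache miss is detected by S3 and redirected to the tiler; here that is collapsed into a simple lookup.

```python
import threading
from urllib.parse import urlencode

WEB_MERCATOR_HALF_WIDTH = 20037508.342789244  # metres; half the EPSG:3857 world extent


def tms_tile_bbox(z, x, y):
    """Return (minx, miny, maxx, maxy) in EPSG:3857 for a TMS tile (y counts up from the south)."""
    size = 2 * WEB_MERCATOR_HALF_WIDTH / (2 ** z)
    minx = -WEB_MERCATOR_HALF_WIDTH + x * size
    miny = -WEB_MERCATOR_HALF_WIDTH + y * size
    return (minx, miny, minx + size, miny + size)


def wms_getmap_url(base_url, layer, z, x, y, tile_px=256):
    """Build the WMS GetMap URL that renders the given TMS tile."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "SRS": "EPSG:3857",
        "BBOX": ",".join(f"{v:.6f}" for v in tms_tile_bbox(z, x, y)),
        "WIDTH": tile_px, "HEIGHT": tile_px, "FORMAT": "image/jpeg",
    }
    return base_url + "?" + urlencode(params)


class CachingTiler:
    """Serve tiles from a store if present; otherwise render, reply, and write back."""

    def __init__(self, store, render):
        self.store = store      # dict standing in for the object store
        self.render = render    # callable (z, x, y) -> bytes, e.g. a WMS fetch
        self._writers = []

    def get(self, z, x, y):
        key = f"{z}/{x}/{y}.jpg"
        if key in self.store:
            return self.store[key]
        data = self.render(z, x, y)
        # Reply first; the copy to the store happens on another thread.
        writer = threading.Thread(target=self.store.__setitem__, args=(key, data))
        writer.start()
        self._writers.append(writer)
        return data

    def wait(self):
        """Block until all pending write-backs have finished."""
        for writer in self._writers:
            writer.join()
        self._writers.clear()
```

Because the cache lives outside the tile servers, any number of identical tiler instances can sit behind the load balancer without sharing state, which is what makes the autoscaling group easy to resize.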
For example, if we go to the management
console over here... I'm just
curious, how many people have seen the management console? Can I see a show of hands? Quite a few, OK. For those of you who have, I don't have to explain what this is. For those who haven't: this is the GUI that allows you to use any of our 40 services. Right now we're looking at EC2, Elastic Compute Cloud, our virtual machine service. There's also a tab that allows you to do VPC, Virtual Private Cloud, which is actually a subset of EC2. Anyway, this allows you to spin up virtual machines any time you want and, more importantly, to turn them off and stop paying for them as soon as you turn them off. And it's very easy: you hit Launch. I'm not going to do this, because I don't want to use up my time here,
but you hit Launch, make a
couple of selections, Windows or Linux (in this case these are all Linux machines), and you fire up whatever you want. Within EC2, what I want to show you is that I
have MapServer running, and down here is a single autoscaling
group. The autoscaling group is just two machines right now, building the tiles. Which one was it? This one here, I think. If you come in here, you see you've got a couple of settings, a min and a max. All I have to do is come in here, and if I wanted to, I could just make this, for
example, max 4 and min 4, and hit save, and then the EC2 system will just go ahead and clone a couple of copies and fire them up. Typically you do something like this when you want to scale from 2 to 20, or to 100, or whatever you desire. If you're working in a world of 48 terabytes, for example, which from our perspective is actually not that large a dataset, you would normally have to worry about all the traditional things around making some kind of shared file system available to MapServer, GeoServer, or whatever tiling server you use. With this architecture you don't have to worry about that, because I'm using yet another open source package. Rather than explain it, I'll show you what that looks like by actually logging in over SSH to one of these machines. So now I'm going back
to another part of the console, which shows my machines. I'll just look at the ones that are actually running; some of them are probably just starting up. Here I have one, and I can get
the DNS name down here. This is the hardest part of my demos: copying this. I've got PuTTY running here, and I need to make sure my key is correct; the key depends on which part of the world I'm in. That's OK, so I'm going to open it, and then this
is, I'm pretty sure, an Ubuntu machine... and I'm in the door. So I've just set up an SSH session to one of these virtual machines, which is running Ubuntu with a MapServer and GDAL combination. If I run this, you can see that I've got a couple of mount points down here, and you can see the open source tool that I'm using right there. It's basically making S3 look like a drive. So, as you'd expect, if I go to the NAIP data and do an ls, it'll take a second, and look, there are all the states. Now remember, this is 48 terabytes. This is a virtual machine that has a couple of SSDs, I think 160 gigabytes or so, that look local to it. What this system basically does is this: it has access to 40 terabytes of data and can go get any of those GeoTIFFs we were looking at before, but because it's looking at a shapefile index, it only goes and gets the ones it needs right now to render the tile it needs. It's essentially acting as a cache for the 40 terabytes, caching on SSDs that are local to the particular host this virtual machine is running on. So as long as this S3 mount is in place and correct, I'm good to go. I don't have to maintain, for my 20 servers,
a data store that they can all see. It's all one. So, one copy. Now, that might be interesting from a get-a-map-tiling-system-up-and-running perspective, but what's more interesting is that, because those GeoTIFFs are marked as requester pays, and because every object in that bucket is marked for authenticated access, if you have an account you can do the same thing. You can run your own system and not have to copy the data. You do have to fire up your virtual machines in the region where this data resides; otherwise there's a latency factor and another cost factor. But everyone in the room with an account could do this, and we're talking about
And I'm talking about one page of code here, almost all of the code, and it is open source. So you can have a NAIP server, a tiling server, that will deliver the United States. The other aspect: remember, I said that S3 has no limits; you just keep pumping data in. So, for example, if next year all the states go from one meter to half a meter resolution, you have four times the amount of data. You have much more work around pre-processing the data to create, for example, internally optimized files. Typically you take the uncompressed files delivered by the USDA, internally tile them, and JPEG-compress them. With all this batch processing that you have to do in order to get that back into your workflow, either you don't have to do it, or, if you do have to do it, you have HPC resources that you can use for a day to do that batch processing.

I'm talking really generically about the cloud here. Those are the design patterns, the kinds of strategies you take, because we are going against publicly available endpoints, not an on-prem environment. So here you see the data. Let me jump back into the console now; there are a couple of other things I want to point out that have to do with S3.
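The "four times" figure follows from pixel count: halving the ground resolution doubles the number of pixels along each axis. A minimal sketch of that arithmetic, assuming the same coverage, band count, and compression (the function name is made up for illustration):

```python
def scaled_size_tb(current_tb, current_res_m, new_res_m):
    """Estimate dataset size after a resolution change, assuming size scales with pixel count."""
    linear_factor = current_res_m / new_res_m  # e.g. 1.0 m -> 0.5 m gives 2x per axis
    return current_tb * linear_factor ** 2     # two axes, so the factor is squared


print(scaled_size_tb(48, 1.0, 0.5))  # 192.0
```

Under those assumptions, the 48 TB dataset would grow to roughly 192 TB at half-meter resolution.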
So, switching over here, we are looking at the source data, and that source data is delivered as four-band. But if you are just doing, you know, a base layer for, for example, a public site, you don't need all four bands. What you do is create an RGB derivative of that, and that is what this is. This was not delivered by the USDA; I built this using, actually, Elastic Beanstalk and GDAL to make that happen. So here, all of a sudden, you see these two sets of image files. This one is the 100 percent original, and these are the same image files, now internally tiled, with only three bands, and compressed using JPEG at, I think, 90, and they are much smaller. The point I am making here is that if one person does this, nobody should ever have to do it again. So that is another aspect of "one copy."

This is also in the bucket, so you don't have to go (you don't want to go) and look at the uncompressed originals, because that is slow. You might need the fourth band on occasion, for crop-related analysis or something, but typically you probably want this. And this data is now just in the same bucket. It doesn't have to be on some different volume because you ran out of space on the original one; I just added it and made it available in the bucket, as just part of the package. This is the kind of thing that the content owners could do, because everybody ends up doing it anyway, in order to reduce the heavy lifting around actually using this kind of content over time. That's what that is.
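The talk does not show the exact command used to build the RGB derivative. As a hedged sketch, one way to produce a three-band, internally tiled, JPEG-compressed GeoTIFF is GDAL's `gdal_translate` with GTiff creation options. The helper below only assembles the argument list; the file names are hypothetical, and it assumes that bands 1 to 3 of the source are red, green, and blue.

```python
import shlex


def rgb_jpeg_command(src, dst, quality=90):
    """Build a gdal_translate call that keeps bands 1-3 and writes a tiled JPEG-in-TIFF."""
    return [
        "gdal_translate",
        "-b", "1", "-b", "2", "-b", "3",   # drop the fourth band
        "-co", "TILED=YES",                 # internal tiling
        "-co", "COMPRESS=JPEG",
        "-co", f"JPEG_QUALITY={quality}",
        "-co", "PHOTOMETRIC=YCBCR",         # common choice for 3-band JPEG compression
        src,
        dst,
    ]


cmd = rgb_jpeg_command("naip_4band.tif", "naip_rgb.tif")
print(shlex.join(cmd))
```

On a machine with GDAL installed, the list could be passed to `subprocess.run(cmd, check=True)`; in a batch job, the same call would be repeated for each source file.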
Back in the console (I keep jumping back and forth), looking at this bucket: this is essentially the cache. If you look at this thing, there is a bunch of stuff that S3 can do that a lot of people don't know about. What we are using right now is static website hosting mode: S3 can act as just a website. I was talking earlier about universities doing things like using WordPress, where you push the site out to S3, or you can have your personal website on S3; it is simple to do. But one of the things you can also do is not just enable an index document but also handle redirection. For example, in this case, if you get a 403, what you do, as you saw a little while ago, is send it to the tiler. The tiler gets the request for the tile, generates a WMS request, creates a tile, serves the tile, but more importantly puts it into S3 for the next request. Very simple.

Another thing here, just for a second (that was the static hosting, just leveraging the existing static website hosting feature): down here I have a lifecycle. You'll see that for my levels 16 to 19 I have a lifecycle policy that deletes. This is a test, so these get deleted after 24 hours; in production you would probably keep them alive much longer than that. But the point here is that typically there is all kinds of heavy lifting around even maintaining this aspect of a cache, because these things get large. Because this is on S3, in an object store, it is just a lifecycle policy. It is the same model that, for example, folks in the world of video use in order to process classroom video. They are taking all kinds of classroom video, or they are taking video of roadways or something, putting all of that in S3, and then probably encoding it into another, more mobile-friendly format. Having done that, they use exactly the same lifecycle feature to pump it into Glacier, which is our archival store, which again drops the price.

It is simply a matter of coming down here and adding a rule. Very little of this needs coding. If you want to write the code, of course, you can automate all of this in whatever language you like to work in; this is just a GUI implementation on top of a bunch of RESTful endpoints that allow you to do things like change lifecycle policies.
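The lifecycle rules described above can also be written as data rather than clicked together in the console. The sketch below builds a configuration in the general shape that S3 lifecycle configurations use (a list of rules with a prefix filter and either an expiration or a transition). The key prefixes `16/` to `19/` and `video/`, the rule IDs, and the 30-day archive delay are assumptions for illustration; the talk does not show the actual bucket layout.

```python
import json


def expire_rule(prefix, days):
    """Rule that deletes objects under a prefix after the given number of days."""
    return {
        "ID": f"expire-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def archive_rule(prefix, days):
    """Rule that moves objects under a prefix to the Glacier storage class."""
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Cached tiles for levels 16-19 expire after one day, as in the test setup described.
lifecycle = {"Rules": [expire_rule(f"{z}/", 1) for z in range(16, 20)]}
# Hypothetical video prefix that is archived after 30 days.
lifecycle["Rules"].append(archive_rule("video/", 30))

print(json.dumps(lifecycle, indent=2))
```

A dictionary of this shape is what an SDK call such as boto3's `put_bucket_lifecycle_configuration` expects as its `LifecycleConfiguration` argument, so the same rules could be applied from a script instead of the console.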
We are almost out of time, and I can't find my one remaining slide, so here is the last idea. Typically we think about S3 in terms of static content: things like our web pages, the front end or something, that basically doesn't change that frequently. But if you look at the more interesting and high-scale use cases in the Amazon cloud, what you find is that, yes, of course it works for relatively static stuff like a database backup, for example a backup of an Oracle database that is done once and stored. But we have a lot of customers out there that are increasingly using S3 as more of a short-term store. You can do that because all you are doing is changing the lifecycle.

This is, I think, important especially in, for example, government use cases where we are talking about things like open data. We have a lot of government customers who are interested in, for example, providing API endpoints in order to be more open. But the fact of the matter is that it might be a lot easier for them to have a system that just pumps data frequently, or more frequently, into the object store and lets the end user figure out how they want to use it. It is a very different model. Rather than it being, say, a WMS endpoint for the sake of some kind of open government data, it might make more sense for the government customer to be pumping files into the object store, basically because government doesn't know what the customer use cases may be, and the customer may rather have something that doesn't require a government SLA but is just an S3 bucket, because then it becomes our SLA. There is a big difference there.

So rather than focusing on providing open data via an API that is run and controlled by the government, it might make more sense for government use cases, whether it is geodata or some PDF file or something, to just pump objects and let the customer, whether that is an individual citizen or a private-sector entity that is building, say, a traffic application on top of that, access the raw data, from which they can then build an API or whatever. So I don't want people leaving thinking that S3 is just good for some long-duration cache. It is actually good for very short-duration content too; it really can be a cache, not just an archive.

So that is the formal part of the presentation. Thank you very much for listening. I'm available; I'll be here until about one this afternoon, I think, at the booth over in the back corner over there, so if you're interested in hearing more, I'm happy to talk. I apologize that I'm the only one here; this was kind of a last-minute thing. One thing I want to ask you: if you can leave me your business card, I would very much appreciate it, because I've been told to come back with data, because we are a data-driven company, and I want to make sure that we sponsor more of these events. So thank you very much, I appreciate it.

Metadata

Formal metadata

Title Don't Copy Data! Instead, Share it at Web-Scale
Series title FOSS4G 2014 Portland
Author Korver, Mark
License CC Attribution 3.0 Germany:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner specified by them.
DOI 10.5446/31664
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Publication year 2014
Language English
Producer Foss4G
Open Source Geospatial Foundation (OSGeo)
Production year 2014
Production location Portland, Oregon, United States of America

Content metadata

Subject area Computer Science
Abstract Since its start in 2006, Amazon Web Services has grown to over 40 different services. S3, our object store and one of our first services, is now home to trillions of objects and regularly peaks at 1.5 million requests/second. S3 is used to store many data types, including map tiles, genome data, video, and database backups. This presentation's primary goal is to illustrate best practice around open data sets on AWS. To do so, it showcases a simple map tiling architecture, built using just a few of those services, CloudFront (CDN), S3 (object store), and Elastic Beanstalk (application management), in combination with the FOSS tools Leaflet, MapServer/GDAL, and Yas3fs. My demo will use USDA's NAIP dataset (48 TB), plus other higher-resolution data at the city level, and show how you can deliver images derived from over 219,000 GeoTIFFs to both TMS and OGC WMS clients for the 48 states, without pre-caching tiles, while keeping your server environment appropriately sized via auto-scaling. Because the NAIP data sits in a requester-pays bucket that allows authenticated read access, anyone with an AWS account has immediate access to the source GeoTIFFs and can copy the data in bulk to anywhere they desire. However, I will show that the pay-for-use model of the cloud allows for open-data architectures that are not possible with on-prem environments, and that for certain kinds of data, especially big data, rather than move the data, it makes more sense to use it in situ in an environment that can support demanding SLAs.
Keywords big data
