Development of Data Archiving and Distribution System For the Philippines' LiDAR Program Using Object Storage Systems

Video in TIB AV-Portal: Development of Data Archiving and Distribution System For the Philippines' LiDAR Program Using Object Storage Systems

Formal Metadata

Title
Development of Data Archiving and Distribution System For the Philippines' LiDAR Program Using Object Storage Systems
Title of Series
License
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Year
Production Place
Seoul, South Korea

Content Metadata

Subject Area
Abstract
The Philippines' Department of Science and Technology, in collaboration with Higher Education Institutions (HEIs) led by the University of the Philippines, has embarked on a program for producing hazard maps of most major river systems in the Philippines. Realising the utility of LiDAR and its derived datasets, a concurrent program on resource assessment was also initiated. These endeavors aim to produce essential products such as DEMs, orthophotos and LAS data that can be used for purposes such as urban planning and resource planning. The result of both programs is large amounts of data that need to be distributed and archived at a fast rate. As with other LiDAR operations, handling large swaths of spatial data at once is not an option, hence datasets are organized in contiguous blocks, subdivided into files and grouped by river systems and local government units. Existing spatial content management systems and geoportal solutions were designed to handle rasters and vectors but not point-cloud data distribution. This study discusses the development of a simple and straightforward system for storing and delivering LiDAR and LiDAR-derived data using Ceph as the object storage system, coupled with a spatial content management system derived from GeoNode. This approach hinges on our requirements of being scalable yet robust without much deviation from the current file-system-based storage structure. While most operations, like data acquisition, preprocessing and quality checking, are done centrally, the system aims to address our programs' need for data exchange among spatially distributed and autonomous partner HEIs that perform data processing and validation. The system also aims to semi-automate our data distribution process, which caters to government institutions and the general public.
OK. I am from the data archiving and distribution component of the Phil-LiDAR 1 program of the Philippines, and I will discuss the development of a data archiving and distribution system for the Philippines' LiDAR program using object storage systems. I will start with a short introduction, then review related work for the project and the design we have implemented. I will then discuss the Ceph object storage system further, as well as our archiving process flow, and end with a short summary.
So first, the Philippines: it is situated both in the Pacific Ring of Fire and in the Pacific typhoon belt, and it is visited by an average
of 19 typhoons a year, so it is very prone to things like earthquakes and other hazards. It is also abundant in natural resources. Therefore there is a need for mapping, both to assess disaster risk and to account for these natural resources. So the Department of Science and Technology of the Philippines, together with higher education institutions, that is, different schools across the country, organized mapping programs, namely Phil-LiDAR 1 and Phil-LiDAR 2. These are extensions of the previous program, the Disaster Risk and Exposure Assessment for Mitigation (DREAM) LiDAR program. Of these two programs, Phil-LiDAR 1 is in charge of data acquisition, validation, processing, training, data archiving and flood modeling, while Phil-LiDAR 2 is mostly about natural resource accounting, divided into agriculture, forest, coastal, energy and hydrology. Because of these two programs we are acquiring LiDAR data for the entire country. It is good to use LiDAR because it produces high-resolution spatial data, but the problem is that this also leads to humongous data sizes, so storage, indexing, retrieval and distribution proved to be a challenge for us.
So I will proceed to what we reviewed among other sources. First there is the basic storage system, the file storage system. It is very commonly used because it comes with the operating system you use on a computer. There is little setup: you just install your OS and there is a file system there. It is a pervasive technology, meaning everyone is using a file system whether we know it or not. The problem is that once the processes in a program like ours get complex, with process after process producing several other datasets from the raw data, the complexity of the directory structure we use increases with it. As an example, this is one of our directory structures, for one component only; the Phil-LiDAR 1 program has several components. As you can see, the more processes that are
involved, the more folders there are and the more complex it gets. So that is one disadvantage.
Next, we can also use geospatial databases, or GIS-enabled relational databases. There are two types of design for this. One is that the spatial indexing is a separate layer from the actual database, because a database handles data by columns and its usual index is a different thing from spatial indexing. The other approach is to use specialized spatial columns that expose the data directly to the spatial processing interface. The disadvantage is that there is an indexing overhead, especially if your data is constantly updated with new data and especially if the indexing is separate, and there are scaling and query time issues due to this indexing overhead. There is also limited support for point cloud data. Although there are efforts like pgPointcloud for PostgreSQL, from what we have evaluated it is still better to just use LAZ, because it compresses point cloud data more efficiently.
So next: why not combine the approaches so that we can maximize their advantages? You store the LiDAR tiles flat in a file storage system, use a relational database to manage the metadata for indexing the tiles, and use a distributed infrastructure for replication and scaling, so that if you need additional space, for example, you can easily increase it. The LiDAR tiles are stored in dedicated distributed storage, and processing can be carried out by a high-performance computing system, for example if you implement a cloud system like OpenStack.
A perfect example of this is OpenTopography's architecture, which is composed of various software and hardware resources. They store the actual data in LAS format on a dedicated file server, and the metadata relational database is an IBM DB2 database. They have the SDSC cloud platform for storage; SDSC is the San Diego Supercomputer Center, so these are
supercomputing centers, I think. Computing, processing and visualization are handled by very large multiprocessor systems, so they work with parallel processing.
While highly appealing, this raises the following issues. One is the cost and difficulty of deploying this infrastructure, because a fully deployed cloud system requires lots and lots of hardware. Another is Internet connection speed and reliability. For a country like the Philippines, the Internet connection is comparatively slow compared to other countries in the world, and reliability is also an issue: we encounter timeouts and dropped connections from our partner higher education institutions when they try to connect to our servers. So even if you have a high-performance computing center, if your clients cannot connect to the server, it is of little use.
So for our working design we still chose a combined approach, but a toned-down version. What we use is GeoNode, which has GeoServer as its back end, and then we use Ceph to store the larger data, like the LAS files and the larger raster files. We tile them and put them into Ceph, which is the object storage. As for GeoNode, we customized it and made it into a web portal, which we call the LiDAR Portal for Archiving and Distribution, or LiPAD, and we added some features that we found necessary for our purposes.
Clients access the web portal through a web user interface. GeoServer stores the smaller files, such as raster and vector data, while the large files are tiled and indexed into the object storage. Metadata for these files are then uploaded into the LiPAD portal, where they are represented by a shapefile of the Philippines, which is on the next slide. The features we added to GeoNode include authentication using Active Directory, indexing for the files stored in Ceph, as well as tile selection of data and the data cart. This is a screenshot of how tile selection
works. We display a gridded map of the Philippines with a color coding scheme showing which data are available, such as LAS and the DEMs. You can select a single tile, or drag and drop a bounding box, or specify a bounding box in the right-hand corner. The selected tiles are then displayed below, and once you click submit, a list of all the tiles selected is shown for confirmation.
Next is our discussion of Ceph, the object storage system. First, what is object storage? As opposed to file systems, data is managed as
objects, which are the basic units of storage. Objects are self-contained, and they can be accessed like a file. But whereas accessing a file in an operating system requires the full path of the file from the home directory, objects are retrieved by a unique ID, which you can determine on your own; you can use a hash if you are managing very many objects. This offers storage size scalability, because adding disks for additional space is abstracted away by the object storage system. It also offers replicated backups of whatever you store inside.
As for the features of Ceph specifically: first, it is open source, but it is also compatible with OpenStack and Amazon AWS if you choose it as the underlying object storage. There is support for a broad spectrum of programming languages to interface with the Ceph storage: Java, C++, PHP, Python and Ruby. It can run on commodity hardware; you just need several machines. You do not need specialized hardware for this; you can set it up over several desktops as long as you have enough of them. It is designed to be self-healing and self-managing, meaning that if there are errors on, for example, one object, it will check that object's replicas across the system so that it can correct it. It also manages the objects on its own; it has its own indexing, apart from what it shows to the user. It also has a REST API; REST stands for representational state transfer. It is basically like the HTTP protocol, where you can issue requests such as GET, but instead of web pages it is web services that are provided over the connection. That is how it works.
The features of Ceph also include block storage, which means Ceph can provide a virtual hard disk that can be mounted over a network. It is basically like NAS, network-attached storage, so you can provide additional hard disk space on demand, as long as there is space, and then you can
just mount it on any computer connected to the network on which Ceph resides. Then there is the object storage, which is exposed, as I said before, through the REST API. You can use the OpenStack or Amazon API, or any of these languages, to interface with the Ceph library. It uses the RADOS library; RADOS means reliable autonomic distributed object store.
As for the architecture of Ceph, this is an example if you want to set one up. The usual architecture is this: one computer, or host, serves as the gateway. This is how the other computers access the storage, through the gateway. This gateway host, or node, is connected to several monitor nodes, which ensure high availability. What does that mean? For example, if one monitor node fails, another one takes over, so you can keep the connection alive. Then there are the OSDs. These are basically your hard disks, which are managed by additional hardware that interfaces with the system.
For the archiving process flow, what we do is simple. After the data has been post-processed and validated and handed over for archiving, it is tiled on one of our servers; we made tiling one of the services in our setup. Once the tiles are uploaded, another automated script generates the metadata, which is usually just a long log file. We upload this log into our LiPAD portal so that it will be indexed, and then the tile selection I showed you before will update to show that this data is already available, using the color-coded scheme.
So I guess I can proceed to the summary. As I said before, there is a need for LiDAR mapping in the Philippines for hazard assessment and natural resource accounting, but with LiDAR mapping, archiving of the LiDAR data, that is, storage, indexing and distribution, proved to be a
challenge, so we used a combined approach using GeoNode with GeoServer and Ceph object storage. Ceph also leaves room for the future, if ever we want to migrate to a fully distributed computing platform like, say, OpenStack. I just listed my references here, for knowledge transfer in the office. We would like to acknowledge the support of the Department of Science and Technology and the Philippine Council for Industry, Energy and Emerging Technology Research and Development, and the Phil-LiDAR 1 program. So thank you very much.
Thank you. Do we have questions?
What about multi-temporal datasets, if you have point clouds collected at different times over the same area?
So far we have only resorted to just one dataset, which is then updated. We have not yet had the time for that, but if ever we did, we would probably add additional indexing to every one of the tiles, which would include the time at which it was acquired.
Another question is about the coverage of your data. Is it national coverage?
Right now what we have is about 18 major river basins, because that is what the initial mandate of our program indicated. Now we are trying to cover most, if not all, of the river basins. So it is not the entire Philippines, just the areas that are more prone to flooding, and those along the fault lines, for earthquakes. So the more hazard-prone areas are what we have data for right now.
I have two questions. My first question is about the EPSG code, the coordinate reference system: in what system are you storing the LiDAR data?
I do not remember exactly. I think we had a consensus, because the other team was using a different one, but I forgot which one. I am not sure of the
actual EPSG number of the reference system.
There are many reference systems in the Philippines?
Yes, there are many. So I would have to review that, I think.
Then the other
question is about open data. Is the LiDAR data available to all researchers, or is it kind of restricted? As far as I know, the data is not open.
OK, some of it is not yet, because it is part of the mandate of the program that there is a data policy, and the data can be retrieved by request. Although I am not sure what the status of that is now. The decision rests with the higher-ups, and some of them are a little bit iffy about openly releasing the data.
What I mean is, having worked in the Philippines for over 10 years, I think you may want to tell your higher-ups that they need to not just worry about how to collect the data but also to make it available to the community, including municipalities and researchers, easily.
Yes.
Because I find it extremely difficult to extract anything from a government, good or bad data. And just for curiosity, do you remember what the file size of one tile is?
The file size of one tile is around 15 to 20 megabytes. It is quite small. That is for just one out of three products, I think the raster, because we have an orthophoto, the DEMs, that is, the DTM and DSM, and then the LAS. So each one ranges from, I think, 10 to 20 megabytes for a one square kilometer area. And there are different versions of each; the difference is not necessarily the time it was acquired but more what processes it went through. So there are several other files, and I would guess the total per one square kilometer is somewhat larger.
No more questions? Thank you.
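The tile selection by bounding box and the hash-based object IDs described in the talk can be illustrated with a minimal Python sketch. It is not the LiPAD implementation: the 1 km tile size, the grid origin at (0, 0), the `data_type/E<col>N<row>` naming and the function names `tiles_in_bbox` and `object_key` are all assumptions made for illustration, and no Ceph or GeoNode API is called.

```python
import hashlib
import math

# Assumption: square tiles of 1 km on a projected grid whose origin is (0, 0).
TILE_SIZE_M = 1000


def tiles_in_bbox(min_x, min_y, max_x, max_y, tile_size=TILE_SIZE_M):
    """Return (col, row) indices of every grid tile that the bounding box overlaps."""
    col0 = math.floor(min_x / tile_size)
    row0 = math.floor(min_y / tile_size)
    # A box that only touches a tile's edge does not select that tile;
    # max() keeps a zero-width box from producing an empty range by mistake.
    col1 = max(col0, math.ceil(max_x / tile_size) - 1)
    row1 = max(row0, math.ceil(max_y / tile_size) - 1)
    return [(c, r) for r in range(row0, row1 + 1) for c in range(col0, col1 + 1)]


def object_key(data_type, col, row):
    """Derive a stable object ID by hashing a hypothetical tile name."""
    name = f"{data_type}/E{col}N{row}"
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Select tiles for a box drawn by the user, then list the object IDs to fetch.
    for col, row in tiles_in_bbox(500, 500, 2500, 1500):
        print(col, row, object_key("las", col, row))
```

Because the ID is derived only from the data type and tile indices, the portal's metadata index and the object store can agree on a key without storing a directory path, which matches the point made in the talk about retrieving objects by ID instead of by full file path.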