Dotloom Project - Next Generation Point Cloud Platform
Formal Metadata
Number of Parts | 50
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/40827 (DOI)
00:00
Computing platform, Point (geometry), Point cloud, Bit, Quicksort, Projective plane, Mathematics, Software developer, Ripping, Computer animation
00:39
Level (video gaming), Geometry
01:09
Point (geometry), Tekla, Building, Modul <Datentyp>, Discrete element method, Data management, Graph (mathematics), Point cloud, Planning, Data center, Projective plane, Computing platform, Self-organization, Supercomputer, Information systems
01:55
Distribution (mathematics), Process (computing), Software, Peer-to-peer, Set (mathematics), Lecture/Conference
02:10
Computing platform, Point (geometry), Projective plane, Shared memory, Email, Twitter, Moment (mathematics), Theory, Normal (geometry), Peer-to-peer, Computer file, Dot product, Hard disk drive, Web 2.0, Computer animation
03:48
Point cloud, Process (computing), Point (geometry), Tape drive, File format, Set (mathematics), Shared memory, Geometry, Computer animation
04:20
Point cloud, Point (geometry), Process (computing), Synchronization, Communications protocol, Distribution (mathematics), Data storage device, Bit, Shared memory, Right angle, Projective plane, Multiplication sign, Peer-to-peer, Multiplication, Computer animation
05:03
Normal (geometry), Bit, Musical ensemble
05:46
Client (computing), Interface (computing), Local ring, Synchronization, Windows Registry, Server (computing), Subject indexing, Peer-to-peer, Shared memory, Point cloud, Process (computing), Directed graph, Cartesian coordinate system, Term (mathematics), Telecommunication, Bit, Uniform resource locator, File archiver, Computer animation, Lecture/Conference
06:59
Local ring, Client (computing), Synchronization, Interface (computing), Server (computing), Subject indexing, Link (knot theory), Cartesian coordinate system, Greatest element, CASE <Informatik>, Key (cryptography), Hash function, Desktop publishing, Uniform resource locator
07:28
Personal identification number, Key (cryptography), Connected space, File archiver, Revision control, Hash function, Peer-to-peer, Dot product, Information, Synchronization, Bit, Subject indexing, Internetworking, Computer file, Directory service, Point (geometry)
08:43
Intel, Computer file
09:02
Process (computing), Robot, Software, Subject indexing, Peer-to-peer, Email, Multiplication, Web crawler, Computer file, Link (knot theory), Reading (process), CASE <Informatik>, Program flowchart
09:53
CASE <Informatik>, Process (computing), Product (business), Point cloud, Point (geometry), Client (computing), Software framework, Virtual memory, File format, Subject indexing, Service (economics), Software, Computer network, Goodness of fit, Connected space, Internetworking, Subset, Moment (mathematics), Uniform resource locator, Raw image format, Prototype, Multiplication sign, Hard disk drive, Web browser, Word, File Transfer Protocol, File archiver, Source code, Bit, Computer file, Algorithm, Shared memory, Revision control, Computing platform, Peer-to-peer, Software developer, Public-key cryptography, Synchronization, Computer animation
14:07
Software, Broadcast programming, Email, Projective plane, Library (computing), Link (knot theory), Open set
15:15
Direct numerical simulation, Multiplication sign, Tesselation, Area, Uniform resource locator, Bit, Patch (Unix), Differenz <Mathematik>, Data storage device, Revision control, Repository (publishing), Subject indexing, Quicksort, Control flow, Synchronization, Different (Kate Ryan album), Communications protocol, File format, Mathematics, Computer file, Directory service, Mereology, Similarity (geometry), Latent heat, Link (knot theory), Set (mathematics), Building, Default (computer science), Local area network, Software developer, Implementation, Content (media), Partial derivative, Stability theory, Software, Musical ensemble, Key (cryptography), Reading (process), Projective plane, Lecture/Conference
Transcript: English (auto-generated)
00:03
Yeah, so this is not a remote sensing talk, as it is tagged. And also this whole project is following some sort of conference-driven development: we always set ourselves milestones for conferences, and, as it is with milestones, you usually are not able to
00:23
meet them. So this project has had so many unforeseen changes that maybe what I wrote initially, a few months ago, is not exactly how it is today. Yeah, so I'm...
00:42
Georepublic is another country, but the people working for this company are quite distributed, with Rwanda as the latest one, and then recently somebody from Samoa joined. And so I have to readjust the map, because that didn't really work well. So I have to change this completely for Samoa somewhere.
01:03
Yeah, we have to. And actually this is a project that is funded by an organization called AIST, the National Institute of Advanced Industrial Science and Technology in Japan, and they have a big plan: they want to
01:24
make a geospatial platform for point clouds, for everyone who generates point clouds in Japan, and there's a lot of data generated in Japan. And so they want to do many things, and they have a data center called ABCI. I think it's one of the most powerful; it was just released in August, a very, very powerful
01:45
supercomputer. And so we should help them to collect the data, and it's mostly point cloud data. But later we found out this doesn't apply only to point cloud data. And so what is Dotloom? Initially we said it's a peer-to-peer distribution and processing software for very large geospatial data sets.
02:08
Yeah, and why do we need this? Because we found out that big data is really, really hard to handle, and we have many projects that involve big data. For my mother, 10 megabytes is already big data
02:22
if she wants to send me that, and for other people it's 100 megabytes, and for me even 1 gigabyte is already quite annoying to send somewhere. And we have customers that send us 50 gigabytes or a terabyte, and how do you send that? By mail, on a hard disk. And so I think before actually doing some cool stuff
02:42
we have to struggle with the transport of the data, and it's also quite inefficient to do that. How often have I downloaded an OSM planet file, then forgot about it and downloaded it again one week later? So I always download everything, and in my theory
03:03
99% of all data downloaded is kind of thrown away a few days or a few weeks later, and maybe my neighbor downloaded it already. So it's very inefficient how we handle big data at the moment. And so we found out about something, the distributed web. That's the new trend:
03:23
peer-to-peer. And so we read about this dot project. It's a data sharing and synchronizing project, and the biggest problem with this project is the name, because when you search for that you find many things, but not that project.
03:42
So we gave it a different name, and we called it Dotloom. That's how the name was decided. And so what we actually needed was a way to easily, quickly and securely share very large point cloud data. Then we wanted to browse and search through these data sets and
04:00
just selectively pick what we wanted to download, not download everything and then choose later, but selectively, like with this GeoTIFF format, just take what we need. And also we want to make this streamable, so that we can already do processing when the data comes in.
04:21
Yeah, so here's what we thought; I almost explained it already: quickly share 10 gigabytes, then browse it selectively and choose the right one, and the processing pipelines.
04:40
So what is the dot project? The dot project defines the dot protocol, and it's distributed sync. It improves the speed of download because it downloads from multiple peers at the same time, and it stores data efficiently. I would say it's a bit of Git, a bit of Dropbox and a bit of BitTorrent.
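To make the multi-peer idea concrete, here is a minimal Python sketch. It is not the dot protocol itself. It assumes that the data is split into fixed-size chunks, that the SHA-256 hash of every chunk is known in advance, and that peers are simple in-memory objects; `Peer`, `split_chunks` and `download` are hypothetical names chosen only for this illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_chunks(data, size=CHUNK_SIZE):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Peer:
    """Hypothetical in-memory peer that holds some, possibly not all, chunks."""

    def __init__(self, name, chunks):
        self.name = name
        self.chunks = dict(chunks)  # chunk index -> bytes

    def fetch(self, index):
        return self.chunks[index]


def download(peers, chunk_hashes):
    """Fetch every chunk in parallel from a peer that has it, verifying each hash."""

    def fetch_one(index):
        holders = [p for p in peers if index in p.chunks]
        if not holders:
            raise LookupError(f"no peer has chunk {index}")
        # Spread the load: rotate through the holders by chunk index.
        peer = holders[index % len(holders)]
        data = peer.fetch(index)
        if hashlib.sha256(data).hexdigest() != chunk_hashes[index]:
            raise ValueError(f"chunk {index} from peer {peer.name} failed verification")
        return data

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in chunk order, so a plain join is enough.
        return b"".join(pool.map(fetch_one, range(len(chunk_hashes))))
```

Because every chunk is checked against its expected hash, it does not matter which peer delivers it, and a peer that only holds part of the data can still contribute.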
05:03
And so to show you a bit what it looks like, why it's like Git: there's a command-line interface, and... Yeah, just before it worked, but maybe because of
05:29
It's also... it's a big deal... Yeah, it's okay, it's okay. So it's not very important; it was just a screencast of how you
05:51
init and sync, and then you can actually try it out yourself. It's very similar; it feels a bit like Git,
06:00
and you share, and you know how many peers you have. And because a command-line tool is not really useful for the people we target, we are currently working on something that is called dot desktop. It's an Electron-based application that should already be able to do many things, but we started rewriting it in React, and so
06:23
yeah, we are quite delayed. So you can create dot archives from local folders, and you can download and synchronize these dot archives. And in terms of point clouds, we want to add features like previewing the data inside,
06:41
publishing to specific nodes. We also want to introduce some centralized place where you can register your URLs and where you can trigger processing and indexing. Let's see if it works. No... okay, now it works.
07:11
Okay, so in this case it's very similar. You have this dot desktop application on the top, and you have something you want to share here on the bottom.
07:22
So all you need for sharing something is a URL, a dot URL, which is a hash key that does not change. You add it, and then you can see how your data synchronizes, and with these three dots
07:42
you can also see how many peers already have a copy of your data. On the command-line interface you can see that you're sharing 1.9 gigabytes of data and 200 files, and how many connections you have, and upload and download. So it's a bit BitTorrent style.
08:01
When you stop it, say you lose the internet connection, then like in Dropbox it stops syncing, and when you connect again it starts syncing again. You can actually see inside your archive what kind of data you have. This is just metadata,
08:21
so you don't need to download everything before you have a kind of index of your directory. And you also have information about the version, because there might be different versions out there on the internet, depending on when people were online or offline. So it's quite exciting to try it out.
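The idea of browsing metadata before downloading any content can be sketched roughly as follows. This is an illustration under assumptions, not the actual dot metadata format: `Entry` and `ArchiveIndex` are hypothetical names, and the only thing taken from the talk is that an archive exposes a file listing plus a version number that grows with every change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    size: int     # bytes
    version: int  # archive version in which this file was last written


class ArchiveIndex:
    """Hypothetical metadata-only view of a shared archive (no file content)."""

    def __init__(self):
        self.version = 0
        self._entries = {}

    def put(self, path, size):
        """Record that a file was added or changed; every change bumps the version."""
        self.version += 1
        self._entries[path] = Entry(path, size, self.version)

    def listing(self):
        """Browse the archive without having downloaded a single file."""
        return sorted(self._entries.values(), key=lambda e: e.path)

    def total_size(self):
        return sum(e.size for e in self._entries.values())

    def is_behind(self, other):
        """True if this copy of the index is older than the one seen on another peer."""
        return self.version < other.version
```

A peer that was offline for a while can compare its version with the one it sees on the network and tell that its copy is stale before transferring any data.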
08:40
Yeah, and I will tell you later why it's not so much fun to actually work on it.
09:04
Okay. So something we want to add now, and we have actually started already: we use this file index and the shared link to remotely index data. So we don't want to download something in order to index it.
09:22
So we just let some bot crawl through the data, just read the header files and write an index, so you don't have to first collect the data in one place. And in the same way we want to make remote processing possible: you just read from a dot link, which can read data from multiple peers.
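A rough sketch of what header-only indexing could look like for point cloud files. The talk does not name a file format here, so this example assumes LAS files and the 227-byte public header layout of LAS 1.2; `read_range` is a hypothetical callable standing in for whatever fetches a byte range from remote peers, and `index_las_header` is a name made up for this illustration.

```python
import struct

# Public header block of a LAS 1.2 file: 227 bytes, little-endian, no padding.
LAS_HEADER = struct.Struct("<4sHHIHH8sBB32s32sHHHIIBHI5I12d")


def index_las_header(read_range):
    """Build an index record from the file header only.

    read_range(offset, length) -> bytes is a hypothetical callable that
    fetches a byte range from wherever the file lives (for example peers).
    """
    raw = read_range(0, LAS_HEADER.size)
    if len(raw) < LAS_HEADER.size:
        raise ValueError("file is too short to contain a LAS header")
    f = LAS_HEADER.unpack(raw)
    if f[0] != b"LASF":
        raise ValueError("not a LAS file")
    # f[24:36] holds: scale x/y/z, offset x/y/z, then max x, min x, max y, min y, max z, min z.
    return {
        "las_version": f"{f[7]}.{f[8]}",
        "point_format": f[16],
        "point_count": f[18],
        "bbox": {
            "min": (f[31], f[33], f[35]),
            "max": (f[30], f[32], f[34]),
        },
    }
```

Only the first 227 bytes of each file are requested, which is why a crawler built this way would not need the data collected in one place. Newer LAS versions extend the header, so a real indexer would have to check the version before trusting fields such as the point count.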
09:45
And you can also publish the result immediately again to a peer-to-peer network. So in our case we have some users in mind. One case is, for example, a surveying company. Those are some customers we have.
10:04
They require an easy upload of raw survey data, and at the moment they often send us hard disks because it's much faster, and they have trouble with their network and policies. And they need some processing pipelines to get this data further processed, and then they also want to quickly
10:26
make this created data available to their customers. The other use case we had in mind was the data scientist. That person needs good access to the data resources,
10:42
and also needs to know that this data source is exactly what he wants. So it's not just an OSM planet.pbf or some file name that doesn't say anything. The dot archive guarantees you that with this URL you get a version that only the person who has the private key
11:03
can have created, so it guarantees authorship. Then he wants to apply custom processing algorithms and also wants to publish the data easily. And then I also thought about the self-driving vehicle, because that's the word you have to
11:22
say in Japan these days if you want to get research grants. Cars are actually collecting a lot of data, but it's so much data that you actually can't get it out of the car, and cars are connected to and disconnected from the internet. But you actually have a very good connection point when you charge it.
11:40
And so we thought, with this synchronization you could actually get all your data out at this point, for other users. Otherwise it's a big waste of sensors, right? I don't know whether this data is used elsewhere or just used in the car at the moment. And so we are currently developing here and there. We have a prototype of the processing framework;
12:05
dot desktop is now rewritten, so now we can add some more interesting features. The indexing service we didn't have time for. We actually wanted to add support for the dot URL to Potree, which means that you can read the node data from multiple peers,
12:27
and at this point the EPT format appeared. Everything is a bit delayed, so I don't know the status of this format. So there's a lot to do, but the reality is that
12:43
we are struggling with the following. So one more time the traditional idea, comparing motivation and difficulty: downloading is quite hard, and
13:03
yeah, it's also difficult. And the motivation for uploading is actually very low: people have a high motivation to download data, but they don't have a high motivation to upload data. We had built some CKAN platforms,
13:21
and they were built once and never updated. So I find that for sharing data, the motivation is too low for people to actually do it. But the acceptance of the current scheme is quite okay: people upload and download using FTP or browsers,
13:40
and we want to change that. We want to make downloading easier and, mostly, uploading easier. The problem is that we are using peer-to-peer, and this is a bad word, so acceptance is our big issue. I just recently went to a FOSS4G conference where I got this T-shirt,
14:02
and I had to sign that I'm not using peer-to-peer software to get internet access. Actually, it would be much easier to develop software for labeling and shipping data by mail; we would already have finished. But it's too much fun to stop.
14:20
Yeah, so we have lots of obstacles, and the biggest one is this peer-to-peer issue, opening ports. Then the community is also a bit of a problem: we are using a lot of Node.js, the project is also very distributed, and people tend to make tons of npm
14:40
libraries, and it's much more work than we thought. And this technology is for early adopters, so you never know what happens. Many people say, bring the software to the data, that's the best thing. But I think it's better to say, make the data accessible where it's created. So with this kind of dot link you could
15:03
collect the data where it's created, and then people could just retrieve what they want. Nobody has to upload anything; you just need to download. Any questions? If you're interested, that's currently the development area.
15:26
With the files being stored on the different nodes, I got the impression that it would be a complete file stored on the node, or is it also partial files, content scattered around? You mean a peer-to-peer node? So the publisher of course has the whole data.
15:44
And then when you receive this dot link and you sync to your node, you can actually choose. The protocol and also the implementations of the protocol allow you to selectively sync, so you don't have to synchronize everything, but the current tools usually just sync everything.
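Selective sync, as described in this answer, could be planned from the metadata alone. The sketch below assumes the index is a plain mapping from file path to size in bytes and that the user expresses what they want as glob patterns; both are assumptions made for illustration, and `plan_sync` is a hypothetical helper, not part of any dot tool.

```python
from fnmatch import fnmatchcase


def plan_sync(index, patterns):
    """Pick the files to synchronize and report how many bytes that will transfer.

    index    -- mapping of file path to size in bytes (metadata only)
    patterns -- glob patterns such as "tiles/north/*"; a file is selected
                if it matches at least one of them
    """
    wanted = sorted(
        path for path in index
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )
    return wanted, sum(index[path] for path in wanted)
```

For example, with an index of `{"tiles/north/a.laz": 100, "tiles/south/b.laz": 200}`, calling `plan_sync(index, ["tiles/north/*"])` selects only the northern tile, so the southern one is never transferred.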
16:05
Yeah. So if the publisher updates a data set, can you just do delta changes and that kind of thing? If a publisher updates the data set that they share, can users just pull a delta change to it, or would they have to...?
16:24
Oh, yeah. So there is actually a competitor to the dot project; it's called IPFS. One big difference of this one is that the URL does not change when the data changes. So you can start with an empty directory and index this, and then when you add data
16:43
it increases the version, and so you know if your version is right or not. But you cannot make a diff, because it does not store history by default. If you stored big data in Git, you would get a huge problem with
17:02
growing Git repositories if things change a lot. So this doesn't store historical data, but you could implement something to take snapshots from time to time, at specific versions. But you have to store them somewhere else; this is nothing built in. It basically versions, and it tells you what happened over time, but it doesn't keep the old
17:25
data. You can't really go back if you don't have a copy somewhere else. The data version... Similar story, like a great idea...
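The difference described in this answer, a stable key-based URL with a version counter versus an address derived from the content, can be illustrated with a small sketch. It makes several simplifying assumptions: the "public key" is just arbitrary bytes, no signing takes place, and `KeyAddressedArchive` and `content_address` are hypothetical names, not the API of the dot project or of IPFS.

```python
import hashlib


def content_address(data):
    """Content-addressed style: the identifier is a hash of the data itself."""
    return hashlib.sha256(data).hexdigest()


class KeyAddressedArchive:
    """Hypothetical archive whose address comes from a key, not from the content."""

    def __init__(self, public_key):
        # Assumption for this sketch: the address is a hash of the public key.
        self.address = hashlib.sha256(public_key).hexdigest()
        self.version = 0
        self.files = {}

    def write(self, path, data):
        """Overwrite in place: the version grows, the old bytes are not kept."""
        self.files[path] = data
        self.version += 1

    def snapshot(self):
        """Copy the current state; the caller has to store it somewhere else."""
        return self.version, dict(self.files)
```

Writing new data leaves `address` untouched while `version` increases, whereas `content_address` returns a different value for every change. Old content is only recoverable if a snapshot was taken and kept outside the archive, which matches the remark that history is not built in.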
17:43
Yeah, just on the easy part Just sort of break nothing. Take it So
18:00
So actually, take some kind of tile storage or this kind of thing. The first thing is you could have multiple copies; you have a CDN automatically. So if somebody in your local network already created a copy, then you don't have to fetch it from Amazon somewhere; it could be used from the local network.
18:24
And you could have sparse copies, so there could be multiple locations where you have it. Because it's a folder format, it's stored in folders and files, it actually works quite well. We wanted to use the Potree format initially and contribute that to PDAL,
18:44
but, yeah, "we don't want to have the Potree format", he said, "we are working on EPT", and so we are waiting until this is kind of stable.