Software comes and goes. Mind the Data!

Video in TIB AV-Portal: Software comes and goes. Mind the Data!

Formal Metadata

Software comes and goes. Mind the Data!
Title of Series
Part Number
Number of Parts
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Software developers complain about wrong data. It just does not comply with the specs. It is a complete mess. The owners of the data do not understand the agitation because they have worked with some odd, old software for ages and it sort of always worked out. Why change now? The meta of the data sits in the mind of so many people but has never been organized and fixed (to fit into your software). The Data Monger on the other hand has the same problems but the other way round. The data does not download, the encoding is wrong, the coordinate system is screwed and at the end all the decimals are cut off. Who is right? It is always the data owner! Because software comes and goes, the data stays. This talk is a plea to mind the data. Remember this: Never use software that does not work on open formats. Really open. Like in Open Source but even better. Arnulf Christl (Metaspatial)
Keywords Metaspatial
Hello everyone, and welcome to the session. It is my pleasure to
introduce to you Arnulf Christl, and he will be talking about "Software comes and goes, data stays".
OK, so this is just a somewhat strange, philosophical, relaxed, laid-back session about what I think is important. Some hands up: who is a software developer? Good. Who is a data monger? Who manages a lot of data sets? You guys, you're going to love this. So, my name is Arnulf. I'm a geospatial systems architect. Any other geospatial systems architects? OK, great, thank you. I'm an agile coach. Agile, anybody? Scrum, extreme programming, the Agile Manifesto: those are three different things. The manifesto was written by the thinkers, the philosophers, the extreme programmers, like Kent Beck. And then there come the people who want to organize this whole vision [unintelligible], and that's what Scrum actually is: it's trying to get those two mindsets together and get it done in an orderly fashion. So Scrum is the boring version of extreme programming with some ideas from [unintelligible]. And I'm a consultant, and I'm available for hire.
So imagine you are a software developer. The data is wrong, right? The client uses stupid, boring old software. How old are the systems you are using now? Any very old data sets, the really heavy stuff? Good for you, you modern guys. I heard about ArcStorm. Anybody remember ArcStorm? There are still systems out there running with ArcStorm in a live environment, operating now. Well, yeah, and there are even two people in the world who know what to do with it. So did anybody hear this word before? Software developers, did you hear this? What is that, what is metadata? [unintelligible] It's a screenshot. It's not even a PDF, it's a screenshot in JPEG, and that's what they give you, software developer. So the client is kind of ignorant, and you're a nice person, but you think: wow, this client is dead stupid, why is he doing this? So, encoding, anyone?
So imagine you are a data monger. The encoding is wrong. You always load some data and it's always breaking at one point because the encoding is wrong, because somebody forgot to tell you that this is Windows, it's not UTF-8. "What was the coordinate system of the shapefile? Which projection did you use?" "Ah, I forgot. I have used it for 10 years, why should I care?"
So, do you know what a carriage return is? Do you know where it comes from? You remember those old typewriters, where there is that lever. Yeah, exactly: carriage return meant pushing the carriage back, and then you have the line feed. So we still have this knowledge in the computer today: it's a carriage return plus line feed, or it's just a new line. This is the stuff we deal with, we data mongers.
Float values come as text strings. Why is that? Because we're in Germany. Who knows about the problem? OK, the Germans do. For those of you who don't know: in Germany, when you want to say "zero point something", you say "zero comma something". And with one thousand or one million, to make it easier to read, you write 1.000.000; that's one million. And then an import routine cuts off all the decimals in one data set, because it thinks: well, that's a point, that's a character I don't need, just ignore it. The data was from a German colleague, and the routine interpreted the comma as a thousands separator instead of a decimal mark. So who is right? Both, but their perspectives are different, because women are always right.
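The decimal-mark mix-up described here can be sketched in a few lines of Python. This is only an illustration and was not shown in the talk; the helper `parse_number` and its `decimal_mark` parameter are hypothetical names. The idea is that the convention is passed in explicitly instead of being guessed from the text.

```python
from decimal import Decimal


def parse_number(text: str, decimal_mark: str) -> Decimal:
    """Parse a number written with an explicitly stated decimal mark.

    decimal_mark="," reads German-style input such as "1.000.000,5";
    decimal_mark="." reads English-style input such as "1,000,000.5".
    (parse_number is a hypothetical helper, not part of any library.)
    """
    if decimal_mark not in (",", "."):
        raise ValueError("decimal_mark must be ',' or '.'")
    thousands_mark = "." if decimal_mark == "," else ","
    # Drop the grouping character, then normalise the decimal mark to ".".
    cleaned = text.strip().replace(thousands_mark, "").replace(decimal_mark, ".")
    return Decimal(cleaned)


# The same string means different things under the two conventions.
german = parse_number("1,234", decimal_mark=",")    # a bit more than one
english = parse_number("1,234", decimal_mark=".")   # more than a thousand
million = parse_number("1.000.000", decimal_mark=",")
print(german, english, million)
```

Using `Decimal` rather than `float` keeps every digit of the source text, so no decimals are silently cut off during import.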
Data is a nuisance, let's admit it. Data is bad news for us software engineers. Data is a nuisance, it never works. It stubbornly refuses to flow into the software, which is what should happen. Although the wording is kind of wrong. What is wrong with "the data refuses to flow into the software"? Should the data be in the software, or should the software be in the data, or should the software do something to the data, or with the data? These are things we don't think about as software developers, because software is our thing and data is the nuisance. So the software is obviously right, so the data must be wrong. And if the data is wrong, what do you do? You fix it. The good thing is that data can be corrected, tweaked, bent and made to work. We're done.
So says the software developer. To the data monger, the software appears to be wrong. But all those software developers, service providers, cloud, SaaS, etc. tell us otherwise: the software is not wrong. Maybe a little buggy, OK, yeah, but you know, usually software is not wrong, software is right. And the software was that expensive, it is so professional, and others seem to get by with it. Because if something explodes, nobody will tell you. Do you know how many things went wrong in the year 2000? So many things went wrong, but everybody, the big banks that lost billions, said: "Oh no, we didn't have a problem." Nobody would admit that something broke if it's really important. So a lot of things happen underneath that you never know about. All the things that go wrong, you never know. And the use cases you get presented by all the producers, they all worked. There is never anybody saying: "Yeah, we really messed it up." So we data mongers really start to wonder. We think: everybody else seems to be right, so we are wrong. So what do we do? We start tweaking to make the data fit into the software. And this
is where it all goes wrong. We should never, ever do that. If ever a software developer tells you your data is wrong and you should change your data, throw them out. Don't talk to them, just throw them out and never let them come back. You should never, ever change your data model or change your data because you think this software is so good that you have to have the software. No: the software must adjust to your data. Why is that? Because software comes and goes, but data stays, and especially geospatial data stays. If you think about all the cadastral information in Germany: how many times are you going to renew it? I was just sitting in the talk about land administration, and Chrit Lemmen, who knows what he's talking about, told us how long it will take in Nigeria to map all the cadastral land information if they continue at the pace that they're doing it now. So tell me: how long will it take? 30 years? Who gives more? 50? He calculated it will take 800 years. All right. So if it takes 800 years to produce all this data, which they will never do, obviously, but imagine that it takes 800 years: what do you think is going to stay in that time? Will we have the data, or will we have the software? Probably the data.
So this is interesting: historic data. We were seeing this [unintelligible]; she had this data from 1976. I worked at the University of Marburg, and they had tapes there with satellite images from around that age. And one day, you know what happened? The tape recorder broke. The data was there, and tape is one of the longest-living kinds of archiving methods to store big amounts of data. The recorder broke, and the company that manufactured this recorder went out of business 10 years ago. The data is lost. Can you imagine? This must not happen. You should never, ever lose data; this is so bad. Because later on you may need to access that data. We want to show how things change over time, and you have to document that change.
Historic software. Do you know anything about historic software? It's a bad joke. There is no historic software. We sometimes laugh about the stuff we used two years ago. We're not talking about 30-year-old satellite images that are gone, lost; we're talking about the software that we installed a week ago: "Oh, that's the old version, I need to get the new version." So what happens when software gets outdated? It dies, and that's the end of the story. So data lives for a long time; software dies.
Software does not know history, and that's the reason why it dies and just goes away. You don't need it anymore. Software is unaware of time and change. Software should create a metadata set with all the changes it has caused to the data. But, big error here, I should correct
that here. That one is very important to get correct. Let me press Shift+F5, and then it should start there.
Well, OK. Actually, when somebody can use PowerPoint or a LibreOffice presentation as well as I can, you know that he doesn't do anything else. This is horrible. [unintelligible] But this is important. In my opinion, whenever software does anything to a data set, it should document what it has done. And none of the software packages do; it's horrific.
I record Buddhist university teachings and store them as MP4. I start with a recorder device, and it stores the recording, and there's some metadata in it: when was it recorded, what are the channels, and so on. Then I load it into Audacity, that's a software that you can use to edit sound files, and I take out noises and join the stereo channels into one, so that when you listen to the teachings on your headphones it's not always coming from one ear. And then I store it and export it as MP4, and all the metadata gets lost. Can you imagine that? For a simple thing such as an MP4 file, this information gets lost. This should not happen. And the same thing happens to our spatial data when we move it from one software to another.
So whatever your software ever does, you should log it, and ideally store it with the data. You shouldn't have a separate data set that somebody else has to maintain, somewhere in some separate files where you write some metadata; it should be in the data itself. All the metadata should be as close to the data as possible. Take a satellite image: the name of the satellite image, the file name, should already be considered metadata, and in this image you should store some information about what has happened to this data, because otherwise you're going to lose it. I promise you, you're going to lose your metadata, and then you're in trouble. So this would be real metadata, recorded so thoroughly that you can even undo and reverse the changes, all of them. This is never going to happen, I know.
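The idea of software logging what it did and storing that log with the data can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing tool or file format: the dictionary layout (`source_metadata`, `history`, `values`), the helper `apply_logged` and the tool name are all hypothetical.

```python
import json
from datetime import datetime, timezone


def apply_logged(dataset: dict, tool: str, operation: str, func) -> dict:
    """Apply func to the values and append a history entry that travels
    inside the returned dataset. The input dataset is not modified."""
    entry = {
        "tool": tool,
        "operation": operation,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return {
        # Original recording metadata is carried along instead of dropped.
        "source_metadata": dataset.get("source_metadata", {}),
        "history": dataset.get("history", []) + [entry],
        "values": func(dataset["values"]),
    }


def to_mono(channels):
    """Average the samples of all channels into a single channel."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]


recording = {
    "source_metadata": {"channels": 2, "device": "hypothetical recorder"},
    "history": [],
    "values": [[0.0, 1.0], [1.0, 0.0]],  # two short stereo channels
}

mono = apply_logged(recording, "hypothetical-editor", "join stereo channels to mono", to_mono)
print(json.dumps(mono["history"], indent=2))
```

Because the history is part of the dataset itself, exporting or copying the dataset as a whole keeps the record of what was done to it. Reversing a step would additionally need the inverse operation or the previous values, which this sketch does not store.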
There are some algorithms that do things you are never going to undo, like an MP4 compression algorithm: you can never undo that. But let's at least try. So it's only dreams, and I'm a dreamer. But data is alive. Data grows and becomes more complex. This is not nice, but it is a reality; deal with it. So data does not go crazy. So what does go
crazy is only a delusional mind. Believing oneself to be perfect is often the sign of a delusional mind. Which of you software developers is perfect? Come on, be honest. So the software is wrong. Data cannot go crazy; software can. So if anything breaks, it's the software that has broken it. So the
biggest no-no is: never, ever even dare to think that you have to make your data work with some software. You should never do that. It is the software's responsibility: the software has to deal with your data, or die. When you have a crash and your data is affected, sure, it can happen. But software should always be very careful touching anything in your data. If some software has ever corrupted any of your data, get rid of it. It's software that has gone crazy. It's like trying to wear shoes three sizes too small. Well, you can cut off your toes; it's been done before, in Cinderella. So, money and
cost: this is the economic aspect of the whole thing. Data is where the money sits. I mean, did you see the budget of ESA? 4.4 billion. Wow. Now we come here with open source software of maybe a hundred thousand lines of code [unintelligible]. So this is the simple reality: the money, that's where the data is. And software should therefore never be more than an auxiliary; that's all that software should be.
Analogies: if the data is the bricks, then the software is the mortar. No wall will have more mortar than bricks, right? Because it will fall over. Or if you're a car driver, you know about tyres and rims. They are very expensive, and you don't want to bend your rims when you put the tyre on. So what do you need? You need a lever and you need lubricant, and you put the tyre on, and then the lubricant is gone and you can drive with the car. It's the same: the software is
nothing but oil on the hinge. You don't want to break the door because you used olive oil: the door will break because it rusts. Use the wrong software and your data will die. So never, ever use the wrong stuff. So, in summary:
Data is important; software is not important. And software comes and goes. Thank you. And we are well in time, so if you have questions: a question here, is that one?
Any questions?
So first, thanks for the entertaining talk, and thanks also for the subject, because I am a professional historian. [unintelligible] If we sit there in something like 30 or 40 years and try to write history about our age, then this question has a completely different aspect to it. Because right now, OK, it's economically viable, but for me as a historian it's more like: this is the actual data. If we lose the data today and we don't find any ways to archive it in a really long-term sustainable way, then future generations of historians will really have trouble grasping our era. Well, for somebody not in the state of mind of a historian it might not be a problem, because who is interested in the past, exactly? But I do think we will lose a lot, we lose a lot as a society, and from that long-term perspective it is important to really keep that in mind. So thank you again. It's not a question, actually.
I was wondering whether blockchain can help with some of the problems that you mentioned, or might help in the future, and whether you have some thoughts on that, or someone else has.
I haven't looked at it, but it seems like a promising tool. What I do personally, when I'm doing data pipelines: I write custom validation rules for my data set, and as I'm going through my pipeline of data I have a script that at each step runs those validations across the data to make sure that nothing has changed. But that's just a custom hack [unintelligible]. It is a really painful thing to keep data straight.
Any more questions or comments?
[unintelligible] It's not so much a question, really, more a problem, or rather a paradox: if software has no value and the value is in the data, how come it's so difficult to get 
rid of old software, like legacy systems?
Well, I think one reason is that we have this mindset of software being so important in the software developers' heads, and more so in the vendors' heads. With open source it may actually become a little less bad than with proprietary development. So what's the current version of [unintelligible]? 16 or something? At what point is it going to be embarrassing? Like version 227, still not working, more features.
I wrote my thesis at university. I never finished the degree, by the way, but I wrote the thesis, 300 pages or so, in Word, about version 5.5. It was before Windows; it was the Word before Windows, on DOS, but it was already a what-you-see-is-what-you-get kind of thing. And it worked. It was kind of painful, but it worked: it could do titles and footnotes and an index and images, and it worked OK. And then many years later I had a new version of Word for Windows, and there actually was a dog wagging its tail at me, asking me whether I wanted to write a letter. I was like: what? Just for the fun of it, I installed the old version in a VirtualBox with DOS and opened my old thesis that I had written. It was so amazingly fast, you cannot imagine, scrolling through the pages. On the old computer it was slow, but it worked. And now you take a Word document, put three pages in it with some images, and you scroll through and you see how that works. So I'm asking: what has happened to the software? It still can do titles, paragraphs, footnotes, indexes, images, and it can do wagging dogs. So I think one problem is that software has been developed further and new features have been added that nobody needs. The software is bloated. And this does not happen so much in open source software. One of my favourite pieces of software in GIS is MapServer. It was a 2.5 megabyte binary 10 or 12 years ago, and now it's a 1.8 megabyte binary. That's cool software, I like that.
So, you know, at some point software may come to a halt in its evolution, because MapServer now does everything that a map server has to do. There is no more reason to do anything to it. What they do now is they make it faster, then they make it even more stable, which is almost impossible, and then they make it even a little faster, and then they make it more stable, and what else should they do? There is no purpose to it. So, no wagging dogs. So maybe software has to, at one point, just be finished. We never talk about software being finished, because we're all developers, and what are we going to do when the software is finished? This is like horror: the software is done. So that's one reason.
The other reason is that the vendors continuously change their formats. So you have your data, you're stuck on the software, ArcStorm, you want to move somewhere else, and then Esri tells you: "Oh yeah, we have this new stack here, you have to migrate your data. We cannot import that; you have to migrate it. We're not going to do it for you, because you messed it up with all our old software, which we don't understand anymore. It is your problem to actually get your data into the new stack." The question is: why do I have to get the new stack? There's no reason to move away from ArcStorm, because it worked. You have to because the operating systems are not supported anymore. So right in the basement they take away what you need, and then you fall flat on your face. We have to move and migrate to the next level. I'm 46 years old, so I have some ten years more to live, maybe. Maybe in that time I will still see that there is a kind of movement that says software can have a history, and it can actually stay and just do what it does, and doesn't need any more changes.
I think time's up. OK, thank you, this was an interesting talk, it was very much fun to listen to. Have a nice rest of the week.
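The audience comment in the question round, about running custom validation rules after each step of a data pipeline, could look roughly like the sketch below. It is an assumption about how such a script might be structured, not the commenter's actual code; `run_validated`, the rule names and the example steps are all hypothetical.

```python
from typing import Callable

Rows = list[dict]
Rule = Callable[[Rows], bool]
Step = Callable[[Rows], Rows]


def check(rows: Rows, rules: dict[str, Rule], stage: str) -> None:
    """Raise ValueError naming every rule that does not hold at this stage."""
    failed = [name for name, rule in rules.items() if not rule(rows)]
    if failed:
        raise ValueError(f"validation failed after {stage}: {', '.join(failed)}")


def run_validated(rows: Rows, steps: list[Step], rules: dict[str, Rule]) -> Rows:
    """Run the steps in order, checking all rules on the input and after each step."""
    check(rows, rules, "input")
    for step in steps:
        rows = step(rows)
        check(rows, rules, step.__name__)
    return rows


points = [{"name": " a ", "lon": 8.77}, {"name": "b", "lon": 7.1}]
rules = {
    "two rows": lambda rows: len(rows) == 2,
    "lon within [-180, 180]": lambda rows: all(-180 <= r["lon"] <= 180 for r in rows),
}


def strip_names(rows: Rows) -> Rows:
    return [{**r, "name": r["name"].strip()} for r in rows]


def drop_first(rows: Rows) -> Rows:
    return rows[1:]  # a faulty step that silently loses a record


clean = run_validated(points, [strip_names], rules)

try:
    run_validated(points, [strip_names, drop_first], rules)
except ValueError as err:
    message = str(err)
    print(message)
```

Naming the failing step in the error message shows where in the pipeline the data was damaged, which matches the talk's point that a tool touching the data should be accountable for what it does.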