Data Encoding Schemes

Speech transcript
I'm Joe Celko. I'm probably best known for having put 10 years of my life into the ANSI standards committee and into writing books about it. At some point, many years ago, I was an honest Fortran programmer, and then the minute I started doing SQL I was never allowed to do anything else. Oh, the pain, the agony! But it's been a good living. However, at one point in my life I was also a statistician and actually worked with data. I think this is one of the things that we programmer types forget about. We become more interested in the hardware, the indexing, the mechanics of our trade, and we don't go back to the fundamentals of what we're supposed to be doing, which is working with data. We don't get a good theory, or any kind of background knowledge, for any of it; we're just sort of left to figure it out on our own. So I have two main talks, essentially, based on one of my books, which, if you buy it, will help me pay my mortgage. This is important. We're now getting the term "data science", but frankly, data science, from what I see, looks an awful lot like statistics with a six-digit paycheck. So maybe that's a good thing too. But we really need to go back to the fundamentals of what data is, how we represent it, and why there are different forms of it. In particular,
does anybody remember Donald Knuth? The older people out there? The whole Knuth and nothing but the Knuth. His The Art of Computer Programming is still one of the classics. It was supposed to be seven volumes; we've got four volumes out, plus a bunch of fascicles, and it takes up about this much space on your shelf. He is probably the greatest computer scientist we've ever had. He quotes everybody, and everybody quotes him; that's how you knew your indexing was good, because Knuth was encyclopedic. However, his first published work dealing with data was in Mad magazine, when he was in high school. Look up "The Potrzebie System of Weights and Measures". It's a parody of the metric system, with illustrations by Wally Wood, if anybody else is a Mad magazine or comic book fan and knows that name, and it's based on the thickness of Mad magazine number 26. You have to be into New York Jewish, Yiddish humor, which is the kind of thing the editors at Mad were using. It's good for a laugh in a humor magazine, but you run into this kind of crap in the real world when people invent their own systems of measurement in data processing
systems. Let's get a couple of terms for measurement out of the way first. The range of a measurement is how much area it covers in the space you're trying to measure. If I'm doing it with a gun and a bull's-eye target, the range would be how big the target is. Some things are appropriate for one size and some things are appropriate for other sizes. You know the joke: "close enough" is very good for horseshoes and hand grenades, but not so good for surgery. Granularity is how many divisions I have on my target: how many units of measure do I have? I used to work for state highway departments, and in the US we use the English system. There are only two non-metric countries left on earth, and I live in one of them; Myanmar is going metric, so it's pretty much us and Liberia. You can carry out the calculations in decimal feet to several decimal places, but nobody has ever poured asphalt for a road or concrete for a bridge using a micrometer. Yet they will publish calculations to a ten-thousandth of a foot of asphalt, and we would consider measuring to that level ridiculous. Precision is how repeatable a measurement is: we take it over and over and get pretty much the same result, apart from accumulating errors. In the case of the gun shooting at the bull's-eye, it's how tight my shot group is. A tight group is very precise; if the shots are spread out all over the place, like with a shotgun, it's not so precise. Accuracy has to do with how close the measurement is to the truth: how close you get to the bull's-eye. Notice that precision and accuracy are not the same thing.
So, back on the target: say the range is high and the granularity is fine to the point that it's meaningless. A tight cluster of shots is precise but not necessarily accurate: a really good gun barrel, but my sights are a little bit off, so I'm always a little to the left or something. Or it's the other way around: the scope's good but the barrel's loose, and the shots land in the general neighborhood of the bull's-eye. There's also the concept of a zero point on a scale of measurement; it's where the scale starts, where we start measuring. Sometimes there isn't one, and that's a useful concept too. It is not necessarily a numeric zero; it's just where your scale starts. There's also a metric function, which basically means the triangle inequality, if anyone remembers that one from high school: the distance from a to c is never more than the distance from a to b plus the distance from b to c. If I've got a metric function with those properties on my scale, I can do calculations with it, and they can be meaningful.
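To make the precision and accuracy distinction concrete, here is a minimal Python sketch. It assumes shots are recorded as (x, y) offsets from a bull's-eye at (0, 0). The function names are hypothetical, and measuring accuracy as "distance of the group's centre from the target" and precision as "mean distance of shots from the group's centre" is one simple choice among several.

```python
import math
from statistics import mean

Point = tuple[float, float]

def centroid(shots: list[Point]) -> Point:
    """Centre of the shot group (the average x and the average y)."""
    return mean(x for x, _ in shots), mean(y for _, y in shots)

def accuracy_error(shots: list[Point], target: Point = (0.0, 0.0)) -> float:
    """How far the group's centre is from the bull's-eye: small means accurate."""
    return math.dist(centroid(shots), target)

def spread(shots: list[Point]) -> float:
    """Average distance of each shot from the group's centre: small means precise."""
    centre = centroid(shots)
    return mean(math.dist(shot, centre) for shot in shots)

# Good barrel, bad sights: a tight group that sits off to one side.
tight_but_off = [(2.0, 2.0), (2.1, 2.0), (2.0, 2.1), (2.1, 2.1)]
# Good scope, loose barrel: centred on the target but scattered.
centred_but_loose = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

print(accuracy_error(tight_but_off), spread(tight_but_off))
print(accuracy_error(centred_but_loose), spread(centred_but_loose))
```

The tight-but-off group scores well on spread and badly on accuracy_error. The scattered-but-centred group is the reverse, which is the point that precision and accuracy are different things.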
What's sort of funny is that scales and measurement didn't come in as a science until the 1940s. You would think somebody before that, in statistics or something, would have come up with it, but no, it was a little late in coming. So what's the simplest scale I can use to measure? The nominal scale: take your values and assign names to them. A lot of people wouldn't like to count this as a kind of scale at all. That name, by the way, can be a tag number, a character string, or, technically, a symbol. We don't like to use symbols very much. They're a bitch to put into a computer, they don't transport very well, and they're not always obvious. The advantage of using a symbol rather than a word to name something is that it's completely language-independent. If you want to look at some really beautiful examples of that, with a really elaborate system, get a book on Renaissance artwork. The stonecutters in Italy who were working for Michelangelo and all those guys were pretty much illiterate. Each sculptor had an individual mark he put on the pieces of marble he was getting, and then there were little tags and other systems built off of that. Those marks showed the father, the son, and the grandson in given families. They look like alchemists' symbols; be grateful that you do not have to put those in any database you're ever going to work with. I can't do any calculations with a nominal scale. About all I can do with a nominal scale is ask, "Are you Fred Jones?" It's the crudest form of scale. We also use it an awful lot, and we do a bad job of naming things inside databases, by the way. It could be
better; let's be kind about it. If I represent names as character strings or numbers, I can order them, but I'll be ordering on the symbols, the representation, rather than on the meaning. These are individuals, not a grouping or a category. So this is the simplest way to do it. The next level up
is a categorical scale, where I've got a group with a property: a category with a name. It's basically a set. Fido is my dog, dogs are mammals, and I get set operations: union, intersection, all the good stuff. Categories are important, and how you do the encodings for a categorical scale is trickier than most people think. The problem with these categories is: can I have some overlap or not? What happens when I get something that's weird? There's a Robin Williams routine (pretty much an American thing, though he was big enough that he might be known internationally). One of his routines when he was doing nightclub stand-up consisted of God smoking a joint and deciding to make a platypus. How do you classify a platypus? It's the only guy in its category. Well, it turns out there are actually about four other egg-laying mammals, the echidnas. Australia, of course; all the weird stuff comes from Australia. And what about something that just doesn't fit, like a Martian? I can make a new category, OK, but I have to have allowed for new categories in my codes. I can make a miscellaneous category, but that's a good way to screw things up, because miscellaneous winds up being so big that you can't do any meaningful work with it; it's everything you forgot about. It's like a garage: stuff gets put in it and piles up until somebody comes in and cleans it out, or you just pretend it doesn't exist. Now, the other question with a categorical scale is: can I actually see individual members, or are they simply members of a group? It's worth telling people apart; it is not worth numbering grains of sand. So the idea of a commodity in a categorical scale is a little hard for people to get their heads around.
An absolute scale is just a count on a set. It's finite, and I can add and subtract the numbers. But for an absolute scale to work, all the elements in my groups have to be interchangeable. It's a dozen eggs; it's not egg 1, egg 2, and egg 3. And we can give names to these units: the dozen, the gross, the quire, the ream. (Sorry, with reams in the paper industry there are about three of them: 500 versus 480 sheets of paper of certain sizes.) My favorite is the six-pack. It's not a drinking problem, it's a solution! What was funny was in England, when they went over to decimalization way back when, one of the dairies there sold a 10-pack of eggs. English-speaking countries, even the metric ones, traditionally sell eggs by the dozen, and it didn't sell, never mind that the eggs in those 10-packs were actually cheaper than in the original dozen packs. People just found it too strange. Or go to a beer store: I like to buy a six-pack, not a 5-pack, even if the cost per beer were cheaper. It just seems somehow wrong. Those traditional units really get locked in. OK, obviously the zero point on this scale is the empty set: an empty count is where things start.
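A categorical scale maps naturally onto sets, and an absolute scale is the count of such a set. The Python sketch below is illustrative only. The category names and members are made up for the example (following the talk's dogs, mammals, and platypus), and the overlaps helper is a hypothetical check for the "can my categories overlap?" question.

```python
from itertools import combinations

# Categorical scale: each named category is a set of members (examples made up).
categories: dict[str, set[str]] = {
    "mammals": {"dog", "cat", "platypus", "echidna"},
    "egg_layers": {"platypus", "echidna", "chicken", "robin"},
    "birds": {"chicken", "robin"},
}

# Set operations are the meaningful operations on a categorical scale.
egg_laying_mammals = categories["mammals"] & categories["egg_layers"]
everything_known = set().union(*categories.values())

def overlaps(cats: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Hypothetical check: which pairs of categories share at least one member?"""
    return [(a, b) for a, b in combinations(sorted(cats), 2)
            if cats[a] & cats[b]]

# Absolute scale: a count of interchangeable items. Counts add and subtract,
# and the empty set supplies the natural zero.
DOZEN = 12
eggs_on_hand = 2 * DOZEN + 6
empty_count = len(set())

print(egg_laying_mammals, overlaps(categories), eggs_on_hand, empty_count)
```

If overlaps returns anything, the categories are not a clean partition, and the encoding has to decide whether that is allowed.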
Ordinal scales: things are put in order, but there are no operations, just comparisons. It's just a sequence, with no origin and no zero point. Did anybody else have to take a geology class in college? Yeah, we've got at least one other geology victim. The only nice thing about geology, as far as I was concerned, was that you got to go out and hit rocks with a hammer. I have never had to use anything I learned in my freshman geology class. It didn't even help me pour concrete for a driveway; it was absolutely useless. But we went out into the field to whack rocks with a geologist's pick, and we got a box of samples for the Mohs scale: mineral samples in a 10-compartment box. What you would do is take your sample and scratch it against the various minerals in the Mohs scale, and you could say, "Well, this is harder than talc but softer than gypsum." What can scratch what is strictly a comparison. It's a quick, easy way to classify things for someone running around in the field in shorts with a pick axe and this box of rocks, getting into poison ivy because he has to do this for his freshman science credits. The real way to measure hardness would have been the Rockwell scale, which is used in manufacturing for steel and other metals, but we didn't have that. Oh, and by the way, they never gave you a diamond in your Mohs box. Usually there was a piece of really hard steel in there instead, so technically it wasn't the Mohs scale, but what did you expect when you bought it at the school bookstore? Still, it's just comparison, just a linear ordering.
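The scratch test is comparison and nothing else, which is what makes Mohs an ordinal scale. This Python sketch brackets a sample between two reference minerals using only a yes/no "does this mineral scratch the sample?" test. The scratch-test callback is a hypothetical stand-in for the field test. In the usage example it is faked with a hidden hardness value purely so the code can run.

```python
from typing import Callable, Optional

# Reference minerals in Mohs order, softest first.
MOHS = ["talc", "gypsum", "calcite", "fluorite", "apatite",
        "orthoclase", "quartz", "topaz", "corundum", "diamond"]

def bracket(scratches_sample: Callable[[str], bool]) -> tuple[Optional[str], Optional[str]]:
    """Return (softer_neighbour, first_mineral_that_scratches_the_sample).

    Uses comparisons only; no arithmetic on hardness values.
    Assumes that if a mineral scratches the sample, every harder one does too.
    """
    softer = None
    for mineral in MOHS:
        if scratches_sample(mineral):
            return softer, mineral
        softer = mineral
    return softer, None  # nothing in the box scratches the sample

# Fake field test for the example: pretend the sample's hardness is 2.5.
HIDDEN_HARDNESS = 2.5

def fake_test(mineral: str) -> bool:
    return MOHS.index(mineral) + 1 > HIDDEN_HARDNESS

print(bracket(fake_test))
```

The result reads as "at least as hard as gypsum, softer than calcite". Nothing in the function ever adds or divides hardness values.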
Here's a funny little thing about ordinal scales: they're not required to be transitive. Take rock, paper, scissors and the other games like it. Someone on The Big Bang Theory did one, yeah, and you can get the T-shirt with the non-transitive ordering of those things. We really hate non-transitive relationships. We want a transitive relationship; we want it properly, tightly well-ordered. You can't make much of any calculation off of a non-transitive scale. Ordinal scales are also the tool for fixing elections when you have more than two candidates. Look up Arrow's paradox: with more than two candidates, no ranked voting system can satisfy every reasonable fairness condition at once.
Rank scales are a sort of tightening of ordinal scales. There's an origin point, and they're well-ordered, guaranteed to be well-ordered. Military ranks are probably the most obvious example. I can't do any operations on them: I cannot take three privates, put them together, and make a sergeant. But the ordering still stands: if you shoot your sergeant, you still have to take orders from the captain. The ordering is transitive and it's tight. We like those. We might not be able to do much with them, but I can sort, and sorting is good.
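Transitivity is easy to test mechanically. The Python sketch below checks a "beats" or "outranks" relation stored as a set of (winner, loser) pairs; the function names are hypothetical. It then builds a pairwise-majority relation from three made-up ballots to show the classic voting cycle, the Condorcet paradox (a simpler cousin of Arrow's result), where majority preference stops being transitive.

```python
from itertools import permutations

def is_transitive(rel: set[tuple[str, str]]) -> bool:
    """True if (a, b) and (b, c) in rel always imply (a, c) in rel."""
    return all((a, d) in rel
               for (a, b) in rel
               for (c, d) in rel
               if b == c)

def majority_relation(ballots: list[list[str]]) -> set[tuple[str, str]]:
    """(a, b) is included when a strict majority of ballots rank a above b."""
    rel = set()
    for a, b in permutations(ballots[0], 2):
        wins = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        if 2 * wins > len(ballots):
            rel.add((a, b))
    return rel

RPS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
RANKS = {("captain", "sergeant"), ("sergeant", "private"), ("captain", "private")}
BALLOTS = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

print(is_transitive(RPS), is_transitive(RANKS))
print(majority_relation(BALLOTS), is_transitive(majority_relation(BALLOTS)))
```

Every individual ballot is a perfectly good ranking, yet the majority relation built from them goes A over B, B over C, and C over A.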
Interval scales are really what people think of when you say "scale of measurement". There's a natural ordering and a unit. I don't have any origin point, but arithmetic makes sense because my unit is uniform in its dimension. The most common interval scale we use is the calendar. The common unit is the day, regardless of how you cut up the year or group your days together. You guys might not get much of this, but for some reason, among Christian fundamentalists in the United States, there is a belief that God made the seven-day week. No, actually the Hebrews did. The Romans had an eight-day market week, and parts of Africa had ten-day weeks. How you cut up your units is completely arbitrary, but with a metric function I can add and subtract, and I've got a linear ordering. What I can't do is divide two days by each other: Christmas divided by Thanksgiving doesn't mean anything. One of my favorite T-shirts right now says, "On a scale from 1 to 10, what color is your favorite letter of the alphabet?" That's what happens when people start doing stupid stuff with data. And what's funny is that when you ask somebody that, they'll stop, think about it, and try to answer you. It sounds like a lot of real questions. And no, the intervals on these scales do not have to be the same size. In fact, log and exponential scales are a lot more common than you think, because you're a human being: your sensory input, and a lot of the judgments you make about it, are on an exponential scale. My favorite is the Richter scale for earthquakes: each time the number goes up by 1, it's 10 times the amplitude of the one before. The volume on a stereo doesn't go up linearly with the amplifier either; I believe perceived loudness goes up as something like the 0.3 power. It's not linear, and a lot of stuff is exponential; decibels, for another, go by powers of 10. And having lived through a big earthquake years ago in Los Angeles, I appreciate the Richter scale more than I did before. You look out your window and you see a bridge collapse.
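Python's standard datetime module happens to enforce the interval-scale rules from this paragraph: subtracting two dates gives a duration, while adding or dividing two dates raises a TypeError. The Richter and decibel helpers are a minimal sketch of reading a log scale. They assume the textbook definitions (one Richter step is a factor of 10 in amplitude; decibels are 10 times the base-10 log of a power ratio), and the function names are hypothetical.

```python
import math
from datetime import date

christmas = date(2024, 12, 25)
thanksgiving = date(2024, 11, 28)  # US Thanksgiving in 2024

# Interval scale: differences between dates are meaningful...
gap = christmas - thanksgiving  # a timedelta
print(gap.days)

# ...but sums and ratios of two dates are not, and Python refuses them.
for op in (lambda a, b: a + b, lambda a, b: a / b):
    try:
        op(christmas, thanksgiving)
    except TypeError as exc:
        print("rejected:", exc)

def richter_amplitude_ratio(m1: float, m2: float) -> float:
    """Log scale: each whole step in magnitude is a factor of 10 in amplitude."""
    return 10 ** (m1 - m2)

def decibels(power_ratio: float) -> float:
    """Decibels: 10 times the base-10 log of a power ratio."""
    return 10 * math.log10(power_ratio)
```

A magnitude 7 quake has 100 times the amplitude of a magnitude 5 quake, not "two more" of anything. That is why averaging raw log-scale readings can mislead.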
Now, ratio scales are sort of the ultimate, and they're what most people think of as a scalar measurement. I've got a natural origin, some kind of zero point. The scale has strong ordering, and the unit is uniform in its dimension: length, weight, height, all the things we use commercially. Ratios make sense on a ratio scale because everything is expressed off a single unit, as either a fraction or a multiple of it, whether it's the English system of weights and measures or the metric system. The metric system is a little better example: its multiples are powers of 10 and its fractions are tenths, hundredths, and so on, which are nicer and easier to work with. And we got this number system from the Hindu-Arabs, who invented it for us. I often wondered about the Hindu-Arabs as a kid: we used Hindu-Arabic numerals, and I had never met one.
The classification of scales is important because if you want to convert between scales, they have to be of the same type for the conversion to make sense. From a nominal scale to a nominal scale, it's a mapping, preferably a one-to-one mapping, as with the French-English dictionaries in Canada, where the mapping is one-to-one for at least some of the terms. Ordinal scales: a monotonic function that preserves the ordering. The values on each scale are not necessarily the same, but the mapping preserves the ordering; that's why we call it an ordinal scale. One example is the values of Western and Chinese chess pieces; really good ones may be dates in different calendars. Rank
scales: a monotonic function preserves the order, but it might not always be a good match. Army ranks have problems there. In particular (and I don't know if this is still true), the US Army used to consider warrant officers to be officers, with officers' privileges, while the British Army considered them enlisted, and they didn't get officers' privileges. So the cut points vary, but it's a mapping. Interval scales: linear functions that scale and shift the origin point. The simple example is converting Celsius to Fahrenheit: multiply by 9/5 and add 32, if I get that right. I just remember that 0 is freezing, 100 is boiling, 25 is a little colder than I'd like, and 30 to 35 is comfortable. I live in Texas, and I keep my house set at 80 Fahrenheit; yes, that's a little unusual. Ratio scales: a constant multiplier, as in liters to quarts, with exact conversions. That's why we like ratio scales: unlike interval scales, they need nothing but a simple multiplication, so they're easy to work with. Derived units: there's the concept of the primary unit. This carries over to the metric system, properly the Système International, and ISO standard 2955 spells out how to represent the derived metric units: kilometers per hour, square meters, all kinds of combinations. Some combinations will not make sense, but pretty much, you put primary units together, multiply or divide, and make something that's meaningful. If you look at some of the definitions, the pascal, the unit of pressure, has a little more multiplying and dividing in it than you might like. But as a general statement: in the database, if I'm going to derive something, I'd really rather do the calculation from the simplest, most primary units I can store. This is a generalization, but that way, if I need to do something else with the data, I don't have to work to pull the primary units back out again.
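The conversion rules above translate directly into code. In this Python sketch (function names hypothetical), Celsius to Fahrenheit is an interval-scale conversion that multiplies and shifts the origin, while liters to quarts is a ratio-scale conversion with a single constant multiplier, so zero stays zero. The quart constant assumes the US liquid quart, which is exactly 0.946352946 L. The Trip record is made up to illustrate storing primary units and deriving everything else on demand.

```python
from dataclasses import dataclass

LITERS_PER_US_QUART = 0.946352946  # exact by definition (US liquid quart)

def celsius_to_fahrenheit(c: float) -> float:
    """Interval scale: a linear map that also shifts the origin."""
    return c * 9 / 5 + 32

def liters_to_quarts(liters: float) -> float:
    """Ratio scale: a single constant multiplier; 0 maps to 0."""
    return liters / LITERS_PER_US_QUART

@dataclass(frozen=True)
class Trip:
    """Store the primary units; derive everything else when it is needed."""
    meters: float
    seconds: float

    @property
    def kilometers_per_hour(self) -> float:
        return (self.meters / 1000) / (self.seconds / 3600)

    @property
    def meters_per_second(self) -> float:
        return self.meters / self.seconds
```

Doubling survives a ratio-scale conversion (2 liters is twice as many quarts as 1 liter) but not an interval-scale one: 10 °C becomes 50 °F and 20 °C becomes 68 °F, so "twice as warm" has no meaning on either scale.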
And multiplication is cheap on computers these days. In the old days, yes, when people had to do it, it was a little more work than we'd like, and it was worth storing the result of the computation rather than the basic units. That's a generalization, but now computing is actually faster. The processor works in nanoseconds, and your disk is still a lot slower than nanoseconds, even if you're using solid-state drives. So it's actually faster. A quick summary
of scales, from weakest to strongest: nominal, categorical, absolute, ordinal, rank, interval in its various forms (including log), and finally ratio scales. That's usually one lecture, sort of stretched out, when I'm doing this for a class. At this point, you would all be drawing a 3-by-5 card or an 8½-by-11 sheet of paper with something written on it, like ring size or shoe size, and you would have to go to Google or the library and look up exactly what kind of scale it is. Now, to make
it easier to work with, in a database we use symbols. Frankly, thanks to a wonderful thing called Unicode, we have a representation for the alphabets. We actually use a subset: the Latin alphabet, the ASCII characters, the digits 0 to 9, and some symbols, plus the rules for manipulating the codes, such as math on numbers and string operations. Technically, these days I can put image data directly into the database, but we really don't like to do that so much. We SQL guys, being old guys, never really got over the idea of having graphics and that sort of stuff in our computers. I don't like it; images are hard to search. One of my favorites was a couple of decades back, when IBM was pushing picture recognition (it would probably be face recognition now). They had an example of a wonderful product: you could sit down with some colored pencils, draw a quick picture of something, and then search photographs with your drawing. General recognition, like face recognition, is a whole science in itself, but they were trying to do it very generally. Their example was finding a banana in all these pictures of fruit. It found the bananas; it was actually quite good about that. It also found other things that showed up as bananas. So we're still working on it. And in
particular, we've got Unicode, which puts all the alphabets and symbol systems known to man into one character set (it started out as 16 bits and outgrew that). It's a nice idea. As a basis for encoding, the ISO people specifically wanted a minimal subset that exists in all the languages on earth. This is why you can write one of these codes in a Chinese document: the subset is part of the Unicode set, along with the other languages. That means the Latin alphabet, no accents, and no case sensitivity. Some of the positions can be numeric, and there might be rules disallowing certain letters so that we don't confuse letters and digits or where the digits go. Then there is a minimal set of punctuation marks: pretty much commas, dashes, periods, slashes, and underscores. The names for these are fun, actually. "#" is not a hashtag; typesetters call it an octothorpe. That little thing that you think is an "and" is an ampersand. All of them have rather fancy names, and the at sign gets called the "little snail"; it sounds much better in French. The trouble with the special symbols is that they have meanings in other languages and systems. When Microsoft was talking about how they were going to really get on this Internet, they came out with C#, without any knowledge that the octothorpe already had a meaning on the Internet. That's careful research, guys. And when doing
encoding, the display is important. Encodings should be convenient for people, because those damn users, wonderful as they are, are always screwing something up. In particular, with encoding I can use either fixed or varying length. I prefer fixed length, because the length of the code is part of its validation. If I see 5 digits, I know it could be a US ZIP code. If I see a mixture of digits and letters, I know it could be a Canadian postal code, if it follows the right pattern: letter, digit, letter for the first part. No Canadians here? OK. And if you want to really go nuts, get a copy of the British system. It's between 5 and 7 letters and digits, which are actually abbreviations, or attempted abbreviations, of old post offices that existed in England in the middle-to-late 1800s. It's completely unusable. It's so bad that they're introducing a 5-digit commercial code, based on the US ZIP code, for bulk mailers, because their own system has proven to be so unusable. Also, the Royal Mail has a monopoly on the guidebooks for doing the addresses; it's illegal to set up your own postal code service in the UK. Your government at work.
Fixed length also has another advantage. Does everybody remember printers and paper? We used to get our data on all of that stuff. That goes back to the old punch card days: we had fixed-length fields, fixed-length columns, and fixed-length displays, 80 columns across on a 3270 video screen. But more than that, fixed length is something a person can see and line up. Varying length gets confusing, and that's being nice about it. But the worst standard you'll probably run into is not the British postal code. It's a thing called the IBAN, the International Bank Account Number. It is up to 34 characters long, and it includes the country code, the account number, and a whole bunch of other stuff crammed into one unreadable string that only a machine and the SWIFT system can figure out; the people who work with it can't read it. The people who work in the automobile trade have the VIN, the vehicle identification number. It's only 17 characters, but nobody can read that thing either. The thing is, in human processing you don't read letter by letter; you read in chunks, in bursts; you cluster things. Three is the best: people get 3 digits or 3 letters correct almost all the time. You can go up to 5 fairly safely, but beyond 5 you start getting errors. There are four common errors: a missing character, an extra character, one bad character, and a pairwise transposition, which probably comes from typing. Phone numbers are grouped into an area code, an exchange, and the actual line within it, and it's very convenient to read them that way.
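Here is a minimal Python sketch of three ideas from this passage: using a fixed-length pattern as a first validation check, chunking a long code into groups of three for display, and classifying a keying error into the four common types. The patterns are simplified assumptions. The Canadian pattern checks only the letter-digit-letter, digit-letter-digit shape and not the letters the real standard excludes. All function names are hypothetical.

```python
import re

US_ZIP = re.compile(r"[0-9]{5}")
CA_POSTAL = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")  # simplified shape only

def looks_like_us_zip(code: str) -> bool:
    return US_ZIP.fullmatch(code) is not None

def looks_like_ca_postal(code: str) -> bool:
    return CA_POSTAL.fullmatch(code) is not None

def chunk(code: str, size: int = 3, sep: str = " ") -> str:
    """Break a long code into short groups that people can read reliably."""
    return sep.join(code[i:i + size] for i in range(0, len(code), size))

def classify_error(expected: str, entered: str) -> str:
    """Name the common keying error that turns `expected` into `entered`."""
    if entered == expected:
        return "none"
    if len(entered) == len(expected) - 1 and any(
            expected[:i] + expected[i + 1:] == entered for i in range(len(expected))):
        return "missing character"
    if len(entered) == len(expected) + 1 and any(
            entered[:i] + entered[i + 1:] == expected for i in range(len(entered))):
        return "extra character"
    if len(entered) == len(expected):
        diffs = [i for i, (a, b) in enumerate(zip(expected, entered)) if a != b]
        if len(diffs) == 1:
            return "one bad character"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and expected[diffs[0]] == entered[diffs[1]]
                and expected[diffs[1]] == entered[diffs[0]]):
            return "pairwise transposition"
    return "other"

print(chunk("1234567890"), classify_error("12345", "12435"))
```

The length and shape checks are cheap enough to run on every keystroke. They only say a value could be valid, not that it is.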
OK, how am I on time? I think we're OK. So what about bad encoding schemes? Well, one of their characteristics is that there's no room for growth. In the 1970s we were still on punched cards in the state of Georgia, and we had the license tag type code in one column of the punch cards. Originally it was taxis, private vehicles, and farm vehicles, just 7 or 8 values for the type of license tag you got. It was very nice, and it worked fine. Then came a thing called the commemorative tag, which state governments love, because with a commemorative tag you can charge extra; it's a great revenue source. California makes a few hundred million dollars off commemorative tags. So every group that had a cause, every college, every veterans group, whatever, wanted a commemorative tag. "Would you like to be kind to animals? Fine, that's 35 dollars," and you display it on your license tag and you're better than your neighbors. But the problem was that we now had to put all kinds of different codes into that column. When I left, there were about 35 of them, because every college had to have its own commemorative tag. So how do you get 35 different punches into one column? How many people here have ever worked with punch cards? Thank you, thank you, thank you. Usually when I ask that, I get what I call my fish market: all the kids (that's people under 40) look at me like dead fish, eyes glazed. So we found that you could multi-punch: you hold the key down and then punch several holes. There are 12 rows to a column, so you worked with combinations of those 12 positions, with a little translation table on the side. Oh, and we had 029, 027, and 028 IBM keypunch machines, which were all different. So you not only had to know what the multi-punch was, you had to know which machine you were punching on; otherwise the tags got all messed up. No room for growth: if they had allowed two digits for the license tag type, there would have been no problem, and it would have saved
quite a lot of work. The other one: how many people have ever worked with COBOL? Yeah, you should be so ashamed. American Honda set up their files figuring they would never have more than 10,000 dealerships in the United States. This was when they were bringing over the Honda scooters; you remember the Beach Boys song "Little Honda". But they didn't allow room for growth, and they had to redo all of their COBOL files. This is the thing about SQL: we have an abstract view of data. If we want to make something bigger, we just change a CHECK clause to use a different range, or we ALTER TABLE. You don't do that in COBOL. There, what you see is what is processed: everything is a character string, exactly the way it appears on the physical medium. For its time it was really a major leap, but "no more than 10,000 dealerships"? Remember Bill Gates: why the hell would anybody need more than 640K on a home computer? Another bad encoding scheme, one that happens more than you think, is the ambiguous code. My favorite example is the old International Standard Book Number, the ISBN, because I used to work with bookstores. It was 10 digits made up of four parts. The first piece, one or two digits and variable in length, is the language group: 0 or 1 is English, and 93 is Esperanto, sort of at the end of the list. I don't know about Klingon. Then comes the publisher's code: the bigger the publisher, the shorter the code, from two digits for a big publisher up to seven for a small one. Then comes the book number within the publisher, and then a modulus 11 check digit. The catch is that without any punctuation in the 10-digit ISBN, you could cut it up in various
ways, and in the early days there were ISBNs that could be parsed two ways. There were about 15 or 16 of them, and that was enough to mess up libraries for a while. That has since been fixed, and the ISBN is now part of the EAN codes, so it's good and usable. As for the miscellaneous code: if it gets used a lot, something's wrong; you skipped too much in the design.
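The modulus 11 check digit mentioned above can be sketched in Python as follows. This assumes the standard ISBN-10 weighting (weights 10 down to 1, with "X" standing for a check value of 10); the function names are hypothetical. Note that the check digit validates the ten digits but says nothing about where the group, publisher, and title parts split. That is exactly why unpunctuated ISBNs could be parsed more than one way.

```python
DIGITS = set("0123456789")

def isbn10_check_digit(first_nine: str) -> str:
    """Compute the ISBN-10 check character from the first nine digits."""
    if len(first_nine) != 9 or not set(first_nine) <= DIGITS:
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 across the first nine digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def isbn10_is_valid(isbn: str) -> bool:
    """Validate an ISBN-10, ignoring hyphens and spaces."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) != 10 or not set(s[:9]) <= DIGITS or s[9] not in DIGITS | {"X"}:
        return False
    return isbn10_check_digit(s[:9]) == s[9]

print(isbn10_check_digit("030640615"))  # 2
print(isbn10_is_valid("0-306-40615-2"))  # True
print(isbn10_is_valid("3-006-40615-2"))  # False: first two digits swapped
```

A weighted mod 11 sum like this catches any single wrong digit and any swap of two adjacent digits, which covers two of the four common keying errors.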
A bad code also has no support for exceptions: everything just gets thrown into that miscellaneous category. But I can have unknown values and missing values. Oh, wait a minute, for us that's an old friend: you have a NULL-able column. Did you bother to actually document what the NULL means in context? If it's a phone number, does that mean I don't have a phone, or does it mean we don't know it? It gets funnier with not-applicable values. I just got through working for a company called MIB. They're in the insurance industry, and they've been in business for well over a hundred years. They did not get their name when the movie came out, but they get kidded about it when they go to insurance conferences, and they occasionally wear sunglasses. The thing is, they get dirty data coming in from insurance companies, so they need a rather elaborate set of not-applicable values. I already mentioned miscellaneous and unclassified, but those are bad design: you've missed something. Then there are overflows, underflows, and bad divisions: computations that are garbage. Now, an error in one field is easy to take care of, but what about when two fields are related? A medical record that shows a pregnant man is computable; we can find that. But there was an ANSI/X3/SPARC committee back in 1975 that issued a list of 14 different kinds of missing data, and later a follow-up to the SPARC committee report listed 22 different kinds of missing data. And statisticians have all kinds of ways of trying to correct for missing data, if you deal with it.
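One way to avoid an undocumented NULL is to build the missing-value cases into the code itself. ISO 5218, the standard for representing human sexes, does exactly that: 0 = not known, 1 = male, 2 = female, 9 = not applicable. The Python sketch below uses those codes and adds a hypothetical PhoneStatus code for the "no phone" versus "unknown" question. It then runs the kind of cross-field check that catches a pregnant man. Everything except the ISO 5218 values is made up for illustration.

```python
from enum import Enum, IntEnum
from typing import Optional

class Sex(IntEnum):
    """ISO 5218 codes: 'not known' and 'not applicable' are explicit values."""
    NOT_KNOWN = 0
    MALE = 1
    FEMALE = 2
    NOT_APPLICABLE = 9

class PhoneStatus(Enum):
    """Hypothetical codes recording WHY a phone number may be absent."""
    LISTED = "L"     # we have the number
    NO_PHONE = "N"   # we know there is no phone
    NOT_KNOWN = "U"  # there may be a phone; we just don't know the number

def record_problems(sex: Sex, pregnant: bool,
                    phone_status: PhoneStatus, phone: Optional[str]) -> list[str]:
    """Cross-field checks: each field may be fine alone but wrong together."""
    problems = []
    if pregnant and sex is Sex.MALE:
        problems.append("pregnant male")
    if phone_status is PhoneStatus.LISTED and not phone:
        problems.append("status LISTED but no number")
    if phone_status is not PhoneStatus.LISTED and phone:
        problems.append("number present but status says none is listed")
    return problems

print(record_problems(Sex.MALE, True, PhoneStatus.NO_PHONE, None))
```

Because "no phone" and "not known" are distinct codes, a query can count each case separately instead of guessing what a NULL meant.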
If you think that designing coding schemes is not important, try doing math with Roman numerals for a week. Roman numerals were such a bad encoding system that even the Romans didn't do math with them; they looked things up in tables for multiplication and division. Or try living without alphabetical ordering for a week in Hong Kong. In Hong Kong, the telephone operators used to have a contest every year where they would identify somebody's phone number from their name. The winners of these contests would have memorized 10,000, 15,000, or 20,000 different names and phone numbers and could spit them back. I had a friend who was teaching English in China before Tiananmen Square. She would get her class roster of 150 students translated into Roman letters, and it was never sorted into alphabetical order, nor was it ever in the same order from week to week. Can you memorize 150 names that way? Or go to a library and try to find something without the Dewey Decimal Classification. Dewey has its problems, but at least it orders things. Imagine organizing a library by color. Actually, before the Dewey Decimal Classification System came along, every college library and public library had its own individual classification system, invented by the head librarian at that particular place. My wife volunteered for a feminist bookstore in Atlanta in the seventies, and one of the people there was definitely not a book person: she sorted the books in the store by color, because she thought they would look pretty. Do not work with hippies if you can help it; it will not end well. The drugs will be good, but it's not worth the work you have to do afterward. If I have a good encoding scheme, my aggregations, queries, and so on are much easier. Going back to Dewey Decimal: if I want to
look up science books I know there in the 500 series and I want to look at math books I know it's in the 5 tends I can immediately zoom in on a small subset I can use between predicates so in my SQL those of undue calculations get to be a lot more accurate time you have to worry about things on borders and fuzziness the kinds of encoding enumeration make a list of values assign a name or a number of tag number 2 on it's really a nominal scale with the name attached to it but if you can get some kind of ordering on the symbols of that's a nice chronological procedural on some sort of physical thing the bad news about sorting codes by alphabetical order numerical order is what languages were you using for the alphabet canadians are probably a little more aware of that but how many times special again I have to hit Alt-C sequel server people because they are the worst SQL programmers on earth there all VB programmers who's boss when Paterson into a cost that they learned it at home and we can't of they will use their up auto-increment their identity column just make a list of things and whatever physical water the other values happen to be in the table the physical storage that becomes the encoding for it no logic no-fault nothing this
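The Dewey point about BETWEEN predicates can be sketched in Python. This is a minimal illustration, assuming class numbers are stored as exact decimals; the titles are hypothetical, while the class numbers follow Dewey (500s science, 510s mathematics, 160 logic, 909 world history). It also shows the border problem mentioned in the talk: an inclusive range ending at 599 silently drops a book filed at 599.8.

```python
from decimal import Decimal

# Hypothetical titles filed under Dewey class numbers.
CATALOGUE = [
    ("Astronomy Basics", Decimal("520")),
    ("Number Theory", Decimal("512.7")),
    ("Calculus", Decimal("515")),
    ("Primates", Decimal("599.8")),
    ("Logic Primer", Decimal("160")),
    ("World History", Decimal("909")),
]


def between(low: Decimal, high: Decimal) -> list[str]:
    """Inclusive range, like SQL's `class_nbr BETWEEN low AND high`."""
    return [title for title, n in CATALOGUE if low <= n <= high]


def half_open(low: Decimal, high: Decimal) -> list[str]:
    """low <= n < high, so fractional codes at the top border are kept."""
    return [title for title, n in CATALOGUE if low <= n < high]


science = half_open(Decimal("500"), Decimal("600"))
math_books = half_open(Decimal("510"), Decimal("520"))
naive_science = between(Decimal("500"), Decimal("599"))  # drops 599.8
```

In SQL the half-open form would be `class_nbr >= 500 AND class_nbr < 600`; which form is appropriate depends on whether the codes can carry fractional parts.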
Measurement codes: the code is a value in a unit of measure. I record the unit somewhere so I know what that column is expressed in, and I may have to tell you what units I'm using; essentially, I'm just recording the measurements. The worst design is to put the unit of measure in the same column as the value of the measurement, like a dollar sign stored with the dollar amount. That's COBOL. Why? Because COBOL was concerned with physical display; there was no difference between storage and physical display. People still do it. If you have mixed units, you need to have a column that tells us what the unit of measure is.
Abbreviation encoding: a shortened version of the value. You take the value and come up with an abbreviation, usually a fixed-length one. The goal of an abbreviation code is to be people-readable. I'm going to bet that most of you flew here. Did you notice the three-letter airport codes? It doesn't take too much to figure out that BOS is Boston or ATL is Atlanta. YOW, if you're not from here, not so much. The same goes for some of the really small airports in the Great North and Alaska: they get pretty weird, and they are just not intuitively obvious to the casual observer. The nice part with abbreviation codes is that a human being can figure them out.
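To illustrate keeping the unit of measure out of the value, here is a rough Python sketch. Each measurement carries its number and its unit as separate fields, and mixed units are normalized through a conversion table. The `Measurement` class and the `TO_KG` table are hypothetical names chosen for illustration. The pound factor is the exact international definition (1 lb = 0.45359237 kg).

```python
from dataclasses import dataclass
from decimal import Decimal

# Conversion factors to one base unit (kilograms).
TO_KG = {"kg": Decimal("1"), "g": Decimal("0.001"), "lb": Decimal("0.45359237")}


@dataclass(frozen=True)
class Measurement:
    value: Decimal  # the number only, never "10 lb" or "$12.50"
    unit: str       # a separate field naming the unit of measure

    def __post_init__(self) -> None:
        # Refuse units we cannot interpret instead of guessing.
        if self.unit not in TO_KG:
            raise ValueError(f"unknown unit: {self.unit!r}")

    def in_kg(self) -> Decimal:
        return self.value * TO_KG[self.unit]


readings = [
    Measurement(Decimal("2"), "kg"),
    Measurement(Decimal("500"), "g"),
    Measurement(Decimal("10"), "lb"),
]
total_kg = sum(m.in_kg() for m in readings)
```

Because the unit lives in its own field, aggregating mixed units is an explicit conversion rather than string parsing.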
Algorithmic codes: I have a procedure to encode the value. The result might not be immediately human-readable; in fact, in the case of encryption, it had better not be. There are some examples you don't think about: rounding numbers is technically an algorithmic encoding, encryption is technically an algorithm, and so are hashing functions, where there's no way to look at the result and immediately tell what the original was.
Hierarchical codes. I really like hierarchical codes; I fell in love with Dewey Decimal. They're usually numeric, but they can be mixed alphanumeric: the Library of Congress system for libraries is actually more precise than Dewey Decimal, and it mixes letters and digits. ZIP codes in the United States are based on geographic partition. If a ZIP code begins with 3, it's in the southeastern United States, across multiple states, and as you go from left to right it gets more and more specific. So a leading 3 would be the southeast, 303 would be parts of Georgia, and 30310 would be a subset of Atlanta, Georgia. It's nice and easy to look at, and when you see the code you get an idea of where it's zooming in.
The bad news with that is that you can put something in only one part of the hierarchy. Dewey Decimal put logic under philosophy. Why? Because when Melvil Dewey was inventing the system, logic still belonged to philosophy; mathematical logic, building on George Boole's work, had not yet become a field of its own. When was the last time you saw a philosophical logic book? There may have been one written, but for the most part nobody's written philosophical logic for the past 150 years or longer; it became a branch of mathematics. So it's a little messy, and eventually logic moved more toward mathematics in the Dewey Decimal system. Another problem: I may not have enough space in my hierarchy for things. In particular, one of Melvil Dewey's other prejudices was that there were Catholics, Protestants, Jews, Muslims, and miscellaneous. The Library of Congress has more books on Buddhism than it does on Christianity, since they've been writing longer, and there are more subsets to break down in the Eastern religions, but they all got put in one miscellaneous religion category. It's like the old British cartoon of a sergeant major standing in front of his Indian troops: "Church of England to the right, [inaudible] to the left, fancy religions to the back."
And some things could reasonably fall under multiple codes. Church architecture and the worship service could go under religion or under architecture. Think about how Christian churches are laid out as a cross, and how that affects how we do our ceremonies, like a bride walking down the aisle. Mosques have their own layout, and there are other ways the architecture affects the service itself. It's kind of an interesting topic. So do you put it under architecture, religion, or both? Well, the solution for the librarians is: whatever the Library of Congress says is the code is where you put it.
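The "zooming in" of a hierarchical code can be made concrete with a small Python sketch. It treats a ZIP code as a string whose every prefix names a larger region containing all longer codes that start with it. The `ancestors` and `within` helpers are hypothetical names, and the sample codes are only illustrative strings. No region labels are encoded here.

```python
def ancestors(code: str) -> list[str]:
    """Every prefix of a hierarchical code, from coarsest to finest."""
    return [code[:i] for i in range(1, len(code) + 1)]


def within(codes: list[str], prefix: str) -> list[str]:
    """Codes that fall under one node of the hierarchy (SQL: LIKE '303%')."""
    return [c for c in codes if c.startswith(prefix)]


sample = ["30310", "30303", "32801", "02108"]
```

For example, `ancestors("30310")` walks from the broad region "3" down to the full code. `within(sample, "303")` keeps only the codes under that node. Note that each code still sits on exactly one branch, which is the limitation the talk describes.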
Vector codes: the code is made up of parts, but the whole thing has to be there. The components can be independent of or dependent on each other, but the whole code is a unit. My favorite is ISO tire sizes, like 155 SR 15: the 155 is metric, the SR stands for steel radial, and then the diameter is still in inches, but that's going to change. Obviously, I cannot have a wheel without a width or without a diameter, and it has to be made out of something. I cannot take away any of those components, even though they're pretty much independent of each other. Social Security numbers are a US thing that needs to be mentioned here: we've now done away with meaningful Social Security numbers because we have so many illegal immigrants, and they're now being assigned randomly. Yeah, it's that bad.
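A vector code can be validated by checking that every component is present and well-formed, then splitting it into named parts. The Python sketch below does this for the tire size used as the example in the talk, 155 SR 15. Several details are illustrative assumptions, not rules taken from the ISO standard: the regular expression, the `TireSize` field names, and the rule that the letter component is one or two capital letters.

```python
import re
from typing import NamedTuple


class TireSize(NamedTuple):
    width_mm: int      # metric component, e.g. 155
    letters: str       # letter component, e.g. "SR"
    rim_inches: int    # diameter component, still in inches, e.g. 15


# All three components are mandatory; spaces between them are optional.
TIRE_PATTERN = re.compile(r"(\d{3})\s*([A-Z]{1,2})\s*(\d{2})")


def parse_tire_size(text: str) -> TireSize:
    match = TIRE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"incomplete or malformed tire size: {text!r}")
    width, letters, rim = match.groups()
    return TireSize(int(width), letters, int(rim))


size = parse_tire_size("155 SR 15")
```

Rejecting a partial code such as "155 SR" enforces the rule that the whole vector has to be there.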
Concatenation codes: a variable number of parts; you just keep adding onto the end. They can be ordered or unordered. You don't think about it, but the keyword list at the front of an article, drawn from a limited vocabulary, is a concatenation code. Checklists are another example, though they're not in favor so much anymore with designers. It used to be, when they were physically written down, they were literally concatenated: you initialed each step in a process. They were popular in the aircraft industry, but not so much anymore.
OK, guidelines. Do not reinvent the wheel. If you can research existing encoding schemes, do it; it's quick and easy. Can you say Google? Google is our friend; it's the coolest thing we've ever had, compared to when we had to go through paper copies. Now here's the bad news on Google: it's too good at times, so you really need to know your industry. In my book I have a whole chapter on sex codes. The standard one that you'll probably use is ISO 5218: 0 for unknown, 1 for male, 2 for female, and 9 for lawful persons like corporations and organizations. Then there's a whole bunch of other codes used by biologists and others. They generally come in two flavors: you need the different biological codes for all the medical stuff, while for commercial stuff it's probably male, female, unknown. By the way, the reason for the 0 and the 9 as the unknown and miscellaneous codes in the ISO sex codes is punched cards. In the old days, a blank column on a card would be read as a 0 by FORTRAN, and you could also make COBOL read it as 0, so we could take the card, the unit record as it was called, and punch the code in once we found out what it should be. Nines were easy to do on a keypunch machine: you'd hold the key down and fill the field with all nines, and that way they always sorted to the bottom of the report.
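The sex codes mentioned in the talk correspond to ISO 5218 (0 = not known, 1 = male, 2 = female, 9 = not applicable). A small lookup plus a validity check is enough to enforce them. This Python sketch also shows the keypunch-era convenience: sorting on the code puts the explicit "not known" value first and the 9s at the bottom of a report. The record layout and the function name are hypothetical.

```python
# ISO 5218 sex codes. The standard calls 9 "not applicable"; the talk
# describes it as the code for lawful persons such as corporations.
ISO_5218 = {0: "not known", 1: "male", 2: "female", 9: "not applicable"}


def check_sex_code(code: int) -> int:
    """Reject anything outside the four standard values (like a SQL CHECK)."""
    if code not in ISO_5218:
        raise ValueError(f"not an ISO 5218 code: {code}")
    return code


# Hypothetical records: (name, sex code).
records = [("Acme Corp", 9), ("Alice", 2), ("Pat", 0), ("Bob", 1)]
for _name, code in records:
    check_sex_code(code)

# Sorting on the code puts 0 first and the 9s at the bottom.
report = sorted(records, key=lambda rec: rec[1])
labels = [ISO_5218[code] for _name, code in report]
```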
Those encoding schemes were designed for the physical use of keypunch machines, and a lot of things are like that. Ever wonder why railroads are a certain width? US railroads are a certain width because we followed the British pattern, and the British pattern followed the Roman chariot. So we've been following horses' asses for thousands of years.
Other guidelines: exceptional values should be explicit, and you should allow for expansion. Also, and this is going to sound really dumb, but you'd be surprised: put a translation of the codes in the database somewhere, so somebody can get a machine-readable version of it. I wish I was making that up, but when I had my first eye surgery done in Los Angeles at Cedars-Sinai, a major hospital with a good reputation and all that, it came time to fill out the forms, and the clerk had a loose-leaf notebook with all the codes so they could punch medical codes and such into an IBM 3270 terminal. This was the 1980s, not the 1950s. In the 1980s, at a major hospital, they had no drop-down menus, no PCs, and everything was still on laminated pages. OK, so I'm running a little bit longer than I should. Questions, comments, feedback? If you want to throw something at me, it has to be soft; that's all I ask.
(applause)
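The advice to put a translation of the codes in the database can be sketched with Python's built-in `sqlite3` module. A code table holds each code with its meaning, other tables reference it through a foreign key, and a join turns codes back into readable text without anyone consulting a loose-leaf notebook. The table names, column names, and flight numbers are hypothetical. The airport codes are real IATA codes (BOS Boston, ATL Atlanta, YOW Ottawa).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE airport_codes (
    airport_code CHAR(3) NOT NULL PRIMARY KEY
        CHECK (length(airport_code) = 3),
    city_name    VARCHAR(50) NOT NULL
);
CREATE TABLE flights (
    flight_nbr  VARCHAR(10) NOT NULL PRIMARY KEY,
    origin_code CHAR(3) NOT NULL REFERENCES airport_codes (airport_code),
    dest_code   CHAR(3) NOT NULL REFERENCES airport_codes (airport_code)
);
INSERT INTO airport_codes VALUES ('BOS', 'Boston'), ('ATL', 'Atlanta'), ('YOW', 'Ottawa');
INSERT INTO flights VALUES ('XX101', 'BOS', 'YOW'), ('XX202', 'ATL', 'YOW');
""")

# The translation lives in the database, so any query can decode the codes.
rows = conn.execute("""
SELECT F.flight_nbr, O.city_name, D.city_name
  FROM flights AS F
       JOIN airport_codes AS O ON O.airport_code = F.origin_code
       JOIN airport_codes AS D ON D.airport_code = F.dest_code
 ORDER BY F.flight_nbr
""").fetchall()
```

The foreign key also rejects codes that are not in the translation table. That covers part of the "exceptional values should be explicit" guideline: an unknown code fails loudly instead of slipping in.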

Metadata

Formal Metadata

Title: Data Encoding Schemes
Subtitle: Scales & Measurements
Series title: PGCon 2015
Number of parts: 29
Author: Joe Celko
Contributors: Crunchy Data Solutions (Support)
License: CC Attribution - ShareAlike 3.0 Unported:
You may use, modify, reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or content, including in modified form, only under the terms of this license.
DOI: 10.5446/19119
Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
Year of publication: 2015
Language: English
Place of production: Ottawa, Canada

Content Metadata

Subject area: Computer Science
