Data Encoding Schemes
00:00
I'm Joe Celko, probably best known for having put ten years of my life into the ANSI standards committee and for writing books about it. At one point I was an honest Fortran programmer, many years ago, and then the minute I started doing SQL I was never allowed to do anything else. Oh, the pain, oh, the agony, but it's been a good living. However, at one point in my life I was also a statistician and actually worked with data. I think this is probably one of the things that we programmer types forget about. We become more interested in the hardware, the indexing, the mechanics of our trade, and we don't go back to the fundamentals of what we're supposed to be doing, which is working with data. And we don't get a good theory or any kind of background knowledge for it at all; you're just sort of left to figure it out on your own.
So what I have is essentially two talks based on one of my books, which, if you buy it, I'll be able to pay my mortgage. This is important. We're now getting the term "data science", but frankly data science, from what I see, looks an awful lot like statistics with [unclear] and a six-digit paycheck, so maybe that's a good thing too. But we really need to go back to the fundamentals of what data is, how we represent it, and why there are different forms of it. In particular,
01:39
does anybody remember Donald Knuth? For the older people who were taught out of Knuth, the whole Knuth and nothing but the Knuth: his The Art of Computer Programming is still one of the classics. It's a set of volumes that takes up about this much space on your shelf, and it is probably the greatest work of computer science that we've ever had. He quotes everybody; that's how you knew your indexing was good, because Knuth was encyclopedic. However, his first published work dealing with data was in Mad magazine, when he was in high school. Look it up: it's "The Potrzebie System of Weights and Measures". It's a parody of the metric system, with illustrations by Wally Wood, if anybody else is a Mad magazine or comic book fan and knows that name, and it's based on the thickness of Mad magazine number 26. You have to be into particularly New York Jewish, Yiddish humor, which is the kind of thing the editors of Mad were using [unclear]. This is good for a laugh in a humor magazine, but you run into this kind of thing in the real world, when people invent their own systems of measurement in data processing
03:13
systems. It's good to define things, so let's get a couple of
03:21
terms for a measurement out of the way. The range of a measurement is how much area it covers, essentially, in the space you're trying to measure. If I'm doing it with a gun and a bull's-eye that I'm trying to hit, the range would be how big the target is. Some things are appropriate for one size, some things are appropriate for other sizes. You know the joke about "close enough": it's very good for horseshoes and hand grenades, but not so good for surgery.
Granularity is how many divisions I have on my target, how many units of measure I have. One of my main jobs used to be working for state highway departments, and in the US we use non-metric units. There are now only two non-metric countries left on earth and I live in one of them. Myanmar finally went metric, I believe, but it's still us and Liberia. If you want to, you can carry out the calculations in decimal feet to several decimal places, but nobody has ever poured asphalt for a road or concrete for a bridge using a micrometer. Yet they will publish calculations down to a ten-thousandth of a foot of asphalt. Actually measuring that would be ridiculous.
Precision is how repeatable a measurement is: if I take it over and over, do I get pretty much the same result each time? In the case of the gun shooting at the bull's-eye, it's how tight my shot group is. A tight group would be very precise; if it's sort of spread out all over the place, as if you were using a shotgun, it's not so precise. Accuracy has to do with how close it is to the truth, how close you get to the bull's-eye. Notice that precision and accuracy are not the same thing. So
05:33
on an example target with a wide range, I can have high granularity to the point that it's meaningless. A tight cluster of shots is precise but not necessarily accurate: I have a really good gun barrel but my sights are a little bit off, so I'm always kind of off to the left or something. Going the other way around, the scope is good but the barrel is loose, and I sort of hit the general neighborhood.
There's also the concept of a zero point on a scale of measurement. It's where the scale starts, where we start measuring. Sometimes there isn't one. That's a useful concept; it is not necessarily a numeric zero, it's just where your scale starts. There's also a metric function, which is basically the triangle inequality, if anyone remembers that one from high school, as a property. If I've got a metric function with the proper properties on my scale, I can do calculations with the values that can be meaningful.
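The gun-and-bull's-eye picture of precision versus accuracy can be turned into numbers. What follows is a minimal Python sketch, not anything shown in the talk: the function names and the sample shot coordinates are made up for illustration, and the bull's-eye is assumed to sit at the origin.

```python
import math
import statistics


def group_centre(shots):
    """Average position of a shot group given as (x, y) pairs."""
    return (statistics.fmean(x for x, _ in shots),
            statistics.fmean(y for _, y in shots))


def accuracy_error(shots, bullseye=(0.0, 0.0)):
    """Distance from the group's centre to the bull's-eye (smaller = more accurate)."""
    return math.dist(group_centre(shots), bullseye)


def precision_spread(shots):
    """Mean distance of each shot from the group's own centre (smaller = more precise)."""
    centre = group_centre(shots)
    return statistics.fmean(math.dist(shot, centre) for shot in shots)


# Hypothetical data: good barrel, sights a little off.
tight_but_off = [(5.0, 5.1), (5.1, 5.0), (4.9, 5.0), (5.0, 4.9)]
# Hypothetical data: good scope, loose barrel.
scattered_but_centred = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]

for name, shots in [("tight_but_off", tight_but_off),
                    ("scattered_but_centred", scattered_but_centred)]:
    print(name, round(accuracy_error(shots), 2), round(precision_spread(shots), 2))
```

With these made-up groups, the first has a small spread but a large distance from the bull's-eye (precise, not accurate), and the second is the reverse.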
06:49
What's sort of funny is that scales and measurements actually didn't come in as a science until the 1940s. You would think that somebody before that, in statistics or something, would have come up with it, but no, it was a little late in coming.
So what's the simplest scale I can use? The nominal scale: take your values and assign a name to each of them. Not everybody likes to count this as a kind of scale. That name, by the way, can be a tag number or a character string; technically it can be a symbol. We don't like to use symbols very much: they're a bitch to put into a computer, they don't transport very well, and they're not always obvious. The advantage of using a symbol to name something is that it's completely language independent. If you want to look at some really beautiful examples of that, with a really elaborate system, look at a book on Renaissance artwork. The stonecutters in Italy who were working for Michelangelo and all those guys were pretty much illiterate, and each of them had an individual mark he put on the pieces of marble he was getting, and then little tags and other systems off of that. Those marks showed the father, the son, the grandson and so on within given families. They look like alchemists' symbols. Be grateful that you do not have to put those in any database you're ever going to work with.
I can't do any calculations from a nominal scale. About all I can do with a nominal scale is ask, "Are you Fred Jones?" It's the crudest form of scale. We also use it an awful lot, and by the way we do a bad job of naming things inside databases. It could be
09:03
better, let's be kind about it. If I represent the names as character strings or numbers, I can order them, but I'll be doing an ordering on the symbols, on the representation, rather than on the meaning. These are individuals, not a grouping or a category. So that is the simplest way to do it. The next level up
09:27
is a categorical scale, where I've got a group with a property, a category with a name. It takes you back to just a simple bunch of sets: Fido is my dog, dogs are mammals, and you get the set operations, union, intersection, all that good stuff. Categories are important when you get to the encodings, and how you set up the categorical scale is trickier than most people think. One problem with these categories is: can I have some overlap or not? What happens when I get something that's weird? There's the Robin Williams version, pretty much an American thing: one of his routines when he was doing nightclub stand-up consisted of God smoking a joint and deciding to make a platypus. How do you classify a platypus? Is it the only guy in its category? No; it turns out there are actually about four other egg-laying mammals. The others are the echidnas, in Australia of course, where all the weird stuff is.
What do you do when something just doesn't fit, like a Martian? I can make a new category, which means I have to have allowed for new categories in my codes. Or I can make a miscellaneous category. That's a good way to screw things up, because miscellaneous winds up being so mixed that you can't do any meaningful work with it; it's everything you forgot about. It's like a garage: stuff is put in the category and piles up until somebody comes in and cleans it out, or you just pretend it doesn't exist.
Now the other question with a categorical scale is: can I actually see individual members, or are they simply members of a group? It's worth telling people apart; it is not worth numbering grains of sand. So the idea of a commodity in a categorical scale is a little hard for people to get their heads around.
An absolute scale is just a count on a set. I can add and subtract the numbers. For an absolute scale to work, all the elements in my groups have to be interchangeable. It's a dozen eggs; it's not egg one and two and three, it's a dozen eggs. And we give names to these units: the dozen, the gross, the quire, the ream. Oh, sorry, with the ream in the paper industry there are about three of them, 500 versus 480 sheets of paper of certain sizes. My favorite, of course, is the six-pack: it's not a drinking problem, it's a solution.
What was funny was in England, when they went over to decimalization way back when, one of the dairies there brought out a ten-pack of eggs. English-speaking countries traditionally use dozens for eggs. It didn't sell. Never mind that the cost per egg in those ten-packs was actually cheaper than it had been in the original dozen packs; people just found it too strange. Or if you went to a beer store where you'd like to buy a six-pack and found a five-pack, even if the cost per beer was cheaper it would just seem somehow wrong. Those traditional units really get locked in.
Obviously there is a zero point on the scale: the empty set, an empty count, is where things start. Next, the
13:50
ordinal scale, where I put things in order. There are no operations, just comparisons; it's just a sequence, with no origin, no zero point. Did anybody else have to take a geology class in college? Yeah, we've got at least one geology victim. The only nice thing about geology, as far as I was concerned, was that you get a pickaxe and go hit rocks. I have never had to use anything I learned in my freshman geology class. It did not help me pour concrete for a driveway; it was absolutely useless. But we went out to the field to whack at rocks with the geologist's pick, and we got a box of samples for the Mohs scale. It was mineral samples in a ten-compartment box, and what you would do is take your sample and scratch it against the various minerals in the Mohs scale, and you could say, "Well, this is harder than talc but softer than gypsum." What could scratch what: strictly comparison. It's a quick, easy way to classify things for someone running around in shorts with a pickaxe and this box of rocks in the field, getting poison ivy because he has to do this for his freshman science credits.
The real way to do hardness would have been the Rockwell scale, which is used in manufacturing for steel and other metals, but we didn't have that. Oh, by the way, they never gave you a diamond in your Mohs scale kit; usually there was a piece of really hard steel in there, so technically it wasn't complete. But what did you expect when you got it at the school bookstore? Just comparison, just a linear ordering.
15:55
One little thing about ordinal scales: they're not required to be transitive. Has anybody ever played rock-paper-scissors and the other games like it? Someone here is a fan of The Big Bang Theory? Yeah, and wearing the T-shirt with the non-transitive ordering of those objects. We really hate non-transitive relationships. We want a transitive relationship; we want it properly, tightly well-ordered. You can't make calculations or do much of anything off of a non-transitive scale. Ordinal scales are also the tool for fixing elections when you have more than two candidates. Look up Arrow's paradox: it's impossible to get a fair voting system if
16:57
you don't have just two candidates.
Rank scales are a sort of tightening of ordinal scales. There's an origin point and they're guaranteed to be well-ordered. Military ranks are probably the obvious example. I can't do any operations on them: I cannot take three privates, put them together and make a sergeant. But the ordering still stands: if you shoot your sergeant, you still have to take orders from the captain. The ordering is transitive and it is tight. We like those. I might not be able to do much math, but I can sort, and sorting is good.
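The difference between a transitive ordering such as Mohs hardness and a non-transitive one such as rock-paper-scissors can be checked mechanically. This is a small Python sketch under stated assumptions: the helper `is_transitive` and the way the relations are stored as sets of pairs are illustrative choices, not something from the talk.

```python
from itertools import combinations


def is_transitive(relation):
    """True if (a, b) and (b, c) in the relation always imply (a, c)."""
    return all((a, d) in relation
               for a, b in relation
               for c, d in relation
               if b == c)


# Rock-paper-scissors as (winner, loser) pairs.
RPS_BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}

# Mohs reference minerals, softest to hardest.
MOHS = ["talc", "gypsum", "calcite", "fluorite", "apatite",
        "orthoclase", "quartz", "topaz", "corundum", "diamond"]

# (harder, softer) pairs: a harder mineral scratches every softer one.
MOHS_SCRATCHES = {(MOHS[j], MOHS[i])
                  for i, j in combinations(range(len(MOHS)), 2)}

print(is_transitive(RPS_BEATS))       # rock beats scissors beats paper, yet paper beats rock
print(is_transitive(MOHS_SCRATCHES))  # scratching follows one linear order
```

Only the second relation can be sorted into a single sequence, which is why a non-transitive scale supports so little.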
17:42
Interval scales are really what people think of when you say "scale" or "measurement". There's a natural ordering and a unit. I don't have an origin point, but arithmetic makes sense because my unit is uniform in its dimension. The most common interval scale we use is the calendar. The common unit is a day, regardless of how you cut up your year or group your days together. You guys might not get much of this, but for some reason among Christian fundamentalists in the United States there is a belief that God made the seven-day week. No, actually the Hebrews did. The Romans had an eight-day week; parts of Africa had ten-day weeks. How you want to cut up your units is completely arbitrary. But I've got a metric function: I can add and subtract, and I've got a linear ordering. I can't divide two dates by each other: Christmas divided by Thanksgiving doesn't mean anything.
One of my favorite T-shirts right now says, "On a scale from 1 to 10, what color is your favorite letter of the alphabet?" I show that to people who are starting to do stupid stuff with their data. What's funny is that when you ask somebody that, they will stop and think about it, and they will try to answer you. It sounds like a lot of real questions, so they think it is a real question.
The intervals on the scale do not have to be the same size. In fact log and exponential scales are a lot more common than you think, because you're human beings, and your sensory input, and a lot of the judgments you make from sensory input, are on an exponential scale. My favorite is the Richter scale for earthquakes: each time the Richter number goes up by one, the quake is ten times the size of the previous one. Or take the volume on a stereo: it doesn't go up linearly with the amplifier; it goes up, I believe, as something to the 4.3 power or the 1.3 power. It's not linear. A lot of stuff is exponential; decibels are another one, by powers of ten. Having lived through a 7.8 earthquake years ago in Los Angeles, I appreciate the Richter scale more than I did before. You look out your window and you see a bridge collapse.
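What an interval scale does and does not allow shows up directly in Python's standard `datetime.date` type: subtraction is defined, division is not. The sketch below also illustrates a base-ten log scale. The helper names are made up for this example, and the "ten times per whole number" rule is simply the one stated in the talk, applied to magnitudes.

```python
from datetime import date

christmas = date(2023, 12, 25)
thanksgiving = date(2023, 11, 23)  # US Thanksgiving in 2023

# Subtracting two points on an interval scale gives a meaningful distance.
days_between = (christmas - thanksgiving).days


def can_divide_dates(a, b):
    """Report whether the date type supports a / b (it does not)."""
    try:
        a / b
    except TypeError:
        return False
    return True


def size_ratio(smaller_magnitude, larger_magnitude):
    """On a base-10 log scale, a difference of d corresponds to a factor of 10**d."""
    return 10 ** (larger_magnitude - smaller_magnitude)


print(days_between)
print(can_divide_dates(christmas, thanksgiving))
print(size_ratio(5.0, 7.0))
```

Equal steps on the log scale are equal ratios, not equal differences, so going from 5 to 7 is a factor of a hundred rather than "two more".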
20:42
Now, ratio scales are sort of the ultimate, and when we say "scale" or "measurement" to some people, this is what they think of. I've got a natural origin, some kind of zero point; the scale has a strong ordering; and the unit is uniform in its dimension. Length, weight, height, all the things you use commercially are ratio scales. We call it a ratio scale because everything is expressed off a single unit, as either a fraction or a multiple. There's the Potrzebie system of weights and measures, or the metric system, which is a little better example: powers and multiples of ten, or fractions in tenths, are nice and easier to work with, since we've got this number system the Hindu-Arabs invented for us. I often wondered about the Hindu-Arabs when I was a kid: we use Hindu-Arabic numerals, and I had never met a Hindu-Arab.
21:48
Why are the classifications of scales important? Because when I convert between scales, they have to be of the same type for the conversion to make sense. If I go from a nominal scale to a nominal scale, it's a mapping, a one-to-one mapping
22:04
preferably. I don't think we've
22:11
lost the picture. In the following, the
22:23
slide... and this is back, and it's a
22:29
little bit ahead. OK, so nominal
22:35
scales: a one-to-one mapping. You guys in Canada have French-English dictionaries; that's the same thing, a one-to-one mapping for at least some of the terms. For ordinal scales you need a monotonic function that preserves the ordering: not necessarily the same values on each scale, but one that preserves the ordering. Well, that's why we call it an ordinal scale. One example is the relative values of Western and Chinese chess pieces; the really good ones may be dates on different calendars. Next, rank
23:18
scales: again a monotonic function that preserves the order, and there might not always be a good match. The Army and Navy ranks have problems there. In particular, and I don't know if this is still true, the US Army used to consider warrant officers to be officers, with officers' privileges; the British Army considered them to be enlisted, and they didn't get officers' privileges. The cut points vary, but it's a mapping.
Interval scales: linear functions that also shift the origin point. The simple conversion is Celsius to Fahrenheit: multiply by nine over five and add 32, if I got that right. I just remember that zero is freezing, a hundred is boiling, 25 is a little colder than I'd like, and 30 to 35 is comfortable. I live in Texas and I keep my house set at 80 Fahrenheit; yes, that's a little unusual.
Ratio scales: a constant multiplier. Liters to quarts or to pints are exact conversions. That's why we like ratio scales: they're easy to work with. Interval scales take a little more work; ratio scales are simple multiplication.
Then there are derived units, built on the concept of the primary unit. This goes over to the metric system, actually the Système International, and the ISO standard 2955 has all the official definitions for the derived metric units: kilometers per hour, square meters, all kinds of combinations. Some of them will not make sense, but pretty much you put two primary units together, multiply, and make something that's meaningful. If you have a look at some of the definitions, the pascal, as a unit of pressure, has a little more multiplying and dividing in there than you might like.
As a general statement for the database: if I'm going to derive something, I would really rather do the calculation in the database from the simplest, most primary units I can store. This is a generalization. That way, if I need to do something else with them, I don't have to work to try and pull out the primary units again. And multiplication is cheap on computers; we're over the cost of computing. In the old days, when people had to do it, it was a little more work than we'd like, and it was worth storing the computation rather than the basic units; again, that's a generalization. But machine computing is actually faster: the processor is working in nanoseconds, and your disk drive is still a lot slower than nanoseconds, even if you're using solid-state drives. So it's actually faster to compute.
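The two kinds of conversion described here, a linear function with a shifted origin for interval scales and a constant multiplier for ratio scales, can be sketched in a few lines of Python. The function names are invented for the example, and the litre-to-quart factor is an approximate value for US liquid quarts, included only to show the shape of the calculation.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Interval scale conversion: rescale the unit, then shift the origin."""
    return celsius * 9 / 5 + 32


# Approximate number of US liquid quarts in one litre (assumed value).
LITERS_TO_US_QUARTS = 1.056688


def liters_to_quarts(liters):
    """Ratio scale conversion: one constant multiplier, no origin shift."""
    return liters * LITERS_TO_US_QUARTS


print(celsius_to_fahrenheit(0), celsius_to_fahrenheit(100))

# Ratios survive a ratio-scale conversion: twice the litres is twice the quarts.
print(math.isclose(liters_to_quarts(2.0), 2 * liters_to_quarts(1.0)))

# They do not survive an interval-scale conversion: 20 C is not "twice" 10 C.
print(celsius_to_fahrenheit(20) == 2 * celsius_to_fahrenheit(10))
```

The last line prints `False` because the origin shift of 32 breaks proportionality, which is the practical reason ratio scales are easier to work with.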
27:03
Quick summary of the scales, from weakest to strongest: nominal, categorical, absolute, ordinal, rank, interval in its various forms, linear and log, and finally ratio scales. That's usually one lecture, sort of stretched out, when I'm doing this for a class. At this point you would all be drawing a 3-by-5 card, that's in inches, a small piece of a sheet of paper, with something written on it like "ring size" or "shoe size", and you would have to go off to Google or the library and look up exactly what kind of scale we're using for it. Now, to make
27:59
it easier to work with when we're getting it into a database, we use symbols, and frankly, thanks to a wonderful thing called Unicode, the representation is easy. We have the alphabet, actually a subset of the Latin alphabet modeled on the ASCII characters, the digits zero to nine, and some symbols, plus the rules for manipulating the codes: we've got math for numbers and string operations. Technically, these days I can put other kinds of data directly into the database, but we really don't like to do that so much. Early SQLs, and us old guys, never really got over the idea of having graphics and that sort of stuff in our computers. I don't like them; they're hard to search. One of my favorites was a couple of decades back, when IBM was pushing picture recognition; it would probably be face recognition now. Their example was a wonderful product where you could sit down with some colored pencils, draw a quick picture of something, and then search photographs with your drawing. Face recognition is a whole science in itself, but they were trying to do it very generally. Their example was finding a banana among all these pictures of fruit. It found the bananas; it was actually quite good at that. It also found a toucan, and he showed up as a banana. So we're still working on it. And in
29:42
particular we've got Unicode: let's put all the alphabets and symbol systems known to man in 16 bits. It's a nice idea. As a basis for encoding, the ISO people specifically wanted to get this minimal subset into all the languages on earth. This is why you can write a number in Chinese text: it's part of the Unicode set for that language, as for the other languages. It is the Latin alphabet with no accents and no case sensitivity. Some of the positions can be numeric; there might be rules for disallowing certain characters so that we don't get confused, or for where the digits and letters can go. And there's a minimal set of punctuation marks: pretty much commas, dashes and periods. I'm against using slashes, underscores and things like that in an encoding.
There are actually names for these. The # is not a hashtag; typesetters call it an octothorpe. That little thing that you think is an "and" sign is an ampersand. All of them have rather fancy names. The at sign is technically the "little snail"; it sounds much better in French, but that was the official name.
The trouble with the special symbols is that they have meanings in other languages and systems. When Microsoft was talking about how they were really going to get on this internet thing, blah blah blah, and do their part, well, they named something C# without any knowledge that the octothorpe had meaning on the internet. Yeah, that's careful research, guys; you're really into this. And when doing
31:43
encoding, the display is important. Encodings should be convenient for people. Those damn users: we build wonderful systems and the users are always screwing up something.
An encoding can be either fixed or varying length. I would prefer fixed, because the length of the code is part of its validation. If I see five digits, I know it could be a US ZIP code. If I see a mixture of digits and letters, I know it could be a Canadian postal code, if it follows the right pattern of letter, digit, letter for the first part. The other one, if you want to really go nuts, is the British system. Get a copy of it: it's a varying-length mix of letters and digits that are actually abbreviations, or attempted abbreviations, of old post offices that existed in the middle or late 1800s in England. It's completely unusable. It's so bad they're introducing a five-digit commercial code, based on the US ZIP code, for bulk mailers, because their own system has proven to be so unusable. Also, the Royal Mail has a monopoly on the guidebooks for doing the addresses; it's illegal to set up your own postal code service in the UK. Your government at work.
Fixed length also has another advantage. Does everybody remember printers and paper? We used to get our data on all of that stuff. That goes back to the old punch card days: we had fixed-length fields, fixed-length columns, fixed-length displays, 80 columns across, and later the 3270 video screen. But more than that, fixed length is something a person can see and line up. Varying length gets confusing, and that's being nice about it.
The worst standard you will probably run into is not the British postal codes; it is a thing called the IBAN, the International Bank Account Number. It can be up to 34 characters long. It includes the account number, the country code, a whole bunch of stuff crammed into one unreadable string that only a machine on the SWIFT system can figure out. The people that work with it can't read them. The people who work in the automobile trade can read VIN numbers, which are only 17 characters long, but nobody can read the IBAN.
The thing with human processing is that you don't read letter by letter; you read in chunks. You cluster things. Three is the best: people get three digits or three letters correct almost all the time. You can go up to five very safely, but beyond five you start getting errors. There are four common errors: a missing character, an extra character, one bad character, and then pairwise transposes. That's probably from typing, but pairwise transposition is the fourth most common. Phone numbers are grouped into a dialing area, an exchange and the actual phone within it, and it's very convenient to read them that way. How are we doing on time?
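Using the length and pattern of a code as part of its validation, and showing long codes to people in small chunks, might look like the following Python sketch. The regular expressions are deliberately simplified (the Canadian pattern, for instance, ignores the rule that some letters are never used), and the function names are hypothetical.

```python
import re

# Five digits, nothing else.
US_ZIP = re.compile(r"[0-9]{5}")
# Letter-digit-letter, optional space, digit-letter-digit (simplified).
CA_POSTAL = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")


def guess_code_type(code):
    """Return what a postal code could be, judged only by its length and pattern."""
    if US_ZIP.fullmatch(code):
        return "US ZIP code"
    if CA_POSTAL.fullmatch(code):
        return "Canadian postal code"
    return None


def chunk(code, size):
    """Split a long code into groups of `size` characters for human reading."""
    return " ".join(code[i:i + size] for i in range(0, len(code), size))


print(guess_code_type("78701"))
print(guess_code_type("K1A 0B1"))
print(guess_code_type("7870"))                 # wrong length: rejected outright
print(chunk("GB82WEST12345698765432", 4))      # a widely used sample IBAN
```

A fixed length lets the four-digit typo fail immediately, before any lookup; the chunking helper only changes the display, not the stored value.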
35:44
I think we're OK. So what about bad encoding schemes? One of the characteristics is that there's no room for growth. In the 1970s, when we were still on punched cards in the state of Georgia, down in the States, we had tag type codes that were one column on the punch card. Originally it was taxis, private vehicles, farm vehicles: just seven or eight types of license tag you could get. It was very nice; it worked fine. Then came a thing called commemorative tags, which state governments love, because with a commemorative tag you can charge extra. Great revenue source: California makes a few hundred million dollars off of the commemorative tags. So every group that had a cause, every college, every veterans' group, whatever, wanted a commemorative tag. Would you like to be kind to animals? Fine: pay 35 dollars, display it on your license tag, and be better than your neighbors. The problem is that we were now having to put in all kinds of different codes. When I left there were about 35 of them, because every college had to have its own commemorative tag. So how do you get 35 different punches into one column? [audience answers] Thank you, thank you. You know, usually when I ask that I get what I call my fish market: all the kids, that's people under 40, sit there looking like dead fish, eyes glazed, looking at you. We found that you could multi-punch it: you hold the key down and then you punch several combinations. There are twelve rows to a column, so you had all the possible combinations of the twelve punches and a little translation table on the side. On top of that we had 029, 027 and 028 IBM keypunch machines, which were all different, so you not only had to know what the multi-punch was, you had to know what machine you were punching on; otherwise the tags would get all messed up. No room for growth. If they had allowed two digits for the license tag type, there would have been no problem, and it would have saved quite a lot of work.
The other one: how many people have ever worked with COBOL? Yeah, I won't tell; your mother would be so ashamed. I was put to work on COBOL files at American Honda. For their dealerships, they figured they would never have more than ten thousand dealerships in the United States. This was when they were bringing over the Honda scooters; you remember the Beach Boys song "Little Honda". Google it, and after that I'll tell you about bell-bottoms and miniskirts. But they didn't allow room for that, and they had to redo all of the COBOL files. This is the thing with SQL: we have an abstract view of data. If we want to make something bigger, we just change a CHECK clause to use a different range, or we ALTER TABLE. You don't do that in COBOL: what you see is what is processed. Everything is character strings, exactly the way it appears on the physical medium. So it was really a major job. No more than ten thousand dealerships; remember, it was supposedly Bill Gates who asked why the hell anybody would need more than 640K on a home computer. I now have a watch with more than that.
Another bad encoding scheme, and it happens more than you think, is ambiguous codes. My favorite example was the old International Standard Book Number, the ISBN you see in bookstores. It was ten digits made up of four parts. The first part, a variable-length piece of one or two digits, is always the language: 0 and 1 were English, 93 is Esperanto, sort of at the end of the list; I don't know where Klingon falls. Then comes the publisher's code: the bigger the publisher, the shorter the code, from a few digits for a big one up to seven for a small one. Then the book number within the publisher, and then a modulo-11 check digit. The catch is that without any punctuation in the ten-digit ISBN, you could cut it up in various ways, and in the early days they had ISBNs that could be parsed in two ways. There were about 15 or 16 of them, and it was enough to mess up libraries for a while. That has since been fixed, and the ISBN is now good and usable; it's part of the EAN codes.
Miscellaneous codes: if the miscellaneous code gets used a lot, something's wrong; you've skipped too much.
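The modulo-11 check digit mentioned for the ten-digit ISBN is easy to sketch. In the standard ISBN-10 scheme the ten characters are weighted 10 down to 1, the weighted sum must be divisible by 11, and the final character may be `X`, standing for the value 10. The function name below is made up for the example.

```python
def isbn10_is_valid(isbn):
    """Check an ISBN-10 using its modulo-11 check digit; hyphens are ignored."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False  # fixed length is part of the validation
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10  # 'X' is only allowed as the check character
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


print(isbn10_is_valid("0-306-40615-2"))  # a commonly quoted valid example
print(isbn10_is_valid("0-306-40651-2"))  # same digits with one pair transposed
```

The second call fails because this weighting catches pairwise transpositions and single bad characters, two of the common human errors with codes. It does nothing about the ambiguous parsing of the parts, which is a separate problem.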
41:39
Now, the other thing with a bad code is that there's no support for exceptions; everything just gets put in that miscellaneous category. But I can have unknown values and missing values. Oh wait a minute, for us that's a NULL. Well, when you have a nullable column, have you bothered to actually document what the NULL means in context? Does it mean something's missing? If I've got a NULL phone number, does that mean I don't have a phone, or does it mean we don't know it? It's funnier with "not applicable". "Not applicable" you would probably laugh at, but I just got through working for a company called MIB. They're in the insurance industry and they've been in business for well over a hundred years. They did not get their name when the movie came out, but they get kidded about it when they go to insurance conferences, and occasionally they wear sunglasses. The thing is, there is dirty data coming in from insurance companies, so they need a rather elaborate set of not-applicable values.
I already mentioned miscellaneous or unclassified: that's bad design; you've missed something. There are overflows, underflows, bad divisions, computations that are garbage. Now, I can have an error in one field, and we can take care of that. But how about when two fields are related? A medical record that shows a pregnant man: that's computable; we could find that. There was an ANSI/SPARC committee back in 1975 that issued a list of 14 different kinds of missing data, and later a follow-up to the SPARC committee report listed 22 different kinds of missing data. Statisticians have all kinds of ways of trying to correct for missing data.
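One way to avoid an undocumented NULL is to record why a value is missing, and to check related fields against each other. The Python sketch below is only an illustration of that idea: the two reasons, the `PhoneField` class and the record layout with `sex` and `pregnant` keys are all hypothetical, not the ANSI/SPARC list or any real insurance schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MissingReason(Enum):
    UNKNOWN = "a value exists but we do not know it"
    NOT_APPLICABLE = "the attribute does not apply (for example, no phone)"


@dataclass
class PhoneField:
    number: Optional[str] = None
    missing_reason: Optional[MissingReason] = None

    def is_consistent(self):
        """Exactly one of the value and the documented reason must be present."""
        return (self.number is None) != (self.missing_reason is None)


def cross_field_errors(record):
    """Find errors that only show up when two fields are compared."""
    errors = []
    if record.get("sex") == "M" and record.get("pregnant") is True:
        errors.append("pregnant flag set on a male record")
    return errors


print(PhoneField(number="555-0100").is_consistent())
print(PhoneField().is_consistent())  # bare missing value with no reason given
print(cross_field_errors({"sex": "M", "pregnant": True}))
```

A bare missing value fails the consistency check, which forces whoever loads the data to say which kind of "missing" it is.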
43:51
think that designing encoding schemes is not important, try doing math with Roman numerals for a week. Roman numerals were such a bad encoding system that even the Romans didn't do math with them; they looked the answers up in lookup tables. Try living without alphabetical ordering for a week. In Hong Kong, the telephone operators used to have a contest every year in which they would identify somebody's phone number from their name, and the winners of these contests would have memorized 10,000, 15,000, 20,000 different names and phone numbers and be able to spit them back. I had a friend who was teaching English in China before Tiananmen Square, and she would get her class roster of 150 students transliterated into Roman letters. It was never sorted, there was no alphabetical order, and it was never in the same order from week to week. Everybody can memorize 150 names if you are used to working in Chinese; not so much for an English speaker. Go to a library and try to find something without the Dewey Decimal Classification. Dewey Decimal has some problems, and I'll get to those in a little bit, but I mention it as the alternative to organizing something like a library by color. Actually, before the Dewey Decimal Classification system came along, every college library and public library had its own individual classification system, invented by the head librarian at that particular institution. My wife volunteered for a feminist bookstore in the seventies in Atlanta, and one of the people, who was definitely not a book person, sorted the books in the store by color because she thought they would look pretty. Do not work with hippies if you can help it; it will not end well. The drugs will be good, but it's not worth the work you have to do afterward.
If I have a good encoding scheme, my aggregations, queries, and so on are much easier. Going back to Dewey Decimal: if I want to look up science books, I know they're in the 500 series; if I want to look at math books, I know they're in the 510s. I can immediately zoom in on a small subset. I can use BETWEEN predicates in my SQL. My calculations get to be a lot more accurate, and I don't have to worry about things on the borders and fuzziness.
The kinds of encoding. Enumeration: make a list of values and assign a name or a tag number to each one. It's really a nominal scale with a name attached to it, but if you can get some kind of ordering on the symbols, that's nice: chronological, procedural, or based on some sort of physical thing. The bad news about sorting codes by alphabetical or numerical order is: what language were you using for the alphabet? Canadians are probably a little more aware of that. And once again I have to pick on the SQL Server people, because they are the worst SQL programmers on earth. They're all VB programmers who learned it at home. They will use their autoincrement, their IDENTITY column, to just make a list of things, and whatever physical order the values happen to be in in the table, in the physical storage, becomes the encoding for it. No logic, no thought, nothing.
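The Dewey Decimal lookup with BETWEEN predicates can be sketched roughly as follows. This is only an illustration, not anything shown in the talk: the `Books` table, the `dewey_nbr` column, the sample titles, and the helper `titles_between` are all made-up names, and it uses Python's built-in `sqlite3` module so the SQL can actually run. Storing the class number as a number is an assumption; real catalogs often keep it as a string.

```python
import sqlite3

# Hypothetical in-memory catalog; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT NOT NULL, dewey_nbr REAL NOT NULL)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?)",
    [
        ("Intro to Philosophy", 100.0),
        ("General Science", 500.0),
        ("Calculus", 515.0),
        ("Statistics", 519.5),
        ("Physics", 530.0),
        ("Cooking", 641.5),
    ],
)

def titles_between(low, high):
    """Return titles whose Dewey number falls in [low, high], in shelf order."""
    rows = conn.execute(
        "SELECT title FROM Books "
        "WHERE dewey_nbr BETWEEN ? AND ? "
        "ORDER BY dewey_nbr",
        (low, high),
    )
    return [row[0] for row in rows]

# Science is the 500 series; math is the 510s inside it.
science_titles = titles_between(500, 599.999)
math_titles = titles_between(510, 519.999)
```

Because the hierarchy is built into the number, narrowing from all of science to just math is nothing more than a tighter range.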
48:02
Measurement codes carry a unit of measure. So I put the unit in, and I know it's for that column; either it is expressed in the definition, or I may have to tell you what unit I'm using. Essentially I'm just recording the result of a measurement. The worst design is to put the unit of measure in the same column as the value of the measurement: a dollar sign on the dollar amount. That's COBOL. Why? Because COBOL was concerned with physical display; there was no difference between storage and physical display. Still, people do it. If you have mixed units, you need to have
48:46
a column that tells us the unit of measure somewhere. Abbreviation encoding: a shortened version of the value. Usually you come up with a fixed-length abbreviation. The goal of an abbreviation code is to be people-readable. I'm going to assume everybody here, for the most part, flew in. You know the three-letter airport codes. It doesn't take too much to figure out that BOS is Boston or ATL is Atlanta. A code starting with Y, not so much. The same thing goes for some of the really small airports in the Great North and Alaska: you see Z's and W's, and it gets pretty weird. They are just not intuitively obvious to the casual observer. The nice part with abbreviation codes is that business about a human being figuring it out, and you don't fight your
50:05
user. Algorithmic codes: I've got a procedure to encode the value. It might not be immediately human-readable; in fact, in the case of encryption, it had better not be immediately human-readable. You don't think about it, but rounding numbers is technically an algorithm, encryption is technically an algorithm, and so are hashing functions, where there's no way to look at the result and immediately tell what the original was.
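As a small illustration of algorithmic codes, here is a sketch of two of the procedures mentioned: rounding and hashing. The function names `rounded_code` and `hashed_code` are invented for this example, and SHA-256 is simply one readily available hash in Python's standard library, not a choice made in the talk.

```python
import hashlib

def rounded_code(measurement: float, ndigits: int = 0) -> float:
    """Rounding is an encoding procedure: many inputs map to one code."""
    return round(measurement, ndigits)

def hashed_code(value: str) -> str:
    """Hash the value; the result is deterministic but not human-readable."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Same input always gives the same code, but you cannot read the
# original value back off the hex digest by looking at it.
atlanta_code = hashed_code("Atlanta")
boston_code = hashed_code("Boston")
```

Both functions are lossy in the sense the talk describes: from the code alone you cannot immediately tell what the original value was.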
50:34
Hierarchical codes: I really like hierarchical codes; I fell in love with Dewey Decimal. They're usually numeric, but they can be mixed alphanumeric. The Library of Congress system for libraries is actually more accurate than Dewey Decimal, and it's mixed alphanumeric. ZIP codes in the United States are based on a geographic partition. If it begins with 3, it's in the southeastern United States, covering multiple states, and then as you go from left to right it gets more and more specific. So a leading 3 would be the southeast, 303 would be part of Georgia, and 30310 would be a subset of Atlanta, Georgia. It's nice and easy to look at, and you've got an
51:25
idea, when you see the code, of what you're zooming in on. The bad news with that is you can put stuff in the wrong part of the hierarchy. Dewey Decimal put logic under philosophy. Why? Because when Melvil Dewey was inventing the system, George Boole had not written The Laws of Thought and there was no mathematical logic. When was the last time you saw a philosophical logic book? There may have been one written, but for the most part nobody has written philosophical logic for the past 150 years or longer; it became a branch of mathematics. So it's a little messy, and they are gradually moving logic more toward math in the newer Dewey Decimal system. Another problem is not having enough space in the hierarchy for things. In particular, one of Melvil Dewey's other prejudices was that there were Catholics, Protestants, Jews, Muslims, and miscellaneous. The Library of Congress has more books on Buddhism than it does on Christianity; they've been writing longer, and there's more breakdown into subsets in the Eastern religions, but they all got put into one miscellaneous religion category. There's an old British cartoon of a sergeant major standing in front of his Indian troops: "Church of England to the right, another denomination to the left, and fancy religions to the back." That was pretty much how Dewey sorted it. Then there are things that could reasonably fall under multiple codes. Church architecture and the worship service could be under religion and under architecture. You don't think about it, but a Christian church's layout is a cross, and that affects how we do our ceremonies. Did you know the aisle is the short arm of the cross? So when the bride "walks down the aisle", traditionally she is really walking down the long length of the church. The Muslims have a square floor plan for a mosque, and there are other ways the architecture affects how the service itself goes. It's kind of an interesting topic. Do you put it under architecture, religion, or both? Well, the solution for the librarians is: whatever the Library of Congress says the code is, that is where you put it.
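The left-to-right narrowing of a hierarchical code, as in the ZIP code example (3, then 303, then 30310), can be sketched like this. The helper names `zip_levels` and `in_region` are invented for illustration; the only facts taken from the talk are the three prefix levels.

```python
def zip_levels(zip_code: str) -> list:
    """Split a 5-digit ZIP code into its progressively narrower prefixes."""
    if len(zip_code) != 5 or not zip_code.isdigit():
        raise ValueError(f"not a 5-digit ZIP code: {zip_code!r}")
    # 1 digit = broad region, 3 digits = area within it, 5 digits = full code
    return [zip_code[:1], zip_code[:3], zip_code]

def in_region(zip_code: str, prefix: str) -> bool:
    """A hierarchical code lets membership be tested with a simple prefix match."""
    return zip_code.startswith(prefix)

atlanta_levels = zip_levels("30310")
```

Keeping the code as a string rather than an integer is deliberate here: prefix tests work directly, and leading zeros in other regions would survive.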
53:57
Vector codes: a vector code is made up of parts, but the whole has to be there. The components can be independent of or dependent on each other, but the whole code is a unit. My favorite is ISO tire sizes. In 155 SR 15, the 155 is metric, the SR stands for steel radial, and then the diameter is still in inches, but that's going to change. Obviously I cannot have a wheel without a width or without a diameter, and it has to be made out of something. I cannot take away any of those components, and they're pretty much independent of each other. Social Security numbers, a US thing, used to work this way. We've now done away with meaningful Social Security numbers because we have so many illegals; they're now being assigned randomly. Yeah, it's that bad.
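To make the "whole code is a unit" point concrete, here is a sketch that parses the tire size from the talk into its components and rejects anything with a part missing. The `TireSize` type, the field names, and the regular expression are assumptions written to fit the single example 155 SR 15; they are not the actual tire-marking standard, which has more variants.

```python
import re
from typing import NamedTuple

class TireSize(NamedTuple):
    width_mm: int          # the metric part, e.g. 155
    construction: str      # e.g. "SR", described in the talk as steel radial
    rim_diameter_in: int   # rim diameter, still in inches

# Simplified pattern fitted to the talk's example; a hypothetical format.
_TIRE_PATTERN = re.compile(r"^(\d{3})\s*([A-Z]{1,2})\s*(\d{2})$")

def parse_tire_size(code: str) -> TireSize:
    """Parse a code like '155 SR 15'; every component must be present."""
    match = _TIRE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"incomplete or malformed tire code: {code!r}")
    width, construction, diameter = match.groups()
    return TireSize(int(width), construction, int(diameter))

example_tire = parse_tire_size("155 SR 15")
```

A vector code with a component missing is not a shorter code; it is simply invalid, which is why the parser raises instead of returning a partial result.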
54:52
Concatenation codes: a variable number of parts; you just keep adding onto the end of it. They can be ordered or unordered. You don't think about it, but the keyword list at the front of an article, drawn from a limited vocabulary, is a concatenation code. Checklists are another. These codes were popular in Europe, though they're not so much in favor anymore with designers. It used to be, when they were physically written down, that they were literally concatenated, with someone initialing each step in a process. They were popular in the aircraft industry, but not so much anymore. OK, guidelines. Do not reinvent the wheel. You can research encoding schemes. Do it; it's quick and easy. Can you say Google? Google is our friend. It's the coolest thing we ever had; we used to have to go to paper copies. Now here's the bad news on Google: it's too good at times, so you really need to know your industry. In my book I have a whole chapter on sex codes. The standard one that you will probably use is the ISO one: 0 for unknown, 1 for male, 2 for female, 9 for a lawful person like corporations and organizations. Then there's a whole bunch of other codes used by biologists. People are very dull; other than Bruce Jenner, we don't change around a lot, and we generally come in two flavors. But we need the different biological codes for the medical stuff. What you will probably need for commercial stuff is male, female, unknown. By the way, the reason for zeros and nines for the unknown and the miscellaneous codes in the ISO sex codes is punched cards. In the old days, a blank column in a punched card would be read as a 0 by FORTRAN, and you could also make COBOL read it as 0, so we could take the card, the unit record as it was called, and punch it when we found out what the code should be. Nines were easy to do on a keypunch machine: you hold the key down and fill the field with all nines, and that way they always sorted to the bottom of the report. Yes.
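The sex codes and the punched-card reasoning behind the 0 and 9 values can be sketched as follows. The code values and their meanings are the ones given in the talk; the function names, the sample customer rows, and the way a blank column is turned into 0 are illustrative assumptions that only imitate the FORTRAN behavior described.

```python
# Code values as described in the talk.
SEX_CODES = {
    0: "unknown",
    1: "male",
    2: "female",
    9: "lawful person (corporation, organization)",
}

def decode_sex(code: int) -> str:
    """Translate a code to its meaning; unknown codes are an explicit error."""
    if code not in SEX_CODES:
        raise ValueError(f"not a valid sex code: {code}")
    return SEX_CODES[code]

def read_card_field(field: str) -> int:
    """Imitate the old convention: blank punched-card columns read as zeros."""
    return int(field.replace(" ", "0"))

# Hypothetical report rows: (name, sex code).
customers = [("Acme Corp", 9), ("Alice", 2), ("Bob", 1), ("Pat", 0)]

# Sorting by code puts the 9s, the miscellaneous bucket, at the bottom.
report = sorted(customers, key=lambda row: row[1])
```

An unpunched column therefore decodes as "unknown" without any special handling, which is the design reason the talk gives for using 0.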
Encoding schemes really were designed for the physical use of keypunch machines. A lot of things are like that. Have you heard this one? Why are the US railroads a certain width between the tracks? Because we followed the British pattern, the British pattern followed the Roman chariot, and the width of a Roman chariot came from the horses. So we've been following horses' asses as a tolerance for years. Next, the exceptional values should be explicit. Allow for expansion. Also, this is going to sound really dumb, but you'd be surprised: you should put a translation of the codes in the database somewhere, so somebody can get a machine-readable version of it. I wish I was making that up, but when I had my first eye surgery done in Los Angeles at Cedars-Sinai, a major hospital with a good reputation, blah blah blah, I went down to fill out the forms, and the clerk had a loose-leaf notebook with all the codes so she could punch medical codes and stuff into an IBM 3270 terminal. This was the 1980s, not the 1950s. In the 1980s they had no drop-down menus, no PCs, and everything was still on laminated pages, in a major hospital. OK, so I'm running
59:30
a little bit longer than I should have. Questions, comments, feedback? If you want to throw something, it has to be soft; that's all I ask. Anybody? (Applause)