Lies, damned lies and scans
Formal Metadata
Part Number | 34
Number of Parts | 79
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/19565 (DOI)
FrOSCon 2015, 34 / 79
Transcript: English (auto-generated)
00:08
Hello everybody, nice to be here. I hope everybody can hear me; if not, just tell me and we'll sort out the mic. I'm David, a computer scientist from Bonn.
00:20
So I actually come from pretty close to this conference. In real life I work at IVU on topics like release engineering, machine learning and data science. But this presentation is about something completely different. You know how, after IT security talks, there are always some devices you don't really want to use anymore.
00:47
So, a little disclaimer: if you have a special relationship with this copy machine, this presentation is not really for you. In this presentation we will do three things. First, we will look at one of the most common and most dangerous bugs of recent years.
01:05
Second, we will try to make it understandable for techies and non-techies alike. And third, because I believe such a story could happen to anyone here, we will look at how a single person can deal with going up against a big company.
01:20
We will talk about how this developed and which mistakes I made, and we will derive take-home messages you can rely on in case you face such a situation. Hey, how are you doing? The presentation is a little bit like a novel, and as such it starts with a prologue for the conspiracy theorists among you.
01:41
The year is 2008. No joke yet, but people are laughing already. The primaries are taking place in the USA, with Barack Obama against Hillary Clinton. And as everywhere else, politics in the US is a comedy of intrigue.
02:01
There were anonymous emails that were supposed to help Hillary. They claimed that Obama was born in Kenya as a Kenyan citizen and was thus not eligible to be president, because a president needs to be a natural-born citizen of the United States. And the term "natural-born citizen" is actually not properly defined.
02:22
The Americans aren't really sure themselves, but there is a consensus: you need to be American, and you need to have been American at birth. And you can imagine that Barack Obama's middle name, Hussein, was not exactly optimal in that context. So, to settle the matter, Obama published his short-form birth certificate, shown on the left.
02:46
But if you're a good conspiracy theorist, you don't let facts get in your way. Almost immediately there were claims that the certificate was fake: a stamp was in the wrong place, and things like that.
03:04
On the right you see some bumper stickers; the lower ones explicitly ask for the birth certificate. So even though Obama won the primaries and the 2008 presidential election that followed, the controversy did not really stop.
03:21
The theory that Obama isn't eligible to be president is surprisingly popular in the United States. Its supporters are called the birther movement, and they want to prove that Obama is not a real American. This is actually true. Two and a half years later, when Obama was already president, the discussion still wasn't over.
03:42
As you can imagine, Obama was really pissed off. He published the long-form birth certificate, shown on the right now. As you can see, there is more information in it. You might think that would settle things. But again there were claims that the birth certificate was fake.
04:02
So let's look at it in a bit more detail. The left image is a strong magnification of the area marked in red in the right image. Look at the differences between the individual characters, between the one and the four, for example.
04:26
Yeah, you can see it: there are clear differences in gradient and sharpness. The six and the four are perfectly sharp at the pixel level and uniformly colored, while the one is slightly blurry and its color has some noise, just as you would naturally expect from a scan.
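The "uniform versus noisy" difference can be made concrete with a simple statistic: the spread of gray values among the dark ink pixels of a character. The sketch below is only an illustrative heuristic, not the analysis anyone actually ran on the certificate; the function name `ink_noise`, the `ink_threshold` of 128 and both sample patches are invented for the example, and pixels are assumed to be plain 0..255 grayscale values.

```python
from statistics import pstdev


def ink_noise(patch, ink_threshold=128):
    """Spread (population std. dev.) of the dark 'ink' pixels in a grayscale patch.

    patch: list of rows, each a list of gray values 0..255 (illustrative input).
    Pixels darker than ink_threshold are treated as belonging to the glyph.
    A value near 0 means perfectly uniform ink; scanned ink is normally noisier.
    """
    ink = [v for row in patch for v in row if v < ink_threshold]
    if len(ink) < 2:
        return 0.0  # too few ink pixels to say anything
    return float(pstdev(ink))


# Made-up 2x4 patches, not taken from the real document.
drawn = [[255, 20, 20, 255],
         [255, 20, 20, 255]]    # uniform ink, as if drawn in a paint program
scanned = [[250, 31, 18, 247],
           [252, 24, 37, 249]]  # slightly varying ink, as from a scanner

print(ink_noise(drawn))              # 0.0
print(round(ink_noise(scanned), 2))  # 7.16
```

A zero spread alone proves nothing; it only flags characters that look "too clean" compared with their neighbours on the same line.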
04:44
So how can there be such a difference within one and the same line? In this additional example you see similar differences between the two marked boxes. Again, one looks like you would expect from a scan, and the other one is sharp at the pixel level.
05:03
It looks as if somebody drew it with Microsoft Paint. Oh, look at this, this is pure gold. This part of the image is taken from the stamp, and it looks like there was a typo in the stamp. Yeah, sure, right. How likely is that? So of course people believe the birth certificate is forged when they see things like this.
05:24
On top of that, people concluded that the White House was too stupid to use Photoshop. So of course, for Obama that was a major PR fuck-up. And just so you see how huge this thing is: in a 2010 CNN poll, at least 11% of the US public believed Obama wasn't born in the USA.
05:49
An additional 16% believed he probably wasn't. That's more than a quarter of the US public, and even today the White House gets requests for comment.
06:00
Now we jump to 2013. On the 24th of July 2013, a company called me. They have two big Xerox WorkCentres. Xerox WorkCentres are big office copying machines of the kind pretty much every company has nowadays.
06:21
They are networked, they can scan, print, fax, send email and everything, and they cost as much as a small car. That's not your grandmother's printer; a single machine may have hundreds of users. On the slide you can see a blueprint. The black areas are not original; I blacked them out because otherwise I couldn't have used this blueprint.
06:42
There are three areas marked in yellow. They are standardized blocks that show the area of each room in square meters, and they are what we'll talk about now. The company says: hey David, if we scan the blueprint to a PDF, different numbers appear on it.
07:05
Can you take a look at that? That's me on the left, you know. I have to say I've always had a good relationship with them, and I've been doing IT service for the company for a long time. So obviously I thought they were pulling my leg.
07:23
Sure, right. A copier changing numbers. You hear that every day. That's what I thought. And they said: no, really, come and look at it. We need this machine, it has to work properly. Okay, so I went there and took a look, still thinking it was just a prank.
07:45
They have a Xerox WorkCentre 7535. Here are the three original image segments before the scan. At the top we can read 14.13 square meters, and then 21.11 square meters.
08:01
And the lower one says 17.42, right? So I put the blueprint into the WorkCentre and scanned it. And here's the same area after the scan. Quite funny, right?
08:22
All the rooms now have 14.13 square meters. No way, that's impossible, this isn't happening. I still thought it was a prank. And just to be clear, since I've had to say this a lot of times: there's no OCR involved. The numbers are replaced at the pixel level.
08:43
The company has another WorkCentre, the 7556, which is larger and a bit faster. And there are many, many more models like these; it's a huge family of machines. On the small WorkCentre, the same numbers always came out, as you see. On the big one, every scan produced different numbers.
09:04
It's bigger, so there's more CPU power, I guess. And look at those numbers. Take area two, the row in the middle, for example. First we get 14.13 square meters, and on the next scan we get 21.11, and the latter would have been the correct value.
09:22
So after all, there is a chance you get the correct numbers. If anyone needs a random number generator, you can ask Xerox; they have one. This isn't actually very funny, but I'm laughing too.
09:41
Also, the numbers look absolutely perfect in the layout. They only noticed it because a room that is obviously bigger had a smaller area than the smaller room next to it. I know the font is very small, but this is not some strange corner case. We have more examples, but this is the original one where we found it, and I wanted to show it to you.
10:07
Here's the next example. This is a cost register. You can see that a six turned into an eight. And the real joke is that later I put this image on my website, saying: look here, a six turned into an eight.
10:23
And then a reader emailed me: oh look, here's another one. Again, it looks perfectly clean in the layout; you can't really see it. This time we noticed the errors because the column was sorted in ascending order, and the altered number broke that order.
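Such a sorted column is exactly the kind of semantic check that can expose a substitution. A minimal sketch, assuming the values have already been read off the scan by a human; the helper `find_order_violations` and the numbers are invented for illustration:

```python
def find_order_violations(values):
    """Indices i where values[i] > values[i + 1], i.e. places where a column
    that should be sorted ascending suddenly drops."""
    return [i for i in range(len(values) - 1) if values[i] > values[i + 1]]


# Invented column: the second entry should read 165, but a 6 became an 8.
column = [150, 185, 172, 190]
print(find_order_violations(column))  # [1]

# A swap that keeps the order intact (165 -> 168) goes unnoticed.
print(find_order_violations([150, 168, 172, 190]))  # []
```

The second call shows the limitation: without a criterion that the wrong value actually violates, the error stays invisible.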
10:40
Consequently, if the wrong numbers don't look obviously nonsensical, you can't tell they are wrong. You always need some semantic criterion that makes the text implausible; otherwise you don't find out. As you can imagine, I was getting more and more annoyed. So I tried to reproduce it, and like a good computer scientist, I did so at night.
11:01
I made columns of numbers in different sizes and fonts and scanned them. Are you done? Cool. So I scanned them, and lo and behold, I could reproduce the error. The eights marked in yellow should be sixes.
11:21
Let's stop here for a moment. I promise to show you the entire interaction with Xerox, and I'll tell you what I felt at different points in time. Whenever something is very important if you have to fight against a big company, I'll point it out and show you evidence for it. But I'll tell you one thing up front.
11:41
In my view, what does not help is becoming offensive and ranting publicly, particularly on Twitter. I have no problem with Twitter at all. But if you want to achieve something, ranting just makes you a target and an idiot, and nobody is going to take you seriously.
12:03
They will say you don't really want a discussion, because a discussion won't fit into 140 characters. They can always say you just want the show. So if I don't want you to do these things, what do I want you to do? The best thing is not to go public initially.
12:23
So I wrote emails and I called them. We called Xerox support many times. We went through every single support level, up to the top level in Dublin, and nobody knew a single thing.
12:41
We wanted personal contact as well. The Xerox people who came on site didn't know anything either. They weren't really Xerox employees; they were working on commission for some distributor, you know. They tried to reproduce the errors, and they succeeded.
13:03
We're laughing about this here, but they were standing there with their pants down. Imagine somebody comes along and it looks like he's crushing your livelihood. Xerox itself, not the support company but Xerox proper, was a bit surprised, yes.
13:21
But they didn't really try to help us or the support firm. They acknowledged the problem, so to speak, and that was all. There were no signs of greater interest and no advice on how we could get rid of it. Somebody came from Xerox and gave us new firmware. It didn't help, but I thought: oh great, now we know the problem existed three years ago as well.
13:47
When there still hadn't been any sign of progress on Xerox's side after more than a week, I thought: right, that's enough. So I wrote a blog post in German and English about what I've just told you.
14:00
I also offered some test documents that readers could download, print and scan to see whether they were affected. That's how the story started to spread. I have to say my blog isn't huge, not at all: maybe 500 to 1,000 readers a day. That's not a lot, but it's not nothing either. Most of them are IT people, as I know from the emails I get.
14:22
At the bottom of my slides you can now see a line. It's a plot of the page views, and it will get wider and wider as I go on, showing how much attention I was getting at the time. So let's test that. You see the small bump there? That's a peak of 3,000 hits per hour.
14:42
The numbers come from Google Analytics. I've been told you're supposed to multiply them by two or three, but never mind, you get the idea. On the 2nd and 3rd of August, the story spread on some tech blogs. The peak you see comes from Fefe's blog; the Germans in the audience probably know him.
15:01
The story spreads, and I get more and more email from people who are affected. The funny thing is that I also get loads of confirmations for Xerox WorkCentre models I'd never heard of. As I told you, it's a whole product family, so I start to think this might be something huge.
15:22
Lesson learned: it was good to provide the test documents right away. Otherwise people wouldn't have been able to reproduce the problem themselves, and the story wouldn't have spread that much. On the 4th of August, the story spreads worldwide on tech portals. In the picture is Hacker News; you probably know it.
15:40
I'm getting hundreds of emails from technically knowledgeable people. I start spending a lot of time sorting through the email I'm getting, and only that allows me to keep following the story down to the bottom of the rabbit hole. I don't get to sleep anymore either, because I start getting lots of calls from US reporters who apparently don't know that time zones exist.
16:08
Lesson learned: write in multiple languages. English is important, of course, for international readers, but so is the home language of whatever company you're dealing with. You may know that Xerox is so widespread in the US that there's actually a verb for it.
16:22
It's "to xerox". And whenever something gets that big in the tech world, what comes next? Mass media. That's where it starts to get huge. This is Der Spiegel, the largest German news portal. I'll click through a few now; it's not an exhaustive list.
16:41
Not at all. There were thousands of articles worldwide, and I'm just going through them in whatever order suits my talk; the order says nothing about when they were published. As a side note, a German journalist told me he wanted to get it into the Tagesschau, one of the biggest German TV news programs. And they told him: yeah, yeah, this is really cool, but we'd want it to happen when you photocopy, not just when you scan.
17:08
Maybe someone should tell them that if you print a scan, you have a photocopy. Never mind. Lesson learned: don't make it bigger than it is just because you want attention.
17:22
The Economist: now it gets really serious, and also expensive in terms of PR for the company. You can also see where I stole the title of my talk from. ABC News, even more expensive. BBC, CNBC: suddenly it was everywhere.
17:40
Believe me, this feels completely surreal. And if you shit a few bricks in such a situation, I won't blame you. I found myself getting up several nights just to proofread my own articles and make sure I had described everything correctly. I wanted to make sure I wouldn't get sued for millions right at the start of my professional life.
18:04
At least that's what I thought; we'll get to that later. This is Business Week, a popular business magazine. Up to this point there was still no reaction from Xerox, and if you react that slowly, things get pretty uncomfortable.
18:21
Here comes a quote: "On the scale of things too horrible to contemplate, a document-altering scanner is right up there with flesh-eating bacteria." That's an original quote from Business Week, by Peter Coy. He's an editor there, and we'll come back to him a few more times in this presentation.
18:45
By now my blog was up to 100,000 readers a day, and still there was no reaction from Xerox. In the meantime, with the help of my readers, I had managed to show what was actually happening. I'll tell you about that in a minute.
19:01
This calls for a short detour into image compression. But first, a small remark. As this thing exploded, I initially used some non-standard technical terms in my descriptions; for example, I said "pattern" instead of "symbol". Afterwards I got some emails that criticized and corrected me, which I'm thankful for.
19:22
However, if this happens to you, don't be discouraged by mistakes in the details. At the time, I had to deal with Xerox. I had to write my articles in German and English. I had to make my web server storm-proof, you bet. I had to deal with the press. And on top of that, I had to understand what was happening at the image compression level.
19:44
And just so you know, at the time I had no particular academic interest in image compression; I'm more the machine learning and data science kind of guy. So that's a lot. Normal management means not letting anything catch fire. Crisis handling is completely different.
20:01
There are lots of fires already burning, and you're running back and forth between them just to make sure none of them gets out of control. Stay at one fire too long and you're done for. So, lesson learned: make sure you understand what's going on, but don't lose yourself in the details.
20:22
Now for the image compression. This is a test image I photographed. Both the photo and the text are part of the image, so it contains different kinds of image data. As you know, transmitting and storing data costs time, money and storage space, so you don't want to transfer images uncompressed. That's why there are lots of compression algorithms for images.
20:44
Here are two parts of the test image, one from the photo and one from the text, both strongly enlarged, so we can see at the pixel level what can go wrong with different compression algorithms. There are lossless compression methods, where no information is lost.
21:01
Or, if you want the file even smaller, there are lossy codecs. This is the popular GIF format. GIF is lossless, but it only supports 256 colors, so the visible loss of information comes from the color reduction. GIF is good for graphics with few colors.
21:23
Sharp edges are preserved well, but as you can see, it's less suited for photos. Next we have JPEG, which is lossy. It slices the original picture into 8x8 pixel blocks and approximates each block with cosine waves. I won't bother you with exactly how this is done mathematically.
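For readers who do want a glimpse of the maths: the cosine-wave approximation is a two-dimensional discrete cosine transform (DCT-II) of each 8x8 block, after which JPEG quantizes the coefficients, and that quantization is where information is lost. The sketch below computes only the orthonormal DCT of a single block in plain Python; level shifting, quantization and entropy coding are left out, so it illustrates the idea rather than implementing a JPEG encoder. The helper names and the two test blocks are made up for the example.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def count_significant(coeffs, eps=1e-6):
    """Number of coefficients that are not (numerically) zero."""
    return sum(1 for row in coeffs for c in row if abs(c) > eps)


# Flat gray block (like a smooth patch of a photo): one coefficient carries everything.
flat = [[100] * N for _ in range(N)]
print(round(dct_2d(flat)[0][0], 6))     # 800.0
print(count_significant(dct_2d(flat)))  # 1

# Hard black/white edge (like the stroke of a letter): the energy spreads
# over several coefficients, so coarse quantization visibly damages it.
edge = [[0 if y < N // 2 else 255 for y in range(N)] for _ in range(N)]
print(count_significant(dct_2d(edge)))
```

This matches the picture on the slide: smooth photo regions need few coefficients, while text edges need many, which is why JPEG does well on photos and poorly on text.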
21:41
As you can see, this is really good for photos but bad for text and sharp edges. So different compression algorithms are good for different kinds of images. A more advanced compression strategy involves slicing the original image into several sub-images, which we call symbols. If you know what kind of image each symbol contains, you can use a suitable compression algorithm
22:02
for each of these symbols and get very nice images. The areas that don't belong to any symbol don't even need to be saved. You'll probably agree this is quite an improvement in compression. But you can use the slice-into-symbols approach in an even more advanced way: you can treat every letter as a symbol.
22:22
This is actually done, I didn't make it up. It's what typically happens when an image is compressed in the JBIG2 format, an image compression standard that is especially suited for scanned text. Next, the encoder looks for symbols that are very similar to each other.
22:44
Like the symbols I marked here: they are all small e's, which is why they are similar, and only very few of their pixels differ. This step is called pattern matching. You end up with groups of similar symbols.
23:00
For every group, only one representative symbol is saved, and it is used throughout the image in place of the other group members. Only one of those e's is actually saved and takes up storage space; all the other e's are replaced by the saved one. You save lots of data, and image quality barely suffers.
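The grouping-and-substitution idea fits in a few lines. This is a toy model, not the JBIG2 codec and certainly not Xerox's implementation. Assumptions made for the sketch: symbols are already cut out and all have the same size, bitmaps are tuples of strings with `#` for black, similarity is the number of differing pixels, and `max_diff` is a hypothetical tolerance parameter.

```python
def hamming(a, b):
    """Count differing pixels between two bitmaps of identical size."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def encode(symbols, max_diff):
    """Toy pattern-matching encoder (NOT real JBIG2).

    Each symbol is compared with the dictionary built so far. If some entry
    differs in at most max_diff pixels, only its index is stored; otherwise the
    symbol becomes a new dictionary entry. Returns (dictionary, indices).
    """
    dictionary, indices = [], []
    for sym in symbols:
        for i, rep in enumerate(dictionary):
            if hamming(sym, rep) <= max_diff:
                indices.append(i)
                break
        else:  # no close enough match found
            dictionary.append(sym)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    """Every occurrence is drawn with its group's representative."""
    return [dictionary[i] for i in indices]


# Hypothetical 3x5 glyphs ('#' = black). They differ in a single pixel.
EIGHT = ("###",
         "#.#",
         "###",
         "#.#",
         "###")
SIX = ("###",
       "#..",
       "###",
       "#.#",
       "###")

page = [EIGHT, SIX]
print(decode(*encode(page, max_diff=0)) == page)            # True: exact
print(decode(*encode(page, max_diff=1)) == [EIGHT, EIGHT])  # True: the 6 became an 8
```

A real encoder uses more elaborate matching criteria; the point here is only that a loose enough tolerance lets a 6 and an 8 share one representative, and the decoder then prints a perfect-looking 8 where the original said 6.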
23:21
And here's the result: quite nice quality, using much less data. Looks good, right? Did you see it? Pattern matching thinks that the small e is similar to the l.
23:40
That's what happens when pattern matching is not accurate enough. Simply put, JBIG2 has two common modes of operation. It can encode lossily, in which case it works exactly like this. Or it can encode losslessly, in which case errors like these are corrected in an additional step before the image is saved.
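The difference between the two modes can be illustrated with a simple stand-in for the correction step. JBIG2 has its own mechanism for correcting matched symbols (refinement coding); the sketch below deliberately swaps in something much simpler: for each symbol, store the pixels where it differs from its representative and flip them back when decoding. The helper names `residual` and `apply_residual` and the glyphs are hypothetical.

```python
def residual(sym, rep):
    """(row, col) positions where a symbol differs from its representative."""
    return [(r, c)
            for r, (row_s, row_r) in enumerate(zip(sym, rep))
            for c, (ps, pr) in enumerate(zip(row_s, row_r))
            if ps != pr]


def apply_residual(rep, diffs):
    """Flip the listed pixels of the representative to restore the symbol."""
    rows = [list(row) for row in rep]
    for r, c in diffs:
        rows[r][c] = "." if rows[r][c] == "#" else "#"
    return tuple("".join(row) for row in rows)


# Hypothetical 3x5 glyphs ('#' = black pixel) that differ in one pixel.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")

# Lossy: the 6 would simply be drawn as its representative, the 8.
# Lossless stand-in: additionally store the differing pixels and flip them back.
diffs = residual(SIX, EIGHT)
print(diffs)                                # [(1, 2)]
print(apply_residual(EIGHT, diffs) == SIX)  # True
```

The extra data per symbol is small, which is why leaving out the correction step saves so little and costs so much.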
24:02
It seems Xerox used a pattern matching approach and accidentally left out the correction step. So, did you spot those as well? These are dangerous mistakes. Normal compression artifacts are not that problematic: they produce unreadable text, and you can see that something is wrong.
24:23
But here you have perfect-looking, neatly laid-out letters that are actually incorrect. You really have to read them to notice the errors. And even then you usually don't, because unless they make the whole document implausible at first glance, you just don't catch them, like in the blueprint.
24:44
Also, I don't know about you, but I don't usually proofread my scans after I make them. So a politician who has to put this in a positive light would say: use such a Xerox machine to scan the medicine doses of a retirement home.
25:00
And there is a good chance that you relieve the pension office of paying grandma's monthly income. Is anybody here from Berlin? No? Otherwise we could have asked which machine they used to copy the blueprints of the airport.
25:22
So: airports, medicine, elections. As big as those are, that's all small stuff. If such scans were actually used as evidence in court, that's where it gets really interesting. If you sue me with such a Xerox scan, I would just say, you know, that's false.
25:41
And you can't prove me wrong. You can't even prove anymore that a given spot in the scan comes from the corresponding spot in the original paper document. The legal value is zero, and these are business appliances. There are hundreds of thousands of such machines. Each of them has maybe hundreds of users, and even more people get handed documents produced by the machine.
26:04
For example, a big company called me. They scan all incoming mail automatically and from that point on only use the scan. So if they use such machines, good luck. We'll get back to the implications later, but let's go on with the story.
26:20
It's now the 5th of August, three days after the first impact, and finally there's a sign of life from Xerox. The PR department of Xerox Germany calls me, and it was obvious they couldn't really do anything without the Americans. They thought it was just a prank, and I said, no, it's no joke.
26:42
We agreed to stay in touch. The day after that, the 6th of August, was the day things really got going. In the morning a reader sent me this screenshot of his Xerox machine's settings.
27:00
It talks about character substitution. There are three PDF compression modes: normal, higher and high. Very PR-friendly names. Normal is the mode that compresses most strongly. The reader says that in normal the error is there, and in the two other modes it's not.
27:25
As far as I could tell at that point, that was correct. More on that later. I promised to tell you how I felt during all this, and to be honest, I was a bit afraid. I thought I'd be portrayed as the idiot who didn't read the manual.
27:44
Up to then there had been no official statement from Xerox, and I was tipped off that Xerox was about to write something along those lines. So, lesson learned: that's the internal view. The outside view is: so what's the deal, David? Such a problem must never happen, not even if you know about it.
28:02
But from the inside, the world looks different in such a situation. So if such a story ever happens to you, you're prepared now: if you stay calm from the beginning and don't lash out, you can simply ask calmly in public, why didn't support tell me that two weeks ago?
28:21
So I went on the offensive preemptively. I presented the screenshot on my blog as a possible workaround and recommended setting the compression to high. I asked why support couldn't have told me that. I also criticized that the setting was called normal. And of course all the consequences remain.
28:43
I wanted to give the story my own spin, because Xerox was about to get on my back. And now it gets exciting. That same afternoon there's a conference call with Rick Dastin, corporate vice president of Xerox, and Francis Tse, one of the chief engineers.
29:05
And guys, that's really something. You see, at Xerox the boss does support himself. Rick Dastin is the first person I talk to who actually works at Xerox, and he confirms that character substitution was indeed known at Xerox.
29:23
So if you have a problem, call support, talk to them for two weeks and they can't tell you anything, just ask: may I please talk to Mr. Dastin? They confirm that pattern matching is responsible, and they also confirm it is only done in normal mode.
29:42
So we agree that, first, support fucked up. And second, normal was probably not an optimal name for the setting. I recommended "experimental". Right now it's a lot of fun here, but at the time I was quite scared, believe me.
30:00
And then Xerox gave me a crystal-clear RTFM, short for "read the fine manual". First, they told me, normal mode isn't the factory default. In other words: dear customers, you are all stupid, why did you change the setting? And second, the manual also says that letter replacement can occur,
30:24
so you are all doubly stupid. Well, that's only half the truth. For the customer, of course, the factory setting is whatever setup the machine arrives in, and they don't usually get it from Xerox itself. They get the machines from third-party companies, which do consulting for them and set them up with whatever settings.
30:49
As for the manual, there actually is a mention of it: in one of the manuals, on page 107 of 228. And we're all old enough to know how many people read a manual hundreds of pages thick before using a copier.
31:06
I also argued that copiers must not be designed in a way that allows such errors on any setting. Nobody expects that. The answer was: oh yes, that may happen, and the market demands small document sizes.
31:22
Additionally, they confirmed I was right that you can't prove a particular document has no errors. So if I claim it's false, you can't prove me wrong. Overall, the atmosphere was quite nice: they didn't threaten me legally, they listened to me, and it was a long talk.
31:43
And then I really walked into a trap. Remember, I had never done anything on that scale, and Xerox, of course, had professionals. After a while I wondered why we could talk for so long, and in such a relaxed atmosphere, even though Dastin is corporate vice president of a huge company and probably had other things to do.
32:08
As it turned out, while I was on the phone with them, they released their press statement. Not stupid, because that's exactly when I couldn't react.
32:21
The title was "Always listening to our customers". Indeed. And they say that anyone who wants data integrity needs to use a compression setting of high or higher. Furthermore, they say the manual had the information: RTFM.
32:40
Lesson learned: always have someone else watch your adversary's website while you can't react. So I wrote an article on my blog about the conference call and reported what I just told you. I also wrote that I don't think they're off the hook. And now what? That could have been the end.
33:01
When a single guy fights a huge company, either the company shoots back and the guy caves in, or the public sides with the big company, or the public just loses interest. None of that happened. You can see the huge spike in traffic. My blog article was on the front page of Slashdot, and fortunately the press took my side.
33:22
Here is Heise, the most popular German IT portal. They noted that I presented the workaround even before Xerox did, which is nice. And here Der Spiegel stated: aha, so Xerox had known about the problem for years.
33:41
Now that's something. If you work in a company's PR department and something like that happens, that's really nice: you can forget about your holidays for the rest of the year. And it gets funnier and funnier. If you have ever been to the US: when it really starts getting ugly, they say the shit hits the fan.
34:02
The next day my blog article was on Reddit, and you can see the next spike in the plot. And what you see on Reddit is the nicest, most eloquent version of "the shit hits the fan" that I have ever seen: fecal matter will indeed hit the rotary air impeller.
34:20
Joking aside, what the guy writes is true. Every company that relies on digitized documents (and nowadays, honestly, who doesn't?) has a problem. If it's really bad, they can close up shop. To give an example, a state-run archive called me; they had created their entire archive with Xerox machines.
34:40
And what did they do next? Well, they threw away the originals. In theory, they would now have to look at every document and check it for plausibility, and even if they did, they couldn't be sure. So if you ever see the keeper of an archive staring out of the window, gazing at nothing, now you know why.
35:00
The internet jokes are also very nice. This is 9GAG. Oh, you know we're in lecture hall 8? Maybe it was supposed to be hall 6, but then they scanned their blueprints.
35:20
And sometimes the really funny jokes come from the protagonists themselves. If you are corporate vice president at Xerox and have to give interviews on the same topic all day, something is bound to slip out. You don't need to read this; I'll read it for you. Speaking to the BBC, Dastin wanted to put the issue into perspective and said, well, it's not really that bad.
35:45
Yes, the normal compression mode can create errors, but nobody really uses it. For example, the military and oil rigs and such. So what's the worst that can happen, right? As you probably know, problems on oil rigs are not looked upon very kindly in the US nowadays.
36:05
So perhaps this is the right moment to say again: a bit of a laugh is OK, but imagine yourselves in Dastin's shoes. Imagine you really have to talk about this in phone interviews for something like 14 hours. It's only human to let something slip.
36:21
Also, he told me afterwards that he had been quoted out of context, and I have no reason not to believe him. Still, I thought it was funny. Here's a tech portal that is relieved that cat images seem not to be affected. As you can see, they're not quite sure: "reportedly". And here's a new press release from Xerox.
36:44
Under public pressure they say: oh, maybe, yes, perhaps we'll release a patch that disables the pattern matching. They never admitted a mistake or a problem, though. Understandably, they have to cover themselves. But if you wait that long, even announcing a patch doesn't help you.
37:03
Here's a newspaper that put a Xerox-style error in its headline on purpose. But now let's go back to the Xerox statement. It contains an important and crystal-clear message: you will not see character replacement if you set compression to at least higher and use at least 200 dpi.
37:26
They also published glossy documents saying that pattern matching is only done in the normal compression mode and not in the higher ones. But I was quite sure I had seen the problem in the higher modes, and users had told me so too.
37:41
Unfortunately, I hadn't managed to reproduce it on the devices I had access to. So I didn't want to put out a mere rumor, because if it did happen in the other modes, everyone would be affected, Xerox's communications would have been misleading, and they would have a much bigger, global problem.
38:02
It would be a disaster. But a friend of mine in Bonn, where I was living at the time, had a WorkCentre 7545 in his office. We went there and scanned my test numbers in mode higher, and even at 300 dpi. So we were actually quite generous.
38:23
Now look at this: the numbers marked in yellow have errors. I only marked some of them; I don't know if I got them all, but you can still see how frequently it occurs. I repeat: this was in mode higher at 300 dpi.
38:45
Lesson learned: when you find a bug, you're unlikely to see its full extent at the beginning. Now let's take the blue rectangle and have a closer look. Here you see some groups of digits marked in red.
39:01
The digits within the red rectangles are absolutely identical at the pixel level. That's very unlikely: in a genuine scan, every instance of a character always looks a bit different. Lots of digits that are identical pixel by pixel are a clear sign of pattern matching. So despite what Xerox told us, pattern matching was done here.
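The reasoning above (genuine scans never produce pixel-identical glyphs, so exact duplicates betray pattern matching) can be turned into a simple check. This is a hedged sketch, not the reader's visualization tool mentioned next: it assumes the digit bitmaps have already been cropped to the same size and aligned, and `identical_glyph_groups` and the sample data are hypothetical.

```python
def identical_glyph_groups(glyphs):
    """Return lists of indices whose glyph bitmaps are identical pixel for pixel.
    Groups are ordered by first occurrence."""
    groups = {}
    for i, glyph in enumerate(glyphs):
        key = tuple(tuple(row) for row in glyph)  # hashable fingerprint of the bitmap
        groups.setdefault(key, []).append(i)
    return [idx for idx in groups.values() if len(idx) > 1]


# Hypothetical cropped bitmaps of three occurrences of the same digit
scanned_digits = [
    [[0, 1, 1], [0, 0, 1], [1, 1, 1]],
    [[0, 1, 1], [0, 1, 1], [1, 1, 1]],  # natural scan noise: one pixel differs
    [[0, 1, 1], [0, 0, 1], [1, 1, 1]],  # exact copy of glyph 0
]
print(identical_glyph_groups(scanned_digits))  # [[0, 2]]
```

A non-empty result on a genuine scan is a strong hint, not proof, that a representative symbol was reused; digitally rendered text, by contrast, legitimately contains identical glyphs.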
39:23
One of my readers built a nice visualization that marks identical digits in red when I hover over them with the mouse. Let's see if I can get it to run. Yes, there it is. Now I hover, and the red ones are exactly the same.
39:42
This is the usual frequency of the 2, and you see there are several versions of the 2. So that's how it works, and you can see how many there are. So at this point it is clear that hundreds of thousands of large business devices worldwide are affected in their factory default settings.
40:08
And by publishing something like that, you can really damage a big company. So I did not want to publish it without at least trying to talk to them first. I'm sorry? Yeah, yeah.
40:20
I'll come to earning money from this later. It's a really interesting point; don't underestimate it. To tell you again how I felt: at this time there was lots of pressure. I wanted to make sure I didn't make any mistake and couldn't be sued for millions in lost shareholder value.
40:40
So I recorded the whole process of producing wrong digits on video and uploaded it publicly to YouTube. I sent a link to Francis, the principal engineer I mentioned before. They were a bit shocked, you know, and Francis told me on the phone that I had done everything right in my video.
41:04
Xerox was cooperative, but they wanted me to wait until they could reproduce the error themselves. However, I remembered feeling a little bit fucked over after the last conference call with them. So I said, guys, you know what?
41:21
This time let's do it differently. Please be informed that I have a blog article on this written and ready to go. And as you can see, I've already uploaded the video too. Don't take this the wrong way, but this time I want you to keep me in the loop. And that's what we agreed on.
41:41
Then there was a lot of phoning back and forth, and because of the time zones I literally spent the night in the office with nothing but some biscuits. Finally, Francis calls me and says: yep, we've reproduced it. Character replacement with factory defaults. Everyone was a bit shocked. And you know what they told me later?
42:01
The code of the compression kernel was eight years old at the time. This bug had been in the wild since 2006. Given that, we were all a bit baffled.
42:20
Next I said: have a look at my article and make sure I'm legally safe, so I can push it out. This bug is really dangerous and I don't want to wait any longer. And they did. They even let me publish before they published anything themselves. That's why we shouldn't hate on them.
42:42
Lesson learned: negotiate at the right moment. Right afterwards Xerox put out its own statement. They retracted their previous communications, even said thank you, and wrote that they would now have to investigate how big the issue is. From then on their press statements were really nice, and the whole climate was very constructive.
43:04
In the press, because of all the flip-flopping, the whole thing becomes more and more surreal. This is Slashdot again. Look at the headline. Apparently, to them it doesn't matter what Xerox says, only what they confirmed to me. Here again is Peter Coy from Businessweek.
43:21
And I've got one more for you: one more compression mode. On the 11th of August I could actually prove that it happens in the highest mode as well. So even people willing to create huge, high-quality PDFs couldn't escape it. To be fair, as far as I know, it doesn't happen if you scan to TIFF.
43:44
On the 12th of August, Xerox confirms the software bug and announces the patch once again. In the middle of the night, my time, Dastin and Tse called me on my mobile to be the first to tell me that they had found the bug and would roll out new software to all devices.
44:02
This shows how nice the atmosphere had become. That's Xerox's patch download site. Here we can see for the first time how many devices are affected. Look at all the Xs: those are device families.
44:21
It took until the 22nd of August for the first patches to be released. If you think that's long, I disagree. I do the release engineering at my employer, IVU, and I can tell you that testing and rolling out a new software release for such a large, long-lived device family is really, really challenging.
44:40
So they worked quite fast. Over the next days the press reported again. For example, the German computer magazine c't ran a report and called the whole thing "Scannergate". And here Peter Coy puts the boot in. This may sound sarcastic, but Peter is completely right.
45:03
Eight years' worth of scanned documents across tens of thousands of enterprises worldwide can contain these errors, and the damage is permanent. To grasp the full monstrosity of this, reflect on the time we're living in: right now, as we speak, our society is transitioning from paper to digital.
45:27
And the translators between the two worlds are devices like Xerox WorkCentres. We'll have to live with this for a long time. Computer science is slowly growing out of its infancy, and as you can see, computer scientists have a social responsibility of their own.
45:44
In fact, way more than they think. And here comes the most important point of this talk. I told you Xerox sold most of its devices through third parties. They told me they can't even compile a complete list of their customers to notify them.
46:02
We can't know whether that's true. But in any case, there's no reason to believe the patches have reached many devices at all. So please spread the word. For example, a few weeks ago I visited Boston, where I know some folks working at MIT, and I was told that MIT hasn't patched its device.
46:23
So now that I've finally given this talk in English, I hope it goes viral enough that MIT gets the message too. Perhaps you can all help. Besides all the lessons learned, there's one I haven't told you yet. I always get incredulous looks when I tell people I didn't earn a single cent from Xerox.
46:45
A manager even told me I was an idiot. So let me give you the information, and you can decide for yourself how much of an idiot I am. Two things. First, it's really hard to earn money with something like this. A company of Xerox's size gets threatened by some crackpot every single day.
47:03
Without proof, nobody is going to take you seriously, and if you provide proof, that usually leads straight to the bug being fixed. So, no money again. And second, corporations don't have friends.
47:22
If I had asked for money, it would have come to light and looked sleazy, regardless of whether I got the money or not. That would have shut me down. And maybe most importantly, if I had been paid by Xerox, I would never have had a negotiating position that allowed me to actually demand a solution or ask them to proofread my articles.
47:43
Last but not least, lots and lots of people all across the world helped me, and they didn't ask for compensation either. I would do it the same way again. But at the end of the day, everybody has to decide for themselves. If you want to do it differently, that's perfectly fine. But now you've been here, and you know in advance that you might weaken your negotiating position.
48:07
These are all the lessons learned. I'm not going to go through them again, but they're in the slides if you download them. And in our novel, we now jump to the present, more precisely to March of this year.
48:20
Again, you don't need to read that; I'll explain it. This is the website of the German Federal Office for Information Security, the BSI. They are in charge of IT security for the German government. As a consequence of the Xerox saga, this March they declared all scans made with pattern matching and related technology to be not legally safe.
48:46
This includes all JBIG2 scans; they even banned the lossless version of JBIG2. The ban applies regardless of the scanner's manufacturer. In other words, it affects all manufacturers.
49:03
So this is even bigger than it sounds. As far as I know, all authorities in Germany have to follow this federal office's scanning guidelines, as do lots of enterprises involved in e-government. And it sounds like they'll have to rescan. So, in a sense, we're creating jobs here.
49:20
A similar Swiss institution made the same decision, so the Germans are not the only ones. So, okay, what remains? Barack Obama's birth certificate. Here it is again. Shortly after the Xerox affair, journalists from Reality Check in the US (you see the link)
49:41
asked me whether the Xerox bug might have caused the phenomena observed in the birth certificate. They had already done some detective work. For instance, the Obamas had published their tax returns right before the birth certificate, and those had been scanned with a Xerox WorkCentre 7655.
50:01
I wonder whether I get those model numbers right every time. They asked me if I could ask Xerox, because I, you know, now had connections. Understandably, Xerox said no, we really have other things on our minds right now. So at the end of 2014 I took another look at the birth certificate PDF.
50:25
Have a look at this: the PDF contains exact duplicates of characters, which was the telltale sign of uncorrected pattern matching in the Xerox saga. And if you look at the conspiracy theorists' web pages, they also talk about duplicate characters.
50:43
But they thought the characters had been copied on purpose to forge the certificate. Here, for example, are two boxes that look exactly the same. Make up your own mind, but I think this tinfoil-hat theory might hereby be laid to rest.
51:02
All I have left to say is thank you for spending this hour with me. Here you'll find a link to the Xerox saga. Please spread the word. Later I'll publish my slides on my web page as well. And last but not least, congratulations, FrOSCon. It's amazing what you've done in these 10 years. Thank you all.
51:31
Any questions? Yeah. When you first showed the birth certificate, you said there's, for example, a 1 that looks a bit blurred.
51:41
Yeah. Let me go back to that slide. You mean the image with the 1 and the 4? Yes, it was blurred. That's because they tried to separate the characters from the background.
52:06
The characters they isolated from the background are perfectly sharp, in a separate layer that is then compressed with JBIG2. But the 1 they didn't manage to get out of the background; it's still part of the background.
52:21
That's why it's blurred. That's how the technique works, so it's actually what you'd expect. Any other questions?
52:41
None? Then thanks again.
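The answer about the blurred 1 describes separating the page into a sharp text layer (compressed with JBIG2) and a background layer holding everything that wasn't extracted. A minimal sketch of that split, assuming a single global darkness threshold (the value `100` and the name `split_layers` are hypothetical; real PDF producers use far more sophisticated segmentation):

```python
def split_layers(gray, threshold=100):
    """Split a grayscale image (0 = black, 255 = white) into a bilevel
    foreground mask and a background image with the masked pixels set to white."""
    mask = [[1 if value < threshold else 0 for value in row] for row in gray]
    background = [
        [255 if is_fg else value for value, is_fg in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]
    return mask, background


# Dark stroke pixels (20, 30) end up in the sharp foreground mask; a
# fainter stroke pixel (130) stays behind in the background layer.
gray = [[20, 200], [130, 30]]
mask, background = split_layers(gray)
print(mask)        # [[1, 0], [0, 1]]
print(background)  # [[255, 200], [130, 255]]
```

A character that misses the cut, like the 1 in the birth certificate, remains in the background and is rendered with whatever fidelity that layer has, which is consistent with it looking blurred next to its sharp neighbors.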