Image processing with Mono.Simd
Formal Metadata
Number of Parts | 97
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/45711 (DOI)
FOSDEM 2010, 64 / 97
Transcript: English (auto-generated)
00:01
Okay. So yeah, I'm Stéphane. We'll talk about image manipulation using SIMD, using Mono.Simd. I'm doing this talk because the subject is very interesting, and because processing images using SIMD is quite an obvious fit,
00:21
but Mono.Simd was done in something like 2007, and I figured out a few months ago that there was no application at all, or no big application at all, leveraging it. I had an issue from the Debian packager asking me,
00:41
"What? F-Spot is using Mono.Simd?" That's the only application I know of, and basically I wasn't checking for the right version of Mono inside F-Spot, and nobody noticed.
01:09
So no one noticed that, in fact, Mono.Simd was great but was not used at all. So this talk is about using Mono.Simd in your application,
01:24
for example, for doing image manipulation, but you can do a lot more than this. So, Mono.Simd was developed in 2007, I think (Miguel is not there to correct me), and it was done by Rodrigo Kumpera. It was 2007?
01:44
I think so. Yeah, I think so. It was for PDC in 2007. So, basically, it's a set of classes, and methods of those classes, that are, or that could be, if the JIT, so the runtime, permits it,
02:03
accelerated using the SSE engine of your Intel platform. So basically, for those who don't know about it, SSE allows you to process 128 bits of information at a time,
02:22
instead of just working on one float or one byte or whatever you want. Is it available? Yeah, as the technology is quite old, it's probably available right now on the machine you're using, if you're using Mono.
02:43
It was packaged with Mono 2.2. I'd say, if you can, you'd better use 2.4, because there are some issues in 2.2 with SIMD. And is it worth it? We'll see that in a second. I will show you how much we can enhance our application
03:08
and our image manipulation using SIMD. So, we'll take the simplest case possible, we'll compose an image from two images,
03:21
and basically, we'll just take the two pixels and add them. We won't shift them at all, because I had to simplify the sample, because otherwise it would imply a lot of packing and unpacking. So, we will just add the different components of the pixels one by one,
03:47
and if it overflows, we don't care. We don't care about the result, it's just about showing something that works. So, if...
04:01
That's the code for it. Everybody can see it. So, basically, we are loading two Pixbufs, we're creating a destination, and then we are mixing both sources into the destination,
04:21
and saving it to a JPEG. So, the non-SIMD version of it... Is it okay if I make the font a bit smaller, so we can see a bit more? At the back, like this, okay.
04:45
Yeah, the bottom is not important. What's important is the loop. That's the version without any SIMD. So, basically, you take all the components of each pixel, and you add them, and you store them to your destination.
05:02
You have to do that for every line, and for every pixel of every line. All the rest is just boilerplate to make the loop work. So, that's the version using managed C# as you know it,
05:23
and leveraging Mono as you know it. You might think that doing it this way in C# on Mono would be 10 times slower than doing the same in C. It's not true.
05:41
It's just something like 10 or 15% less efficient than doing it in C. So, if you don't want to use SIMD but you want to do image processing, I'm not even sure it's worth doing it in C for a 10 or 15% gain.
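Editor's sketch (not the speaker's code): the scalar loop described above, written in Python only to show the arithmetic. The function name `add_images` and the parameters `rowstride` and `channels` are assumptions made for illustration; the talk's version is C# working on gdk-pixbuf pixel data.

```python
def add_images(src_a, src_b, width, height, rowstride, channels=4):
    """Add two images component by component, ignoring overflow."""
    dest = bytearray(len(src_a))
    for y in range(height):              # every line
        row = y * rowstride
        for x in range(width):           # every pixel of the line
            px = row + x * channels
            for c in range(channels):    # every component of the pixel
                # "& 0xFF" mimics a byte addition that silently wraps around
                dest[px + c] = (src_a[px + c] + src_b[px + c]) & 0xFF
    return dest


# Two tiny one-line "images" of two RGBA pixels each (made-up values).
a = bytes([10, 20, 30, 255, 200, 100, 50, 255])
b = bytes([5, 5, 5, 0, 100, 100, 100, 1])
result = add_images(a, b, width=2, height=1, rowstride=8)
print(list(result))
```

Components whose sum exceeds 255 simply wrap, which matches the "if it overflows, we don't care" simplification of the sample.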
06:00
So, we will compile that code: gmcs simple.cs. It leverages gdk-pixbuf, so we don't have to create our own memory representation of the image.
06:24
If you want to compile something with Mono.Simd, you just reference it from the command line, and we're using some unsafe context. We'll come back to that later, and it should compile.
06:42
Execute it. Here it is. It took... Is it the right one? It's really fast. But I tested it on the battery. So, yeah, it looks like a 1. And the original image is that one, that one,
07:05
and the output was that one. So, that's one of the originals. That's the result. That's the other original. So, the result looks like this. It doesn't do anything important.
07:21
It's not a known transformation or whatever. It just shows that we can add pixels and go ahead. So, what will it take to move that code to SIMD?
07:44
Take that sample. And up. There is another version of the same method, which I developed using SIMD. Here it is.
08:00
Yeah. So, what changed? Basically, it's still boilerplate. What changed? Instead of incrementing by one here, I'm iterating over four pixels at a time. Each pixel is four components.
08:21
It's four components because, on the very first line of the program, I make sure that I add an alpha channel to everything. So, it's easier for the sake of this tool to have every image with an alpha channel. So, each pixel is four components. Each component is eight bits.
08:42
And so, if I read four of them at a time, I get my register full. My register of 128 bits. So, I'm reading four at a time. How do I read them? Instead of storing my pixel component, I'm loading that into a Vector16b.
09:05
So, that's one of the types in Mono.Simd. Mono.Simd has vector types for bytes (Vector16b), for signed bytes, for shorts, for integers, for floats, for doubles, and for longs.
09:26
The length of the type defines the number of elements you can pack in a vector. Basically, a byte is eight bits, so you can pack 16 of it in a vector of 128 bits.
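Editor's sketch: the lane arithmetic just described, plus the "16 bytes at a time" traversal, in plain Python. Nothing here is accelerated and none of these names come from Mono.Simd; `lanes` and `add_in_chunks` are hypothetical helpers that only show how a 128-bit register divides up and how the loop step changes.

```python
REGISTER_BITS = 128  # width of an SSE register, as stated in the talk


def lanes(element_bits):
    """How many elements of the given size fit in one 128-bit vector."""
    return REGISTER_BITS // element_bits


def add_in_chunks(src_a, src_b):
    """Add two byte buffers 16 bytes (four RGBA pixels) per step."""
    chunk = lanes(8)
    if len(src_a) != len(src_b) or len(src_a) % chunk != 0:
        raise ValueError("buffers must match and be a multiple of 16 bytes")
    dest = bytearray(len(src_a))
    for offset in range(0, len(src_a), chunk):
        va = src_a[offset:offset + chunk]   # stands in for one vector load
        vb = src_b[offset:offset + chunk]
        # one wrapping add per lane; real SIMD does all 16 in one instruction
        dest[offset:offset + chunk] = bytes((x + y) & 0xFF for x, y in zip(va, vb))
    return dest


print(lanes(8), lanes(16), lanes(32), lanes(64))
```

With 8-bit components this gives 16 lanes, with 16-bit components 8 lanes, which is why a 16-bit-per-component image would use a vector of eight unsigned shorts.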
09:47
That's pretty common in image processing, having eight bits per pixel component. We could port that to a 16 bit per pixel component,
10:03
and we would just, at that point, use a Vector8us (eight unsigned shorts) instead of a Vector16b (16 bytes). So, here we are. Those are structs. As we are in an unsafe context, we'll come back to that later,
10:24
but we can just cast it and use pointers like you did when you were young and doing C. You can just cast your byte array, because my pixel is a byte pointer,
10:43
to a vector and get the content of it. Those ones, that's the trick I use to load pixels from my pixel array and store them as a vector. I can just add them.
11:01
At runtime, as I'm running them on a recent Mono, as SIMD is enabled, as my laptop has SSE capability, and that operator plus on vectors is accelerated on my machine,
11:27
that line is automatically, at runtime, that addition is not done on the struct by adding every element one by one,
11:41
but that addition is done directly by the SSE unit of the processor. So, that's my out, and I'm using that store line to store it back to my array,
12:04
still using some casting magic. So, basically, at the time you add vectors, you can just add them or use any operator that is defined on SIMD,
12:21
or any method that is defined on SIMD, and if, by chance, that method is accelerated on your machine, the JIT will run the accelerated version of it. So, we recompile it.
12:42
GMCS. Okay, run it. Nice, it's slower. Excellent. It was doing fine an hour ago.
13:08
You will have to trust me on this. That sucks.
13:22
That's the old version. Can you run it with -O=-simd to make sure that it doesn't use it?
13:56
Sorry? I'm running without SIMD optimizations from the command line,
14:00
because maybe it's detecting that you don't have it. If your machine doesn't have SIMD, it will fall back to the software implementation. Yeah, I know. So, what I'm thinking is, why don't you try to see if that's what's happening? Yeah, I would just add the detection at runtime of whether the addition is available for that.
14:23
Do we have a non-verbal amount of write for this? Yes, on the command line. Which is? Dash uppercase O. Oh, yeah. Dash uppercase O equals minus simd: -O=-simd.
14:40
Like this? Yeah, but lowercase. Yeah, so basically what's happening is it's not using hardware. It's not detecting that you have it. It's easy. I'm using 2.4, and here I'm using...
15:03
Thank you again. Yeah. So, gmcs, Mono.Simd, package gtk-sharp-2.0, simple.cs.
15:22
Unsafe. Unsafe. Yeah. Which one is it? So, but it's on par with the other one, and it shouldn't be.
15:51
But anyway, the output is the same. That's important.
16:00
But I will show you a real result on a more complex case in a sec. That wasn't to show you how fast it was, but just to show you how easily you could do it. So, as Miguel said, we can check for this capability at runtime.
16:26
So, meaning that it was proven in this case that the fallback that was implemented in SIMD was less efficient than my simple addition using regular arrays.
16:50
So, if you want to avoid this and fall back in your code to one or the other, you can easily check if a method is accelerated at runtime.
17:05
So, basically, in the second sample, I'm just checking for acceleration on every... I would love to make this a bit smaller. Basically, that's another example we'll show you in a second.
17:20
And for every method I will use a bit later, I'm just printing accelerated or not accelerated. That's interpreted at runtime and not at compile time. So, you could decide at the time you write your code.
17:44
If I do have SSE4 instructions, I will do it that way. If I only have SSE3, I will do it that way. Otherwise, I will do it yet another way. Maybe by leveraging some other native library or whatever.
18:05
Or you can even fall back to the default implementation of Mono.Simd. But, as we just showed, it's basically quite slow. It does the thing it should, but not much more. So, basically, you ask the SIMD runtime, which is a static class, if a method is accelerated.
18:32
And basically, SIMD is mostly, except for those...
18:40
That's the multiply I was adding, but it's on the Vector4f, containing 4 floats. That's how I would have checked if the addition was available on that machine. And, as most of the methods of SIMD are implemented as extension methods,
19:03
that's the way you check for that. The code is a bit cryptic. I will post it on my blog on Monday or next week. No, it's not cryptic, but on that small number of columns, it's not easy to see.
19:22
So, basically, you can ask the runtime if a method is accelerated. If the method is accelerated, the runtime will accelerate it. Otherwise, it will fall back. But, as you can check at runtime if it's available, you can programmatically decide to pick one fallback or another.
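Editor's sketch: the decision described above (use one code path if a capability is present, another otherwise, and a plain fallback last), reduced to its control flow in Python. The capability names and the `choose_implementation` helper are hypothetical; in the talk the question is put to the Mono.Simd runtime class at run time, and this sketch makes no claim about that API.

```python
def add_scalar(a, b):
    """Plain per-element fallback: always available."""
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


def add_vector_stand_in(a, b):
    """Stand-in for an accelerated path; same result, different route."""
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


def choose_implementation(available, candidates, fallback):
    """Return the first candidate whose required capability is available.

    available  -- set of capability names reported at run time (hypothetical)
    candidates -- list of (required_capability, implementation), best first
    fallback   -- implementation used when nothing is accelerated
    """
    for required, implementation in candidates:
        if required in available:
            return implementation
    return fallback


candidates = [("sse4", add_vector_stand_in), ("sse3", add_vector_stand_in)]
add = choose_implementation({"sse3"}, candidates, add_scalar)
print(list(add(bytes([1, 2, 255]), bytes([1, 2, 1]))))
```

The point of doing the check once, up front, is that the hot loop then calls a single chosen function instead of testing capabilities per pixel.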
19:46
Yeah. Where are you? So, basically, if you want to do... That doesn't apply only to SIMD. But, basically, if you want to do some image manipulation on Mono or .NET,
20:08
you can't just use the image as a byte array or whatever it is. Because the bounds check of the array will slow the thing down.
20:23
So, most of the time, you just have to put the part of the code that matters in an unsafe context.
20:44
Saying that you're allowed to use that CLI construction using pointers, doing... No, that's in the other case. Doing conversion and cast that are unchecked by the runtime.
21:03
So, you have to do that if you want to do some efficient image manipulation on .NET. Following that, you have to either declare your whole method as unsafe
21:22
or just a block of it and pass unsafe to the command line or to the compiler. So, it goes. True. Let's move to a more complex... Five minutes. Okay.
21:44
I will just show you how I'm doing... So, some general advice. If you're doing SIMD processing on Mono, try to avoid computing in float.
22:02
For image manipulation. Even if there is some float operation using SIMD, we don't have any accelerated way of converting a short to a float and a float back to a short, except using .NET casting.
22:27
And that's quite expensive. If we were able, but that's not the case, to use the MMX CVT operation, but that's not possible because there is no MMX acceleration in Mono.Simd,
22:44
it would be possible. But it's not. If you just want some more precision, you just unpack your bytes to shorts or to integers, you do your computation using integer arithmetic, and you then repack it to a byte or to a byte vector
23:08
before putting it to your pixel or to your image. So, that's quite important. And if you do that, it will be accelerated,
23:22
but the casting will just kill the performance. Or the performance will be just on par with the native, non-accelerated implementation. So, that's one thing. I heard that SSE5 would have some operation accelerated
23:43
for converting float back to int. I've read something about it, but I didn't see anything. I don't have a deadline or whatever. Anyway, the machine you have right now doesn't have SSE5. So, for now, just avoid it.
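Editor's sketch: the integer route recommended above (unpack bytes to a wider integer type, compute, repack to bytes), shown in Python with made-up values. The helper names are hypothetical, and clamping to 0..255 on repack is an assumption of this sketch, not something stated in the talk.

```python
def unpack_to_wide(pixels):
    """Widen 8-bit components so intermediate results cannot overflow a byte."""
    return [int(p) for p in pixels]


def repack_to_bytes(values):
    """Narrow back to 8 bits, clamping to 0..255 (assumed saturating repack)."""
    return bytes(min(255, max(0, v)) for v in values)


def scale(pixels, numerator, denominator):
    """Multiply components by numerator/denominator using integers only."""
    wide = unpack_to_wide(pixels)
    scaled = [(v * numerator) // denominator for v in wide]
    return repack_to_bytes(scaled)


print(list(scale(bytes([0, 100, 200, 255]), 3, 2)))
```

A fraction such as 3/2 expressed as two integers gives the extra precision a float would, without any short-to-float conversion in the loop.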
24:04
So, the other example is the example I showed on my blog some weeks ago. And that's an application. That's just a port of the gdk-pixbuf saturate-and-pixelate operation.
24:31
Saturate and pixelate is a method of gdk-pixbuf with which, basically, you can either pixelate or desaturate. I only implemented the desaturate case.
24:42
And you don't have to desaturate 100% of it. You can just desaturate it a bit. So, basically, the math for this is you have your saturated image. You desaturate it completely.
25:01
So, desaturating is taking something like 80% of the green channel and 10 of the two other ones. The number will be shown a bit later. And you take the mean and the weighted mean depending if you want a fully saturated or fully unsaturated one.
25:24
So, you add both. It's quite hard to do it using SIMD. I will go fast because I'm almost out of time. Which one is it?
25:41
So, that's the one. I'm using a child multiplier. No, it's not 80.
26:05
It's 30% of red, 60% of green, and 10% of blue. And that makes a black-and-white image that looks great. So, basically what I'm doing here
26:23
is I'm working on a vector containing 8 unsigned short. And there is no horizontal add on... Basically, if you want to desaturate an image, you just multiply all the components by some magic numbers.
26:44
And then you add all the components together and you divide by 3. There is something nice if you are working on floats. You can do a horizontal add and it will add all the components together. But there is no horizontal add on Vector8us. So, you can use shuffle.
27:03
And shuffle is a way to mix the component of your vector. And what I'm doing, I'm shuffling twice. I'm doing two shuffled versions. So, I have all the variation of it.
27:20
And I can add it at the end. And have a vector containing the same value on the 3 main components. And that's it. That's how I'm doing the desaturation. And then, yeah.
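Editor's sketch: the shuffle-and-add idea described above, with Python lists standing in for vector lanes. It uses the 30/60/10 percent weights quoted in the talk; the function names, the rotation used as the "shuffle", the division by 100, and the `saturation_percent` parameter are assumptions of this sketch rather than the speaker's implementation.

```python
WEIGHTS = (30, 60, 10)  # percent of red, green, blue, as quoted in the talk


def rotate(lanes, n):
    """A simple shuffle: rotate the lanes left by n positions."""
    return lanes[n:] + lanes[:n]


def intensity_by_shuffle(r, g, b):
    """Weighted sum of r, g, b ending up in all three lanes, without a horizontal add."""
    weighted = [r * WEIGHTS[0], g * WEIGHTS[1], b * WEIGHTS[2]]
    first = rotate(weighted, 1)    # first shuffled version
    second = rotate(weighted, 2)   # second shuffled version
    summed = [x + y + z for x, y, z in zip(weighted, first, second)]
    # at most 255 * 100 = 25500 per lane, which fits in an unsigned 16-bit lane
    return [v // 100 for v in summed]


def desaturate(pixel, saturation_percent):
    """Blend a pixel with its grey intensity; 100 keeps it, 0 makes it grey."""
    r, g, b, a = pixel
    grey = intensity_by_shuffle(r, g, b)[0]

    def mix(c):
        return (c * saturation_percent + grey * (100 - saturation_percent)) // 100

    return (mix(r), mix(g), mix(b), a)  # alpha is left untouched


print(intensity_by_shuffle(100, 200, 50))
print(desaturate((100, 200, 50, 255), 50))
```

Each lane of the original vector and its two rotations holds a different one of the three weighted components, so adding the three vectors leaves the full sum in every lane.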
27:41
So, I'm shuffling and adding. So, I'm having the intensity. And then, what I'm doing... At the time, I should be...
28:03
Okay, I will post it on the blog a bit later. I can show you how it works. gmcs... oh, it's already compiled. Okay, desaturate.exe. SIMD. Okay, and that operation...
28:21
So, that's the check I told you about a bit earlier. I'm not doing any treatment of the check. I'm just printing it accelerated or not. Oh, yeah. So, basically, I'm checking... Yeah, I am.
28:40
I'm checking if the methods are accelerated. I'm not doing any treatment of it. Because I know they are on my machine. So, that's the time it took to make a desaturation of an image. Okay, here it is. It's a partial desaturation. And I can just show you how slow it is using the GDK one.
29:08
What's my argument? There's no argument. Yeah, it's no argument. Okay.
29:20
That's GDK. You're using it. You're using it every day. And it's not accelerated at all. So, that code, desaturate, provides an extension method to Gdk.Pixbuf. And that's really worth it.
29:40
It's seven times better. And that's very worth it. Yeah, so I'm out of time. I don't know if I will have time for questions. But one more thing. The code I showed you is oversimplified. Because I was always assuming I had an alpha channel.
30:03
I'm doing that on some special images that are a multiple of 16 on each side. So, I don't have to check for the remaining bits at the end. So, it's just for the matter of simplifying the loop.
30:21
So, a real implementation of it would take something like 20 lines more. But that wasn't necessarily needed for that speech. So, here it is.