The Future of UX with Kinect for Windows v2
Formal Metadata
Title | The Future of UX with Kinect for Windows v2
Series | NDC Oslo 2014 (talk 124 of 170)
Author | Carl Franklin
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, modify, reproduce, distribute, and make the work or content publicly available in unchanged or modified form for any legal, non-commercial purpose, provided that you credit the author/rights holder in the manner specified and pass the work or this content on, including in modified form, only under the terms of this license
Identifiers | 10.5446/50790 (DOI)
Publication year | 2014
Language | English
Transcript: English (auto-generated)
00:02
Many tools. It has some weird name that you've never heard before and we'll quickly forget. So it's a tool that you can use to run against a website that takes a parameter that maps to an ID in a database.
00:21
And on dotnetrocks.com, we used to publish URLs where you could take a show ID and pass it in as the primary key and all that stuff. And he said, that's a prime vector for a SQL injection attack. I do my homework, right? I mean, I wrote the back end and I do not just take
00:44
parameters just willy-nilly from input and stuff them into a SQL string. There's no way I do that. Everything's parameterized. I check to make sure that it's a number, all that stuff. So he says, basically with this tool, you just run it against that URL.
01:01
And it comes back and it does its thing. And it comes back in about five minutes and says, here are all your tables, here's all your data, here's your passwords and stuff. And I'm like, Troy Hunt, right? Scary man. So Richard and he have this conversation, I'm like, downloaded.
01:20
So that's why I got that malicious, potentially malicious software warning, because Windows doesn't like that. Maybe because it came in a JAR file, I don't know. But I ran it. And in 15 minutes, I was relieved to find that, nope, it couldn't get through. Yep, I actually did it right.
01:45
But still, Windows thinks it's a potentially harmful file, so I'll probably uninstall it. This is the, let's find it, it's probably right here if I refresh it.
02:03
Yeah, Havij, Havij? That's it. Don't need it anymore, so now I won't get that message, but that's what it was. All right, it's 4:20, you ready?
02:22
So I'm gonna start with a story. Story started about an hour ago, I was in the booth. I brought this Kinect prototype for the Kinect version 2 that I got from Microsoft. I'm an MVP for Kinect for Windows. So I was part of the early developer preview program.
02:42
Before they opened it up to non-MVPs, I got this last year. And so I'm coming to show you all how it works. So I was testing it out about an hour ago, plugged it in. Poof, my God, what was that?
03:02
120 volts, so there won't be a demo today, sorry. Okay, true story: it's a Japanese switching power supply, and they went to the extra effort to limit it to one input voltage, 120 or 220, instead of accepting both.
03:22
They decided, nope, they're gonna have two versions, one for Europeans and one for Americans. So I can't run it here, maybe because of my license, right? I don't know. Thank you, Microsoft, for ruining my demo. But it's not ruined. I have a lot of information to share.
03:41
I have some good videos to show that I've made myself of software that I've written that I'm also gonna share with you, so it's gonna be a good session. So first of all, the Kinect for Windows 2, this guy right here, is based on the Xbox sensor.
04:02
But this prototype here has this nasty breakout box that will not be the finished product. It has USB 3 on one side, power on the other side, and another connector on the other side to the breakout box from the power supply.
04:21
And the final version, of course, will look a lot nicer. They just announced pricing and availability. Did you notice this? 199 bucks, well, about 1,700 kroner. So it's currently cheaper in the Microsoft store than the first version.
04:42
First version's what, 299, $299 US dollars, and this is $199. And you can pre-order it now. There it is, MSDN blogs, pre-order your Kinect for Windows v2 sensor starting today.
05:02
And here's the actual pre-order page. And it's in the Microsoft store, you can order it now. So what they're gonna do is they're gonna ship these at the end of July, looks like, or sometime in July. And they will also ship with a beta version of the SDK.
05:21
And the SDK will continue to be developed. This is a slide deck that was under NDA from the early preview program. It's a PDF from a slide show that they showed us a long time ago. It's all since been released as public information, but
05:41
the PDF itself doesn't exist for the public. So I wanted to just go through it a little bit, and I'm gonna scroll rather quickly, but if you see anything that you're really interested in, we'll go page by page, give you a chance to say whoa and ask a question. I don't wanna gloss over anything.
06:00
Then again, we've got a lot of info to talk about. So this sensor has a camera that's 1080p. It does 30 frames a second. It has a depth/NIR stream as well.
06:22
And that feeds the skeleton, or the body, which is separate from the color. It has a very wide range, a much wider field of view, and it can recognize up to six bodies at once, unlike the previous one.
06:43
This is not very interesting, also not very interesting. Yeah, here you go. So skeleton is what we used to call the body. But I think we really need to talk about these different modes
07:02
here before we get into any of these features. Yeah, here we go. These are the data sources. Some of these are currently implemented, some are not. Audio is not; infrared, color, depth, and body index are; audio will be.
07:27
So infrared is the first sort of level. Like an infrared camera, you get that infrared stream. That's kind of cool. You get a color camera, 30 frames a second as I said. You get depth. Now depth is, if you've ever seen those posterized, weird looking,
07:43
you know, essentially monochrome images that basically show an outline of you, a depth impression of you. That's what is used by the SDK to build up
08:05
an image internally of where the different joints are in the body. And then that body is presented as a collection of joints, 25 joints: your head, your neck, your shoulders, and your arms, and your legs, and all that. And it maps those in real time.
08:22
And so as you move, you've seen the videos where a skeletal stick figure is drawn over the body. And so it can map you in three dimensions in real time at 30 frames a second. The audio thing I won't talk about too much, but the whole promise of the audio is because it has a microphone array.
08:42
It can tell you who's talking, which body is actually speaking when it's listening. Because it can see, right, and it knows that based on where in the stereo field
09:02
that signal is the strongest, therefore it came from that person. So that's kind of cool. So it can differentiate between this person's issuing a command and that person issuing a command, which is kind of neat. Not implemented yet, but it will be. Infrared, I don't have any experience with this,
09:24
but the way that you access the data in all of these streams is pretty much the same. But again, I don't have a lot of experience with this. The way that you do it is you grab the sensor and you open a reader.
09:43
And then there's a frame-arrived event. And this pretty much happens for all of the sources. And at that frame-arrived event, you get a frame and, in it, the data. So here's just a little C# code that shows you what it might look
10:04
like to handle a frame of infrared data. You acquire the frame and if it's not null, you copy it to an array and you can do what you want with the data. There you go.
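(The slide's code isn't captured in the transcript; a minimal sketch of the pattern he describes, assuming the Microsoft.Kinect SDK 2.0 API, might look like this.)

```csharp
using Microsoft.Kinect;

// Sketch: open the default sensor, open an infrared reader, and copy
// each arriving frame's data into a reusable ushort array.
KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

InfraredFrameReader reader = sensor.InfraredFrameSource.OpenReader();
ushort[] irData = new ushort[sensor.InfraredFrameSource.FrameDescription.LengthInPixels];

reader.FrameArrived += (s, e) =>
{
    using (InfraredFrame frame = e.FrameReference.AcquireFrame())
    {
        if (frame == null) return;   // frames can be dropped; always null-check
        frame.CopyFrameDataToArray(irData);
        // irData now holds one 16-bit intensity value per pixel
    }
};
```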
10:22
All right, so there's a COM implementation as well. Color, interesting. I've had a lot of experience with this. Being able to save the frame to a bitmap or a JPEG file and stay
10:45
out of the way asynchronously was a challenge, but I finally figured out how to do it. But it is easy enough to do. You get basically a raw buffer and you can turn that into a writable bitmap. Of course, you can bind that to an image.
11:01
So, you know, viewing that in real time in WPF anyway is very easy and trivial. Here's how to access that raw format data if you want to.
11:21
You know, there's lots of great samples that come with the SDK that show you how to do this. But again, it's the same idea. You get a reader, you acquire the frame in the event that handles the frame, and then you copy it to an array. You can then write it to a writable bitmap, which is then bound to an image source, which is shown.
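(The sample he's describing isn't in the transcript either; a hedged sketch of the color-to-WriteableBitmap pattern, using SDK 2.0 and WPF types, could be:)

```csharp
using Microsoft.Kinect;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;

// Sketch: request BGRA-converted color data each frame and push it into
// a WriteableBitmap that a WPF <Image> is bound to.
KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

FrameDescription desc = sensor.ColorFrameSource.CreateFrameDescription(ColorImageFormat.Bgra);
byte[] pixels = new byte[desc.Width * desc.Height * 4];   // 4 bytes per BGRA pixel
var bitmap = new WriteableBitmap(desc.Width, desc.Height, 96, 96, PixelFormats.Bgr32, null);

ColorFrameReader reader = sensor.ColorFrameSource.OpenReader();
reader.FrameArrived += (s, e) =>
{
    using (ColorFrame frame = e.FrameReference.AcquireFrame())
    {
        if (frame == null) return;
        // Whatever the raw format is (often YUY2), ask for a BGRA conversion
        frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);
        bitmap.WritePixels(new Int32Rect(0, 0, desc.Width, desc.Height),
                           pixels, desc.Width * 4, 0);
    }
};
```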
11:45
And that's exactly what they're doing right there, writing it to a writable bitmap. Yeah, so depth is, you know, two bytes per pixel. It's basically what is used to figure out where your person is
12:05
and where all the different joints are in your body. And this is the real thing here, the body tracking. Very, very cool. You get six bodies, whereas in the previous version, you could only get data for two of them at once.
12:20
So that's really neat. You also get the state of the hands, whether they're open, closed, or pointing. So open, you know, closed, or pointing. It can tell the difference between those, but it can't tell, for example, you know, what fingers are moving.
12:43
It doesn't have finger resolution yet. You know, if you are one of those people who likes to do signal processing, digital signal processing, and image processing, you could look at the depth data, which has all of the information about the fingers, and you could do your own and see,
13:02
you know, see if you can do that. And there are a lot of people doing that. But the SDK itself does not have finger recognition. Well, this is interesting, too. They have things that will tell you about the appearance and the level of user engagement if you're looking at the Kinect or if you're looking away.
13:21
That's neat. And also a couple of facial expressions, whether you're smiling or whether you're neutral, you know, whether you're grumpy. They can tell that. The previous version of the SDK has a whole face recognition API, and it will definitely be ported to this new one.
13:41
And so they map something like 83 or 85 or 80-something points on your face. And with that, you know, just like the skeleton can identify the joints of your body, they can map these points. With that, you can get this data map of a person's face and see, you know, as they're making sort
14:03
of weird expressions, you know, you can use that to animate avatars, or you can use it to identify people or different faces or whatever. There are a lot easier ways to do facial recognition, though, than that. But it's kind of neat. And I can't wait to see what they do with it
14:21
in this version. You also get the direction, the orientation of joints, which way they're leaning. And that's kind of cool, which way they're pointing. So here's how that works. And I love this little bug there, infrared data equals body frame source.
14:42
But it's just like the infrared initialization. You create a buffer, an array of bodies, which is going to be six. And then you create a body reader. And you have a frame arrived event. And you acquire the frame.
15:02
And this is basically one method to copy the frame's data to a buffer, to the buffer. It just does it in one shot, which is nice. And there's a bunch of stuff here. Here are all the joints.
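(A minimal sketch of that body-frame pattern, assuming the SDK 2.0 API:)

```csharp
using Microsoft.Kinect;

// Sketch: allocate the body buffer once, then refresh it on every
// frame with the single GetAndRefreshBodyData call he mentions.
KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

Body[] bodies = new Body[sensor.BodyFrameSource.BodyCount];   // six on v2
BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();

reader.FrameArrived += (s, e) =>
{
    using (BodyFrame frame = e.FrameReference.AcquireFrame())
    {
        if (frame == null) return;
        frame.GetAndRefreshBodyData(bodies);   // the "one shot" copy

        foreach (Body body in bodies)
        {
            if (body == null || !body.IsTracked) continue;
            Joint head = body.Joints[JointType.Head];
            // head.Position is a CameraSpacePoint: X, Y, Z in meters
        }
    }
};
```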
15:22
So nothing spectacularly strange there. You do get the hand tip and the thumb, the hand and the wrist. So there's quite a bit of fidelity around the hand, because that's where most people do gesture recognition.
15:44
So this is a great question. It's handled in the GPU. Yeah, it's handled in the GPU of the machine. So previously, in the old version of the sensor, it was handled by the machine in the CPU. But since it's the SDK running on the machine,
16:02
not in the Kinect itself, it has to run on the machine somewhere. But this version runs in the GPU. Yeah, so that's a good question. There is code available that lets you run C#
16:26
in the GPU. So that's worth looking into. I don't have an official answer on that, though. But seeing as how you can run C# code in the GPU, yeah, I don't see why not. I mean, it's just data.
16:41
If you can get that data into a generic format, which it is, it's essentially just double precision numbers, right? I mean, it's essentially just an array of doubles. You copy that into an array, send it off to the GPU, do some processing with it. Yeah, sure. I can see that happening.
17:03
You have the state of being tracked and not tracked and inferred for any joint. And so that there are different pens that you can use, for example, to draw bones or whatever. There's some great software that just comes with it in WPF that allows you to draw a skeleton.
17:25
I've done some simplification of all of that stuff and I'll share that code with you. But here's just a nice little abstraction. So if the body is tracked, go through all the joints
17:40
and get the rotation of joint orientations for each position and map that to a camera space. So camera space is video. And this is just a way to map what the body's coordinates are to the video coordinates.
18:01
So if you want to overlay one on the other. The hand states also have a confidence, which is nice. But there's only two, high and low. Been nice to have a number. But we don't have that. These expressions are not implemented yet.
18:20
But it is kind of funny how this will be in the final SDK. It'll know if your eyes are open or closed, your mouth is open, if you're looking away or wearing glasses, just these kinds of neat little things.
18:44
Again, not implemented yet, but will be. Here's the engagement and leaning. And audio, again, not yet implemented. I told you about that. All right, so let's, yeah, frame synchronization.
19:03
Let's talk about this. So there's a color reader. There's a body reader. There's an infrared reader and all that. There's also a multi-source frame reader. And this is the way that you can synchronize more than one of these readers together. If you want to display, for example, a skeleton over a video,
19:21
you need to use a multi-source frame reader. And that will allow you to get the frames at the same time in a synchronized way. So let me just show you some code, show you what that looks like. This is a simplification that I've done.
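(His wrapper code is shown on screen, not in the transcript; for reference, the underlying SDK pattern for a multi-source reader is roughly this sketch:)

```csharp
using Microsoft.Kinect;

// Sketch: one reader that delivers color and body frames together,
// so a skeleton can be drawn over the matching video frame.
KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

MultiSourceFrameReader reader =
    sensor.OpenMultiSourceFrameReader(FrameSourceTypes.Color | FrameSourceTypes.Body);

reader.MultiSourceFrameArrived += (s, e) =>
{
    MultiSourceFrame multi = e.FrameReference.AcquireFrame();
    if (multi == null) return;

    // Each sub-frame comes from its own reference and is disposed separately
    using (ColorFrame color = multi.ColorFrameReference.AcquireFrame())
    using (BodyFrame body = multi.BodyFrameReference.AcquireFrame())
    {
        if (color == null || body == null) return;
        // ...process the synchronized pair here...
    }
};
```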
19:45
Can you see that OK? It looks pretty big, doesn't it? You can see that fine? All right, good. So this is, again, a simplification that I've done. A nice little wrapper for all of this stuff that handles both bodies and images.
20:06
And it has stuff for joint thickness and pens for the hand, brushes, hand open brush, a hand lasso brush, a brush for the joints when they're tracked,
20:21
when they're not tracked, when they're inferred, when a bone is tracked and when it's not. This head image is neat. If you basically take a PNG file or even a JPEG file, but a PNG file has transparency of somebody's head and you put it in there, it'll put the picture
20:41
where the head is. So I do a great demo where I have Jon Skeet's head over my body when I'm walking around. It's kind of funny, and it tips when you move your head. But I wish I hadn't blown up that thing. I couldn't demo it anyway. But anyway, video image source is
21:04
what you can bind an image to in WPF. Right here, we've got a video image bound to video image source in a grid. And then over that, we've got in the same grid cell
21:22
with a transparent border. We've got a body image bound to the image source property, which is for the body. So here we go. When we initialize this, we look for the default
21:44
Kinect sensor. And this is the great thing about this API. And I asked the question, have any of you used the Kinect for Windows API before? The answer was no. But in previous versions, if it wasn't plugged in, there were errors and stuff.
22:01
And if the service wasn't running and all that stuff. But none of that. It just doesn't care now. If there's no sensor plugged in, it doesn't go nuts. It doesn't freak out. It just returns null. And you can just code around it. There are no exceptions thrown.
22:21
No problem. It's not a null object. It's just that the Kinect sensor is going to be null. So if we have a sensor plugged in, we're getting a coordinate mapper. We're opening the sensor. We get this frame description. And that gives us the width and height of the frame.
22:41
And from that, we can get a color frame description. The depth frame is for the body. The color frame is for the color. And then we're opening a multi-source frame reader right here, where we're passing in that we want both the body and a color stream in one reader.
23:04
This pixels is an array of bytes that's going to store the color data for each frame. And then I'm creating a writable bitmap here. This is a property, a writable bitmap with the width and the height.
23:22
And I have my array of bodies. And I start initializing. I call an event that my status has changed. And I initialize each body. Now, this joint smoothing here is something that I'm privy to because I'm an MVP.
23:40
I don't know if it's going to be in the SDK. I imagine it will be. But it's the joint smoothing algorithm that the team has. Because sometimes, right out of the box, you'll be moving. And it'll be tracking you. And your knee will be here. And if you move it down here, all of a sudden, boing, boing, boing, boing, boing, this kind of stuff.
24:02
So smoothing just makes it go a little bit slower. But it doesn't freak out. It knows that in frame 500, you were here. In frame 501, it was over here. In frame 502, it was back here. You know what? 501, it was probably right there. So it does that smoothing for you.
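(The team's smoothing code isn't shown in the transcript; as a stand-in, a simple exponential filter illustrates the idea of pulling the output only partway toward each new raw reading:)

```csharp
using Microsoft.Kinect;

// NOT the Kinect team's algorithm, just an illustrative stand-in:
// an exponential filter that damps sudden jumps in a joint position.
public class SimpleJointSmoother
{
    private readonly float alpha;        // 0..1; lower = smoother, laggier
    private CameraSpacePoint last;
    private bool hasLast;

    public SimpleJointSmoother(float alpha = 0.5f) { this.alpha = alpha; }

    public CameraSpacePoint Smooth(CameraSpacePoint raw)
    {
        if (!hasLast) { last = raw; hasLast = true; return raw; }
        last.X += alpha * (raw.X - last.X);
        last.Y += alpha * (raw.Y - last.Y);
        last.Z += alpha * (raw.Z - last.Z);
        return last;   // an outlier frame only moves the output partway
    }
}
```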
24:25
So then, we have our frame arrived. Let's go down here. Multi-source frame arrived. And I get my color frame. And I get my body frame.
24:41
And I just have a different processing method for each one of them. So for the color frame, I just have a show-live-video option. If I want to do that, I have a display-color-frame method, which
25:01
basically copies the data to an array and writes it into the bitmap. And this is my code to write into a JPEG using await. Save it to a JPEG file.
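(His exact code isn't captured; one way to do an awaitable JPEG save from WPF, sketched under the assumption that encoding happens on the UI thread and only the disk write is awaited:)

```csharp
using System.IO;
using System.Threading.Tasks;
using System.Windows.Media.Imaging;

// Sketch: encode in memory (fast), then await the slow disk write so
// the frame pump isn't blocked while the file hits the disk.
public static async Task SaveJpegAsync(BitmapSource bitmap, string path)
{
    var encoder = new JpegBitmapEncoder();
    encoder.Frames.Add(BitmapFrame.Create(bitmap));

    using (var memory = new MemoryStream())
    {
        encoder.Save(memory);                 // synchronous, in-memory
        byte[] bytes = memory.ToArray();
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                         FileShare.None, 4096, useAsync: true))
        {
            await file.WriteAsync(bytes, 0, bytes.Length);
        }
    }
}
```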
25:25
And let's go to process body frame.
25:45
Here it is. So this one takes a little bit more work. So I'm drawing a transparent background to set the render size, which is something that I stole right out of the sample code.
26:00
In fact, a lot of this was stolen right out of the sample code. And I have a Boolean if I want to draw bodies or not. Get and refresh the data for the bodies, all of them. And then I go through each one of these. Now we have joints and the joint type. And we're smoothing those.
26:20
And that's all this is. Then mapping them into camera points and calling draw body. And this is what that code looks like. Calls to draw bone, which draws from the head to the neck, from the neck to the shoulder,
26:41
from the shoulder to the spine. Essentially, that's what this is. And you have a drawing context that you're drawing for each of these things. All right, so long story short, it's kind of complex if you're doing all this stuff
27:02
by yourself. So this particular engine that I wrote simplifies all of that to this. That's it. So look at this code here. Here's my grid with an image bound to video image source,
27:22
an image bound to image source over it. I've got a simple multi-engine here. This is my code. In my main window loaded, I've created a simple multi-engine. I got a body tracked event setting my data context. And that in and of itself is enough
27:41
to display the color camera and the body over it. Done. And now if I want to look at any of those joints, I can look at them right here. Body, joints, joint type, head or hand left, position,
28:05
x, y, z. That's it. I mean, this is the same data that you would get in frame arrived, except all of that other cruft is taken out of your way. So even if you're not using this,
28:21
which my code is freely available on my blog, you can write your own abstraction layer just to handle all that stuff and get it out of the way of the app. Definitely a good idea. So when it comes down to time to writing your app and you want to know if somebody did this,
28:40
now you can just track the X position of the hand, I guess it's this hand, over time. And if it was here and then it's here within a certain amount of time, now they've done that.
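(A minimal sketch of that naive approach; the thresholds are invented for illustration:)

```csharp
using System;
using Microsoft.Kinect;

// Sketch: watch the right hand's X position over a short time window and
// call it a wave if it travels far enough, fast enough.
public class NaiveWaveDetector
{
    private float startX;
    private DateTime windowStart = DateTime.MinValue;

    public bool Update(Body body)
    {
        Joint hand = body.Joints[JointType.HandRight];
        if (hand.TrackingState == TrackingState.NotTracked) return false;

        if ((DateTime.Now - windowStart).TotalSeconds > 1.0)
        {
            startX = hand.Position.X;          // restart the one-second window
            windowStart = DateTime.Now;
            return false;
        }
        // Moved roughly 40 cm sideways within the window? Call it a wave.
        return Math.Abs(hand.Position.X - startX) > 0.4f;
    }
}
```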
29:01
But it gets even better. For the first version of Kinect for Windows, I wrote this program called GesturePak. And GesturePak is a recording app where you stand and perform a gesture, and it records the movements that you make and saves that data to an XML file. And then you load up those XML files at runtime
29:23
and say, watch me. And then when you perform any of those gestures, it fires an event and tells you you did it. Because I don't want to be tracking joints. And who knows, where do I start? How do I know what a hand wave is? Where do I start? Where is it?
29:43
So here's the first version, which is for the old version of Kinect for Windows. This is the demo video that I have for that. And this is exactly how you use it.
30:06
Create new. Snapshot. So you're taking snapshots of positions of where your hands are or whatever. Animate.
30:24
Stop animation. Next. So now I'm using the Force to name it. Turns out there are a lot easier ways to do this than I had originally thought. But I thought this was kind of fun at first.
30:43
I always wanted to use a keyboard with the Force. Next. So the axes to track: X and Y, not Z. Left hand. And now you're picking the joints that you want to track. Left hand, right hand.
31:00
Test gestures. Now I'm going to test out that flap. Begin test. Stop test. Salute. So I have another gesture that I created, Salute.
31:21
Just to show you can have more than one. Flap. Salute. And now you move around. Flap. Salute. Stop test. To show you that it's. Previous. Create new. Create another gesture here.
31:41
Snapshot. Turn mouse off. Snapshot. Snapshot. Snapshot. Next. Next.
32:00
Wax on. Next. Next. Just the right hand. Test gestures. So I'm testing them all. Wax on. Flap. Salute. You can see I'm moving around, so it's relative to the body,
32:22
not to the sensor. Wax on. Wax on. Wax on. Turn mouse off.
32:47
Yeah, pretty cool. So that was the old version, and I thought it was a little bit clunky. So in this next version, and believe me, it's not going to look so bad.
33:01
But this was just the beta version that I had put out. I'll tell you how I'm going to improve it, but let me just show you what a difference this interface makes.
33:22
Step back, say start recording. The button turns red. Perform the gesture, then say OK, stop. Then save the gesture to an XML file. Just name it here. We're naming it wax on. And now you're in edit mode.
33:41
So press the animate button. So it recorded all those frames. Cycle through them. You can see there's 104 frames here. It's quite a lot, especially at the end. So let's trim them up a little bit. I'm using the mouse wheel to scrub through all of the frames here. So let's get to the first-ish frame in the animation,
34:02
around 24, and click the trim start button. And now the animation starts at 24. So now let's get about to the end of the animation. Click trim end. Now we only have 29 frames in the animation, a little easier to work with.
34:21
All right. So now let's pick the joints we want to track. As you can see, when I hover over the joints, you see the names of the joints up the top there. I only want the right hand in this gesture. So click on it and it turns white. You can also see that I can pick x, y, and z axis
34:41
to track individually, and the left and right hand state, open, closed. For this gesture, I only want x and y. OK. Now we pick the frames that we want to match against. So scrub to the frames, and one by one, click the match
35:01
button. So you want to pick the lowest number of frames necessary, the smallest number of frames necessary in order to make this gesture work. And therein lies the art of creating gestures, is picking those frames. So it's going to match those frames in real time in series.
35:22
That max duration right there, 500 milliseconds, that's how much time you get from frame to frame before it gives up and says, you're no longer in the running. OK. Let's test it. Click the live mode button again, stand back, do the gesture, and match.
35:41
OK, now we're playing a WAV file, but you can do anything. And in fact, it's very easy to just make a call to check to see if a gesture was made. Multiple bodies, multiple gestures, and you get the source code with this next version of GesturePak. So you can just rock on all day long.
36:01
Pretty simple, huh? All right, so there you go. So that's a bit different, isn't it, from using speech and all that. But even in this one, I'm not going to require speech, because speech, people have problems with it. It doesn't recognize your accent or whatever. Your microphone isn't turned up, and it's not necessary.
36:22
I found out. You can basically set a number of seconds that you want it to record for, and then just say, start recording, and then it'll count down and give you five seconds to get in place, and then it'll just say go. And you do the gesture, whatever it is you're going to do. Maybe you set 10 seconds, so it gives you
36:40
10 seconds' worth of recording, and then it says you're done. And then you can go and trim it up and do whatever you want. So I'll show you what the data itself looks like. Yeah, here's wax on right here.
37:04
It's just an XML file with all the joints that I'm tracking in here, and each frame with the duration, max and min, the left and the right hand state,
37:20
and then each joint, the X, Y, and Z value, very easy. Each frame has a name, frame one, frame two, whatever, whether it's being matched or not. So it's very easy to understand, very easy to edit. It just seems to me like a very natural thing.
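(The actual file isn't reproduced in the transcript; a hypothetical fragment matching that description, with invented element names rather than GesturePak's real schema, might look like:)

```xml
<!-- Hypothetical fragment reconstructed from the description above;
     element and attribute names are invented, not GesturePak's schema. -->
<Gesture Name="WaxOn" FudgeFactor="0.3">
  <TrackedJoints>HandRight</TrackedJoints>
  <TrackedAxes X="true" Y="true" Z="false" />
  <Frame Name="Frame1" Match="true" MaxDuration="500"
         LeftHandState="Open" RightHandState="Open">
    <Joint Type="HandRight" X="0.42" Y="0.35" Z="0.11" />
  </Frame>
  <!-- ...more frames... -->
</Gesture>
```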
37:41
So I was selling this for 99 bucks. The next version, I've decided to open source it. Yeah. Along with the Kinect Tools that are right here on carlfranklin.net, which you can download right now.
38:03
This is Kinect Tools. It's the abstraction that I was showing you that simplifies all of this stuff here, takes all of that code and turns it into just a couple of lines, you know, so that you can easily track positions and do stuff.
38:23
And it also does the drawing of the body, you know, and it gives you full options in terms of the pens and the brushes and what you want, you know, how thick you want the lines to be and all that stuff. So questions?
38:49
Yeah, yeah. Yeah, you use them later in your applications. Basically, you have a couple of things. You actually have a recorder object in GesturePak as well.
39:01
So if you want to load up a recorder and say record start, you can record your own gestures. And once you've stopped that, you can save that as an XML file. You can write your own code to record it. You don't have to use the interface that I'm providing.
39:21
And then you can load those up in a list. Basically, you create a list of gestures and create a matcher object. And you tell it to start recognizing. And then in your body tracked event, you know, or whatever, you take this body right here,
39:41
which could be any one of six. This will fire, you know, six times for each frame if you have six bodies. And you pass that to the gesture matcher and say, you know, this is the latest data I have. Was the gesture matched?
40:00
And the matcher keeps track of all that stuff. I will show you the code if you'd like to see it. The gesture matcher is really interesting code.
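(A hypothetical calling pattern along those lines; the Gesture and GestureMatcher names are illustrative, not GesturePak's actual API:)

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Kinect;

// Hypothetical usage per the description above; type and member names
// are illustrative, not GesturePak's actual API.
var gestures = new List<Gesture>
{
    Gesture.FromXmlFile("waxon.xml"),
    Gesture.FromXmlFile("salute.xml")
};
var matcher = new GestureMatcher(gestures);

// In the body-tracked handler, which fires once per body per frame:
void OnBodyTracked(Body body)
{
    Gesture matched = matcher.Check(body);   // "here's my latest data"
    if (matched != null)
        Console.WriteLine($"Matched {matched.Name}");
}
```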
40:35
So the first thing that I do is I make all of the X, Y, and Z coordinates relative to the spine,
40:43
the spine mid, because this is essentially you. When you move around, the coordinates that you get are all in meters. And they're all relative to the Kinect. So if I've got my right hand here, and I'm moving like this, it's changing. The right hand is changing, okay?
41:01
So, for obvious reasons, it's relative to the Kinect. But if you want to make it relative to the body, then you just have to subtract, hand from body, body to hand, whatever it is, subtract it. And so that's the first thing I do. And then I'm looking through each joint in the frame, making sure we're tracking,
41:23
because we are only interested in the joints that we're tracking. Remember in the gesture, I can say, I only want to track the right hand or the left hand or both. And the whole idea is to not pay attention to the noise and only focus in on the things that matter.
41:41
So if it's just the right hand, great. And only the axes that we're tracking, X, Y, or Z. You might have a gesture that's just like this, stop, to stop something. You might have a game where children are crossing the road and the cars are zooming by
42:01
and you stop. But you might go like that, or you might go like this, or you might go like that, doesn't matter. If my Z is out, is it this distance from my chest, that's all I really care about. I don't care about how high it is or how left or right it is, you know? So I would only track Z, yeah.
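(A minimal sketch of that re-basing step, using the SDK's Body and CameraSpacePoint types:)

```csharp
using Microsoft.Kinect;

// Sketch: re-express a joint relative to SpineMid, so the gesture
// travels with the person instead of being anchored to the sensor.
static CameraSpacePoint RelativeToSpine(Body body, JointType jointType)
{
    CameraSpacePoint joint = body.Joints[jointType].Position;
    CameraSpacePoint spine = body.Joints[JointType.SpineMid].Position;
    return new CameraSpacePoint
    {
        X = joint.X - spine.X,   // meters left/right of the spine
        Y = joint.Y - spine.Y,   // meters above/below it
        Z = joint.Z - spine.Z    // meters in front of the chest
    };
}
```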
42:27
Yeah, so that's a good question. Remember, it's relative to the spine. So the spine has a Z as well, yeah. Your spine mid, wherever that is, right here.
42:43
So if we're tracking X, Y, or Z, then we look at the delta, which is the difference between where the joint is now and where it is in the frame, the next frame that I have to match in the gesture. And if it's within that window,
43:01
and that was, if you remember, in this guy, let's back him up a little. I don't know if we covered this, but the fudge factor field right there, that's the margin of error; think of it as a bubble
43:23
around which you have some give. And if that fudge factor is too big, you're gonna have more false positives. If it's too small, you won't trigger it. You know, you won't trigger it enough. You'll have to be more accurate, right?
43:41
So it's kind of important to get that right. And I found that that's a good fudge factor. And how I come to that value is by taking the deltas between the X, Y, and Z, or all of the axes that I'm tracking, adding them together
44:02
and comparing to the fudge factor, yeah. So then I'm checking to see if the hand states match as well as the fudge factor. And of course this is gonna return true if we're not tracking the hands. And the frame matches, great.
44:21
Set a matched property on the frame, cool. Now I have whether these frames are matched or not. Then I go through each gesture and now that the frames are matched, I wanna see if the frames have been matched in the right sequence and within the time window.
44:41
So it's really a brute force method of just going through the data and determining whether or not you've gone through the positions in the right amount of time in an accurate way, right.
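(A sketch of that brute-force test, not GesturePak's actual code:)

```csharp
using System;
using System.Collections.Generic;

// Sketch: a frame matches when the summed deltas on the tracked axes fall
// inside the fudge factor; a gesture matches when its frames matched in
// order, each within the allowed time window of the previous one.
static bool FrameMatches(float dx, float dy, float dz,
                         bool trackX, bool trackY, bool trackZ,
                         float fudgeFactor)
{
    float delta = 0f;
    if (trackX) delta += Math.Abs(dx);   // only tracked axes count
    if (trackY) delta += Math.Abs(dy);
    if (trackZ) delta += Math.Abs(dz);
    return delta <= fudgeFactor;         // inside the "bubble" = a match
}

static bool GestureMatches(IList<DateTime?> frameMatchTimes, double maxMs)
{
    for (int i = 1; i < frameMatchTimes.Count; i++)
    {
        if (frameMatchTimes[i] == null || frameMatchTimes[i - 1] == null)
            return false;                // some frame never matched
        if ((frameMatchTimes[i].Value - frameMatchTimes[i - 1].Value)
                .TotalMilliseconds > maxMs)
            return false;                // took too long between frames
    }
    return true;
}
```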
45:00
It's actually fairly simple. What's missing from this is scale. So if I make a gesture and I'm like this, and my gesture is put your hands out like this and I'm tracking Y, right.
45:23
I'm tracking Y, and then somebody shorter comes in and does it, it's not gonna work. So that code is missing. I actually have some that kinda works, but this is one of the reasons why I'm open sourcing it.
45:41
That's sort of beyond my mathematical ability to figure out. So that's one of the reasons I want to open source it, to let somebody else do that. What else can I tell you? Questions, any other questions?
46:03
What's that? The raw what data? Video data? Yeah. The raw video data can be in a bunch of different formats. Let's see. Color, color, color... here it is.
46:29
Yeah, this has been my experience. It's been in this YUY2 format, but you can never tell, you can't count on it. You basically have to query the frame
46:44
to determine what format it's in and treat it appropriately. I'm not a format person. I don't know what these things are. RGBA is obviously red, green, blue, alpha, so that's the order of bytes,
47:03
but YUY2, I don't even know what that stands for. I just know that's the format, and then there's a method that you can call to copy from that format into a buffer that can go into, be digested by a writable bitmap.
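(A sketch of that query-then-convert step, using the SDK's ColorFrame members:)

```csharp
using Microsoft.Kinect;

// Sketch: don't assume YUY2; ask the frame for its raw format, then
// either take the raw bytes or request a BGRA conversion.
static void CopyColorData(ColorFrame frame, byte[] pixels)
{
    if (frame.RawColorImageFormat == ColorImageFormat.Bgra)
        frame.CopyRawFrameDataToArray(pixels);        // already BGRA
    else
        frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);
}
```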
47:27
Cool? What do you like? What do you not like? Thoughts?
47:59
You mean a very close range?
48:03
Very close range, I'm not sure. You know, a lot depends on the lighting, and I wouldn't say typing, but I'd never rule it out. I don't know. I couldn't tell you. You saw me before where I was standing about 10 feet away,
48:22
and doing this, and moving very little, and actually moving the mouse. I mean, it's very sensitive, actually, in how much it changes.
48:43
It's in meters, but you're getting a double precision number, so it's tracking really, really small movements. So I imagine that you could, you know, have that kind of control if you really wanted it.
49:03
I haven't tried it, though. Here's what I know. Not with this, because this is brand new, but I was involved in a gesture recognition competition in Boston. I believe it was in Boston, yeah,
49:23
and I don't know why I was there, because my method was so pedestrian compared to everyone else, you know, these math-genius kids who were talking about random forests and MapReduce and stuff, and I was there with my C# code
49:42
doing, you know, brute-force array, integer and decimal comparisons and stuff. I was seriously outclassed, but there were companies there that were taking the depth data and doing, I guess, forest analysis.
50:02
Do you know what that means? A forest algorithm? If you know what that means, great, but it's a way to do image object recognition. In fact, Martin Yule is here.
50:24
Have you seen his stuff on computer vision? Yeah, OpenCV? Right, yeah, OpenCV, and there's a .NET version of this,
50:42
so this has object recognition stuff, and believe it or not, this will work with low-res JPEG, like you don't need a lot of stuff to recognize, so I imagine that, you know, and this is all done by PhDs and stuff that spend a lot of time doing it,
51:02
and there is a .NET library for it, so jeez, I imagine you could probably do sign language recognition with this. In fact, I wonder if it hasn't been done already. OpenCV and sign language recognition.
51:30
Look at that, sign language recognition software using C#. Probably don't even need Kinect.
51:45
Probably do it with a video camera.
52:29
That's not even using a Kinect. That's just your video camera with OpenCV. Yeah, knock yourselves out.
52:46
Yeah, well, we were interviewing Martin Yule today, and I was downloading this, thinking yeah, I'm gonna check it out. He has an app that he wrote, put it on a Raspberry Pi with a camera, and it's over the door,
53:02
so you can tell when the pizza guy is delivering a pizza, and it recognizes the box and the logo, and it plays a song when the pizza guy is here. Like, yay, the pizza's here, whatever. It's a solution in search of a problem, isn't it?
53:24
I have another one of those. I just got a new refrigerator from Samsung, and it's got an app, so naturally, I downloaded the app, right, because it's a refrigerator, and everybody's been talking about, oh, smart refrigerators. It's gonna be so awesome. You know, you're living in the high-tech world.
53:41
Your refrigerator's gonna be connected to the internet and all this stuff, and I'm like, great. So I download the app on my Android phone, my Samsung Galaxy S5, which just happens to be the same brand as my refrigerator, and I got it all working, and I swear I spent about 45 minutes before I even realized what the app did, right?
54:02
So I get it all working, and I get the fridge connected because the fridge has got a Wi-Fi thing in it, right? Great, cool. And I finally get it loaded up. It tells me the current temperature in the freezer, current temperature in the fridge, awesome. Allows me to put it into deep freeze mode
54:21
and deep fridge mode, power freeze and power fridge, which I guess if you take like a thing of hot soup or something and put it in the fridge, you can go into deep whatever, and it'll drop the temperature really fast so nothing warms up, you know, and then it'll bring the temperature back, blah, blah. Yeah, it's great. Okay, cool. I can do that from my phone, but I gotta go to the fridge to put the hot stuff in there,
54:43
and there's a front panel that allows me to do that. Okay, but I can do that from my phone. It gets better, it gets better. You can only use it on the local Wi-Fi, not connected to the internet. So the only reason to have this app
55:02
is if you're too lazy to get up and go to the fridge, you gotta have the data right there in your hand in the living room. That's the American way. Yeah, solution in search of a problem. Any other questions?
55:22
What are you guys thinking of doing with this thing? Seriously, I wanna know what's on your mind in terms of real world apps. Are you making games or do you really wanna make a, anybody gonna make a business app with this?
55:41
Oh yeah, beautiful. Wow, great. I'm actually working with a company in Nashville right now that's doing physical therapy,
56:02
and they're doing dynamic movement assessment. Basically the person stands in front of the Kinect and does squats, and it counts as they go up and down, one, two, three, and then checks out how messed up their knees are or whatever when they go up and down, yeah.
56:21
So yeah, the medical world, it definitely has huge implications. They're freaking out over how amazing it is, because they've done this stuff before in medicine, but they've had to put markers on the body or be in special, very controlled environments. Not anymore: a $200 device.
56:41
The only thing they're complaining about is the cost of the machine that's required to run it, because not every laptop is powerful enough. You've gotta have an i7, basically, and it's probably a good idea to have an SSD and eight gigs of RAM.
57:01
So yeah, they're looking at a desktop machine, probably 1,200 bucks, and that's like, yeah, yeah. Linux support?
57:21
I don't, no, I don't know. I know that there was a port of the last SDK, not this one; somebody did a Kinect for Windows SDK for Linux. Yeah. Yeah. So I wouldn't be surprised if there's something.
57:42
I don't know if it'll be done by Microsoft but you know, you never know. I mean Microsoft is living in this great world of cross-platform now and especially in Visual Studio. I'm not so sure they're crazy about Linux but they certainly like Apple and Android stuff.
58:07
Who knows? Other questions? Yeah, time for a beer? All right, good. Thanks a lot guys. Oh, wait a minute.
58:23
CarlFranklin.net is my blog. This is where you can download the Kinect Tools. Right now they're only available for the old SDK, but when the new one ships and I have permission,
58:40
I will post everything up there. So knock yourselves out. CarlFranklin.net. That's all you need to know. Okay, thanks. Sorry about this. I couldn't show it to you anyway but at least I should have known before I put it in my suitcase.