Possibilities for using speech and gesture recognition for the future of mobility - TIB AV-Portal

Wex, Philipp; Edwards, Vanessa van; Schäfer, Mathias

Formal Metadata

Title

Possibilities for using speech and gesture recognition for the future of mobility

Title of Series

re:publica 2015

Part Number

131

Number of Parts

177

Author

Edwards, Vanessa van

Schäfer, Mathias

License

CC Attribution - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/31914 (DOI)

Publisher

Release Date

Language

Production Place

Berlin

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Speech and Gestures are guiding our way into the future of mobility and play an important role in our daily lives and the way we get along. This session will provide insights on how speech and gestures impact the way we communicate and interact with each other as well as with our surroundings and also proves how powerful gestures can be for expressing ourselves.


Transcript: English (auto-generated)

00:18

I was told to speak English, so I'll pick up in English.

00:22

So thanks very much for having me here. I'm quite excited to talk about my everyday work life and give you some insights into what we're doing on speech recognition. I will explicitly only talk about speech, as the other two speakers will focus more on the gesture topic. I always love to start my presentations with a short video.

00:45

So I hope if I go here, some of you might know this video, it's actually one of my favourites and it's actually the best motivation for talking about speech recognition. So some of you might recognise it.

01:13

This is the German Coast Guard, what are you thinking about?

01:30

So, I'd like to talk to you about speech and recognition. Something like this can happen in human interactions, it definitely can happen when you interact

01:41

between a human and a machine. So as I'm a technical guy, I had to bring the phonetic representation of speech and recognition, so this is really the phonetics that you would see in some of our documents that we have when we define speech recognition systems. So I'll talk first about speech, so what is speech and how do we have to use speech and the second part which is the more important one for us is really the recognition

02:03

part and the understanding part. So how do you get from speech, from understanding a language, to really recognising something with it? First of all, the question I always have to answer: why do we care about speech interaction in the car? I'm with Mercedes and, as every big corporation probably does, we have this brand cookie

02:25

where we have different market values or brand values. Two of the main ones that we always get associated with are safety and comfort. Safety is very important for us, comfort as well; we do everything to make our cars safe and comfortable.

02:42

When we talk about speech recognition, those are the two values that we are working towards. For safety, we always want to have the eyes on the road and the hands on the wheel. We want to make sure that the driver can focus on what he really needs to do, which is drive. As long as we are not autonomous, and right now we are still not autonomous, we need to keep the focus of the driver on the road, and that's why we really want

03:01

to have the eyes on the road and the hands on the wheel. But of course infotainment, such as navigation, music and phone, is also an important feature in the car. We need to provide simple access to infotainment features and vehicle functions, and that's why speech interaction is a piece that brings us quite far.

03:25

So let's start with the basics. That sentence on the top, as long as I have a job I always tell my boss that's why I'm here because speech recognition is really highly complex and it's multifaceted. Basically, in the very beginning what you get is a wave in a 3D space that is somewhat

03:43

represented like this. That of course requires you to have some microphones placed somewhere, normally it's either in the ceiling somewhere or in the mirror, in the inside mirror. What I've put down here is that mobility increases complexity because probably later

04:00

on you would ask yourself, well Siri does it, Google does it, what is special about us? What is special about us is really that we think we increase complexity. Now we probably need to turn up the volume a little bit. Just to make it a little bit more, to understand why it's more complex, I have three or four audio files.

04:21

The first one that I'm going to play now is basically a navigation command in a studio like environment. So, this would be a guy sitting in a studio having a microphone right in his hand. Again, you've probably never seen someone sitting in his car having a microphone right stuck in his face.

04:41

So basically the same thing: this is how it would come to us when you are sitting in a car without any noise. You might have heard that the volume is a lot lower than before. For the next one I add a little bit of vehicle speed to that.

05:04

100 kilometers, you say it's okay, let's make it a little bit faster. So you see, going to a machine and giving that kind of quality and that kind of noise, it makes it harder for us and that's why we're saying mobility increases complexity.

05:22

So what we basically need to do before we do any speech recognition is optimize that signal, because that signal really is a pain for machines to interpret. We have this wave up there, and we try to emphasize its characteristics and get a smoother audio signal.

05:40

The complex task we have to accomplish, on the one hand road noise, everybody has road bumps, you have a road surface that is not really even so we need to think about this. But of course also noise that we contribute to, windshield wipers, turning signals, those have all special frequencies that really disturb speech recognition systems.
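The idea of filtering out an interference source with a known frequency can be illustrated with a narrow notch filter. This is only a minimal sketch in plain Python: the 1000 Hz interference tone, the 300 Hz stand-in for speech, the 16 kHz sample rate and all function names are assumptions made for illustration, not values or methods from the talk. The coefficients follow the widely used audio EQ cookbook notch form.

```python
import math


def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch coefficients for centre frequency f0 (Hz) at sample rate fs (Hz).

    Uses the common audio EQ cookbook notch form, normalised so that a[0] == 1.
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Apply a second-order IIR filter (direct form I) to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def tone(freq, fs, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    fs = 16000  # hypothetical sample rate
    b, a = notch_coefficients(1000.0, fs)  # hypothetical interference frequency
    mixed = [v + h for v, h in zip(tone(300.0, fs, fs), tone(1000.0, fs, fs))]
    cleaned = biquad(mixed, b, a)
    print("RMS before:", rms(mixed), "RMS after:", rms(cleaned[fs // 2:]))
```

A real in-car front end would have to deal with broadband road noise and changing conditions, which a single fixed notch cannot handle; the sketch only shows the narrow-band case.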

06:01

So we need to think about how we can intelligently filter those out. And last but not least, in dense traffic situations you also need to cope with that. So now let's assume we have dealt with that task and we have a clean signal that we can start to interpret. The next step for us is to get a textual representation of the audio signal.

06:25

So don't care about the language, it's German now, but the computer right now doesn't know either. So it's just some textual representation of an audio signal. So what I get here, is a textual representation of the audio line up there.

06:41

If we only wanted to do dictation of messages in the car, I think we probably could stop here. Of course we need to be aware of what the input language to the system is. Right now we do that with one system setting: whatever that setting is, that will be the expected input language. In the future, of course, we would like to have dynamic and automatic language detection as well.

07:08

That's quite a difficult task, there are a lot of institutes right now and corporations working on that. But this would be like the next step that the system understands what language is spoken and even cross languages.

07:21

Normally just dictating messages in the car is not enough when we talk speech recognition because we want to solve tasks. I mean that's why we're talking about comfort and safety, we want to solve tasks and save time for the user. So now we need to start putting semantics and meaning to that language. And now we are in the area of natural language understanding.

07:41

When we talk about meaning and semantics, we would know from how the German language is set up that "fahre" (drive) points to the navigation domain, so the user probably wants to navigate somewhere. With the other filling words we would have like something, a POI, which is restaurant sonnet,

08:02

and we would have a location, so that's basically where we want to navigate to. So now we are at a point where we can actually instruct the system within the car what to do for us. And this is for the different languages right now, the challenge is getting that in our car. We want to have really the natural language understanding.
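The step from recognised text to a domain plus slots (a point of interest and a location) can be sketched with a small rule-based parser. This is a toy illustration only: the pattern, the function name `interpret` and the example utterance are hypothetical, and production natural language understanding is far more robust than a single regular expression.

```python
import re

# Hypothetical pattern: a navigation verb, a preposition, a point of interest,
# and an optional "in <location>" suffix. German and English verbs are mixed
# here purely for illustration.
NAV_PATTERN = re.compile(
    r"^(?:fahre|navigate|drive)\s+(?:zum|zur|zu|to)\s+"
    r"(?P<poi>.+?)(?:\s+in\s+(?P<location>.+))?$",
    re.IGNORECASE,
)


def interpret(utterance):
    """Map recognised text to a domain and slots, or None if nothing matches."""
    match = NAV_PATTERN.match(utterance.strip())
    if match is None:
        return None
    result = {"domain": "navigation", "poi": match.group("poi")}
    if match.group("location"):
        result["location"] = match.group("location")
    return result


if __name__ == "__main__":
    # Hypothetical utterance, not the one shown in the talk.
    print(interpret("Fahre zum Restaurant Adler in Stuttgart"))
```

The resulting dictionary is the kind of structured instruction a navigation system could act on, in contrast to the plain dictation text from the previous step.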

08:22

Having said this, recognition technologies, the very early processing, have improved a lot in the last couple of years. That's also why we have the strategy to partner with technology companies. Even though we are a big company, we can't do that all by ourselves, and we want to participate in the latest developments.

08:41

That's why here we definitely say partner with technology companies. Our focus lies really on the interaction and the dialogue design, so really the user experience within the car. And I want to just give you some examples on what the pain points are, also what we need to think about. First example, intelligent dialogue design, solving tasks cooperatively.

09:04

You might have used Siri before, and don't get me wrong, Siri set the benchmark and I'm really happy with Siri because it put a lot of focus on my work. But what Siri does many times, either when it doesn't understand you or it has some function, is search the web.

09:21

When you're driving at 140 on the Autobahn, you can't just start a web search; it's not meaningful for the user. For a tablet, or a screen that you have in your hands, it's a good way to do it. In a car it's not, because you need to be cooperative; you need to give some options to the user. So we need to think about how to translate this behaviour to the car.

09:44

Second one, meeting users' expectations. I stumbled across this sign, I'm not sure if you are aware of it. It's from when the first Edison electric light got introduced, and I love that they were trying to light it with matches instead of using the knobs on the side of the door.

10:04

This translated to us means, in the very early stages of speech recognition in the vehicles, let's say domain phone calling, we had the options of dial a number, and then you were able to dial a number, of course, 07524242, whatever. Today, this is not the user expectation anymore.

10:22

Today, they want to talk about names, nobody knows numbers anymore. So today, of course, we need to be able to say dial David Menzel, and this is basically the number. So of course, we need to be very close to the market and to the user expectations to see what they want and how they really want to intuitively use the system.

10:41

Next one, and that is probably one of the main pieces, we need to resolve ambiguities, and there are many, many ambiguities. Just for the example of Neustadt, a city in Germany, if you type in Neustadt on Wikipedia, you get a very, very long list of Neustadts in Germany. So now the big question is, if someone wants to navigate to Neustadt, what do you do in the car?
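One cooperative way to handle such an ambiguous destination is to offer the nearest few candidates instead of guessing or reading out the whole list. The sketch below assumes a hypothetical gazetteer lookup has already produced candidates with a distance from the car; the distances are invented for illustration, and the function name and prompt wording are not from the talk.

```python
def disambiguation_prompt(spoken_name, candidates, max_options=3):
    """Build the next dialogue prompt for a possibly ambiguous place name.

    candidates: list of dicts with "name" and "distance_km" keys.
    """
    matches = [
        c for c in candidates
        if c["name"].lower().startswith(spoken_name.lower())
    ]
    if not matches:
        return f"I could not find {spoken_name}. Could you repeat the destination?"
    if len(matches) == 1:
        return f"Navigating to {matches[0]['name']}."
    # Offer only the closest few options so the driver is not overloaded.
    nearest = sorted(matches, key=lambda c: c["distance_km"])[:max_options]
    options = ", ".join(f"{i}: {c['name']}" for i, c in enumerate(nearest, 1))
    return (
        f"I found {len(matches)} places called {spoken_name}. "
        f"The nearest are {options}. Which one?"
    )


# Real place names, but the distances are made up for this example.
EXAMPLE_CANDIDATES = [
    {"name": "Neustadt an der Weinstraße", "distance_km": 480},
    {"name": "Neustadt in Holstein", "distance_km": 250},
    {"name": "Neustadt am Rübenberge", "distance_km": 260},
    {"name": "Neustadt (Dosse)", "distance_km": 75},
]

if __name__ == "__main__":
    print(disambiguation_prompt("Neustadt", EXAMPLE_CANDIDATES, max_options=2))
```

Ranking by distance is just one possible heuristic; the driver's usual destinations or the current route could serve the same purpose.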

11:04

You really need to intelligently design your dialogue that cooperatively you get to where you want to be. I definitely say that today we are probably not there where we want to be, there is still room for improvement, but I think those are exactly the things where systems are good

11:20

or where they fail and users don't accept them. And this is why you really need a good dialogue design. Next one, also important, picture yourself being the Cookie Monster of Sesame Street and you are sitting in a Mercedes, and you basically just say, cookies, cookies, cookies.

11:41

What do you expect the car to do? If you ask Google, this is what Google gives you back. First it gives you back recipes within the car, not really interesting for me, then it gives you back something about Wikipedia, some more details about cookies, probably also not the best thing to do. If you go down a little bit you get something here: Leibniz Keks.

12:04

Within that homepage or website you would probably see an address of Leibniz or Bahlsen; they are actually in Hanover. So if you were in Hanover right now, it would be quite interesting if, when you say cookie, cookie, the car tells you: okay, here is an address, do you want to navigate to Leibniz in Hanover?

12:22

Of course if you are somewhere today in Berlin, it would be a little bit hard just for a couple of cookies to go to Hanover. So there I would expect that the car understands I want a bakery where I can get some coffee and maybe some cookies. So this is why we are saying we need the context of the car. And this is one of our strengths, we have a lot of context, we have parallel applications,

12:41

we know where the user goes in normal ways and we also know what other things are currently going on in the system. And this will be a big area of the future where we are looking into how can we get this context. Even better. Which challenges are we addressing right now,

13:00

which are probably coming within some of the next vehicle generations? One thing is speaker recognition. Especially in our cars we normally have four seats. It would be quite interesting to think about recognition for different seats because use cases might be very, very different. Multi-party recognition,

13:20

same thing, you have an open system, an open microphone, why don't you give the option to speak up to any of the passengers and to also detect which seat is currently talking and maybe you have some settings that are special to that seat. Keyword activation for speech activation, I mean Google and Siri do it already, so you can say,

13:41

hey Google, hey Siri, that is working, again in our environment with all the noise we have around us, that's a challenge, so we are working on this one. Predictive user experience is something we are closely looking into. Just as I said before, the context is important. The context is the situation right now,

14:02

but there were contexts before that. Can you learn about the user and adapt your speech interaction and your whole user experience based on the context you have learned before? So we are trying to see whether we can work with machine learning mechanisms to really improve the speech recognition systems. Then what I have also already said before,

14:21

language mix, so if you have different languages in the same dialogue, how can you cope with that? We have one pretty good example on this, let's say S-Class in China, which is one of our biggest S-Class markets, there will be normally a local person speaking Chinese in the driver's seat and there might be someone not speaking Chinese

14:42

in the VIP seat in the right hand side in the back. So would there be a possibility that one of them speaks English, the other one speaks Mandarin? Those are also the questions can a system cope with that and how do we get there? And last but not least and this I think perfectly fits

15:00

into the session, can you talk about multimodality? Right now I have only talked about speech. Can the speech and the overall user experience be increased or improved if you add gestures to it? So if I just point somewhere and then talk about it or if I add other modalities with it. So this is also a topic where we get the different modalities together and see how

15:22

we can optimise the overall system. So those are the things we are currently working on and looking into. Of course there are also a couple of challenges we need to be very, very aware of and pay special attention to. Privacy is one big thing for us. You know if you talk about Siri, Google, you do

15:42

actually voice recognition in the cloud. Everybody at some point has agreed that this voice recognition in the cloud is okay and that this data can be uploaded. Of course speech in itself is biometric information, so privacy is a big issue here. You need to be very, very careful about what you really want to do and why you want to do it.

16:02

So in the very best scenario you wouldn't have to go into the cloud. Of course we know why they do it today in the cloud, just because the computing power and the possibilities you have in the cloud today are much much bigger. But for us this means we need to think about it. What is privacy, how can we cope with privacy and how do we really inform the customer what we do with that data if we really need

16:22

to do it. Second point here, personality of speech interaction. We've had a research project on this, whether it would be interesting to give our speech engine a character. Wouldn't it be interesting really to speak with it just like a natural interaction with a person.

16:40

We have very mixed comments from our customers on that: some of them say yes, interesting; others say no, it's a machine, I don't want to talk to it like a human. So this is also a question of whether we even go there. You know Siri, Siri is very, very personal. Do we want to go there? What's the character of our system here? Availability, another topic.

17:01

Coming back to let's say Google again. Google currently works great if you have network connectivity. Google does not work if you are in a parking structure underground and you don't have a cell phone signal. So for us and our customer expectations here again our system has to work even if we don't have a cell phone signal or anything or no connectivity.
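The availability requirement, that recognition must still work without a cell phone signal, suggests some form of on-board fallback. The following sketch shows one generic way to structure that; it is an assumption for illustration and not a description of the actual in-car architecture. The recognizer callables, the `CloudUnavailable` exception and the `is_online` check are all hypothetical stand-ins.

```python
class CloudUnavailable(Exception):
    """Raised by the (hypothetical) cloud recognizer when it cannot be reached."""


def recognize(audio, cloud_recognizer, embedded_recognizer, is_online):
    """Return (text, source), preferring the cloud but never depending on it.

    cloud_recognizer / embedded_recognizer: callables taking audio, returning text.
    is_online: callable returning True when network connectivity is available.
    """
    if is_online():
        try:
            return cloud_recognizer(audio), "cloud"
        except CloudUnavailable:
            # Connectivity dropped mid-request, e.g. entering an underground car park.
            pass
    return embedded_recognizer(audio), "embedded"


if __name__ == "__main__":
    # Stub recognizers standing in for real engines.
    def cloud_stub(audio):
        raise CloudUnavailable("no signal")

    def embedded_stub(audio):
        return "navigate to Neustadt"

    print(recognize(b"\x00\x01", cloud_stub, embedded_stub, lambda: True))
```

Given the privacy concerns raised earlier in the talk, a design could equally prefer the embedded recognizer and use the cloud only with consent; the ordering here is just one option.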

17:22

And last but not least, this is always important and will be as long as our cars are driven by hand and not autonomous: driver distraction. We follow very strict guidelines and this will be something we need to follow in the future as well. So having said this, this brings me I guess to the end of

17:41

a presentation that was hopefully worth listening to. Thank you very much. Thank you Philipp for your insights. We are going to try and get the technology working for the next speaker

18:00

who is going to support what you just said. Thank you very much. Thank you. Now Vanessa van Edwards is waiting for us. She has already sent word that she is ready. Let's hope the internet helps us with that.

18:24

Ah, okay. So I heard her in a workshop actually in Texas when she was giving advice about your online digital body language and I had not been familiar with this topic before and she talks about stuff you are not aware of.

18:42

Like, when you choose your profile picture, what does it say about you? She actually has some tools where you can evaluate that and it's all about gestures and face mimics that you do that express little things that you are not consciously doing and you are not

19:01

all awake again if you weren't. It works, okay. So she gave some advice on how to pick your Facebook profile picture for example and to think about what you want people to think about you when it comes to your profile. What do you want to express? What do you want them to know about you?

19:22

Or who do you want to be? And she has some advice on how you can do that and I'm just telling you more about that workshop. It's also about figuring out what kind of personality you are and what you are compatible with I guess also when it comes to jobs or your website for example.

19:41

She has a really interesting way of putting her website. Her website is Science of People because she's running a lab where she's actually looking into digital body language and researching it to figure out how it all works and in this lab she looks at lots of things for example, of course you know all the theory about the colours

20:01

and what they mean, but she thinks that also your target audience reacts to different colours. And there she is. I'm going to go to the laptop to say hi. Hello! That's loud. We can see you and hear you.

20:21

You're on a very big screen. We're going to put you full screen. Great, I'm sorry. So I told the people a little bit about your background and about the workshop I was lucky to attend in Texas about digital body language, and I think they're all warmed up for you now and you can just go ahead. We just heard a little bit about gesture control

20:42

in the car of the future and today, and now we're interested to hear about your perspective on the story. Great, I'm just trying to share my screen. Hold on one moment. We're going to share it for you. It's a nice morning with you. It's already even here.

21:01

You got my screen up on your screen? Yes. It's not full screen yet. Hopefully it's doing full screen for you. Not yet? No, we can see your Skype wrapping. Can you make it full screen?

21:23

Ah, ok. Then we are having it like this. Alright, so I'm very happy to be here today and talk a little bit about digital body language. What digital body language is, is our non-verbal cues within our mobile devices and our apps

21:42

and what we like to do is integrate human behavior into technology. I run a human behavior research lab in Portland, Oregon, and we take the latest human behavior research on how we use our technology and then integrate that into the user experience and usability.

22:00

So very briefly, since I only have a few minutes today, I wanted to talk about what makes up digital body language. There are actually ten different laws of human behavior within body language, but we're going to talk about the most important three today. And when we're talking about non-verbal cues, we're talking about logos, pictures, landing pages, ads,

22:22

videos, user interface, and colors and fonts; those are sort of the assets that make up digital body language. On the flip side, those assets draw us, they cause us to change our perception of a brand's first impression, the trust indicators of that brand,

22:41

the brand values, the brand's credibility and memorability, and then of course the purchase decision, if someone decides to click or buy or stay with an app or a mobile interface, and then loyalty to that brand. And that's what those digital cues bring out. Very briefly, I thought one thing that I would start with

23:00

just to show the impact of non-verbal, is this slide of all these different faces. One study was done at Tufts University by Nalini Ambady, and she wanted to know what kind of cues are sent by a face. And as we know, in our mobile experiences we see pictures all the time,

23:20

whether we're in dating apps or we have avatars or profile pictures, and we don't realize how many cues are being sent off in those small pictures. So she devised a very clever experiment where she had people look at a grid of faces. She had them try to guess who was the most influential.

23:41

Now all the pictures you see in front of you are taken from this Fortune 500 list. Some of the CEOs in front of you are the top members of the Fortune 500 list, in other words they are the most successful and influential, and some of them are from the bottom of the Fortune 500 list. And what they did was

24:00

they wanted to know if participants could tell, just by looking at the picture, which were the CEOs that were the most and least influential. And people were accurately able to do it. They were able to glance at a picture and guess who made the most money and who made the least money. Now here's the trick with this study: the longer they let someone look at a face,

24:21

the worse their ratings got. So for the last few minutes I've had this picture up and you've been hopefully second guessing your guesses on who you thought was the most influential. So let's actually play that game. I'm going to have us play it right now. So here are three different pictures of CEOs. I want you to guess who you think

24:40

is the most influential Do you think it's A, B or C? You can raise your hand if you think it's A Raise your hand if you think it's B Raise your hand if you think it's C So the answer is B Now most people the majority of people can guess B but the trick is we have to show that to them very very quickly

25:02

And so what we found was in our lab is this works on Twitter as well So what we did is we replicated this experiment, we pulled random pictures of people on Twitter and we showed our users in our lab these three different pictures and we asked them based on their photos who do you think has the most followers

25:21

on Twitter? And we found, again, the longer we left this up, like hopefully right now you're looking at these images and you're second guessing yourself and you're trying to figure out who has the most followers. So how many people think it's A? How many people think it's B? How many people think it's C? So the answer is A

25:40

Now I've left this up for quite a long time so maybe, and I'm not sure I can't see the audience, the answers were sort of evenly split. That's very common the longer I leave it up. The next one I'm going to do I'm going to put it up very quickly so I want you to guess your answer and stick to it, your immediate gut reaction This is the men So of these next three men

26:01

who do you think has the most followers on Twitter? How many people think it's A? How many people think it's B? How many people think it's C? So the answer is B. And we found that the faster we show these pictures, the more people can guess. Now when you're on a mobile device and you're scrolling through things or you see someone's picture

26:21

or avatar, usually you're looking at those pictures for less than a second And you make so many of your unconscious decisions based on the non-verbal cue in those pictures. And that's what I want to talk about today. I want to talk about the patterns that we found that make people think

26:41

someone is more influential or more popular or more likely to be trusted and those that are less likely to be trusted so that we can learn the cues that are from the pictures that are more likely to be trusted. And a lot of this focuses on micro expressions. So today because I only have 20 minutes, I decided to focus on the face. There's a lot of cues that we

27:01

talk about but the face is the most powerful one And when we're talking about the face, we have to talk about the micro expression So a micro expression is a very quick facial expression that we all make when we feel an intense emotion It's involuntary, so we make it no matter what we're feeling, we cannot control

27:20

that very quick micro expression. It happens in about one twentieth of a second. Very, very fast. And what we found is that whether you're watching a video on your phone or you're looking at a picture, our brain is constantly scanning for these facial expressions. And I want to point out there are actually seven universal facial expressions. Today I want to

27:40

talk about my favorites that come up especially in the mobile world the most because they're the ones we really have to watch out for So the first micro expression that we should all be able to recognize and hopefully eradicate from any of our mobile experiences or be on the lookout for is contempt So contempt is a one-sided mouth raise

28:02

and it's kind of like a smirk It's when someone pulls their mouth away or the eye is kind of narrow at someone and the importance of this is that it is an incredibly powerful facial expression even though it's really simple most of us think that

28:20

the contempt or the smirk is a partial smile but it couldn't be more opposite than a partial smile Contempt is hatred disdain pessimism and so what they found was this is researcher John Gottman Dr. John Gottman is a marriage and family counselor up in Seattle and he

28:42

without realizing it discovered the power of the smirk. What he did is he wanted to look at one of our I think biggest questions in the marriage world which is why do couples get divorced So what he did was he studied a few thousand couples He brought them into his lab He tested them on everything

29:02

he could think of. He tested them on hair samples, urine samples. He gave them personality tests. He gave them IQ tests. He videotaped them interacting. He interviewed their friends and families and kids, and then he followed them for 30 years

29:23

What he was looking for was patterns to see why some couples got divorced and why some couples stayed together. What he found at the very end of the experiment is that the couples that got divorced made a small smirk or contempt micro expression in the very first

29:41

intake interview. In fact it was so powerful this smirk that he can predict with 93.6% accuracy which couples will divorce in 30 years just by looking for contempt So when we're designing things, when we're looking at pictures, when we have videos, when we have intro videos

30:02

we have to realize that one small smirk is like planting a seed of disrespect. It is very difficult to build brand loyalty, to get purchase decisions and to give the user a pleasant experience whenever you have contempt or a smirk showing in any of your pictures. But just go on Twitter and I challenge

30:21

you to go to your LinkedIn profile and your Twitter picture and your Facebook picture and check and make sure you don't have the smirk. I would say at least 40% of the pictures that I see on social media include a smirk People don't even realize what they're sending. You'll see this in celebrities that bother us A lot of celebrities

30:41

that use the contempt micro expression, they kind of grate on our nerves, they irritate us a little bit Here in the US when politicians show contempt their approval ratings drop. So it's also a very powerful cue to your brand personality Alright, let's move on to the next micro expression And this one is happiness

31:01

So happiness most people think is very simple. They think happiness is just a smile The trick with happiness though is it's actually not what you think. Happiness the only true indicator of happiness is when the smile reaches all the way up into the sides of our eyes the little muscles on the sides of our eyes

31:22

Women fondly call this the crow's feet wrinkles. So you'll notice in these videos the difference between a fake smile and a real smile. So in the real smile you have someone who smiles all the way up to their eyes. You can tell that she goes all the way up to her eyes, and in the fake smile she just leaves it in the bottom

31:41

half of her face So we know that when you're taking profile pictures or when you're in a video inauthenticity or lack of passion, lack of engagement comes when people see a fake smile instead of a real smile It's kind of a trigger for people to go ugh this is not a brand that I could relate to

32:01

this is not a real brand, this is not an authentic brand this is not a brand I could trust And this is with both corporate brands, big corporate pictures and photos, even stock images and also individual personalized avatars and photos So this fake versus real is one of the easiest ways to show authenticity. What we found

32:21

is that celebrities who have botoxed their smile wrinkles, a lot of celebrities will botox or get plastic surgery so that their authentic smile wrinkles disappear, we see them as fake and posed and inauthentic. Whereas celebrities that use their entire face, we really like them. Because we

32:41

see them as authentic and like that next door neighbor we really want to be friends with them they have this really wholesome loyal brand. Whereas all these celebrities who have these botox wrinkles, they don't smile all the way up into their face, we see them as sort of inauthentic especially politicians here in the US that don't smile with their full face get very bad ratings

33:03

So let's talk about number three. This is fear and this is I think one of the hardest ones to spot because we don't realize it's happening So fear from an evolutionary perspective is when people widen their eyes drop their mouth open and raise their eyebrows up their forehead. It's when you see the whites

33:21

of someone's eyes. And most people will tell me, Vanessa, I'm in a business world, no one is showing fear during business or no one is showing fear in their profile pictures or their videos. The problem is that we actually show fear if we're anxious about someone who's taking the photo. Sometimes we can accidentally show fear

33:41

when the light is too bright in our pictures or our videos, and so what happens is that not only does a fear picture show anxiety, so it makes you look like you have low confidence, but it also produces anxiety. For example, I have these two videos playing and as you watch these

34:00

you should actually begin to feel a little bit anxious. The reason for that is called the facial feedback hypothesis and it's that our brain mirrors what we see so when you see a fearful face you actually begin to feel that fear with them it's the basis of empathy so if you have fear in any way, shape or form

34:20

and this can also be in cartoon characters avatars, stock images, videos it actually begins to produce fear in your user so you see this sometimes in pictures where people have widened their eyes just a little bit above their pupils so you see the whites of their eyes so you see this over and over again

34:40

with different pictures, and people will choose these photos without realizing that what they are showing is a deep kind of anxiety. This also can happen in videos, and people are using their mobile devices more and more to watch and consume all kinds of video content. People can accidentally show fear when they forget lines, when they have stage fright, sometimes

35:01

teleprompters will accidentally produce fear based on where they're placed camera nervousness, even uncomfortable clothing when women are wearing high heels, sometimes when they're standing on those heels and their feet hurt that will actually show up in their facial expressions through pain or fear and of course difficult questions. Whenever I'm watching news interviews you'll often know

35:22

when someone is very nervous about an answer because they flash the whites of their eyes right before they answer. One of the things that you can see in action, I'm not able to cover all of the different facial expressions here. There are seven universal facial expressions. You are welcome to go to ScienceofPeople.com slash TED to see some of our facial expression analysis of videos.

35:42

I wanted to show one really quick video. Hopefully it will come through. Most people don't believe me that facial expressions happen very involuntarily and more importantly we can jump from one extreme facial expression to the next extremely quickly. So this is a video and in this video the baby is watching its mom blow her nose

36:02

and this delights and terrifies the baby at the same time and I want you to watch for the two facial expressions we just learned happiness and fear and see how quickly we can go from one to the next. Alright, I'll play it.

37:09

Okay, I won't let that video go on for too long, although we love watching it. So as you could see in that video, that baby didn't even have eyebrows yet, yet we were able to recognize the apples of that smile

37:22

that smile reached all the way up to the baby's eyes and then fear when we see the whites of the baby's eyes. So this is a very silly example. This comes across in every kind of image that we're showing in our mobile devices or even on our websites or our profiles. Whether that's a picture, a cartoon, an avatar or even

37:40

a depiction. So what I challenge you to do is think about how can you eliminate fake smiles, fear eyes and smirks and how can you add genuine smiles, calm eyes and even smiles and in addition if you don't have images or videos in your mobile experience, adding genuine smiles, calm eyes

38:00

and even smiles can actually greatly increase the amount of digital cues and authenticity that you're sending out. Now I want to talk for a second about directional actions and this is a different part of digital cues of how we take action. So if in a mobile experience you're hoping to direct action, clicks views, reading

38:20

purchasing this is where a lot of that comes from. So the first thing is that we tend to take cues from other people's eyes more than anything and we forget this in a mobile experience because often times we don't have as many pictures or we don't have as much space. But the most powerful way that you can direct

38:40

cues is using other people's eyes and I'm going to explain this in a couple different ways. First, most of us have probably seen that we make a general eye pattern on a website. This is by the Poynter Institute. When we're looking at a website we tend to follow the F pattern. We start up at the upper left hand corner

39:00

we move across the top of the page then we scroll down, move halfway across and then down again. I think it is very important and one thing that we found is that it's very natural for people to have this come into their natural gaze direction. For example, here's a website by Ramit Sethi. He uses the F pattern as

39:22

well as his own eye gaze to prompt action. So what he's done, he's put his own face up in the upper left hand corner and immediately he's looking over to the side where you naturally go to his headline. Are you ready to live a rich life? So he falls

39:40

into that natural F pattern as well as making you want to follow his gaze. If you've ever been standing on the street and you look up towards the sky, you'll notice that as people walk by you, they can't help but also look up towards the sky. That is because we are programmed as humans to follow other people's direction. So he used that natural tendency to be able to

40:02

follow into where he wants you to go. So not only is he looking over at his headline, are you ready to live a rich life, but he's also looking over at his buttons where he wants you to click. That happens again. Find your dream job where he wants you to get started. This happens over and over again with different websites where they have a face and they direct the eye gaze towards their headline.

40:22

Even this one, you have the headline up top and you have the female, the bride looking back at the headline so you're more likely to read it. And she has you go back towards that headline so it directs your gaze back to exactly what you want someone to do. So we use this all across our website. We've played with it in different things.

40:41

So here's a post that I did on the science of lying. One thing we teach in our lab and we study is human lie detection. So I'll use gaze and try to see how I can get people to change their gaze and change their actions based on where we are looking at our pictures. You can also do this with gestures. So let's say that you don't have

41:01

pictures or faces in your experience. You can even do this with different kinds of gestures. So for example, in this one, this is a website by Marie Forleo. She actually uses her fingers to point where she wants you to look as well as her gaze. Oops, sorry. And then you also see in this other picture

41:21

they've used a gesture of a man but these pencils pointing towards the graphic design geniuses to sort of gesture over where they want you to read and click. I didn't see it but now I know.

41:40

So this last thing is gestures are used, even this little monkey is not even a real person down at the bottom, but they've used this weird flashlight gesture to be able to show that they want you to read, you have the right to know. So they've used gestures in a very interesting way. So here I challenge you to eliminate confusing gaze cues and any gazing away from content and adding

42:01

in the natural F pattern, the gaze and gesture to headlines, and gazing and gesturing for action. So I want to end here with a couple of things. So again, these were my sort of favorite laws of digital human behavior. I will actually be having a free webinar on all of this so you're welcome to join if you want, as well as I have all these

42:21

tips online. I challenge you to think about what your body language cues are in your mobile experience and when you're using other people's apps. So it's not just for your own experience, it's also for when you're using apps and you see pictures or you see gestures or videos, how is that affecting

42:40

or how is that changing your perception of that brand? Because just tuning into how that affects you can really change your experience with the brand and it's the best way to learn to see how pictures change things. And I also challenge you to please look at your LinkedIn profile picture, your Facebook profile picture and your Twitter profile picture

43:00

and make sure that you're not showing contempt or an inauthentic smile or any fear eyes because that is changing your personal brand to signify things that are more negative and less positive. So if you have any questions, I'm also going to hop on Twitter right now, since I'm not there, to take any questions on Twitter if you have any for me.

43:20

Thank you so much. Thank you for teaching us all these things and it's really terrifying when you know somebody like you can totally know what you're thinking subconsciously so I'm just going to hang up now and people can ask you stuff on Twitter. You actually have a super cool newsletter as well that people can subscribe to where you teach

43:41

people more things like that. Thank you so much. Thank you. And now for the last part of our session, we're going to have the presentation of art for the future of mobility and you have your microphones

44:02

they're going to be turned on, don't worry about it. We actually have help that's going to translate for us what Mathias Schäfer, who is the artist, is going to perform for us, but we're also going to have the text on the screen so you can fully give yourself to this experience of gesture

44:22

and the journey that Mathias has created for us. I'm really looking forward to this. Have fun. Thank you. Before I'm going to start I would like to introduce myself. My name is

44:43

Mathias and I'm going to speak German Sign Language. These are my two interpreters. They're going to speak English for you. Thank you. I was born deaf and

45:00

so when I grew up I didn't hear anything so I didn't know that so I grew up visually and I got all the information through my eyes through signals, through body language this is all I am

45:25

with everything I learned, and so I'm going to show you a story where gestures and facial expressions are very important. They're going to show you

45:40

how you could maybe drive a car later with facial expressions and gestures. Okay, so I'm going to use my facial expressions. They have some meanings and a grammar meaning also in German Sign Language.

46:00

Then I have to move my body and my head also has meaning and also I have my hands and there are 38 different hand shapes in the German Sign Language. They have a meaning and there's also

46:21

something like classifiers and these are hand shapes that directly take action like pantomime or something similar. And I would like to

46:41

if you don't understand everything, just let the gestures and the facial expressions that I make give you a feeling of what might be the content of my performance. There's only parts that will be written. There's more in it and the interpreters just read short parts of it. But there's more

47:03

content in what I'm signing. Just a dream

47:29

I go to sleep and then I have this dream.

47:45

I am floating up from my bed going faster and faster until I'm all up in the air.

48:00

The sky is covered by dense clouds. I'm flying through the clouds.

48:22

I'm flying through the clouds where the sun arises on the horizon.

48:50

Suddenly the clouds open up. There's a gap. I slow down to fly through this gap. My sight

49:11

opens up on a top view of a big city. There's many high rise buildings

49:20

and roads leading almost everywhere.

49:47

I fly deeper down towards the city center. I land in the middle of an empty and large road where I find myself being surrounded by skyscrapers.

50:24

Between the skyscrapers the day's first rays of sunshine break through and they are reflected by numerous windows.

50:47

All these windows are coated with transparent solar cells. They feed energy to underground batteries.

51:12

As I look around me all I see is urban canyons.

51:25

There is no one to be found around me. Far and wide blank emptiness. I raise my right arm to hit my watch.

51:51

A radio signal given out by my watch is sent to a satellite at supersonic speed.

52:01

Having received it the satellite sends the signal to a main computer which is situated in a garage full of egg-shaped cars.

52:38

One of them is activated.

52:40

It is taken upwards by an elevator. Then it starts moving independently and travels

53:00

on the road in one direction. In a top view perspective you can see it taking its way through the city. It is heading towards me.

53:36

From a distance I see it coming and I joyfully await its arrival.

53:56

The egg-shaped car gently pulls over and stops in front of me.

54:01

The doors open up widely. Inside the car there are seats built in a round circle. They are made of shiny white leather. In their center there is a table with a round

54:21

touchscreen surface. As I get into the car the door closes behind me.

54:40

On the touchscreen table I enter an address. I put the lights on and off. The car starts moving. Suddenly life comes back to town. Everywhere I see people

55:00

and egg-shaped cars. Everything is in motion. There are no traffic lights. No traffic jam.

55:29

There is a central computer controlling the traffic's flow. It is a good and safe feeling.

55:45

Through a circular panoramic window I observe all this with amazement, without realizing that the egg-shaped car brings me closer and closer to my desired destination.

56:07

Having reached my destination I get out of the car. There is a beautiful woman waiting for me.

56:40

I ask her if she would like to go

56:42

to a restaurant. She just shakes her head and smilingly she points her finger at the egg-shaped car. I smile back and nod my head alike. Then we both get in.

57:05

The doors gently close and the car drives away to an unknown destination. I wake up.

57:36

I smile about this dream and I think

57:44

why not? Let this future come. Our future for us all.

58:10

Thank you very much. This is the end of our session. Thank you so much for attending. You can always visit our stall at the beginning of the hall, the Daimler stall, and ask us more questions there. Thank you

58:22

so much for your attention. Have a great time here at re:publica.