
A robotic platform for natural and effective human-robot interaction


Formal Metadata

Title
A robotic platform for natural and effective human-robot interaction
Number of Parts
160
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and transmit the work or content, in adapted or unchanged form, for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
A robotic platform for natural and effective human-robot interaction [EuroPython 2017 - Talk - 2017-07-14 - Anfiteatro 1] [Rimini, Italy]

This talk deals with the use of artificial intelligence techniques in humanoid robotics. The focus is on human-robot interaction, with the goal of building a robotic platform whose embodiments are able to interact in a natural and effective way with humans through speech, gestures, and facial expressions. The system is fully implemented in Python and based on the Robot Operating System (ROS). The talk will describe the hardware and software configuration of our current NAO-based humanoid platform. The strategy has been to use available high-level Python libraries for spoken language processing, sentiment analysis, vision, and interfacing with artificial intelligence applications, in order to deliver state-of-the-art performance. The overall system architecture is based on finite-state-machine nodes interacting via the ROS communication layer. The main fields of application that the platform targets are:

- Entertainment
- Education
- Field robotics
- Home and companion robotics
- Hospitality
- Robot Assisted Therapy (RAT)

We will present the latest status of the platform together with a NAO-based demo.
Transcript: English(auto-generated)
So hi everyone, my name is Enrico, and today I will show you how we built an architecture and a platform for natural and effective human-robot interaction. So I will start by saying what our main goals are and what the main architecture we use is.
Then I will explain how we implement this architecture on the NAO, the fields of application for the robot, and a video. Well, the audio doesn't work, so I'll show it at the end without any audio, so
forgive me. I'll show you all the problems that there are, next steps, conclusions, and then I'll show you a demo of the robot actually doing something. So, okay, what are our main goals? Our main goal is to create a robot able to effortlessly communicate and interact with humans through speech, gestures, and
facial expressions, because obviously that's the most effective way to do it. The other thing is to build a really modular software infrastructure, so that we have the possibility of integrating different robotic platforms, integrating new software really easily, and also
removing old software. This is really, really useful, especially for testing purposes and performance assessment, so we can test software and change it whenever we want.
So how was it achieved? We use high-level Python libraries for spoken language processing, sentiment analysis, and vision, and we also use artificial intelligence applications. And yeah, so
the main architecture is actually pretty simple. Everything is based on ROS Indigo. Why ROS Indigo? Because SMACH, the library that we use for finite state machines, runs only on Indigo, so we are stuck with that.
Also, the state machines actually run as separate processes, as separate Python scripts, and they communicate via ROS. Now, what are the main benefits of this architecture? Thanks to the state machines running as different programs,
we are able to use asynchronous computing and also achieve multiprocessing. It's not threading, it's different: they are just different programs using different cores. And since the structure is modular, because that was one of our main goals, software can be added and removed without problems.
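The idea of running each state machine as its own OS process that exchanges messages can be sketched as follows. This is a hedged illustration, not the talk's actual code: in the real system the channel between processes is a ROS topic handled by rospy, while here a `multiprocessing.Queue` stands in so the sketch is self-contained, and the node names and messages are invented.

```python
# Each node runs as its own OS process (true multiprocessing, not
# threading) and exchanges messages over a channel. A Queue stands in
# for the ROS topic used in the real system.
from multiprocessing import Process, Queue

def sensing_node(out_q):
    # Pretend the microphone heard a command and publish it.
    out_q.put("who am I?")

def thinking_node(in_q, out_q):
    # React to whatever the sensing node published and decide a state.
    heard = in_q.get()
    if "who am i" in heard.lower():
        out_q.put("face_recognition")

if __name__ == "__main__":
    ears, decisions = Queue(), Queue()
    procs = [Process(target=sensing_node, args=(ears,)),
             Process(target=thinking_node, args=(ears, decisions))]
    for p in procs:
        p.start()
    print(decisions.get())  # the state the robot should switch to
    for p in procs:
        p.join()
```

Because the nodes are separate programs, one crashing or blocking on the cloud does not stall the others, which is the asynchrony the talk describes.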
Also, since it's already ROS-based, basically every component that uses ROS in any way can be added without problems.
So now, where do we actually use the NAO? Mainly in entertainment, education, field robotics, home and companion robotics, hospitality, and robot-assisted therapy. There would have been a video here, but too bad. Now, robotic platforms: given the way we designed this architecture,
it can virtually be used on every robot, ROS-based or not, because in fact we are not actually using ROS to communicate with the robot. We're just using ROS to communicate between the state machines.
Also, if you want, you could use the architecture to work on non-robotic projects, but that's up to you. Now, why did we decide to use the NAO robot? Well, it's actually one of the best commercially available humanoid robots.
There are actually not many other options. The other one is the Pepper, which is its evolution. Also, since we actually need to interact with humans, having a humanoid robot makes a difference:
with a humanoid robot, a human can actually connect at an emotional level, so the interaction is a lot more effective. Now, how did we actually implement our architecture for the NAO? As I already said, we use ROS just for communication between the state machines, and
then we use the NAOqi library, which is a Python library, to send all of the commands to the robot. We use the PyAIML interpreter for natural human-robot dialogue. So, what is AIML? AIML is basically
a markup language that lets you create chatbots. This is, at the moment, kind of the only way to create an effective way for the robot to communicate with a human,
with answers that are kind of human, since, you know, you program them. Also, we heavily use cloud-based solutions, especially for voice recognition and language translation, because
the local equivalents are not as effective, and the cloud-based ones are a lot faster and a lot more accurate. Why language translation? Because if we add some kind of automatic translation, then we can just go to the config file and say,
okay, I don't want Italian anymore, I want Polish, and voilà, the robot can now speak another language. We also use neural networks, at the moment for object recognition and face recognition.
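The dialogue pipeline just described — the AIML brain is in English, so the robot translates the user's Italian to English, answers, then translates back — can be sketched like this. Everything here is a stand-in for illustration: the real system uses a PyAIML kernel and cloud translation APIs, while this sketch uses tiny hard-coded tables.

```python
# Hedged sketch of the translate -> AIML -> translate-back pipeline.
# Hard-coded tables stand in for the cloud translator and the AIML
# category set; both are lossy in practice, as the demo later shows.

IT_TO_EN = {"ti piacciono i broccoli?": "do you like broccoli?"}
EN_TO_IT = {"i do not like broccoli.": "non mi piacciono i broccoli."}

# Stand-in for the AIML brain: pattern -> template.
AIML_BRAIN = {"do you like broccoli?": "I do not like broccoli."}

def translate(text, table):
    # A cloud service does this in the real system.
    return table.get(text.lower(), text)

def dialogue_turn(text_it):
    text_en = translate(text_it, IT_TO_EN)
    answer_en = AIML_BRAIN.get(text_en.lower(), "Non ho capito.")
    return translate(answer_en, EN_TO_IT)

print(dialogue_turn("Ti piacciono i broccoli?"))
# -> non mi piacciono i broccoli.
```

Swapping Italian for Polish then reduces to swapping the two translation tables, which is exactly the config-file point made above.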
Now, this is the actual map of the state machines. Basically, the main Python script is called humanoid. When you call humanoid, it launches these five state machines: sensing, sight, thinking, acting-speak and acting-move.
Now you might ask, why are acting-speak and acting-move connected to sensing? That doesn't seem to make any sense. Basically, when the robot talks, if we didn't do anything, it would keep hearing itself and more or less go crazy.
It wouldn't do anything. So we need to mute it when it talks and when it moves. Now, this is the inside of the sight state machine. Basically, there are different states, and you can only have one state at a time.
You cannot have two working at the same time, just one. The default one is idle, and basically, when I talk to the robot, it will change to whatever I ask it to.
For example, if I ask it, "who am I?", it will go to face recognition. And then, when it's done, it will go back to idle. Now, this is the actual brain. Don't be fooled: this is not everything it can do. It can actually do a lot more things, but for the sake of readability, we cut them down.
Otherwise, you couldn't see anything; it would be just a mess. It's already a mess right now. Imagine with five or six extra machines, you couldn't read anything. And it's pretty self-explanatory.
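The sight machine just described — one active state, idle by default, a spoken command switching the state, and a fall back to idle when the job is done — can be sketched in plain Python. The real system builds this with the SMACH library on ROS Indigo; the class and command names here are illustrative, not the actual SMACH API.

```python
# Plain-Python sketch of the "sight" state machine: exactly one state
# is active, "idle" is the default, known commands switch the state,
# and finishing a job drops back to "idle".

class SightMachine:
    COMMANDS = {
        "who am i?": "face_recognition",
        "what is this?": "object_recognition",
    }

    def __init__(self):
        self.state = "idle"

    def hear(self, command):
        # A known command replaces the current state; anything else is idle.
        self.state = self.COMMANDS.get(command.lower(), "idle")

    def step(self):
        # Pretend the active state finished its work, then fall back to idle.
        finished = self.state
        self.state = "idle"
        return finished

sm = SightMachine()
sm.hear("Who am I?")
assert sm.state == "face_recognition"
assert sm.step() == "face_recognition"
assert sm.state == "idle"
```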
Also, we have movement control, which is pretty simple, I guess. Okay, now one thing: why did we actually opt for a web-based solution rather than a local one?
Basically, we can reduce the CPU load a lot. If we use something like a laptop, then everything can run faster, since it's on the cloud, and who cares? Oh, yeah, you also need a fast internet connection. That might be a problem in some cases, because if you don't have it, then you're screwed, basically.
And also, obviously, the tools provided by big companies are usually the best available and the most reliable. Okay, so, wait, what?
For some reason, these slides are inverted. They should not be. So, okay. What are the main problems with this? Basically, ROS Indigo uses Python 2.
And, obviously, if you have a piece of code that runs on Python 3, you will have a compatibility problem. To solve it, what we do is just say, okay, I need to run a Python 3 script. Okay, very easy.
I just say, okay, when you go into this state, execute a Python script somewhere else, then get the result and make the robot speak, or whatever you need it to do. Also, right now, when we change the robotic platform, we have a really, really big problem.
Because every time you change it, you basically need to search through all the code you wrote and say, okay, here the robot speaks, and it uses this kind of command,
so I need to change it. And you need to do this on basically everything that interacts with the robot. Also, the NAO robot, for our needs, is not really powerful, especially in the sensor department.
For example, the microphone it has is really, really not powerful, so we actually need to use an external one to make it work. The same goes for the actuators department: its limbs are not powerful enough.
So, if you ask it to, I don't know, pick something up, it will fail most of the time. Also, the grip is really basic: it's basically as if there were two fingers, even though there are three.
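The Python 2 / Python 3 workaround mentioned above — the ROS Indigo node shells out, runs the Python 3 script as a separate process, and reads the result from stdout — can be sketched like this. The script contents and paths are made up for the demo, and `sys.executable` stands in for an explicit `python3` so the sketch runs anywhere.

```python
# Sketch of the cross-version bridge: the Python 2 ROS node cannot
# import Python 3 code, so it runs the script in a subprocess and
# captures its stdout.
import subprocess
import sys
import tempfile

# Stand-in for the external Python 3 script (e.g. a neural-network
# classifier) that a state machine needs to call.
script = "print('bottle')\n"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

# The calling node then feeds this string to text-to-speech, or to
# whatever the state needs next.
result = subprocess.check_output([sys.executable, path]).decode().strip()
print(result)  # -> bottle
```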
So, what are the future steps we will take to actually correct all the problems with this? As you could probably tell, this is a project that we are working
on with a company, so a lot of this stuff, unfortunately, is closed source. We can just show you how it works. But we want to make the skeleton of the main architecture available to everyone.
We want to make it open source, so that if you want to start a new robotics project, you can say, okay, just download the architecture, put my robot commands in the config file that we will create, and then you can add code really easily, remove it,
and have a working robot pretty easily. Also, obviously, since the NAO is not enough for our needs, we need to change it to something newer and hopefully more powerful. And also, yeah, adding a config file.
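Such a config file might look like the sketch below. This is a hedged illustration of the plan, not the project's actual file: the INI format, section names, and keys (`camera`, `speech`, `actuators`) are invented here, read with Python's standard-library `configparser`.

```python
# Hypothetical robot config: hardware-specific details live in one
# file, so swapping the robotic platform means editing this file
# instead of hunting through every state machine.
import configparser

ROBOT_CONFIG = """
[camera]
device = /dev/video0

[speech]
tts_command = nao_say

[actuators]
left_arm = LArm
"""

config = configparser.ConfigParser()
config.read_string(ROBOT_CONFIG)

# State machines ask the config instead of hard-coding NAO commands.
print(config["speech"]["tts_command"])  # -> nao_say
```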
So, as I said, when you need to change something for a robot-specific task, you currently need to change all of the software. So we will just create a config file to say, okay, here is the webcam, it's here; or here is, I don't know, the actuator for the left arm; whatever. You can do everything there, and you don't need to modify all of the software. Also, we might add a companion robot to the NAO. This is just a temporary solution:
it will house all of the necessary hardware, basically just a desktop computer, and also all of the sensors that will replace the NAO's own, because they just don't work, they're really bad. Also, I know it's kind of early for conclusions,
but don't worry, we also have a lot more stuff, especially with this guy, and we also have a video. So, with this robotic platform, we have been able to achieve the main goals that we initially set, which is great, thank God.
And we are actually seeing promising results with our NAO embodiment. Also, obviously, Python is at the heart of everything, and we are exploiting the batteries-included philosophy,
so we try to integrate everything that is already available. And now, let's enjoy the demo, and hopefully the video. Okay, now, let's make it start, actually.
Can you see everything? Yes. Okay, roscore started. So, basically, as you can see, I have little space to actually show you everything, but for every state machine
there is a log in an xterm, so you can actually see what it's doing. Yeah, also forgive the spelling mistake. And here, you can see what it's actually doing in real time.
The problem is, it might crash, so... Okay, no, it doesn't. Also, forgive that everything is really slow, because this is just not a really powerful computer, and the speaker is not so great.
Maybe, whatever. Wait a second, there's something missing. Ah, here it is. Okay, basically, you can see what it's actually hearing.
So, one small problem is that the NAO, for the moment, can only speak Italian. If you're an English speaker, I'm really sorry. Okay, so, for example, I could say...
First, let me see if it can actually recognize me. Okay, whatever. It should work, but it doesn't, so...
Eh, too bad. So, for example, I could ask him... I don't know, I could ask him something. So: it heard a lot of stuff. As you can see, basically, it's trying to understand
everything that I said, and it's now trying to recognize it. I said a lot of stuff, and it's taking quite a bit of time. That's actually one of the main problems with the cloud-based architecture: the speech recognition is based on the cloud,
so sending all of the audio, getting it analyzed, and getting it back takes a lot of time, especially if you say a lot of words.
Or, for example, I could ask him if he likes broccoli. Obviously, it doesn't work every time,
so we still actually need to improve everything. It's still a pretty young project.
As you can see, basically, since the actual AIML of the bot is in English, it tries to translate the input into English,
make him say something, then translate it back to Italian, so sometimes, as you can see, it doesn't work. Basically, it just understood that "orphan black does broccoli chemicals", and that we like broccoli. So, as you can see, it doesn't always work.
So, let's try to ask him something else. It's doing too much stuff.
The speakers are really not the best, so it basically hears everything. Yeah, George Bush.
It doesn't really work. Come on, do it.
Oh, no, I used this. That's the problem. This is like a pretty... George Bush, President of the United States. President of the United States. Okay, so as you can see, the performance is really slow, because I'm using an i3 computer.
Really slow, since it's a laptop. Usually, we use a much more powerful desktop. We have a lot of processes running, and it's really, really slow. Also, the internet connection is not really the best. Also, it's hearing everything:
this is a pretty sensitive microphone. When I talk towards the speaker, it can hear everything. I could ask him for some object recognition, maybe. It doesn't really work well, because as you can see, the camera is not really the best. Let's try anyway.
Let's try with a water bottle, because why not? Come on, stop it. It recognizes everything.
As you can see, it actually works.
Okay, as you can see...
So basically, as you can see here, it tries to understand what it is. And also, there is that light, which is really messing with its camera. And obviously, object recognition, especially at this level, if you know neural networks, is actually kind of hard. We could try something else if you want,
if you ask me. Okay, make him recognize, I don't know, a glass, a napkin, a coin. If you want, you can ask me. Also, I could make him do, I don't know, a dance, maybe. It doesn't usually end well, but let's try it.
Okay, no. Oh yeah, actually, let me do this. Leggimi una storia ("read me a story"). Leggimi una storia.
Pinocchio. So basically, if you give him books, he can read them. That's actually pretty easy to do.
It's nothing super hard, but why is it not working? No, it heard "Lucho", not "Pinocchio". Why? There are a lot of problems, as you can see. Pinocchio. As you can see,
it got to the reading state machine, and it's actually reading the book from a simple TXT file.
You can stop it by touching his head. Let's try Tai Chi, why not? Does it understand it or not?
Meanwhile, this PC is melting. Tai Chi, Tai Chi, yes. And basically, you can make a behavior run.
Probably it will fall, so don't worry. It's normal.
Oh wow, it actually worked. That's impressive. And also, basically, I don't know... Dammi la mano ("give me your hand"). Basically, if he understood it,
in theory, he should follow me. Dammi la mano. You can use the arm just like a joystick,
and it can follow me around. As you can see, as I told you, the actuators are not the best, and we kind of need to replace them. I can stop him.
Okay, fantastic. Go back to idle, and now my face,
but for some reason, it doesn't work.
Also, I don't know how well it's actually seeing me there. Probably the gigantic reflector is not the best for its camera. It's really not that powerful. Come on.
Is this thing, like, face-recognizing there or something?
I think there is. Let me try this. Let's try. I don't know, shit.
I don't know, shit. Well, at least it sees a face and says, oh, I don't know you.
So, in theory, if it knows you, it says, oh, hi, Enrico. Obviously, since it's a demo, it's bound not to work. Oh, yeah, it can actually also read, but maybe this is big enough. I don't know. Can I try with this?
Okay.
Usually, it works.
You just need really big letters; that's kind of the only problem. Basically, you need big letters because the resolution of the camera and also the focal point are kind of crap. If you actually have a decent camera, you can use this even with really small text.
You can just take a book, put it there, and say, okay, read, and it will read whatever is on the page, as long as the text is big enough. Too bad. Also, we could do a lot more stuff. At the moment, I could ask him, oh, yeah, for example,
Domande, domande ("questions"), ulivo ("olive tree"), for example.
Ulivo. It understood "www".
I don't know.
For some reason it went to Google, I have no clue why, but yay. Oh, yeah, basically, as you can see, there are some really big problems.
And consider that this is kind of one of the best speech recognition software packages you can actually use. Well, right now it's not at its best, because there is all the echo from the microphone, and it gets confused pretty easily. So basically, now you could ask anything.
A microphone, for example: I'd ask about a microphone, and it should tell me, oh, what a microphone is.
It works, actually. I don't know. There is a lot more stuff. Stop.
Stop, please. It's too intelligent.
Oh, God. Stop.
I don't understand what it wants to understand.
Oh, yeah, now it's idle. Thank God. OK, fantastic. Yes, it went out.
So basically, to make it leave a state machine, you say basta ("enough"), ferma ("stop"), or whatever you want. There is a config file for that, and it goes back to idle. I don't know. We still have some time, so do you want to see it dance a little bit more or not?
OK. So, okay, this is pretty hard.
Now you should see the fall manager
kick in. So basically, there is a state machine that has priority over everything else.
It says, OK, it has fallen, then get back up. And it works, as you can see.
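The priority logic of the fall-recovery manager just demonstrated can be sketched in a few lines. The function and state names are illustrative, not the real system's API: the point is only that the fall check is consulted before any normal state runs, so a detected fall preempts whatever the robot was doing.

```python
# Hypothetical sketch of the fall manager's priority dispatch: a
# detected fall overrides the current state and triggers get-up.

def dispatch(current_state, has_fallen):
    # The fall check has priority over every other state.
    if has_fallen:
        return "stand_up"
    return current_state

assert dispatch("tai_chi", has_fallen=True) == "stand_up"
assert dispatch("tai_chi", has_fallen=False) == "tai_chi"
```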
And there you have it. Good as new. I don't know, should I try to make it recognize something? Or should we go to the next one? It depends on what you prefer. We have time for a few questions. OK. If you want demonstrations, I don't know.
Tell me what you want to do. If you want, I can do a Q&A; if you want, I can continue showing stuff that it can do. Tell me. Maybe one question, somebody. Yes. What?
Behind, at the back. The question is: what happens if it falls on the other side? It can do it softly.
I mean, you know.
Any other question or challenges for the robots?
Hi. Actually, I was wondering about this camera. I mean, the color camera: is what you see the same thing that the robot sees? Why don't you, for instance, use black and white and some sort of infrared light, so you are not dependent on the scene's lighting?
Is it the same thing that we see right now? Or do you do some sort of black and white and infrared, maybe, to not be dependent on the environmental light?
Any other challenge or questions?
Raise your hand. I have one question: can you pick up some objects?
Can you pick up an object from the ground? Please speak into the microphone; I think it will be easier for everybody. So, as it is right now, it kind of can take objects from the ground.
Basically, you can just create an animation for the robot and say, okay, when you are here in this spot, run the animation to pick up the object. But it's not actually a smart object pickup. It doesn't say, okay, I recognize that object, and then I will pick it up.
It just says: I go here, and then I execute just a stupid behavior: go down, close the fingers, and get back up. That's actually how we do it; it's not really that smart. If you want, I can kind of show you. Is there something here?
Yes, he does it.
Great. Dammi ("give me"), dammi, dammi, dammi.
Give me back the napkin... but no.
Oh, man, this is so beautiful. That's dumb, you know? Yes. Let's try again. Dammi.
Why is it not executing it? Well, yay. You're taking it out of his hands.
Stealing from a robot, that's what you just did. I will. Yeah, basically, it's kind of a cheap way to do it, but hey, it works. Great, anyway. Yeah, well, thank you. Thank you very much for showing us that robots don't like broccoli.
And please give him a warm applause for his work with his robots. Thank you.