Teaching machines new tricks

Formal Metadata

Title: Teaching machines new tricks
Subtitle: Machine learning: Silver bullet or route to evil?
License: CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
According to the Gartner Hype Cycle, machine learning is currently at the peak of being hyped. Scanning current press publications, we can find anything from Elon Musk warning about AI being the biggest existential threat to humanity, to scientists fooling machine learning models with seemingly tiny modifications to street signs, to machine learning enhancing smartphone pictures, as well as introductory material trying to explain what machine learning is about. According to Wikipedia, "Machine learning is the subfield of computer science that, according to Arthur Samuel, gives 'computers the ability to learn without being explicitly programmed.'" This keynote will detail what it takes to build a successful machine learning pipeline. We will explore some examples of how machine learning has evolved over the last twenty years and close by highlighting some of the implications that new machine-learning-based systems have.
Keywords: Keynote

Hello, and a warm welcome, everybody, to the keynote of the second day of [unclear]. It is really a pleasure for me to introduce to you Isabel Drost-Fromm. She is a member of the Apache Software Foundation and co-founder of the Apache Mahout project [unclear]. Mahout is a platform, an environment for quickly creating scalable machine learning applications, and this is also what the talk is about: "Teaching machines new tricks: silver bullet or route to evil?"
Thank you for this very warm welcome. So today we're going to talk a little bit about machine learning and its implications. If you take a look at the Gartner Hype Cycle and at where AI and machine learning are standing, they are pretty much at the top, so there's a lot of hype involved, but it's also a technology that is more and more pervasive. What we're going to look at is what it looks like to build a machine learning application, so that we're going from magical fairy dust to a factual understanding of what it looks like on a very, very high level. Then we're going to see a
few successful applications as well as their historic background and where they are coming from, and I will conclude with a few implications as a warning. I will probably not use up the whole time slot, so if you have any questions, we will probably have a lot of time for a Q&A session at the end. So we had a little bit of an introduction. Why am I giving this talk about machine learning? I entered this field as a researcher back in 2003, before I realized that writing software is more interesting than publishing
papers, which is why I went into industry. In industry I realized, and that was like ten years ago, that the libraries that we had back then that dealt with machine learning were either unlicensed or under something strange like "give me a call if you want to use it", so no real license. Second, they were only active and visible for as long as they had research funding, and after that
that was it. They also couldn't scale to large amounts of data. And that's how, together with a few people at the Apache Software Foundation, namely Grant Ingersoll and others, we founded Apache Mahout: more scalable, under an open-source license, with the goal of building a community that survives individual
contributors not having time or interest anymore, and that's how Mahout came to be. Apart from that, I come from Berlin. If you know someone from Berlin: they don't like traveling outside, they will more likely invite you instead. So if you are needing an excuse to make your employer pay for a
train or flight ticket to a conference in Berlin, go to Berlin Buzzwords if you're interested in anything scalable: big data analysis, search, machine learning, by now also [unclear] systems. That one is in early June; trust me, Berlin is lovely then. If you're not into big data and you still want to convince your employer to let you travel to Berlin: I'm currently trying to set up a conference on everything behind the scenes of free and open source, FOSS Backstage. It's going to be the two days after Berlin Buzzwords next year. On November 20 there's going to be a kick-off workshop this year: all things open source governance, licensing, community building, the Apache Way, et cetera. Why do I get time to do this? I'm currently working as open source strategist at Europace. How many of you... you may not know them,
but raise your hand if you built a house or needed financing for some kind of construction. Anyone? Because then you have probably spoken, in Germany, to someone who gave you mortgage offerings, and if they were comparing different offerings they were probably using our system in order to figure out what is best for you. The
processes that we use internally to develop software are pretty much converging towards independent teams, towards making decisions close to the team where they are needed, towards making decisions without hierarchies and escalation paths. This is fairly aligned with how, at least, the Apache Software Foundation works. The Apache Software Foundation also is a do-ocracy: the people who want to do something are the people that make the decision. You want to do something, and you actually submit a patch. So there's quite some alignment, and that's why Europace is supporting me in making this conference on open source behind the scenes.
OK, you know a little bit about me. I'm famous for [unclear] and for taking the microphone and handing it to the audience. I will save you from that today. What I want to do is a quick show of hands: how many of you know what this formula is all about? Yeah, OK, pretty much more than half.
Has anyone heard about MXNet? I would love to see at least one Amazonian in the audience. That is a deep learning framework currently under incubation at the Apache Software Foundation, which is heavily sponsored by Amazon with a lot of development time. Does someone know this logo?
TensorFlow? A few more. Has anyone seen these logos before I mention them? OK, good. Has anyone taken a machine learning course before? Then hopefully you won't feel bored by the basic terms. The formula that we saw earlier is pretty much the only equation that I have in my slides, sorry. [unclear] OK. Is anyone else scared of AI? Who knows the movie? [unclear] Why did I do this
quick show of hands? The reason: I came across a study, one of my colleagues sent it to me, on what customers really think about AI. 72 percent believed that they understand what artificial intelligence is all about. However, when asked whether they actually use any artificial intelligence technology, only 34 percent answered with yes. If you then take a look at the devices that the people who had been asked were using, and at the applications and services they were using, it would truly be more something like 84 percent. So chances are high that, at least if you use some form of mobile phone, you did interact with some sort of machine-learning-based system before in your life. What does the press
say? According to Elon Musk, artificial intelligence is our biggest existential threat, which, according to MIT, maybe [unclear]. So it's a spectrum. To me it looks like people are thinking we are talking about magical fairy dust: you sprinkle it over your project and suddenly everything works like a charm. Well, let's see what Wikipedia has to say. [unclear] It's not particularly helpful. If you look at machine
learning, we read something like: we want to have a program that makes data-driven predictions or decisions through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible. A bit more usable, but take a look
at the machine learning book by Tom Mitchell back from 1997: machine learning is the study of computer algorithms that improve
automatically through experience. That's somewhat pragmatic. Actually, it looks more like maths. This is how your machine learning algorithm looks. It's by now fairly old-fashioned: it's an SVM model, and it's a very simple SVM model in that it only finds a linear separation. So what do we have to do to our data so that, for instance in this case, the classifier can do something with it? Imagine that all language
consisted only of two words, one of them being "high-performance computing", the other one being "sunny weather". Then all
texts that contain only "sunny weather", that's the green stars over here, but not "high-performance computing", are probably weather forecasts, maybe. What about the ones over here that talk about high-performance computing and sunny weather? Probably some research publications on how to create weather forecasts, maybe. And last, on the right, you see the ones with only high-performance computing: probably something computer-science related. So what our algorithm will do is this: it gets to see a lot of examples, we tell it which category each example belongs to, and then it will find just the line that separates those examples that belong to the class from those that don't belong to it. It creates a weight vector that tells it what this line should look like, plus maybe an offset from the origin to the line, and then we've got our separation. All easy. In reality, of course, we don't have only
two words in a language, otherwise life would be boring. So imagine doing the same thing, but in a more high-dimensional space, like a couple thousand or a couple million dimensions. But
all in all, the only thing that we're doing is just drawing lines in space.
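As an illustration of the two-word example above, here is a minimal sketch in Python. It is not from the talk: it swaps the SVM for a plain perceptron (simpler to write out, but it only finds some separating line, not the maximum-margin one an SVM would pick), and the document counts, labels, and names (`DOCS`, `train_perceptron`, `predict`) are made up for illustration.

```python
# Illustrative sketch only: a perceptron, not the SVM mentioned in the talk.
# Each document is reduced to two hypothetical features:
#   (count of "high-performance computing", count of "sunny weather")
# Label +1 = weather forecast, -1 = anything else.
DOCS = [
    ((0, 1), +1), ((0, 2), +1), ((0, 3), +1),   # only "sunny weather"
    ((1, 1), -1), ((2, 2), -1), ((1, 2), -1),   # both terms
    ((1, 0), -1), ((2, 0), -1), ((3, 0), -1),   # only "high-performance computing"
]


def score(w, b, x):
    """Signed score of x: positive on one side of the line w.x + b = 0."""
    return w[0] * x[0] + w[1] * x[1] + b


def predict(w, b, x):
    """Return +1 if x falls on the positive side of the line, else -1."""
    return 1 if score(w, b, x) > 0 else -1


def train_perceptron(docs, max_epochs=1000):
    """Learn a weight vector w and an offset b that separate the two classes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in docs:
            if y * score(w, b, x) <= 0:   # misclassified, or exactly on the line
                w[0] += y * x[0]          # move the line so x lands on its own side
                w[1] += y * x[1]
                b += y
                mistakes += 1
        if mistakes == 0:                 # every training example is separated
            break
    return w, b


w, b = train_perceptron(DOCS)
print("weights:", w, "offset:", b)
print("new document (0, 5):", predict(w, b, (0, 5)))
```

The learned `w` and `b` play the role of the weight vector and the offset from the origin described above. With thousands of words instead of two, the same loop would simply run over longer vectors.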
Just creating hyperplanes, that's everything. If you bring deep learning into the mix, you are no longer drawing hyperplanes; what you're drawing is some kind of non-linear separation, but essentially the concept remains the same. OK, in order to come up
with this hyperplane, what do we need to do? We want to learn from data, so where do we get this data from? Right, the data doesn't live in a database. Maybe you have it somewhere in CSV format, or maybe in some kind of
proprietary format, so the first step is reading from that and converting it. Maybe it's just a few sheets of paper on the desk. Worse, it's not even recorded at all. So you will see that in order to train a machine learning model you will spend a lot of time figuring out where in your company the data is that you need to build this model, and getting it into a format that makes sense. So no shiny model training, no model tuning: you are just figuring out where the data lives and talking to a lot of teams and
people. Great, we've got the data and we're done, right? Not quite yet. What we have now is a huge pile of documents. They could be images, could be voice, like audio, could be something like text, but we don't have a mathematical representation. So what we need beforehand is a first transformation: we need to come up with some kind of feature generation and feature selection. Let me tell you a story of a team that tried to build an information extraction system for identifying information in job
advertisements. What you need for that is a location, at least, and a job title. Back then, the
job advertisements were published not only in HTML but also as PDF. So this team ran these PDF documents through a PDF-to-text converter and then created features in order to identify what was a job title and what was not. Guess what the most important feature was: is this line of text located at the very bottom of the document? Why was that? Because the PDF-to-text converter would take all of the text that was marked as a title and put it at the very bottom, so the very bottom lines often were the job titles that people were looking for. There was another team building a spam classifier. The data scientist spent weeks and weeks optimizing the model, coming up with different feature transformations, with different normalizations, but he couldn't quite get it above the threshold they needed to publish and deploy it. So what this person ended up doing was going over to the
ops people, giving them a spreadsheet where he had all of the features sorted by how much influence they had on the decisions that the machine learning model would take. The ops people would then go ahead and say: "Hey, there's a feature missing over here. We know what that looks like in production." It took a couple of cycles, and suddenly there was a huge
performance boost. Things got to production and everything was great. So what this means is that for feature
generation you will need to spend a lot of time talking to your teams, talking to other teams, talking to the people who know the business, in order to figure out what the best features are. When converting data you will run into
all sorts of lovely issues. How many of you have dealt with timestamp conversion? Same here. You will have to deal with typos in your data, either because that's just human error, because someone couldn't spell whatever the feature is, or practically on purpose: at least I personally have typed in, a couple dozen times in the past, just fake identity information in order to retrieve a PDF document that was published. So there's going to be a lot of noisy data out there. Or, to speak
from the vacation provider side: there are hotel owners who purposefully give you the wrong geolocation for their own hotel. Why would they do that? Well, you are probably more likely to book it if it's closer to the ocean. Oftentimes the dataset you will need isn't captured at all. You will have to go through additional loops in order to deal with privacy protection, but there's also the issue that simply nobody thought that this piece of data would be interesting. Maybe there's not enough data [unclear]. Maybe the records that you would like to put into one bucket are not linkable together. So all in all, you will spend a lot of time just munging data, just getting pipelines together, all
sorts of boring work that you will have to do before you come to some kind of fancy machine learning. If you want to know more, there are two papers on the slide, one by [unclear] and one by Amazon; the one on the lower right was presented a couple of weeks ago at VLDB. OK, now
we've got our data, we've done preprocessing, we've trained our model; we're done, right? So if you're a software engineer you
know that you need to test and identify how good the quality of the solution is, so you need to evaluate quality. What's the quality metric that you want to use? If you look at the literature you will find dozens of metrics that you can use for classification alone. So you will have to identify whether it's
accuracy, or precision, recall and F-measure, or something like AUC, or what have you. On top of that, think about the spam classification example, that's the best-known one. Your spam classifier putting some spam into your inbox is annoying, but probably not quite that bad. Worse is your spam classifier putting something into your spam folder that you would have wanted to read; that's quite a bit worse. So you want to put a weighting on your errors, depending on your business use case. It's all fancy and shiny if you use deep learning models, if you use support vector machines, if you use these black boxes, but as soon as you talk to your manager, they will probably want to be able to explain why a certain decision was made. If you've ever built a search engine, you probably remember your manager coming into your office and asking: "Hey, why is this search result not at the very top when I search with this query?"
Clearly there are a few ranking factors involved that were optimized. So often what happens is that something that is easy to explain is the thing that wins. Also, you want to make what happens visually available: just having a number that tells you yes or no is helpful to some extent, but you probably also want to visualize some of the items. And finally, you need to deploy what you did to production. Sounds easy: you just take what you trained and put it on your server. Except, if you talk to your
average data scientist, they will want an environment which lets them experiment very easily, and what's easy to experiment with is often something that cannot be deployed easily. So you will have to figure out how to make this transition. And to make things worse, you will probably have loops where your quality evaluation informs which additional data you should have
explored to begin with. Last, you will probably have an additional loop once you've put something into production: user behavior will probably change, so you will have to
retrain the model. Just to tell you one story about an e-mail provider that was building a spam classifier: if they built the model today, it was usually effective for roughly three months, and after that it was useless. That was already ten years ago. What happened was that at some point the
disks were filling up. Why was that happening? Because spammers had figured out that the spam classifier didn't look at images, so they were hiding the spam message in an image. So suddenly you have a different classification problem. So to summarize, you always have this 80/20 rule that we already know from engineering:
you will spend 20 percent of your time on what goes into selecting and tuning a model. The rest, 80 percent of the time, you will spend on integration, feature preparation and deployment of the pipeline. You probably want this to be a
continuous process where you observe usage, you measure how good you are, you decide which feature to implement next, and then you actually act and make changes, in order to go through this observe, orient, decide, act cycle. The tighter you make this cycle, the
faster you make your iterations, the faster you can be than your competitors. What you want to do is automate as much as possible. What you want to do as well is be able to fail fast and cheaply. You want to be able to learn from your users. That means that you want to build APIs and services that make it possible that people are being tricked
into supplying their own data. You will probably make something like an annotation game. Back in my research time, what we did was pay students to annotate documents. You can still do something like that today if you go to something like Amazon Mechanical Turk to allocate workers. What happens there is that you can train these people, but still these annotations will be noisy, so even there, there is a cost. Or you can put out rewards for better data: think about being a local scout on Google Maps. OK, let's look at the history of a few applications.
Does anyone know where this image might be coming from? Yeah, so that's actually the handwritten character recognition task, a standard data mining and machine learning task, published back in 1995. Fast forward like ten years, and scientists were trying to learn to complete sentences. Imagine a help-desk scenario: you've got e-mails coming in, and it's often the same questions that are being asked, so you can try to train a model off that data in order to help the agents provide better
and faster answers. Fast forward another ten years, and people are staring and swearing at their mobile phones because auto-correct doesn't do what they want more often than not. [unclear] Plus you've got tiny little devices that you can talk to, that will answer your questions, hopefully correctly. So what we're looking at is 20 years going from character recognition to speech recognition and interaction.
Again, roughly the mid-nineties: we have something like little robots trying to play soccer. So make a random guess: when was the first time that cars crossed a desert autonomously? Does anyone have a guess? Five years ago? [unclear] No, it was 2005. And then there was a car driving through Berlin. If you come from Berlin: there's the street which goes from the ICC [unclear], which has a large roundabout, to Brandenburg Gate. When was that, in actual real traffic in the capital of Germany? Guess. Two years ago? It was 2011. You can still watch the video on YouTube and it's very impressive. So going
from there, we have now got advertisements for autonomous cars, and they have actually reached public discourse. So we have 20 years going from RoboCup to autonomous cars let loose on real-world streets, at least in the US. How did we get there? What we needed was collaboration. If you think back to the soccer games at RoboCup: the basis there was that they had multidisciplinary teams and that they were sharing code and datasets. After a competition, like
after the yearly competition, the winning team was supposed to share their code so that others could be kick-started, could work off that and could be faster. Essentially, what this means, if you think back to your corporate environment, is that the lone data scientist coming up with a magical idea doesn't work. Think about the data scientist trying to build a spam classifier: it only worked after talking to the ops people. You will need to have a broad understanding in order to understand your data. You will need to have a production understanding in order to figure out if a drop in data is something meaningful or if it's just an outage. It being an outage is very interesting to your business, but probably not to your prediction models. You will
need to have an understanding of data privacy, because as soon as you link datasets together and as soon as you start to acquire more data, you will need to understand if you're allowed to do that, and you will need to understand what kind of warning you have to give to users. You will need to make a decision on what is better for you: if you want to ship pre-made models, or if you want to build services and with those services gain more insight into additional data. What also helped was
competition. In the old days, when we had the UCI datasets, they were fairly limited, and what you learned on those probably doesn't transfer to real-world applications. It helps to have something like Kaggle: they've got a wider and wider range of data and of tasks that you can deal with. It's still not ideal, but you want to have some tasks where you can compete on pre-agreed metrics and where teams work on those tasks together. What also helps is to focus on your customers, focus on the
problems that have business value: what are the most valuable problems that you can solve right now? And for that you need more than just a mathematical and theoretical understanding; you need a business understanding. Looking at the
implications: data-intensive applications, machine-learning-based applications, are being deployed in a wide range of settings, one of them being home pages and sites where ordinary people share their thoughts and become what before was only possible for the press, namely someone who shares a message with the wider public, and shares a message that is not only [unclear],
but which is written and which is permanent. So we are going beyond traditional media. We are increasing the speed of the publication process, the publication cycle speed. We're increasing it in terms of reach: false statements suddenly make it very much faster to the end user. We're dealing with algorithmic amplification effects. As engineers, what we typically see is that there are algorithms, but there's someone training them, there's someone applying them, there's someone tuning them, there's someone selecting the data that is being fed into these algorithms in order to make decisions. We've seen changes in communications that have had impact on politics before; oftentimes they were first exploited before they were regulated or before they were used for good. Where do we see influence? It's all fun and
games until it has real-world impact. Looking at the ecosystem today, if you look at
the use of algorithms and if you look at these models, they can impact real human beings' lives. They can have an impact on elections: what is selected to be shown on top, what is shown as a trend, or what is shown to many users can impact how these people think and act. What about training algorithms in order to
predict whether someone should receive an insurance contract or a mortgage? If there is implicit racism or implicit bias in prior decisions, of course this will manifest
in the models themselves as well. Garbage in, garbage out: whatever you see in the dataset you train on, you will see in the model afterwards. Then you suddenly have the question of whether machine-driven bias is any worse than human bias, or whether there is something we can do about the bias inherent in these models. Can we deal with our input data and get the bias out of it? What about models deployed among judges or among police officers in order to guide their decisions?
What about bias in the data over there?
Autonomous vehicles are also being tested in real-world traffic situations. I mentioned earlier, when we had the cycle of development, that you should go through the cycle very fast in order to improve fast. What I told you was that it is good to have a fail-fast scenario. Here in Germany,
imagine a fail-fast scenario in cars that are driving along the road. On the other hand, machine learning can lead to more automation and a different landscape when it comes to jobs and employment. That has happened in the past as well.
This is just one poem about it that is famous, at least in Germany. Usually things first become worse before regulation helps to make the situation better. So what we have to think about is: do we want machines that entirely replace certain jobs, or how will these machines change jobs, and what will the transition period look like? What implications will this have on society? As engineers, and I'm an engineer myself,
we believe that we are taking purely technical positions: oh well, it's just a computer, it's just a program that we write. But
these technical positions are starting to have real-life consequences for fellow human beings.
Even worse than these consequences for individual human beings: imagine the same advances being rolled out in the military, changing how warfare is conducted. What impact will it have
if face recognition goes into the military, if other machine learning models are being trained in order to be used in the military? What type of regulation
do we want? Do we want any regulation there, and if so, which type? There is one open letter out there, signed by several AI
and robotics researchers as well as practitioners, urging the world to regulate AI being deployed in warfare. So, how
many of you still believe that AI and machine learning really are magical pixie fairy dust? Hopefully I helped dispel that magic.
At the end of the day, machine learning is just a tool: a little bit of algorithms, a little bit of data, add your knowledge, train your models. It can be used for good and evil,
as can so many things. At the end of the day, what it is being used for is our decision. And with that, I'm open for questions.
Any questions, comments, thoughts? Yes?
[Audience question, partly inaudible, picking up on the statement that it is our decision.]
I think to some extent it is our decision. After all, we can go and vote, and we can talk to people who can vote. [inaudible]
You go to your politician, you talk to them about what regulation should look like. [inaudible]
The other option would be to get organized, like the FSFE did in order to further the cause of free software and free software licenses, and become a lobbyist on that topic.
[Audience question, inaudible.]
So, techniques for getting rid of bias in your data: first of all, you have to analyze the data that you have and figure out where the bias is coming from, and then maybe start to stratify. When I was working at a maps provider, to give a very simple example, we had a lot of traffic in, say, Germany, so our quality metrics were always dominated by the German performance. What we started to do was to look at individual metrics, like how well we were doing within France or how well we were doing in China, to look at these metrics individually and to start to optimize for them. That would be one option.
[Audience question, inaudible.]
Sorry, I don't have an answer for that. It is easier if you have access to the data. Otherwise you can start looking at the decisions they make and that way try to figure out whether there is a pattern. There is a very famous video going viral in the social media space right now: someone with a black hand is trying to get soap from a soap dispenser, and it doesn't work. As soon as they put something white over the hand, it suddenly works. That is apparently some kind of bias. [inaudible]
So the question was whether agencies employing machine learning are aware of this bias. I'm pretty sure
that those who train the models are aware of this bias; you can't go through machine learning training without being taught that there is bias in the data that you have to deal with. I would doubt that the police officer in the street understands the model so deeply that he is aware of it, other than through people telling him in training.
[Audience question, inaudible.]
People tend to be afraid of change, and people tend to be afraid of what they don't understand. I believe that so far the way this works isn't so well known amongst the general public. I showed a model up there which is just a line going through space, so it's fairly simple, actually. But what we tend to show people is more what it can do in the end-user application, and that sounds magical. It's like the tip-of-the-iceberg problem: they see what works, but they don't see how much work had to go into this model in order to make this one use case work. And this tip of the iceberg is so perfect, so magical, that it almost looks fairly simple to build something similar. They don't see the 20 years of research going into it beforehand. That's my personal take.
[Audience comment, partly inaudible, mentioning public discussion and better education.]
[Audience question, partly inaudible:] I can't predict the future, but given what you've seen in this talk about how it works, what is your take: when will it come? [inaudible]
So the issue is that, as engineers, we love talking about theoretical issues. Right now, with the AI we have out there, we have cars
driving on machine learning models where there is no law in place for how to deal with them. There are machine learning models deployed out there that are capable of influencing our political positions right now, as we talk. I
believe we should try to deal with these problems first, before thinking about something theoretical. [inaudible] In my personal opinion, we are not there yet. We've got
way more problems out there to deal with before we get to that. [inaudible] It's a lovely problem and it's nice to think about, but I believe right now we've got other issues to talk about.
[Audience question, inaudible.]
So what I would like to do in this discussion is to focus on the real use cases we already have out there and to figure out whether we want to do something there, and if so, what we want to do, instead of arguing about whether something is going to happen in the distant future. Look at what is out there right now and see what we have to do, if we do have to do something.
[Audience question, inaudible.]
We're back to where you need to get involved in politics in order to talk about regulation, because [inaudible] is not the only one who has this. There are other companies that have the same data and the same models; there are state institutions that have the same data and the same models. So it is a valid topic that you have to discuss in a wider circle.
[Audience question, inaudible.]
So, about data protection being enforced on European companies: is this an advantage or a disadvantage? Let me turn the question around. Maybe it's even an advantage for European companies to be
subject to this law, because people can be certain that their data is not being used for a purpose that they didn't agree to. From the perspective of a company that wants to roll out machine learning, of course they always want your data in order to build better models and better services. But from the customer's perspective it's like two sides of a coin: you want those better services that are based on your data, so you will have to share it somehow, but on the other hand you want to have control over your data. So maybe it's even an advantage to be able to tell your customer: I am subject to the law being enforced right here; your data is safe with us, not just because we say so, but because the state forces us to do that.
[Audience question, inaudible.]
The question was about the roadmap of Apache Mahout. For the roadmap, go to the mailing list; it's an all-volunteer project. Right now, for deep learning, most people who are active in Mahout are also active in MXNet, so there is the idea of going back and forth. But other than that, I would ask you to go to the mailing list and ask the question there, so that others can benefit as well. I won't make predictions on behalf of other people there.
I don't see any further questions. If I may take the opportunity again: I think this shows that this development [inaudible] takes the responsibilities of the engineers who are doing this, and the ethical implications, a bit further. I think this is what has happened over the last 50 years: in the beginning it was just number crunching [inaudible], something like corporate data. Now we develop more and more sophisticated systems, and it is going into the public realm, where it is about personal preferences and decisions about personality.
So societies get more involved in the outcome of what happens. This is very important work that raises important questions, and I would like to thank the speaker. Thank you very much.
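One answer in the Q&A described a simple way to spot bias: instead of relying on a single quality metric that is dominated by the largest market, look at the metric per country. A minimal sketch of that idea follows. The function name `accuracy_by_group`, the record layout, and all numbers are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute overall accuracy and accuracy per group.

    `records` is an iterable of (group, correct) pairs, where `correct`
    is True if the system's answer for that request was right.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group


# Hypothetical evaluation log: most traffic comes from one country,
# so that country dominates the aggregate number.
records = (
    [("DE", True)] * 95 + [("DE", False)] * 5
    + [("FR", True)] * 6 + [("FR", False)] * 4
    + [("CN", True)] * 2 + [("CN", False)] * 8
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for country, acc in sorted(per_group.items()):
    print(f"{country}: {acc:.2f}")
```

With these made-up numbers the aggregate looks healthy while the smaller groups do much worse, which is the effect the stratified view is meant to expose. Optimizing each per-group metric separately, as described in the answer, would be the next step.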