Visualization in Machine Learning
Formal Metadata
Title | Visualization in Machine Learning
Number of Parts | 3
Author | 0000-0001-5320-5729 (ORCID)
License | CC Attribution - NonCommercial 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/56614 (DOI)
Transcript: English (auto-generated)
00:00
Thank you very much for inviting me. I will move quickly to the topic I would like to talk about today. In recent years, visualization has been used quite commonly in machine learning workflows, and I would like to discuss, from a theoretical point of view,
00:21
why we need to do that. Hopefully I will also provide a few key reference points so that people can find the major publications on this particular topic. I will start with a simple example. About ten years ago, we worked on detecting facial expressions
00:48
using machine learning models. This is a real challenge, although humans can do it extremely well. We worked with video data and went through a fairly complicated process to extract the
01:05
information from the videos, convert it into time series, and then conduct machine learning in the time-series space. We had two different approaches. The first was a fully automated approach,
01:20
shown on the left-hand side, using a feature-based decision tree; we could also do this with a random forest. The second approach used visualization as part of the process to
01:42
aid the construction of the decision tree. What we noticed is that the decision trees humans constructed using visualization techniques, such as the parallel coordinates plots you can probably see here, performed better than those produced by automated approaches such as C4.5
02:06
and CART. Later, we conducted a similar experiment with a different machine learning problem and obtained the same result. We wondered why.
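As an illustration, here is a minimal sketch of the kind of parallel-coordinates view that can support manual decision-tree construction. It uses pandas' built-in parallel_coordinates plot; the feature names and values are hypothetical placeholders, not the features from our facial-expression study.

    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import parallel_coordinates

    # Hypothetical time-series-derived features per video clip (placeholder data).
    df = pd.DataFrame({
        "mouth_curvature": [0.9, 0.8, 0.1, 0.2, 0.85, 0.15],
        "brow_raise":      [0.2, 0.1, 0.9, 0.8, 0.25, 0.7],
        "eye_openness":    [0.6, 0.7, 0.9, 0.3, 0.65, 0.4],
        "label": ["smile", "smile", "surprise", "sad", "smile", "sad"],
    })

    # Each polyline is one sample; a human can spot which axes separate the
    # classes well, and roughly where a good cut position would lie.
    parallel_coordinates(df, "label", colormap="viridis")
    plt.title("Candidate features for a manually constructed decision tree")
    plt.show()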
02:22
We went to the machine learning developers involved in comparing the two approaches and asked: what can the human developers do that the machine may not be able to do? They summarized six particular points. First, the humans are able to
02:48
gain an overview and observe all the possible features being used in the machine learning process, so they have an overall picture when selecting the axes and so on. Second,
03:03
they can check multiple axes to see whether they are consistent with or contradictory to each other. Third, they can look ahead: when selecting a particular variable as a node of the decision tree, they can anticipate
03:21
what they may do in the next step. Fourth, they can detect outliers among the feature points. Fifth, when the decision tree is constructed, there is also a cut position to choose, which algorithms usually have difficulty doing well. Sixth, they use their theoretical knowledge, for example to say that a particular
03:44
feature is not good in theory, although in practice it seems to be okay. So the humans were applying knowledge based on their own research to the selection of the axes.
04:01
For example, in outlier detection, you can see here that a smile may look very similar to a surprise, or sad and angry may look similar. Those are outliers that humans are able to judge quite quickly. We used information theory to
04:24
quantify how much human knowledge was made available to the machine learning process, and we worked out that in this particular case 68 bits were available in terms of entropy. Throughout the entire process we noticed that when the data is relatively skewed, or the data set is small,
04:45
the total amount of human knowledge available, measured in terms of entropy, is more than the information available in the data.
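As a reminder of the underlying measure, Shannon entropy quantifies information in bits as H(X) = -sum p(x) log2 p(x). The sketch below computes the entropy of a class distribution; the function and the example distributions are my own illustrative placeholders, not the actual figures behind the 68-bit estimate.

    import math

    def entropy_bits(probs):
        """Shannon entropy in bits: H = -sum(p * log2 p), skipping zero terms."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A balanced 4-class label alphabet carries 2 bits per label...
    print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits

    # ...while a skewed distribution carries less, so a small, skewed data set
    # may supply less information than the knowledge a human expert brings.
    print(entropy_bits([0.85, 0.05, 0.05, 0.05]))  # about 0.85 bits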
05:02
This raises the question of whether, in theory, machine learning should involve more human knowledge in the process. At the bottom here we show a typical data mining process, and at the top a machine learning process; machine learning generates a model that is then included in the data mining process. If you look at the yellow parts
05:24
of that process, which prepare for machine learning, they usually take a lot of time. Although the training itself may take hours or days, the entire machine learning process takes months, often six months, to produce a reasonably optimal model. So a huge amount of human knowledge
05:42
enters this particular process. On top of that, we sometimes fail to take into account the stages before and after machine learning. This is partly because the search space for a machine learning model is gigantic. If you consider all the
06:06
mathematical functions that can be realized on a computer, that space is the space of Turing machines, and an ordinary program is one sparsely located point in that
06:24
space. What machine learning does is identify a tiny subspace, a template, and search that subspace to find the best program.
06:41
Most of the machine learning frameworks we know today are not Turing-complete. In other words, there are always programs you can write in an ordinary programming language such as Java that cannot be written in machine learning if we treat it as a language.
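To make the "template" idea concrete, here is a minimal sketch under assumptions of my own choosing: the hypothesis space is restricted to one-feature threshold rules (decision stumps), and "training" is simply an exhaustive search of that tiny subspace for the rule with the lowest error. Everything here, data included, is a hypothetical illustration rather than the setup from the talk.

    import itertools

    # Toy data: (feature vector, binary label). Placeholder values.
    X = [(0.2, 1.5), (0.4, 1.1), (0.9, 0.3), (0.8, 0.2), (0.1, 1.8)]
    y = [0, 0, 1, 1, 0]

    def stump(feature, threshold):
        """One point in the template space: 'predict 1 if x[feature] > threshold'."""
        return lambda x: int(x[feature] > threshold)

    def error(h):
        return sum(h(x) != t for x, t in zip(X, y)) / len(y)

    # "Training" = exhaustively searching the small template space; programs
    # outside this template can never be found, however much data we have.
    candidates = [stump(f, th) for f, th in itertools.product(
        range(2), [0.1 * k for k in range(20)])]
    best = min(candidates, key=error)
    print(error(best))  # the best program found in this tiny subspace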
07:02
So the critical point is for the humans in the process to find the sweet spot in that entire space, so that searching the chosen subspace can yield an optimal model, and a huge amount of knowledge goes into that. In supervised learning, the labels are
07:22
human knowledge, and in semi-supervised learning humans are also involved in the learning process. Reinforcement learning requires humans to come up with clever fitness functions, and self-supervised learning requires humans to define how
07:41
different fitness functions interact with each other. Unsupervised learning quite often requires humans to interpret the results of the model. Wherever features are involved, the features themselves are programs or mathematical equations and are therefore human inputs, and the selection of different techniques is also human knowledge.
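As a small illustration of the point that features are themselves programs: below is a hypothetical hand-crafted feature over a time series, of the kind a human might design from domain knowledge. The feature definition and its parameters are assumptions for illustration, not features from the study.

    # A hand-crafted feature is a small program encoding human knowledge:
    # here, the peak-to-baseline ratio of a (placeholder) mouth-curvature series.
    def peak_to_baseline_ratio(series):
        baseline = sum(series[:5]) / 5          # human choice: first 5 frames
        peak = max(series)                      # human choice: use the maximum
        return peak / baseline if baseline else 0.0

    print(peak_to_baseline_ratio([0.1, 0.1, 0.2, 0.1, 0.1, 0.6, 0.9, 0.7]))  # 7.5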
08:05
So there is a huge amount of human knowledge in the process. Sometimes people give the credit to the data rather than to this knowledge, but in fact it is not always the data, and quite often we do not have enough data. Many years back, the American Statistical Association released a data set
08:26
that at the time was considered big: all the commercial flights in the USA between 1987 and 2008, some 120 million records with 29 variables.
08:50
If we treat every one of those 29 variables as merely binary, for example thinking of the twelve months as just the first six months and
09:02
the second six months, the total number of combinations of 29 binary variables is about 536 million, and 120 million records is nothing compared to that.
09:21
If instead we allow each variable four possible values, the number of combinations expands very quickly. Quite often we simply do not have enough data, and that is why using human knowledge is important.
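The arithmetic is easy to check; a quick sketch, using the figures quoted in the talk:

    records = 120_000_000          # commercial flight records, 1987-2008
    variables = 29

    binary_cells = 2 ** variables      # each variable reduced to two values
    quaternary_cells = 4 ** variables  # each variable allowed four values

    print(f"{binary_cells:,}")       # 536,870,912 combinations
    print(f"{quaternary_cells:,}")   # 288,230,376,151,711,744 combinations
    print(records / binary_cells)    # < 1 record per cell on average (~0.22)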
09:40
Going back to information theory: statistical algorithms, including machine learning algorithms and the training of machine learning models, data visualization, and human communication all share the same characteristic: they all lose information. Traditionally, losing information is considered, theoretically or philosophically, a bad thing. But is it always bad if statistics,
10:06
visualization and human-computer interaction all lose information? Mathematically, we can actually separate the good part of losing information from the bad part. Once we take the cost into account, we start to realize
10:25
there is a cost-benefit trade-off for all of these processes, and they behave very differently; visualization, for instance, loses information very differently from statistics,
10:40
for example. The top part, the benefit, can be measured in bits using information entropy, while the bottom part, the cost, is ideally energy but can be approximated by time or financial cost. Once we take the human knowledge
11:04
into account, that knowledge can reduce the potentially bad part of losing information, because human knowledge itself contains information. Human knowledge can also take the task into account, so that in some cases losing information is not necessarily bad at all.
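For reference, the separation described here can be written down. This is a sketch along the lines of Chen and Golan's information-theoretic cost-benefit measure (IEEE TVCG, 2016), with the notation paraphrased from the description above rather than quoted from the talk:

    \frac{\text{Benefit}}{\text{Cost}}
      = \frac{\text{Alphabet Compression} - \text{Potential Distortion}}{\text{Cost}}
      = \frac{\big(H(Z_i) - H(Z_{i+1})\big) - D_{\mathrm{KL}}(Z_i' \,\|\, Z_i)}{C_t}

Here H is Shannon entropy in bits, so the numerator (the "top part") is measured in bits: the entropy reduction from input to output is the useful loss of information, and the Kullback-Leibler divergence of the reconstructed input Z_i' from the true input Z_i is the potentially bad loss. The denominator C_t (the "bottom part") is ideally energy, approximated in practice by time or financial cost.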
11:22
In general, statistics on its own loses information too quickly, and visualization complements it by losing information slightly more slowly, though perhaps at a lower resolution. Algorithms cannot take all the human knowledge into account,
11:43
and interaction is the way for humans to add their knowledge into the process. Visualization cannot deal with too much data, so it requires data analysis and interaction to help select what data to visualize. Interaction is very expensive, so we need analysis
12:04
and visualization to reduce its cost. We therefore constructed an ontology, a knowledge representation, to work out the many ways in which visualization can help machine learning. We used the ontology to define the workflows
12:28
of machine learning and identify all the processes within them. Drawing on previous research, we identified the places on this map where machine learning pauses to
12:41
ask a human to look at the data using visualization, which we informally call bus stops. We identified many different bus stops in the machine learning process that existing technology has already started to use, including many led by industry.
13:04
In general, we think there are many places in the machine learning workflow that have not been explored yet, and there is a lot of work to be done in the future. In summary, we would like to have more human intelligence in the machine learning process,
13:23
ideally made open, transparent, efficient and effective using visualization. Hopefully, in the future, we will have a more integrated approach that brings traditional software and machine learning models together, including statistics, human-
13:43
designed algorithms, machine learning models, visualization and human-computer interaction. That is my talk. Thank you.