Visualization in Machine Learning

Formal Metadata

Title: Visualization in Machine Learning
Number of Parts: 3
License: CC Attribution - NonCommercial 4.0 International. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Transcript: English (auto-generated)
Thank you very much for inviting me. I will move quickly to the topic I would like to talk about today. In recent years, visualization has become quite common in machine learning workflows, and I would like to discuss, from a theoretical point of view, why we need it. Hopefully I will also provide a few key reference points so that people can find the major publications on this particular topic.

I'll start with a simple example. About ten years ago, we worked on detecting facial expressions using machine learning models. This is a real challenge for machines, although humans can do it extremely well. We worked with video data and went through a fairly complicated process: extracting information from the videos, converting it into time series, and then doing the machine learning in the time-series space. We took two different approaches. One was fully automated, shown on the left-hand side: a feature-based decision tree, which can also be done with a random forest. In the other approach, visualization was part of the process, aiding the construction of the decision tree. What we noticed is that the decision trees humans constructed with visualization techniques, such as the parallel coordinates plots you can see here, performed better than the automated approach using C4.5, CART and so on. Later we ran a similar experiment on a different machine learning problem and got the same result. We wondered why.
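To make the automated baseline concrete, here is a minimal sketch of that kind of pipeline, assuming the video features have already been extracted into a table. The file name, the column names, and the use of scikit-learn's CART-style tree (rather than C4.5 itself) are illustrative assumptions, not the original setup.

```python
# Minimal sketch of the automated baseline: fit a decision tree and a random
# forest on features already extracted from the video/time-series data.
# The CSV path and column names are placeholders, not the original data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("expression_features.csv")   # hypothetical feature table
X, y = df.drop(columns=["expression"]), df["expression"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=5).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("decision tree accuracy:", tree.score(X_te, y_te))
print("random forest accuracy:", forest.score(X_te, y_te))
```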
We went back to the machine learning developers involved in the process to compare the two approaches, and asked: what can the human developers do that the machine may not be able to do? They summarized six points. First, humans can get an overview of all the possible features used in the machine learning process, so they have an overall picture when selecting the axes. Second, they can check multiple axes to see whether they are consistent with or contradict each other. Third, they can look ahead when choosing a variable as a node of the decision tree and anticipate what they might do at the next step. Fourth, they can spot outliers among the feature points. Fifth, when the tree is constructed there is also the cut position to choose, which algorithms usually have difficulty getting right. Sixth, they apply theoretical knowledge, for instance knowing that a particular feature is not sound in theory even though it appears to work in practice, so knowledge from their own research feeds into the selection of the axes. In outlier detection, for example, a smile may look very similar to surprise, or sad may look similar to angry; these are outliers that humans can judge quite quickly.
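As a sketch of the kind of visual inspection described, a parallel coordinates plot shows all candidate features side by side, so a developer can compare axes and spot such outliers at a glance. The data file and feature names below are hypothetical.

```python
# Parallel-coordinates view of candidate features: one line per sample, one
# vertical axis per feature, coloured by expression class. An outlying line
# (e.g. a "smile" sample running among "surprise" samples) stands out visually.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

df = pd.read_csv("expression_features.csv")        # hypothetical feature table
features = ["mouth_width", "eye_open", "brow_raise", "expression"]
parallel_coordinates(df[features], class_column="expression", alpha=0.4)
plt.title("Candidate features per expression class")
plt.show()
```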
We used information theory to quantify how much human knowledge was available to the machine learning process, and we worked out that in this particular case 68 bits were available in terms of entropy. Throughout the process we noticed that when the data is relatively skewed, or the data set is small, the total amount of human knowledge available, measured as entropy, exceeds the information available in the data. In theory, this raises the question of whether machine learning should involve more human knowledge in the process.
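For readers unfamiliar with measuring knowledge in bits: this is simply Shannon entropy. The sketch below shows the computation on made-up distributions; it is not how the 68-bit figure in the talk was derived.

```python
# Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i)).
# The distributions below are illustrative only.
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # ignore zero-probability outcomes
    return float(-(p * np.log2(p)).sum())

# Entropy of a skewed 7-class label distribution vs. a uniform one.
skewed  = [0.70, 0.10, 0.05, 0.05, 0.04, 0.03, 0.03]
uniform = [1 / 7] * 7
print(entropy_bits(skewed))    # ~1.6 bits per label
print(entropy_bits(uniform))   # ~2.8 bits per label
```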
At the bottom here we show a typical data mining process, and at the top a machine learning process; machine learning generates a model that is then included in the data mining process. If you look at the yellow parts, which are the preparation for machine learning, they usually take a lot of time. Although the training itself may take hours or days, the entire machine learning process takes months, perhaps six months, to produce a reasonably good model. So a huge amount of human knowledge enters this process, and on top of that we tend not to account for the work before and after the learning itself. This is partly because the search space for a machine learning model is gigantic.
Consider the space of all mathematical functions that can be realized on a computer, that is, by a Turing machine. Writing an ordinary program picks out a single point, sparsely located somewhere in that space. What machine learning does is identify a tiny subspace defined by a template, and then search within that subspace for the best program. Most of the machine learning frameworks we know today are not Turing-complete; in other words, there are always programs you can write in an ordinary programming language such as Java that cannot be written in machine learning, if we treat it as a language.
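A toy way to see the "template" idea: fix a small family of functions and search only within it; anything outside that family cannot be recovered, no matter how much data is available. The family and target function below are purely illustrative.

```python
# The "template" is the family of cubic polynomials in one variable; learning
# searches only this family. The true function sin(2x) lies outside it, so the
# best member of the template still leaves a large residual error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.size)   # target outside the template

coeffs = np.polyfit(x, y, deg=3)            # search the cubic template for the best fit
best_in_template = np.polyval(coeffs, x)
print("residual error:", float(np.mean((y - best_in_template) ** 2)))
```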
So the critical point is for humans in the process to find the sweet spot in that entire space, that is, to narrow the search space down to a region where machine learning can find a good model. A huge amount of knowledge goes into this. We usually say that the labels encode human knowledge, and in semi-supervised learning humans are also involved in the learning process. Reinforcement learning requires humans to come up with clever fitness functions; self-supervised learning requires humans to define how different fitness functions interact with each other; and unsupervised learning quite often requires humans to interpret the results of the model. Anything involving features is also human input, because features are really programs or mathematical equations, and the selection of techniques is human knowledge as well. So there is a huge amount of human knowledge in the process.
Sometimes people give the credit to the data rather than to the model, but in fact it is not always the data, and quite often we simply do not have enough data. Some years back the American Statistical Association released what was considered at the time a big data set: all commercial flights in the USA over two decades, from 1987 to 2008, about 120 million records with 29 variables. Suppose we treat every variable as binary, for example, instead of twelve months we only distinguish the first six months from the second six. The total number of combinations of 29 binary variables is about 536 million, and 120 million records is nothing compared to that. If we instead allow each variable four possible values, the number grows very quickly indeed. Quite often we do not have enough data, so using human knowledge is important.
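The arithmetic behind these numbers is easy to check; a quick sketch:

```python
# Even with 29 binary variables, the number of distinct combinations dwarfs
# 120 million records; with four values per variable it explodes further.
records = 120_000_000
print(2 ** 29)              # 536_870_912 combinations for 29 binary variables
print(4 ** 29)              # ~2.9e17 combinations for 29 four-valued variables
print(records / 2 ** 29)    # < 0.23 records per combination even in the binary case
```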
Let us go back to information theory. Statistical algorithms (including machine learning algorithms and trained machine learning models), data visualization, and human communication all share the same characteristic: they all lose information. Traditionally, losing information is considered, theoretically or philosophically, a bad thing. But is it always bad if statistical algorithms, visualization and human-computer interaction all lose information? Mathematically, we can separate the good part of losing information from the bad part, and once we take the cost into account we start to see a cost-benefit trade-off for all of these processes, and they behave very differently; visualization, for example, loses information very differently from statistics. The top part of the measure can be expressed in bits using information entropy, while the bottom part is ideally energy but can be approximated by time or financial cost. Once we take human knowledge into account, it can reduce the potentially bad part of losing information, because human knowledge itself carries information, and it can also take the task into account, so that in some cases losing information is not necessarily bad at all.
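One plausible way to write down the trade-off described here, as a sketch, is to call the useful part of information loss "alphabet compression" and the harmful part "potential distortion" (terms borrowed from the visualization cost-benefit literature):

```latex
% A sketch of the trade-off described above: the numerator ("top part") is
% measured in bits, the denominator ("bottom part") ideally in energy,
% approximated by time or financial cost.
\[
  \frac{\text{Benefit}}{\text{Cost}}
  \;=\;
  \frac{\text{Alphabet Compression} \;-\; \text{Potential Distortion}}{\text{Cost}}
\]
```

On this reading, human knowledge enters on the benefit side: it can lower the potential distortion even though it is not itself part of the data.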
In general, statistics on its own loses information rather quickly; visualization complements it by losing information more slowly, although perhaps at a lower resolution. Algorithms cannot take all human knowledge into account, and interaction is the way for humans to add their knowledge to the process. Visualization cannot cope with too much data, so it needs analysis and interaction to help select what to visualize; and interaction is very expensive, so we need analysis and visualization to reduce its cost. We therefore built an ontology, a knowledge representation, to work out the many ways in which visualization can help machine learning. We used the ontology to define machine learning workflows and identify all the processes within them, and we used previous research to identify the places on this map where the machine learning process stops and asks a human to look at the data using visualization, places we informally call bus stops. We identified many different bus stops in the machine learning process that existing technology has already started to use, including many led by industry.
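As a purely hypothetical illustration of what such a "bus stop" could look like in code, the sketch below pauses a model-development loop, shows the human a visualization, and lets them steer before training continues; none of the names reflect the speaker's actual ontology or tooling.

```python
# Hypothetical "bus stop": pause, visualize candidate features, accept a human
# decision (drop some features), then continue training. Illustrative only.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.ensemble import RandomForestClassifier

def bus_stop(df, class_column):
    """Show the features and ask the human which (if any) to drop."""
    parallel_coordinates(df, class_column=class_column, alpha=0.3)
    plt.show(block=False)
    plt.pause(0.1)                      # let the figure render before prompting
    dropped = input("Features to drop (comma-separated, empty for none): ")
    return [c.strip() for c in dropped.split(",") if c.strip()]

df = pd.read_csv("expression_features.csv")      # hypothetical feature table
to_drop = bus_stop(df, class_column="expression")
X = df.drop(columns=["expression"] + to_drop)
model = RandomForestClassifier(n_estimators=200).fit(X, df["expression"])
```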
In general, we think there are many places in the machine learning workflow that have not been explored yet, so there is a lot of work to be done in the future. In summary, we would like to see more human intelligence in the machine learning process, ideally made open, transparent, efficient and effective through visualization. Hopefully, in the future we will have a more integrated approach that brings traditional software and machine learning models together, including statistics, human-designed algorithms, machine learning models, visualization and human-computer interaction. That's my talk. Thank you.