Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
Video in TIB AV-Portal:
Formal Metadata
Title 
Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations

Title of Series  
Author 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2019

Language 
English

Content Metadata
Subject Area  
Abstract 
Deep Convolutional Neural Networks (CNNs) now match human accuracy in many image prediction tasks, resulting in a growing adoption in e-commerce, radiology, and other domains. Naturally, explaining CNN predictions is a key concern for many users. Since the internal workings of CNNs are unintuitive for most users, occlusion-based explanations (OBE) are popular for understanding which parts of an image matter most for a prediction. One occludes a region of the image using a patch and moves it around to produce a heat map of changes to the prediction probability. Alas, this approach is computationally expensive due to the large number of re-inference requests produced, which wastes time and raises resource costs. We tackle this issue by casting the OBE task as a new instance of the classical incremental view maintenance problem. We create a novel and comprehensive algebraic framework for incremental CNN inference combining materialized views with multi-query optimization to reduce computational costs. We then present two novel approximate inference optimizations that exploit the semantics of CNNs and the OBE task to further reduce runtimes. We prototype our ideas in Python to create a tool we call Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5X (resp. 35X) to produce exact (resp. high-quality approximate) results without raising resource requirements.

00:03
Good afternoon, everyone. This is joint work with my advisors,
00:08
Professor Arun Kumar and Professor Yannis Papakonstantinou, from the University of California, San Diego.
00:13
This work is about how we take classical, database-inspired techniques and apply them in the context of deep convolutional neural networks in order to accelerate a particular workload, namely the explanation workload.
00:32
It is no news that deep convolutional neural networks are widely used today for image analytics tasks. For
00:39
example, they are being used in large-scale commercial products and even in critical applications such as health care, especially in radiology.
00:51
For those who are not familiar with CNNs: what is a CNN? A CNN is a special type of neural network specialized for image data, and it can perform classification tasks. For example, if you have a chest X-ray image, the CNN transforms it through a series of layers and finally predicts whether the person has pneumonia or not. Inside the CNN there is a series of convolution layer transformations, and each of these convolution layers takes a three-dimensional array as input and outputs a three-dimensional array.
01:31
However, in many of the applications that are powered by deep CNNs, explainability of the prediction is also very important. For example, if you are a doctor using a deep CNN system to diagnose pneumonia, just predicting that the person has pneumonia is not enough: you want to know why the system believes so in order to rely on that decision.
01:56
So how do we explain CNN predictions? CNN explainability is still an active area of research. However, in the practical literature we found that occlusion-based explanations, or OBE for short, is a widely used approach among practitioners.
02:14
OBE is a very simple and intuitive process. Let's say you have a chest X-ray image and a CNN which predicts that the person has pneumonia, and you are interested in finding out why the CNN thinks so. You take a patch, probably of gray or black color, place it on the input image, and see what happens to the predicted probabilities. By systematically moving this
02:42
occlusion patch across the image, you can create a sensitivity heat map, and from the sensitivity heat map you can localize the region of interest and thereby get an intuition about the prediction process. However, there
02:58
is a problem with OBE, which is that it is highly time-consuming. CNN inference is itself a time-consuming process; for example, popular CNNs like Inception3 require about 35 giga floating-point operations in order to perform inference on a single image.
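The occlusion procedure just described can be sketched in a few lines. This is a minimal illustration, not Krypton's implementation; the `predict` function (returning the probability of the class being explained) is an assumed stand-in for a real CNN:

```python
import numpy as np

def occlusion_heatmap(predict, image, patch_size=16, stride=8, fill=0.0):
    """Slide an occlusion patch over `image` and record how the predicted
    probability of the target class changes at each patch position."""
    h, w = image.shape[:2]
    base_prob = predict(image)            # probability on the unmodified image
    rows = (h - patch_size) // stride + 1
    cols = (w - patch_size) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch_size, x:x + patch_size] = fill  # gray/black patch
            # A large drop in probability means this region mattered.
            heatmap[i, j] = base_prob - predict(occluded)
    return heatmap
```

Note that every call to `predict` inside the loop is a full CNN re-inference; that per-position cost is exactly what the rest of the talk attacks.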
03:17
Due to the nature of the OBE workload, it ends up creating a large number of different instances of the same image, each corresponding to a different occlusion position. As a result, this entire workload can take up to several seconds, and in some cases up to a couple of minutes, depending on what hardware you are working on. This is really problematic in interactive settings, where you want to quickly see the explanation for a particular prediction. So in this work we cast OBE as a query optimization task and perform database-inspired optimizations to accelerate the workload. The rest of my talk
04:05
is organized as follows. First, I will give a brief background on CNN internals so that you can understand and appreciate our optimizations. Next, I will explain our incremental CNN inference techniques, followed by our approximate inference techniques. Finally, I will give a brief summary of our experimental results. Our incremental CNN inference techniques are inspired by multi-query optimization and incremental view maintenance from the database literature, and our approximate inference techniques are inspired by approximate query processing techniques and vision science.
04:46
In the previous slides I mentioned that a CNN essentially performs a series of convolution layer transformations. Now let's briefly look into what goes on inside each of these convolution layers. Inside a convolution layer we
05:02
have a bank of filter kernels, which are three-dimensional; the height and the width of each kernel are smaller than the height and width of the input, but the depth will be the same. You take one of these filter kernels, superimpose it on the input image, take the element-wise product, also called the Hadamard product, and perform a reduce-sum to produce a scalar output. Then you slide the filter kernel across the input image, and you end up creating a scalar value for every position, which creates one slice of the output. By repeating this process for all the filter kernels that you have and stacking the slices together, you create the final three-dimensional output.
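The per-layer computation just described (superimpose, Hadamard product, reduce-sum, slide, stack) can be written out naively as follows; a didactic sketch with stride 1 and no padding, not an efficient implementation:

```python
import numpy as np

def conv_layer(inp, kernels):
    """inp: (H, W, D); kernels: (K, kh, kw, D). 'Valid' convolution, stride 1.
    Each kernel is superimposed on a window, multiplied element-wise
    (Hadamard product), and reduce-summed into one scalar."""
    H, W, D = inp.shape
    K, kh, kw, _ = kernels.shape
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for k in range(K):                        # one output slice per kernel
        for y in range(H - kh + 1):
            for x in range(W - kw + 1):
                window = inp[y:y + kh, x:x + kw, :]
                out[y, x, k] = np.sum(window * kernels[k])  # Hadamard + reduce-sum
    return out
```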
05:54
In fact, the convolution transformation can be re-imagined using the relational data model. For example, here I have represented the input array and the filter kernels using two relations, and I can write a SQL query that performs the same transformation. I don't expect you to read this whole query, but the takeaway is that a CNN is essentially performing a sequence of steps which are essentially a series of joins and aggregates. However, in the machine learning community we usually don't use the relational data model, but rather a linear algebra-based data model, in order to better utilize the hardware.
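The join-plus-aggregate view of convolution mentioned here can be mimicked in plain Python instead of SQL: the input array and the kernels become relations of coordinate/value tuples, the join condition aligns each kernel cell with the input cell it covers, and a group-by sum produces the output. This is an illustrative reconstruction, not the actual query from the talk:

```python
from collections import defaultdict

def conv_as_join_aggregate(input_rel, kernel_rel, H, W, kh, kw):
    """input_rel: rows (y, x, d, val). kernel_rel: rows (k, dy, dx, d, wt).
    Join condition: kernel cell (dy, dx, d) covers input cell (y, x, d) for
    output position (y - dy, x - dx). Group by (k, oy, ox), aggregate SUM."""
    out = defaultdict(float)
    for (y, x, d, v) in input_rel:                 # the join ...
        for (k, dy, dx, kd, wt) in kernel_rel:
            oy, ox = y - dy, x - dx
            if kd == d and 0 <= oy <= H - kh and 0 <= ox <= W - kw:
                out[(k, oy, ox)] += v * wt         # ... then the aggregate
    return dict(out)
```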
06:43
Next I will explain our incremental CNN inference techniques. The incremental inference technique
06:50
is motivated by the observation that in OBE we are performing a lot of redundant computations. For example, if you take the original image and an instance of an occluded image, most of the pixels have not changed. Also, using the semantics of the
07:07
convolution layer transformations, we can exactly determine the change propagation path across the layers. This kind of problem has been previously studied in the context of relational data management systems, where it is called the incremental view maintenance problem. Our solution for this
07:26
problem is what we call incremental CNN inference. Essentially, we cast OBE as a set of sequences of queries. Given an original image, every
07:38
convolution layer transformation can be treated as a query which takes an input and produces an output, and since OBE has multiple images, each corresponding to a different occlusion position, it is essentially a set of sequences of queries. To start, given the original image, we first materialize all the outputs of the convolution layers and treat them as materialized views. Next, we have
08:05
developed an algebraic framework for incrementally propagating only the changes across the layers, which avoids redundant computations. However, in OBE there are multiple such occluded images, and sequentially executing them will throttle the performance, especially on high-throughput hardware such as GPUs.
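The single-image incremental update described a moment ago can be sketched for one 2D convolution layer: keep the layer's output for the original image as a materialized view, and after a patch changes part of the input, recompute only the output cells whose receptive fields overlap the patch. A simplified single-channel sketch (stride 1, no padding), not Krypton's actual algebraic framework:

```python
import numpy as np

def conv2d(inp, kern):
    """Naive 'valid' 2D convolution, stride 1, single channel."""
    kh, kw = kern.shape
    H, W = inp.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(inp[y:y + kh, x:x + kw] * kern)
    return out

def incremental_conv2d(new_inp, kern, mat_view, y0, y1, x0, x1):
    """Update the materialized view `mat_view` (= conv2d of the original
    input) after new_inp[y0:y1, x0:x1] changed. Only output cells whose
    receptive field overlaps the patch are recomputed; the rest are reused."""
    kh, kw = kern.shape
    out = mat_view.copy()
    # Output region affected by the patch (the change propagation region).
    oy0, oy1 = max(0, y0 - kh + 1), min(out.shape[0], y1)
    ox0, ox1 = max(0, x0 - kw + 1), min(out.shape[1], x1)
    for y in range(oy0, oy1):
        for x in range(ox0, ox1):
            # Reads a slightly larger "read context" around the patch.
            out[y, x] = np.sum(new_inp[y:y + kh, x:x + kw] * kern)
    return out
```

Chaining this layer by layer propagates only the changed region through the network, which is the incremental view maintenance idea applied to CNN inference.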
08:29
To overcome this, we propose the batched incremental inference optimization. The idea is that instead of making incremental changes in place, we share and reuse the materialized views across all occluded images. Let's say I want to perform incremental inference on one particular instance of the occluded image. In order to perform the incremental inference, I have to read a slightly larger read context, due to the overlapping nature of convolutions. Rather than making the changes in place, I take the occlusion patch and its read context, write them into a different buffer, and perform incremental inference to produce the output corresponding to the next layer. I can then repeat this process, reading the corresponding read context from the corresponding materialized view, for all the layers that I have. Now we can invoke multiple such IVM queries, each corresponding to one occluded image, in one go, which is a form of multi-query optimization, and we have created a custom GPU kernel to invoke them in parallel, which improves the hardware utilization.
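The batching step can be sketched by gathering the read context of every occlusion position into one buffer and convolving all of them in a single vectorized step, which mimics the multi-query idea. The real system does this with a custom GPU kernel; here plain NumPy stands in, and positions are assumed to be away from the image border:

```python
import numpy as np

def batched_incremental_conv(image, kern, positions, patch, fill=0.0):
    """Gather the read context of B occlusion positions into one batch buffer
    and convolve them all together (one 'multi-query' call), instead of
    running B sequential incremental updates. Square kernel/patch assumed."""
    kh, kw = kern.shape
    n = patch + kh - 1                  # side of the affected output region
    ctx = n + kh - 1                    # side of the read context window
    B = len(positions)
    buf = np.empty((B, ctx, ctx))
    for b, (y, x) in enumerate(positions):            # gather phase
        win = image[y - kh + 1:y - kh + 1 + ctx,
                    x - kw + 1:x - kw + 1 + ctx].copy()
        win[kh - 1:kh - 1 + patch, kw - 1:kw - 1 + patch] = fill  # occlude
        buf[b] = win
    out = np.zeros((B, n, n))           # one batched convolution over all queries
    for dy in range(kh):
        for dx in range(kw):
            out += kern[dy, dx] * buf[:, dy:dy + n, dx:dx + n]
    return out
```

The gather phase reads from the shared materialized input, and the single batched convolution keeps the hardware busy, which is why batching helps on GPUs.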
09:46
So what speedups can we expect from incremental CNN inference? Using the
09:52
semantics of the convolution layer transformations, we quantified the theoretical speedups we can expect for popular CNN architectures. Even though for some architectures we could expect higher speedups, most architectures can only give up to about 2X, and this is due to a problem which we call the avalanche effect, which reduces the attainable speedup.
10:21
This requires us to look into other approaches for accelerating this workload, and this is where the approximate CNN inference techniques come in. The basic idea in our approximate inference techniques is that we trade off the quality of the generated heat map in order to reduce the runtime. But how do we quantify the quality of the generated heat map? For this we used an already established metric in vision science, namely the structural similarity index (SSIM). Given the exact heat map and an instance of the approximate heat map, we can generate an SSIM value, which is a number between minus 1 and 1; an SSIM value of 1 corresponds to two identical images. In practice, SSIM values close to 0.99 are widely used in practical applications such as image compression.
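The structural similarity index can be sketched in its simplified single-window (global) form; production SSIM implementations use local sliding windows, so treat this only as an illustration of the metric's shape:

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    """Simplified single-window SSIM between two heat maps; returns a value
    in [-1, 1], where 1.0 means the two images are identical."""
    c1 = (0.01 * data_range) ** 2        # small stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```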
11:21
Going back to our approximate techniques: we have basically two such approximate techniques. The first one is what we call projective field thresholding, which essentially combats the avalanche effect by pruning the growth of the propagated changes, which basically makes every query run faster. The second technique is what we call adaptive drill-down, which basically issues fewer queries: it creates low-granularity queries for less sensitive regions, which ultimately reduces the total number of queries that we have to run. In the interest of time, I will be focusing on projective field thresholding; I invite you to read our paper or drop by during the poster session to learn more about the adaptive drill-down technique. Before explaining the
12:17
projective field thresholding technique, let me explain what the avalanche effect is. To explain it, I have an example of a 1D convolution; the takeaways from this are also applicable to the 2D setting. Let's say I have four different layers and a kernel of size 3, and I make an input change before the first layer. After the first convolution it will update 3 units in the next layer, then 5 units in the subsequent layer, and so on and so forth. The shaded region, which essentially captures the change propagation path, is also called the projective field. As you can see, as you go to deeper layers, the gains from incremental inference start to diminish.
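The avalanche effect for this 1D example can be reproduced in a few lines: with stride-1 convolutions of kernel size 3, each layer widens the projective field of a single changed input unit by 2, so it quickly approaches the full layer width and the savings from incrementality shrink:

```python
def projective_field_widths(n_layers, kernel_size=3):
    """Width of the 1D projective field of a single changed input unit after
    each of n_layers stride-1 convolutions: each layer adds kernel_size - 1."""
    widths = []
    w = 1
    for _ in range(n_layers):
        w += kernel_size - 1
        widths.append(w)
    return widths
```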
13:05
To overcome this, we propose to threshold the growth of the projective field. To explain why this would work, let's take an output unit in the final layer and calculate the number of different paths that go from the source change to that destination; one such path is shown here. If we count the total number of paths for all the output units, we can see that the counts closely resemble a bell curve, where most of the paths are centered around the middle. Now, the actual change in the output values also depends on the kernel weight values, but it can be theoretically shown that, under certain assumptions, this converges to a Gaussian-like distribution. So what we do is define a threshold, called tau, the projective field threshold, and once the projective field reaches that level we simply stop the propagation of changes and thereby reduce computation. But how do we tune tau?
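The path-counting argument can be checked numerically: the number of propagation paths reaching each unit of the final projective field is the repeated convolution of a ones vector, which by the central limit theorem approaches a bell curve — which is why truncating the field's outskirts at the tau threshold loses little:

```python
import numpy as np

def path_counts(n_layers, kernel_size=3):
    """Number of distinct change-propagation paths from the changed input
    unit to each unit of the final projective field: the n_layers-fold
    convolution of a ones vector, which approaches a Gaussian shape."""
    counts = np.array([1.0])
    for _ in range(n_layers):
        counts = np.convolve(counts, np.ones(kernel_size))
    return counts
```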
14:05
It is again a trade-off between runtime and the resulting heat map quality, and it also depends on the characteristics of the images and the CNN's properties.
14:16
For example, here I have shown the generated approximate heat maps corresponding to different threshold values of tau, equal to 1.0 and so on. Tau equal to 1.0 is essentially the exact heat map, without any thresholding. You can see that as the tau value decreases, there are at first barely visible changes and then significant changes in the generated approximate heat maps. In our system we auto-tune tau using a target SSIM value and sample images from the workload, once up front during the system configuration phase. Now I will briefly
14:56
explain some of our experimental results. For
14:59
the experiments here I have chosen a chest X-ray image dataset, and the task is to predict whether the person has pneumonia or not. We have chosen the popular CNN architectures, namely VGG16, ResNet18, and Inception3, and for the approximate inference techniques we use a target SSIM value of 0.9. More datasets and results can be found in our paper. The experiments were run on a single
15:26
node machine which has an Nvidia Titan Xp GPU, using PyTorch version 0.4.
15:41
Looking at the results on the GPU, we found that the naive way of performing this workload, which is performing full CNN inference for all the occluded images corresponding to different occlusion positions, can take anywhere from two seconds to 12 seconds depending on the type of CNN that you have. Incremental inference can enable up to around 4X speedup for some CNN architectures, but for architectures like Inception3 we saw that incremental inference is actually slower than naive inference, because the overheads of incremental inference outweigh the benefits. But if we combine approximate inference on top of incremental inference, we can obtain a significant speedup of around 9X on some architectures. On the CPU we
16:31
saw similar trends, but the speedups were even greater than on the GPU, because on the CPU we are bottlenecked by the computations. In the paper we also show, on two other datasets, that we achieve close to 35X speedups with some of the approximate inference techniques. In
16:55
summary, we have shown that explaining CNN predictions is an important task and that occlusion-based explainability is widely used. In this work, we used database-inspired incremental and approximate inference techniques to accelerate the OBE workload; our optimizations make OBE much more amenable to interactive diagnosis of CNN predictions. Thank you.