Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations

Video in TIB AV-Portal: Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations

Formal Metadata

Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
Title of Series
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Deep Convolutional Neural Networks (CNNs) now match human accuracy in many image prediction tasks, resulting in a growing adoption in e-commerce, radiology, and other domains. Naturally, explaining CNN predictions is a key concern for many users. Since the internal workings of CNNs are unintuitive for most users, occlusion-based explanations (OBE) are popular for understanding which parts of an image matter most for a prediction. One occludes a region of the image using a patch and moves it around to produce a heat map of changes to the prediction probability. Alas, this approach is computationally expensive due to the large number of re-inference requests produced, which wastes time and raises resource costs. We tackle this issue by casting the OBE task as a new instance of the classical incremental view maintenance problem. We create a novel and comprehensive algebraic framework for incremental CNN inference combining materialized views with multi-query optimization to reduce computational costs. We then present two novel approximate inference optimizations that exploit the semantics of CNNs and the OBE task to further reduce runtimes. We prototype our ideas in Python to create a tool we call Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5X (resp. 35X) to produce exact (resp. high-quality approximate) results without raising resource requirements.
Good afternoon, everyone. My name is Supun Nakandala. This is joint work with my advisor, Professor Arun Kumar, and Professor Yannis Papakonstantinou from the University of California, San Diego.
This work is about how we take classical database-inspired techniques and apply them in the context of deep convolutional neural networks in order to accelerate a particular workload, namely explanation workloads.
It is not news that deep convolutional neural networks are widely used today for image analytics tasks. For example, they are being used in large-scale software products, autonomous vehicles, and even in critical applications such as healthcare, especially radiology.
For those who are not familiar with CNNs: what is a CNN? A CNN is a special type of neural network specialized for image data, and it can perform classification tasks. For example, given a chest X-ray image, the CNN can transform it through a series of layers and finally predict whether the person has pneumonia or not. Inside the CNN we get a series of convolution layer transformations, and each of these convolution layers takes a three-dimensional array as input and outputs a three-dimensional array.
However, in many of the applications that are primarily powered by deep CNNs, explainability of the prediction is also very important. For example, if you are a doctor using a deep CNN system to diagnose pneumonia, just predicting that the person has pneumonia is not enough. You want to know why the system believes so in order to rely on that decision.
So how do we explain CNN predictions? CNN explainability is still an active area of research. However, in the practical literature we found that occlusion-based explanations, or OBE for short, are a widely used approach among practitioners.
OBE is a very simple and intuitive process. Let's say you have a chest X-ray image and a CNN that predicts the person has pneumonia, and you are interested in finding out why the CNN thinks so. The way you go about this is to take a patch, typically gray or black, place it on the input image, and see what happens to the predicted probability. By systematically moving this occlusion patch across the image, you can create a sensitivity heat map, and from the sensitivity heat map you can localize the region of interest and thereby gain an intuition about the prediction process.
However, there is a problem with OBE: it is highly time-consuming. CNN inference is itself a time-consuming process. For example, popular CNNs like Inception3 require billions of floating point operations to perform inference on a single image. And due to the nature of the OBE workload, it ends up creating a large number of different instances of the same image, each corresponding to a different occlusion position. As a result, the entire workload can take several seconds, and in some cases a couple of minutes, depending on the hardware you are working on. This is really problematic in interactive settings, where you want to quickly see the explanation for a particular prediction, especially in applications like mobile health. So in this work we cast OBE as a query optimization task and perform database-inspired optimizations to accelerate the workload.
The rest of my talk is organized as follows. First, I will give a brief background on CNN internals so that you can understand and appreciate our optimizations. Next, I will explain our incremental CNN inference techniques, followed by our approximate inference techniques. Finally, I will give a brief summary of our experimental results. Our incremental CNN inference techniques are inspired by multi-query optimization and incremental view maintenance from the database literature, and our approximate inference techniques are inspired by approximate query processing and vision science.
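The occlusion procedure described in the talk can be sketched in a few lines of plain Python. This is only an illustration of the naive workload, not the speaker's code: `occlusion_heatmap`, the patch size, the stride, and `toy_predict` (a stand-in for a real CNN that simply averages the four central pixels) are all assumptions made for the example.

```python
def occlusion_heatmap(image, predict, patch=2, stride=1, fill=0.0):
    """Slide an occlusion patch over `image`; record the drop in predicted probability."""
    height, width = len(image), len(image[0])
    base = predict(image)                       # prediction on the unmodified image
    heatmap = []
    for top in range(0, height - patch + 1, stride):
        row = []
        for left in range(0, width - patch + 1, stride):
            occluded = [r[:] for r in image]    # copy so the original stays intact
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    occluded[y][x] = fill
            # One full re-inference per occlusion position: this is the costly part.
            row.append(base - predict(occluded))
        heatmap.append(row)
    return heatmap


def toy_predict(image):
    """Hypothetical stand-in for a CNN: the 'probability' is the mean of the central 2x2 pixels."""
    centre = [image[y][x] for y in (2, 3) for x in (2, 3)]
    return sum(centre) / len(centre)


image = [[1.0] * 6 for _ in range(6)]
heatmap = occlusion_heatmap(image, toy_predict)
```

Even on this 6x6 toy image the sweep calls the model 25 times in addition to the baseline prediction, which is the blow-up in re-inference requests that the rest of the talk sets out to reduce.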
On the previous slide I mentioned that a CNN essentially performs a series of convolution layer transformations. Now let's briefly look at what goes on inside each of these convolution layers.
Inside a convolution layer we get a bank of filter kernels, each of which is three-dimensional. The height and width of each kernel are smaller than the height and width of the input; however, the depth is the same. You take one of these filter kernels, superimpose it on the input, take the element-wise product, also called the Hadamard product, and perform a reduce-sum to produce a scalar output. Then you slide the filter kernel across the input, creating a scalar value for every position, which gives you a 2-D slice of the output. By repeating this process for all the filter kernels that you have and stacking the slices together, you create the final 3-D output.
In fact, the convolution transformation can be reimagined using the relational data model. For example, here I have represented the input array and the filter kernels using two relations, and I can write this SQL query to perform the same transformation. I don't expect you to read this SQL query, but the takeaway is that a CNN essentially performs a sequence of these queries, which are essentially a series of joins and aggregates. However, in the machine learning community we usually don't use the relational data model but rather a linear-algebra-based data model in order to better utilize the hardware.
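To make the two views of a convolution layer concrete, here is a small sketch that computes the same layer twice: once as sliding Hadamard products with a reduce-sum, and once as a join plus a group-by aggregate in SQLite. The table names `inp` and `ker`, their columns, and the query are illustrative assumptions; the SQL shown on the speaker's slide is not reproduced in the transcript and may look different.

```python
import sqlite3


def conv_layer(inp, kernels):
    """Slide each 3-D kernel over the 3-D input; one 2-D output slice per kernel."""
    depth, height, width = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for kernel in kernels:
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                # Element-wise (Hadamard) product followed by a reduce-sum.
                row.append(sum(inp[d][y + dy][x + dx] * kernel[d][dy][dx]
                               for d in range(depth)
                               for dy in range(kh)
                               for dx in range(kw)))
            plane.append(row)
        out.append(plane)                       # stacking the slices gives the 3-D output
    return out


def conv_layer_sql(inp, kernels):
    """The same transformation as a join plus a group-by aggregate (illustrative schema)."""
    height, width = len(inp[0]), len(inp[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inp (d INT, y INT, x INT, v INT)")
    db.execute("CREATE TABLE ker (k INT, d INT, dy INT, dx INT, w INT)")
    db.executemany("INSERT INTO inp VALUES (?, ?, ?, ?)",
                   [(d, y, x, v) for d, plane in enumerate(inp)
                    for y, row in enumerate(plane) for x, v in enumerate(row)])
    db.executemany("INSERT INTO ker VALUES (?, ?, ?, ?, ?)",
                   [(k, d, dy, dx, w) for k, kernel in enumerate(kernels)
                    for d, plane in enumerate(kernel)
                    for dy, row in enumerate(plane) for dx, w in enumerate(row)])
    # Each input cell pairs with each kernel cell of the same depth; the pair
    # contributes to the output position (y - dy, x - dx) if that position exists.
    rows = db.execute(
        "SELECT ker.k, inp.y - ker.dy, inp.x - ker.dx, SUM(inp.v * ker.w) "
        "FROM inp JOIN ker ON inp.d = ker.d "
        "WHERE inp.y - ker.dy BETWEEN 0 AND ? AND inp.x - ker.dx BETWEEN 0 AND ? "
        "GROUP BY ker.k, inp.y - ker.dy, inp.x - ker.dx",
        (height - kh, width - kw)).fetchall()
    db.close()
    out = [[[0] * (width - kw + 1) for _ in range(height - kh + 1)] for _ in kernels]
    for k, oy, ox, total in rows:
        out[k][oy][ox] = total
    return out


inp = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
       [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
kernels = [[[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
           [[[0, 1], [1, 0]], [[0, 0], [0, 0]]]]
array_result = conv_layer(inp, kernels)
sql_result = conv_layer_sql(inp, kernels)
```

Both functions return the same 2x2x2 output for this input, which is the point of the relational reading: a convolution layer is a join on depth followed by a sum grouped by kernel and output position.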
Next, I will explain our incremental CNN inference technique.
The incremental inference technique is motivated by the observation that in OBE we perform a lot of redundant computations. For example, if you take the original image and an instance of an occluded image, most of the pixels are unchanged. Also, using the semantics of the convolution layer transformations, we can exactly determine the change propagation path across the layers. Problems of this kind have previously been studied in the context of relational data management systems, where they are called incremental view maintenance problems.
Our solution to this problem is what we call incremental CNN inference. Essentially, we cast OBE as a set of sequences of queries. Given an original image, every convolution layer transformation can be treated as a query that takes an input and produces an output, and since OBE has multiple images, one for each occlusion position, it is essentially a set of sequences of queries. To start, given the original image, we first materialize all the outputs of the convolution layers and treat them as materialized views.
Next, we have developed an algebraic framework for incrementally propagating only the changes across the layers, which avoids redundant computations. However, in OBE there are multiple such occluded images, and executing them sequentially will throttle performance, especially on high-throughput devices like GPUs.
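The idea of materializing layer outputs and propagating only the changed region can be sketched as follows. This is a minimal single-channel illustration with stride 1 and no padding, not the algebraic framework from the paper; `incremental_conv2d`, the half-open region tuples, and the toy kernels are assumptions made for the example.

```python
def conv2d(img, kernel):
    """Plain single-channel convolution, stride 1, no padding."""
    k = len(kernel)
    return [[sum(img[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(k) for dx in range(k))
             for x in range(len(img[0]) - k + 1)]
            for y in range(len(img) - k + 1)]


def incremental_conv2d(new_input, old_output, kernel, region):
    """Recompute only the outputs whose window overlaps the changed input region.

    `region` is (top, left, bottom, right), half-open, in input coordinates.
    Returns the updated output and the changed region in output coordinates.
    """
    k = len(kernel)
    top, left, bottom, right = region
    o_top, o_left = max(0, top - k + 1), max(0, left - k + 1)
    o_bottom = min(len(old_output), bottom)
    o_right = min(len(old_output[0]), right)
    new_output = [row[:] for row in old_output]     # reuse the materialized view
    for y in range(o_top, o_bottom):
        for x in range(o_left, o_right):
            new_output[y][x] = sum(new_input[y + dy][x + dx] * kernel[dy][dx]
                                   for dy in range(k) for dx in range(k))
    return new_output, (o_top, o_left, o_bottom, o_right)


image = [[(3 * y + 5 * x) % 7 + 1 for x in range(8)] for y in range(8)]
k1 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
k2 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

# Materialize every layer's output once for the original image.
views = [conv2d(image, k1)]
views.append(conv2d(views[0], k2))

# Occlude a 2x2 patch at rows 3-4, columns 3-4.
occluded = [row[:] for row in image]
for y in range(3, 5):
    for x in range(3, 5):
        occluded[y][x] = 0

full = conv2d(conv2d(occluded, k1), k2)             # naive full re-inference
layer1, region1 = incremental_conv2d(occluded, views[0], k1, (3, 3, 5, 5))
layer2, region2 = incremental_conv2d(layer1, views[1], k2, region1)
```

Here the first layer recomputes 16 of its 36 outputs, but by the second layer the changed region already covers the whole 4x4 output, so the savings shrink as the change spreads.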
To overcome this, we propose the batched incremental inference optimization. The idea is that instead of making incremental changes in place, we share and reuse the materialized views across all occluded images. Let's say I want to perform incremental inference on this particular instance of the occluded image. In order to do so, I have to read a slightly larger read context around the occlusion patch. Rather than making the changes in place, I copy the occlusion patch and the read context into a different buffer and perform incremental inference to produce the output corresponding to the next layer. Similarly, I can repeat this process by reading the corresponding read context from the corresponding materialized view, and do this for all the layers that I have. Now we can invoke multiple IVM queries, one for each occluded image, in one go, which is a form of multi-query optimization. We have also created a custom GPU kernel to perform these memory copies in parallel, which improves hardware utilization.
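A rough single-layer sketch of the batching idea follows: for every occlusion position, a fixed-size read context is copied out of the shared view into a separate buffer, the patch is pasted into the copy, and all buffers are then processed together. The names `batched_incremental_layer` and `stitch` and the window-clamping rule are assumptions for illustration; the list comprehension over the batch merely stands in for a single batched GPU call and is not the custom kernel mentioned in the talk.

```python
def conv2d(img, kernel):
    """Plain single-channel convolution, stride 1, no padding."""
    k = len(kernel)
    return [[sum(img[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(k) for dx in range(k))
             for x in range(len(img[0]) - k + 1)]
            for y in range(len(img) - k + 1)]


def batched_incremental_layer(view_in, view_out, kernel, positions, p, fill=0):
    """Build one read-context buffer per occlusion position, then process them together."""
    k = len(kernel)
    out_size = p + k - 1            # side of the output patch that can change
    ctx_size = out_size + k - 1     # side of the input region it depends on
    out_h, out_w = len(view_out), len(view_out[0])
    batch, origins = [], []
    for top, left in positions:
        # Clamp the window so every buffer has the same shape.
        o_top = min(max(top - k + 1, 0), out_h - out_size)
        o_left = min(max(left - k + 1, 0), out_w - out_size)
        # Slicing copies the data, so the shared view is never modified.
        ctx = [row[o_left:o_left + ctx_size]
               for row in view_in[o_top:o_top + ctx_size]]
        for y in range(top - o_top, top - o_top + p):
            for x in range(left - o_left, left - o_left + p):
                ctx[y][x] = fill
        batch.append(ctx)
        origins.append((o_top, o_left))
    patches = [conv2d(ctx, kernel) for ctx in batch]    # stand-in for one batched call
    return patches, origins


def stitch(view_out, patch, o_top, o_left):
    """Overlay an updated patch on a copy of the materialized output."""
    out = [row[:] for row in view_out]
    for y, row in enumerate(patch):
        out[o_top + y][o_left:o_left + len(row)] = row
    return out


image = [[(3 * y + 5 * x) % 7 + 1 for x in range(8)] for y in range(8)]
kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
view_out = conv2d(image, kernel)
positions = [(top, left) for top in range(7) for left in range(7)]
patches, origins = batched_incremental_layer(image, view_out, kernel, positions, p=2)


def full_inference(top, left):
    """Naive reference: occlude a 2x2 patch and recompute the whole layer."""
    occluded = [row[:] for row in image]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            occluded[y][x] = 0
    return conv2d(occluded, kernel)


all_match = all(stitch(view_out, patch, o_top, o_left) == full_inference(top, left)
                for (top, left), patch, (o_top, o_left)
                in zip(positions, patches, origins))
```

Because every buffer has the same 6x6 shape, the 49 occlusion positions form one regular batch, which is what makes a single batched invocation possible on a high-throughput device.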
What speedups can we expect from incremental CNN inference? Using the semantics of the convolution layer transformations, we theoretically quantified the speedups we can expect for popular CNN architectures. Even though some architectures could see larger speedups, we saw that most architectures can only give speedups of up to about 2x. This is due to a problem we call the avalanche effect, which reduces the attainable speedup.
This requires us to look into other approaches for accelerating this workload, and this is where our approximate CNN inference techniques come into play. The basic idea in our approximate inference techniques is to trade off the visual quality of the generated heat map in order to reduce the runtime. But how do we quantify the quality of the generated heat map? For this we use an already established metric from vision science, namely the structural similarity index (SSIM). Given the exact heat map and an instance of the approximate heat map, we can compute an SSIM value, which is a number between -1 and 1, and an SSIM value of 1 corresponds to two identical images. In practice, SSIM values close to 0.9 are widely used in practical applications such as image compression.
Going back to our approximate techniques, we have two types. The first is what we call projective field thresholding, which combats the avalanche effect by pruning the growth of the propagated changes, making every query run faster. The next technique is what we call adaptive drill-down, which creates low-granularity queries for less sensitive regions and ultimately reduces the total number of queries that we have to run. In the interest of time, I will focus on projective field thresholding. I welcome you to read our paper or drop by during the poster session to learn more about the adaptive drill-down technique.
Before explaining the projective field thresholding technique, let me explain what the avalanche effect is. To explain it, I have an example of a 1-D convolution, but the takeaways from it also apply to the 2-D setting. Let's say I have four layers and a filter kernel of size 3, and I make a change to one unit of the input. After the first convolution, it will update three units in the next layer, five units in the subsequent layer, and so on. The shaded region, which captures the change propagation path, is also called the projective field. As you can see, as you go to the later layers, the gains from incremental inference start to diminish.
To overcome this, what we suggest is thresholding the growth of the projective field. To explain why this would work, let's take the output units in the final layer and calculate the number of different paths that go from the source change to each destination. One such path would be this one. If we count the total number of paths for all the output units, we can see that it closely resembles a bell-shaped curve, where most of the paths are concentrated around the center. Now, the actual change in the output values also depends on the filter kernel values, but it can be shown theoretically that under certain assumptions this converges to a Gaussian distribution. So what we suggest is to define a threshold called tau, the projective field threshold. Once the projective field reaches that size, we simply stop propagating changes beyond it and thereby reduce computation. But how do you pick tau?
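The growth of the projective field and the path-counting argument can be checked with a few lines of Python for the 1-D, kernel-size-3 example. The `max_width` parameter is a hypothetical absolute cap used here as a stand-in for the threshold tau; how tau is actually parameterized in the system is not stated in the talk.

```python
def grow(counts, kernel_size=3):
    """One stride-1 convolution layer: each changed unit feeds kernel_size units downstream."""
    out = [0] * (len(counts) + kernel_size - 1)
    for i, c in enumerate(counts):
        for j in range(kernel_size):
            out[i + j] += c
    return out


def truncate(counts, max_width):
    """Keep only the central max_width units of the projective field."""
    extra = len(counts) - max_width
    if extra <= 0:
        return counts
    start = extra // 2
    return counts[start:start + max_width]


def path_counts(layers, max_width=None):
    """Number of paths from a single changed input unit to each changed unit, layer by layer."""
    counts, widths = [1], []
    for _ in range(layers):
        counts = grow(counts)
        if max_width is not None:
            counts = truncate(counts, max_width)
        widths.append(len(counts))
    return counts, widths


exact, exact_widths = path_counts(4)                  # projective field grows 3, 5, 7, 9
capped, capped_widths = path_counts(4, max_width=5)   # growth stopped at width 5
retained = sum(capped) / sum(exact)                   # share of paths that survive the cap
```

Without a cap the final layer has nine changed units but most paths end near the center, so capping the width at five units still keeps the large majority of paths while bounding the work per layer.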
It is again a trade-off between runtime and visual heat map quality, and it also depends on the characteristics of the images and the properties of the CNN.
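One way to make this trade-off concrete is to score approximate heat maps against exact ones with SSIM and keep the most aggressive threshold that still meets a quality target. The sketch below is an assumption-laden illustration: `global_ssim` uses a single window over the whole map (standard SSIM averages over local windows), `pick_tau` and its "smallest feasible tau" rule are hypothetical, and the heat maps are hand-made toy data rather than outputs of any real system.

```python
def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM with one window spanning the whole image."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) * (x - mx) for x in xs) / n
    vy = sum((y - my) * (y - my) for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))


def pick_tau(exact_maps, approx_maps_by_tau, target_ssim=0.9):
    """Smallest tau whose mean SSIM over the sample meets the target, else 1.0 (exact)."""
    feasible = []
    for tau, approx_maps in approx_maps_by_tau.items():
        scores = [global_ssim(e, a) for e, a in zip(exact_maps, approx_maps)]
        if sum(scores) / len(scores) >= target_ssim:
            feasible.append(tau)
    return min(feasible) if feasible else 1.0


# Toy sample: one exact heat map and hand-made approximations per threshold.
exact_map = [[0.0, 0.2], [0.6, 1.0]]
approx_maps_by_tau = {
    1.0: [[[0.0, 0.2], [0.6, 1.0]]],     # no thresholding: identical to the exact map
    0.5: [[[0.0, 0.25], [0.6, 1.0]]],    # mild distortion
    0.2: [[[1.0, 0.6], [0.2, 0.0]]],     # severe distortion (inverted map)
}
chosen_tau = pick_tau([exact_map], approx_maps_by_tau, target_ssim=0.9)
```

With this toy data the mildly distorted map at tau = 0.5 still clears the 0.9 target while the inverted map at tau = 0.2 does not, so 0.5 is selected.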
For example, here I have shown the generated approximate heat maps corresponding to different threshold values, tau equals 1.0 and so on. Tau equals 1.0 is essentially the exact heat map, with no thresholding. As tau decreases, you can see anything from barely noticeable changes to significant changes in the generated approximate heat map. In our system, we auto-tune tau using a target SSIM value and a sample image set from the workload, once up front during the system configuration phase.
Now I will briefly explain some of our experimental results. For the experiment here, I have chosen a chest X-ray image dataset, and the task is to predict whether the person has pneumonia or not. We have chosen three popular CNN architectures, namely VGG16, ResNet18, and Inception3, and for the approximate inference techniques we use a target SSIM value of 0.9. More datasets and results can be found in our paper. The experiments were run on a single-node machine with an NVIDIA Titan Xp GPU, and we developed our system using PyTorch version 0.4.
Looking at the results on the GPU, we found that the naive way of performing this workload, which is performing full CNN inference for all the occluded images corresponding to different occlusion positions, can take anywhere from 2 seconds to 12 seconds, depending on the type of CNN that you have. With incremental inference, we can enable speedups of up to around 4x for some CNN architectures. But for architectures like Inception3, we saw that incremental inference is actually slower than naive inference, because the overheads of incremental inference outweigh the benefits. However, if we combine approximate inference on top of incremental inference, we saw that we can get significant speedups of around 9x on some architectures.
On the CPU, we saw similar trends, but the speedups were greater than on the GPU, because on the CPU we are more bottlenecked by the computations. In the paper, we also show results on two other datasets, and we have achieved speedups close to 35x with our approximate inference techniques.
In summary, we have shown that explaining CNN predictions is an important task and that occlusion-based explainability is widely used. In this work, we used database-inspired incremental and approximate inference techniques to accelerate OBE. Our optimizations make OBE more amenable to interactive diagnosis of CNN predictions. Thank you.