As deep learning engineers and researchers, we are always trying to optimize some bottleneck computation in our programs. Sometimes scientific libraries like NumPy and SciPy just aren't cutting it, or worse, no library implements the esoteric function we need on our expensive GPU hardware. *Writing custom C and CUDA extensions becomes an important skill, and a necessity, for applications that require really fast computation.* In this talk, we go through a detailed example of image search over billions of items: we write custom C and CUDA kernels for distance computation and learn how to connect them seamlessly to our Python codebase. We compare methods for writing these extensions and their Python bindings in terms of both speed and ease of use.