What do Netflix recommendations, Google’s cat video detector, and Stanford’s image-to-text system all have in common? A lot of training data, and deep neural networks. We’re in the same boat here at BitLit.
This won’t be a tutorial about how deep neural networks work. There are already excellent resources for that (this one by Andrej Karpathy, for example). But, even if you fully understand how deep neural nets work, and even if you can implement one, bridging the gap between prototype implementation and a production-ready system can seem daunting. The code needs to be robust, flexible, and optimized for the latest GPUs. Fortunately, this work has already been done for you. This post describes how to take advantage of that pre-existing work.
There is a plethora of deep neural network libraries available: Caffe, cuda-convnet, Theano, and others. At BitLit, we have selected Caffe. Its codebase is actively developed and maintained. It has an active community of developers and users. It has a large library of layer types and allows easy customization of your network’s architecture. It has already been adapted to take advantage of NVIDIA’s cuDNN, if you happen to have it installed.
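To give a flavour of that customization, Caffe networks are defined declaratively in prototxt files rather than in code. A single convolution layer might look like the following (the names and parameter values here are purely illustrative, not from any network we use):

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"   # input blob
  top: "conv1"     # output blob
  convolution_param {
    num_output: 20   # number of filters
    kernel_size: 5
    stride: 1
  }
}
```

Swapping in a different architecture is largely a matter of editing this text file, which is what makes experimenting with layer counts and widths so cheap.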
cuDNN is “a GPU-accelerated library of primitives for deep neural networks”. This library provides optimized versions of core neural network operations (convolution, rectified linear units, pooling), tuned to the latest NVIDIA architectures. NVIDIA’s benchmarking shows that Caffe accelerated by cuDNN is 1.2-1.3x faster than the baseline version of Caffe.
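To make those primitives concrete, here is a minimal plain-Python sketch (using NumPy, not cuDNN) of what convolution, rectified linear units, and max pooling actually compute. cuDNN supplies heavily optimized GPU implementations of these same operations; this sketch is only meant to show the math.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation (what CNN libraries call "convolution")."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with one patch of the input
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def relu(x):
    """Rectified linear unit: elementwise max(0, x)."""
    return np.maximum(0, x)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over an (H, W) feature map."""
    h, w = x.shape
    # Trim odd edges, then take the max over each non-overlapping 2x2 block
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

In a real network these run millions of times per training pass, which is why hardware-tuned versions matter so much.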
In summary, the tight integration of NVIDIA GPUs, CUDA, cuDNN, and Caffe, combined with the active community of Caffe users and developers, is why we have selected this stack for our deep neural network systems.
As noted by Krizhevsky et al. in 2012, “All of our experiments suggest that our results can be improved simply by waiting for faster GPUs… ” This is still true today. We use both Amazon’s GPU instances and our own local GPU server.
When we need to run many experiments in parallel, we turn to Amazon. This need arises when performing model selection. To determine how many neural net layers to use, how wide each layer should be, etc., we run many experiments in parallel to determine which network architecture produces the best results. Then, to fully train (or later, retrain) the selected model to convergence, we use our local, faster GPU server.
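A sweep like this is easy to enumerate programmatically. The sketch below shows one way to generate the experiment grid; the particular layer counts and widths are made up for illustration, and each resulting configuration would be dispatched to its own GPU instance.

```python
from itertools import product

# Hypothetical search space for model selection
depths = [2, 3, 4]        # number of layers to try
widths = [128, 256, 512]  # units per layer to try

# One experiment per (depth, width) combination; each dict could be
# turned into a Caffe network definition and sent to a separate instance.
experiments = [{"num_layers": d, "layer_width": w}
               for d, w in product(depths, widths)]
```

Because the experiments are independent, they parallelize trivially across as many spot instances as the budget allows.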
Amazon’s cheapest GPU offering is their g2.2xlarge instance. It contains an NVIDIA Kepler GK104 (1536 CUDA cores). Our local server, with an NVIDIA Tesla K40 (2880 CUDA cores), trains about 2x as quickly as the g2.2xlarge instance. NVIDIA’s latest offering, the K80, is almost twice as fast again, as benchmarked on Caffe. If you’re just getting started, it certainly makes sense to learn and experiment on an Amazon AWS instance before committing to purchasing a GPU that costs several thousand dollars. The spot price for Amazon’s g2.2xlarge instance generally hovers around 8 cents per hour.
If you are an academic research institution, you may be eligible for NVIDIA’s Academic Hardware Donation program. They provide free top-end GPUs to labs that are just getting started in this field.
To conclude, it is not difficult to integrate a robust and optimized deep neural network into a production environment. Caffe is well supported by a large community of developers and users. NVIDIA recognizes that this is an important market and is making a concerted effort to serve it well. Amazon’s GPU instances are inexpensive and allow quick experimentation.
Sancho McCann (@sanchom) is the Head of Research and Development at BitLit Media Inc. He has a Ph.D. in Computer Vision from the University of British Columbia.
This post was originally posted on Packt Publishing’s blog. It has been republished here with permission. Packt Publishing offers print and ebook bundling through BitLit. Download the app (Android and iPhone) to bundle your Packt books.