Installing and Building MXNet with Intel MKL

By Promotion | Updated on 17-Mar-2017

MXNet is an open-source deep learning framework for defining, training, and deploying deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It is highly scalable, which allows for fast model training, and it supports a flexible programming model and multiple languages. MXNet lets you mix symbolic and imperative programming styles to maximize both efficiency and productivity. It is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly, and a graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient.

As of this writing, the latest version of MXNet includes built-in support for the Intel® Math Kernel Library (Intel® MKL) 2017. Intel MKL 2017 includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and AVX-512 instructions, which are supported on Intel® Xeon® processors and Intel® Xeon Phi™ processors.

Prerequisites:

Follow the instructions given here.

Building/Installing with MKL:

MXNet can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install MXNet with Intel MKL 2017 on CentOS*- and Ubuntu*-based systems.

1. Clone the mxnet tree and pull down its submodule dependencies:

git clone https://github.com/dmlc/mxnet.git

cd mxnet

git submodule update --init --recursive

2. In make/config.mk, set the following lines to “1” to enable MKL support.

With these flags enabled, the build downloads the latest MKL package and installs it on your system.

USE_MKL2017 = 1

USE_MKL2017_EXPERIMENTAL = 1
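The edit in step 2 can also be scripted. The following is a minimal sketch, assuming each flag already appears in make/config.mk as an uncommented `NAME = value` line. The helper name `set_make_flags` is hypothetical and not part of MXNet, and the sample text stands in for the real file.

```python
import re


def set_make_flags(config_text, flags):
    """Return config_text with each `NAME = value` line set to the new value.

    Hypothetical helper: assumes every flag already has an uncommented
    line in the file and raises KeyError otherwise.
    """
    for name, value in flags.items():
        # Match "NAME = <anything>" on one line; [ \t]* avoids spanning lines.
        pattern = re.compile(
            r"^([ \t]*%s[ \t]*=[ \t]*).*$" % re.escape(name), re.MULTILINE
        )
        config_text, count = pattern.subn(
            lambda m: m.group(1) + str(value), config_text
        )
        if count == 0:
            raise KeyError("flag not found in config: %s" % name)
    return config_text


# Stand-in for the contents of make/config.mk (assumed layout).
sample = "USE_MKL2017 = 0\nUSE_MKL2017_EXPERIMENTAL = 0\n"
updated = set_make_flags(
    sample, {"USE_MKL2017": 1, "USE_MKL2017_EXPERIMENTAL": 1}
)
print(updated)
```

To apply this to a real checkout, read make/config.mk into a string, pass it through the helper, and write the result back to the same path.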

3. Build the mxnet library

NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))

make -j $NUM_THREADS

4. Install the python modules

cd python

python setup.py install
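A quick way to confirm that the Python modules are visible to your interpreter is to import the package. The following is a minimal sketch, assuming the installed module is named `mxnet`. The helper name `installed_mxnet_version` is hypothetical, and the check only confirms that the import works, not that MKL is in use.

```python
import importlib


def installed_mxnet_version():
    """Return the mxnet version string, or None if the import fails."""
    try:
        mx = importlib.import_module("mxnet")
    except (ImportError, OSError):
        # ImportError: package missing; OSError: shared library not loadable.
        return None
    return getattr(mx, "__version__", "unknown")


version = installed_mxnet_version()
if version is None:
    print("mxnet is not importable from this Python environment")
else:
    print("mxnet imported, version:", version)
```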

Benchmarks:

A range of standard image classification benchmarks can be found under example/image-classification in the mxnet tree. We’ll focus on a benchmark that tests inference across a range of topologies.

Running Inference Benchmark:

The provided benchmark_score.py runs a variety of standard topologies (AlexNet, Inception, ResNet, etc.) at a range of batch sizes and reports the results in images per second. Before running it, set the following environment variables for optimal performance:

export OMP_NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))

export KMP_AFFINITY=granularity=fine,compact,1,0
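For readers who launch the benchmark from a Python script instead of a shell, the two settings above can be sketched as follows. The code mirrors the shell expression: it counts the distinct `core id` lines in /proc/cpuinfo and doubles that count. The function names are hypothetical, and the sample text is a made-up stand-in for a real /proc/cpuinfo.

```python
def omp_threads_from_cpuinfo(cpuinfo_text):
    """Count distinct lines containing 'core id' and multiply by two."""
    core_lines = {
        line for line in cpuinfo_text.splitlines() if "core id" in line
    }
    return len(core_lines) * 2


def benchmark_env(cpuinfo_text, base_env=None):
    """Return a copy of base_env with the two benchmark variables set."""
    env = dict(base_env or {})
    env["OMP_NUM_THREADS"] = str(omp_threads_from_cpuinfo(cpuinfo_text))
    env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    return env


# Made-up sample: four logical processors sharing two core ids.
sample = (
    "processor\t: 0\ncore id\t\t: 0\n"
    "processor\t: 1\ncore id\t\t: 1\n"
    "processor\t: 2\ncore id\t\t: 0\n"
    "processor\t: 3\ncore id\t\t: 1\n"
)
env = benchmark_env(sample)
print(env["OMP_NUM_THREADS"])  # two distinct core ids, doubled -> 4
```

On a real Linux system, pass the contents of /proc/cpuinfo and a copy of `os.environ`, then hand the resulting dictionary to `subprocess.run(..., env=env)` when starting the benchmark.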

Then run the benchmark from the directory that contains benchmark_score.py:

python benchmark_score.py

If everything is installed correctly, you should see images-per-second figures for a variety of topologies and batch sizes. For example:

INFO:root:network: alexnet

INFO:root:device: cpu(0)

INFO:root:batch size 1, image/sec: XXX

INFO:root:batch size 2, image/sec: XXX

…

INFO:root:batch size 32, image/sec: XXX

INFO:root:network: vgg

INFO:root:device: cpu(0)

INFO:root:batch size 1, image/sec: XXX
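If you want to compare runs, the log lines shown above are regular enough to parse. The following is a minimal sketch, assuming the output keeps exactly the `network:` and `batch size N, image/sec: V` line shapes shown here. The function name is hypothetical, and the numbers in the sample are made-up placeholders, not measured results.

```python
import re

NETWORK_RE = re.compile(r"^INFO:root:network: (\S+)$")
SCORE_RE = re.compile(r"^INFO:root:batch size (\d+), image/sec: (\S+)$")


def parse_benchmark_log(lines):
    """Return {network: {batch_size: images_per_sec}} from log lines."""
    results = {}
    current = None
    for line in lines:
        line = line.strip()
        match = NETWORK_RE.match(line)
        if match:
            current = match.group(1)
            results.setdefault(current, {})
            continue
        match = SCORE_RE.match(line)
        if match and current is not None:
            # float() raises ValueError if the value is not numeric.
            results[current][int(match.group(1))] = float(match.group(2))
    return results


# Placeholder values for illustration only; not real benchmark numbers.
sample_log = [
    "INFO:root:network: alexnet",
    "INFO:root:device: cpu(0)",
    "INFO:root:batch size 1, image/sec: 1.0",
    "INFO:root:batch size 2, image/sec: 2.0",
    "INFO:root:network: vgg",
    "INFO:root:device: cpu(0)",
    "INFO:root:batch size 1, image/sec: 0.5",
]
parsed = parse_benchmark_log(sample_log)
print(parsed)
```

Lines that match neither pattern, such as the `device:` lines, are skipped.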