Installing and Building MXNet with Intel MKL

Published Date: 05-Mar-2017 | Last Updated: 17-Mar-2017

MXNet is an open-source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It is highly scalable, allowing for fast model training, and supports a flexible programming model and multiple languages. MXNet lets you mix symbolic and imperative programming styles to maximize both efficiency and productivity. It is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient.

The latest version of MXNet includes built-in support for the Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® AVX-512 instructions, which are supported on Intel® Xeon® processors and Intel® Xeon Phi™ processors.


Follow the instructions given here.

Building/Installing with MKL:

MXNet can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install MXNet with Intel MKL 2017 on CentOS*- and Ubuntu*-based systems.

1.     Clone the mxnet tree, then, from inside the cloned directory, pull down its submodule dependencies:

git clone

git submodule update --init --recursive

2.     Set the following lines in make/ to “1” to enable MKL support.

With MKL support enabled, the build downloads the latest MKL package and installs it on your system.

USE_MKL2017 = 1


3.     Build the mxnet library

NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))

make -j $NUM_THREADS

4.     Install the python modules

cd python

python install
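Step 2 above can also be scripted. The following is a minimal sketch, not part of the MXNet build system: `set_make_flag` is a hypothetical helper that sets a `NAME = VALUE` line in a make-style configuration file and appends the line if the name is not present. The configuration file path is passed as an argument because it depends on your checkout.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MXNet): set NAME = VALUE in a
# make-style config file, appending the line if NAME is not present yet.
# Usage: set_make_flag <config-file> <NAME> <VALUE>
set_make_flag() {
    cfg="$1"; name="$2"; value="$3"
    if grep -q "^[[:space:]]*${name}[[:space:]]*=" "$cfg"; then
        # Rewrite the existing assignment. A temp file is used instead of
        # `sed -i`, whose syntax differs between GNU and BSD sed.
        sed "s/^[[:space:]]*${name}[[:space:]]*=.*/${name} = ${value}/" "$cfg" > "$cfg.tmp" &&
            mv "$cfg.tmp" "$cfg"
    else
        # No assignment found: append one at the end of the file.
        printf '%s = %s\n' "$name" "$value" >> "$cfg"
    fi
}
```

For example, `set_make_flag <config-file> USE_MKL2017 1` sets the option shown in step 2, where `<config-file>` stands for the configuration file you would otherwise edit by hand.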


A range of standard image classification benchmarks can be found under examples/image-classification. This tutorial focuses on a benchmark that tests inference across a range of topologies.

Running Inference Benchmark:

The provided benchmark script runs a variety of standard topologies (AlexNet, Inception, ResNet, etc.) at a range of batch sizes and reports the results in images per second. Before running it, set the following environment variables for optimal performance:

export OMP_NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))

export KMP_AFFINITY=granularity=fine,compact,1,0
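The thread-count formula above counts distinct `core id` values and doubles the result. On multi-socket machines the same `core id` can appear once per socket, and some systems do not list `core id` in /proc/cpuinfo at all, in which case the formula yields 0. The following is a hedged sketch of a more defensive variant: `count_physical_cores` is a hypothetical helper that counts distinct (physical id, core id) pairs, and the fallback to the number of online logical CPUs is an assumption, not something the MXNet or Intel MKL documentation prescribes. Note that on a multi-socket machine this produces a larger value than the original formula; which one performs better may depend on your system.

```shell
#!/bin/sh
# Hypothetical helper: count distinct (physical id, core id) pairs in a
# cpuinfo-style file (defaults to /proc/cpuinfo). Prints 0 if the file is
# missing or has no "core id" lines.
count_physical_cores() {
    awk -F: '
        /^physical id/ { pkg = $2 }           # remember the current socket
        /^core id/     { print pkg ":" $2 }   # emit one socket:core pair
    ' "${1:-/proc/cpuinfo}" 2>/dev/null | sort -u | wc -l | tr -d ' '
}

cores=$(count_physical_cores)
if [ "$cores" -gt 0 ]; then
    threads=$((cores * 2))    # same x2 factor as the formula above
else
    # Assumption: fall back to the number of online logical CPUs.
    threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
fi

export OMP_NUM_THREADS="$threads"
export KMP_AFFINITY=granularity=fine,compact,1,0
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Source this file (for example with `. ./file.sh`, where the file name is up to you) rather than executing it, so that the exported variables remain set in the shell that launches the benchmark.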

Then run the benchmark by doing:


If everything is installed correctly, you should see images-per-second figures for a variety of topologies and batch sizes. For example:

INFO:root:network: alexnet

INFO:root:device: cpu(0)

INFO:root:batch size  1, image/sec: XXX

INFO:root:batch size  2, image/sec: XXX

INFO:root:batch size 32, image/sec: XXX

INFO:root:network: vgg

INFO:root:device: cpu(0)

INFO:root:batch size  1, image/sec: XXX
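To compare runs, it can help to convert this log into a table. The following is a minimal sketch that assumes the log lines have exactly the format shown above; `log_to_csv` is a hypothetical helper, not part of MXNet.

```shell
#!/bin/sh
# Hypothetical helper: convert benchmark log lines of the form shown above
# into CSV rows. Reads the named files, or standard input if none are given.
log_to_csv() {
    awk '
        BEGIN { print "network,batch_size,images_per_sec" }
        # "INFO:root:network: alexnet" -> remember the current network name
        /^INFO:root:network:/ { net = $2 }
        # "INFO:root:batch size  1, image/sec: XXX" -> emit one CSV row
        /^INFO:root:batch size/ {
            batch = $3
            sub(/,$/, "", batch)    # drop the trailing comma after the size
            print net "," batch "," $5
        }
    ' "$@"
}
```

Call it as `log_to_csv saved.log`, where `saved.log` is a hypothetical file holding the captured benchmark output. If the benchmark uses Python's default logging configuration, the log lines go to standard error, so capture them with a `2>` redirect.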