Setup TensorFlow to use the GPU with Docker Containers

Having built a machine suitable for deep learning, I was ready to put my EVGA GeForce 1080 Ti GPU to the test. Unfortunately I found that configuring TensorFlow + GPU to run on my local machine was not as straightforward as any other Python package I have installed. This story has been repeated on many posts online with all the pitfalls that can occur. This post chronicles the simplest approach I have found to start using TensorFlow with the GPU in the simplest and easiest manner as possible.

I am motivated to make this post since I found no sites that chronicled the complete journey to start from a fresh GPU installation and have Tensorflow running on a GPU. Many sites show individual steps, and some advertise how easy this can be while only showing the last Conda install steps required, none of the prior CUDA configuration steps. Having tried multiple approaches to install TensorFlow on my local machine directly to work with the GPU, I found that using a Docker container was a reliable method and also makes work more portable to other machines.

In this post, I will describe all steps that were required to stand up a Docker container that can run TensorFlow on Ubuntu 18.04 OS with an EVGA GeForce GTX 1080 Ti GPU.

1. Install Nvidia Drivers

Prior to installing Nvidia drivers, I recommend removing all existing Nvidia drivers. I have seen errors with the GPU not being recognized due to prior Nvidia GPU and CUDA drivers. If you find that you later want to install Tensorflow with GPU support on the local machine, this is the key first step.

$ sudo apt remove nvidia-*
$ sudo apt install
$ sudo apt autoremove

The next step is to find the appropriate driver for the GPU. Here I performed a Manual Driver Search for GeForce 10 Series: https://www.nvidia.com/en-us/geforce/drivers/. Select the OS with bits (e.g., Linus 64-bit) and downloaded the latest driver for this GPU: Linux x64 (AMD64/EMT64T) Display Driver Version: 465.31 and the run file NVIDIA-Linux-x86_64-465.31.run.

The file permissions may need to be changed prior to executing the run file:

$ sudo chmod +x ./NVIDIA-Linux-x86_64-465.31.run
./NVIDIA-Linux-x86_64-465.31.run

If you do not know the meaning of installation options, I recommend selecting the defaults since other options can produce errors. You may receive a warning about the GCC version being different. I had no errors as long as the system GCC version is more recent than the GCC used to compile the run file.

2. Install Docker

Docker provides the latest instructions to install the Docker engine on Ubuntu here: https://docs.docker.com/engine/install/ubuntu/. Note that Docker may change the steps below, and I recommend following the latest steps from the Docker site. It is recommend to start with the Uninstall Old Versions step to prevent incompatibility issues. Next use the Install Using the Repository and Set Up the Repository steps:

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

Add Docker’s official GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Setup a stable repository:

$ echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install the latest version of the Docker engine:

 $ sudo apt-get update
 $ sudo apt-get install docker-ce docker-ce-cli containerd.io

Finally verify that Docker is working:

$ sudo docker run hello-world

3. Install Nvidia Docker Support

Nvidia provides working instructions to setup Docker and the Nvidia Container Toolkit here with Install on Ubuntu and Debian: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker. I recommend using this link maintained by Nvidia. However, I will also document the steps I used recently to setup Nvidia with Docker support. Note that you can skip the Setting up Docker step since we setup Docker in the prior step. Use the $ docker -v command to confirm that the Docker version is 19.03 or later which is required for nvidia-docker2.

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

The output should show the GPU status similar to below (extra points if you catch the pop-culture reference):

In prior installations I received an error installing nvidia-docker like this one: https://github.com/NVIDIA/nvidia-docker/issues/234. If this error occurs, the solution is to install this deb file: https://github.com/NVIDIA/nvidia-docker/files/818401/nvidia-docker_1.0.1-yakkety_amd64.deb.zip. Then the nvidia-docker2 package should be able to be installed, or else you may also try to install with sudo apt-get install -y nvidia-container-toolkit.

4. Pull a Pre-Built Docker Image

The easiest way to get started with Docker is to pull a pre-built image that has Jupyter notebook and TensorFlow GPU support. I recommend selecting an image with a terminal window to make updating the Python virtual environment easier, and I recommend to choose an image that connects to the local filesystem.

The GPU-Jupyter image provides these features: https://github.com/iot-salzburg/gpu-jupyter/commits?author=ChristophSchranz. I started with Quickstart Step 4 to pull the Docker image. If only Python is needed, the site provides names of additional images that exclude Julia and R which should save time in downloading the image. Also select the proper image for the Ubuntu OS. I used the following command to pull the image:

$ cd your-working-directory 
$ docker run --gpus all -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root cschranz/gpu-jupyter:v1.4_cuda-11.0_ubuntu-18.04_python-only

The command and tags used for pulling a docker image are explained here: https://docs.docker.com/engine/reference/run/. The specific commands used for GPU-Jupyter are explained as follows:

  • -d: the container exits when the root process running the container exits.
  • -it: creates a tty (teletypewriter) as a terminal window for interactive processes.
  • -p: specifies the ports to be accessible on the local host.
  • -v: specifies the volumes or shared filesystem. In the command above, a data folder will be created with admin access only in the working-directory.

Once the image has been pulled, it will begin running automatically at http://localhost:8848. The password at the time of this article is gpu-jupyter.

5. Check that Tensorflow Runs on the GPU

One way to confirm that TensorFlow runs with the local machine GPU is to open a Jupyter notebook in the GPU-Jupyter image and use the is_gpu_available() function which returns a Boolean:

import tensorflow as tf
print(tf.test.is_gpu_available(cuda_only=True))

TensorFlow also provides a function to check the GPU device:

print(tf.test.gpu_device_name())
GPU-Jupyter image provides a JupyterLab web interface

As seen above, both commands confirm that TensorFlow recognizes the GPU. If the image is configured correctly, TensorFlow will use the GPU by default.

In my next post I will show initial results of using TensorFlow + GPU for a common deep learning problem.

Building a Deep Learning Machine — Part 4: Installing the Ubuntu 18.4 OS

The machine booted successfully using the chassis power button. The initial screen displays that the CPU and RAM are recognized. The image was taken when only the initial 16 GB RAM was recognized before moving the RAM card to another slot.

I decided to install the latest LTS (Long Term Support) desktop version of Ubuntu 18.04.1 LTS. I chose Ubuntu over Windows 10 since some machine learning packages like OpenCV can only run on Ubuntu, and some applications like Docker are built on Linux and are easier to install on Ubuntu.

I downloaded the 2 GB Ubuntu 18.04.1 ISO file and burned a bootable ISO file onto a USB flash drive using Rufus. The machine booted off the USB flash drive without changing BIOS boot settings beforehand. I chose default Ubuntu installation options with login credentials required. Installation steps are described here in more detail.

The only installation issue I had was that I could not get past the Ubuntu login screen. After every attempt to login by entering the password, Ubuntu would return to the login screen. No error appeared for an incorrect password since the password was correct. I selected the Use LVM with the new Ubuntu installation option on the first installation. LVM stands for Logical Volume Management and allows the user to add, modify, resize, and take snapshot partitions.

The infinite login loop issue was resolved by reinstalling Ubuntu without selecting the LVM option.

Ubuntu 18.04.1 desktop

Building a Deep Learning Machine – Part 3: Installing the SSD, RAM, GPU, PSU, and Motherboard/Power Connections.

I completed building the deep learning machine this past weekend. I will describe the final steps for assembling the hardware in this Part 3 and discuss the OS installation in Part 4.

Solid State Drive
The build has a 500 GB Samsung 960 EVO M.2 solid state storage drive. The drive uses the NVMEe protocol which can be utilized by the M.2 socket with ‘M’ keying (read more about keying here). The Strix X99 M.2 socket runs on a PCIe 3.0 x4 lane which it shares with a U.2 connector. The socket is compatible with the following SSD sizes: 2242/2260/2280/22110. The first two numbers ’22’ are the width (22 mm), and the remaining numbers are the length (42 mm, etc.). The M.2 socket was designed to provide faster link speeds than the mini-SATA connector. The SATE 3.0 has a link speed of up to 6 GB/s versus the PCIe 3.0 x4 lane which runs up to 20 GB/s. The 960 EVO has sequential read/write speeds up to 3.2 GB/s and 1.8 GB/s. Read more about the performance difference between SATA and M.2 at PCWorld here.

Samsung 960 EVO M.2 SSD with 500 GB

When inserted into the M.2 socket, the SSD will be angled upward. I first install a hex head jack screw to raise the screw mount even with the socket and press the SSD onto the jack screw while screwing the mounting screw into the jack screw.

RAM

I have started the build with two 16 GB DDR4 RAM cards.

I initially installed the two RAM cards in the D1 and B1 motherboard locations as recommended by the motherboard manual and as shown below.

After completing the installation and booting the machine, the BIOS utility only recognized the RAM in the B1 slot though the D1 slot is recommended as the first slot to use with one card.

Bios utility recognizes 16GB RAM in the B1 slot.
BIOS utility recognizes a card in the D1 slot but does not recognize the size.

When I researched this issue, the first solutions I found recommended overclocking the motherboard with increased RAM slot voltage to permit using additional RAM cards. In this case, I moved the card in D1 to A1, and the BIOS utility recognized two cards and 32 GB RAM. I recommend moving RAM cards to another slot as the first troubleshooting step when a card is not recognized.

BIOS utility recognizes both cards in the A1 and B1 slots.

GPU

The build will begin with one EVGA GTX 1080 Ti 11GB graphical processing unit. Tests have shown the 1080 Ti performance is comparable to the more expensive Titan X for machine learning applications. The motherboard has 3 PCIe x16 slots and is suited to run two GPU utilizing x16 PCIe lanes each or three GPU in a x16/x8/x8 configuration.

EVGA GTX 1080 Ti GPU
The GTX 1080 Ti fits into a 16 lane PCIe slot.

The Strix X99 motherboard has two PCIe x16 slots and one PCIe x8 slot, and the first GPU is installed in the first slot.

Strix X99 16x PCIe lane

PSU

I chose a EVGA 1000 GQ power supply unit for the build. Since the absolute peak load for a GTX 1080 Ti that is overclocked is 350 W, the 1000 W PSU will be sufficient for an upgrade to two GPUs. The EVGA 1000 GQ comes with one ATX 20+4-pin cable for the main motherboard power supply, two 8(4+4)-pin CPU cables, two standalone 8(6+2)-pin and four 8(6+2)-pin x2 cables, three SATA 5-pin x4 cables, one Molex 4-pin x3 cable, and one Molex to FDD adapter. The ‘4+4’ notation indicates that two 4 pin male cables are wired adjacently to connect with either an 8 or 4 pin female connector.

I recommend connecting all required cables to the PSU prior to installing in the case since access to the rear of the PSU is restricted inside the case. The Phanteks Eclipse P400 case has an exhaust port along the case bottom for the PSU fan.

Motherboard and Power Connections

Once the PSU is installed, I complete the build by making all remaining motherboard and power connections. The GTX 1080 Ti requires one 8 pin and one 6 pin VGA cable.

The motherboard has a 4-pin header for the water pump and a 4-pin header for the CPU fan. The water pump power connector has holes for 3 pins and should be connected to the 3 header pins aligned with the bar as shown below. The fourth pin left of the water pump connector has no bar behind it. The fourth header pin would allow pump/fan speed control via pulse width modulation. The CPU fan has holes for four pins.

The chassis is connected to the motherboard with the following connections. The power, reset, and HD audio connections are shown in the upper left corner of the image below. The chassis USB 3.0 ports are connected in the center connector. The front panel audio connector is shown in the upper right.

The CPU is powered with both the 8-pin and 4-pin ATX connectors. The CPU water cooler is powered with a 15-pin SATA connector from the PSU as shown below in the upper left corner.

After binding the wires in the chassis behind the motherboard, the machine hardware build is now complete! In Part 4, I will describe the OS and software installation.

Building a Deep Learning Machine – Part 2: Installing Motherboard, CPU, and CPU Water Cooler

Installing the CPU

The machine build began with installing the CPU. The CPU is an Intel Xeon E5-1620 v4. Although the processor is a v4, it is designed for a LGA 2011-v3 socket consistent with the ASUS Strix X99 motherboard. I described why I chose this processor in Part 1 of the series.

The LGA 2011-v3 socket on the motherboard has a protective cover to prevent exposing the pins any longer than necessary. The cover warns the user to keep the cover on the socket until after installing the CPU. Removing the cover just before installing the CPU is also fine since the pins are covered once the CPU is pressed into the motherboard socket.

LGA 2011-v3 socket on the Strix X99 motherboard

I opened the socket cover by releasing both spring levers to an open position.

The CPU should be aligned with the arrow on the CPU corner aligned with the arrow on the motherboard socket before being placed into the motherboard socket.

The CPU and motherboard have arrows shown on the lower right corner.

The socket is closed and spring levers are returned to their locked position. Some force is required to press the CPU contacts to the motherboard contacts in order to lock the spring levers. Finally the protective cover is removed.

Installing the motherboard

The PC case is a Phanteks Eclipse P400 Tempered Glass Edition midtower.

Installing the motherboard into the case was straightforward since the board is aligned with the rear I/O connection on the case. The case is built to conceal wires behind the motherboard and has two wire ports located on the opposite side from the rear I/O connection. The motherboard has 9 screws to attach to the case.

View of motherboard and chassis from above with the rear I/O connection along the bottom.

As my first PC build, I learned the hard way that the thermal paste is already layered on the water cooler interface out of the box when I compared the water cooler size to the CPU by placing the cooler on the CPU. I was able to salvage the situation and make the final alignment the same as when I first transferred the thermal paste to the CPU to ensure the thermal paste coverage is consistent.

Installing the Water Cooler

I bought the Corsair H60 water cooler which has a single 120 mm radiator and fan. This water cooler gets good marks on PC part picker for being economical (currently $70) and effective in its price range.

A main consideration before installing the water cooler is whether to apply aftermarket thermal paste. Thermal paste is necessary to ensure suitable thermal contact between the cooler and CPU. Water coolers will come with thermal paste already applied by default. Tests are inconclusive about whether aftermarket thermal paste improves heat transfer; I have seen tests demonstrate worse heat transfer with aftermarket paste. Factors can include the quality of default thermal paste and how well the after market paste is applied. I decided to use the default thermal paste and will trend the temperatures in operation.

The radiator should be positioned against the case wall with the fan oriented as an inlet fan as recommended by Corsair. This setup ensures cooler air is drawn over the radiator to produce a larger temperature delta rather than warm air from inside the box. The Phantek Eclipse P400 provides space for the water cooler radiator and fan on the top of the case. I positioned it towards the front to be closer to the exhaust fans.

The Corsair H60 screws directly into the top of the Intel processor socket on the motherboard. AMD processors are attached with adapters from behind the motherboard.

The Corsair H60 has a 15-pin SATA power connection which will connect directly to the power supply.

Building a Deep Learning Machine – Part 1: Components

I have started building a desktop machine designed for fitting machine learning models including deep learning applications. The Reinforcement Learning and Decision Making class in the OMS CS program at Georgia Tech motivated me to build a machine appropriate for machine learning applications as I start the second half of the masters program. I was able to complete RLDM with my laptop which has a 2.16 GHz Celeron processor and 8 GB RAM, but I plan to use the new machine for upcoming machine learning classes.

I have prioritized designing the machine learning desktop around the GPU(s). I will take a short detour to explain why. GPUs have become the main engine for solving computations in data science models versus CPUs. GPUs have many more arithmetic logic units (ALUs) than CPUs which provides an improved ability to perform simple operations in parallel. Machine learning, artificial intelligence, and deep learning problems generally require matrix math operations that can be accelerated by solving in parallel. My design goals were:

  • A powerful GPU that has sufficient RAM to be well suited for computer vision applications. Some users have reported using 8 GB RAM at a minimum for training computer vision models, but an upgrade to 11 GB is beneficial. The GPU should also have broad support for machine learning libraries. The cuDNN library built on top of Nvidia CUDA programming framework is used by major deep learning frameworks including TensorFlow and PyTorch. I decided on the GeForce GTX 1080 Ti made by EVGA which has a Nvidia processor with 11 GB.
  • Sufficient RAM to handle a future upgrade to two GPUs. The machine should have at least as much RAM as the GPUs. Since I would like the machine to be ready for a possible upgrade to 2 GPUs in the future, I have purchased 32 GB RAM.
  • A 40-lane CPU that can accommodate an upgrade to two 16 PCIe lane GPUs while maximizing the PCIe lanes for data transfer between the CPU and GPU. The data transfer between the CPU and GPU across PCIe lanes can be a bottleneck which slows the GPU performance depending on the application. I chose an Intel Xeon E5 1620 V4 3.5 GHz processor over an i-7 series processor since the Xeon has 40 PCIe lanes which will allow two GPUs to use 16 lanes apiece. *07/02/21 UPDATE: My research in 2018 indicated that PCI lanes may restrict GPU performance. Given the cost of the GPU, I preferred to ensure my system did not restrict performance due to data transfer limitations. However, more recent posts have shown that deep learning may be restricted by memory but should be little restricted by PCI lanes and data transfer with the CPU. Tim Dettmers has a nice article discussing GPU selection for deep learning: https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/.
  • A motherboard that suits the GPU and CPU and handles an upgrade to two GPUs while maximizing the PCIe 3.0 lanes for data transfer between the CPU and GPU. PCIe 3.0 is recommended for multiple GPU machines. To have space for two 1080 Ti GPUs, the motherboard needs to support two dual-width x16 graphics slots. The motherboard should also have a LGA 2011 processor slot for the Xeon processor. I chose the ASUS STRIX X99 motherboard which provides 40 PCIe 3.0 lanes which supports a 16/16/8 configuration.

I have provided a full list of the components I chose on PC Part Picker: https://pcpartpicker.com/list/RtKCq4 .

References

Tim Dettmer: https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9

Slav Ivanov: https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9

Yan-David Erlich: https://medium.com/yanda/building-your-own-deep-learning-dream-machine-4f02ccdb0460