Set Up TensorFlow to Use the GPU with Docker Containers

Having built a machine suitable for deep learning, I was ready to put my EVGA GeForce GTX 1080 Ti GPU to the test. Unfortunately, I found that configuring TensorFlow to use the GPU on my local machine was not as straightforward as installing any other Python package. That story, with all the pitfalls that can occur, has been repeated in many posts online. This post chronicles the simplest and most reliable approach I have found to start using TensorFlow with the GPU.

I am motivated to write this post because I found no site that chronicles the complete journey from a fresh GPU installation to TensorFlow running on the GPU. Many sites show individual steps, and some advertise how easy the process can be while showing only the final Conda install steps and none of the prior CUDA configuration. Having tried multiple approaches to installing TensorFlow directly on my local machine to work with the GPU, I found that using a Docker container was a reliable method that also makes the work more portable to other machines.

In this post, I will describe all the steps required to stand up a Docker container that runs TensorFlow on Ubuntu 18.04 with an EVGA GeForce GTX 1080 Ti GPU.

1. Install Nvidia Drivers

Prior to installing Nvidia drivers, I recommend removing all existing Nvidia drivers. I have seen the GPU fail to be recognized because of leftover Nvidia GPU and CUDA drivers. If you later decide to install TensorFlow with GPU support directly on the local machine, this is the key first step.

$ sudo apt remove 'nvidia-*'
$ sudo apt autoremove

The next step is to find the appropriate driver for the GPU. Here I performed a Manual Driver Search for the GeForce 10 Series at https://www.nvidia.com/en-us/geforce/drivers/. Select the OS and architecture (e.g., Linux 64-bit) and download the latest driver for the GPU; in my case this was the Linux x64 (AMD64/EM64T) Display Driver, Version 465.31, delivered as the run file NVIDIA-Linux-x86_64-465.31.run.

The file permissions may need to be changed prior to executing the run file:

$ sudo chmod +x ./NVIDIA-Linux-x86_64-465.31.run
$ sudo ./NVIDIA-Linux-x86_64-465.31.run

If you do not know the meaning of an installation option, I recommend selecting the default, since other choices can produce errors. You may receive a warning about the GCC version being different; I saw no errors as long as the system GCC version was more recent than the GCC version used to compile the run file.
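Once the driver is installed, running nvidia-smi on the host should report the GPU. For scripting a post-install check, here is a minimal sketch; the gpu_visible helper name is hypothetical:

```python
import subprocess

def gpu_visible():
    """Return True if nvidia-smi runs successfully on the host."""
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        return result.returncode == 0
    except FileNotFoundError:
        # The driver utilities are not installed (or not on PATH).
        return False

print(gpu_visible())
```

A False result right after installation usually means the driver did not load and a reboot or reinstall is needed.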

2. Install Docker

Docker provides the latest instructions for installing the Docker engine on Ubuntu here: https://docs.docker.com/engine/install/ubuntu/. Note that Docker may change the steps below, and I recommend following the latest steps from the Docker site. It is recommended to start with the Uninstall Old Versions step to prevent incompatibility issues. Next, use the Install Using the Repository and Set Up the Repository steps:

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

Add Docker’s official GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Set up a stable repository:

$ echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install the latest version of the Docker engine:

$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io

Finally, verify that Docker is working:

$ sudo docker run hello-world

3. Install Nvidia Docker Support

Nvidia provides working instructions for setting up Docker and the Nvidia Container Toolkit under Install on Ubuntu and Debian here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker. I recommend using this link, which is maintained by Nvidia; however, I will also document the steps I used recently to set up Nvidia Docker support. Note that you can skip the Setting up Docker step, since we set up Docker in the prior section. Use the $ docker -v command to confirm that the Docker version is 19.03 or later, which is required for nvidia-docker2.
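If you want to script that version check, here is a minimal sketch; the docker_version_ok helper name is hypothetical, and the parsing assumes the usual "Docker version X.Y.Z, build …" output of docker -v:

```python
import re

def docker_version_ok(version_line, minimum=(19, 3)):
    """Return True if a `docker -v` output line meets the (major, minor) minimum."""
    m = re.search(r"Docker version (\d+)\.(\d+)", version_line)
    return bool(m) and (int(m.group(1)), int(m.group(2))) >= minimum

print(docker_version_ok("Docker version 20.10.7, build f0df350"))  # True
print(docker_version_ok("Docker version 18.09.2, build 6247962"))  # False
```

Comparing (major, minor) tuples avoids the common pitfall of comparing version strings lexically, where "18.9" would sort after "19.03".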

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

The output should show the GPU status reported by nvidia-smi, similar to the screenshot below (extra points if you catch the pop-culture reference):

In prior installations I received an error installing nvidia-docker like this one: https://github.com/NVIDIA/nvidia-docker/issues/234. If this error occurs, the solution is to install this deb file: https://github.com/NVIDIA/nvidia-docker/files/818401/nvidia-docker_1.0.1-yakkety_amd64.deb.zip. The nvidia-docker2 package should then install; alternatively, you can try installing with sudo apt-get install -y nvidia-container-toolkit.

4. Pull a Pre-Built Docker Image

The easiest way to get started with Docker is to pull a pre-built image that has Jupyter Notebook and TensorFlow GPU support. I recommend selecting an image that provides a terminal window, to make updating the Python virtual environment easier, and choosing an image that connects to the local filesystem.

The GPU-Jupyter image provides these features: https://github.com/iot-salzburg/gpu-jupyter/commits?author=ChristophSchranz. I started with Quickstart Step 4 to pull the Docker image. If only Python is needed, the site lists additional images that exclude Julia and R, which should save time in downloading the image. Also select the proper image for the Ubuntu OS version. I used the following command to pull the image:

$ cd your-working-directory 
$ docker run --gpus all -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root cschranz/gpu-jupyter:v1.4_cuda-11.0_ubuntu-18.04_python-only

The command and flags used for running a Docker image are explained here: https://docs.docker.com/engine/reference/run/. The specific flags used for GPU-Jupyter are explained as follows:

  • -d: runs the container detached, in the background; the container exits when the root process running it exits.
  • -it: allocates a tty (teletypewriter) as a terminal window for interactive processes.
  • -p: specifies the ports that will be accessible on the local host.
  • -v: specifies the volumes or shared filesystem. In the command above, a data folder (owned by root) will be created in the working directory.
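Long docker run invocations like the one above are easy to mistype. As a sketch, the same command can be assembled from named parameters in a small helper script; build_docker_run is a hypothetical name, and the defaults mirror the command above:

```python
def build_docker_run(image, host_port=8848, container_port=8888,
                     host_dir="$(pwd)/data", container_dir="/home/jovyan/work"):
    """Assemble the `docker run` command used above from named parameters."""
    parts = [
        "docker run --gpus all -d -it",
        f"-p {host_port}:{container_port}",      # host:container port mapping
        f"-v {host_dir}:{container_dir}",        # shared filesystem
        "-e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes",
        "--user root",
        image,
    ]
    return " ".join(parts)

print(build_docker_run("cschranz/gpu-jupyter:v1.4_cuda-11.0_ubuntu-18.04_python-only"))
```

Keeping the mapping in one place makes it easy to, for example, change the host port without touching the rest of the command.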

Once the image has been pulled, the container starts automatically and serves JupyterLab at http://localhost:8848. The password at the time of this article is gpu-jupyter.
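Before opening a browser, you can confirm that the server is reachable. Here is a minimal sketch; the jupyter_up helper name is hypothetical, and the port matches the -p mapping used above:

```python
import urllib.request

def jupyter_up(url="http://localhost:8848", timeout=5):
    """Return True if an HTTP request to the Jupyter server succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors alike.
        return False

print(jupyter_up())
```

A False result a few seconds after starting the container is normal; the server can take a moment to come up.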

5. Check that Tensorflow Runs on the GPU

One way to confirm that TensorFlow sees the local machine's GPU is to open a Jupyter notebook in the GPU-Jupyter image and call the is_gpu_available() function, which returns a Boolean:

import tensorflow as tf
print(tf.test.is_gpu_available(cuda_only=True))

TensorFlow also provides a function to check the GPU device:

print(tf.test.gpu_device_name())

(Screenshot: the GPU-Jupyter image provides a JupyterLab web interface.)

As seen above, both commands confirm that TensorFlow recognizes the GPU. If the image is configured correctly, TensorFlow will use the GPU by default.
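Note that tf.test.is_gpu_available() is deprecated in recent TensorFlow 2.x releases in favor of tf.config.list_physical_devices. A minimal check with the newer API, assuming a TensorFlow 2.1+ image:

```python
import tensorflow as tf

# Returns a list of PhysicalDevice entries, one per visible GPU;
# an empty list means TensorFlow cannot see any GPU.
gpus = tf.config.list_physical_devices("GPU")
print(gpus)
print(len(gpus) > 0)
```

On a correctly configured container, the list should contain an entry such as /physical_device:GPU:0 for the 1080 Ti.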

In my next post I will show initial results of using TensorFlow + GPU for a common deep learning problem.