Walmart Daily Sales Prediction Using Time Series Analysis: Seasonality

Time series prediction can be beneficial in many fields including logistics, weather, sales forecasting, and predictive maintenance. Walmart provided a complete data set on Kaggle that can be used to evaluate time series prediction techniques. I will be making several posts using the Walmart data set from the M5 Forecasting – Accuracy competition to develop and evaluate time series methods in Python.

This first post demonstrates preliminary exploratory data analysis (EDA) and prediction using seasonal features. The post also provides a brief summary of polymorphism in Python using an abstract parent class to minimize code duplication and avoid conditionals like switch or if statements.

Walmart Data Set

The Walmart data set includes data for items in three categories of products from 2011 through 2016: hobbies, foods, and household. Each item is associated with a store in CA, FL, or TX. Three tables contain data to identify daily unit sales, selling prices, and event data for any given day.

  • The calendar table has daily rows with weekday and event labels providing the date of notable events. The event labels contained in the table include religious holidays, such as Chanukah End and Easter, sporting events, such as SuperBowl and NBAFinalsStart, and US national holidays, such as Thanksgiving and IndependenceDay.
The calendar table
  • The sales_train_validation table includes daily unit sales data for products in the three categories among stores in the three states. This table is in wide format with each row containing all daily sales data for one product and columns for each day in the full time range.
The sales_train_validation table
  • The sell_prices table provides weekly prices for each item.
The sell_prices table

Exploratory Data Analysis

Prior to predicting unit sales, we would like to identify good candidate items that have strong seasonal variation. Since the Walmart data set only provides item_id labels instead of true item names or descriptions, EDA is needed to identify these good candidate items. The best candidates for seasonal prediction are items whose sales correlate strongly with events and holidays. Since some foods are often associated with events (e.g., chocolate on Valentine’s Day), this analysis focuses on items in the FOODS category. This initial round of exploratory data analysis identifies foods that demonstrate higher sales on event days. The same items from multiple stores are grouped together since preliminary EDA showed that these food items behaved similarly across stores and states.

To identify good candidates for seasonal prediction, the data tables need to be merged into a form with one column per event and one row per item, with the same items from multiple stores grouped into one row.

  1. Unpivoting (pandas melt) the sales_train_validation table converts the table from a wide format with a column per day to long format with a primary key including day.
  2. Grouping and averaging (pandas groupby) combines each item sold on one day among all stores into one row for that item with an average unit_sales for this day. The grouped sales_train_validation table now only has three columns: d (day), item_id, and average unit_sales across all stores.
  3. Merging (pandas merge) joins this grouped table with the calendar table on the day column. This step adds the event labels per day to the average unit_sales per day.
  4. Grouping again combines items per event to produce a table with a primary key of item_id and event_name. This table identifies the average unit_sales per event.
  5. Pivoting (pandas pivot) converts this grouped table into wide format with one column per event including a ‘None’ column to group sales on days without events.
  6. Dividing the unit_sales values in the event columns by the unit_sales values in the ‘None’ column produces a unit_sales ratio to highlight foods with higher sales on event days. This step produces the final table values and structure.
Final unsorted wide table with average unit sales per item and per event
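
The six steps above can be sketched on a tiny synthetic data set. This is a minimal illustration rather than the code used for the post: the item names, store names, and sales values are made up, and the column names d, item_id, store_id, and event_name_1 are assumptions about how the tables are laid out.

```python
import pandas as pd

# Tiny synthetic stand-ins for the sales and calendar tables (values are made up).
sales_wide = pd.DataFrame({
    "item_id": ["FOODS_A", "FOODS_A", "FOODS_B", "FOODS_B"],
    "store_id": ["CA_1", "TX_1", "CA_1", "TX_1"],
    "d_1": [2, 4, 1, 1],
    "d_2": [8, 10, 1, 3],
    "d_3": [2, 2, 2, 2],
})
calendar = pd.DataFrame({
    "d": ["d_1", "d_2", "d_3"],
    "event_name_1": [None, "Thanksgiving", None],
})


def event_sales_ratio(sales_wide, calendar):
    # 1. Unpivot: one row per item, store, and day.
    long = sales_wide.melt(id_vars=["item_id", "store_id"],
                           var_name="d", value_name="unit_sales")
    # 2. Average each item over all stores for each day.
    per_day = long.groupby(["d", "item_id"], as_index=False)["unit_sales"].mean()
    # 3. Attach the event label for each day; days without events become 'None'.
    cal = calendar[["d", "event_name_1"]].copy()
    cal["event_name_1"] = cal["event_name_1"].fillna("None")
    merged = per_day.merge(cal, on="d", how="left")
    # 4. Average unit sales per item and event.
    per_event = merged.groupby(["item_id", "event_name_1"],
                               as_index=False)["unit_sales"].mean()
    # 5. Pivot to one column per event.
    wide = per_event.pivot(index="item_id", columns="event_name_1",
                           values="unit_sales")
    # 6. Divide each event column by the 'None' column.
    return wide.drop(columns="None").div(wide["None"], axis=0)


ratio = event_sales_ratio(sales_wide, calendar)
print(ratio.sort_values("Thanksgiving", ascending=False))
```

In this toy data, FOODS_A averages 9 units on the event day against 2.5 on ordinary days, so it sorts to the top of the Thanksgiving column.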

Sorting this wide table to be descending in the ‘Thanksgiving’ column identifies FOODS_3_069 as the food with highest increase in average unit_sales on Thanksgiving Day compared to days without events.

Sorted table identifies foods that sell more on Thanksgiving than normal days

The unit_sales for FOODS_3_069 at the TX_1 store demonstrates the unit_sales seasonality for this food. Distinct peaks occur near New Year’s Eve, Christmas, Thanksgiving, and Valentine’s Day, although not every holiday has a peak in each of the five years.

Unit sales of the FOODS_3_069 item show peaks near three holidays
Unit Sales Prediction with Seasonal Features

This analysis uses a combination of deterministic time series features to predict unit sales: a linear trend, weekly seasonal indicators, and annual seasonal indicators. The linear trend lets the model capture a long-term linear trend over time. The annual seasonal features are Fourier terms, each of which completes an integer number of cycles within one year. With order=16, as in the code below, the annual terms cover 1 to 16 cycles per year, which gives 32 sine and cosine columns. The statsmodels.tsa.deterministic.DeterministicProcess container class conveniently provides the Fourier terms in addition to constants, time trends, and weekly seasonal indicators. The following method demonstrates the DeterministicProcess syntax. DeterministicProcess requires a pandas index, here a DatetimeIndex built from the date column.

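    # Requires: import pandas as pd
    #           from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess
    def create_seasonal_features(self, df_merged_store):
        """Creates seasonal features for one item and one store"""
        df_copy = df_merged_store.copy(deep=True)
        y = df_copy['unit_sales']

        # DeterministicProcess needs a pandas index, so index the rows by date
        df_copy['date'] = pd.DatetimeIndex(df_copy['date'])
        df_copy.set_index('date', inplace=True)
        # Annual Fourier terms: order=16 gives 16 sine/cosine pairs.
        # 'A' is the annual frequency alias; newer pandas versions may prefer 'YE'.
        fourier = CalendarFourier(freq='A', order=16)
        dp = DeterministicProcess(index=df_copy.index,
                                    constant=True,   # intercept column
                                    order=1,         # linear time trend
                                    seasonal=True,   # weekly seasonal indicators
                                    additional_terms=[fourier],
                                    drop=True)       # drop perfectly collinear terms
        X = dp.in_sample()

        return X, y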
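
To make the Fourier features more concrete, the sketch below builds sine and cosine pairs by hand with NumPy. It is an approximation for illustration only: the function name annual_fourier_terms and the fixed 365.25-day period are assumptions, whereas CalendarFourier works from the actual calendar position of each date.

```python
import numpy as np


def annual_fourier_terms(day_of_year, order, period=365.25):
    """Return an array with 2 * order columns: sin and cos for k = 1..order cycles per year."""
    t = np.asarray(day_of_year, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))  # k cycles per year, sine phase
        columns.append(np.cos(angle))  # k cycles per year, cosine phase
    return np.column_stack(columns)


features = annual_fourier_terms(np.arange(365), order=16)
print(features.shape)  # one row per day, two columns per order
```

Each additional order adds one faster cycle, which lets a linear model fit narrower peaks such as a single holiday week at the cost of more coefficients.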

The model fitting and prediction problem presents an opportunity to apply polymorphism in Python using the abc package. A parent class contains a generic plotting method and abstract methods for fitting and prediction. A child class defines fitting and prediction methods that are tailored to a specific combination of input features. This first analysis only uses the seasonal features described previously, and the UnitSalesPredictionSeasonal child class fits a linear regression model from sklearn.linear_model.LinearRegression. The full code used in this example is available on GitHub: https://github.com/bspivey/M5ForecastingAccuracy.

from abc import ABC, abstractmethod

import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression

class UnitSalesPrediction(ABC):
    def plot_predictions(self, X, y, y_pred):
        list_of_tuples = list(zip(X.index, y, y_pred))
        columns = ['date', 'y', 'y_pred']
        df_wide = pd.DataFrame(list_of_tuples, columns=columns)
        value_vars = ['y', 'y_pred']
        df_tall = pd.melt(df_wide,
                            id_vars='date',
                            value_vars=value_vars,
                            var_name='y_label',
                            value_name='y_value')

        fig = px.line(df_tall,
                        x='date',
                        y='y_value',
                        color='y_label',
                        width=900,
                        height=300)
        fig.update_layout(
            yaxis_title='unit_sales')

        fig.show()

    @abstractmethod
    def fit_unit_sales_model(self):
        pass

    @abstractmethod
    def predict_unit_sales(self):
        pass

class UnitSalesPredictionSeasonal(UnitSalesPrediction):
    def fit_unit_sales_model(self, X_seasonal, y):
        """Trains a model to predict unit sales for one item and one store"""
        X = X_seasonal
        model = LinearRegression().fit(X, y)

        return model
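
The polymorphism pattern can also be shown in a self-contained form. The sketch below is not the code from the GitHub repository: the class names UnitSalesModel, LeastSquaresModel, and MeanModel are hypothetical, and NumPy least squares stands in for sklearn so the example has no other dependencies. The parent class owns one generic method, and each child supplies its own fit and predict.

```python
from abc import ABC, abstractmethod

import numpy as np


class UnitSalesModel(ABC):
    """Hypothetical parent class holding logic shared by all models."""

    def mean_absolute_error(self, X, y):
        # Generic method: works for any child that implements predict().
        y_pred = self.predict(X)
        return float(np.mean(np.abs(np.asarray(y, dtype=float) - y_pred)))

    @abstractmethod
    def fit(self, X, y):
        """Fit the model and return self."""

    @abstractmethod
    def predict(self, X):
        """Return predictions as a 1-D array."""


class LeastSquaresModel(UnitSalesModel):
    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_


class MeanModel(UnitSalesModel):
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Synthetic data: an intercept column plus a linear trend.
t = np.arange(5, dtype=float)
X = np.column_stack([np.ones_like(t), t])
y = 2.0 + 3.0 * t

# The loop never checks which model it holds; each child supplies fit/predict.
models = [LeastSquaresModel(), MeanModel()]
errors = {type(m).__name__: m.fit(X, y).mean_absolute_error(X, y) for m in models}
print(errors)
```

Because the abstract methods are enforced by ABC, instantiating the parent class directly, or a child that omits fit or predict, raises a TypeError instead of failing later at call time.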

The model trains on FOODS_3_069 time series data excluding the final two years. The final year contains the test data not used for model tuning, and the prior year contains the validation data used for model tuning.
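
A chronological split of this kind might be written as follows. This is a sketch under assumptions: the function name split_by_final_years is hypothetical, the frame is assumed to be indexed by date, and the date range and unit_sales values are placeholders rather than the real series.

```python
import pandas as pd


def split_by_final_years(df):
    """Split a date-indexed frame into train, validation (second-to-last year), and test (last year)."""
    last = df.index.max()
    test_start = last - pd.DateOffset(years=1)
    val_start = last - pd.DateOffset(years=2)
    train = df[df.index <= val_start]
    val = df[(df.index > val_start) & (df.index <= test_start)]
    test = df[df.index > test_start]
    return train, val, test


# Placeholder daily series; the values are not real sales.
index = pd.date_range("2011-01-29", "2016-04-24", freq="D")
df = pd.DataFrame({"unit_sales": range(len(index))}, index=index)

train, val, test = split_by_final_years(df)
print(len(train), len(val), len(test))
```

Splitting by date rather than at random keeps every validation and test day strictly later than the training days, which mirrors how a forecast would be used.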

The results for unit_sales predictions on validation and test data demonstrate that the seasonal features model successfully identifies peaks around Thanksgiving and Christmas and a possible peak near Valentine’s Day. The y_pred signal is the predicted unit_sales shown versus the y signal which is the validation or test data unit_sales.

FOODS_3_069 unit sales predictions on validation data
FOODS_3_069 unit sales predictions on test data

While the results demonstrate a correlation with several holidays as expected, the predictions are smoothed and show potential for improvement. Ideas for next steps are (1) including categorical features using actual event and holiday labels combined with lag/lead features, (2) using a hybrid linear regression and nonlinear regression model, and (3) using forecasting packages such as Facebook Prophet.

Compare GPU and CPU Training Times for Image Recognition with Tensorflow 2

This article compares the training times for fitting a TensorFlow 2 convolutional neural network (CNN or convnet) using a GPU or CPU on the Kaggle Dogs vs. Cats dataset. The Dogs vs. Cats competition was an early Kaggle competition that demonstrated the power of convnets to solve computer vision recognition problems, as winning entries reached 95% accuracy.

The training time comparison follows my prior post explaining how to set up an nvidia-docker container to run TensorFlow 2 on a GPU. I will begin this article by reviewing the main steps to train the convnets using an example in Deep Learning with Python 1st edition by Chollet. These steps are provided in more detail on the book GitHub site: https://github.com/fchollet/deep-learning-with-python-notebooks.

Starting the Container

The GPU can be enabled or disabled when starting the nvidia-docker container by keeping or removing the --gpus all option in the following line:

sudo docker run --gpus all -d -it -p 8848:8888 -v "$(pwd)/data:/home/jovyan/work" -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root cschranz/gpu-jupyter:v1.4_cuda-11.0_ubuntu-18.04_python-only

If the GPU is not selected as an option, the following command should show no GPUs in the list of local devices:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2823115825857772105
]

Training the Model

The convnet is constructed with a series of paired convolution and max pooling layers. The first Conv2D layer slides 3×3 windows over the 150 x 150 x 3 tensor representing the scaled RGB input image to produce a 148 x 148 x 32 output feature map, with one channel for each of the 32 convolution filters. The output can instead keep the input height and width by setting padding="same". The MaxPooling2D layer downsamples the feature maps. Downsampling is important to reduce the number of model parameters and to achieve output feature maps that represent general image features such as cat eyes or ears. The convnet is completed by flattening the output feature map and adding Dense neural network layers. The convolution and max pooling layers transform input images to generalized image features which serve as inputs to the Dense neural network classifier. The reader may find many more detailed explanations of convnets online.

from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
          input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               3211776   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
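
The output shapes and parameter counts in the summary above can be reproduced with a few lines of arithmetic. The helper names below are hypothetical; the formulas assume 3×3 kernels with no padding, 2×2 max pooling, and one bias per filter or Dense unit, which matches the model definition.

```python
def conv2d_output(size, kernel=3):
    """Height/width after a convolution without padding."""
    return size - kernel + 1


def conv2d_params(in_channels, filters, kernel=3):
    """Weights for each filter (kernel * kernel * in_channels) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size, channels, total_params = 150, 3, 0
shapes = []
for filters in (32, 64, 128, 128):
    total_params += conv2d_params(channels, filters)
    size = conv2d_output(size)
    shapes.append((size, size, filters))  # after Conv2D
    size = size // 2                      # 2x2 max pooling halves and floors
    shapes.append((size, size, filters))  # after MaxPooling2D
    channels = filters

flat = size * size * channels
total_params += (flat + 1) * 512  # Dense(512): weights plus biases
total_params += (512 + 1) * 1     # Dense(1): weights plus bias

print(shapes)
print(flat, total_params)
```

Almost all of the parameters sit in the first Dense layer, which is why downsampling before the Flatten layer matters so much for model size.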

The model is compiled with a binary_crossentropy loss function and the acc metric as a generic accuracy metric. These may be used together for a two-class problem, but the metric should be changed for a multiclass problem.

from keras import optimizers

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-4),
              metrics=['acc'])
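
For intuition about what the loss and the metric measure, the following NumPy sketch computes both on four made-up predictions. It is an illustration rather than the Keras implementation; the function names, the clipping constant, and the 0.5 threshold are assumptions.

```python
import numpy as np


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def binary_accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predicted_positive = np.asarray(p_pred, dtype=float) > threshold
    return float(np.mean(predicted_positive == (np.asarray(y_true) == 1)))


labels = [1, 0, 1, 0]                 # made-up class labels
probabilities = [0.9, 0.1, 0.8, 0.4]  # made-up sigmoid outputs

loss = binary_crossentropy(labels, probabilities)
accuracy = binary_accuracy(labels, probabilities)
print(round(loss, 4), accuracy)
```

All four predictions land on the correct side of the threshold, yet the loss is still positive because it also penalizes low confidence, such as the 0.4 assigned to a negative example.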

A data generator is used to generate batches of image tensor data that can be augmented at runtime. The first example shows the training time comparison with only image rescaling, and the second example shows the results with rotations, x-y shifts, shear, zoom, and horizontal flip augmentations.

# Image data generator with only scaling
from keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=20,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')
# Image data generator with additional data augmentations
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,)

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=20,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

The image transformations used for data augmentation are beneficial to reduce overfitting since the model becomes less sensitive to placement and orientation of the objects within an image. The convnet model is fit using 30 epochs without data augmentation and 100 epochs with data augmentation. The model is fit with more epochs in the latter run since model validation performance continues to improve without overfitting.

history = model.fit(
      train_generator,
      steps_per_epoch=100,
      epochs=30, # 100 epochs with data augmentation
      validation_data=validation_generator,
      validation_steps=50)

Model Validation Results

The convnet without data augmentation demonstrates overfitting that begins by the second epoch as the training accuracy exceeds the validation accuracy. The validation accuracy saturates at ~70%.

Accuracy and loss learning curves demonstrate overfitting early in learning

The convnet with data augmentations demonstrates increasing validation accuracy above 80% by the final epoch.

Accuracy and loss curves demonstrate continued improvement through 90 epochs.

GPU vs. CPU Training Time Results

Without data augmentation, the training time for all GPU epochs after the first one was 8 seconds versus the CPU epoch time of 27 seconds.

GPU training time without data augmentation
CPU training time without data augmentation

With the data augmentations used above, the training time for the GPU epochs was 15 seconds versus the CPU epoch time of 28 seconds.

GPU training time with data augmentation
CPU training time with data augmentation
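
The relative speedups follow directly from the epoch times reported above. The short calculation below uses only those four numbers; the dictionary layout is just one convenient way to hold them.

```python
# Seconds per epoch as reported in this post.
epoch_seconds = {
    "without augmentation": {"gpu": 8, "cpu": 27},
    "with augmentation": {"gpu": 15, "cpu": 28},
}

# Speedup = CPU epoch time divided by GPU epoch time.
speedups = {name: times["cpu"] / times["gpu"] for name, times in epoch_seconds.items()}

for name, speedup in speedups.items():
    print(f"{name}: GPU is {speedup:.2f}x faster per epoch")
```

The GPU advantage drops from roughly 3.4x to under 2x once augmentation is enabled, even though the CPU epoch time barely changes.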

The GPU epoch time may have increased more than the CPU epoch time because the ImageDataGenerator augmented the images asynchronously using the CPU. The following links describe in more detail how the data augmentation may be done synchronously with the GPU: https://keras.io/examples/vision/image_classification_from_scratch/ and https://github.com/keras-team/keras/issues/12120.

Setup TensorFlow to use the GPU with Docker Containers

Having built a machine suitable for deep learning, I was ready to put my EVGA GeForce GTX 1080 Ti GPU to the test. Unfortunately, I found that configuring TensorFlow with GPU support on my local machine was not as straightforward as installing any other Python package. This story has been repeated in many posts online, with all the pitfalls that can occur. This post chronicles the simplest approach I have found to start using TensorFlow with the GPU.

I am motivated to write this post since I found no sites that chronicled the complete journey from a fresh GPU installation to TensorFlow running on the GPU. Many sites show individual steps, and some advertise how easy this can be while showing only the last Conda install steps and none of the prior CUDA configuration steps. Having tried multiple approaches to install TensorFlow directly on my local machine to work with the GPU, I found that using a Docker container was a reliable method that also makes work more portable to other machines.

In this post, I will describe all steps that were required to stand up a Docker container that can run TensorFlow on Ubuntu 18.04 OS with an EVGA GeForce GTX 1080 Ti GPU.

1. Install Nvidia Drivers

Prior to installing Nvidia drivers, I recommend removing all existing Nvidia drivers. I have seen errors with the GPU not being recognized due to prior Nvidia GPU and CUDA drivers. If you find that you later want to install TensorFlow with GPU support on the local machine, this is the key first step.

$ sudo apt remove nvidia-*
$ sudo apt install
$ sudo apt autoremove

The next step is to find the appropriate driver for the GPU. Here I performed a Manual Driver Search for GeForce 10 Series: https://www.nvidia.com/en-us/geforce/drivers/. I selected the OS with bits (e.g., Linux 64-bit) and downloaded the latest driver for this GPU: Linux x64 (AMD64/EM64T) Display Driver Version: 465.31 and the run file NVIDIA-Linux-x86_64-465.31.run.

The file permissions may need to be changed prior to executing the run file:

$ sudo chmod +x ./NVIDIA-Linux-x86_64-465.31.run
$ sudo ./NVIDIA-Linux-x86_64-465.31.run

If you do not know the meaning of the installation options, I recommend selecting the defaults since other options can produce errors. You may receive a warning about the GCC version being different. I had no errors as long as the system GCC version was more recent than the GCC used to compile the run file.

2. Install Docker

Docker provides the latest instructions to install the Docker engine on Ubuntu here: https://docs.docker.com/engine/install/ubuntu/. Note that Docker may change the steps below, and I recommend following the latest steps from the Docker site. I recommend starting with the Uninstall Old Versions step to prevent incompatibility issues. Next use the Install Using the Repository and Set Up the Repository steps:

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

Add Docker’s official GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Set up the stable repository:

$ echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install the latest version of the Docker engine:

$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io

Finally verify that Docker is working:

$ sudo docker run hello-world

3. Install Nvidia Docker Support

Nvidia provides working instructions to set up Docker and the Nvidia Container Toolkit under Install on Ubuntu and Debian: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker. I recommend using this link maintained by Nvidia. However, I will also document the steps I used recently to set up Nvidia with Docker support. Note that you can skip the Setting up Docker step since we set up Docker in the prior step. Use the $ docker -v command to confirm that the Docker version is 19.03 or later, which is required for nvidia-docker2.

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

The output should show the GPU status similar to below (extra points if you catch the pop-culture reference):

In prior installations I received an error installing nvidia-docker like this one: https://github.com/NVIDIA/nvidia-docker/issues/234. If this error occurs, the solution is to install this deb file: https://github.com/NVIDIA/nvidia-docker/files/818401/nvidia-docker_1.0.1-yakkety_amd64.deb.zip. Then the nvidia-docker2 package should be able to be installed, or else you may also try to install with sudo apt-get install -y nvidia-container-toolkit.

4. Pull a Pre-Built Docker Image

The easiest way to get started with Docker is to pull a pre-built image that has Jupyter notebook and TensorFlow GPU support. I recommend selecting an image with a terminal window to make updating the Python virtual environment easier, and I recommend choosing an image that connects to the local filesystem.

The GPU-Jupyter image provides these features: https://github.com/iot-salzburg/gpu-jupyter/commits?author=ChristophSchranz. I started with Quickstart Step 4 to pull the Docker image. If only Python is needed, the site provides names of additional images that exclude Julia and R which should save time in downloading the image. Also select the proper image for the Ubuntu OS. I used the following command to pull the image:

$ cd your-working-directory
$ docker run --gpus all -d -it -p 8848:8888 -v "$(pwd)/data:/home/jovyan/work" -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root cschranz/gpu-jupyter:v1.4_cuda-11.0_ubuntu-18.04_python-only

The command and options used for running a Docker image are explained here: https://docs.docker.com/engine/reference/run/. The specific options used for GPU-Jupyter are explained as follows:

  • -d: runs the container in detached (background) mode. The container exits when the root process running the container exits.
  • -it: creates a tty (teletypewriter) as a terminal window for interactive processes.
  • -p: specifies the ports to be accessible on the local host. Here the container port 8888 is published on host port 8848.
  • -v: specifies the volumes or shared filesystem. In the command above, a data folder will be created with admin access only in the working directory.

Once the image has been pulled, it will begin running automatically at http://localhost:8848. The password at the time of this article is gpu-jupyter.

5. Check that Tensorflow Runs on the GPU

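One way to confirm that TensorFlow runs with the local machine GPU is to open a Jupyter notebook in the GPU-Jupyter image and use the is_gpu_available() function, which returns a Boolean (TensorFlow 2 marks this function as deprecated in favor of tf.config.list_physical_devices('GPU')):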

import tensorflow as tf
print(tf.test.is_gpu_available(cuda_only=True))

TensorFlow also provides a function to check the GPU device:

print(tf.test.gpu_device_name())
GPU-Jupyter image provides a JupyterLab web interface

As seen above, both commands confirm that TensorFlow recognizes the GPU. If the image is configured correctly, TensorFlow will use the GPU by default.

In my next post I will show initial results of using TensorFlow + GPU for a common deep learning problem.

Building a Deep Learning Machine – Part 4: Installing the Ubuntu 18.04 OS

The machine booted successfully using the chassis power button. The initial screen displays that the CPU and RAM are recognized. The image was taken when only the initial 16 GB RAM was recognized before moving the RAM card to another slot.

I decided to install the latest LTS (Long Term Support) desktop version of Ubuntu, 18.04.1. I chose Ubuntu over Windows 10 since some machine learning packages like OpenCV are easier to build and run on Ubuntu, and some applications like Docker are built on Linux and are easier to install on Ubuntu.

I downloaded the 2 GB Ubuntu 18.04.1 ISO file and wrote it to a USB flash drive as a bootable image using Rufus. The machine booted off the USB flash drive without changing BIOS boot settings beforehand. I chose default Ubuntu installation options with login credentials required. Installation steps are described here in more detail.

The only installation issue I had was that I could not get past the Ubuntu login screen. After every attempt to log in by entering the password, Ubuntu would return to the login screen. No incorrect-password error appeared since the password was correct. I had selected the Use LVM with the new Ubuntu installation option on the first installation. LVM stands for Logical Volume Management and allows the user to add, modify, resize, and take snapshots of partitions.

The infinite login loop issue was resolved by reinstalling Ubuntu without selecting the LVM option.

Ubuntu 18.04.1 desktop

Building a Deep Learning Machine – Part 3: Installing the SSD, RAM, GPU, PSU, and Motherboard/Power Connections.

I completed building the deep learning machine this past weekend. I will describe the final steps for assembling the hardware in this Part 3 and discuss the OS installation in Part 4.

Solid State Drive
The build has a 500 GB Samsung 960 EVO M.2 solid state storage drive. The drive uses the NVMe protocol, which can be utilized by the M.2 socket with ‘M’ keying (read more about keying here). The Strix X99 M.2 socket runs on a PCIe 3.0 x4 link which it shares with a U.2 connector. The socket is compatible with the following SSD sizes: 2242/2260/2280/22110. The first two digits ’22’ are the width (22 mm), and the remaining digits are the length (42 mm, etc.). The M.2 socket was designed to provide faster link speeds than the mini-SATA connector. SATA 3.0 has a link speed of up to 6 Gb/s, versus the PCIe 3.0 x4 link which runs at up to 32 Gb/s (just under 4 GB/s). The 960 EVO has sequential read/write speeds up to 3.2 GB/s and 1.8 GB/s. Read more about the performance difference between SATA and M.2 at PCWorld here.
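
The link speeds can be converted into usable bytes per second with a short calculation. The sketch below is a rough estimate that accounts only for line encoding (8b/10b for SATA 3.0 and 128b/130b for PCIe 3.0) and ignores protocol overhead; the function name is hypothetical.

```python
def usable_gbytes_per_s(gigatransfers_per_s, lanes, payload_bits, encoded_bits):
    """Usable bandwidth in GB/s after line-encoding overhead."""
    return gigatransfers_per_s * lanes * payload_bits / encoded_bits / 8


sata3 = usable_gbytes_per_s(6.0, lanes=1, payload_bits=8, encoded_bits=10)
pcie3_x4 = usable_gbytes_per_s(8.0, lanes=4, payload_bits=128, encoded_bits=130)

ssd_read = 3.2  # GB/s, sequential read speed of the 960 EVO quoted above

print(f"SATA 3.0:    {sata3:.2f} GB/s")
print(f"PCIe 3.0 x4: {pcie3_x4:.2f} GB/s")
print("SSD read speed fits within SATA 3.0:", ssd_read <= sata3)
print("SSD read speed fits within PCIe 3.0 x4:", ssd_read <= pcie3_x4)
```

The drive's sequential read speed is several times what a SATA link could carry, so the PCIe-based M.2 socket is needed to use it fully.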

Samsung 960 EVO M.2 SSD with 500 GB

When inserted into the M.2 socket, the SSD will be angled upward. I first install a hex head jack screw to raise the screw mount even with the socket and press the SSD onto the jack screw while screwing the mounting screw into the jack screw.

RAM

I have started the build with two 16 GB DDR4 RAM cards.

I initially installed the two RAM cards in the D1 and B1 motherboard locations as recommended by the motherboard manual and as shown below.

After completing the installation and booting the machine, the BIOS utility only recognized the RAM in the B1 slot though the D1 slot is recommended as the first slot to use with one card.

BIOS utility recognizes 16GB RAM in the B1 slot.
BIOS utility recognizes a card in the D1 slot but does not recognize the size.

When I researched this issue, the first solutions I found recommended overclocking the motherboard with increased RAM slot voltage to permit using additional RAM cards. In this case, I moved the card in D1 to A1, and the BIOS utility recognized two cards and 32 GB RAM. I recommend moving RAM cards to another slot as the first troubleshooting step when a card is not recognized.

BIOS utility recognizes both cards in the A1 and B1 slots.

GPU

The build will begin with one EVGA GTX 1080 Ti 11GB graphics processing unit. Tests have shown the 1080 Ti performance is comparable to the more expensive Titan X for machine learning applications. The motherboard has 3 PCIe x16 slots and is suited to run two GPUs utilizing x16 PCIe lanes each or three GPUs in a x16/x8/x8 configuration.

EVGA GTX 1080 Ti GPU
The GTX 1080 Ti fits into a 16 lane PCIe slot.

The Strix X99 motherboard has two PCIe x16 slots and one PCIe x8 slot, and the first GPU is installed in the first slot.

Strix X99 16x PCIe lane

PSU

I chose an EVGA 1000 GQ power supply unit for the build. Since the absolute peak load for an overclocked GTX 1080 Ti is 350 W, the 1000 W PSU will be sufficient for an upgrade to two GPUs. The EVGA 1000 GQ comes with one ATX 20+4-pin cable for the main motherboard power supply, two 8(4+4)-pin CPU cables, two standalone 8(6+2)-pin and four 8(6+2)-pin x2 cables, three SATA 5-pin x4 cables, one Molex 4-pin x3 cable, and one Molex to FDD adapter. The ‘4+4’ notation indicates that two 4-pin male cables are wired adjacently to connect with either an 8-pin or 4-pin female connector.
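
The power budget behind this choice is simple arithmetic. In the sketch below, the function name and the other_watts parameter are hypothetical; other_watts stands for the CPU, drives, pump, and fans and is left at zero because this post gives no figures for them.

```python
def psu_headroom_watts(psu_watts, gpu_peak_watts, gpu_count, other_watts=0):
    """Watts left over after the GPUs and any other components are powered."""
    return psu_watts - gpu_peak_watts * gpu_count - other_watts


one_gpu = psu_headroom_watts(1000, 350, gpu_count=1)
two_gpus = psu_headroom_watts(1000, 350, gpu_count=2)

print(one_gpu, two_gpus)
```

With two GPUs at their absolute peak, the remaining headroom is what has to cover every other component in the machine.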

I recommend connecting all required cables to the PSU prior to installing in the case since access to the rear of the PSU is restricted inside the case. The Phanteks Eclipse P400 case has an exhaust port along the case bottom for the PSU fan.

Motherboard and Power Connections

Once the PSU is installed, I complete the build by making all remaining motherboard and power connections. The GTX 1080 Ti requires one 8 pin and one 6 pin VGA cable.

The motherboard has a 4-pin header for the water pump and a 4-pin header for the CPU fan. The water pump power connector has holes for 3 pins and should be connected to the 3 header pins aligned with the bar as shown below. The fourth pin left of the water pump connector has no bar behind it. The fourth header pin would allow pump/fan speed control via pulse width modulation. The CPU fan has holes for four pins.

The chassis is connected to the motherboard with the following connections. The power, reset, and HD audio connections are shown in the upper left corner of the image below. The chassis USB 3.0 ports are connected in the center connector. The front panel audio connector is shown in the upper right.

The CPU is powered with both the 8-pin and 4-pin ATX connectors. The CPU water cooler is powered with a 15-pin SATA connector from the PSU as shown below in the upper left corner.

After binding the wires in the chassis behind the motherboard, the machine hardware build is now complete! In Part 4, I will describe the OS and software installation.

Building a Deep Learning Machine – Part 2: Installing Motherboard, CPU, and CPU Water Cooler

Installing the CPU

The machine build began with installing the CPU. The CPU is an Intel Xeon E5-1620 v4. Although the processor is a v4, it is designed for an LGA 2011-v3 socket, consistent with the ASUS Strix X99 motherboard. I described why I chose this processor in Part 1 of the series.

The LGA 2011-v3 socket on the motherboard has a protective cover to prevent exposing the pins any longer than necessary. The cover warns the user to keep the cover on the socket until after installing the CPU. Removing the cover just before installing the CPU is also fine since the pins are covered once the CPU is pressed into the motherboard socket.

LGA 2011-v3 socket on the Strix X99 motherboard

I opened the socket cover by releasing both spring levers to an open position.

Before the CPU is placed into the motherboard socket, the arrow on the corner of the CPU should be aligned with the arrow on the socket.

The CPU and motherboard have arrows shown on the lower right corner.

The socket is closed and spring levers are returned to their locked position. Some force is required to press the CPU contacts to the motherboard contacts in order to lock the spring levers. Finally the protective cover is removed.

Installing the motherboard

The PC case is a Phanteks Eclipse P400 Tempered Glass Edition midtower.

Installing the motherboard into the case was straightforward since the board is aligned with the rear I/O connection on the case. The case is built to conceal wires behind the motherboard and has two wire ports located on the opposite side from the rear I/O connection. The motherboard has 9 screws to attach to the case.

View of motherboard and chassis from above with the rear I/O connection along the bottom.

Since this was my first PC build, I learned the hard way that the thermal paste comes already layered on the water cooler interface out of the box: I placed the cooler on the CPU to compare their sizes, which transferred paste to the CPU. I was able to salvage the situation by making the final alignment the same as when the paste was first transferred, so that the thermal paste coverage stays consistent.

Installing the Water Cooler

I bought the Corsair H60 water cooler, which has a single 120 mm radiator and fan. This water cooler gets good marks on PC Part Picker for being economical (currently $70) and effective in its price range.

A main consideration before installing the water cooler is whether to apply aftermarket thermal paste. Thermal paste is necessary to ensure suitable thermal contact between the cooler and CPU. Water coolers typically come with thermal paste already applied. Tests are inconclusive about whether aftermarket thermal paste improves heat transfer; I have seen tests demonstrate worse heat transfer with aftermarket paste. Factors can include the quality of the default thermal paste and how well the aftermarket paste is applied. I decided to use the default thermal paste and will trend the temperatures in operation.

The radiator should be positioned against the case wall with the fan oriented as an inlet fan, as recommended by Corsair. This setup draws cooler outside air over the radiator, rather than warm air from inside the case, to produce a larger temperature delta. The Phanteks Eclipse P400 provides space for the water cooler radiator and fan on the top of the case. I positioned it towards the front to be closer to the exhaust fans.

The Corsair H60 screws directly into the top of the Intel processor socket on the motherboard. AMD processors are attached with adapters from behind the motherboard.

The Corsair H60 has a 15-pin SATA power connection which will connect directly to the power supply.

Building a Deep Learning Machine – Part 1: Components

I have started building a desktop machine designed for fitting machine learning models, including deep learning applications. The Reinforcement Learning and Decision Making (RLDM) class in the OMS CS program at Georgia Tech motivated me to build a machine appropriate for machine learning applications as I start the second half of the master's program. I was able to complete RLDM with my laptop, which has a 2.16 GHz Celeron processor and 8 GB RAM, but I plan to use the new machine for upcoming machine learning classes.

I have prioritized designing the machine learning desktop around the GPU(s). I will take a short detour to explain why. GPUs, rather than CPUs, have become the main engine for the computations in data science models. GPUs have many more arithmetic logic units (ALUs) than CPUs, which gives them an improved ability to perform simple operations in parallel. Machine learning, artificial intelligence, and deep learning problems generally require matrix math operations that can be accelerated by solving in parallel. My design goals were:

  • A powerful GPU that has sufficient memory to be well suited for computer vision applications. Some users have reported needing at least 8 GB of GPU memory for training computer vision models, but an upgrade to 11 GB is beneficial. The GPU should also have broad support for machine learning libraries. The cuDNN library, built on top of the Nvidia CUDA programming framework, is used by major deep learning frameworks including TensorFlow and PyTorch. I decided on the GeForce GTX 1080 Ti made by EVGA, which has an Nvidia processor with 11 GB of memory.
  • Sufficient RAM to handle a future upgrade to two GPUs. The machine should have at least as much RAM as the combined memory of the GPUs. Since I would like the machine to be ready for a possible upgrade to two GPUs (22 GB of GPU memory in total), I have purchased 32 GB of RAM.
  • A 40-lane CPU that can accommodate an upgrade to two 16 PCIe lane GPUs while maximizing the PCIe lanes for data transfer between the CPU and GPU. The data transfer between the CPU and GPU across PCIe lanes can be a bottleneck that slows the GPU performance, depending on the application. I chose an Intel Xeon E5-1620 v4 3.5 GHz processor over an i7 series processor since the Xeon has 40 PCIe lanes, which will allow two GPUs to use 16 lanes apiece. *07/02/21 UPDATE: My research in 2018 indicated that PCIe lanes may restrict GPU performance. Given the cost of the GPU, I preferred to ensure my system did not restrict performance due to data transfer limitations. However, more recent posts have shown that deep learning may be restricted by memory but should be little restricted by PCIe lanes and data transfer with the CPU. Tim Dettmers has a nice article discussing GPU selection for deep learning: https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/.
  • A motherboard that suits the GPU and CPU and handles an upgrade to two GPUs while maximizing the PCIe 3.0 lanes for data transfer between the CPU and GPU. PCIe 3.0 is recommended for multiple GPU machines. To have space for two 1080 Ti GPUs, the motherboard needs to support two dual-width x16 graphics slots. The motherboard should also have an LGA 2011-v3 processor socket for the Xeon processor. I chose the ASUS STRIX X99 motherboard, which supports a 16/16/8 configuration across the 40 PCIe 3.0 lanes.
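The lane budget behind these goals can be sketched in a few lines. This is a simplified model of my own, not how the motherboard actually negotiates link widths: each GPU gets x16 if enough CPU lanes remain, otherwise x8. The 40-lane figure comes from the text; the 28-lane CPU is a hypothetical comparison point, and the per-lane bandwidth uses the standard PCIe 3.0 rate of 8 GT/s with 128b/130b encoding.

```python
from typing import List

# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, converted to GB/s.
PCIE3_GB_PER_S_PER_LANE = 8 * (128 / 130) / 8


def allocate_lanes(cpu_lanes: int, num_gpus: int) -> List[int]:
    """Assign each GPU x16 if lanes remain, else x8, else 0 (simplified)."""
    widths = []
    remaining = cpu_lanes
    for _ in range(num_gpus):
        if remaining >= 16:
            width = 16
        elif remaining >= 8:
            width = 8
        else:
            width = 0
        widths.append(width)
        remaining -= width
    return widths


def bandwidth_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth for a link width."""
    return lanes * PCIE3_GB_PER_S_PER_LANE


if __name__ == "__main__":
    # 40 lanes is the Xeon from the text; 28 lanes is a hypothetical CPU.
    for cpu_lanes in (40, 28):
        for gpus in (2, 3):
            widths = allocate_lanes(cpu_lanes, gpus)
            rates = [round(bandwidth_gb_per_s(w), 2) for w in widths]
            print(f"{cpu_lanes} lanes, {gpus} GPUs: widths={widths}, GB/s={rates}")
```

In this model, a 40-lane CPU keeps both GPUs at x16 and leaves 8 lanes for a third slot, while the hypothetical 28-lane CPU drops the second GPU to x8, halving its theoretical link bandwidth.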

I have provided a full list of the components I chose on PC Part Picker: https://pcpartpicker.com/list/RtKCq4 .

References

Tim Dettmers: https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9

Slav Ivanov: https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9

Yan-David Erlich: https://medium.com/yanda/building-your-own-deep-learning-dream-machine-4f02ccdb0460