Building a Deep Learning Machine – Part 3: Installing the SSD, RAM, GPU, PSU, and Motherboard/Power Connections.

I completed building the deep learning machine this past weekend. I will describe the final steps for assembling the hardware in this Part 3 and discuss the OS installation in Part 4.

Solid State Drive
The build has a 500 GB Samsung 960 EVO M.2 solid state storage drive. The drive uses the NVMEe protocol which can be utilized by the M.2 socket with ‘M’ keying (read more about keying here). The Strix X99 M.2 socket runs on a PCIe 3.0 x4 lane which it shares with a U.2 connector. The socket is compatible with the following SSD sizes: 2242/2260/2280/22110. The first two numbers ’22’ are the width (22 mm), and the remaining numbers are the length (42 mm, etc.). The M.2 socket was designed to provide faster link speeds than the mini-SATA connector. The SATE 3.0 has a link speed of up to 6 GB/s versus the PCIe 3.0 x4 lane which runs up to 20 GB/s. The 960 EVO has sequential read/write speeds up to 3.2 GB/s and 1.8 GB/s. Read more about the performance difference between SATA and M.2 at PCWorld here.

Samsung 960 EVO M.2 SSD with 500 GB

When inserted into the M.2 socket, the SSD will be angled upward. I first install a hex head jack screw to raise the screw mount even with the socket and press the SSD onto the jack screw while screwing the mounting screw into the jack screw.

RAM

I have started the build with two 16 GB DDR4 RAM cards.

I initially installed the two RAM cards in the D1 and B1 motherboard locations as recommended by the motherboard manual and as shown below.

After completing the installation and booting the machine, the BIOS utility only recognized the RAM in the B1 slot though the D1 slot is recommended as the first slot to use with one card.

Bios utility recognizes 16GB RAM in the B1 slot.
BIOS utility recognizes a card in the D1 slot but does not recognize the size.

When I researched this issue, the first solutions I found recommended overclocking the motherboard with increased RAM slot voltage to permit using additional RAM cards. In this case, I moved the card in D1 to A1, and the BIOS utility recognized two cards and 32 GB RAM. I recommend moving RAM cards to another slot as the first troubleshooting step when a card is not recognized.

BIOS utility recognizes both cards in the A1 and B1 slots.

GPU

The build will begin with one EVGA GTX 1080 Ti 11GB graphical processing unit. Tests have shown the 1080 Ti performance is comparable to the more expensive Titan X for machine learning applications. The motherboard has 3 PCIe x16 slots and is suited to run two GPU utilizing x16 PCIe lanes each or three GPU in a x16/x8/x8 configuration.

EVGA GTX 1080 Ti GPU
The GTX 1080 Ti fits into a 16 lane PCIe slot.

The Strix X99 motherboard has two PCIe x16 slots and one PCIe x8 slot, and the first GPU is installed in the first slot.

Strix X99 16x PCIe lane

PSU

I chose a EVGA 1000 GQ power supply unit for the build. Since the absolute peak load for a GTX 1080 Ti that is overclocked is 350 W, the 1000 W PSU will be sufficient for an upgrade to two GPUs. The EVGA 1000 GQ comes with one ATX 20+4-pin cable for the main motherboard power supply, two 8(4+4)-pin CPU cables, two standalone 8(6+2)-pin and four 8(6+2)-pin x2 cables, three SATA 5-pin x4 cables, one Molex 4-pin x3 cable, and one Molex to FDD adapter. The ‘4+4’ notation indicates that two 4 pin male cables are wired adjacently to connect with either an 8 or 4 pin female connector.

I recommend connecting all required cables to the PSU prior to installing in the case since access to the rear of the PSU is restricted inside the case. The Phanteks Eclipse P400 case has an exhaust port along the case bottom for the PSU fan.

Motherboard and Power Connections

Once the PSU is installed, I complete the build by making all remaining motherboard and power connections. The GTX 1080 Ti requires one 8 pin and one 6 pin VGA cable.

The motherboard has a 4-pin header for the water pump and a 4-pin header for the CPU fan. The water pump power connector has holes for 3 pins and should be connected to the 3 header pins aligned with the bar as shown below. The fourth pin left of the water pump connector has no bar behind it. The fourth header pin would allow pump/fan speed control via pulse width modulation. The CPU fan has holes for four pins.

The chassis is connected to the motherboard with the following connections. The power, reset, and HD audio connections are shown in the upper left corner of the image below. The chassis USB 3.0 ports are connected in the center connector. The front panel audio connector is shown in the upper right.

The CPU is powered with both the 8-pin and 4-pin ATX connectors. The CPU water cooler is powered with a 15-pin SATA connector from the PSU as shown below in the upper left corner.

After binding the wires in the chassis behind the motherboard, the machine hardware build is now complete! In Part 4, I will describe the OS and software installation.

Building a Deep Learning Machine – Part 2: Installing Motherboard, CPU, and CPU Water Cooler

Installing the CPU

The machine build began with installing the CPU. The CPU is an Intel Xeon E5-1620 v4. Although the processor is a v4, it is designed for a LGA 2011-v3 socket consistent with the ASUS Strix X99 motherboard. I described why I chose this processor in Part 1 of the series.

The LGA 2011-v3 socket on the motherboard has a protective cover to prevent exposing the pins any longer than necessary. The cover warns the user to keep the cover on the socket until after installing the CPU. Removing the cover just before installing the CPU is also fine since the pins are covered once the CPU is pressed into the motherboard socket.

LGA 2011-v3 socket on the Strix X99 motherboard

I opened the socket cover by releasing both spring levers to an open position.

The CPU should be aligned with the arrow on the CPU corner aligned with the arrow on the motherboard socket before being placed into the motherboard socket.

The CPU and motherboard have arrows shown on the lower right corner.

The socket is closed and spring levers are returned to their locked position. Some force is required to press the CPU contacts to the motherboard contacts in order to lock the spring levers. Finally the protective cover is removed.

Installing the motherboard

The PC case is a Phanteks Eclipse P400 Tempered Glass Edition midtower.

Installing the motherboard into the case was straightforward since the board is aligned with the rear I/O connection on the case. The case is built to conceal wires behind the motherboard and has two wire ports located on the opposite side from the rear I/O connection. The motherboard has 9 screws to attach to the case.

View of motherboard and chassis from above with the rear I/O connection along the bottom.

As my first PC build, I learned the hard way that the thermal paste is already layered on the water cooler interface out of the box when I compared the water cooler size to the CPU by placing the cooler on the CPU. I was able to salvage the situation and make the final alignment the same as when I first transferred the thermal paste to the CPU to ensure the thermal paste coverage is consistent.

Installing the Water Cooler

I bought the Corsair H60 water cooler which has a single 120 mm radiator and fan. This water cooler gets good marks on PC part picker for being economical (currently $70) and effective in its price range.

A main consideration before installing the water cooler is whether to apply aftermarket thermal paste. Thermal paste is necessary to ensure suitable thermal contact between the cooler and CPU. Water coolers will come with thermal paste already applied by default. Tests are inconclusive about whether aftermarket thermal paste improves heat transfer; I have seen tests demonstrate worse heat transfer with aftermarket paste. Factors can include the quality of default thermal paste and how well the after market paste is applied. I decided to use the default thermal paste and will trend the temperatures in operation.

The radiator should be positioned against the case wall with the fan oriented as an inlet fan as recommended by Corsair. This setup ensures cooler air is drawn over the radiator to produce a larger temperature delta rather than warm air from inside the box. The Phantek Eclipse P400 provides space for the water cooler radiator and fan on the top of the case. I positioned it towards the front to be closer to the exhaust fans.

The Corsair H60 screws directly into the top of the Intel processor socket on the motherboard. AMD processors are attached with adapters from behind the motherboard.

The Corsair H60 has a 15-pin SATA power connection which will connect directly to the power supply.

Building a Deep Learning Machine – Part 1: Components

I have started building a desktop machine designed for fitting machine learning models including deep learning applications. The Reinforcement Learning and Decision Making class in the OMS CS program at Georgia Tech motivated me to build a machine appropriate for machine learning applications as I start the second half of the masters program. I was able to complete RLDM with my laptop which has a 2.16 GHz Celeron processor and 8 GB RAM, but I plan to use the new machine for upcoming machine learning classes.

I have prioritized designing the machine learning desktop around the GPU(s). I will take a short detour to explain why. GPUs have become the main engine for solving computations in data science models versus CPUs. GPUs have many more arithmetic logic units (ALUs) than CPUs which provides an improved ability to perform simple operations in parallel. Machine learning, artificial intelligence, and deep learning problems generally require matrix math operations that can be accelerated by solving in parallel. My design goals were:

  • A powerful GPU that has sufficient RAM to be well suited for computer vision applications. Some users have reported using 8 GB RAM at a minimum for training computer vision models, but an upgrade to 11 GB is beneficial. The GPU should also have broad support for machine learning libraries. The cuDNN library built on top of Nvidia CUDA programming framework is used by major deep learning frameworks including TensorFlow and PyTorch. I decided on the GeForce GTX 1080 Ti made by EVGA which has a Nvidia processor with 11 GB.
  • Sufficient RAM to handle a future upgrade to two GPUs. The machine should have at least as much RAM as the GPUs. Since I would like the machine to be ready for a possible upgrade to 2 GPUs in the future, I have purchased 32 GB RAM.
  • A 40-lane CPU that can accommodate an upgrade to two 16 PCIe lane GPUs while maximizing the PCIe lanes for data transfer between the CPU and GPU. The data transfer between the CPU and GPU across PCIe lanes can be a bottleneck which slows the GPU performance depending on the application. I chose an Intel Xeon E5 1620 V4 3.5 GHz processor over an i-7 series processor since the Xeon has 40 PCIe lanes which will allow two GPUs to use 16 lanes apiece. *07/02/21 UPDATE: My research in 2018 indicated that PCI lanes may restrict GPU performance. Given the cost of the GPU, I preferred to ensure my system did not restrict performance due to data transfer limitations. However, more recent posts have shown that deep learning may be restricted by memory but should be little restricted by PCI lanes and data transfer with the CPU. Tim Dettmers has a nice article discussing GPU selection for deep learning: https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/.
  • A motherboard that suits the GPU and CPU and handles an upgrade to two GPUs while maximizing the PCIe 3.0 lanes for data transfer between the CPU and GPU. PCIe 3.0 is recommended for multiple GPU machines. To have space for two 1080 Ti GPUs, the motherboard needs to support two dual-width x16 graphics slots. The motherboard should also have a LGA 2011 processor slot for the Xeon processor. I chose the ASUS STRIX X99 motherboard which provides 40 PCIe 3.0 lanes which supports a 16/16/8 configuration.

I have provided a full list of the components I chose on PC Part Picker: https://pcpartpicker.com/list/RtKCq4 .

References

Tim Dettmer: https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9

Slav Ivanov: https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9

Yan-David Erlich: https://medium.com/yanda/building-your-own-deep-learning-dream-machine-4f02ccdb0460