
Building an efficient input pipeline is an important performance optimization for training deep neural networks. Besides this, data provisioning also needs to be well structured and transparent so that it can be ruled out as a source of errors in your training. While a lot of current development happens in PyTorch, TensorFlow is still the way to go if you plan to deploy to edge devices or if you want to run on giant training clusters with terabytes of data. This is where the tf.data API with tf.data.Dataset comes in: having an efficient pipeline to provide you with training data which…
A hardware/software guide based on our setup.

Getting a GPU machine running with recent versions of CUDA, TensorFlow, and PyTorch takes quite a few steps. There are many guides out there, but it is still a gamble to get the right combination of hardware together with the right combination of software. This guide shows one combination that currently works at our office and does the job. If you want to replicate it, or if you want clues about how we got things running, read on.
We start with the hardware we got and the reasoning behind our choice of these components…

Computer Scientist, Consultant, Founder of Riitail.