Update April 2019: I bought a new GPU for my server, a P400, which is capable of transcoding H265 encoded video. I found this guide wasn’t working properly because it wasn’t ensuring that /dev/nvidia-uvm was present and passed through to the container. With some help from this post, this guide has been updated to address that.
This project has been on my home media server to-do list since I rebuilt my media server this summer. I put off the problem, which I knew would take some work, until this week while I had the house to myself and no school or work to attend to. It was prickly enough that I decided to post about it - in case I ever need to do this again and have forgotten how, or in case this ranks high enough in search results to help someone else who’s struggling.
In this first section I provide background and some extra details. Skip down to the Guide for the steps to get Nvidia GPU encoding working in a Linux Container (LXC).
When I built my home server I planned it to be a multi-purpose machine. I based it on Proxmox 4.4 (Debian Jessie 8.6) so that I could use virtualization or its built-in lxc-based container system to segregate the various roles this server will have.
Its primary roles are as a file system (ZFS storage array) and a media server using Emby, a mono-based (.net) Plex-alternative that I switched to a couple years ago and am very happy with.
I built it on low wattage hardware (8-core atom C2758-based A1SRM-2758F) that struggled to transcode high-resolution video streams - 1080p streams transcoded at 17 fps, slower than realtime. Part of my problem may have been how Proxmox/LXC scheduled the CPU - it seemed to be reserving some of the CPU for another container, despite that container not needing it.
I bought a GT 710 which, despite its naming convention, was recently released and has a passively cooled 28nm Kepler chip that draws around 19 watts. Unfortunately it doesn’t support HEVC/H.265, but hopefully in the next few years Nvidia will come out with an updated low-power card that does.
Emby came out with experimental GPU encoding support in January 2015, so my mission is to:
- Give my containerized Emby server the ability to access the GPU
- Provide Emby a copy of FFMPEG that supports NVENC (Nvidia’s GPU-accelerated encoding engine).
Pass GPU to LXC Container
In order for an LXC container to have access to the Nvidia GPU, you need to pass the device through to the container in the lxc config file (more on that later). But in order to do that, the devices need to appear in the /dev/ folder, specifically /dev/nvidia0 and its siblings.
My breakthrough came when I found this guide on giving CUDA access to Linux containers. You have a choice: you can install the drivers from a repository (in my case, Nvidia proprietary driver version 367.48 was available from jessie-backports) or from the .run package that Nvidia provides. I tried the repository first - several locations suggested it, and it’s an easier solution if it works. In addition, 367 was recent enough to give my hardware NVENC, the holy grail (at least for me).
It didn’t work. I never got the desired nodes in /dev/nvidia-uvm, nor did nvidia-smi ever run correctly. In the end, I settled on downloading the most recent .run package from Nvidia directly, version 375.26. Once I did this, and configured nvidia-persistenced to run at startup, I got my /dev nodes to appear consistently at startup.
My next job was to get them to also appear in the container.
I added the necessary lines to my lxc .config file, and indeed they did show up in the container. However, they didn’t work. I tested it by running the Nvidia System Management Interface (nvidia-smi) in the container, and it failed every time.
I struggled with this for far too long, until I had a eureka moment after reading a forum post: I needed the same OS in the container that’s on the host, Debian 8.6. My current container was Ubuntu 16.04.
I spun up a new Debian 8.6-based container and made it as similar to the old one as I could, except that I made it unprivileged, confident it would work based on much of my research.
Voila, I was able to see the devices and nvidia-smi could query the device successfully. One down, one to go.
Get an NVENC-compatible FFMPEG
I tried the most obvious approach, which is using a static build of FFMPEG for my architecture. Emby devs suggested that other users use these builds, so I grabbed one too. A quick ffmpeg -encoders | grep nvenc showed that they were built with NVENC. I tried it out, and every time I tried using the h264_nvenc encoder it would segmentation fault.
I decided to try and build it myself, and I used two sources - one, an older pdf guide from Nvidia, and the second was Nvidia’s ffmpeg page. They were both valuable, but I leaned more heavily on the newer web page for my final product.
Update April 2017: Nvidia has released a new FFMPEG guide (you must have a free developer account to access it) that is much better than the old one.
After a few false starts, I got a working build that was able to utilize the GPU to encode video. After some more work turning my ffmpeg binary into a more multi-purpose encoder (setting the appropriate flags), it was finished. After a trial run, it worked flawlessly.
I told Emby to use the system-installed ffmpeg and to use NVENC for transcoding. I was immediately blown away at how fast the transcoded video stream started, and how quickly it proceeded. I had a 23Mbit source file that I forced to transcode to 1Mbit, and later at 21Mbit. According to my benchmarks, the 21Mbit transcode of a 23Mbit source was giving me about 112 fps. More than fast enough - in fact, if I can get the quality higher with a slower transcode, I’d be happy, because…
Frankly, the NVENC encoded video looks much worse than the original and a CPU transcode. Even transcoding to 21Mbit, NVENC had a lot of compression artifacts and the beautiful grain of the source was entirely lost. It was apparent on my mid-tier 4k TV and on my low end 1080p LCD.
CPU transcoding was unwatchable, so this is definitely an improvement. I will work with the Emby devs to see if there’s a way to pass a quality setting to NVENC so it more closely matches the source material.
Use Nvidia Driver in LXC Container
Start with a working LXC host that has an Nvidia-based GPU that supports NVENC, with build-essential and the kernel headers installed.
# apt install build-essential
I use Proxmox, so the command to install the kernel header package is:
# apt install pve-headers
Note: You may need to add the Proxmox No-Subscription Repository if you don’t have a Proxmox subscription.
If you’re using Debian or Ubuntu, this command should install them:
# apt install linux-headers-$(uname -r)
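By default the Nvidia installer builds its kernel module against the headers for the kernel that is currently running, so it can help to confirm they are in place before starting. Here is a minimal sketch, assuming the usual Debian/Proxmox layout where the headers package provides /lib/modules/<release>/build. The function name headers_present and its optional root argument are hypothetical and only there for illustration:

```shell
# Hypothetical helper: succeeds if kernel headers for a given release exist.
# Assumes the headers package provides /lib/modules/<release>/build.
headers_present() {
    release="${1:-$(uname -r)}"   # kernel release to check, defaults to the running kernel
    root="${2:-}"                 # optional alternate root, handy for dry runs
    [ -e "$root/lib/modules/$release/build" ]
}

# Usage (on the host):
#   headers_present || echo "headers missing for $(uname -r)"
```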
Optional: Including the pkg-config package will help the Nvidia driver installer find libraries on your system, so install that with apt install pkg-config.
1: Download and install Nvidia proprietary drivers on the host
Go to http://www.nvidia.com/object/unix.html to download the most recent driver for your architecture. You may need to install gcc and make from your package manager. Make sure you remove any previously-installed nvidia drivers before you do this.
# wget http://us.download.nvidia.com/XFree86/Linux-x86_64/440.100/NVIDIA-Linux-x86_64-440.100.run
# chmod +x NVIDIA-Linux-x86_64-440.100.run
# ./NVIDIA-Linux-x86_64-440.100.run
The installer will guide you through the necessary settings. In my experience, the defaults are just fine. You will probably need to restart your machine as part of this process. Once you have rebooted, run the installer again. When it is complete, you’ll see some Nvidia binaries were installed and are part of your path - type nvid and press the tab key to see them all. Likewise, you should have at least some of the /dev/nvidia* nodes present, but I had to tell nvidia-persistenced to run at startup before they would work. This step also fixed an issue where the nodes appeared and nvidia-smi reported the GPU correctly, but nvenc would not work properly:
# nvidia-persistenced --persistence-mode
Test you can access the driver by running
2: Load Nvidia Kernel Modules at Boot
This is a step I added to ensure that the CUDA kernel modules are loaded and the device nodes are present at boot, pulled directly from this post.
Edit /etc/modules-load.d/modules.conf to add the following two lines:
# /etc/modules: kernel modules to load at boot time.
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
nvidia
nvidia_uvm
Update initramfs to apply the changes:
# update-initramfs -u
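After the next reboot, a quick way to confirm both modules were actually loaded is to look them up in /proc/modules. The sketch below assumes the standard /proc/modules format, where each line starts with the module name followed by a space. module_loaded is a hypothetical helper, and its second argument exists only so the check can be tried against a saved listing:

```shell
# Hypothetical helper: succeeds if a module name appears in a /proc/modules-style listing.
module_loaded() {
    name="$1"
    listing="${2:-/proc/modules}"   # each line starts with the module name, then a space
    grep -q "^${name} " "$listing"
}

# Usage after a reboot:
#   for m in nvidia nvidia_uvm; do
#       if module_loaded "$m"; then echo "$m loaded"; else echo "$m NOT loaded"; fi
#   done
```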
To have the nodes created in /dev/, add the following rules to /etc/udev/rules.d/70-nvidia.rules:
# /etc/udev/rules.d/70-nvidia.rules
# Create /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl when nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
# Create the CUDA node when nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
Reboot the server and confirm you see
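Here is a minimal sketch of one way to check for the device nodes after the reboot. The list of node names is an assumption, taken from the mount entries used in the passthrough step of this guide, and check_nvidia_nodes is a hypothetical helper:

```shell
# Hypothetical helper: reports which of the expected Nvidia nodes exist
# in a directory (default /dev) and fails if any are missing.
check_nvidia_nodes() {
    dev_dir="${1:-/dev}"
    missing=0
    # Assumed list of nodes; adjust it to match your card and driver.
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ -e "$dev_dir/$node" ]; then
            echo "found:   $dev_dir/$node"
        else
            echo "missing: $dev_dir/$node"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the host (and later inside the container):
#   check_nvidia_nodes || echo "some nodes are missing"
```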
3: Passthrough GPU /dev nodes to LXC
Make sure your container is the same version as your host - eg: if your host is Ubuntu 16.10, your container must be Ubuntu 16.10.
Update: This may not be necessary, but consider making them the same if you are encountering trouble.
Edit your lxc container .conf file to pass through the devices - Proxmox places the files in /etc/pve/lxc/###.conf - edit it like so:
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
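If you would rather generate these entries than type them, a small sketch like this prints one line per device in the same format as above. lxc_mount_entries is a hypothetical helper, not part of LXC or Proxmox:

```shell
# Hypothetical helper: print an lxc.mount.entry line for each device path given.
lxc_mount_entries() {
    for dev in "$@"; do
        # The mount target is relative to the container root, so drop the leading slash.
        printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' "$dev" "${dev#/}"
    done
}

# Usage on the host (review the output before appending it to your container's .conf):
#   lxc_mount_entries /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
```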
4: Download and install Nvidia proprietary drivers in the container
Log into your container and install the driver, but don’t install the kernel modules - let’s use --no-kernel-module to make sure the .run package doesn’t try to install them, since we’re using the host kernel that already has the module loaded:
# lxc-attach -n ###
# wget http://us.download.nvidia.com/XFree86/Linux-x86_64/440.100/NVIDIA-Linux-x86_64-440.100.run
# chmod +x NVIDIA-Linux-x86_64-440.100.run
# ./NVIDIA-Linux-x86_64-440.100.run --no-kernel-module
If it doesn’t, you may need to run the nvidia-persistenced command as above.
Maintaining the Nvidia Driver
Eventually you will have a new Linux kernel and its headers come down from apt. Install the latest versions to keep your kernel patched against known vulnerabilities. Sometimes DKMS doesn’t work as it should to keep your Nvidia driver kernel module updated. If this happens, you need to reinstall the Nvidia drivers on the host.
Reinstalling & Upgrading the Nvidia Drivers
You can use the .run file for your currently installed driver if you still have it. This may also be a good time to upgrade your driver to a new ‘Long Lived Branch’ version if one is out - check out http://www.nvidia.com/object/unix.html. You will have to upgrade all your containers to the new version as well, so consider that before upgrading your host to a new version.
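Because the containers rely on the host’s kernel module, the driver installed in each container should be the same version as the module loaded on the host. Here is a rough sketch of a version check, assuming the loaded module exposes its version in /sys/module/nvidia/version and that your .run file follows the NVIDIA-Linux-x86_64-<version>.run naming used in this guide. Both function names are hypothetical:

```shell
# Hypothetical helper: extract the driver version from a .run file name,
# e.g. NVIDIA-Linux-x86_64-440.100.run gives 440.100
runfile_version() {
    name="$(basename "$1")"
    name="${name#NVIDIA-Linux-x86_64-}"
    printf '%s\n' "${name%.run}"
}

# Hypothetical helper: succeeds if the loaded kernel module version matches the .run file.
# Assumes the module version is readable from /sys/module/nvidia/version.
driver_matches_runfile() {
    runfile="$1"
    version_file="${2:-/sys/module/nvidia/version}"
    [ "$(cat "$version_file")" = "$(runfile_version "$runfile")" ]
}

# Usage before installing in a container:
#   driver_matches_runfile NVIDIA-Linux-x86_64-440.100.run || echo "version differs from the loaded module"
```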
.run you have decided on, upgrade your host by running the installer:
In my experience, you may have errors when it attempts to upgrade your version of the driver. If you do, try uninstalling with:
# ./NVIDIA-Linux-x86_64-440.100.run --uninstall
After this, it’s a good time to make sure your kernel and headers are current with apt update && apt upgrade -y. Once they are, reboot your host. You should be able to install the new version after taking these steps.
Congratulations! You now have CUDA/Nvidia devices present on your host and within your container! If you want to use the CUDA toolkit to build your own packages, keep reading.
Install and configure CUDA
The CUDA toolkit allows you to build applications that let you leverage the power of your GPU. I use it to build FFMPEG, but these steps should get you most of the way to building any CUDA application for your container.
1: Install the CUDA Toolkit
We need to download the current CUDA SDK, which we’ll use to build FFMPEG later. Go to https://developer.nvidia.com/cuda-downloads to download the appropriate runfile.
When you install the CUDA SDK, be sure not to install the driver again - you already have. Install the toolkit and (optionally) the samples in the default locations.
# wget http://developer.download.nvidia.com/compute/cuda/11.0.1/local_installers/cuda_11.0.1_450.36.06_linux.run
# chmod +x cuda_11.0.1_450.36.06_linux.run
# ./cuda_11.0.1_450.36.06_linux.run
We also need to copy the cuda.h file to our local includes to make it easier to build FFMPEG:
# cp /usr/local/cuda/include/cuda.h /usr/include/
2: Install the Nvidia Video Codec SDK
Next, download the Nvidia Video Codec SDK from the developer’s site: https://developer.nvidia.com/nvidia-video-codec-sdk. You will need to create an account to gain access to the most current version of the SDK.
I tested with 7.1, however it’s possible 8.0 will also work. 8.0 is required for FFMPEG 3.4. Once you have it, extract it to a development folder and move the header files to your local include directory:
# mkdir ~/development
# cd ~/development
# wget -O Video_Codec_SDK_7.1.9.zip "https://developer.nvidia.com/designworks/video_codec_sdk/downloads/v7.1?accept_eula=yes"
# unzip Video_Codec_SDK_7.1.9.zip
# cp Video_Codec_SDK_7.1.9/Samples/common/inc/*.h /usr/local/include
3: Grab other FFMPEG dependencies
Based on your FFMPEG configuration, you may need different libraries. I opted to grab as many as I could from apt in order to simplify, but you may find you need to search for others. My configuration is based on the general-purpose static build recommended by Emby devs, and called for the following libraries and packages to be installed:
# apt install libvpx4 frei0r-plugins-dev libgnutls28-dev libass-dev \
  libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev \
  libopenjpeg-dev libopus-dev librtmp-dev libsoxr-dev libspeex-dev \
  libtheora-dev libvo-amrwbenc-dev libvorbis-dev libvpx-dev libwebp-dev \
  libx264-dev libx265-dev libxvidcore-dev
Later on, during the configure stage, you may find you need other libraries - this isn’t exhaustive and may change based on the current version of FFMPEG and its dependencies.
4: Get and Build FFMPEG
Download FFMPEG from git and select your desired branch. I chose 3.1, since 3.2.2 had issues with Emby (based on a flaw in FFMPEG):
# cd ~/development
# git clone https://git.ffmpeg.org/ffmpeg.git
# cd ffmpeg
View and select a remote branch:
# git branch -a
# git checkout release/3.3
Now, let’s configure our ffmpeg build:
# mkdir ../ffmpegbuild
# cd ../ffmpegbuild
# ../ffmpeg/configure \
  --enable-nonfree --disable-shared --enable-nvenc \
  --enable-cuda --enable-cuvid --enable-libnpp \
  --extra-cflags=-I/usr/local/include --enable-gpl \
  --enable-version3 --disable-debug \
  --disable-ffplay --disable-indev=sndio \
  --disable-outdev=sndio --enable-fontconfig \
  --enable-frei0r --enable-gnutls --enable-gray \
  --enable-libass --enable-libfreetype \
  --enable-libfribidi --enable-libmp3lame \
  --enable-libopencore-amrnb --enable-libopencore-amrwb \
  --enable-libopenjpeg --enable-libopus --enable-librtmp \
  --enable-libsoxr --enable-libspeex --enable-libtheora \
  --enable-libvo-amrwbenc --enable-libvorbis \
  --enable-libvpx --enable-libwebp --enable-libx264 \
  --enable-libx265 --enable-libxvid \
  --extra-cflags=-I/usr/local/cuda/include \
  --extra-ldflags=-L/usr/local/cuda/lib64
The last two flags tell ffmpeg where to find the headers and libraries it needs from the CUDA SDK installed earlier. If all looks good, let’s build it:
# make -j 8
The -j flag tells make how many jobs to run in parallel, so you can omit it (it didn’t work on a Python build I ran recently) or change it to the number of cores you want to use.
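Rather than hard-coding a job count, you could derive it from the machine. Here is a minimal sketch, assuming nproc from coreutils is available. make_jobs is a hypothetical helper name:

```shell
# Hypothetical helper: print a job count for make, one per available core,
# falling back to 1 if nproc is not installed.
make_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}

# Usage from the build directory:
#   make -j "$(make_jobs)"
```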
That’s it! Take FFMPEG for a spin - try some encodes and see how fast it goes. If it works, let’s install it! If it doesn’t, take a look at the error and start your troubleshooting there.
# make install
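To confirm that the binary you just installed really has NVENC, you can check the encoder list, much like the ffmpeg -encoders | grep nvenc test earlier. The sketch below matches the encoder name exactly, assuming it is the second column of the listing. has_encoder is a hypothetical helper:

```shell
# Hypothetical helper: reads `ffmpeg -encoders` output on stdin and succeeds
# if the named encoder is listed (the name is expected in the second column).
has_encoder() {
    awk -v enc="$1" '$2 == enc { found = 1 } END { exit !found }'
}

# Usage:
#   ffmpeg -hide_banner -encoders | has_encoder h264_nvenc && echo "NVENC available"
#
# A quick test encode of a generated 5 second pattern that discards the output:
#   ffmpeg -f lavfi -i testsrc=duration=5:size=1280x720:rate=30 -c:v h264_nvenc -f null -
```

If the test encode fails inside the container but works on the host, the device nodes and driver version are the first things I would recheck.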