February 16, 2011

CUDA_VISIBLE_DEVICES

On systems with more than one GPU, it's useful to be able to select which device(s) you want to use for running CUDA apps. The CUDA APIs will select a GPU as the default, so unless you specify differently, all your CUDA applications will run on the same GPU. Setting compute-exclusive mode doesn't change this behavior -- all the programs will still target the same default GPU, but at least the additional ones will fail quickly rather than consuming resources that might be required by the first program.
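
Compute-exclusive mode itself is set through the driver tools rather than the application. A minimal sketch with nvidia-smi, assuming a driver whose nvidia-smi takes -i to pick a card and -c to set the compute mode (the exact flag spellings and mode names have shifted across driver generations):

$ # Needs root; mode names/numbers differ by driver version.
$ sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS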

One solution is the environment variable CUDA_VISIBLE_DEVICES, which takes a comma-separated list of the device indices that should be visible to CUDA applications. For example, I've equipped my desktop with two Tesla cards and a Quadro card. I can use the deviceQuery program from the CUDA SDK and a little grep magic to list them:

$ ./deviceQuery -noprompt | egrep "^Device"
Device 0: "Tesla C2050"
Device 1: "Tesla C1060"
Device 2: "Quadro FX 3800"

By setting the envar, I can make only a subset of them visible to the runtime:

$ export CUDA_VISIBLE_DEVICES="0,2"
$ ./deviceQuery -noprompt | egrep "^Device"
Device 0: "Tesla C2050"
Device 1: "Quadro FX 3800"

Note that I didn't change the deviceQuery options at all, just the setting of $CUDA_VISIBLE_DEVICES. Also note that the visible cards are still enumerated sequentially from zero, so the Quadro that was device 2 now shows up as device 1; only the cards listed in the visible devices envar are exposed to the CUDA app.

This is useful in a couple of situations. The first is the example I used above. My development workstation often has a mix of CUDA-capable devices in it. I generally need to target a specific model of card for testing, and the easiest way to do so is $CUDA_VISIBLE_DEVICES (especially with bash, where it can be set per-invocation by prefixing the command).
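
Here's what that per-invocation form looks like; my_cuda_app is just a stand-in for whatever binary I happen to be testing:

$ # Only this one run is restricted; the variable isn't exported to the shell.
$ # The app sees just the Quadro, which it enumerates as device 0.
$ CUDA_VISIBLE_DEVICES="2" ./my_cuda_app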

The other case is clusters where nodes might be time-shared (multiple jobs running on the same node at the same time) but multiple GPUs on those nodes should be space-shared (one job per GPU). By setting $CUDA_VISIBLE_DEVICES in the prologue script, the batch system can route jobs to the right GPU without requiring the user to set additional command-line flags or configuration files. Of course, that requires the batch scheduler to support treating GPUs as independent consumables, but several common batch scheduling and resource management systems have already added that capability.
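
A minimal sketch of such a prologue fragment, assuming the scheduler hands the job its reserved GPU index in a variable I'm calling ASSIGNED_GPU (the real name and mechanism vary by batch system):

#!/bin/bash
# Hypothetical prologue fragment: ASSIGNED_GPU holds whatever GPU id(s)
# the scheduler reserved for this job; real schedulers spell this differently.
export CUDA_VISIBLE_DEVICES="$ASSIGNED_GPU"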

CUDA_VISIBLE_DEVICES is a simple feature, but it's a good technique to have in your pocket if you work on multiple GPU systems.
