January 26, 2011

CUDA in Runlevel 3

Most Linux clusters operate at runlevel 3 instead of runlevel 5. The major difference is that runlevel 5 also starts the X display manager for graphical logins -- something that isn't very useful on cluster nodes that don't have a display attached. Configuring nodes for runlevel 3 eliminates the overhead of that unused X11 server, which is good for application performance. Unfortunately, the NVIDIA driver is normally loaded (and its /dev entries created) when X11 starts, so a node operating at runlevel 3 can't run CUDA applications out of the box.
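
For context, the runlevel itself is controlled by /etc/inittab on RHEL/CentOS systems of this vintage. The lines below are only a sketch of how a node ends up in runlevel 3; the rest of your inittab will differ:

# In /etc/inittab: boot to runlevel 3, skipping the X display manager
id:3:initdefault:

# Or switch a running node right away and confirm the change
/sbin/telinit 3
/sbin/runlevel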

The fix is to use an init script that modprobes the NVIDIA driver and creates the necessary /dev files. My script for doing that is below. In addition to loading the driver and creating the device files, it sets compute-exclusive mode on each card and leaves nvidia-smi running in a loop so the driver stays initialized, which cuts the start-up latency of CUDA applications.

#!/bin/bash
#
# /etc/init.d/cuda startup script for nvidia driver
# symlink from /etc/rc3.d/S80cuda in non xdm environments
#
# Creates devices, sets persistent and compute-exclusive mode
# Useful for compute nodes in runlevel 3 w/o X11 running
#
# chkconfig: 345 80 20

# Source function library
. /lib/lsb/init-functions

# Alias RHEL-style success and failure functions to their LSB equivalents
success() {
    log_success_msg "$@"
}
failure() {
    log_failure_msg "$@"
}

# Create /dev nodes
function createdevs() {
    # Count the number of NVIDIA controllers
    N=`/sbin/lspci -m | /bin/egrep -c '(3D|VGA).+controller.+nVidia'`

    # Create Devices, exit on failure
    while [ ${N} -gt 0 ] 
    do
      let N-=1
      /bin/mknod -m 666 /dev/nvidia${N} c 195 ${N} || exit $?
    done
    /bin/mknod -m 666 /dev/nvidiactl c 195 255 || exit $?
}

# Remove /dev nodes
function removedevs() {
    /bin/rm -f /dev/nvidia*
}

# Set compute-exclusive
function setcomputemode() {
    # Count the number of NVIDIA controllers
    N=`/sbin/lspci -m | /bin/egrep -c '(3D|VGA).+controller.+nVidia'`
    # Set compute-exclusive mode, continue on failures
    while [ $N -gt 0 ]
    do
      let N-=1
      /usr/bin/nvidia-smi -c 1 -g ${N} > /dev/null
    done
}

# Start daemon
function start() {
   echo -n $"Loading nvidia kernel module: "
   /sbin/modprobe nvidia && success || { failure ; exit 1 ;}
   echo -n $"Creating CUDA /dev entries: "
   createdevs && success || { failure ; exit 1 ;}
   echo $"Setting CUDA compute-exclusive mode."
   setcomputemode
   echo $"Starting nvidia-smi for persistence."
   /usr/bin/nvidia-smi -l -i 60 > /dev/null &
}

# Stop daemon
function stop() {
   echo $"Killing nvidia-smi."
   /usr/bin/killall nvidia-smi
   echo -n $"Unloading nvidia kernel module: "
   sleep 1
   /sbin/rmmod -f nvidia && success || failure
   echo -n $"Removing CUDA /dev entries: "
   removedevs && success || failure
}

# See how we were called
case "$1" in
   start)
       start
      ;;
   stop)
       stop
      ;;
   restart)
       stop
       start
      ;;
   *)
       echo $"Usage: $0 {start|stop|restart}"
       exit 1
       ;;
esac
exit 0
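
Installation follows the usual RHEL init-script mechanics. The commands below are a sketch (they assume the script has been saved locally as "cuda"), registering it either with chkconfig, which reads the "# chkconfig: 345 80 20" header, or with the hand-made symlink mentioned in the script's comments:

/bin/install -m 755 cuda /etc/init.d/cuda
/sbin/chkconfig --add cuda
# or, without chkconfig, link it into the runlevel 3 directory by hand:
# ln -s ../init.d/cuda /etc/rc3.d/S80cuda
/sbin/service cuda start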

This script is fairly well tested on RHEL/CentOS systems and should work on other distributions with little or no modification.
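
A quick sanity check from the console confirms everything is in place without X11 running. The deviceQuery path below is just an example; any CUDA program will do:

/sbin/lsmod | grep nvidia        # kernel module loaded
ls -l /dev/nvidia*               # device nodes exist with mode 666
/usr/bin/nvidia-smi              # driver responds; compute mode should show exclusive
./deviceQuery                    # example: the CUDA SDK sample, built locally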
