Parallel Evaluations in R

Overview

R provides a variety of packages for parallel computations. One of the most comprehensive parallel computing environments for R is batchtools (formerly BatchJobs). It supports both multi-core and multi-node computations with and without schedulers. By making use of cluster template files, most schedulers and queueing systems are also supported (e.g. Torque, Sun Grid Engine, Slurm).

R code of this section

To simplify the evaluation of the R code of this page, the corresponding text version is available for download from here.

Parallelization with batchtools

The following introduces the usage of batchtools for a computer cluster using SLURM as scheduler (workload manager).

Set up working directory for SLURM

First, log in to your cluster account, start R and execute the following lines. They create a test directory (here mytestdir), make it R's working directory and then download the two required files: the SLURM template (slurm.tmpl) and the batchtools configuration file (.batchtools.conf.R).

dir.create("mytestdir")
setwd("mytestdir")
download.file("https://goo.gl/tLMddb", "slurm.tmpl")
download.file("https://goo.gl/5HrYkE", ".batchtools.conf.R")
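
Since both downloads go through shortened URLs, it can be worth confirming that the files actually arrived before continuing. The following is a minimal sketch in base R; the helper name check_setup_files is hypothetical and not part of batchtools.

```r
# Hypothetical helper (not part of batchtools): reports, for each expected
# file, whether it exists in `dir` and is non-empty.
check_setup_files <- function(dir = ".",
                              files = c("slurm.tmpl", ".batchtools.conf.R")) {
  paths <- file.path(dir, files)
  info <- file.info(paths)
  ok <- file.exists(paths) & !is.na(info$size) & info$size > 0
  names(ok) <- files
  ok
}

check_setup_files() # Named logical vector, one entry per file
```

If either entry is FALSE, the corresponding download should be repeated before creating a registry.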

Load package and define some custom function

The following toy function will be run on the cluster for demonstration purposes. It subsets the iris data frame by rows and appends the host name and R version of the node on which it was executed. The R version to be used on each node can be specified in the slurm.tmpl file (under module load).

library('RenvModule')
module('load','slurm') # Loads slurm among other modules

library(batchtools)
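myFct <- function(x) {
	# Subset iris by rows x, then append the node's host name and R version
	result <- cbind(iris[x, 1:4],
		Node=system("hostname", intern=TRUE),
		Rversion=paste(R.Version()[c("major", "minor")], collapse="."))
	result # Return the data frame visibly
}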
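
Before submitting anything to the scheduler, it can help to call such a function once in the current R session. The sketch below needs only base R. It redefines the toy function under the hypothetical name myFctLocal and, as an assumption made here for portability, swaps the system("hostname") call for Sys.info()[["nodename"]] so that it does not depend on an external command.

```r
# Local variant of the toy function; the name and the Sys.info() swap are
# assumptions for this sketch, not part of the original example
myFctLocal <- function(x) {
  cbind(iris[x, 1:4],
        Node = Sys.info()[["nodename"]],
        Rversion = paste(R.Version()[c("major", "minor")], collapse = "."))
}

res <- myFctLocal(1) # The call a single job would make with x = 1
str(res)             # Four iris columns plus Node and Rversion
```

If the local call returns a data frame with the expected columns, the same function body should behave the same way on the compute nodes, apart from the reported host name and R version.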

Submit jobs from R to cluster

The following creates a batchtools registry, defines the number of jobs and resource requests, and then submits the jobs to the cluster via SLURM.

reg <- makeRegistry(file.dir="myregdir", conf.file=".batchtools.conf.R")
Njobs <- 1:4 # Define number of jobs (here 4)
ids <- batchMap(fun=myFct, x=Njobs) 
done <- submitJobs(ids, reg=reg, resources=list(partition="short", walltime=60, ntasks=1, ncpus=1, memory=1024))
waitForJobs() # Wait until jobs are completed
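
The same workflow can also be tried without a scheduler. The following is a sketch under stated assumptions: batchtools is installed, a temporary registry (file.dir=NA) is acceptable, and the jobs run sequentially in the current session via makeClusterFunctionsInteractive(). The wrapper run_local_demo and the function squareFct are hypothetical names used only for illustration.

```r
# Hypothetical wrapper: runs four toy jobs locally and returns a list of
# results, or NULL if batchtools is not installed.
run_local_demo <- function() {
  if (!requireNamespace("batchtools", quietly = TRUE)) return(NULL)
  # Temporary registry; conf.file = NA skips any .batchtools.conf.R
  reg <- batchtools::makeRegistry(file.dir = NA, conf.file = NA,
                                  make.default = FALSE)
  reg$cluster.functions <- batchtools::makeClusterFunctionsInteractive()
  squareFct <- function(x) x^2 # Toy stand-in for myFct
  batchtools::batchMap(fun = squareFct, x = 1:4, reg = reg)
  batchtools::submitJobs(reg = reg)
  batchtools::waitForJobs(reg = reg)
  batchtools::reduceResultsList(reg = reg)
}

res <- run_local_demo()
```

Passing reg explicitly to every call keeps this demo registry separate from any default registry created for the cluster run.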

Summarize job status

After the jobs are completed, one can inspect their status as follows.

getStatus() # Summarize job status
showLog(Njobs[1])
# killJobs(Njobs) # Possible from within R or outside with scancel

Access/assemble results

The results are stored as .rds files in the registry directory (here myregdir). One can access them manually via readRDS or use various convenience utilities provided by the batchtools package.

readRDS("myregdir/results/1.rds") # Read first result chunk directly from its rds file
loadResult(1) # Load result of job 1
lapply(Njobs, loadResult) # Load all result chunks into a list
reduceResults(rbind) # Assemble result chunks in single data.frame
do.call("rbind", lapply(Njobs, loadResult)) # Same assembly done manually
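
To illustrate what the two assembly approaches do, here is a base R sketch that needs no registry. The list chunks is simulated stand-in data for what lapply(Njobs, loadResult) would return. Reduce mimics the pairwise folding performed by reduceResults(rbind), while do.call binds all chunks in a single call.

```r
# Simulated result chunks: one single-row data frame per job (stand-in data)
chunks <- lapply(1:4, function(x) cbind(iris[x, 1:4], Job = x))

folded <- Reduce(rbind, chunks)  # Pairwise: rbind(rbind(rbind(c1, c2), c3), c4)
bound  <- do.call(rbind, chunks) # All at once: rbind(c1, c2, c3, c4)

dim(folded) # 4 rows, 5 columns
dim(bound)  # 4 rows, 5 columns
```

Both approaches yield the same rows. For many large chunks, binding everything in one call usually avoids the repeated copying that pairwise folding incurs.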

Remove registry directory from file system

By default, existing registries will not be overwritten. If required, one can explicitly clean and delete them with the following functions.

clearRegistry() # Remove all jobs and results from the registry
removeRegistry(wait=0, reg=reg) # Delete registry directory
# unlink("myregdir", recursive=TRUE) # Same as previous line

Load registry into R

Loading a registry can be useful when accessing the results at a later stage or after moving them to a local system.

from_file <- loadRegistry("myregdir", conf.file=".batchtools.conf.R")
reduceResults(rbind)
Last modified July 8, 2021: added aliases from old site (9ab14f0)