Parallel Evaluations in R
3 minute read
Overview
R provides a variety of packages for parallel computations. One of the most
comprehensive parallel computing environments for R is batchtools
(formerly BatchJobs
). It supports both multi-core and multi-node computations with and
without schedulers. By making use of cluster template files, most schedulers
and queueing systems are also supported (e.g. Torque, Sun Grid Engine, Slurm).
R code of this section
To simplify the evaluation of the R code of this page, the corresponding text version is available for download from here.
Parallelization with batchtools
The following introduces the usage of batchtools
for a computer cluster using SLURM as scheduler (workload manager).
Set up working directory for SLURM
First login to your cluster account, open R and execute the following lines. This will
create a test directory (here mytestdir
), redirect R into this directory and then download
the required files:
dir.create("mytestdir")
setwd("mytestdir")
download.file("https://goo.gl/tLMddb", "slurm.tmpl")
download.file("https://goo.gl/5HrYkE", ".batchtools.conf.R")
Load package and define some custom function
This is the test function (here toy example) that will be run on the cluster for demonstration
purposes. It subsets the iris
data frame by rows, and appends the host name and R version of each
node where the function was executed. The R version to be used on each node can be
specified in the slurm.tmpl
file (under module load
).
library('RenvModule')
module('load','slurm') # Loads slurm among other modules
library(batchtools)
myFct <- function(x) {
result <- cbind(iris[x, 1:4,],
Node=system("hostname", intern=TRUE),
Rversion=paste(R.Version()[6:7], collapse="."))
}
Submit jobs from R to cluster
The following creates a batchtools
registry, defines the number of jobs and resource requests, and then submits the jobs to the cluster
via SLURM.
reg <- makeRegistry(file.dir="myregdir", conf.file=".batchtools.conf.R")
Njobs <- 1:4 # Define number of jobs (here 4)
ids <- batchMap(fun=myFct, x=Njobs)
done <- submitJobs(ids, reg=reg, resources=list(partition="short", walltime=60, ntasks=1, ncpus=1, memory=1024))
waitForJobs() # Wait until jobs are completed
Summarize job status
After the jobs are completed one instect their status as follows.
getStatus() # Summarize job status
showLog(Njobs[1])
# killJobs(Njobs) # # Possible from within R or outside with scancel
Access/assemble results
The results are stored as .rds
files in the registry directory (here myregdir
). One
can access them manually via readRDS
or use various convenience utilities provided
by the batchtools
package.
readRDS("myregdir/results/1.rds") # reads from rds file first result chunk
loadResult(1)
lapply(Njobs, loadResult)
reduceResults(rbind) # Assemble result chunks in single data.frame
do.call("rbind", lapply(Njobs, loadResult))
Remove registry directory from file system
By default existing registries will not be overwritten. If required one can exlicitly clean and delete them with the following functions.
clearRegistry() # Clear registry in R session
removeRegistry(wait=0, reg=reg) # Delete registry directory
# unlink("myregdir", recursive=TRUE) # Same as previous line
Load registry into R
Loading a registry can be useful when accessing the results at a later state or after moving them to a local system.
from_file <- loadRegistry("myregdir", conf.file=".batchtools.conf.R")
reduceResults(rbind)