Running Models in Parallel


Last edit: Simon Tarr (13 Dec 2016 14:32) | Revisions: 1 | Created by: Simon Tarr | Rating: 0


Introduction

By default, R uses only a single core when processing data. However, if you have a large job to run (for example, running multiple models with different parameters) then it pays to parallelise your code so that all the cores in your computer can be utilised at once. Using this method, you can run as many models in parallel as you have CPU cores. If you're running extremely large jobs, perhaps requiring a cluster, then you will benefit from using other parallel solutions (e.g. the snowfall package). In short, this solution is ideal for medium-sized jobs that can be run in reasonable time scales on modest hardware (e.g. a 'standard' desktop computer or laptop).

In the following example, I use the R package doParallel (together with foreach) to run multiple generalised least squares (GLS) models in parallel (a CPU-intensive task).

Required Packages

install.packages("foreach")
install.packages("doParallel")
install.packages("nlme")

Tutorial

In the example below, I have a list of all possible model formulae for my project (n=155) saved in an object called model.formulae. Each formula is simply a response variable plus up to 4 predictor variables. Each iteration of the loop fits the model for one formula, and the iterations are shared out among the available cores.
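As a rough illustration of what such an object might look like, the sketch below builds a character vector of formulae from every combination of four predictors. The names resp and p1 to p4 are hypothetical placeholders, and this simple version gives 15 formulae, so it does not reproduce the 155 formulae used in my project.

```r
# Hypothetical variable names, for illustration only
response   <- "resp"
predictors <- c("p1", "p2", "p3", "p4")

# Every non-empty subset of the predictors (sizes 1 to 4)
combos <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Turn each subset into a formula string such as "resp ~ p1 + p3"
model.formulae <- vapply(
  combos,
  function(v) paste(response, "~", paste(v, collapse = " + ")),
  character(1)
)

length(model.formulae)  # 15
```

Storing the formulae as strings keeps the object small, and as.formula() converts each one when the model is fitted.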

First, you will need to create a 'cluster'. This tells doParallel how many worker processes to start, which should normally be no more than the number of CPU cores available:

library(doParallel)  # also attaches foreach and parallel
cl <- makeCluster(3)  # start 3 worker processes
registerDoParallel(cl)
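If you would rather not hard-code the number of workers, one option is to ask R how many cores it can see. The following is a minimal sketch using detectCores() from the base parallel package. Leaving one core free for the rest of the system is my own assumption about a sensible default, not a requirement of doParallel.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1
n.cores <- detectCores()
if (is.na(n.cores)) n.cores <- 1L

# Leave one core free, but always use at least one worker
n.workers <- max(1L, n.cores - 1L)

# n.workers can then be passed to makeCluster(), e.g.
# cl <- makeCluster(n.workers)
```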

Then the code to run models in parallel:

your.output.list <- foreach(z = seq_along(model.formulae), .packages = "nlme") %dopar% {
  gls(as.formula(model.formulae[[z]]), data = cuba,
      correlation = corExp(form = ~x + y, nugget = TRUE),
      na.action = na.omit, method = "ML")
}
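Since cuba and model.formulae are my own data, here is a small stand-alone sketch of the same pattern that you can run as-is. It swaps in R's built-in mtcars data and ordinary lm() models purely for illustration, and the object names toy.formulae and toy.fits are made up for this example.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Three toy formulae standing in for model.formulae
toy.formulae <- c("mpg ~ wt", "mpg ~ hp", "mpg ~ wt + hp")

# Each iteration fits one model on a worker; results come back as a list
toy.fits <- foreach(z = seq_along(toy.formulae)) %dopar% {
  lm(as.formula(toy.formulae[z]), data = mtcars)
}

stopCluster(cl)

# Label the fits and compare them, e.g. by AIC
names(toy.fits) <- toy.formulae
sapply(toy.fits, AIC)
```

The results are returned in the same order as the formulae, so it is safe to label them afterwards.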

In the above code, the argument .packages is important. Each worker is a separate R session, so foreach does not automatically make packages available inside your loop, even if you've already loaded them in your main session (i.e. by using library() or require()). You'll need to specify the required packages here (in this case, just 'nlme'). If you require multiple packages within your loop, you can specify them like this:

.packages = c("nlme", "raster", "…")
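To see what .packages does, the sketch below asks each worker whether nlme is attached, first without and then with the argument, and then fits a small GLS model. It uses the built-in mtcars data rather than my own, and the object names are made up for the example. On a freshly started cluster I would expect the first check to come back FALSE, but that depends on how your R installation is set up.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Without .packages: is nlme attached on the workers?
attached.without <- foreach(i = 1:2, .combine = c) %dopar% {
  "package:nlme" %in% search()
}

# With .packages: each worker attaches nlme before running the loop body
attached.with <- foreach(i = 1:2, .combine = c, .packages = "nlme") %dopar% {
  "package:nlme" %in% search()
}

# gls() is now available inside the loop
toy.formulae <- c("mpg ~ wt", "mpg ~ wt + hp")
toy.gls <- foreach(z = seq_along(toy.formulae), .packages = "nlme") %dopar% {
  gls(as.formula(toy.formulae[z]), data = mtcars, method = "ML")
}

stopCluster(cl)
```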

Finally, by default, foreach() saves the output to a list. You can change this with the argument .combine (for example, .combine = c returns a vector and .combine = rbind stacks the results as rows). I direct you to the official foreach() documentation for more information on this. For my purposes a list is exactly what was required so I'll leave it at that!
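As a brief illustration of changing the output type, the sketch below uses .combine = rbind to collect one row per model into a data frame instead of a list. Again, the mtcars data, the lm() models and the object names are stand-ins for illustration only.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

toy.formulae <- c("mpg ~ wt", "mpg ~ hp", "mpg ~ wt + hp")

# Each iteration returns a one-row data frame; rbind stacks them together
aic.table <- foreach(z = seq_along(toy.formulae), .combine = rbind) %dopar% {
  fit <- lm(as.formula(toy.formulae[z]), data = mtcars)
  data.frame(formula = toy.formulae[z], AIC = AIC(fit))
}

stopCluster(cl)

aic.table
```

Returning only a summary like this, rather than the full model objects, also reduces the amount of data sent back from each worker.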

The last bit of code terminates the cluster:

stopCluster(cl)
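If the loop stops with an error, stopCluster() may never be reached and the worker processes can be left running. One way around this, sketched below, is to wrap the job in a function and shut the cluster down with on.exit(). The function name fit.all and its arguments are hypothetical, and the mtcars data with lm() is used only to keep the example runnable.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical helper: fits one lm() per formula on a temporary cluster
fit.all <- function(formulae, dat, n.workers = 2) {
  cl <- makeCluster(n.workers)
  # Runs when the function exits, whether it succeeds or fails
  on.exit({
    stopCluster(cl)
    registerDoSEQ()  # switch foreach back to the sequential backend
  }, add = TRUE)
  registerDoParallel(cl)

  foreach(f = formulae) %dopar% {
    lm(as.formula(f), data = dat)
  }
}

fits <- fit.all(c("mpg ~ wt", "mpg ~ hp"), mtcars)
length(fits)  # 2
```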

Full Syntax

install.packages("foreach")
install.packages("doParallel")
install.packages("nlme")

library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(3)  # start 3 worker processes
registerDoParallel(cl)

your.output.list <- foreach(z = seq_along(model.formulae), .packages = "nlme") %dopar% {
  gls(as.formula(model.formulae[[z]]), data = cuba,
      correlation = corExp(form = ~x + y, nugget = TRUE),
      na.action = na.omit, method = "ML")
}

stopCluster(cl)

You can view the original version of this article on my website.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License