Parallel Computing

Computing systems used to execute instructions in strict serial order: one task at a time, with the first instruction finishing before the second could start. In this article, we will try to understand parallel computing, which makes systems faster by running tasks simultaneously.

The traditional serial computing workflow is shown below.

[Figure: Serial Computing]

We can achieve parallel computing with multiple processors working together to solve large or complex problems. Even systems with a single processor typically have multiple cores (dual, quad, hexa, octa, and more) that can perform tasks in parallel.

You can see the workflow of parallel computing here.

[Figure: Parallel Computing]

Computers can also be connected over a network to work together; such groups of machines are called clusters. Very large clusters may even be spread across distant locations and run on varied hardware.

Parallelization is great, but it also takes real effort to monitor the systems, the work, and the workers. Fifty computers can do fifty pieces of work at the same time, so the ideal execution time drops to 1/50 of the serial time, i.e. 0.02 of it. But the work has to be distributed evenly, and the tasks have to be monitored constantly so that all the partial results can be aligned into a single outcome.

You can see the effect of parallel computing in k-fold cross-validation, where each of the 10 folds can do its work without interfering with the others, and the fold results are later combined into a single outcome.
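As a concrete sketch, here is what a parallel 10-fold cross-validation could look like with R's built-in parallel package (introduced later in this article); the toy data, the linear model, and the cv_error helper are invented purely for illustration:

```r
library(parallel)

set.seed(42)
x <- rnorm(1000)
y <- 2 * x + rnorm(1000)
folds <- sample(rep(1:10, length.out = length(x)))  # assign each point to one of 10 folds

# Fit on nine folds, score the held-out fold -- each fold is independent of the others
cv_error <- function(k) {
  fit <- lm(y[folds != k] ~ x[folds != k])
  pred <- coef(fit)[1] + coef(fit)[2] * x[folds == k]
  mean((y[folds == k] - pred)^2)  # mean squared error on the held-out fold
}

errors <- mclapply(1:10, cv_error, mc.cores = 2)  # folds run in parallel workers
mean(unlist(errors))                              # align the 10 outcomes into one estimate
```

Because no fold depends on another fold's result, the workers never need to communicate until the final averaging step.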

Execution Time for Parallel Computing Processes

In the computer world, execution time is simply the time taken by a system or CPU to complete a particular task, and R programs are no exception.

You can measure a program's runtime, or execution time, yourself. The crude way is a stopwatch; R has a better one built in.

I am using a laptop with an octa-core Intel i5 processor. So, let's see how long my PC takes to perform a task; we can try generating random numbers, let's say 100,000 of them.

If you wonder which function will get your job done, it is system.time().

#Generating random samples
system.time(a <- rnorm(100000))
 user  system elapsed 
 0.01    0.00    0.02 

Wow! My system is more powerful than I thought! You can try this yourself by timing various tasks with system.time() and checking the elapsed time.

The Parallel Computing Package in R

The parallel package in R is very useful for multitasking. It removes much of the barrier to deploying models and algorithms in parallel.

Using this library you can detect the number of cores, compare a task's runtime across different hardware configurations, and much more.

The parallel package ships with base R, so don't worry about installing it. You can start by importing it directly.

#Importing the library
library(parallel)
#Detects the cores
detectCores()

[1] 8

As simple as that. The detectCores() function returns the number of cores present in your PC.

Let’s perform a similar task, generating 1,000,000 random numbers per call, but this time using different numbers of cores to compare the elapsed time.

#Checks the runtime with 4 cores
system.time(a <- unlist(mclapply(1:2, function(x){rnorm(1000000)}, mc.cores = 4)))
 user  system elapsed 
 0.04    0.00    0.03 
#Checks the runtime with 8 cores
system.time(a <- unlist(mclapply(1:2, function(x){rnorm(1000000)}, mc.cores = 8)))
 user  system elapsed 
 0.03    0.00    0.03 

In both samples, you can see how the elapsed time shrinks as we increase the cores: several R worker processes share the work and the time. Be warned, though, that mclapply() relies on process forking, which Windows does not support.
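If you are on Windows, where forking is unavailable, a socket cluster gives you the same effect; here is a minimal sketch using makeCluster() and parLapply(), also from the parallel package:

```r
library(parallel)

cl <- makeCluster(4)  # launches 4 worker R sessions over sockets (works on Windows too)
system.time(a <- unlist(parLapply(cl, 1:2, function(x) rnorm(1000000))))
stopCluster(cl)       # always shut the workers down when you are done
```

Socket workers are separate R sessions, so they start with an empty workspace; any data or packages they need must be shipped to them explicitly (for example with clusterExport()).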

The foreach and doParallel Libraries in R

The foreach library helps you do parallel computing in R, particularly if you use Windows. As the name hints: for each item in the loop, do something.

Let’s see how it works.

We can take the previous example for this illustration.

#Generates the random numbers serially
system.time(a <- rnorm(4000000))
 user  system elapsed 
 0.08    0.00    0.08 
#Using foreach
library(foreach)
system.time(a <- foreach(i = 1:4, .combine = 'c') %do% rnorm(1000000))
 user  system elapsed 
 0.36    0.03    0.39 

Ohh, what happened? The elapsed time should have gone down, but it didn't. The answer is that the %do% operator evaluates the foreach loop serially, one iteration after another.

So here comes our hero, doParallel.

Let’s rock with this library. You cannot have fun this time by just importing it: doParallel is not part of base R, so install the package first.

#Install the required package
install.packages("doParallel")
#Imports the library
library(doParallel)
#Set the cores
registerDoParallel(cores = 4)

You have to import the library and then register the number of cores to work on.

#Using doParallel
system.time(a <- foreach(i = 1:4, .combine = 'c') %dopar% rnorm(1000000))
 user  system elapsed 
 0.27    0.02    0.28 

Whoo!! Here is our expected result. The elapsed time drops by a good margin, because with doParallel registered, the %dopar% operator runs the iterations in parallel rather than serially.

stopImplicitCluster() – use this function to stop the cluster that doParallel created implicitly.
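Putting the pieces together, a minimal end-to-end doParallel session, including the cleanup call, might look like this (the core count and the tiny rnorm(10) task are just examples):

```r
library(doParallel)

registerDoParallel(cores = 4)                            # registers an implicit cluster
a <- foreach(i = 1:4, .combine = 'c') %dopar% rnorm(10)  # the 4 iterations run in parallel
stopImplicitCluster()                                    # releases the worker processes
length(a)  # 40: four vectors of 10 values combined with 'c'
```

Forgetting the cleanup call just leaves idle worker processes around until the R session ends, but it is good hygiene to release them as soon as the parallel work is done.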

Wrapping Up

Things have become smarter and faster as technology advances, and parallel computing has changed how systems handle computation: using the same memory and within the same time, you can perform multiple tasks. In R, you can run multiple worker sessions this way, as I illustrated in this article.

And yes, you can put this to work on better tasks than generating random numbers :). That's all for now. Happy R!

Further reading: CRAN R Project
