The apply(), sapply(), lapply() and tapply() Functions in R Programming

The apply() Function Family in R

The family of apply() functions in R is used to apply a function, built-in or user-defined, to the elements of data structures such as matrices, lists or data frames.

These functions simplify your code and make it more readable. They also have parallel counterparts, such as parLapply() and mclapply() in R's parallel package, so apply-style code is straightforward to parallelize. Let us look at each function with detailed examples.

The apply() function in R Programming

The apply() function in R is used with matrices to apply a user-specified function to the rows or columns of the matrix. The general syntax of apply() is as follows.

apply(matrix, code, f, fargs)

In the above form, matrix is the matrix object on which apply() operates. code specifies whether the function is applied to rows (code set to 1) or to columns (code set to 2). f is the function to apply, and fargs are any optional additional arguments to pass to f. In R's own documentation these arguments are named X, MARGIN, FUN and ... respectively.

Let us first define a matrix to illustrate the apply() function.

> x <-matrix(c(4,5,6,10,12,16),nrow=2,ncol=3)
> x
     [,1] [,2] [,3]
[1,]    4    6   12
[2,]    5   10   16

Suppose that we wish to apply a specific function to each of these elements: squaring the value, then multiplying by 3 and dividing by 4. Let us define a function that does this.

> f <-function(x)
+ {
+   return (x^2*3/4)
+ }
> f(5)
[1] 18.75

Now, in order to apply this function to all the rows/columns in a matrix, we call the apply() function.

> apply(x, 2, f)
      [,1] [,2] [,3]
[1,] 12.00   27  108
[2,] 18.75   75  192

Here apply() passes each column of the matrix to f in turn. Because f works element-wise on a vector, every element of the matrix ends up transformed without an explicit loop. The apply() function in R doesn’t provide any speed benefit in execution, but it helps you write cleaner and more compact code.
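
The difference between rows and columns is easier to see with a function that summarizes a whole vector rather than transforming each element. The following is a minimal sketch using the matrix x from the example above; the helper scale_by() and its argument k are hypothetical names introduced only for illustration.

```r
# Same matrix as in the example above
x <- matrix(c(4, 5, 6, 10, 12, 16), nrow = 2, ncol = 3)

# Margin 1: sum() receives one row at a time
row_totals <- apply(x, 1, sum)   # 22 31

# Margin 2: sum() receives one column at a time
col_totals <- apply(x, 2, sum)   # 9 16 28

# Arguments given after the function are passed on to it.
# scale_by() and k are hypothetical, for illustration only.
scale_by <- function(v, k) v * k
doubled <- apply(x, 2, scale_by, k = 2)

print(row_totals)
print(col_totals)
print(doubled)
```

With a summarizing function such as sum(), the result has one value per row or per column, whereas a function that returns a vector of the same length, like scale_by(), gives back a matrix.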

sapply() and lapply() functions in R Programming

Working with Lists

The lapply() function in R is short for list apply. It works in a manner similar to the apply() function above, but takes a list instead of a matrix and always returns a list of the same length.

Let us look at an example:

> mylist <- list(c(1,2,3,4),c(10,20,30,40),c(5,5,5,5))
> lapply(mylist,mean)
[[1]]
[1] 2.5

[[2]]
[1] 25

[[3]]
[1] 5

The list mylist is a list of 3 vectors. We wish to apply the mean function to each one of the vectors. This is done by calling lapply(mylist,mean), which returns a list containing the mean of each of the three constituent vectors.

Similarly, the sapply() function in R is short for simplified apply. Instead of returning a list with one mean value per element, sapply() simplifies the result to a vector containing the mean values.

> sapply(mylist,mean)
[1]  2.5 25.0  5.0
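
How far sapply() simplifies depends on what the applied function returns. The sketch below reuses the vectors from mylist but gives the elements names (a, b and c are illustrative names, not part of the original example) to show the two most common cases next to lapply().

```r
# Named version of the list used above; the names are illustrative
mylist <- list(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), c = c(5, 5, 5, 5))

# One value per element: simplified to a named numeric vector
means <- sapply(mylist, mean)

# Two values per element (minimum and maximum): simplified to a 2 x 3 matrix
ranges <- sapply(mylist, range)

# lapply() returns a list regardless of what the function returns
means_list <- lapply(mylist, mean)

print(means)
print(ranges)
str(means_list)
```

When the shape of the result matters to later code, lapply() is the more predictable choice, since its return type does not depend on the function being applied.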

Working with Data Frames in R

Since data frames can be treated as a special case of lists, the functions lapply() and sapply() work in both cases. Let us look at an example.

Let us create a data frame first and then apply a sort() function on it using the lapply() function in R.

> names <- c("Adam","Antony","Brian","Carl","Doug")
> ages <- c(23,22,24,25,26)
> playerdata <- data.frame(names,ages,stringsAsFactors = FALSE)

> # Apply a sort function to each column of the data frame
> lapply(playerdata,sort)
$names
[1] "Adam"   "Antony" "Brian"  "Carl"   "Doug"  

$ages
[1] 22 23 24 25 26

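The function returns a list in which each column of the data frame is sorted independently. Note that the rows are no longer aligned after this: Adam, whose age is 23, now appears in the same position as the age 22.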

Similarly, calling sapply() sorts each column separately, but simplifies the result to a matrix. Since a matrix can hold only one data type, the ages are converted to character strings.

> sapply(playerdata,sort)
     names    ages
[1,] "Adam"   "22"
[2,] "Antony" "23"
[3,] "Brian"  "24"
[4,] "Carl"   "25"
[5,] "Doug"   "26"
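
The quoted ages in the output above come from sapply() combining columns of different types into a single matrix. The following sketch, which rebuilds the same playerdata data frame, is one way to check the types involved. The variable names col_types, sorted_all and sorted_list are introduced here only for illustration.

```r
names <- c("Adam", "Antony", "Brian", "Carl", "Doug")
ages <- c(23, 22, 24, 25, 26)
playerdata <- data.frame(names, ages, stringsAsFactors = FALSE)

# Column types in the data frame: one character column, one numeric column
col_types <- sapply(playerdata, class)

# sapply() binds the sorted columns into one matrix, so the ages
# are converted to character strings to match the names
sorted_all <- sapply(playerdata, sort)

# lapply() keeps each column's own type
sorted_list <- lapply(playerdata, sort)

print(col_types)
print(class(sorted_all[, "ages"]))   # "character"
print(class(sorted_list$ages))       # "numeric"
```

If the numeric values are needed for further calculations, lapply() avoids the conversion.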

tapply() function

The tapply() function also belongs to the same family, but it is used to apply a function to groups of values defined by a factor. This is best explained with an example. Suppose we have the salaries of employees in a company in the form of a vector, and their respective means of transport in a second vector of the same length that serves as the grouping factor. Suppose that we wish to calculate the average salary of each group using a specific means of transport.

The tapply() function in R programming can be called for this purpose using R’s built-in mean() function.

> salaries <-c(25000,30000,45000,66000,20000,50000,35000,20000,15000)
> transport <-c('Bus','Car','Bus','Car','Metro','Metro','Bus','Bus','Metro')
> tapply(salaries,transport,mean)
     Bus      Car    Metro 
31250.00 48000.00 28333.33 

As you can observe, the tapply() function in R returns the mean salary for each group, labelled with the corresponding means of transport. Also, notice that the transport vector is a plain character vector here: tapply() converts it to a factor automatically and produces one result per unique value.
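
To make the grouping step more explicit, the same result can be reproduced in two stages with split() and sapply(). This is only a sketch of an equivalent computation, not a description of how tapply() is implemented; the variable names by_tapply, groups and by_split are illustrative.

```r
salaries <- c(25000, 30000, 45000, 66000, 20000, 50000, 35000, 20000, 15000)
transport <- c("Bus", "Car", "Bus", "Car", "Metro", "Metro", "Bus", "Bus", "Metro")

# Passing an explicit factor gives the same grouping as the character vector
by_tapply <- tapply(salaries, factor(transport), mean)

# Stage 1: split the salaries into one vector per means of transport
groups <- split(salaries, transport)

# Stage 2: apply mean() to each group
by_split <- sapply(groups, mean)

print(by_tapply)
print(by_split)
```

Both approaches give one mean per group. tapply() does it in a single call, while the split() version leaves the intermediate groups available for inspection.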
