A Complete Reference to the subset() Function in R


R is well suited to data work: processing, manipulation, and analysis. It offers numerous functions for these tasks, and subset() is one of them.

In general terms, a subset is a set of data derived or extracted from a larger base dataset.

For example, consider the word “R-Programming”. The word “Program” is a subset of this base word. “R-lang”, on the other hand, is not: even though “R” is present, the letters “lang” do not appear in the base word.

I hope the above example brings you closer to the concept of subsetting data. Let’s move on and explore the subset() function in R and its benefits.


Let’s start with the syntax

subset(): The subset function extracts and returns the part of the input data that satisfies the given conditions.

subset(x, subset, select)

Where:

  • x = The input data: a vector, a matrix, or a data frame.
  • subset = The condition, written as a logical expression, that the returned elements or rows must satisfy. Missing values in the condition are treated as FALSE.
  • select = The columns to return (for matrices and data frames). If it is omitted, all columns are returned.
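As a minimal sketch of the call shape on each input type, the snippet below uses small made-up objects. The names v, m, and df are illustrative only and do not come from the datasets used later in this article.

```r
# Illustrative objects; the names v, m and df are made up for this sketch
v  <- c(5, 12, NA, 40)
m  <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df <- data.frame(id = 1:4, score = c(5, 12, NA, 40))

# Vector: keep the elements that satisfy the condition (the NA is dropped)
v_big <- subset(v, v > 10)

# Matrix: the condition is a logical vector over rows; select picks columns
m_rows <- subset(m, m[, "a"] > 1, select = "b")

# Data frame: column names can be used directly in the condition and in select
df_big <- subset(df, score > 10, select = c(id, score))
```

For data frames, the condition and select are evaluated inside the data frame, which is why score and id can be written without the df$ prefix.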

Key benefits of the subset() function in R

The subset() function in R is useful for a few reasons:

  • subset() is a built-in R function, so it doesn’t require installing additional packages.
  • The filter() function from the dplyr package does the same job (subsetting rows), but it needs that package to be installed and loaded. Which of the two runs faster depends on the data, so benchmark both if execution time matters.
  • It keeps code short and readable when exploring data sets, including large ones.

The subset() function in R – An Easy Example

In this section, with the help of a simple example, we are going to subset the data.

Loading the dataset:

#importing dataset 
datasets::airquality
Airquality Dataset

In the above image, you can see the ‘airquality’ dataset, which is available in R by default. Now, let’s apply the subset() function to extract the rows whose Ozone value is greater than 30.

#returns the rows where Ozone is greater than 30
subset(airquality,Ozone > 30)
Ozone values > 30

In the above image, you can see that every value in the Ozone column is greater than 30. Rows where Ozone is missing (NA) are left out as well. I hope you now have a better understanding of this function.
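One detail worth knowing: subset() treats a missing value in the condition as FALSE, so rows where Ozone is NA are dropped, whereas plain bracket indexing with the same condition returns an all-NA row for each of them. A small sketch of the difference, using only base R (the variable names are illustrative):

```r
# subset() drops rows where the condition evaluates to NA
with_subset <- subset(airquality, Ozone > 30)

# Bracket indexing with the same condition yields an all-NA row for each NA
with_brackets <- airquality[airquality$Ozone > 30, ]

anyNA(with_subset$Ozone)    # FALSE
anyNA(with_brackets$Ozone)  # TRUE

# Adding an explicit NA check makes bracket indexing select the same rows
with_brackets_clean <- airquality[!is.na(airquality$Ozone) & airquality$Ozone > 30, ]
identical(rownames(with_subset), rownames(with_brackets_clean))  # TRUE
```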

Let’s move further and explore more about the subset() function in R.


Multiple conditions using subset function

In the above sections, we passed one condition to our function. Now, let’s try passing multiple conditions to the subset function and let’s see how it works.

#function with multiple conditions
subset(airquality, Ozone > 30 & Temp >= 40 & Wind >= 5)
Subset function In R with multiple conditions

In the above code, you can observe that we combined three conditions with the & operator. In the output, you can see that every returned row satisfies all three conditions.

In this way, you can combine as many conditions as you need, and the function returns only the rows that satisfy all of them.
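Conditions are ordinary logical expressions, so & (and), | (or), ! (not), and %in% can be mixed freely. Here is a short sketch on the same dataset; the thresholds and the variable name are arbitrary and chosen only for illustration.

```r
# Rows from May or June where it was either hot or windy
hot_or_windy <- subset(airquality, Month %in% c(5, 6) & (Temp > 80 | Wind > 15))

# Every returned row satisfies the combined condition
all(hot_or_windy$Month %in% c(5, 6))
all(hot_or_windy$Temp > 80 | hot_or_windy$Wind > 15)
```

The parentheses matter here: & binds more tightly than |, so without them the condition would be read as (May or June and hot) or windy.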


The ‘select’ parameter in subset function

Recall the ‘select’ parameter from the syntax of the subset function. Let’s apply it to return only specific columns.

Let’s see how it works.

#function with select option 
subset(airquality, Temp>=30 & Ozone >= 30, select = c(Ozone,Temp))
Sub-setting In R

As you can see in the above code, we have added a ‘select’ parameter, which returns only the specified columns, as shown in the above output image.

The conditions decide which rows are returned, and the select parameter decides which columns of those rows are shown. In the above output, you can observe that the Ozone and Temp values are all 30 or higher.
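The select parameter can also exclude columns by putting a minus sign in front of them, and it works without any condition at all. A brief sketch, with an arbitrary temperature threshold and illustrative variable names:

```r
# Keep every column except Month and Day for very warm days
warm <- subset(airquality, Temp >= 90, select = -c(Month, Day))
names(warm)  # "Ozone" "Solar.R" "Wind" "Temp"

# With no condition, select alone just picks columns and keeps all rows
ozone_temp <- subset(airquality, select = c(Ozone, Temp))
```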


The ‘select’ parameter with multiple inputs

In the above section, we have added only 2 inputs to the select parameter. But our dataset has 6 columns. So, in this section, we are going to pass a range of columns using the colon symbol.

Let’s see how it works.

subset(airquality, Ozone > 30 & Solar.R > 30, select = c(Ozone:Day))
Subset In R

Well, in this output image, you can clearly see that the subset function returned the rows that satisfy the mentioned conditions and displayed all the columns from Ozone to Day, as specified in the code.
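Since Ozone is the first column of airquality and Day is the last, Ozone:Day covers all six columns. A narrower range makes the effect of the colon easier to see. The sketch below is illustrative and its variable names are made up.

```r
# Columns from Solar.R through Temp, in dataset order
mid_cols <- subset(airquality, Ozone > 30, select = Solar.R:Temp)
names(mid_cols)  # "Solar.R" "Wind" "Temp"

# A range can be combined with individual columns inside c()
mixed <- subset(airquality, Ozone > 30, select = c(Ozone, Month:Day))
names(mixed)  # "Ozone" "Month" "Day"
```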

Well, you have done it. Congratulations!!!


Handling categorical values using subset() function in R

So far, So good. Hold on!

So far, you have learned about the subset function and its applications using numerical data.

But what if you came across categorical data?

Let’s see what happens when you apply the above techniques to the categorical data.

For this purpose, we are using the ‘iris’ dataset, which is available by default in R.

#importing dataset
datasets::iris
iris dataset

In the above dataset, we have categorical data in the Species column.

Let’s see how many species there are in this data using the code below.

#getting unique values in column species  
unique(iris[,"Species"])
Output:

[1] setosa     versicolor virginica
Levels: setosa versicolor virginica

We have three categories as shown above:

  • setosa
  • versicolor
  • virginica

Well, we have our categories. Now let’s use the subset() function to extract the rows that fall under a specific category.

#returns specific categories
subset(iris, Species == "setosa")
iris data with species – setosa

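We use the double equals sign “==” here to test whether each value in the Species column is exactly equal to "setosa"; the comparison is case-sensitive. This is common across many programming languages, where a single equals sign assigns a value and a double equals sign compares. (In R, assignment is usually written with <-, although = also works.)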

Kudos, you have done it.

This is a very simple process to handle categorical values while using the subset() function in R.
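To keep more than one category, %in% is a convenient alternative to chaining == comparisons with |. Note that the result still carries the unused factor level, which droplevels() removes. A minimal sketch, with an illustrative variable name:

```r
# Keep two of the three species
two_species <- subset(iris, Species %in% c("setosa", "virginica"))

# The unused level is still attached to the factor
levels(two_species$Species)  # "setosa" "versicolor" "virginica"

# droplevels() removes levels that no longer occur in the data
two_species <- droplevels(two_species)
levels(two_species$Species)  # "setosa" "virginica"
nrow(two_species)            # 100
```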


Wrapping up

To sum up, the subset() function in R extracts a subset of data from its parent data, which may be a vector, a matrix, or a data frame.

You can specify conditions, and the function returns only the elements or rows that satisfy them. You can also use the select parameter to return specific columns.

That’s all for now. Be excited for tomorrow and learn something new each day. Happy subsetting!!!

More study: R documentation
