R is well suited to analysing data, which includes data processing, manipulation and analysis. It offers numerous functions for these tasks, and the subset() function is one of them.
In general terms, a subset is a set of data that is derived or extracted from a larger base data set.
For example, consider the word “R-Programming”. The word “Program” is a subset of this base word, but “R-lang” is not: even though “R” is present, the letters “lang” do not appear in the base word.
I hope this example brings you closer to the concept of subsetting data. Let’s move on and explore the subset() function in R.
Let’s start with the syntax:
subset(x, condition, select)
subset(): The subset function extracts and returns the part of the input data that matches the given conditions.
- x = The object to subset: a vector, matrix, or data frame.
- condition = A logical expression that the returned elements or rows must satisfy. In base R this argument is formally named subset.
- select = An expression naming the columns to keep (for data frames and matrices).
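As a rough sketch of how the three arguments fit together, the snippet below runs subset() on a small made-up data frame. The name `scores` and its columns are illustrative assumptions, not part of the article's datasets.

```r
# Hypothetical toy data, used only to illustrate the argument order
scores <- data.frame(
  name = c("Ann", "Bob", "Cid", "Dee"),
  math = c(82, 45, 91, 67),
  art  = c(70, 88, 55, 93)
)

# First argument: the data; second: the row condition; select: columns to keep
passed <- subset(scores, math > 60, select = c(name, math))
print(passed)
```

Column names are written without quotes inside subset(), because the function evaluates them within the data frame.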
Key benefits of the subset() function in R
The subset() function in R is useful for a few reasons:
- subset() is a built-in R function and doesn’t require installing additional packages.
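- The filter() function from the dplyr package does the same job (subsetting data), but it needs that package to be installed and loaded. Which of the two runs faster depends on the data, so it is worth benchmarking before relying on a speed difference.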
- It is handy when dealing with huge data sets, where you only need the rows and columns relevant to your analysis.
The subset() function in R – An Easy Example
In this section, with the help of a simple example, we are going to subset the data.
Loading the dataset:
#importing dataset
datasets::airquality
The ‘airquality’ dataset is available in R by default. Now, let’s apply the subset() function to extract the rows whose Ozone value is greater than 30.
#returns rows where Ozone is greater than 30
subset(airquality, Ozone > 30)
In the output, every value in the Ozone column is greater than 30 ( > 30 ). I hope you now have a better understanding of this function.
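One detail worth checking for yourself: the Ozone column contains missing values, and subset() treats a condition that evaluates to NA as false, so those rows are dropped. The sketch below compares this with plain bracket indexing, which keeps such rows as all-NA rows. The variable names are illustrative.

```r
aq <- datasets::airquality

# subset() silently drops rows where Ozone is NA
high_ozone <- subset(aq, Ozone > 30)
anyNA(high_ozone$Ozone)   # FALSE

# Bracket indexing with the same condition keeps NA rows
bracket <- aq[aq$Ozone > 30, ]
anyNA(bracket$Ozone)      # TRUE
```

This is often what you want in interactive work, but it is good to know that the missing rows are gone rather than flagged.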
Let’s move further and explore more about the subset() function in R.
Multiple conditions using subset function
In the above sections, we passed one condition to our function. Now, let’s try passing multiple conditions to the subset function and let’s see how it works.
#function with multiple conditions
subset(airquality, Ozone > 30 & Temp >= 40 & Wind >= 5)
In the above code, we combined three conditions with the & operator. In the output, every returned row satisfies all three of them.
In the same way, you can combine as many conditions as you need, and the function returns only the rows that satisfy them.
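Conditions do not have to be joined with & only. As a small sketch, the snippet below contrasts & (all conditions must hold) with | (at least one must hold). The thresholds and variable names are arbitrary choices for illustration.

```r
aq <- datasets::airquality

# AND: a row is kept only if both conditions are TRUE
hot_and_windy <- subset(aq, Temp > 80 & Wind > 10)

# OR: a row is kept if at least one condition is TRUE
hot_or_windy <- subset(aq, Temp > 80 | Wind > 10)

# The OR result can never have fewer rows than the AND result
nrow(hot_and_windy) <= nrow(hot_or_windy)
```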
The ‘select’ parameter in subset function
The syntax of the subset function also includes the ‘select’ parameter. Let’s apply it to return only specific columns.
Let’s see how it works.
#function with select option
subset(airquality, Temp >= 30 & Ozone >= 30, select = c(Ozone, Temp))
As you can see in the above code, we have added a ‘select’ parameter, which restricts the output to the specified columns.
The result contains only the Ozone and Temp columns, and only for the rows that satisfy the conditions. In the output you can observe that all the Ozone and Temp values are at least 30.
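The select argument can also exclude columns by putting a minus sign in front of them. A minimal sketch, with an arbitrary temperature threshold and an illustrative variable name:

```r
aq <- datasets::airquality

# Keep every column except Month and Day
no_dates <- subset(aq, Temp >= 80, select = -c(Month, Day))
names(no_dates)   # "Ozone" "Solar.R" "Wind" "Temp"
```

This is convenient when it is shorter to list what you do not need than what you do.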
The ‘select’ parameter with multiple inputs
In the above section, we have added only 2 inputs to the select parameter. But our dataset has 6 columns. So, in this section, we are going to pass a range of columns using the colon symbol.
Let’s see how it works.
subset(airquality, Ozone>30 & Solar.R >30 ,select = c(Ozone:Day))
In the output, every row satisfies the mentioned conditions, and all the columns from Ozone through Day are displayed, as specified by Ozone:Day in the code.
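Because Ozone is the first column and Day is the last, the range Ozone:Day happens to cover all six columns. A narrower range makes the effect easier to see, as in this sketch (the variable names are illustrative):

```r
aq <- datasets::airquality

# Keep only the columns from Ozone through Wind
first_three <- subset(aq, Ozone > 30 & Solar.R > 30, select = Ozone:Wind)
names(first_three)   # "Ozone" "Solar.R" "Wind"

# The full range Ozone:Day keeps every column
identical(names(subset(aq, select = Ozone:Day)), names(aq))   # TRUE
```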
Well, you have done it. Congratulations!!!
Handling categorical values using subset() function in R
So far, so good. Hold on!
Until now, you have learned about the subset function and its applications using numerical data.
But what if you come across categorical data?
Let’s see what happens when you apply the above techniques to categorical data.
For this purpose, we are using the ‘iris’ dataset, which is available by default in R.
#importing dataset
datasets::iris
In the above dataset, we have categorical data in the Species column.
Let’s see how many types of species are there in this data using the below code.
#getting unique values in column species
unique(iris[,"Species"])
Output = Levels: setosa versicolor virginica
We have three categories, as shown above: setosa, versicolor and virginica.
Well, we got our categories and now let’s use subset() function to extract data that fall under specific categories.
#returns specific categories
subset(iris, Species == "setosa")
We use the double equal sign “==” here to check whether each value in the Species column is exactly equal to the string "setosa". This is common across many programming languages, where a single equal sign is for assignment of values and a double equal sign is for comparison.
Kudos, you have done it.
This is a very simple process to handle categorical values while using the subset() function in R.
You can state the conditions, and the function returns the rows that satisfy them. You can also use the select parameter to display specific columns.
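To match more than one category at once, the %in% operator is a common choice. The sketch below combines it with select. One thing to be aware of is that the unused category stays in the factor levels until you call droplevels(). The variable names are illustrative.

```r
flowers <- datasets::iris

# Keep two of the three species and only two columns
two_species <- subset(flowers,
                      Species %in% c("setosa", "virginica"),
                      select = c(Sepal.Length, Species))

# The factor still remembers all three levels
levels(two_species$Species)

# droplevels() removes the level that no longer occurs
two_species_clean <- droplevels(two_species)
levels(two_species_clean$Species)   # "setosa" "virginica"
```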
That’s all for now. Be excited for tomorrow and learn something new each day. Happy subsetting!!!
More study: R documentation