Hello folks, now, read the excel files in R with ease with the use of libraries such as readxl and xlsx.
In this article, we are going to discuss many things which be interesting as well as informative. Well, before rolling in to the topic directly, it’s good to know about the things like relationship between excel and R, data preparation, working environment setup and more.
Table of Contents
- 1 Let’s start with the syntax
- 2 Reading Excel Files in R Using the readxl Package
- 3 Reading Excel Files in R Using the xlsx Package
- 4 Wrapping up
Let’s start with the syntax
Readxl: Readxl is one of the R functions that help in reading excel files in R.
Xlsx: It is an exclusive R package for reading xlsx files in the R environment.
read_excel(x, row.names = FALSE, na = "", sheet = 1) read.xlsx(x,row.names=FALSE, header = TRUE, sheet=1)
- X = input file.
- row.names = row names can be enabled or disabled by mentioning TRUE and FALSE.
- header = if enabled the header will appear otherwise disabled.
- Sheet = reads the mentioned sheet number. i.e. sheet 1, sheet 2.
Reading Excel Files in R Using the readxl Package
Excel is an amazing spreadsheet data storing software that is widely used in businesses. In data analysis, a majority of the data is stored in excel spreadsheets. As R is best known for its statistical data analysis ability, many times we come across data passed along in excel format.
An excel file can be imported into R and you can perform processing, manipulation, writing, and analyzing of the data. In this article, we are digging deeper into excel files and how they can be read in R.
1 – Preparing the Data for Processing
We cannot import an excel file into R and begin to manipulate or process or analyze it. There are some preliminary steps that should be taken up to ease the process.
In any data storing platform, you should make sure whether the stored or collected data is in the right format and are there missing values. For example, if you use the built-in Car dataset, where there will be multiple columns. Among them, one column is distance.
There are various ways the distance can be represented. It could be in metric or imperial units based on which country you reside in. Apart from that, it also comes down to your preferences for the display units. Since the data displayed in different units is not going to be the same, we need to pre-process it before beginning to work.
You should address these kinds of data first and should convert them into the correct dimension.
- Identify the right metrics for the stored data.
- Missing values should be addressed (replace or remove)
- Look for proper header and avoid long names as headers.
- Make sure that all the data stored is even and there should be no uncertainties in that.
You are done with your data! Let’s move on to another step where we will be setting up the working environments.
2 – Setting up the Working Environment
In R whenever you work with files outside of the program scope, you should set up the working directories so that the R will import/save the files and data in that directory.
Let’s setup your working directory.
The below code will display the current working directory and note that it will be the default directory in all PC’s.
#shows the current working directory getwd()
Output = "C:/Users/Dell/Documents"
Now, let’s set the working directory to our folder / directory.
#sets the new working directory setwd("C:/Users/Dell/Desktop/rfiles")
The above code will change the working directory to our suggested directory.
I hope you are clear on setting up the working directory in R. Let’s move to next step.
3 – Reading the data using “Readxl”
In this section, we are going to read our excel file using the package ‘Readxl’.
The below code will install the ‘readxl’ package and import the ‘testdoc’ (in my PC) named excel file.
#install required packages install.packages('readxl') library(readxl) #reads the excel file df<-read_excel("testdoc.xlsx") df
Wow, fantastic shot! You have read the excel file in R. See how beautiful it is looking in R. Just like this, with the help of ‘Readxl’ package, you can read the excel file in R.
4 – Choose your file with file.choose()
If you want to read any kind of file in R, you must import it first. Okay, you will first set your working directory and mention the file name with its extension and read the data. This is the regular process you do right?
But did you know that you can choose any file during the program runtime? Yes, by using the function file.choose() you can easily choose the files in your PC and can read it in the R.
#chooses the file df<-read_excel(file.choose())
By executing the above the code you will get a pop up window as shown below.
In the pop window, you can open any file which is stored in your PC. Cool right? Now, you don’t require any paths and enter lengthy file names as well. Just choose and import to R.
This is with respect to excel file. If you want to choose csv file, then you should write read_csv(file.choose()).
5 – Read the excel/xlsx file in R with sheet name
Sometimes, your excel file sheet may have multiple sheets. Now, you are required to access the second sheet or sheet by name. How can you do it?
The ‘Readxl’ package has the certain parameters where you can read the file with sheet name or sheet number. Let’s see how it works.
#reads the excel file with sheet name / number df<-read_excel("testdoc.xlsx",sheet = 2) df<-read_excel("testdoc.xlsx",sheet = "Mydatasheet")
You can see that, I have mentioned my sheet name, which is in second position in the excel file. The readxl function will read the ‘Mydatasheet’ data.
don’t worry about the data, to illustrate I have copied the data from sheet 1.
So, If your excel file is having multiple sheets or multiple sheets are renamed with various names, dont worry Just mention the sheet number or sheet name. The read_excel will take care of all the things.
If you want to choose the file, by mentioning the sheet name with file.choose() you can easily choose the file and can read the required sheet.
Reading Excel Files in R Using the xlsx Package
Till now we have read the data using the Readxl function. Now, let’s use the read_xlsx function to read the excel file in R.
1 – Install the xlsx Package in R
We are going to install the required packages first.
#installs the required packages install.packages('xlsx') library(xlsx)
#reads the excel file in R df<-read_xlsx('testdoc.xlsx') df
The read.xlsx function works as same as the readxl function. You can see the excel file in R.
2 – Read excel file with sheet name using xlsx
read_xlsx can easily read an excel file in R. You can also specify the sheet name which you want to read. In this section let’s see how it works.
#reads the excel file by sheet name / number dummy<-read_xlsx("testdoc.xlsx",sheet = "Mydatasheet") View(dummy)
As you can notice that, our function read the second sheet named as Mydatasheet. Like this you can easily read the excel file by specifiying the sheet number or name.
3 – Choose the excel file during runtime
Finally, we are in the last section of this article. Here we are going to see how we can choose the excel files directly from the PC.
As we are familiar with the file.choose() function in the above sections, we can just implement that in read.xlsx function.
#choose the file from the PC df<-read.xlsx(file.choose(),1) View(df)
After you execute the above code, you will find a pop up that asks you to choose the file to work with. Just choose any excel file and the function automatically reads it in R.
Well, in this descriptive article, we have discussed a lot of things regarding the topic ‘read excel/xlsx file in R‘. Yes, there are multiple functions and packages in R to do so.
Using the functions like read_excel and read_xlsx we can easily read the excel files in R.
That’s all for now. Keep reading!!!