There are thousands of free R packages that are constantly being developed and improved by a rich and diverse developer community. Today we'll look at the most popular R packages that you should know.
List of the Most Popular R Packages
This article aims to introduce you to some of the most popular R packages that are used for data analytics and visualization. If you’re short on time, here’s a quick list of the most popular R packages that you must know.
A quick list of the best, most popular R packages:
- dplyr – A package that makes data operations easier by expressing them as a small set of actions known as verbs.
- ggplot2 – The most popular R package for plotting beautiful graphs. The letters gg stand for the grammar of graphics.
- tidyr – One of the best R packages for giving your data a tidy structure when performing data munging.
- lubridate – A data manipulation package dedicated to simplifying work with date and time formats.
- tibble – A modern take on the data frame that prints only the first rows of a large data set, which makes large data sets much easier to inspect.
- stringr – A package that provides a cohesive, consistent set of functions for common string operations.
- R Markdown – A package for writing and rendering reports that combine Markdown text with R code and its results.
- Shiny – A package for building web-based interactive apps, an alternative to R Markdown reports for communicating your data science findings.
- modelr – A package of helper functions that assists you with building models from your data.
- mlr – One of the most popular R packages for implementing machine learning algorithms in R.
The dplyr package was developed to solve data manipulation challenges at every level, from beginner to expert. It makes your data operations easier by expressing them as actions known as verbs, each of which performs a single, well-defined task:
- filter() – Keep only the rows (observations) that satisfy one or more conditions, which is useful for narrowing down huge data frames.
- arrange() – Re-order the rows by the values of one or more columns.
- select() – Pick columns (variables) by name or by a selection helper.
- mutate() – Add new variables computed from existing ones, or modify existing variables.
- summarize() – Collapse many rows into a single summary value, such as a mean; usually combined with group_by() to get one value per group.
- sample_n() and sample_frac() – Draw random samples of rows, by count or by fraction. In dplyr 1.0.0 and later, both are superseded by slice_sample().
These simple operations can be combined, typically with the pipe operator %>%, to perform complex data manipulations.
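As an illustration, here is a minimal sketch that chains these verbs on R's built-in mtcars data set. The question being asked (average fuel economy of heavier cars, by cylinder count) and the column name power_to_weight are made up for this example.

```r
library(dplyr)

# Average fuel economy of cars heavier than 3,000 lbs, by cylinder count
heavy_summary <- mtcars %>%
  filter(wt > 3) %>%                        # keep rows where weight (in 1000 lbs) exceeds 3
  mutate(power_to_weight = hp / wt) %>%     # new column derived from existing ones
  select(cyl, mpg, power_to_weight) %>%     # keep only the columns we need
  group_by(cyl) %>%                         # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),
    mean_power_to_weight = mean(power_to_weight),
    n = n()                                 # number of cars in each group
  ) %>%
  arrange(desc(mean_mpg))                   # highest average mpg first

print(heavy_summary)

# Random sample of 5 rows; slice_sample(mtcars, n = 5) is the newer equivalent
set.seed(42)
sampled <- sample_n(mtcars, 5)
```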
Official Website: Dplyr
R is preferred by many data scientists and statisticians for its beautifully formatted graphics. In addition to R's built-in graphics package, ggplot2 is another very popular graphics suite. The letters gg stand for the grammar of graphics, a framework that lets you create aesthetically pleasing graphics in a declarative manner: you describe what the plot should show, and ggplot2 works out how to draw it.
ggplot2 treats each aspect of a graphic, such as the data, aesthetic mappings, scales, and coordinates, as a separate building block. Because of this abstraction, it is easy to create very different graphics by combining these blocks in different ways.
ggplot2 also gives graphics a more polished look by default, since it takes care of many details of appearance, and it lets you customize graphics with themes.
ggplot2 is part of the tidyverse, one of the most popular collections of R packages.
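Here is a minimal sketch of how those building blocks are added one at a time with +, using the built-in mtcars data. The choice of columns, axis limits, labels, and theme is only illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + aesthetic mapping
  geom_point(size = 2) +                       # geometric object: one point per car
  scale_y_continuous(limits = c(10, 35)) +     # scale for the y axis
  coord_cartesian() +                          # coordinate system (the default, shown explicitly)
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()                              # theme controls the overall look

print(p)  # draws the plot on the current graphics device
```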
Official Website: ggplot2
As the name indicates, the tidyr package is all about giving your data a tidy structure when performing data munging. In tidy data, each row is an observation and each column is a single variable.
This creates a neat and tidy representation of data that can be manipulated in an organized way. Like dplyr, tidyr has a few major verbs to perform these actions:
- gather() – Convert columns into rows of key and value pairs (wide to long format). Superseded by pivot_longer() in tidyr 1.0.0.
- spread() – Spread key and value pairs across multiple columns (long to wide format). Superseded by pivot_wider() in tidyr 1.0.0.
- separate() – Separate a single column into multiple columns.
- unite() – To combine multiple columns into a single column.
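The sketch below runs all four verbs on small, made-up tibbles; the country, year, cases, and year_month columns are hypothetical and exist only for this example.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per country, one column per year
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(12, 25)
)

# gather(): year columns -> key/value rows (newer equivalent: pivot_longer())
long <- gather(wide, key = "year", value = "cases", -country)

# spread(): key/value rows -> one column per year (newer equivalent: pivot_wider())
wide_again <- spread(long, key = year, value = cases)

# separate(): split one column into several at a separator
dates <- tibble(year_month = c("2020-03", "2021-07"))
split_dates <- separate(dates, year_month, into = c("year", "month"), sep = "-")

# unite(): glue several columns back into one
united <- unite(split_dates, "year_month", year, month, sep = "-")

print(long)
print(united)
```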
Official Website: Tidyr
Lubridate is a data manipulation package exclusively meant to simplify working with date and time formats. It can parse dates written in many different formats and with different separators.
Lubridate makes it easy to extract specific information from date objects, such as the day, month, year, or weekday. It also supports time zones, so you can avoid inconsistencies when representing the same moment in time in different zones. Adding and subtracting dates is also much more straightforward with lubridate.
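A short sketch of these features follows. The dates and the America/New_York time zone are arbitrary examples, and parsing a month name such as "March" assumes an English locale.

```r
library(lubridate)

# The same date written in three formats and with different separators
d1 <- ymd("2021-03-15")
d2 <- dmy("15/03/2021")
d3 <- mdy("March 15, 2021")
d1 == d2 & d2 == d3          # TRUE

# Extract components from a date
year(d1)                     # 2021
month(d1)                    # 3
wday(d1, label = TRUE)       # abbreviated weekday name (locale-dependent)

# One instant, displayed in two time zones
t_utc <- ymd_hms("2021-03-15 12:00:00", tz = "UTC")
t_ny  <- with_tz(t_utc, tzone = "America/New_York")  # same moment, New York clock time
t_utc == t_ny                # TRUE: only the displayed time zone differs

# Date arithmetic
d1 + days(30)                # "2021-04-14"
d1 + months(1)               # "2021-04-15"
```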
Official Website: Lubridate
Tibble is the tidyverse's modern alternative to the data frame. Tibbles behave very much like data frames but have some extra functionality that makes working with tibbles much more convenient.
Tibbles use a print method that displays only the first 10 rows and the columns that fit on the screen, instead of the entire data frame. This makes working with large data frames much easier. Subsetting is also more predictable with tibbles than with data frames: single-bracket subsetting always returns another tibble, and the $ operator does not partially match column names.
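A minimal sketch contrasting a tibble with the built-in iris data frame it is converted from:

```r
library(tibble)

tb <- as_tibble(iris)     # convert a base data frame to a tibble
print(tb)                 # shows only the first 10 rows plus each column's type

# Single-bracket subsetting: a tibble stays a tibble, a data frame may drop to a vector
class(tb[, "Species"])    # a tibble ("tbl_df", "tbl", "data.frame")
class(iris[, "Species"])  # "factor"

# $ with an abbreviated column name
head(iris$Spec)           # data frame: partial match silently returns Species
tb$Spec                   # tibble: NULL, with a warning about an unknown column
```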
Official Website: Tibble
Strings play a major role in data cleaning and preparation tasks. However, base R's string functions can be clumsy to use because their names and arguments are inconsistent.
The stringr package provides a more cohesive set of string functions, all prefixed with str_, for common operations such as:
- Find the length of a string – str_length()
- Concatenate two or more strings – str_c()
- Extract substrings – str_sub()
- Repeat a string – str_dup()
- Match a pattern and extract its capture groups – str_match()
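A quick tour of these functions on made-up strings; the date-like regular expression passed to str_match() is just an example pattern.

```r
library(stringr)

s <- "data science"

str_length(s)                          # 12
str_c("data", "science", sep = " ")    # "data science"
str_sub(s, 1, 4)                       # "data"
str_dup("ab", 3)                       # "ababab"

# str_match() returns a character matrix: the full match, then each capture group
str_match("order-2021-07", "(\\d{4})-(\\d{2})")
# full match "2021-07", groups "2021" and "07"

# stringr functions are vectorised, so they work on whole character vectors
str_sub(c("apple", "banana"), 1, 3)    # "app" "ban"
```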
Official Website: Stringr
R Markdown is a package for writing and rendering reports in Markdown. It provides a notebook format that integrates your data science code with its results and your commentary in a single document.
The rendered reports, in formats such as HTML, PDF, and Word, can be used to communicate your work to decision-makers and to collaborate with other data scientists on the web.
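To show what such a report looks like, the sketch below writes a tiny R Markdown file (a YAML header, a line of prose, and one R code chunk) to a temporary directory and renders it to HTML. The file name and contents are hypothetical, and rendering requires pandoc (RStudio bundles it), so the call is guarded with pandoc_available().

```r
library(rmarkdown)

# Hypothetical minimal report: YAML header, a line of prose, and one R code chunk
rmd_lines <- c(
  "---",
  "title: \"Fuel economy summary\"",
  "output: html_document",
  "---",
  "",
  "Average miles per gallon in the built-in mtcars data:",
  "",
  "```{r}",
  "mean(mtcars$mpg)",
  "```"
)

rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_file)

# render() runs the R chunk (via knitr) and converts the result to HTML with pandoc
if (pandoc_available()) {
  out_file <- render(rmd_file, quiet = TRUE)
  print(out_file)  # path to the generated report.html
}
```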
Official Website: R Markdown
Shiny is an R package for building web-based interactive apps. It offers an alternative to R Markdown reports for communicating your data science findings to decision-makers and programmers.
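A minimal sketch of a Shiny app, in which a slider controls the sample size of a histogram of random normal values. The input and output IDs (n, hist) are arbitrary names chosen for this example.

```r
library(shiny)

# UI: what the user sees
ui <- fluidPage(
  titlePanel("Random normal sample"),
  sliderInput("n", "Number of observations", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: re-draws the plot whenever input$n changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n), xlab = "Value")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch it in a browser from an interactive session:
# runApp(app)
```

Splitting the app into a ui object and a server function is what lets Shiny re-run only the code that depends on an input when that input changes.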
Official Website: Shiny in R
Modelr is another package from the tidyverse, with helper functions that assist you with building models from your data.
Building a model starts with choosing a model family, which can be a simple linear equation, a quadratic function, or any other kind of function that captures the pattern in the data.
The model is then fit to the data by adjusting its parameters until it matches the data closely. Modelr offers functions for sampling data, generating data grids, evaluating models with quality metrics, and using models to make predictions on new, unseen data.
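A small sketch using the built-in mtcars data: a straight-line model family, lm(mpg ~ wt), is fit with base R, and modelr helpers then evaluate it, attach predictions and residuals, generate new data to predict on, and create cross-validation folds. The choice of model, five grid points, and five folds is only illustrative.

```r
library(modelr)
library(dplyr)

# Fit a linear model family: mpg as a straight-line function of weight
fit <- lm(mpg ~ wt, data = mtcars)

# Quality metrics computed on a data set
rmse(fit, mtcars)       # root-mean-squared error
rsquare(fit, mtcars)    # proportion of variance explained

# Add predictions and residuals as new columns
mtcars_aug <- mtcars %>%
  add_predictions(fit) %>%   # adds column "pred"
  add_residuals(fit)         # adds column "resid" (needs the response, mpg)

# New, unseen data: five evenly spaced weights across the observed range
grid <- mtcars %>%
  data_grid(wt = seq_range(wt, n = 5)) %>%
  add_predictions(fit)

# Resampling: 5-fold cross-validation splits
set.seed(1)
folds <- crossv_kfold(mtcars, k = 5)
```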
Official Website: modelr
If you are keen on implementing machine learning algorithms using R and are looking for a package that provides an infrastructure to do so, mlr is the right package for you.
mlr provides functions for building classification, regression, clustering, and survival models in R. Its successor, mlr3, is a rewrite of the package with even more advanced features for building ML models that suit today's needs.
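As a sketch of the workflow, the example below uses mlr3, the successor package, rather than the original mlr. It trains a decision tree (the classif.rpart learner, which relies on the rpart package that ships with R) on the built-in iris task and scores it on a held-out split; the 80/20 split ratio and the accuracy measure are illustrative choices.

```r
library(mlr3)

set.seed(123)

task    <- tsk("iris")                   # built-in classification task (target: Species)
learner <- lrn("classif.rpart")          # decision tree learner
split   <- partition(task, ratio = 0.8)  # list of train and test row ids

learner$train(task, row_ids = split$train)
prediction <- learner$predict(task, row_ids = split$test)

acc <- prediction$score(msr("classif.acc"))  # named numeric: classification accuracy
print(acc)
```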
Official Website: mlr