R Data Structure – A Comprehensive View


If you are a data analyst who uses R to crunch data, it is necessary to identify and understand its data structures. Before you can analyze data, you need to be able to represent it. R offers 6 primary data structures. Let’s go through each of them in detail.

The 6 primary R data structures are –

  • Atomic Vectors
  • Matrix
  • Arrays
  • Factors
  • Data Frames
  • Lists

Have a look at the table below which shows the type and dimensionality of the R data structures.

[Table image: R data structures]

1. Atomic Vectors

Atomic vectors are the simplest R data structure. They are one-dimensional, and all the elements of an atomic vector are of the same type.

The various data types in atomic vectors are –

  • Numeric
  • Integer
  • Character
  • Logical

Examples –

#Numeric

a <- c(1,2,3,-4,5)
class(a)

[1] "numeric"

#Character

b <- c('one','two','three','four','five')
class(b)

[1] "character"

#Logical

c <- c(TRUE, FALSE, TRUE,FALSE,FALSE,TRUE)
class(c)

[1] "logical"
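
The type list above also mentions integers, which the examples do not cover. As a minimal sketch (the variable names are illustrative and not from the original examples), the L suffix creates integers, and combining different types in c() coerces every element to one common type:

```r
# Integer: the L suffix asks R for an integer instead of a double
int_vec <- c(1L, 2L, 3L)
class(int_vec)    # "integer"

# Mixing types: c() coerces every element to the most flexible type
mixed <- c(1, "two", TRUE)
class(mixed)      # "character"
mixed             # "1" "two" "TRUE"

# Logical values become 1 and 0 when combined with numbers
num_lgl <- c(10, TRUE, FALSE)
num_lgl           # 10 1 0
```

This coercion is why an atomic vector can always be described by a single type.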

To create a vector, you need not write out all of its elements. You can use the colon operator ':' to generate a sequence of consecutive integers.

#colon ':'

x <- 1:10
x
[1]  1  2  3  4  5  6  7  8  9 10

y <- -2:5
y
[1] -2 -1  0  1  2  3  4  5
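
The colon operator only steps by one. When you need a different step size, base R's seq() function is one option. A small sketch, with variable names chosen only for illustration:

```r
# The colon operator counts down when the start is larger than the end
down <- 5:1                          # 5 4 3 2 1

# seq() lets you pick the step size ...
odds <- seq(1, 9, by = 2)            # 1 3 5 7 9

# ... or the number of evenly spaced points
grid <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
```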


2. Matrix in R

A matrix in R is an atomic vector with a dimension attribute that arranges its elements in rows and columns. All the elements in a matrix must be of the same type: numeric, character, or logical.
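
To see the link between vectors and matrices, here is a minimal sketch (the variable v is illustrative) that turns a plain vector into a matrix just by setting its dimensions:

```r
# Start with a plain atomic vector
v <- 1:6
is.matrix(v)      # FALSE

# Setting the dim attribute (2 rows, 3 columns) makes it a matrix
dim(v) <- c(2, 3)
is.matrix(v)      # TRUE
v
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# The elements still share a single type
typeof(v)         # "integer"
```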

The nrow and ncol arguments set the number of rows and columns of the matrix.

By default, matrix() fills the elements column-wise. Passing the argument byrow = TRUE fills them row-wise instead.

#Matrix - column wise assignment

matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

You can see that the elements are assigned column-wise. Now, you can pass the byrow argument as advised above.

#Matrix - Row wise assignment

matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Awesome 😛

To access the elements in the matrix, you can use the row and column indices of the elements. I will show you how.

#Indexing 
#Here mat is the row-wise matrix from above. We are accessing the elements in rows 1,2,3 and columns 2 & 3.  
mat <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
mat[c(1,2,3),c(2,3)]
     [,1] [,2]
[1,]    2    3
[2,]    5    6
[3,]    8    9

That’s it. You can play around with multiple combinations.
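
A few of those combinations, sketched on the same row-wise 3 x 3 matrix (rebuilt here so the snippet runs on its own):

```r
mat <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

mat[2, 3]               # single element at row 2, column 3: 6
mat[1, ]                # whole first row as a vector: 1 2 3
mat[, 2]                # whole second column as a vector: 2 5 8
mat[-1, ]               # negative index drops row 1, keeps rows 2 and 3
mat[1, , drop = FALSE]  # keeps the result as a 1 x 3 matrix
mat[mat > 5]            # logical indexing, in column order: 7 8 6 9
```

Leaving an index empty selects everything along that dimension.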


3. Arrays in R

Arrays in R are similar to matrices but can have more than 2 dimensions. You can call it an N-D array.

#Arrays

arr <- array(1:10, dim = c(5,4,3))
arr
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    6    1    6
[2,]    2    7    2    7
[3,]    3    8    3    8
[4,]    4    9    4    9
[5,]    5   10    5   10

, , 2

     [,1] [,2] [,3] [,4]
[1,]    1    6    1    6
[2,]    2    7    2    7
[3,]    3    8    3    8
[4,]    4    9    4    9
[5,]    5   10    5   10

, , 3

     [,1] [,2] [,3] [,4]
[1,]    1    6    1    6
[2,]    2    7    2    7
[3,]    3    8    3    8
[4,]    4    9    4    9
[5,]    5   10    5   10

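The array() function takes a vector as input, along with a dim argument. In dim = c(5,4,3), the first two values give the number of rows and columns, and the last value gives the number of matrices (slices) to be created. Because 1:10 has only 10 elements, R recycles it to fill all 60 cells, which is why the same values repeat in every slice.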

You can check the dimension of the array using the dim function.

#dimension

dim(arr)

[1] 5 4 3
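
Array elements are accessed with one index per dimension. A minimal sketch, using a hypothetical array arr3 whose input vector fills it exactly, so nothing is recycled:

```r
# 24 values fill a 2 x 3 x 4 array exactly
arr3 <- array(1:24, dim = c(2, 3, 4))

arr3[1, 2, 3]       # row 1, column 2, slice 3: 15
arr3[, , 1]         # the whole first slice, a 2 x 3 matrix
dim(arr3[, , 1])    # 2 3
length(arr3)        # 24, the product of the dimensions
```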


4. Factors in R

Factors in R are used to represent categorical data. A factor stores each value as one of a finite set of categories, called levels.

You will understand this better with this example.

#Factors

demo <- c('male','female','male','male','male','female')
factor(demo)

[1] male   female male   male   male   female
Levels: female male

You have to use the factor() function to convert the input data into a factor. Just like arrays and matrices, you can access the elements in a factor as well.

#Access the elements 

factor(demo)[2]

[1] female
Levels: female male
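
A short sketch of what a factor holds internally. The names demo_f and demo_f2 are illustrative, and the input vector is repeated so the snippet runs on its own:

```r
demo <- c('male','female','male','male','male','female')
demo_f <- factor(demo)

levels(demo_f)        # "female" "male", sorted alphabetically by default
as.integer(demo_f)    # 2 1 2 2 2 1, the integer codes behind the labels
table(demo_f)         # counts per level: female 2, male 4

# The level order can also be set explicitly
demo_f2 <- factor(demo, levels = c('male', 'female'))
levels(demo_f2)       # "male" "female"
```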


5. Data Frames in R

Data frames are the most used R data structure. A data frame is two-dimensional like a matrix, but its columns can hold data of different types.

You can use the data.frame() function in R to create a data frame from the input values. Pass stringsAsFactors = FALSE to avoid converting character data into factors. This is already the default since R 4.0.0.

#dataframe 

name <- c('Jay','Kevin','Reshaine','Rose')
age <- c(23,21,20,22)
weight <- c(56,67,65,72)
df <- data.frame(name,age,weight)
df
      name age weight
1      Jay  23     56
2    Kevin  21     67
3 Reshaine  20     65
4     Rose  22     72

Yes, you can access the individual columns as well. Let’s see how.

#Accessing

out <- df['name']
out
      name
1      Jay
2    Kevin
3 Reshaine
4     Rose
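
Single brackets are only one way to reach a column. The sketch below rebuilds the same data frame so it runs on its own and compares the common access forms. The variable older is illustrative:

```r
df <- data.frame(
  name = c('Jay','Kevin','Reshaine','Rose'),
  age = c(23,21,20,22),
  weight = c(56,67,65,72),
  stringsAsFactors = FALSE
)

class(df['name'])      # "data.frame": single brackets keep the data frame
class(df[['name']])    # "character": double brackets extract the column
df$age                 # 23 21 20 22, same as df[['age']]

# Rows are selected before the comma, columns after it
older <- df[df$age > 21, c('name', 'age')]
older$name             # "Jay" "Rose"

str(df)                # compact summary of the column types
```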

6. Lists in R programming

Lists are the most complex data structures in R. A list can include elements of different data types. It can be a combination of vectors, matrices, data frames, or even other lists.

You can use the list() function in R to create a list.

#list

vec <- c(1,2,3,4,5,6)
#vec has only 6 elements, so matrix() recycles it to fill all 9 cells
mat <- matrix(vec,3,3)
list_t <- list(vec,mat)
list_t
[[1]]
[1] 1 2 3 4 5 6

[[2]]
     [,1] [,2] [,3]
[1,]    1    4    1
[2,]    2    5    2
[3,]    3    6    3

That’s cool!!!
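
List elements can be pulled out by position or by name. A minimal sketch, assuming a hypothetical named list (the names numbers, grid, and label are not from the example above):

```r
vec <- c(1, 2, 3, 4, 5, 6)
named <- list(numbers = vec, grid = matrix(vec, 3, 2), label = 'demo')

class(named[1])          # "list": single brackets return a smaller list
class(named[[1]])        # "numeric": double brackets return the element itself
named$label              # "demo", access by name
named[['grid']][2, 2]    # 5, index into the matrix stored in the list
length(named)            # 3
```

The difference between single and double brackets is the same one seen with data frame columns.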


R Data structures – Conclusion

As a data analyst or scientist, you will almost always get a blend of numeric and categorical values in your data. So, it is important that you represent the data in the correct way before you analyze it further. I have discussed all 6 R data structures here. Keep them in mind and they will come in handy. I hope you learned something from this article.

That’s all for now. Happy R!!!

Read more: Data types and structures in R
