Support Vector Machines in R programming

Hello, readers! In this article, we will look at Support Vector Machines in R in detail.

So, let us begin!! 🙂


What are Support Vector Machines in R?

Machine Learning as a domain provides us with various algorithms that learn from data and make predictions for real-life scenarios.

For relatively structured data, we mainly deal with Supervised Machine Learning algorithms for classification and regression. Support Vector Machine is one such supervised algorithm.

SVM (Support Vector Machine) can be applied to both classification and regression problems, but it is most prominently used for classification, i.e. when the target variable is categorical.

As a whole, SVM enables us to assign each data point to one of the groups defined by the categorical target variable.

Technically, a Support Vector Machine generates a decision boundary called a ‘hyperplane’ that separates the data of one group from another.

Here, we also come across the concept of support vectors, which determine where this boundary lies.

Support vectors are the data points that are the closest to the hyperplane.

Working of Support Vector Machine:

Conceptually, we can picture SVM as considering candidate hyperplanes that separate the groups and, for each one, measuring the distance to the nearest data points, i.e. the support vectors. The distance between these support vectors and the hyperplane is the margin.

To sum up, the hyperplane is determined entirely by the support vectors, and SVM chooses the hyperplane for which the margin is maximum.
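To make the idea of the margin concrete, here is a minimal base R sketch. The data points and the two candidate hyperplanes are made up by hand for illustration (nothing here is fitted by an SVM); we only measure how far the closest points lie from each candidate.

```r
# Toy 2-D data: two groups labelled -1 and +1 (made-up values)
X <- matrix(c(1, 1,
              2, 1,
              4, 4,
              5, 5), ncol = 2, byrow = TRUE)
y <- c(-1, -1, 1, 1)

# Margin of the hyperplane w . x + b = 0: distance to the closest point
margin_of <- function(w, b) {
  distances <- abs(X %*% w + b) / sqrt(sum(w^2))
  min(distances)
}

# Does the hyperplane put every point on the side matching its label?
separates <- function(w, b) all(sign(X %*% w + b) == y)

# Two hand-picked candidate hyperplanes (hypothetical, not fitted)
margin_a <- margin_of(w = c(1, 1), b = -5.5)
margin_b <- margin_of(w = c(2, 3), b = -13.5)

print(c(separates(c(1, 1), -5.5), separates(c(2, 3), -13.5)))
print(c(candidate_a = margin_a, candidate_b = margin_b))
```

Both candidates separate the two groups, but the second one leaves a larger margin (about 1.80 versus 1.77), so it is the one a maximum-margin method would prefer out of these two.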

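To add, SVM works with linearly separable data as well as data that cannot be separated by a straight line or a flat plane. For the latter, a kernel function implicitly maps the data into a higher-dimensional space in which a separating hyperplane can be found.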
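The intuition behind handling non-linear data can be sketched in base R with an explicit feature map. The values below are made up, and squaring the input is just one illustrative choice; a real kernel SVM (for example caret's "svmRadial" method) achieves a similar effect implicitly instead of building the new column by hand.

```r
# Made-up 1-D data: the +1 group sits on both sides of the -1 group,
# so no single threshold on x can separate the two groups
x <- c(-3, -2, -1, 0, 1, 2, 3)
y <- c( 1,  1, -1, -1, -1, 1, 1)

# Explicit feature map (an illustrative choice): use x^2 as a new dimension
z <- x^2

# In the new dimension a simple threshold acts as the separating boundary
threshold <- 2.5
predicted <- ifelse(z > threshold, 1, -1)

print(data.frame(x = x, z = z, y = y, predicted = predicted))
print(all(predicted == y))
```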

Now that we have understood how SVM works, let us see it in action with a linear kernel.


Support Vector Machine (SVM) – A practical Approach

In this example, we will make use of the Bank Loan dataset, wherein we need to predict whether a customer is a loan defaulter or not.

You can find the dataset here!

At first, we load the dataset into the R environment using the read.csv() function. In this dataset, the target variable default is a categorical variable with two groups: 0 and 1.

Further, we convert the target variable default, along with the categorical predictor ed, to the factor type so that R treats them as class labels rather than numbers.

After performing the necessary conversions, we split the dataset into training and test data. For this, we make use of the createDataPartition() function from the caret library, keeping 80% of the rows for training.
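For a factor outcome, createDataPartition() samples within each class so that the class balance is preserved in both parts. The base R sketch below imitates that idea on made-up labels; it is an illustration only, not caret's actual implementation, and the names rows_by_class and train_idx are our own.

```r
set.seed(101)

# Made-up target with an 80/20 class imbalance
default <- factor(c(rep(0, 80), rep(1, 20)))

# Sample 80% of the row indices separately within each class
rows_by_class <- split(seq_along(default), default)
train_idx <- unlist(lapply(rows_by_class, function(idx) {
  sample(idx, size = round(0.8 * length(idx)))
}))

train_labels <- default[train_idx]
test_labels  <- default[-train_idx]

# Both parts keep the original class proportions
print(prop.table(table(train_labels)))
print(prop.table(table(test_labels)))
```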

Before splitting, we also make use of the dummies library to create dummy variables for the categorical predictor ed, so that every group of the category gets a separate column.
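To see what dummy encoding produces, here is a small sketch that uses base R's model.matrix() instead of the dummies library. The toy ed values are made up, and the column names it generates may differ from those created by dummy.data.frame().

```r
# Made-up education levels standing in for the 'ed' column
toy <- data.frame(ed = factor(c(1, 2, 3, 2, 1)))

# "- 1" drops the intercept so that every level gets its own 0/1 column
ed_dummies <- model.matrix(~ ed - 1, data = toy)

print(ed_dummies)
```

Each row has exactly one 1, in the column that corresponds to its original category.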

Example:

# Clear the workspace
rm(list = ls())

# Set the working directory (adjust the path to your own system)
setwd("D:/Edwisor_Project:Loan_Defaulter")
getwd()

# Load the dataset
dta = read.csv("bank-loan.csv", header = TRUE)

###################################### Exploratory Data Analysis ##################################################
# Understanding the data values of the dataset
str(dta)
# Understanding the data distribution of the dataset
summary(dta)
# Checking the dimensions of the dataset
dim(dta)

# Convert the categorical columns to factors
dta$ed = as.factor(dta$ed)
dta$default = as.factor(dta$default)

# Create one dummy (0/1) column per level of 'ed'
categorical_col = c('ed')
library(dummies)
data = dta
data = dummy.data.frame(data, categorical_col)
dim(data)

# Split the data into 80% training and 20% test rows
library(caret)
set.seed(101)
split = createDataPartition(data$default, p = 0.80, list = FALSE)
train_data = data[split,]
test_data = data[-split,]

# Train a linear SVM on centered and scaled predictors
svm_Linear <- train(default ~., data = train_data, method = "svmLinear",
                    preProcess = c("center", "scale"),
                    tuneLength = 10)

print(svm_Linear)

Having split the data, we train the SVM on the training dataset. We make use of the train() function from caret and pass ‘svmLinear’ as the value for ‘method’.

Further, we center and scale the predictors through the ‘preProcess’ parameter.
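Centering and scaling means subtracting each column's mean and dividing by its standard deviation, so that predictors measured on very different scales contribute comparably. A minimal base R sketch on made-up values (the column names age and income are only for illustration):

```r
# Made-up predictors on very different scales
toy <- data.frame(age    = c(25, 32, 47, 51, 38),
                  income = c(20000, 35000, 120000, 64000, 41000))

# scale() subtracts each column's mean and divides by its standard deviation
scaled <- scale(toy, center = TRUE, scale = TRUE)

# After the transformation every column has mean 0 and standard deviation 1
print(round(colMeans(scaled), 10))
print(apply(scaled, 2, sd))
```

This matters for SVM in particular because the margin is based on distances, which would otherwise be dominated by the predictor with the largest numeric range.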

Output:

Support Vector Machines with Linear Kernel 

563 samples
 12 predictor
  2 classes: '0', '1' 

Pre-processing: centered (12), scaled (12) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 563, 563, 563, 563, 563, 563, ... 
Resampling results:

  Accuracy   Kappa     
  0.7905796  0.03640858

As a result, we see that the model was trained on 563 samples with 12 predictors and two target classes. Further, the accuracy estimated over 25 bootstrap resamples is around 79%, i.e. 0.79. Note, however, that the Kappa value of about 0.04 is close to zero, which suggests the model agrees with the true labels only slightly better than chance, so the accuracy figure alone should be read with caution.

On a broader perspective, we can even validate the model by applying it to the test dataset!
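Validating the model on the test dataset, as mentioned above, comes down to comparing predicted labels with the actual ones. The sketch below uses made-up label vectors so that it runs on its own; with the fitted model, the predictions would instead come from predict(svm_Linear, newdata = test_data), and caret's confusionMatrix() would report these figures along with further statistics.

```r
# Made-up labels for illustration. With the fitted model one would use:
#   predicted <- predict(svm_Linear, newdata = test_data)
#   actual    <- test_data$default
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0), levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are the true classes
conf_mat <- table(Predicted = predicted, Actual = actual)
print(conf_mat)

# Accuracy = share of correctly classified rows
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

Looking at the individual cells, and not only at the accuracy, shows how many actual defaulters the model misses, which is usually the more important question for a loan dataset.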

Conclusion

By this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

Try implementing Support Vector Machine on other datasets and do let us know your understanding of it in the comment section.

Till then, Happy Learning!! 🙂
