Python Catboost Classifier module – Fast performance ML model

Hello, readers! In this article, we will focus on the Python Catboost Classifier module in detail.

So, let us begin!! 🙂


Python Catboost Classifier module – Crisp Overview

Python, being a multi-purpose programming language, provides us with various functions and modules that we can use to shape our data into the form we need.

When we think in terms of data science and machine learning, Python offers us various modules that implement machine learning algorithms and give us reliable results. Apart from the algorithms themselves, it also offers various techniques to prepare the data for modeling and visualization.

Within Machine Learning, we deal with both numeric (continuous) and categorical data values. With categorical values in particular, we often need to convert them into a numeric format before a model can use them. This task can be tedious, because the set of categories can be large and varies from dataset to dataset.
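To make the manual conversion concrete, here is a minimal sketch of one common approach, one-hot encoding with pandas. The column names and values are made up for illustration and are not taken from the dataset used later in this article.

```python
import pandas as pd

# Hypothetical toy data: 'season' is a categorical (text) column.
df = pd.DataFrame({
    "season": ["spring", "summer", "fall", "summer"],
    "temp": [0.34, 0.71, 0.52, 0.68],
})

# One-hot encoding: every distinct category becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["season"], dtype=int)
print(encoded)
```

Every new category adds another column, which is why this step becomes tedious on columns with many distinct values.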

To address this problem, we will now have an introduction to the Python Catboost module.

Catboost is a powerful, scalable, and robust machine learning model that delivers high performance by combining gradient boosting with decision trees. Moreover, it works with both categorical and continuous data values.

Coming to the categorical values, Catboost Classifier removes much of the overhead of translating categorical data into numeric form before the model is built. It handles the categorical features itself, so we do not have to encode them by hand.

Having understood what Catboost Classifier is, let us try to implement it.


Implementation of Catboost Classifier model on a dataset

To have a better understanding of how the model works, we will apply the Catboost Classifier to the dataset below (link attached).

Bike Rental Count dataset

Step 1 :: Load the dataset into the working environment.

At first, we load the dataset into the environment. We also import the necessary libraries, such as pandas, NumPy, and CatBoostClassifier.

Step 2 :: Having loaded the dataset, we pre-process the data by performing missing value analysis and outlier analysis on it. This way, we make our data ready for modelling.

Step 3 :: The next step is to split the entire dataset into training and test datasets. Here we make use of the train_test_split() function with a ratio of 80:20.

Step 4 :: Modelling – Here we apply the Catboost Classifier model to the training data with iterations=100. Further, we use the fit() function to fit the built model to our training data.

Having done that, we make predictions on the test data using the predict() function.
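Before the full example, Steps 2 and 3 can be tried on their own with a small sketch. The toy values below are made up, and unlike the full example (which replaces outliers with NaN), this sketch simply drops the outlier rows to keep things short.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; 'hum' contains one obvious outlier (0.05).
df = pd.DataFrame({
    "hum": [0.60, 0.62, 0.58, 0.61, 0.59, 0.63, 0.57, 0.60, 0.05, 0.64],
    "cnt": [10, 12, 9, 11, 10, 13, 8, 10, 2, 14],
})

# Step 2: compute the IQR fences for the column.
q75, q25 = np.percentile(df["hum"], [75, 25])
iqr = q75 - q25
lower, upper = q25 - 1.5 * iqr, q75 + 1.5 * iqr

# Keep only the rows that fall inside the fences.
clean = df[df["hum"].between(lower, upper)]

# Step 3: 80:20 split of features and target.
X_train, X_test, y_train, y_test = train_test_split(
    clean.drop(columns=["cnt"]), clean["cnt"], test_size=0.20, random_state=0
)
print(len(clean), len(X_train), len(X_test))
```

Setting random_state makes the split reproducible, so repeated runs give the same training and test rows.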

Example:

import numpy as np
import pandas
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

BIKE = pandas.read_csv("day.csv")
BIKE.dtypes
BIKE.isnull().sum() # missing value analysis

# Outlier analysis: values outside the 1.5*IQR fences are replaced with NaN.
for x in ['hum','windspeed']:
    q75,q25 = np.percentile(BIKE.loc[:,x],[75,25])
    intr_qr = q75-q25
    upper = q75+(1.5*intr_qr)  # named upper/lower so the built-in max/min are not shadowed
    lower = q25-(1.5*intr_qr)
    BIKE.loc[BIKE[x] < lower,x] = np.nan
    BIKE.loc[BIKE[x] > upper,x] = np.nan

# Separating the dependent and independent data variables into two dataframes.
X = BIKE.drop(['cnt'],axis=1)
Y = BIKE['cnt']

# Splitting the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)

# task_type="GPU" needs a CUDA-capable GPU; remove it to train on the CPU.
# If the file has text columns (for example a date column), drop them
# or pass their names through cat_features in fit().
model = CatBoostClassifier(iterations=100,task_type="GPU")
model.fit(X_train, Y_train,verbose=False)
C_predict = model.predict(X_test)

Output:

Catboost — output

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

For more such posts related to Python programming, stay tuned with us.

Till then, Happy Learning!! 🙂
