Studying Netflix Subscription Dataset in Python


Hey fellow coder! Today we are going to look at a dataset from a very popular movie streaming platform, Netflix. For each country in the dataset, it lists the size of the Netflix library (TV shows and movies) and the monthly subscription costs.

Let’s start off by understanding the dataset.

Also read: Sentiment Analysis on Animal Crossing Game Dataset using Python


Netflix Subscription Dataset Description

You can download the dataset from Kaggle. It contains the following attributes:

  1. Country: The countries where Netflix is used.
  2. Total Library Size: Total number of movies and TV series available in the country.
  3. No. of TV Shows: Total number of TV series available in the country.
  4. No. of Movies: Total number of movies available in the country.
  5. Cost Per Month – Basic: The monthly price of the “basic package”.
  6. Cost Per Month – Standard: The monthly price of the “standard package”.
  7. Cost Per Month – Premium: The monthly price of the “premium package”.
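
If Total Library Size is meant to be the sum of the other two counts, as the descriptions above suggest, a quick consistency check is possible. The snippet below is only a sketch: the rows are made-up placeholder values, not taken from the Kaggle file, and the column names are assumed to match the ones used later in this tutorial.

```python
import pandas as pd

# Toy rows with made-up values; the real data comes from the Kaggle CSV.
sample = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Total Library Size": [5000, 6200, 4800],
    "No. of TV Shows": [3300, 4100, 3000],
    "No. of Movies": [1700, 2100, 1750],
})

# Flag rows where TV shows + movies does not add up to the library size.
computed_total = sample["No. of TV Shows"] + sample["No. of Movies"]
mismatch = sample[computed_total != sample["Total Library Size"]]

print(mismatch[["Country", "Total Library Size"]])
```

Running the same comparison on the real `data` frame would show whether the three columns agree for every country.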

Code Implementation for Netflix Subscription Data Study

Let’s now get into studying the dataset for Netflix subscriptions using Python.

Importing Libraries

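import numpy as np               # numerical helpers
import pandas as pd              # loading and handling the dataset
import os                        # operating system utilities
import seaborn as sns            # statistical plotting
import matplotlib.pyplot as plt  # static plots (histograms, line plot)
import plotly.express as px      # interactive plots (bar, pie, scatter)
import pandas_profiling          # automated dataset reports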

Loading Dataset

The dataset is a CSV file: each line holds one row of data, and the values in a row are separated by commas, one per column. pandas makes reading this format simple, so we load the dataset with the code below.

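# 'netflix_subscription.csv' is a placeholder; use the name of the CSV file you downloaded from Kaggle
data = pd.read_csv('netflix_subscription.csv')
data.head()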

[Figure: first rows of the Netflix subscription dataset]
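
Before plotting, it can help to check the shape, the column types, and the missing values of the loaded frame. The sketch below reads a small inline CSV instead of the real file so that it runs on its own; the rows and the name `data_sample` are made up for illustration.

```python
import io
import pandas as pd

# Inline stand-in for the downloaded file; the values are made up.
csv_text = """Country,Total Library Size,Cost Per Month - Basic ($)
Country A,5000,7.99
Country B,6200,
Country C,4800,3.50
"""
data_sample = pd.read_csv(io.StringIO(csv_text))

print(data_sample.shape)         # (rows, columns)
print(data_sample.dtypes)        # column types inferred by pandas
print(data_sample.isna().sum())  # number of missing values per column
```

The same three calls work unchanged on the `data` frame loaded above.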

Visualizing some basic Histograms

We will plot histograms for some of the columns in the dataset using the code below. A histogram shows how the values of a column are distributed across their range.

plt.style.use('seaborn-v0_8')  # on Matplotlib older than 3.6, use 'seaborn' instead
plt.figure(figsize=(20,7),facecolor='w')

plt.subplot(1,3,1)
plt.hist(data['Total Library Size'],edgecolor='black',color='pink')
plt.xlabel("Size of the Library")
plt.ylabel("Distribution")
plt.title("Histogram for Library Size")

plt.subplot(1,3,2)
plt.hist(data['No. of TV Shows'],edgecolor='black',color="lightgreen")
plt.xlabel("No. of TV Shows")
plt.ylabel("Distribution")
plt.title("Histogram for No. of TV Shows")

plt.subplot(1,3,3)
plt.hist(data['No. of Movies'],edgecolor='black',color="cyan")
plt.xlabel("No. of Movies")
plt.ylabel("Distribution")
plt.title("Histogram for No. of Movies")

plt.show()

[Figure: histograms of library size, number of TV shows, and number of movies]
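
To see what a histogram actually counts, the bin edges and bin counts can be computed directly with `np.histogram`. This is a minimal sketch with made-up library sizes standing in for `data['Total Library Size']`; it uses 10 equal-width bins, which is also the default for `plt.hist`.

```python
import numpy as np

# Made-up library sizes standing in for data['Total Library Size'].
library_sizes = np.array([2300, 4100, 4800, 5000, 5200, 5900, 6200, 7300])

# Split the value range into 10 equal-width bins and count values per bin.
counts, bin_edges = np.histogram(library_sizes, bins=10)

# Print each bin's range next to the number of values that fall into it.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:7.0f} to {right:7.0f}: {count}")
```

Each bar in the plots above corresponds to one of these counts, so the bar heights always add up to the number of rows.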

Visualizing Monthly Subscription Cost of the countries

We can also visualize the subscription cost of the basic, standard, and premium packages of Netflix for all the countries in the dataset. In this tutorial, we visualize the basic monthly cost as a bar chart, a pie chart, and a scatter plot using the code below.

The resulting Plotly charts look good, and they are interactive as well, which is a plus!

fig = px.bar(data, x='Country', y='Cost Per Month - Basic ($)', color = "Cost Per Month - Basic ($)",
             title="Country vs Cost per Month")
fig.show()

[Figure: bar chart of basic monthly cost by country]

fig = px.pie(data, values='Cost Per Month - Basic ($)', names='Country',title = "Cost Per Month - Basic ($)")
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig.show()

[Figure: pie chart of basic monthly cost by country]

fig = px.scatter(data, x="Country", y="Cost Per Month - Basic ($)",title = "Cost Per Month - Basic ($)")
fig.show()

[Figure: scatter plot of basic monthly cost by country]
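
The charts give a visual ranking, and the same ranking can be read off as a table. The sketch below uses `nsmallest` and `nlargest` on a toy frame; the countries and prices are placeholders, not values from the dataset.

```python
import pandas as pd

# Toy frame with made-up prices; swap in the real `data` frame to rank actual countries.
sample = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "Cost Per Month - Basic ($)": [7.99, 3.50, 12.88, 9.03],
})
basic = "Cost Per Month - Basic ($)"

cheapest = sample.nsmallest(2, basic)  # two lowest basic prices
priciest = sample.nlargest(2, basic)   # two highest basic prices

print(cheapest)
print(priciest)
```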

All Subscription costs in one plot

Next, we can plot the costs of all three subscription types (Basic, Standard, and Premium) for every country in a single plot using the code below.

plt.figure(figsize=(20,10),facecolor='w')
plt.plot(data["Country"],data["Cost Per Month - Basic ($)"],color="maroon",label="Basic Subscription")
plt.plot(data["Country"],data["Cost Per Month - Standard ($)"],color="darkblue",label="Standard Subscription")
plt.plot(data["Country"],data["Cost Per Month - Premium ($)"],color="orchid",label="Premium Subscription")
plt.xticks(rotation=90)
plt.title("All Subscription Costs in Various Countries",size=14)
plt.legend(title = "Subscription Type")
plt.show()

[Figure: basic, standard, and premium subscription costs by country]
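
An alternative to calling `plt.plot` once per column is to reshape the three cost columns into a single long-format table with `melt`. This is only a sketch on made-up prices; the names `long_form`, `Plan`, and `Monthly Cost ($)` are chosen here for illustration.

```python
import pandas as pd

# Toy frame with made-up prices in the same column layout as the dataset.
sample = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "Cost Per Month - Basic ($)": [7.99, 3.50],
    "Cost Per Month - Standard ($)": [10.99, 5.00],
    "Cost Per Month - Premium ($)": [13.99, 6.50],
})

# One row per (country, plan) pair instead of one column per plan.
long_form = sample.melt(id_vars="Country", var_name="Plan", value_name="Monthly Cost ($)")

# Shorten "Cost Per Month - Basic ($)" to "Basic", and so on.
long_form["Plan"] = (
    long_form["Plan"]
    .str.replace("Cost Per Month - ", "", regex=False)
    .str.replace(" ($)", "", regex=False)
)

print(long_form)
```

A long-format table like this can be passed to `px.line` with `color="Plan"` to draw all three plans in one interactive chart.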

Conclusion

Congratulations! This tutorial covered some basic visualizations of the Netflix subscription dataset available on Kaggle. I hope you learned a lot from it and will be able to apply the same code snippets to other datasets as well.

Thank you for reading!

If you like reading such tutorials, here are some similar tutorials you will surely enjoy:

  1. How to Parse CSV Files in Python
  2. JSON to CSV: Export a JSON file to a CSV file using Python
