Data Slicing Using Python Pandas – A Complete Guide


Pandas is the go-to Python library for data manipulation and analysis. Raw data rarely yields insights on its own, so as a data analyst or scientist you have to reshape it to uncover hidden patterns. A common way to do that is subsetting, also called data slicing: you work with only the part of the data you are interested in rather than the whole dataset. Today, let’s discuss what data slicing is and how to do it with pandas.

Read more: Crosstab Using Pandas For Data Summarization – In Detail


Data Slicing Using Python Pandas

In this tutorial, we will work with the coffee sales dataset, which is fairly large (4,248 rows) and has a real-world flavor. Let’s load the data using the read_csv() function in pandas.

#data

import pandas as pd
data = pd.read_csv('coffeesales.csv')
data.head(5)

Well, our data is ready to be sliced and diced!
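If you do not have the CSV file at hand, the loading step can be sketched on a tiny in-memory sample. The column names below come from the dataset used in this tutorial, but the three rows are hypothetical values made up for illustration, and parsing order_date as a date is an assumption about how you may want to treat that column.

```python
import io
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial's dataset.
csv_text = """order_date,market,region,product_category,product,cost,inventory,net_profit,sales
2012-10-01,Major Market,Central,Coffee,Amaretto,89,777,94,219
2012-10-01,Major Market,Central,Coffee,Columbian,83,623,68,190
2012-10-02,Small Market,West,Tea,Green Tea,95,821,101,234
"""

# io.StringIO stands in for the file path 'coffeesales.csv'.
# parse_dates asks read_csv to convert order_date to a datetime column.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # data type of each column
print(sample.head(2))  # first two rows
```

Checking shape and dtypes right after loading is a cheap way to confirm the file was read the way you expect before you start slicing.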


1. Pandas Series

We will first work on the pandas series. Let’s create a simple series and then we will see how we can extract the data from the series.

#series

my_series = pd.Series([11,22,33,44,55,66,77,88,99,0])
my_series

0 11
1 22
2 33
3 44
4 55
5 66
6 77
7 88
8 99
9 0
dtype: int64

This is our simple pandas series. Because we did not pass an index, pandas labelled the values 0 through 9, and we can now pull out individual values by those index labels.

#index slicing 

my_series[5]

66

#index slicing

my_series[1]

22

#index slicing 

my_series[9]

0

That’s it. You can extract a value by putting its index label inside the square brackets.
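Plain square brackets work here because the labels happen to match the positions. As a minimal sketch, the explicit loc (label-based) and iloc (position-based) accessors make the difference visible. The re-labelled copy called shifted is a hypothetical example that is not part of the original tutorial.

```python
import pandas as pd

my_series = pd.Series([11, 22, 33, 44, 55, 66, 77, 88, 99, 0])

# With the default 0..9 index, label 5 and position 5 point at the same value.
by_label = my_series.loc[5]
by_position = my_series.iloc[5]
print(by_label, by_position)

# A positional slice stops before the end position (2, 3, 4 here).
middle = my_series.iloc[2:5]
print(middle.tolist())

# Hypothetical re-labelled copy: labels run 100..109, so labels and positions differ.
shifted = my_series.copy()
shifted.index = range(100, 110)
print(shifted.loc[105])   # value stored under label 105
print(shifted.iloc[5])    # value stored at position 5
```

Using loc and iloc explicitly avoids ambiguity when a series has integer labels that are not simply 0, 1, 2, and so on.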

Now, let’s create a pandas series with a defined index.

#series with index

dummy = pd.Series([89,78,60,71,90],index = ['Josh','Sam','Reece','Kay','Jade'])
dummy
Josh     89
Sam      78
Reece    60
Kay      71
Jade     90
dtype: int64

It looks good. Let’s slice the data based on this defined index.

#indexed slicing 

dummy['Josh']

89

#indexed slicing

dummy['Kay']

71

#indexed slicing 

dummy['Jade']

90

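You got it right. You can also slice a range of labels. Unlike positional slicing, a label slice includes both endpoints, so 'Kay' is part of the result.

#indexed slicing with a label range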

dummy['Josh':'Kay']
Josh     89
Sam      78
Reece    60
Kay      71
dtype: int64
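To make the endpoint behaviour concrete, here is a small sketch that compares a label slice with a positional slice on the same series. It reuses the dummy series from above. Picking 'Jade' and 'Sam' with a list of labels is an extra illustration that is not in the original walkthrough.

```python
import pandas as pd

dummy = pd.Series([89, 78, 60, 71, 90],
                  index=['Josh', 'Sam', 'Reece', 'Kay', 'Jade'])

# Label slice: both 'Josh' and 'Kay' are included.
label_slice = dummy.loc['Josh':'Kay']
print(label_slice.index.tolist())

# Positional slice: position 3 ('Kay') is excluded.
position_slice = dummy.iloc[0:3]
print(position_slice.index.tolist())

# A list of labels returns those entries in the order you ask for them.
picked = dummy.loc[['Jade', 'Sam']]
print(picked.tolist())
```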

That’s all about extracting the data from the pandas series. In the next phase, we will be working with pandas data frames.


2. Pandas Dataframe

A pandas data frame is a 2-D data structure whose columns can hold different data types. It is just like a spreadsheet or a SQL table.

It consists of rows and columns, both of which are indexed. This helps us get exactly the data we need for our analysis. We have already loaded the data (coffeesales), so it is ready to work on.

To start, let’s look at the features (columns) present in the data.

#features

data.columns
Index(['order_date', 'market', 'region', 'product_category', 'product', 'cost',
       'inventory', 'net_profit', 'sales'],
      dtype='object')

We can quickly check for null values.

#null values

data.isnull().sum()
order_date          0
market              0
region              0
product_category    0
product             0
cost                0
inventory           0
net_profit          0
sales               0
dtype: int64

Perfect! We don’t have any null values in our dataset. Let’s move to the slicing part.
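The coffee sales data has no gaps, so the check above returns all zeros. As a rough sketch of what the same check looks like when values are missing, the frame below (named toy, with made-up values) has two gaps. The column names are borrowed from the dataset, and everything else is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mini frame with one missing product and one missing sales value.
toy = pd.DataFrame({
    'product': ['Amaretto', 'Mint', None],
    'sales': [219.0, np.nan, 234.0],
    'cost': [89, 83, 95],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Slice out only the rows that contain at least one missing value.
rows_with_gaps = toy[toy.isnull().any(axis=1)]
print(rows_with_gaps)
```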

Now, we can slice the data as we want. Let’s pull up the region values from the data and see how it works.

#region

data['region']
0       Central
1       Central
2       Central
3       Central
4       Central
         ...   
4243       West
4244       West
4245       West
4246       West
4247       West
Name: region, Length: 4248, dtype: object

By now you should be getting an idea of how to slice and dice. In the next step, we will extract multiple columns in the order we need, which can differ from the column order in the raw data.

#multiple features

data[['product','sales','net_profit','region']]

I hope you got the idea now. The columns start with product, followed by its sales, net profit, and region. Passing a list of column names returns a data frame with exactly those columns, in that order, which is often easier to read than the raw layout.
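One detail worth seeing in code is the difference between single and double square brackets. The sketch below uses a hypothetical three-row frame named toy. The column names match the dataset, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the coffee sales data.
toy = pd.DataFrame({
    'region': ['Central', 'West', 'East'],
    'product': ['Amaretto', 'Mint', 'Green Tea'],
    'sales': [219, 140, 234],
    'net_profit': [94, 54, 101],
})

one_column = toy['region']            # single brackets: a Series
one_column_frame = toy[['region']]    # double brackets: a one-column DataFrame
reordered = toy[['product', 'sales', 'net_profit', 'region']]

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(reordered.columns.tolist())
```

The inner brackets are just a Python list of column names, so you can build that list elsewhere and pass it in as a variable.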

If you are much interested in the region of the sales, then you can set the index to the region and then slice the data based on that for better insights.

Slicing the Dataframe

#value counts

data['region'].value_counts()
Central    1344
West       1344
East        888
South       672
Name: region, dtype: int64

Well, we have 4 regions, and most of the records come from the Central and West regions. Now, we want to see only the data for the Central region. For this, we set region as the index and use the pandas loc indexer to select the rows with that label.

#region data

df = data.set_index('region')
df

df.loc[['Central']]

The above returned only the data associated with the central region.
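Setting the index is one way to do this. A boolean mask is a common alternative that leaves the original index untouched. The following sketch shows both on a hypothetical four-row frame named toy. The mask and isin() approaches are not used in the original walkthrough and are shown only for comparison.

```python
import pandas as pd

# Hypothetical rows for illustration.
toy = pd.DataFrame({
    'region': ['Central', 'West', 'Central', 'East'],
    'product': ['Amaretto', 'Mint', 'Columbian', 'Green Tea'],
    'sales': [219, 140, 190, 234],
})

# Approach from the tutorial: make region the index, then select by label.
by_index = toy.set_index('region').loc[['Central']]

# Alternative: keep the default index and filter with a boolean mask.
by_mask = toy[toy['region'] == 'Central']

print(by_index['sales'].tolist())
print(by_mask['sales'].tolist())

# isin() extends the mask to several regions at once.
central_or_west = toy[toy['region'].isin(['Central', 'West'])]
print(len(central_or_west))
```

Both approaches select the same rows. The index-based version is convenient when you will look up the same key many times, while the mask is handy for one-off filters.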

#region

df.loc[['Central'], 'product':'sales']

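Wow! This tells a much more interesting story. Here loc takes the row selection first and the column selection second, and the column slice 'product':'sales' keeps every column from product through sales. I hope by now you understand how to listen to the story in your data using slicing.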
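As a small sketch of where such a slice can lead, the example below selects rows and a column range with loc and then totals the sales for the selected region. The frame named toy and its values are hypothetical, and summing sales is just one possible follow-up that the original does not show.

```python
import pandas as pd

# Hypothetical rows; column names follow the coffee sales data.
toy = pd.DataFrame({
    'region': ['Central', 'West', 'Central', 'East'],
    'market': ['Major Market', 'Small Market', 'Major Market', 'Small Market'],
    'product': ['Amaretto', 'Mint', 'Columbian', 'Green Tea'],
    'cost': [89, 60, 83, 95],
    'net_profit': [94, 54, 68, 101],
    'sales': [219, 140, 190, 234],
})

df = toy.set_index('region')

# Rows labelled 'Central', columns from 'product' through 'sales' (both included).
central = df.loc[['Central'], 'product':'sales']
print(central.columns.tolist())

# A simple follow-up question: total sales in the selected region.
central_sales_total = central['sales'].sum()
print(central_sales_total)
```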


Wrapping Up – Data slicing

Data slicing is a handy way to narrow the data down to the part that matters for your analysis. We have covered these methods for both the pandas series and the data frame. Pandas offers many functions that help with slicing, such as loc, set_index(), and value_counts(), as shown in this tutorial.

I hope you will find this useful in your future assignments. That’s all for now. Happy Python!!!

Read more: Working with data using Pandas
