Pandas Pivot table in Python – A brief Introduction

Filed Under: Pandas
Pandas Pivot Table In Python

For any dataset, if you want to extract the relationships, you will generally use the groupby() function. But today, we will be focusing on the Pandas Pivot table, which you commonly see on spreadsheets that deal with tabular data. 

In short, a Pandas pivot table takes column data as input and groups the entries, and produces a multidimensional summary. 

A bit about Pandas Pivot Tables

  • Pandas pivot tables are easy and flexible to use.
  • It has simple syntax.
  • Gives ability to summarize the data quickly.
  • Provides quick analysis.
  • Much useful in report creation.
  • You can use pivot_table() method for creating pivot tables.


Pivoting data using Groupby function

Well, we can start by creating data summarizations using the Groupby function. It is very similar to the pivot_table() function but fails in multidimensional aggregation.

I will be using a titanic dataset for this purpose. I hope you have installed and loaded pandas into python. If not, run the below code which installs and loads the pandas.

#Install and load pandas

pip install pandas
import pandas as pd

Perfect! Let’s load the dataset into python. You can download the dataset here.

#Load data

import pandas as pd
data = pd.read_csv('titanic.csv')
Titanic dataset

The data is ready for action. Using groupby function we can summarize the data in a meaningful way.

#Group data by 'sex' and aggregation function as sum

female	     233
male	     109
#Group data by 'sex' and aggregation function as mean

female	  0.742038
male	  0.188908

The grouping of data is very useful as it will throw some insights immediately. In the above results,

  • The survival rate of Females is more (every 3 out of 4 females survived).
  • But, the survival rate of Males is less (every 1 out of 5 males survived).

Let’s spice up this analysis a bit. We can try grouping data by Pclass and Sex against Survived as shown below. On top of that, we can try using different aggregations as well. 

#Group by multidimensionality 

data.groupby(['Sex', 'Pclass'])['Survived'].mean()
  S      Pclass
female    1         0.968085
          2         0.921053
          3         0.500000
male      1         0.368852
          2         0.157407
          3         0.135447

Name: Survived, dtype: float64

That’s amazing. We got what we have expected. But, wait! Let’s unstack this output for readability.

#Unstacking the output

data.groupby(['Sex', 'Pclass'])['Survived'].mean().unstack()
Pclass	    1	        2	        3
female	0.968085	0.921053	0.500000
male	0.368852	0.157407	0.135447

Now, the output looks great! That’s it. Pivoting the data using the Groupby function is much easy and offers immediate insights over data.

Pandas Pivot Tables – Pivot_table()

The pivot_table function is something that is dedicated to creating pivot tables. Not only a dedicated function, but also it brings more scalability for pivoting the data.

Let’s see how it works.


data.pivot_table('Survived', 'Sex', 'Pclass')


data.pivot_table('Survived', index='Sex', columns = 'Pclass')

You can use either code. I have just added more readability. Writing readable code is much important though 😛

Pclass	    1	        2	       3
female	0.968085	0.921053	0.500000
male	0.368852	0.157407	0.135447

The one thing which always pushes me to use pivot_table instead of groupby is the simplicity and readability of code. Still produce the same result!

Multilevel Pivoting of data

You know what, with the pivot_table function, you can add multiple options or levels in your quick analysis. Suppose you are interested in adding the attribute ‘age’ as another level, yes, you are free to do so.

#Multi-level pivoting 

attribute_age = pd.cut(data['Age'],[0,18,60])

data.pivot_table('Survived', ['Sex',attribute_age], 'Pclass')
Pclass	               1	            2	         3
Sex	Age			
female	(0, 18]	    0.909091	    1.000000	  0.511628
        (18, 60]	0.972222	    0.900000	  0.413793
male	(0, 18]	    0.800000	    0.600000	  0.215686
        (18, 60]    0.416667	    0.061728	  0.136364

Here, I have used pd.cut function to bin the ‘age’ data. I have added multiple levels. You can see 1-18 as minors and 18-60 are majors.

Just like this, you can always add multiple levels using pivot_table.

Pandas Pivot Table – Aggfunc

In pivot_table the aggregation function will me mean by default. Whereas in group by, you have to pass a string parameter for that.

Here we can use Aggfunc to contain the inputs.

#Using aggfunc

data.pivot_table(index='Sex', columns='Pclass',
 aggfunc={'Survived':sum, 'Fare':'mean'})
	                   Fare	                 Survived
Pclass     	1	        2	        3	    1	2	3
female	106.125798	21.970121	16.118810	91	70	72
male	67.226127	19.741782	12.661633	45	17	47

Awesome right!

Pandas Pivot table – Margins()

The margins parameter is used to add all the rows and columns in the pivoting table. You can also call it as grand totaling.

Let’s see how it works.

#Adding margins parameter

data.pivot_table('Survived', index='Sex', columns='Pclass', margins=True)
Pclass	    1	        2	       3	      All
female	0.968085	0.921053	0.500000	0.742038
male	0.368852	0.157407	0.135447	0.188908
All	    0.629630	0.472826	0.242363	0.383838

This is what we got and it’s of course perfect!

  • With this output, we can say that the overall survival rate is 38%.
  • Over all female survival rate is 74%.
  • Finally, overall male survival rate is 18%.

It’s fascinating that without any fancy libraries and robust analysis, just with the help of pivot tables, we have found something which worth more.

Ending note

I found a function names pivot_table and was excited to share this cool function with you all. I hope this can save you valuable time and with minimum code, you can get awesome insights.

That’s all for now. Happy Python 🙂

More read: pivot_table() – pandas documentation

Generic selectors
Exact matches only
Search in title
Search in content