Table of Contents
What is a Pivot Table?
A pivot table is a table of statistics that summarizes the data of a more extensive table. The summary of data is reached through various aggregate functions – sum, average, min, max, etc.
A pivot table is a data processing technique to derive useful information from a table.
Pandas pivot_table() function
Pandas pivot_table() function is used to create pivot table from a DataFrame object. We can generate useful information from the DataFrame rows and columns. The pivot_table() function syntax is:
def pivot_table(
data,
values=None,
index=None,
columns=None,
aggfunc="mean",
fill_value=None,
margins=False,
dropna=True,
margins_name="All",
observed=False,
)
- data: the DataFrame instance from which pivot table is created.
- values: column to aggregate.
- index: the column to group by on the pivot table index.
- columns: the column to group by on the pivot table column.
- aggfunc: the aggregate function to run on the data, default is numpy.mean
- fill_value: value to replace null or missing value in the pivot table.
- margins: add all rows/columns. It’s useful in generating grand total of the records.
- dropna: don’t include columns whose entries are all NaN.
- margins_name: Name of the row / column that will contain the totals when margins is True.
- observed: This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Pandas Pivot Table Examples
It’s better to use real-life data to understand the actual benefit of pivot tables. I have downloaded a sample CSV file from this link. Here is the direct download link for the CSV file.
The CSV file is a listing of 1,460 company funding records reported by TechCrunch. The below image shows the sample data from the file.
We are interested in the columns – ‘company’, ‘city’, ‘state’, ‘raisedAmt’, and ’round’. Let’s create some pivot tables to generate useful statistics from this data.
1. Simple Pivot Table Example
Let’s try to create a pivot table for average funding by the state.
import pandas as pd
import numpy as np
df = pd.read_csv('TechCrunchcontinentalUSA.csv', usecols=['company', 'city', 'state', 'raisedAmt', 'round'])
print('DataFrame Records:\n', df.head(6))
# average funding by State
df1 = pd.pivot_table(df, values='raisedAmt', columns='state')
print('\nAverage Funding by State:\n', df1)
Output:
DataFrame Records:
company city state raisedAmt round
0 LifeLock Tempe AZ 6850000 b
1 LifeLock Tempe AZ 6000000 a
2 LifeLock Tempe AZ 25000000 c
3 MyCityFaces Scottsdale AZ 50000 seed
4 Flypaper Phoenix AZ 3000000 a
5 Infusionsoft Gilbert AZ 9000000 a
Average Funding by State:
state AZ CA ... VA WA
raisedAmt 5613750.0 1.072324e+07 ... 1.158261e+07 8.140103e+06
[1 rows x 33 columns]
We can also call pivot_table() function directly on the DataFrame object. The above pivot table can be generated using the below code snippet too.
df1 = df.pivot_table(values='raisedAmt', columns='state')
2. Pivot Table with Agreegate Function
The default aggregate function is numpy.mean
. We can specify the aggregate function as numpy.sum
to generate the total funding by the state.
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', aggfunc=np.sum)
print('\nTotal Funding by State:\n', df1)
Output:
Total Funding by State:
state AZ CA CO ... UT VA WA
raisedAmt 50523750 9361385000 126470000 ... 153080000 266400000 789590000
[1 rows x 33 columns]
3. Total Funding by Company
df1 = pd.pivot_table(df, values='raisedAmt', columns='company', aggfunc=np.sum)
print('\nTotal Funding by Company:\n', df1)
Output:
Total Funding by Company:
company 23andMe 3Jam 4HomeMedia ... vbs tv x+1 xkoto
raisedAmt 9000000 4000000 2850000 ... 10000000 16000000 7500000
[1 rows x 909 columns]
4. Setting Index Column in the Pivot Table
Let’s try to create a pivot table for the average funding by round grouped by the state. The trick is to generate a pivot table with ’round’ as the index column.
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index='round')
print('\nAverage Funding by round in State:\n', df1)
Output:
Average Funding by round in State:
state AZ CA ... VA WA
round ...
a 6.000000e+06 7.158314e+06 ... 9910000.0 6.570476e+06
angel 2.337500e+05 1.006784e+06 ... NaN 8.935714e+05
b 6.850000e+06 1.238483e+07 ... 9850000.0 1.187826e+07
c 2.500000e+07 2.369708e+07 ... 19500000.0 1.592222e+07
d NaN 3.012188e+07 ... 20000000.0 8.500000e+06
debt_round NaN 1.660833e+07 ... NaN NaN
e NaN 3.132500e+07 ... NaN 2.200000e+07
seed 1.466667e+05 8.778214e+05 ... 350000.0 7.800000e+05
unattributed NaN 1.933000e+07 ... NaN 2.050000e+07
[9 rows x 33 columns]
5. Replacing Null Values with a default value
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index='round', aggfunc=np.sum, fill_value=0)
print('\nTotal Funding by round in State:\n', df1)
Output:
Total Funding by round in State:
state AZ CA CO ... UT VA WA
round ...
a 18000000 2526885000 25650000 ... 31800000 99100000 275960000
angel 233750 74502000 3950000 ... 0 0 12510000
b 6850000 2898050000 66900000 ... 67200000 68950000 273200000
c 25000000 2109040000 28850000 ... 54000000 78000000 143300000
d 0 963900000 0 ... 0 20000000 17000000
debt_round 0 199300000 500000 ... 0 0 0
e 0 250600000 0 ... 0 0 44000000
seed 440000 49158000 620000 ... 80000 350000 3120000
unattributed 0 289950000 0 ... 0 0 20500000
[9 rows x 33 columns]
5. Multiple Index Columns Pivot Table Example
Let’s look at a more complex example. We will create a pivot table of total funding per company per round, state wise.
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index=['company', 'round'], aggfunc=np.sum, fill_value=0)
print('\nTotal Funding by company and round in State:\n', df1)
Output:
Total Funding by round in State:
state AZ CA CO CT DC FL GA ... PA RI TN TX UT VA WA
company round ...
23andMe a 0 9000000 0 0 0 0 0 ... 0 0 0 0 0 0 0
3Jam a 0 4000000 0 0 0 0 0 ... 0 0 0 0 0 0 0
4HomeMedia a 0 2850000 0 0 0 0 0 ... 0 0 0 0 0 0 0
5min a 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
angel 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
... .. ... .. .. .. .. .. ... .. .. .. .. .. .. ..
uber b 0 7600000 0 0 0 0 0 ... 0 0 0 0 0 0 0
utoopia seed 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
vbs tv seed 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
x+1 a 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
xkoto b 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
[1405 rows x 33 columns]