Pivot Tables in Pandas, Python

Pivot tables

ProTip: Refer to Pandas User Guide

While pivot() provides general purpose pivoting with various data types (strings, numerics, etc.), pandas also provides pivot_table() for pivoting with aggregation of numeric data.

The function pivot_table() can be used to create spreadsheet-style pivot tables. See the cookbook for some advanced strategies.

It takes a number of arguments:

data: a DataFrame object.
values: a column or a list of columns to aggregate.
index: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.
columns: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.
aggfunc: function to use for aggregation, defaulting to numpy.mean.

Example

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline 

data_url = 'https://s3.amazonaws.com/tripdata/201907-citibike-tripdata.csv.zip'

data = pd.read_csv(data_url)

data.head()

	tripduration	starttime	stoptime	start station id	start station name	start station latitude	start station longitude	end station id	end station name	end station latitude	end station longitude	bikeid	usertype	birth year	gender
0	897	2019-07-01 00:00:00.1320	2019-07-01 00:14:58.0040	493.0	W 45 St & 6 Ave	40.756800	-73.982912	454.0	E 51 St & 1 Ave	40.754557	-73.965930	18340	Subscriber	1966	1
1	267	2019-07-01 00:00:05.1780	2019-07-01 00:04:32.4500	3143.0	5 Ave & E 78 St	40.776321	-73.964274	3226.0	W 82 St & Central Park West	40.782750	-73.971370	21458	Customer	1996	1
2	2201	2019-07-01 00:00:05.2130	2019-07-01 00:36:46.7490	317.0	E 6 St & Avenue B	40.724537	-73.981854	3469.0	India St & West St	40.731814	-73.959950	39874	Subscriber	1986	1
3	1660	2019-07-01 00:00:08.6010	2019-07-01 00:27:48.8050	249.0	Harrison St & Hudson St	40.718710	-74.009001	369.0	Washington Pl & 6 Ave	40.732241	-74.000264	38865	Subscriber	1988	1
4	109	2019-07-01 00:00:12.1580	2019-07-01 00:02:01.5670	3552.0	W 113 St & Broadway	40.805973	-73.964928	3538.0	W 110 St & Amsterdam Ave	40.802692	-73.962950	30256	Subscriber	1997	1

data['starttime'] = pd.to_datetime(data['starttime'])
data.set_index('starttime', inplace=True)

Create a pivoted table with count of rides for every day

pivoted = data.pivot_table(values='bikeid', index=data.index.hour, columns=data.index.date, aggfunc=lambda x: len(x.unique()))

pivoted.iloc[:5, :5]

	2019-07-01	2019-07-02	2019-07-03	2019-07-04	2019-07-05
starttime
0	605	500	513	748	865
1	324	241	301	504	450
2	176	161	162	307	291
3	105	104	98	180	181
4	156	154	141	132	122

ax = pivoted.plot(legend=False, alpha=0.4, figsize=(20, 10), title='Hourly Ride counts');
ax.set_xlabel("Hour of the day")
ax.set_ylabel("Total rides");

png

Share on

Twitter Facebook LinkedIn

Pivot Tables in Pandas, Python

Pivot tables

Example

Create a pivoted table with count of rides for every day

Share on

You may also enjoy

NIFTY Index Performance

How do we do Agile in Data Science?

Enjoy Enjaami

Altair Stock Charts Example