Nov 21, 2021
import numpy as np
import pandas as pd
crosstab, pivot_table
titanic = pd.read_csv('https://raw.githubusercontent.com/shekhar270779/Learn_ML/main/datasets/Titanic.csv')
titanic.head()
Cross Tabulation / Contingency table
pd.crosstab(titanic.Survived, titanic.Pclass)
rows = titanic.Survived
cols = titanic.Pclass
# margins is a boolean flag: True appends an 'All' row and column holding the totals
pd.crosstab(rows, cols, margins=True)
pd.crosstab(titanic.Survived, titanic.Pclass, normalize=True, margins=True)
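normalize=True divides every cell by the grand total. crosstab also accepts normalize='index' and normalize='columns', which divide by row or column totals instead. The sketch below uses a small made-up frame (toy is a hypothetical stand-in, not the real Titanic data) so that it runs without downloading anything:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic frame
toy = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 0, 1, 0, 0],
    'Pclass':   [1, 1, 1, 2, 2, 2, 3, 3],
})

# Raw counts per (Survived, Pclass) cell
counts = pd.crosstab(toy.Survived, toy.Pclass)

# Each column sums to 1, so the Survived == 1 row is the survival rate per class
by_class = pd.crosstab(toy.Survived, toy.Pclass, normalize='columns')

# Each row sums to 1, so a row shows how that outcome is spread across classes
by_outcome = pd.crosstab(toy.Survived, toy.Pclass, normalize='index')

print(by_class)
print(by_outcome)
```

On the real data, the same calls would be pd.crosstab(titanic.Survived, titanic.Pclass, normalize='columns') and normalize='index'.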
Pivoting
# Mean fare for each Survived x Pclass cell
pd.pivot_table(index='Survived', columns='Pclass', values=['Fare'], aggfunc='mean', data=titanic)
# Mean age for each Survived x Pclass cell
pd.pivot_table(index='Survived', columns='Pclass', values=['Age'], aggfunc='mean', data=titanic)
titanic.pivot_table(index='Pclass', columns='Sex', values='Age', aggfunc='mean').unstack()
Sex     Pclass
female  1         34.611765
        2         28.722973
        3         21.750000
male    1         41.281386
        2         30.740707
        3         26.507589
dtype: float64
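Calling unstack() on the pivoted frame returns a Series indexed by (Sex, Pclass), as in the output above. A minimal sketch of how such a Series can be sliced and turned back into a table, using made-up rows (toy and its values are hypothetical, not taken from the Titanic file):

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic frame
toy = pd.DataFrame({
    'Pclass': [1, 1, 2, 2, 3, 3, 1, 3],
    'Sex':    ['male', 'female', 'female', 'male', 'male', 'female', 'male', 'female'],
    'Age':    [40.0, 30.0, 20.0, 50.0, 25.0, 35.0, 60.0, 15.0],
})

table = toy.pivot_table(index='Pclass', columns='Sex', values='Age', aggfunc='mean')

# Series with a two-level index: Sex (outer), Pclass (inner)
s = table.unstack()

# All classes for one sex
female_ages = s['female']

# A single (Sex, Pclass) cell
male_first = s.loc[('male', 1)]

# Move the Sex level back to the columns to recover the original layout
restored = s.unstack(level='Sex')

print(s)
print(restored)
```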
Challenge on pivot_table: compare the average age by
- Survived vs Pclass
- Survived vs Sex
- Pclass vs Sex
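# Wide layout: one row per Survived value, one column per Pclass
titanic.pivot_table(index='Survived', columns='Pclass', values=['Age'], aggfunc='mean')
# Long layout of the same means; the string 'mean' avoids the deprecation warning for passing np.mean
titanic.groupby(['Survived', 'Pclass']).agg({'Age': 'mean'})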
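The pivot_table and groupby calls above compute the same means in two layouts: unstacking the grouped result gives the pivoted table. The sketch below checks this on a small made-up frame (toy is hypothetical, not the real dataset) and also covers the other two challenge comparisons:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic frame
toy = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 0, 1, 1, 0],
    'Pclass':   [1, 1, 2, 2, 3, 3, 1, 3],
    'Sex':      ['male', 'female', 'female', 'male', 'male', 'female', 'male', 'female'],
    'Age':      [40.0, 30.0, 20.0, 50.0, 25.0, 35.0, 60.0, 15.0],
})

# Survived vs Pclass, wide layout
wide = toy.pivot_table(index='Survived', columns='Pclass', values='Age', aggfunc='mean')

# Same means in long layout, then reshaped to match the pivot
long = toy.groupby(['Survived', 'Pclass'])['Age'].mean()
reshaped = long.unstack()

# Remaining challenge comparisons
survived_sex = toy.pivot_table(index='Survived', columns='Sex', values='Age', aggfunc='mean')
class_sex = toy.pivot_table(index='Pclass', columns='Sex', values='Age', aggfunc='mean')

print(wide)
print(reshaped)
print(survived_sex)
print(class_sex)
```

With the real data, replace toy with titanic; cells with no passengers will show up as NaN.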