Nov 27, 2021
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('./datasets/customer_churn.csv')
df.head(3)
Normalization
Let's take the CreditScore column.
min_CS = np.min(df.CreditScore)
max_CS = np.max(df.CreditScore)
df['CreditScore_normalize'] = round((df.CreditScore - min_CS)/(max_CS - min_CS),2)
print(f"For CreditScore_normalize:\
\n min:{df['CreditScore_normalize'].min()}\
\n max:{df['CreditScore_normalize'].max()}\
\n mean: {df['CreditScore_normalize'].mean():.2f}\
\n Std: {df['CreditScore_normalize'].std():.2f}")
For CreditScore_normalize:
min:0.0
max:1.0
mean: 0.60
Std: 0.19
Normalization (min-max scaling) rescales a column so that each value falls between 0 and 1: the smallest value maps to 0 and the largest to 1.
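To see the formula in isolation, here is a minimal sketch on a small made-up array. The `min_max_normalize` helper and the sample scores are illustrative assumptions and are not taken from the churn dataset.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning zeros here is a choice made for this sketch.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

scores = np.array([350, 500, 650, 850])  # hypothetical credit scores
normalized = min_max_normalize(scores)
print(normalized)
```

Because the minimum and maximum are taken from the data itself, a single extreme value can squeeze every other value into a narrow part of the 0 to 1 range.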
Standardization
mu_CS = np.mean(df.CreditScore)
std_CS = np.std(df.CreditScore)
df['CreditScore_standard'] = round((df['CreditScore'] - mu_CS)/std_CS,2)
print(f"For CreditScore_standardize:\
\n min:{df['CreditScore_standard'].min():.2f}\
\n max:{df['CreditScore_standard'].max():.2f}\
\n mean: {df['CreditScore_standard'].mean():.2f}\
\n Std: {df['CreditScore_standard'].std():.2f}")
For CreditScore_standardize:
min:-3.11
max:2.06
mean: -0.00
Std: 1.00
Standardization rescales the data so that the new column has a mean of 0 and a standard deviation of 1.
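The same idea can be sketched on a small made-up array. The `standardize` helper, its `ddof` parameter, and the sample scores are assumptions for illustration only.

```python
import numpy as np

def standardize(values, ddof=0):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std(ddof=ddof)
    if sigma == 0:
        # A constant column has no spread; return zeros rather than divide by zero.
        return np.zeros_like(values)
    return (values - mu) / sigma

scores = np.array([350, 500, 650, 850])  # hypothetical credit scores
z = standardize(scores)
print(f"mean: {z.mean():.2f}, std: {z.std():.2f}")
```

One detail worth keeping in mind: on arrays, NumPy's `std` defaults to the population formula (`ddof=0`), while pandas' `Series.std` defaults to the sample formula (`ddof=1`). On a large column the two are nearly identical, but on small samples the reported standard deviation can differ noticeably.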
Let's plot these columns.
import seaborn as sns
sns.kdeplot(df.CreditScore)
<AxesSubplot:xlabel='CreditScore', ylabel='Density'>
sns.kdeplot(df.CreditScore_normalize)
<AxesSubplot:xlabel='CreditScore', ylabel='Density'>
sns.kdeplot(df.CreditScore_standard)
<AxesSubplot:xlabel='CreditScore_standard', ylabel='Density'>
- The scale of the column (i.e. feature) has changed; however, the shape of the distribution remains intact.
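One way to check the "same shape" observation numerically, rather than by eye, is to compare a shape statistic such as skewness before and after rescaling. Both transformations are linear (a shift followed by division by a positive constant), so skewness should be unchanged. The sketch below uses synthetic data and a hand-written `skewness` helper; both are assumptions for illustration and not part of the original notebook.

```python
import numpy as np

def skewness(values):
    """Third standardized moment; unaffected by shifting or positive rescaling."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5

# Synthetic, right-skewed stand-in for a score column (not the churn data).
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=50.0, size=1_000) + 350

normalized = (raw - raw.min()) / (raw.max() - raw.min())
standardized = (raw - raw.mean()) / raw.std()

print(skewness(raw), skewness(normalized), skewness(standardized))
```

The three printed values should agree up to floating-point error, which matches what the density plots show: only the axis scale moves.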