scaling_data

Nov 27, 2021

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./datasets/customer_churn.csv')
df.head(3)

Normalization

Let's take the CreditScore column.

min_CS = np.min(df.CreditScore)
max_CS = np.max(df.CreditScore)

df['CreditScore_normalize'] = round((df.CreditScore - min_CS)/(max_CS - min_CS),2)
print(f"For CreditScore_normalize:\
        \n min:{df['CreditScore_normalize'].min()}\
        \n max:{df['CreditScore_normalize'].max()}\
        \n mean: {df['CreditScore_normalize'].mean():.2f}\
        \n Std: {df['CreditScore_normalize'].std():.2f}")

For CreditScore_normalize:
 min:0.0
 max:1.0
 mean: 0.60
 Std: 0.19

Normalization rescales a dataset so that each value falls between 0 and 1.
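The rescaling above follows the min-max formula x' = (x - min) / (max - min). Here is a minimal sketch of the same idea on a small made-up array. The helper name `min_max_normalize` and the sample scores are illustrative assumptions, not values from the churn dataset.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant column has no range; return zeros instead of dividing by zero.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Hypothetical credit scores, for illustration only.
scores = [350, 500, 650, 850]
normalized = min_max_normalize(scores)
print(normalized)
```

Note that the minimum and maximum come from the data itself, so a single extreme outlier can squeeze the remaining values into a narrow part of the 0 to 1 range.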

Standardization

mu_CS = np.mean(df.CreditScore)
std_CS = np.std(df.CreditScore)

df['CreditScore_standard'] = round((df['CreditScore'] - mu_CS)/std_CS,2)
print(f"For CreditScore_standardize:\
        \n min:{df['CreditScore_standard'].min():.2f}\
        \n max:{df['CreditScore_standard'].max():.2f}\
        \n mean: {df['CreditScore_standard'].mean():.2f}\
        \n Std: {df['CreditScore_standard'].std():.2f}")

For CreditScore_standardize:
 min:-3.11
 max:2.06
 mean: -0.00
 Std: 1.00

Standardization rescales the data so that the new column has a mean of 0 and a standard deviation of 1.
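Standardization is the z-score formula z = (x - mean) / std. The sketch below applies it to a small made-up array; the helper name `standardize` and the sample scores are illustrative assumptions. One detail worth knowing: `np.std` defaults to the population standard deviation (`ddof=0`), while the pandas `.std()` method defaults to the sample standard deviation (`ddof=1`). On a large column the two are nearly identical, but on a tiny array they differ noticeably.

```python
import numpy as np

def standardize(values):
    """Shift values to mean 0 and scale them to standard deviation 1."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std()  # population std (ddof=0), NumPy's default
    if sigma == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return np.zeros_like(values)
    return (values - mu) / sigma

# Hypothetical credit scores, for illustration only.
scores = [350, 500, 650, 850]
z = standardize(scores)

print("mean:", round(z.mean(), 10))
print("std (ddof=0):", round(z.std(), 10))
print("std (ddof=1):", round(z.std(ddof=1), 4))
```

Unlike normalization, the result is not confined to a fixed range: values far from the mean produce large positive or negative z-scores.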

Let's plot these columns.

import seaborn as sns
sns.kdeplot(df.CreditScore)

<AxesSubplot:xlabel='CreditScore', ylabel='Density'>

sns.kdeplot(df.CreditScore_normalize)

<AxesSubplot:xlabel='CreditScore', ylabel='Density'>

sns.kdeplot(df.CreditScore_standard)

<AxesSubplot:xlabel='CreditScore_standard', ylabel='Density'>

The scale of the column (i.e. the feature) has changed, but the shape of its distribution remains intact.
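The shape is preserved because both transformations are linear: each value is shifted by a constant and divided by a positive constant. One way to check this numerically, beyond eyeballing the density plots, is to compare a shape statistic such as skewness before and after scaling. The sketch below uses synthetic, right-skewed data as a stand-in for a real column; the `skewness` helper and the generated values are assumptions made for illustration.

```python
import numpy as np

def skewness(values):
    """Population skewness: the mean cubed deviation divided by std cubed."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3

# Synthetic right-skewed data, for illustration only.
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=50.0, size=1000) + 300

normalized = (raw - raw.min()) / (raw.max() - raw.min())
standardized = (raw - raw.mean()) / raw.std()

# All three values should agree up to floating-point error.
print(skewness(raw), skewness(normalized), skewness(standardized))

# A correlation of 1 confirms the scaled column is a linear function of the original.
print(np.corrcoef(raw, normalized)[0, 1])
```

A non-linear transformation such as a logarithm would change the skewness, which is the difference between rescaling a feature and reshaping it.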

Written by shekhar pandey
