import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('./datasets/customer_churn.csv')
df.head(3)

Normalization

Exploratory Data Analysis

  • performing initial investigations on data
  • to discover patterns
  • to spot anomalies
  • to test hypotheses, and
  • to check assumptions
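As a rough illustration of these steps, the sketch below runs a first pass over a small made-up table. The column names `age` and `plan` and all values are hypothetical, and the 1.5 × IQR rule is only one common convention for flagging anomalies.

```python
import pandas as pd

# Hypothetical toy data: one numeric and one categorical column
toy = pd.DataFrame({
    "age": [22, 25, 27, 24, 26, 23, 95],
    "plan": ["basic", "basic", "pro", None, "pro", "basic", "pro"],
})

# Initial investigation: summary statistics and missing values per column
summary = toy.describe()
missing = toy.isna().sum()

# Spot anomalies with the 1.5 * IQR rule (an assumed convention)
q1, q3 = toy["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = toy[(toy["age"] < q1 - 1.5 * iqr) | (toy["age"] > q3 + 1.5 * iqr)]

print(summary)
print(missing)
print(outliers)
```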

Univariate Analysis

“Uni” means one and “variate” means variable, so univariate analysis examines one variable at a time.
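A minimal sketch of what this can look like in pandas, using two hypothetical Series (`ages` and `plans`) rather than the post's dataset: a numeric variable is summarised with descriptive statistics, a categorical one with frequency counts.

```python
import pandas as pd

# Hypothetical single variables, examined one at a time
ages = pd.Series([22, 25, 27, 24, 26, 23, 25], name="age")
plans = pd.Series(["basic", "basic", "pro", "basic", "pro", "basic", "pro"], name="plan")

# Numeric variable: count, mean, std, min, quartiles, max
age_stats = ages.describe()

# Categorical variable: frequency of each category
plan_counts = plans.value_counts()

print(age_stats)
print(plan_counts)
```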

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import numpy as np
import pandas as pd

crosstab, pivot_table

titanic = pd.read_csv('https://raw.githubusercontent.com/shekhar270779/Learn_ML/main/datasets/Titanic.csv')
titanic.head()

Cross Tabulation / Contingency table

pd.crosstab(titanic.Survived, titanic.Pclass)
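To see what such a table contains without downloading the dataset, here is a small sketch on made-up rows. The column names mirror the call above, but the values are hypothetical. `margins` and `normalize` are standard `pd.crosstab` parameters.

```python
import pandas as pd

# Hypothetical rows using the same column names as the Titanic example
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 2, 3, 1, 2],
})

# Counts of each (Survived, Pclass) combination
table = pd.crosstab(toy["Survived"], toy["Pclass"])

# Same table with row and column totals added
table_with_totals = pd.crosstab(toy["Survived"], toy["Pclass"], margins=True)

# Row-wise proportions: each row sums to 1
row_share = pd.crosstab(toy["Survived"], toy["Pclass"], normalize="index")

print(table)
print(table_with_totals)
print(row_share)
```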

Method Chaining

import pandas as pd
import numpy as np
df = pd.read_csv(r'https://raw.githubusercontent.com/shekhar270779/Learn_ML/main/datasets/Property_Crimes.csv')
df.head(3)
# rename columns to lower case
df.columns = df.columns.str.lower()
df.head(3)
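The same lower-casing step can be written as the first link in a method chain. The sketch below is only an illustration on a hypothetical frame: the column names `Area_Name`, `Year` and `Cases_Reported` and their values are assumptions, not taken from the Property_Crimes file.

```python
import pandas as pd

# Hypothetical data; column names are assumed for illustration
raw = pd.DataFrame({
    "Area_Name": ["A", "B", "A", "B"],
    "Year": [2001, 2001, 2002, 2002],
    "Cases_Reported": [10, 20, 30, 5],
})

# Each method returns a new DataFrame, so the steps read top to bottom
result = (
    raw
    .rename(columns=str.lower)                      # lower-case column names
    .query("cases_reported >= 10")                  # keep rows with at least 10 cases
    .assign(cases_in_hundreds=lambda d: d["cases_reported"] / 100)
    .sort_values("cases_reported", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

Unlike assigning to `df.columns`, the chained version leaves `raw` unchanged.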

pandas : Post 04

Random Sampling

import numpy as np
import pandas as pd
df = pd.read_csv(r'https://raw.githubusercontent.com/shekhar270779/Learn_ML/main/datasets/Property_Crimes.csv')
df.head()

Random sample 70% without replacement

nrows = df.shape[0]
nrows
2449
df_sample = df.sample(frac=0.70, replace=False, random_state=100)
df_sample.shape
(1714, 8)

Bootstrap sample

# randomly pick same no. of rows as in dataset but with replacement
bootstrap_sample = df.sample(frac=1, replace=True, random_state=100)
bootstrap_sample.shape
(2449, 8)
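The practical difference between the two kinds of sample is whether a row can be picked more than once. A small sketch on a hypothetical 1000-row frame (not the crime dataset) makes this visible through the index:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a simple 0..999 index
toy = pd.DataFrame({"value": np.arange(1000)})

# Without replacement: 70% of rows, each row appears at most once
subsample = toy.sample(frac=0.70, replace=False, random_state=100)

# Bootstrap: same number of rows as the original, drawn with replacement
bootstrap = toy.sample(frac=1, replace=True, random_state=100)

print(subsample.shape[0], subsample.index.is_unique)
print(bootstrap.shape[0], bootstrap.index.nunique())
```

In a bootstrap sample some rows repeat and others are left out, so the number of distinct rows is smaller than the sample size.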

Challenge

  • Calculate 95% Confidence Interval…
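The challenge text is cut off here, so the statistic it asks about is not known. As one possible approach, the sketch below assumes the target is the mean and uses the percentile bootstrap on synthetic data. The data, the number of resamples (`n_boot`) and the choice of method are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(100)

# Synthetic data standing in for a real column (assumption)
data = rng.normal(loc=50, scale=10, size=500)

# Resample with replacement many times and record the mean of each resample
n_boot = 2000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(n_boot)
])

# Percentile bootstrap: the middle 95% of the resampled means
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(f"sample mean: {data.mean():.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```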
import numpy as np
import pandas as pd
df = pd.read_csv(r'https://raw.githubusercontent.com/shekhar270779/Learn_ML/main/datasets/Churn.csv')
df.head(3)
df.columns = df.columns.str.replace(' ','_')
df.head(3)
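To show the effect of the column clean-up in isolation, here is a sketch on a one-row frame. The column names are hypothetical and not taken from Churn.csv. Passing `regex=False` makes the literal-space match explicit.

```python
import pandas as pd

# Hypothetical column names containing spaces
toy = pd.DataFrame({
    "Account Length": [128],
    "Intl Plan": ["no"],
    "Day Mins": [265.1],
})

# Replace spaces with underscores so columns work with attribute access and query()
toy.columns = toy.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(toy.columns))
print(toy.Account_Length.iloc[0])
```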

pandas : Post 01

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.

Getting Started
Installation

import pandas as pd
print(pd.__version__)
1.1.5
import numpy as np
print(np.__version__)
1.19.5

Create a DataFrame from Array

# Set the seed for random values generator 
np.random.seed(100)
#…
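The post's own code is truncated at this point. As an independent sketch of the idea, a 2D NumPy array can be passed straight to `pd.DataFrame`. The array shape, column names and index labels below are hypothetical.

```python
import numpy as np
import pandas as pd

# Seed the generator so the random values are reproducible
np.random.seed(100)

# A 4 x 3 array of random integers in [0, 100)
values = np.random.randint(0, 100, size=(4, 3))

# Wrap the array in a DataFrame; labels are assumed for illustration
frame = pd.DataFrame(
    values,
    columns=["a", "b", "c"],
    index=["r1", "r2", "r3", "r4"],
)

print(frame)
```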

Numpy

Post 04 — Date, Functions, Vectorization

import numpy as np

Search sorted


np.searchsorted returns the index at which a new value should be inserted into a sorted array so that the array remains sorted.
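A short sketch on a tiny hand-made array shows the behaviour, including the `side` parameter, which decides where a value goes when it already exists in the array:

```python
import numpy as np

# searchsorted assumes the input is already sorted
arr = np.array([1, 3, 5, 7])

# 4 belongs between 3 and 5, i.e. at index 2
idx = np.searchsorted(arr, 4)

# For a value already present, 'left' inserts before it and 'right' after it
left = np.searchsorted(arr, 3, side="left")
right = np.searchsorted(arr, 3, side="right")

# Inserting at the returned index keeps the array sorted
new_arr = np.insert(arr, idx, 4)

print(idx, left, right)
print(new_arr)
```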

np.set_printoptions(threshold=2000)
# create an array

np.random.seed(100)
arr = np.random.randint(1, 30, size=100)
arr.sort()
arr
array([ 1, 1, 2, 2, 3, 3…

Numpy

Post 03 — Sorting methods

import numpy as np
np.set_printoptions(suppress=True)
data = np.genfromtxt(r'./Numpy_Datasets/Lifecyclesavings.csv', delimiter=',', skip_header=1)
print(data.ndim)
print(data.shape)
print(data[:5,:])
2
(50, 5)
[[ 11.43 29.35 2.87 2329.68 2.87]
[ 12.07 23.32 4.41 1507.99 3.93]
[ 13.17 23.8 4.43 2108.47 3.82]
[ 5.75 41.89 1.67 189.13 0.22]
[ 12.88 42.19 0.83 728.47 4.56]]
dt = {'names':["sr","pop15","pop75","dpi","ddpi"]…
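The dtype definition above is cut off, so the following is a separate sketch rather than the post's code. It takes the five rows printed above and the field names shown, assumes every field is a 64-bit float, and sorts the resulting structured array by field.

```python
import numpy as np

# The five rows printed above
rows = [
    (11.43, 29.35, 2.87, 2329.68, 2.87),
    (12.07, 23.32, 4.41, 1507.99, 3.93),
    (13.17, 23.80, 4.43, 2108.47, 3.82),
    (5.75, 41.89, 1.67, 189.13, 0.22),
    (12.88, 42.19, 0.83, 728.47, 4.56),
]

# Field names from the post; the 'f8' formats are an assumption
dt = np.dtype({
    "names": ["sr", "pop15", "pop75", "dpi", "ddpi"],
    "formats": ["f8"] * 5,
})
records = np.array(rows, dtype=dt)

# Sort whole records by one field (ascending)
by_dpi = np.sort(records, order="dpi")

# argsort gives the row order; reversing it sorts descending by 'sr'
desc_sr_order = np.argsort(records["sr"])[::-1]

print(by_dpi["dpi"])
print(records[desc_sr_order]["sr"])
```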

Numpy

Post 02: Meshgrid

np.meshgrid is a NumPy function that builds 2D rectangular coordinate arrays from two 1D arrays.
Taken together, the two output arrays cover every possible (x, y) combination of the input values.
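A brief sketch of the shapes involved, using the same ranges as the post's example (7 values for x, 11 for y). Flattening the two grids and pairing them is one way to list every combination.

```python
import numpy as np

x = np.arange(-3, 4, 1)   # 7 values: -3 .. 3
y = np.arange(-5, 6, 1)   # 11 values: -5 .. 5

# With the default indexing='xy', both grids have shape (len(y), len(x))
X, Y = np.meshgrid(x, y)

# Every row of X repeats x; every column of Y repeats y
print(X.shape, Y.shape)

# Pair up the grids to list all (x, y) combinations
pairs = np.column_stack([X.ravel(), Y.ravel()])
print(pairs.shape)
print(pairs[:3])
```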

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (10,7)
x = np.arange(-3,4,1)
y = np.arange(-5,6,1)
print("X:")
print(x)
print("\nY:")
print(y)
X:
[-3 -2 -1 0…

shekhar pandey
