Exploratory Data Analysis

• performing initial investigations on data
• to discover patterns
• to spot anomalies
• to test hypotheses, and
• to check assumptions

Univariate Analysis

“Uni” means one and “variate” means variable, so univariate analysis examines one variable at a time.
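
As a rough illustration of looking at a single variable on its own, the sketch below summarises one column. The `ages` Series and its values are made up for this example and do not come from the post's dataset.

```python
import pandas as pd

# Hypothetical single variable (illustrative values only)
ages = pd.Series([23, 25, 25, 31, 35, 35, 35, 42], name="age")

# Summary statistics for this one variable: count, mean, std, min, quartiles, max
summary = ages.describe()

# Frequency of each distinct value, most frequent first
counts = ages.value_counts()

print(summary)
print(counts)
```

`describe()` suits numeric variables, while `value_counts()` is usually the more informative view for categorical or low-cardinality ones.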

import numpy as np
import pandas as pd
import matplotlib.pyplot …

Random Sampling

import numpy as np
import pandas as pd

Random sample 70% without replacement

# df.shape is a (rows, columns) tuple; take the first element for the row count
nrows = df.shape[0]
nrows
2449
df_sample = df.sample(frac=0.70, replace=False, random_state=100)
df_sample.shape
(1714, 8)

Bootstrap sample

# randomly pick same no. of rows as in dataset but with replacement
bootstrap_sample = df.sample(frac=1, replace=True, random_state=100)
bootstrap_sample.shape
(2449, 8)
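
To make the difference between the two sampling modes concrete, here is a minimal sketch on a toy DataFrame. The `toy` frame is hypothetical and stands in for the post's `df`, which is not shown here. The check on the index is one simple way to see whether any row was drawn more than once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the post's df: 100 rows, one column
toy = pd.DataFrame({"value": np.arange(100)})

# 70% sample without replacement: each row can appear at most once
subsample = toy.sample(frac=0.70, replace=False, random_state=100)

# Bootstrap sample: as many rows as the original, drawn with replacement
boot = toy.sample(frac=1, replace=True, random_state=100)

print(subsample.shape)
print(subsample.index.is_unique)  # no row is repeated
print(boot.shape)
print(boot.index.is_unique)       # repeated rows make the index non-unique
```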

Challenge

• Calculate 95% Confidence Interval…

pandas : Post 01

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.

Getting Started
Installation

import pandas as pd
print(pd.__version__)
1.1.5
import numpy as np
print(np.__version__)
1.19.5

Create a DataFrame from Array

# Set the seed for random values generator
np.random.seed(100)
#…

Post 03 — Sorting methods

import numpy as np
np.set_printoptions(suppress=True)
data = np.genfromtxt(r'./Numpy_Datasets/Lifecyclesavings.csv', delimiter=',', skip_header=1)
print(data.ndim)
print(data.shape)
print(data[:5,:])
2
(50, 5)
[[ 11.43 29.35 2.87 2329.68 2.87]
[ 12.07 23.32 4.41 1507.99 3.93]
[ 13.17 23.8 4.43 2108.47 3.82]
[ 5.75 41.89 1.67 189.13 0.22]
[ 12.88 42.19 0.83 728.47 4.56]]
dt = {'names':["sr","pop15","pop75","dpi","ddpi"]…

Post 02: Meshgrid

Meshgrid is a NumPy function that builds two 2D rectangular arrays from two 1D arrays.
Taken together, the two outputs pair every value of the first 1D array with every value of the second.
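
A minimal sketch of that behaviour on two very small arrays follows. The names `a`, `b`, `aa`, `bb` and `pairs` are chosen only for this illustration, and the default `indexing='xy'` is assumed.

```python
import numpy as np

a = np.array([1, 2, 3])   # 3 values along the horizontal axis
b = np.array([10, 20])    # 2 values along the vertical axis

# With the default indexing='xy', both outputs have shape (len(b), len(a))
aa, bb = np.meshgrid(a, b)

print(aa)  # each row repeats a
print(bb)  # each column repeats b

# Stacking the flattened grids lists every (a, b) combination once
pairs = np.column_stack([aa.ravel(), bb.ravel()])
print(pairs.shape)
```

Each position `(i, j)` in the two grids holds one `(a[j], b[i])` pair, which is why functions of two variables can be evaluated over the whole grid in a single vectorised expression.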

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (10,7)
x = np.arange(-3,4,1)
y = np.arange(-5,6,1)
print("X:")
print(x)
print("\nY:")
print(y)
X:
[-3 -2 -1 0… 