19/02/2026

Exploratory Data Analysis

1️⃣ What is EDA?

Exploratory Data Analysis (EDA) means exploring the data before applying any machine learning algorithm.

The word exploratory itself means to explore something carefully.


2️⃣ Real-Life Example

Imagine you move into a new house. What do you do first?

  • Visit the hall
  • Check the kitchen
  • Look at bedrooms
  • See balconies and terrace
  • Check if there is a garden

You try to understand:

  • How many rooms are there?
  • How spacious is it?
  • What facilities are available?

👉 This process is called exploration.

Similarly, when we receive a dataset, we first explore the data to understand it properly. This process is called EDA (Exploratory Data Analysis).


3️⃣ How Do We Explore Data?

There are mainly two approaches:

1️⃣ Statistical Approach

In this method, we calculate:

  • Mean
  • Median
  • Mode
  • Minimum & Maximum
  • Range
  • Quartiles
  • Standard Deviation

These statistics help us understand:

  • What is the average value?
  • How spread out is the data?
  • What is the central tendency?
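
As a minimal sketch, the statistics listed above can be computed with Python's built-in `statistics` module. The `scores` list is made-up sample data used only for illustration, and `statistics.quantiles` uses its default "exclusive" method, which is only one of several ways to define quartiles.

```python
import statistics

# Hypothetical sample: exam scores for ten students (illustrative data only).
scores = [62, 65, 65, 68, 70, 72, 75, 78, 80, 85]

mean = statistics.mean(scores)        # average value
median = statistics.median(scores)    # middle value of the sorted data
mode = statistics.mode(scores)        # most frequent value
minimum, maximum = min(scores), max(scores)
value_range = maximum - minimum       # distance between the extremes
q1, q2, q3 = statistics.quantiles(scores, n=4)  # three quartile cut points
std_dev = statistics.stdev(scores)    # sample standard deviation

summary = {
    "mean": mean,
    "median": median,
    "mode": mode,
    "min": minimum,
    "max": maximum,
    "range": value_range,
    "quartiles": (q1, q2, q3),
    "std_dev": round(std_dev, 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Mean, median, and mode describe the central tendency, while range, quartiles, and standard deviation describe the spread.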

2️⃣ Visualization Approach

Common visualization tools:

  • 📊 Histogram
  • 📦 Box Plot
  • 📈 Scatter Plot
  • 🥧 Pie Chart
  • 📉 Trend Line
  • 📊 Area Plot

Visualization helps us understand:

  • Distribution of data
  • Patterns
  • Trends
  • Relationships
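
In practice a plotting library would normally draw these charts. As a rough sketch of what a histogram actually shows, the snippet below counts values per bin and prints a text bar for each bin using only the standard library. The `rainfall` values, the `bin_width` of 10, and the helper name `bin_counts` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical sample: daily rainfall in mm (illustrative data only).
rainfall = [0, 2, 3, 5, 7, 8, 12, 14, 15, 18, 21, 22, 25, 31, 38]
bin_width = 10  # assumed bin size; choose one that suits the data's range


def bin_counts(values, width):
    """Count how many values fall into each [start, start + width) bin."""
    return Counter((v // width) * width for v in values)


counts = bin_counts(rainfall, bin_width)

# Print one bar per bin, including bins that happen to be empty.
for start in range(0, max(counts) + bin_width, bin_width):
    label = f"{start:>3}-{start + bin_width - 1:<3}"
    print(label, "#" * counts[start])
```

The longest bar marks the range where most values fall, which is the "distribution of data" a histogram makes visible.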



4️⃣ Why is EDA Important?

EDA is performed at the initial stage of machine learning.

It helps us:

✅ 1. Understand Variables

Example dataset variables:

  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width
  • Species (Target variable)

All these are called:

  • Variables
  • Features
  • Attributes
  • Fields
  • Columns

(These terms are used interchangeably, although "feature" usually refers to an input column rather than the target.)

EDA helps us understand:

  • Average value
  • Median
  • Distribution
  • Spread

✅ 2. Detect Errors and Missing Values

During EDA, we can identify:

  • Missing values
  • Incorrect entries
  • Data entry mistakes
  • Outliers

📌 What are Outliers?

Outliers are values that are very different from other values. They are like the “odd one out” in a group.

Example:

If most students score between 60 and 80 marks, and one student scores 5, that 5 is an outlier.

EDA helps us detect such unusual values.
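
One common way to flag such values is the 1.5 × IQR rule (Tukey's rule). The sketch below applies it to a made-up `marks` list in which `None` stands for a missing entry. Both the data and the choice of this particular rule are assumptions for illustration; other outlier rules exist and may suit other datasets better.

```python
import statistics

# Hypothetical marks for ten students; None stands for a missing entry.
marks = [72, 65, None, 78, 5, 70, 68, 80, 61, 75]

# Step 1: count and set aside the missing values.
missing_count = sum(1 for m in marks if m is None)
present = [m for m in marks if m is not None]

# Step 2: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in present if m < lower or m > upper]

print("Missing values:", missing_count)
print("Outliers:", outliers)
```

A flagged value is not automatically wrong. It may be a data entry mistake or a genuine but unusual observation, so it should be checked before it is removed.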


✅ 3. Identify Patterns and Trends

Through visualization, we can identify:

  • Increasing trends
  • Decreasing trends
  • Clusters
  • Relationships

EDA makes later steps much easier, such as:

  • Clustering
  • Recommendation
  • Prediction
  • Data Mining
  • Machine Learning


5️⃣ Types of EDA

1️⃣ Univariate Analysis

Uni = One
Variate = Variable

👉 Analysis of one variable at a time

Examples:

  • What is the average score?
  • What is the distribution of rainfall?

In univariate analysis, we analyze one variable alone.


2️⃣ Bivariate Analysis

Bi = Two
Variate = Variable

👉 Analysis of two variables together

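Focus: How one variable changes as the other changes. A strong relationship may hint at cause and effect, but correlation alone does not prove it.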

Examples:

  • More study hours → Higher marks
  • Fast driving → Accident
  • Rainfall → Crop Yield
  • Temperature → Ice Cream Sales

Here, we analyze:

  • Relationship between two variables
  • Correlation
  • Dependency
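
As a small sketch, the strength of a linear relationship between two variables can be measured with the Pearson correlation coefficient, which ranges from -1 to +1. The `hours` and `marks` lists are made-up data, and `pearson` is a helper written for this example rather than a library function.

```python
import math

# Hypothetical data: study hours and marks for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [42, 50, 53, 61, 64, 70, 78, 83]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(hours, marks)
print(f"Pearson r = {r:.3f}")
```

A value close to +1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.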

3️⃣ Multivariate Analysis

Multi = Many
Variate = Variable

👉 Analysis of more than two variables

Advanced techniques used:

  • Cluster Analysis
  • Factor Analysis
  • Multiple Regression
  • Principal Component Analysis (PCA)

Example:

Student marks depend on:

  • Study hours
  • Attendance
  • Prior knowledge
  • Practice
  • Sleep

All together influence performance.
