1️⃣ What is EDA?
Exploratory Data Analysis (EDA) is the process of examining a dataset to understand its structure, contents, and quality before applying any machine learning algorithm.
The word "exploratory" comes from "explore", which means to examine something carefully.
2️⃣ Real-Life Example
Imagine you move into a new house. What do you do first?
- Visit the hall
- Check the kitchen
- Look at bedrooms
- See balconies and terrace
- Check if there is a garden
You try to understand:
- How many rooms are there?
- How spacious is it?
- What facilities are available?
👉 This process is called exploration.
Similarly, when we receive a dataset, we first explore the data to understand it properly. This process is called EDA (Exploratory Data Analysis).
3️⃣ How Do We Explore Data?
There are mainly two approaches:
1️⃣ Statistical Approach
In this method, we calculate:
- Mean
- Median
- Mode
- Minimum & Maximum
- Range
- Quartiles
- Standard Deviation
These statistics help us understand:
- What is a typical (central) value? (mean, median, mode)
- How spread out is the data? (range, quartiles, standard deviation)
- What are the smallest and largest values?
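All of the statistics listed above can be computed with Python's built-in `statistics` module. Below is a minimal sketch on a made-up list of marks. Note that `statistics.quantiles` uses its default "exclusive" method, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical marks of 10 students (illustrative data only)
marks = [62, 70, 75, 68, 80, 72, 65, 75, 78, 70]

def describe(values):
    """Return the basic summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # every most-frequent value
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "Q1": q1,
        "Q3": q3,
        "std": statistics.stdev(values),  # sample standard deviation
    }

for name, value in describe(marks).items():
    print(f"{name:>6}: {value}")
```

Here 70 and 75 both appear twice, so `multimode` reports both as modes instead of picking one.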
2️⃣ Visualization Approach
Common visualization tools:
- 📊 Histogram
- 📦 Box Plot
- 📈 Scatter Plot
- 🥧 Pie Chart
- 📉 Trend Line
- 📊 Area Plot
Visualization helps us understand:
- Distribution of data
- Patterns
- Trends
- Relationships
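A histogram works by splitting the range of values into equal-width bins and counting how many values fall into each bin. The sketch below shows that idea in plain Python and prints the bars as text. The rainfall numbers and the bin width of 10 are made up for illustration, and the bin labels assume integer data. In practice a plotting library such as matplotlib (`plt.hist`) would draw the actual chart.

```python
# Hypothetical daily rainfall values in mm (illustrative only)
rainfall = [0, 2, 5, 7, 8, 12, 13, 15, 18, 22, 25, 31, 40]

def histogram_counts(values, bin_width):
    """Count integer values per equal-width bin, keyed by each bin's left edge.

    Empty bins between the smallest and largest value are kept with count 0.
    """
    lo = (min(values) // bin_width) * bin_width
    hi = (max(values) // bin_width) * bin_width
    counts = {start: 0 for start in range(lo, hi + bin_width, bin_width)}
    for v in values:
        counts[(v // bin_width) * bin_width] += 1
    return counts

def print_histogram(values, bin_width):
    """Print one row per bin, with one '#' per value in that bin."""
    for start, count in histogram_counts(values, bin_width).items():
        print(f"{start:>3}-{start + bin_width - 1:<3} | {'#' * count}")

print_histogram(rainfall, 10)
```

The tall bar on the left and the short bars on the right show that most days have little rain, with a few much wetter days.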
4️⃣ Why is EDA Important?
EDA is performed at the initial stage of a machine learning project, before any model is built.
It helps us:
✅ 1. Understand Variables
Example: the variables of the Iris flower dataset:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
- Species (Target variable)
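These columns can be called by several names:
- Variables
- Features
- Attributes
- Fields
- Columns
(These terms are used interchangeably. In this example, Species is the target variable we want to predict, and the other four columns are the input features.)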
EDA helps us understand:
- Average value
- Median
- Distribution
- Spread
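To make this concrete, here is a small sketch that summarizes each numeric feature and counts the classes of the target variable. The six rows are illustrative values in the shape of the Iris dataset, not the full dataset, and the column names are chosen for this example. With pandas, `DataFrame.describe()` and `Series.value_counts()` give similar summaries.

```python
import statistics
from collections import Counter

# A few rows in the shape of the Iris dataset (values are illustrative)
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4, "species": "versicolor"},
    {"sepal_length": 6.4, "sepal_width": 3.2, "petal_length": 4.5, "petal_width": 1.5, "species": "versicolor"},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5, "species": "virginica"},
    {"sepal_length": 5.8, "sepal_width": 2.7, "petal_length": 5.1, "petal_width": 1.9, "species": "virginica"},
]

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
TARGET = "species"

def summarize(rows):
    """Summarize each numeric feature and count the classes of the target."""
    summary = {}
    for col in FEATURES:
        values = [row[col] for row in rows]
        summary[col] = {
            "mean": round(statistics.mean(values), 2),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    summary[TARGET] = Counter(row[TARGET] for row in rows)  # class counts
    return summary

for col, info in summarize(rows).items():
    print(col, dict(info))
```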
✅ 2. Detect Errors and Missing Values
During EDA, we can identify:
- Missing values
- Incorrect entries
- Data entry mistakes
- Outliers
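A minimal sketch of the first two checks: counting missing values per column and flagging values that break simple validity rules. The records and the rules (age between 0 and 120, marks between 0 and 100) are hypothetical and chosen only for illustration. Real rules depend on what each column means.

```python
# Hypothetical student records; None stands for a missing value
records = [
    {"id": "S1", "age": 20, "marks": 72},
    {"id": "S2", "age": None, "marks": 68},
    {"id": "S3", "age": 19, "marks": 105},  # impossible if marks are out of 100
    {"id": "S4", "age": -3, "marks": None},  # negative age: a data entry mistake
]

# Hypothetical validity rules: which values count as plausible per column
RULES = {
    "age": lambda v: 0 < v < 120,
    "marks": lambda v: 0 <= v <= 100,
}

def count_missing(records):
    """Number of missing (None) values in each column."""
    counts = {}
    for rec in records:
        for col, val in rec.items():
            counts[col] = counts.get(col, 0) + (1 if val is None else 0)
    return counts

def find_invalid(records):
    """List (id, column, value) for every non-missing value that breaks a rule."""
    problems = []
    for rec in records:
        for col, is_valid in RULES.items():
            val = rec.get(col)
            if val is not None and not is_valid(val):
                problems.append((rec["id"], col, val))
    return problems

print(count_missing(records))
print(find_invalid(records))
```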
📌 What are Outliers?
Outliers are values that are very different from other values. They are like the “odd one out” in a group.
Example:
If most students score between 60–80 marks, and one student scores 5, that 5 is an outlier.
EDA helps us detect such unusual values.
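One common way to flag outliers reuses the quartiles from the statistical approach. Under Tukey's rule, any value more than 1.5 × IQR below Q1 or above Q3 is treated as an outlier, where IQR (interquartile range) = Q3 - Q1. Box plots usually draw their whiskers using the same rule. This is one choice among several, not the only definition. The marks below are made up to mirror the example above.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical marks: most students score 60-80, one scores 5
marks = [62, 75, 68, 80, 71, 66, 78, 73, 5]
print(iqr_outliers(marks))  # → [5]
```

Here Q1 = 64 and Q3 = 76.5, so the fences are 45.25 and 95.25, and only the score of 5 falls outside them.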
✅ 3. Identify Patterns and Trends
Through visualization, we can identify:
- Increasing trends
- Decreasing trends
- Clusters
- Relationships
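Besides looking at a chart, a simple numeric check for an increasing or decreasing trend is the slope of a least-squares trend line. The sketch below assumes the values are evenly spaced in time (for example, one per month). The sales figures and the tolerance used to decide "no clear trend" are hypothetical.

```python
# Hypothetical monthly sales figures (illustrative only)
sales = [120, 135, 150, 160, 178]

def trend_slope(values):
    """Least-squares slope of the values plotted against positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def describe_trend(values, tolerance=1e-9):
    """Label the overall direction from the sign of the trend-line slope."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "no clear trend"

print(round(trend_slope(sales), 2), describe_trend(sales))
```

The slope here is about 14.1, meaning sales grow by roughly 14 units per month on average.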
EDA makes further steps much easier, such as:
- Clustering
- Recommendation
- Prediction
- Data Mining
- Machine Learning
5️⃣ Types of EDA
1️⃣ Univariate Analysis
Uni = One
Variate = Variable
👉 Analysis of one variable at a time
Examples:
- What is the average score?
- What is the distribution of rainfall?
In univariate analysis, we analyze one variable alone.
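For a categorical variable, univariate analysis often starts with a frequency table: how many times each category occurs and what share of the total it represents. A minimal sketch on hypothetical rainfall levels:

```python
from collections import Counter

# Hypothetical rainfall level recorded on eight days
rainfall_level = ["low", "medium", "medium", "high", "low", "medium", "low", "medium"]

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]

for cat, n, pct in frequency_table(rainfall_level):
    print(f"{cat:<7} {n:>2}  {pct:>5}%")
```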
2️⃣ Bivariate Analysis
Bi = Two
Variate = Variable
👉 Analysis of two variables together
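Focus: Relationship between two variables (how one changes when the other changes). A relationship alone does not prove cause and effect; it only shows that the two variables move together.
Examples:
- More study hours → Higher marks
- Faster driving → More accidents
- Rainfall → Crop Yield
- Temperature → Ice Cream Sales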
Here, we analyze:
- Relationship between two variables
- Correlation
- Dependency
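A standard way to measure the linear relationship between two numeric variables is the Pearson correlation coefficient r. It ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below computes it directly from its definition on hypothetical study-hours and marks data. Python 3.10+ also provides `statistics.correlation` for the same purpose. A high r still does not prove cause and effect.

```python
import math

# Hypothetical data: study hours and marks for six students
study_hours = [1, 2, 3, 4, 5, 6]
marks = [52, 55, 61, 64, 70, 74]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson_r(study_hours, marks), 3))
```

A value close to +1 here says that marks rise almost linearly with study hours in this made-up sample.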
3️⃣ Multivariate Analysis
Multi = Many
Variate = Variable
👉 Analysis of more than two variables together
Common techniques:
- Cluster Analysis
- Factor Analysis
- Multiple Regression
- Principal Component Analysis (PCA)
Example:
Student marks depend on:
- Study hours
- Attendance
- Prior knowledge
- Practice
- Sleep
All of these factors together influence performance.
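Multiple regression, one of the techniques listed above, estimates how much each factor contributes to the outcome while accounting for the others. The sketch below fits ordinary least squares through the normal equations, using plain Python. It uses three of the factors above (study hours, attendance, sleep) on hypothetical data. The marks are built from a known formula, so we can check that the fit recovers those coefficients. In practice one would use a library routine such as NumPy's `numpy.linalg.lstsq` or scikit-learn's `LinearRegression`, because the normal equations can be numerically unstable on badly conditioned data.

```python
# Hypothetical data: one row per student -> (study hours, attendance %, sleep hours)
X = [
    (2, 70, 6),
    (4, 85, 7),
    (3, 60, 8),
    (6, 95, 6),
    (5, 80, 7),
    (1, 75, 5),
]
# Marks built from a known formula (20 + 5*hours + 0.3*attendance + 2*sleep)
# so we can check that the fit recovers those coefficients.
y = [20 + 5 * h + 0.3 * a + 2 * s for h, a, s in X]

def solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        b[i] = (aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))) / aug[i][i]
    return b

def fit_multiple_regression(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    A is X with a leading column of 1s, so b[0] is the intercept.
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)

coef = fit_multiple_regression(X, y)
for name, b in zip(["intercept", "hours", "attendance", "sleep"], coef):
    print(f"{name:>10}: {b:.3f}")
```

With real, noisy data the coefficients would only approximate the underlying effects, and each one describes the change in marks when that factor changes and the others stay fixed.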

