Pradip G. Vanparia: Normalizing the Data

1️⃣ What is Normalization?

Normalization means bringing all values to the same scale.

2️⃣ Why is Normalization Needed?

Suppose a column has values:

1, 2, 100

Problem:

100 is very large
1 and 2 are very small
Many ML algorithms will give more importance to 100

So:

❌ Large values dominate
❌ Small values become less important

This can reduce model performance, especially for distance-based and gradient-based algorithms.
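The effect is easiest to see with two features on very different scales. A minimal sketch, assuming two hypothetical features (age in years and monthly income in rupees) that are not part of the dataset used later:

```python
import math

# Hypothetical samples: (age in years, monthly income in rupees)
a = (25, 50_000)
b = (60, 50_500)   # very different age, similar income
c = (26, 52_000)   # similar age, somewhat different income

# Raw Euclidean distances: income differences swamp age differences
dist_ab = math.dist(a, b)
dist_ac = math.dist(a, c)

print(round(dist_ab, 1))
print(round(dist_ac, 1))
print(dist_ab < dist_ac)  # True: a looks "closer" to b despite the 35-year age gap
```

Without rescaling, the age column has almost no influence on the distance.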

3️⃣ Solution → Rescaling

Convert values into a common range like:

✔ 0 to 1
✔ -1 to 1
✔ Around 0

This process is called Normalization / Rescaling.

4️⃣ Example of Rescaling

Original Values:

1, 2, 100

After Scaling (0 to 1 range, here by dividing each value by the total, 103):

1   → 0.0097
2   → 0.0194
100 → 0.9709

✔ All values are between 0 and 1
✔ The column no longer dominates other columns that use the same scale
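The figures listed above correspond to dividing each value by the total (103), up to rounding in the last digit. A minimal sketch in plain Python (variable names are illustrative) that reproduces them and shows the min-max result next to them for comparison:

```python
values = [1, 2, 100]

# Dividing by the total (103) gives the figures listed above
total = sum(values)
by_total = [round(v / total, 4) for v in values]

# Min-max scaling maps the smallest value to 0 and the largest to 1
lo, hi = min(values), max(values)
by_minmax = [round((v - lo) / (hi - lo), 4) for v in values]

print(by_total)   # [0.0097, 0.0194, 0.9709]
print(by_minmax)  # [0.0, 0.0101, 1.0]
```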

5️⃣ Types of Normalization Techniques

There are 3 main techniques:

Maximum Absolute Scaling
Min-Max Scaling
Standardization (Z-score)

1️⃣ Maximum Absolute Scaling

Formula:

Xnew = X / max(|X|)

Meaning:

Divide every value by the maximum absolute value of the column
The + or – sign is ignored only when finding the maximum; each value keeps its own sign
Final range = -1 to 1

Example:

Values: 8, 9, 10
Maximum = 10

8/10  = 0.8
9/10  = 0.9
10/10 = 1

Range here: 0.8 to 1 (in general -1 to 1; all-positive data stays between 0 and 1)
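The formula can be sketched in a few lines of plain Python. The helper name `max_abs_scale` is only illustrative, and returning zeros for an all-zero column is an assumption made here to avoid dividing by zero:

```python
def max_abs_scale(values):
    """Divide each value by the largest absolute value in the list."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return [0.0 for _ in values]  # all-zero column: nothing to scale
    return [v / largest for v in values]

print(max_abs_scale([8, 9, 10]))       # [0.8, 0.9, 1.0]
print(max_abs_scale([-20, 0, 5, 10]))  # [-1.0, 0.0, 0.25, 0.5]
```

The second call shows why the range is -1 to 1: negative values keep their sign, and zeros stay zero.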

✅ When to Use?

✔ When data contains positive and negative values
✔ When you want zero values unchanged
✔ Sparse data (text data, TF-IDF)
✔ Values already centered around 0

❌ Not Good When:

Data has extreme outliers
Minimum value is important

2️⃣ Min-Max Scaling (Most Common)

Formula:

Xnew = (X - Xmin) / (Xmax - Xmin)

Range: 0 to 1

Example:

Values: 8, 9, 10

Minimum = 8
Maximum = 10
Range = 2

8  → (8-8)/2  = 0
9  → (9-8)/2  = 0.5
10 → (10-8)/2 = 1
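The same calculation as a small Python sketch. The helper name `min_max_scale` is illustrative, and mapping a constant column to 0 is an assumption made here so the function never divides by zero:

```python
def min_max_scale(values):
    """Rescale values so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column (assumption: map to 0)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([8, 9, 10]))  # [0.0, 0.5, 1.0]
```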

✅ When to Use?

✔ Neural Networks
✔ KNN (distance-based models)
✔ Features have different ranges
✔ No extreme outliers
✔ Image pixel data
✔ Crop price data

❌ Not Good When:

Data has large outliers (compresses other values)
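The compression is easy to see by adding one outlier to the earlier values. A small sketch, where 1000 is a made-up outlier:

```python
values = [8, 9, 10, 1000]  # 1000 is a hypothetical outlier

lo, hi = min(values), max(values)
scaled = [round((v - lo) / (hi - lo), 4) for v in values]

print(scaled)  # [0.0, 0.001, 0.002, 1.0]
```

The three ordinary values now sit within 0.002 of each other, so most of the 0 to 1 range is spent on the single outlier.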

3️⃣ Standardization (Z-Score)

Formula:

Z = (X - Mean) / Standard Deviation

Steps:

Find mean
Subtract mean from each value
Divide by standard deviation

Example:

Values: 8, 9, 10
Mean = 9
SD = 1 (sample standard deviation, dividing by n - 1)

8  → (8-9)/1  = -1
9  → (9-9)/1  = 0
10 → (10-9)/1 = 1

✔ Mean becomes 0
✔ Standard deviation becomes 1
✔ No fixed range
✔ Data centered around 0
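The steps above can be sketched with Python's built-in `statistics` module. One detail worth knowing is that the result depends on which standard deviation is used: the sample version divides by n - 1, the population version divides by n. Both are shown here for comparison:

```python
import statistics

values = [8, 9, 10]
mean = statistics.mean(values)

sd_sample = statistics.stdev(values)       # divides by n - 1
sd_population = statistics.pstdev(values)  # divides by n

z_sample = [(v - mean) / sd_sample for v in values]
z_population = [round((v - mean) / sd_population, 4) for v in values]

print(z_sample)      # [-1.0, 0.0, 1.0]
print(z_population)  # [-1.2247, 0.0, 1.2247]
```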

✅ When to Use?

✔ Data follows normal distribution
✔ Linear Regression
✔ Logistic Regression
✔ SVM
✔ PCA
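✔ Data contains moderate outliers (less compressed than with Min-Max, although extreme outliers still shift the mean and SD)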

6️⃣ Main Purpose of Normalization

✔ Reduce difference between large and small values
✔ Improve model accuracy
✔ Faster training
✔ Better distance calculation (KNN, SVM)

7️⃣ When to Use Which?

Technique         Range      When to Use
Max Absolute      -1 to 1    Sparse data
Min-Max           0 to 1     Neural Networks, KNN
Standardization   Around 0   Regression, SVM
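To compare the three techniques on the same numbers, here is a minimal sketch using the scikit-learn scalers `MaxAbsScaler`, `MinMaxScaler`, and `StandardScaler` on the values 8, 9, 10. The dictionary keys are just labels chosen for this example:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# scikit-learn scalers expect a 2-D array: rows are samples, columns are features
X = np.array([[8.0], [9.0], [10.0]])

scalers = {
    "Max Absolute": MaxAbsScaler(),
    "Min-Max": MinMaxScaler(),
    "Standardization": StandardScaler(),
}

# Fit each scaler on the column and flatten the result for easy printing
results = {name: s.fit_transform(X).ravel() for name, s in scalers.items()}

for name, scaled in results.items():
    print(name, scaled.round(4))
```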

8️⃣ Is Normalization Required for the Dataset?

Step 1️⃣: Analyze Columns

Commodity → Categorical (No normalization needed ❌)
Year → 2005 (Same value in every row, usually not normalized ❌)
Month → 1–12 (Small range, optional ❌)
Min Price → ~1800–2000 ✅
Max Price → ~2000–2200 ✅
Modal Price → ~1900–2100 ✅

👉 Since price columns are numerical and may have different ranges across years, normalization is required for some ML models.

✔ Required If Using:

KNN
SVM
Neural Networks
Gradient Descent based models

❌ Not Compulsory For:

Decision Tree
Random Forest
XGBoost

Step 2️⃣: Create Dataset

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Commodity': ['Cotton', 'Cotton', 'Cotton', 'Cotton'],
    'Year': [2005, 2005, 2005, 2005],
    'Month': [9, 10, 11, 12],
    'Min Price (Rs./Quintal)': [1838, np.nan, 1922, 1955],
    'Max Price (Rs./Quintal)': [2080, 2136, 1997, 2038],
    'Modal Price (Rs./Quintal)': [1992, 1997, 1964, 2005]
})

print(df)

Step 3️⃣: Handle Missing Values

You have one missing value in Min Price.

df['Min Price (Rs./Quintal)'] = df['Min Price (Rs./Quintal)'].fillna(
    df['Min Price (Rs./Quintal)'].mean()
)

print(df)

Why mean?

Leaves the column average unchanged, so the overall price level stays stable
Suitable for numeric continuous data without extreme outliers
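A quick way to check what the fill does is to look at the Min Price column on its own. A minimal sketch, rebuilding just that column as a standalone Series:

```python
import numpy as np
import pandas as pd

min_price = pd.Series([1838, np.nan, 1922, 1955], name='Min Price (Rs./Quintal)')

print(min_price.isna().sum())  # number of missing entries: 1

# .mean() skips NaN by default: (1838 + 1922 + 1955) / 3
fill_value = min_price.mean()
print(fill_value)              # 1905.0

filled = min_price.fillna(fill_value)
print(filled.mean() == fill_value)  # True: the column mean is unchanged
```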

Step 4️⃣: Select Numeric Columns

We do NOT normalize:

Commodity
Year
Month

numeric_cols = [
    'Min Price (Rs./Quintal)',
    'Max Price (Rs./Quintal)',
    'Modal Price (Rs./Quintal)'
]
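For a wider dataset, typing every column name by hand gets tedious. One possible alternative, sketched here on a shortened two-row copy of the data, is to let pandas find the numeric columns with `select_dtypes` and then drop the ones we decided not to scale. The `exclude` list is an illustrative name:

```python
import pandas as pd

# Shortened copy of the dataset, only to keep the example self-contained
df = pd.DataFrame({
    'Commodity': ['Cotton', 'Cotton'],
    'Year': [2005, 2005],
    'Month': [9, 10],
    'Min Price (Rs./Quintal)': [1838.0, 1922.0],
    'Max Price (Rs./Quintal)': [2080, 2136],
    'Modal Price (Rs./Quintal)': [1992, 1997],
})

# All numeric columns, minus the calendar columns we chose not to scale
exclude = ['Year', 'Month']
numeric_cols = [c for c in df.select_dtypes(include='number').columns
                if c not in exclude]

print(numeric_cols)
```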

Step 5️⃣: Apply Normalization Methods

1️⃣ Maximum Absolute Scaling

df_maxabs = df.copy()

for col in numeric_cols:
    df_maxabs[col] = df_maxabs[col] / df_maxabs[col].abs().max()

print(df_maxabs)

✔ Values scaled between 0 and 1 (all prices are positive, so no negative values appear)

2️⃣ Min-Max Scaling (Most Recommended)

df_minmax = df.copy()

for col in numeric_cols:
    col_min = df_minmax[col].min()
    col_max = df_minmax[col].max()
    # (x - min) / (max - min): smallest value maps to 0, largest to 1
    df_minmax[col] = (df_minmax[col] - col_min) / (col_max - col_min)

print(df_minmax)

✔ Values between 0 and 1
✔ A common choice for crop price prediction with neural networks

3️⃣ Z-Score Standardization

df_zscore = df.copy()

for col in numeric_cols:
    # pandas .std() uses the sample standard deviation (ddof=1)
    df_zscore[col] = (df_zscore[col] - df_zscore[col].mean()) / df_zscore[col].std()

print(df_zscore)

✔ Mean ≈ 0
✔ Some values negative
✔ Good for normally distributed data
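One thing to watch: pandas `.std()` defaults to the sample standard deviation (ddof=1), while scikit-learn's `StandardScaler` uses the population standard deviation (ddof=0), so the two give slightly different z-scores on small data. A minimal sketch on the Modal Price values, with illustrative variable names:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

prices = pd.DataFrame({'Modal Price (Rs./Quintal)': [1992.0, 1997.0, 1964.0, 2005.0]})
col = prices['Modal Price (Rs./Quintal)']

z_pandas = (col - col.mean()) / col.std()            # ddof=1 (sample SD)
z_pandas_pop = (col - col.mean()) / col.std(ddof=0)  # ddof=0 (population SD)
z_sklearn = StandardScaler().fit_transform(prices).ravel()

print(z_pandas.round(4).tolist())
print(z_pandas_pop.round(4).tolist())
print(z_sklearn.round(4).tolist())

# Only the ddof=0 version matches scikit-learn
print(np.allclose(z_pandas_pop, z_sklearn))
```

With only four rows the gap is noticeable; on large datasets it becomes negligible.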

Using scikit-learn (Recommended)

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df_scaled = df.copy()
df_scaled[numeric_cols] = scaler.fit_transform(df[numeric_cols])

print(df_scaled)
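A practical advantage of a scaler object is that it remembers the minimum and maximum it learned, so the same scaling can be applied to new rows and reversed later. A minimal sketch, assuming the four Modal Price values act as training data and 2020 is a made-up later observation:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Modal prices from the four months above, treated as training data
train = np.array([[1992.0], [1997.0], [1964.0], [2005.0]])
new = np.array([[2020.0]])  # hypothetical later observation

scaler = MinMaxScaler()
scaler.fit(train)                       # learns min=1964 and max=2005
train_scaled = scaler.transform(train)
new_scaled = scaler.transform(new)      # can fall outside 0..1

# inverse_transform converts scaled values back to rupees
restored = scaler.inverse_transform(train_scaled)

print(new_scaled.ravel())
print(np.allclose(restored, train))
```

The new value lands above 1 because it is larger than anything seen during fitting. This is expected behaviour, not an error.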

🔎 Final Conclusion for Your Crop Price Dataset

✔ Normalize Min, Max, and Modal Price when using scale-sensitive models
❌ Not required for Commodity, Year, Month
✔ Use MinMaxScaler for LSTM, ANN, SVM
✔ Use StandardScaler for Linear Regression
✔ Not required for Random Forest

19/02/2026
