1️⃣ What is Normalization?
Normalization means bringing all values to the same scale.
2️⃣ Why is Normalization Needed?
Suppose a column has values:
1, 2, 100
Problem:
- 100 is very large
- 1 and 2 are very small
- ML algorithm will give more importance to 100
So:
- ❌ Large values dominate
- ❌ Small values become less important
This reduces model performance.
3️⃣ Solution → Rescaling
Convert values into a common range like:
- ✔ 0 to 1
- ✔ -1 to 1
- ✔ Around 0
This process is called Normalization / Rescaling.
4️⃣ Example of Rescaling
Original Values:
1, 2, 100
After Scaling (0 to 1 range):
1 → 0.0097
2 → 0.0194
100 → 0.9708
- ✔ All values are between 0 and 1
- ✔ No value dominates too much
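The scaled figures above can be reproduced by dividing each value by the column total (1 + 2 + 100 = 103) — an assumption about how this example was computed, since the method isn't stated:

```python
values = [1, 2, 100]
total = sum(values)  # 103

# Divide each value by the total so everything lands in the 0–1 range
scaled = [v / total for v in values]
print(scaled)  # ≈ [0.0097, 0.0194, 0.9709]
```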
5️⃣ Types of Normalization Techniques
There are 3 main techniques:
- Maximum Absolute Scaling
- Min-Max Scaling
- Standardization (Z-score)
1️⃣ Maximum Absolute Scaling
Formula:
Xnew = X / max(|X|)
Meaning:
- Divide every value by the largest absolute value in the column
- The sign of each value is preserved
- Final range = -1 to 1
Example:
Values: 8, 9, 10
Maximum absolute value = 10
8/10 = 0.8
9/10 = 0.9
10/10 = 1
Range: -1 to 1
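The same calculation in code (a minimal sketch of the formula above):

```python
import numpy as np

values = np.array([8, 9, 10])

# Divide every value by the largest absolute value in the array
scaled = values / np.abs(values).max()
print(scaled)  # [0.8 0.9 1. ]
```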
✅ When to Use?
- ✔ When data contains positive and negative values
- ✔ When you want zero values unchanged
- ✔ Sparse data (text data, TF-IDF)
- ✔ Values already centered around 0
❌ Not Good When:
- Data has extreme outliers
- Minimum value is important
2️⃣ Min-Max Scaling (Most Common)
Formula:
Xnew = (X - Xmin) / (Xmax - Xmin)
Range: 0 to 1
Example:
Values: 8, 9, 10
Minimum = 8, Maximum = 10, Range = 2
8 → (8-8)/2 = 0
9 → (9-8)/2 = 0.5
10 → (10-8)/2 = 1
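The Min-Max formula applied directly to the same values:

```python
import numpy as np

values = np.array([8, 9, 10])

# Xnew = (X - Xmin) / (Xmax - Xmin) → maps the data onto [0, 1]
scaled = (values - values.min()) / (values.max() - values.min())
print(scaled)  # [0.  0.5 1. ]
```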
✅ When to Use?
- ✔ Neural Networks
- ✔ KNN (distance-based models)
- ✔ Features have different ranges
- ✔ No extreme outliers
- ✔ Image pixel data
- ✔ Crop price data
❌ Not Good When:
- Data has large outliers (compresses other values)
3️⃣ Standardization (Z-Score)
Formula:
Z = (X - Mean) / Standard Deviation
Steps:
- Find mean
- Subtract mean from each value
- Divide by standard deviation
Example:
Values: 8, 9, 10
Mean = 9, Sample SD = 1
8 → (8-9)/1 = -1
9 → (9-9)/1 = 0
10 → (10-9)/1 = 1
- ✔ Mean becomes 0
- ✔ Standard deviation becomes 1
- ✔ No fixed range
- ✔ Data centered around 0
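The z-score steps in code, using the sample standard deviation (the same convention pandas' `.std()` uses):

```python
import statistics

values = [8, 9, 10]
mean = statistics.mean(values)   # 9
sd = statistics.stdev(values)    # sample SD = 1

# Z = (X - mean) / SD → mean becomes 0, SD becomes 1
z = [(v - mean) / sd for v in values]
print(z)  # [-1.0, 0.0, 1.0]
```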
✅ When to Use?
- ✔ Data follows normal distribution
- ✔ Linear Regression
- ✔ Logistic Regression
- ✔ SVM
- ✔ PCA
- ✔ Data contains outliers
6️⃣ Main Purpose of Normalization
- ✔ Reduce difference between large and small values
- ✔ Improve model accuracy
- ✔ Faster training
- ✔ Better distance calculation (KNN, SVM)
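A quick illustration of the distance point: with two hypothetical features on very different scales, the large-range feature dominates Euclidean distance until the columns are rescaled.

```python
import numpy as np

# Hypothetical rows: height (m) and income (Rs.) — wildly different scales
X = np.array([[1.8, 30000.0],
              [1.6, 50000.0],
              [1.7, 31000.0]])

# Unscaled: income differences swamp height differences entirely
d_raw = np.linalg.norm(X[0] - X[1])  # ≈ 20000, height barely registers

# After min-max scaling each column to [0, 1], both features contribute
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])  # ≈ 1.414
print(d_raw, d_scaled)
```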
7️⃣ When to Use Which?
| Technique | Range | When to Use |
|---|---|---|
| Max Absolute | -1 to 1 | Sparse data |
| Min-Max | 0 to 1 | Neural Networks, KNN |
| Standardization | Around 0 | Regression, SVM |
8️⃣ Is Normalization Required for This Dataset?
Step 1️⃣: Analyze Columns
- Commodity → Categorical (No normalization needed ❌)
- Year → 2005 (Similar scale, usually not normalized ❌)
- Month → 1–12 (Small range, optional ❌)
- Min Price → ~1800–2000 ✅
- Max Price → ~2000–2200 ✅
- Modal Price → ~1900–2100 ✅
👉 Since price columns are numerical and may have different ranges across years, normalization is required for some ML models.
✔ Required If Using:
- KNN
- SVM
- Neural Networks
- Gradient Descent based models
❌ Not Compulsory For:
- Decision Tree
- Random Forest
- XGBoost
Step 2️⃣: Create Dataset
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Commodity': ['Cotton', 'Cotton', 'Cotton', 'Cotton'],
    'Year': [2005, 2005, 2005, 2005],
    'Month': [9, 10, 11, 12],
    'Min Price (Rs./Quintal)': [1838, np.nan, 1922, 1955],
    'Max Price (Rs./Quintal)': [2080, 2136, 1997, 2038],
    'Modal Price (Rs./Quintal)': [1992, 1997, 1964, 2005]
})
print(df)
```
Step 3️⃣: Handle Missing Values
You have one missing value in Min Price.
```python
df['Min Price (Rs./Quintal)'] = df['Min Price (Rs./Quintal)'].fillna(
    df['Min Price (Rs./Quintal)'].mean()
)
print(df)
```
Why mean?
- Keeps price distribution stable
- Suitable for numeric continuous data
Step 4️⃣: Select Numeric Columns
We do NOT normalize:
- Commodity
- Year
- Month
```python
numeric_cols = [
    'Min Price (Rs./Quintal)',
    'Max Price (Rs./Quintal)',
    'Modal Price (Rs./Quintal)'
]
```
Step 5️⃣: Apply Normalization Methods
1️⃣ Maximum Absolute Scaling
```python
df_maxabs = df.copy()
for col in numeric_cols:
    df_maxabs[col] = df_maxabs[col] / df_maxabs[col].abs().max()
print(df_maxabs)
```
✔ All values fall between 0 and 1 (prices are positive, so the negative side of the -1 to 1 range is unused)
2️⃣ Min-Max Scaling (Most Recommended)
```python
df_minmax = df.copy()
for col in numeric_cols:
    df_minmax[col] = (df_minmax[col] - df_minmax[col].min()) / (
        df_minmax[col].max() - df_minmax[col].min()
    )
print(df_minmax)
```
- ✔ Values between 0 and 1
- ✔ Best for crop price prediction
3️⃣ Z-Score Standardization
```python
df_zscore = df.copy()
for col in numeric_cols:
    df_zscore[col] = (df_zscore[col] - df_zscore[col].mean()) / df_zscore[col].std()
print(df_zscore)
```
- ✔ Mean ≈ 0
- ✔ Some values negative
- ✔ Good for normally distributed data
Using scikit-learn (Recommended)

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df_scaled = df.copy()
df_scaled[numeric_cols] = scaler.fit_transform(df[numeric_cols])
print(df_scaled)
```
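Standardization follows the same scaler pattern with `StandardScaler` — a minimal sketch on just the Modal Price column from this dataset:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'Modal Price (Rs./Quintal)': [1992, 1997, 1964, 2005]})
scaler = StandardScaler()

# fit_transform centers the column to mean 0 and unit variance
df[['Modal Price (Rs./Quintal)']] = scaler.fit_transform(
    df[['Modal Price (Rs./Quintal)']]
)
print(df)
```

Note that `StandardScaler` uses the population standard deviation (ddof=0), so its output differs slightly from the manual loop above, which uses pandas' sample standard deviation.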
🔎 Final Conclusion for Your Crop Price Dataset
- ✔ Normalization is required for Min, Max, and Modal Price
- ❌ Not required for Commodity, Year, Month
- ✔ Use MinMaxScaler for LSTM, ANN, SVM
- ✔ Use StandardScaler for Linear Regression
- ✔ Not required for Random Forest

