Normalization
Normalization standardizes spectral intensities to enable comparison between samples measured under different conditions.
Overview
Using FTIRdataprocessing Class (Recommended)
The easiest way to apply normalization is through the FTIRdataprocessing class with built-in evaluation:
from xpectrass import FTIRdataprocessing
# Initialize with your data
ftir = FTIRdataprocessing(df, label_column="type")
# Convert to absorbance first (normalization expects absorbance-scale spectra)
df_abs = ftir.convert(mode="to_absorbance", plot=False)
# Step 1: Evaluate all normalization methods to find the best one
norm_results = ftir.find_normalization_method(
    data=df_abs,
    methods="FTIR",
    n_splits=5,
)
# Step 2: View evaluation results
print(norm_results.head())
# Step 3: Apply the best method (assuming the results are sorted best-first)
best_method = norm_results.iloc[0]["method"]
ftir.normalize(data=df_abs, method=best_method, plot=False)
# Step 4: Access normalized data
normalized_df = ftir.df_norm
Using Utility Functions Directly
For standalone use or custom pipelines:
from xpectrass.utils import normalize, normalize_method_names
# See available methods
print(normalize_method_names())
# e.g. ['adaptive_regional', 'area', 'curvature_weighted', ..., 'snv', 'vector']
# Apply normalization to a single spectrum
normalized = normalize(intensities, method='snv')
Available Methods
| Method | Formula | Use Case |
|---|---|---|
| snv | (x - mean) / std | Default; removes scatter effects |
| vector | x / ‖x‖₂ | Compare spectral shapes |
| minmax | (x - min) / (max - min) | Scale to [0, 1] |
| area | x / sum(abs(x)) | Total absolute area = 1 |
| peak | x / x[ref] | Normalize to reference peak |
| | x / (max - min) | Preserve relative intensities |
| | x / max(abs(x)) | Maximum absolute value = 1 |
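The last two formulas in the table have no dedicated subsection below, so here is a minimal NumPy sketch of what they compute. The function names `range_scale` and `max_scale` are hypothetical and chosen for illustration only; they are not taken from xpectrass.

```python
import numpy as np

def range_scale(x):
    """Divide by the intensity range (max - min) without shifting the spectrum."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("flat spectrum: max equals min")
    return x / span

def max_scale(x):
    """Divide by the largest absolute intensity so that max(|x|) becomes 1."""
    x = np.asarray(x, dtype=float)
    peak = np.abs(x).max()
    if peak == 0:
        raise ValueError("all-zero spectrum")
    return x / peak

spectrum = np.array([1.0, 3.0, 5.0])
by_range = range_scale(spectrum)  # [0.25, 0.75, 1.25]
by_max = max_scale(spectrum)      # [0.2, 0.6, 1.0]
```

Unlike min-max scaling, dividing by the range does not subtract the minimum, so the offset of the spectrum is kept.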
Method Details
Standard Normal Variate (SNV) - Recommended
Centers and scales each spectrum individually so that it has mean=0 and std=1. This corrects additive offsets and multiplicative scaling differences, which are the typical signatures of scatter.
normalized = normalize(intensities, method='snv')
# Result: mean ≈ 0, std ≈ 1
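To make the formula concrete, the following is a minimal NumPy sketch of SNV for a single spectrum. It assumes the population standard deviation (`ddof=0`); the library may use a different convention, and the helper name `snv` is only illustrative.

```python
import numpy as np

def snv(x):
    """Standard Normal Variate: subtract the spectrum's mean, divide by its std."""
    x = np.asarray(x, dtype=float)
    std = x.std(ddof=0)  # assumption: population std, so the result has std exactly 1
    if std == 0:
        raise ValueError("constant spectrum: standard deviation is zero")
    return (x - x.mean()) / std

spectrum = np.array([0.10, 0.35, 0.80, 0.40, 0.15])
out = snv(spectrum)

# The same spectrum with a gain of 2 and an offset of 0.5 gives the same result.
out_distorted = snv(2.0 * spectrum + 0.5)
```

The second call illustrates why SNV is used for scatter correction: a positive gain and a constant offset both cancel out.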
Vector Normalization (L2)
Scales spectrum to unit length (Euclidean norm = 1).
normalized = normalize(intensities, method='vector')
# Result: ||normalized||₂ = 1
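As a rough illustration of the formula, vector normalization can be sketched in NumPy as follows. The helper name `vector_normalize` is hypothetical.

```python
import numpy as np

def vector_normalize(x):
    """Scale a spectrum so that its Euclidean (L2) norm equals 1."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("all-zero spectrum: L2 norm is zero")
    return x / norm

spectrum = np.array([3.0, 4.0, 0.0])   # L2 norm is 5
unit = vector_normalize(spectrum)      # [0.6, 0.8, 0.0]
```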
Min-Max Normalization
Scales values to a specified range (default [0, 1]).
normalized = normalize(
    intensities,
    method='minmax',
    feature_range=(0, 1)
)
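A minimal sketch of how a `feature_range` argument is commonly handled: the spectrum is first mapped to [0, 1] and then stretched to the requested interval. This is an assumption about the usual approach, not a description of the library's internals, and `minmax_normalize` is a hypothetical name.

```python
import numpy as np

def minmax_normalize(x, feature_range=(0.0, 1.0)):
    """Map the spectrum linearly so that min -> low and max -> high."""
    x = np.asarray(x, dtype=float)
    low, high = feature_range
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("flat spectrum: max equals min")
    unit = (x - x.min()) / span        # values in [0, 1]
    return unit * (high - low) + low   # values in [low, high]

spectrum = np.array([2.0, 4.0, 6.0])
default_range = minmax_normalize(spectrum)                      # [0.0, 0.5, 1.0]
symmetric = minmax_normalize(spectrum, feature_range=(-1, 1))   # [-1.0, 0.0, 1.0]
```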
Area Normalization
Scales so total absolute area equals 1.
normalized = normalize(intensities, method='area')
# Result: sum(|normalized|) = 1
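The formula from the table, x / sum(|x|), can be sketched in NumPy as below. Note that this uses the plain sum of absolute intensities, as the table states, rather than a trapezoidal integral over the wavenumber axis; `area_normalize` is a hypothetical name.

```python
import numpy as np

def area_normalize(x):
    """Divide by the sum of absolute intensities so that sum(|x|) becomes 1."""
    x = np.asarray(x, dtype=float)
    total = np.abs(x).sum()
    if total == 0:
        raise ValueError("all-zero spectrum: total area is zero")
    return x / total

spectrum = np.array([1.0, -1.0, 2.0])   # sum of absolute values is 4
scaled = area_normalize(spectrum)       # [0.25, -0.25, 0.5]
```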
Peak Normalization
Normalizes by intensity at a specific peak position.
normalized = normalize(
    intensities,
    method='peak',
    peak_idx=1500  # Index of reference peak
)
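Because the reference is given as an index rather than a wavenumber, it is often necessary to look up the index first. The sketch below shows one way to do that with NumPy and then applies x / x[ref]; the wavenumber grid and target value are made-up illustration data.

```python
import numpy as np

# Hypothetical wavenumber axis (cm^-1) and matching intensities.
wavenumbers = np.array([1700.0, 1600.0, 1500.0, 1400.0])
intensities = np.array([0.2, 0.5, 0.8, 0.4])

# Find the array index of the point closest to the reference band.
target = 1510.0
peak_idx = int(np.argmin(np.abs(wavenumbers - target)))  # index 2 (1500 cm^-1)

reference = intensities[peak_idx]
if reference == 0:
    raise ValueError("reference intensity is zero")

# Divide the whole spectrum by the reference intensity.
normalized = intensities / reference  # [0.25, 0.625, 1.0, 0.5]
```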
Scaling Methods for PCA/PLS
Mean Centering
Essential preprocessing for PCA: subtracts the mean of each variable (wavenumber), computed across samples, from that variable.
from xpectrass.utils import mean_center
# Returns centered data and mean for reconstruction
centered, mean = mean_center(spectra_matrix, axis=0, return_mean=True)
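The reason for returning the mean can be seen in a small NumPy sketch: the stored mean lets you undo the centering and lets you center new samples with the training mean. The matrix values below are illustrative only.

```python
import numpy as np

# Rows are samples, columns are variables (wavenumbers).
X = np.array([[1.0, 2.0, 3.0],
              [3.0, 6.0, 9.0]])

mean = X.mean(axis=0)          # per-column mean: [2.0, 4.0, 6.0]
centered = X - mean            # every column now has mean 0

# Reconstruction: add the stored mean back.
reconstructed = centered + mean

# A new sample is centered with the training mean, not with its own mean.
new_sample = np.array([2.0, 5.0, 7.0])
new_centered = new_sample - mean   # [0.0, 1.0, 1.0]
```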
Auto-Scaling
Mean centering followed by unit-variance scaling, so that each variable ends up with mean=0 and std=1.
from xpectrass.utils import auto_scale
scaled, mean, std = auto_scale(spectra_matrix, return_params=True)
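A minimal NumPy sketch of auto-scaling, assuming rows are samples, columns are variables, and the population standard deviation (`ddof=0`) is used. The library's convention may differ.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)    # assumption: population std
if np.any(std == 0):
    raise ValueError("at least one variable is constant")

scaled = (X - mean) / std      # each column: mean 0, std 1

# The stored parameters allow the original data to be recovered.
recovered = scaled * std + mean
```

After auto-scaling, both columns carry equal weight even though the second one originally varied far more.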
Pareto Scaling
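Less aggressive than auto-scaling: each mean-centered variable is divided by sqrt(std) instead of std, so high-variance variables are damped but still keep more weight than low-variance ones.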
from xpectrass.utils import pareto_scale
scaled, mean, std = pareto_scale(spectra_matrix, return_params=True)
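To show what "less aggressive" means in numbers, here is a small NumPy sketch of Pareto scaling under the same assumptions as above (rows are samples, population std). After scaling, each column's standard deviation equals the square root of its original standard deviation.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)    # assumption: population std
if np.any(std == 0):
    raise ValueError("at least one variable is constant")

# Pareto scaling: center, then divide by the square root of the std.
scaled = (X - mean) / np.sqrt(std)

# Remaining spread per column is sqrt(std), not 1 as in auto-scaling.
remaining_spread = scaled.std(axis=0)
```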
Detrending
Remove polynomial trends (often combined with SNV):
from xpectrass.utils import detrend, snv_detrend
# Linear detrending
detrended = detrend(intensities, order=1)
# SNV + detrending (common combination)
snv_dt = snv_detrend(intensities, detrend_order=1)
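Polynomial detrending is commonly done by fitting a low-order polynomial against the point index and subtracting the fit. The sketch below takes that approach with `np.polyfit`; it is an assumption about the general technique rather than the library's exact implementation, and the helper names are hypothetical.

```python
import numpy as np

def detrend_poly(y, order=1):
    """Subtract a least-squares polynomial of the given order from y."""
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float)       # fit against the point index
    coeffs = np.polyfit(x, y, deg=order)
    return y - np.polyval(coeffs, x)

def snv_then_detrend(y, order=1):
    """Apply SNV first, then remove the remaining polynomial trend."""
    y = np.asarray(y, dtype=float)
    snv = (y - y.mean()) / y.std()
    return detrend_poly(snv, order=order)

x = np.arange(50, dtype=float)
baseline = 0.02 * x + 1.0                              # sloping baseline
peak = np.exp(-0.5 * ((x - 25.0) / 3.0) ** 2)          # synthetic band at index 25

flat = detrend_poly(baseline, order=1)                 # a pure line is removed entirely
corrected = snv_then_detrend(baseline + peak)          # band survives, slope is gone
```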
Batch Operations
Normalize Multiple Spectra
from xpectrass.utils import normalize_df
normalized_matrix = normalize_df(spectra_matrix, method="snv")
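For a matrix with one spectrum per row, per-spectrum SNV amounts to row-wise statistics with broadcasting. The following NumPy sketch illustrates the idea under that layout assumption; it is not the implementation of `normalize_df`.

```python
import numpy as np

# Two spectra (rows) on very different intensity scales.
spectra = np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 60.0]])

# keepdims=True keeps shape (n_spectra, 1) so the result broadcasts across columns.
row_mean = spectra.mean(axis=1, keepdims=True)
row_std = spectra.std(axis=1, keepdims=True)

normalized = (spectra - row_mean) / row_std   # each row: mean 0, std 1
```

Note the axis: normalization works along each row (axis=1), whereas mean centering for PCA works down each column (axis=0).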
DataFrame Operations
from xpectrass.utils import normalize_df, mean_center
# Normalize Polars DataFrame
normalized_df = normalize_df(
    df,
    method="snv",
    label_column="type",
    exclude_columns=["study", "sample_id", "environmental", "resolution"],
)
# Mean-center spectral matrix for PCA
spectral_cols = [c for c in normalized_df.columns if c not in ["study", "sample_id", "type", "environmental", "resolution"]]
centered_matrix, mean = mean_center(normalized_df[spectral_cols].to_numpy(), return_mean=True)
Comparison
| Method | Removes Offset | Removes Scale Diff | Preserves Shape | PCA Ready |
|---|---|---|---|---|
| SNV | ✓ | ✓ | ✓ | ✓ |
| Vector | ✗ | ✓ | ✓ | Needs centering |
| MinMax | Partial | ✓ | ✓ | Needs centering |
| Area | ✗ | ✓ | ✓ | Needs centering |
| Mean Center | ✓ | ✗ | ✓ | ✓ |
| Auto-Scale | ✓ | ✓ | ✗ | ✓ |
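The "Removes Offset" and "Removes Scale Diff" columns can be checked numerically. The sketch below applies a gain and an offset to a made-up spectrum and compares SNV with vector normalization, using plain NumPy re-implementations of the two formulas (assumed to match the table, not taken from the library).

```python
import numpy as np

def snv(x):
    return (x - x.mean()) / x.std()

def vector_normalize(x):
    return x / np.linalg.norm(x)

base = np.array([0.1, 0.4, 0.9, 0.3])
scaled_only = 2.5 * base            # multiplicative difference
shifted_only = base + 0.7           # additive offset
both = 2.5 * base + 0.7             # gain and offset together

# SNV is unchanged by gain and offset.
snv_invariant = np.allclose(snv(base), snv(both))                                    # True

# Vector normalization removes the gain but not the offset.
vector_scale_ok = np.allclose(vector_normalize(base), vector_normalize(scaled_only)) # True
vector_offset_ok = np.allclose(vector_normalize(base), vector_normalize(shifted_only))  # False
```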
Recommendations
| Task | Recommended Method |
|---|---|
| General preprocessing | |
| Classification | |
| PCA/PLS | |
| Quantitative analysis | |
| Visual comparison | |
Example
from xpectrass.utils import (
    normalize, normalize_df, mean_center,
    snv_detrend
)
import numpy as np
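# Single spectrum
# (load_spectrum is not imported above; substitute your own loader
# that returns a 1-D array of intensities)
spectrum = load_spectrum('sample.csv')
# SNV normalization
snv_spectrum = normalize(spectrum, method='snv')
# SNV + detrending (scatter correction)
snv_dt = snv_detrend(spectrum)
# Batch processing (files is assumed to be a list of spectrum file paths)
spectra = np.vstack([load_spectrum(f) for f in files])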
# Normalize all
normalized = normalize_df(spectra, method="snv")
# Mean center for PCA
centered, mean = mean_center(normalized, return_mean=True)
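Once the matrix is mean-centered, PCA can be computed directly from a singular value decomposition. The sketch below uses random stand-in data and plain NumPy; it is one common way to obtain scores and explained variance, not a feature of xpectrass.

```python
import numpy as np

# Stand-in for a normalized spectra matrix: 6 samples, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Mean-center each variable (column) across samples.
centered = X - X.mean(axis=0)

# SVD of the centered matrix: rows of Vt are the principal axes (loadings).
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

scores = U * S                          # sample coordinates on the principal axes
explained = S**2 / np.sum(S**2)         # fraction of variance per component
```

Skipping the centering step would make the first component point toward the mean spectrum instead of describing the variation between samples.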