Normalization

Normalization standardizes spectral intensities to enable comparison between samples measured under different conditions.

Overview

Using FTIRdataprocessing Class (Recommended)

The easiest way to apply normalization is through the FTIRdataprocessing class with built-in evaluation:

from xpectrass import FTIRdataprocessing

# Initialize with your data
ftir = FTIRdataprocessing(df, label_column="type")

# Convert to absorbance first (normalization expects absorbance-scale spectra)
df_abs = ftir.convert(mode="to_absorbance", plot=False)

# Step 1: Evaluate all normalization methods to find the best one
norm_results = ftir.find_normalization_method(
    data=df_abs,
    methods="FTIR",
    n_splits=5,
)

# Step 2: View evaluation results
print(norm_results.head())

# Step 3: Apply the top-ranked method (assumes the results are sorted best-first)
best_method = norm_results.iloc[0]["method"]
ftir.normalize(data=df_abs, method=best_method, plot=False)

# Step 4: Access normalized data
normalized_df = ftir.df_norm

Using Utility Functions Directly

For standalone use or custom pipelines:

from xpectrass.utils import normalize, normalize_method_names

# See available methods
print(normalize_method_names())
# e.g. ['adaptive_regional', 'area', 'curvature_weighted', ..., 'snv', 'vector']

# Apply normalization to a single spectrum
normalized = normalize(intensities, method='snv')

Available Methods

| Method | Formula | Use Case |
|---|---|---|
| `snv` | (x - mean) / std | Recommended default; removes scatter effects |
| `vector` | x / ‖x‖₂ | Compare spectral shapes |
| `minmax` | (x - min) / (max - min) | Scale to [0, 1] |
| `area` | x / sum(\|x\|) | Total absolute area = 1 |
| `peak` | x / x[ref] | Normalize to a reference peak |
| `range` | x / (max - min) | Like `minmax`, but the minimum is not subtracted, so the offset is kept |
| `max` | x / max(\|x\|) | Largest absolute value = 1 |
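The sketch below makes the table's formulas concrete. It is a minimal NumPy implementation of each row for a single 1-D spectrum. The function name `normalize_sketch` and its `ref_idx` parameter are hypothetical, and this is not the xpectrass implementation. In particular, using the population standard deviation (`ddof=0`) for `snv` is an assumption.

```python
import numpy as np


def normalize_sketch(x, method="snv", ref_idx=None):
    """Hypothetical stand-in for the formulas in the table (not xpectrass code)."""
    x = np.asarray(x, dtype=float)
    if method == "snv":
        return (x - x.mean()) / x.std()              # population std (ddof=0) assumed
    if method == "vector":
        return x / np.linalg.norm(x)                 # Euclidean (L2) norm
    if method == "minmax":
        return (x - x.min()) / (x.max() - x.min())
    if method == "area":
        return x / np.abs(x).sum()
    if method == "peak":
        return x / x[ref_idx]                        # ref_idx is an array index
    if method == "range":
        return x / (x.max() - x.min())               # offset is NOT removed
    if method == "max":
        return x / np.abs(x).max()
    raise ValueError(f"unknown method: {method!r}")


x = np.array([0.2, 0.5, 1.0, 0.4])
print(normalize_sketch(x, "minmax"))
```

For brevity, this sketch omits guards against zero denominators, such as a flat spectrum or a zero reference intensity.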

Method Details

Standard Normal Variate (SNV) - Recommended

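Centers and scales each spectrum individually so that it has mean 0 and standard deviation 1. This removes additive offsets and multiplicative scaling differences, such as those caused by light scattering or varying sample thickness.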

normalized = normalize(intensities, method='snv')
# Result: mean ≈ 0, std ≈ 1
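Here is why SNV cancels scatter. Suppose a distorted spectrum is a·x + b, with a constant gain a > 0 and a constant offset b. Subtracting the mean removes b, and dividing by the standard deviation removes a. The sketch below checks this numerically. The `snv` helper is hypothetical (it is not the xpectrass function) and assumes the population standard deviation (`ddof=0`).

```python
import numpy as np


def snv(x):
    """Standard normal variate for one spectrum (population std, ddof=0, assumed)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


rng = np.random.default_rng(0)
spectrum = rng.random(200)                 # synthetic spectrum
distorted = 1.7 * spectrum + 0.3           # constant gain 1.7 plus constant offset 0.3

# Both versions map to the same standardized spectrum
print(np.allclose(snv(spectrum), snv(distorted)))
```

Real scatter is only approximately a constant gain and offset, so in practice SNV reduces scatter effects rather than eliminating them.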

Vector Normalization (L2)

Scales each spectrum to unit length (Euclidean norm = 1).

normalized = normalize(intensities, method='vector')
# Result: ||normalized||₂ = 1
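Unit length has a useful consequence: the dot product of two vector-normalized spectra equals their cosine similarity, so comparing shapes reduces to a single inner product. Below is a minimal sketch with a hypothetical `vector_normalize` helper and synthetic data.

```python
import numpy as np


def vector_normalize(x):
    """Divide a spectrum by its Euclidean (L2) norm."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)


a = np.array([0.1, 0.8, 0.3, 0.05])
# Rescaled copy of `a` with small shape changes
b = 2.5 * a + np.array([0.0, 0.05, -0.02, 0.01])

a_n, b_n = vector_normalize(a), vector_normalize(b)

# For unit-length vectors the dot product is the cosine similarity (1.0 = identical shape)
similarity = float(a_n @ b_n)
print(round(similarity, 4))
```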

Min-Max Normalization

Scales values to a specified range (default [0, 1]).

normalized = normalize(
    intensities,
    method='minmax',
    feature_range=(0, 1)
)
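With a custom `feature_range=(low, high)`, min-max scaling is presumably the linear map that sends the spectrum's minimum to `low` and its maximum to `high`. The sketch below assumes that definition. The name `minmax_to_range` is hypothetical, and raising an error on a constant spectrum is a choice made here, not documented xpectrass behavior.

```python
import numpy as np


def minmax_to_range(x, feature_range=(0.0, 1.0)):
    """Linearly map min(x) -> low and max(x) -> high."""
    x = np.asarray(x, dtype=float)
    low, high = feature_range
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("constant spectrum: min-max scaling is undefined")
    return low + (x - x.min()) * (high - low) / span


# Map a short synthetic spectrum onto [-1, 1]
scaled = minmax_to_range([2.0, 6.0, 10.0], feature_range=(-1, 1))
print(scaled)
```

This sketch scales within a single spectrum. By contrast, scikit-learn's `MinMaxScaler` scales each column (feature) across samples.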

Area Normalization

Scales the spectrum so that the sum of its absolute intensities equals 1.

normalized = normalize(intensities, method='area')
# Result: sum(|normalized|) = 1
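Because the denominator uses absolute values, points that dip below zero after baseline correction still count toward the total. Below is a minimal NumPy sketch of the formula `x / sum(|x|)`, using a hypothetical `area_normalize` helper and made-up intensities.

```python
import numpy as np


def area_normalize(x):
    """Divide by the sum of absolute intensities so that sum(|result|) == 1."""
    x = np.asarray(x, dtype=float)
    total = np.abs(x).sum()
    if total == 0:
        raise ValueError("all-zero spectrum cannot be area-normalized")
    return x / total


# Made-up baseline-corrected spectrum; the first point is slightly negative
spectrum = np.array([-0.04, 0.20, 1.10, 0.60, 0.06])
result = area_normalize(spectrum)   # the absolute values sum to 2.0, so this halves everything
print(result)
```

The plain sum weights every point equally. It is therefore a good proxy for integrated area only when the wavenumber spacing is uniform.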

Peak Normalization

Divides the whole spectrum by its intensity at a reference peak. `peak_idx` is an array index, not a wavenumber.

normalized = normalize(
    intensities,
    method='peak',
    peak_idx=1500  # Index of reference peak
)
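Since `peak_idx` is an array position, a reference band given in cm⁻¹ must first be mapped to the nearest index on the wavenumber axis. The sketch below assumes you hold that axis as a NumPy array. Everything in it is illustrative and not part of xpectrass: the names `nearest_index` and `peak_normalize`, the 4000 to 400 cm⁻¹ grid, and the synthetic band near 1715 cm⁻¹.

```python
import numpy as np


def nearest_index(wavenumbers, target):
    """Return the array index whose wavenumber is closest to `target` (cm^-1)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    return int(np.argmin(np.abs(wavenumbers - target)))


def peak_normalize(x, peak_idx):
    """Divide the whole spectrum by the intensity at one reference index."""
    x = np.asarray(x, dtype=float)
    ref = x[peak_idx]
    if ref == 0:
        raise ValueError("reference intensity is zero")
    return x / ref


# Hypothetical axis: 4000 -> 400 cm^-1 in 2 cm^-1 steps (descending)
wavenumbers = np.arange(4000, 398, -2)
# Synthetic Gaussian band centred at 1715 cm^-1 on a small constant background
intensities = np.exp(-((wavenumbers - 1715) / 15.0) ** 2) + 0.05

idx = nearest_index(wavenumbers, 1715)      # 1715 falls between grid points
normalized = peak_normalize(intensities, idx)
print(idx, wavenumbers[idx])
```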

Scaling Methods for PCA/PLS

Mean Centering

Essential preprocessing for PCA: subtracts the mean of each variable (wavenumber), computed across all samples.

from xpectrass.utils import mean_center

# Returns centered data and mean for reconstruction
centered, mean = mean_center(spectra_matrix, axis=0, return_mean=True)
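Column-wise mean centering amounts to subtracting the average spectrum from every row. The minimal sketch below uses a hypothetical `mean_center_sketch`, not the xpectrass function. It also shows why the mean is returned: adding it back reconstructs the original matrix. Keeping the mean's 2-D shape with `keepdims=True` is a convenience assumed here; the shape xpectrass returns may differ.

```python
import numpy as np


def mean_center_sketch(spectra, axis=0):
    """Subtract the per-column (per-wavenumber) mean; return centered data and that mean."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=axis, keepdims=True)
    return spectra - mean, mean


rng = np.random.default_rng(1)
X = rng.random((10, 50))                 # 10 spectra x 50 wavenumbers
centered, mean = mean_center_sketch(X)

# Each wavenumber column now averages to zero; centered + mean restores X
print(np.abs(centered.mean(axis=0)).max())
```

To project new spectra onto an existing PCA model, subtract this stored training mean rather than the new samples' own mean.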

Auto-Scaling

Mean centering followed by unit-variance scaling. After scaling, each variable (wavenumber) has mean 0 and standard deviation 1 across samples.

from xpectrass.utils import auto_scale

scaled, mean, std = auto_scale(spectra_matrix, return_params=True)
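Below is a minimal NumPy sketch of auto-scaling (column-wise z-scoring). The helper name, the sample standard deviation (`ddof=1`), and leaving zero-variance columns unscaled are assumptions, not documented xpectrass behavior.

```python
import numpy as np


def auto_scale_sketch(spectra, ddof=1):
    """Column-wise z-score: mean 0 and standard deviation 1 for every wavenumber."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0, ddof=ddof)
    # Flat columns (std == 0) are divided by 1 instead, so they become all zeros
    safe_std = np.where(std == 0, 1.0, std)
    return (spectra - mean) / safe_std, mean, std


rng = np.random.default_rng(2)
# 8 spectra x 30 wavenumbers, with column spreads ranging from small to large
X = rng.random((8, 30)) * np.linspace(0.1, 5.0, 30)
scaled, mean, std = auto_scale_sketch(X)
print(scaled.std(axis=0, ddof=1).round(6))
```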

Pareto Scaling

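A gentler alternative to auto-scaling. After mean centering, each variable is divided by the square root of its standard deviation rather than by the standard deviation itself, so high-variance bands are damped but not equalized.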

from xpectrass.utils import pareto_scale

scaled, mean, std = pareto_scale(spectra_matrix, return_params=True)
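Below is a minimal NumPy sketch of Pareto scaling. The helper name, the sample standard deviation (`ddof=1`), and leaving zero-variance columns unscaled are assumptions, not documented xpectrass behavior.

```python
import numpy as np


def pareto_scale_sketch(spectra, ddof=1):
    """Center each column, then divide by the square root of its standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0, ddof=ddof)
    safe_root = np.sqrt(np.where(std == 0, 1.0, std))
    return (spectra - mean) / safe_root, mean, std


rng = np.random.default_rng(3)
X = rng.random((8, 30)) * np.linspace(0.1, 5.0, 30)
scaled, mean, std = pareto_scale_sketch(X)

# Each column's new std equals sqrt(original std): large-variance bands shrink
# relative to small ones but are not all forced to 1 as in auto-scaling
print(np.allclose(scaled.std(axis=0, ddof=1), np.sqrt(std)))
```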

Detrending

Detrending removes a low-order polynomial trend, such as a sloping baseline, from each spectrum. It is often combined with SNV:

from xpectrass.utils import detrend, snv_detrend

# Linear detrending
detrended = detrend(intensities, order=1)

# SNV + detrending (common combination)
snv_dt = snv_detrend(intensities, detrend_order=1)
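The SNV-then-detrend combination can be sketched in three steps: standardize the spectrum, fit a least-squares polynomial against the point index, and subtract the fit. SNV followed by detrending is the commonly described order. It is an assumption here that `snv_detrend` uses the same order, and likewise that it fits against the index rather than the wavenumber axis. `snv_then_detrend` is a hypothetical name.

```python
import numpy as np


def snv_then_detrend(x, order=1):
    """SNV one spectrum, then subtract a least-squares polynomial of the given order."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    t = np.arange(z.size)                    # fit against point index (even spacing assumed)
    coeffs = np.polyfit(t, z, deg=order)
    return z - np.polyval(coeffs, t)


# Synthetic Gaussian band sitting on a sloped baseline
t = np.arange(300)
spectrum = np.exp(-((t - 150) / 10.0) ** 2) + 0.004 * t + 0.2
corrected = snv_then_detrend(spectrum, order=1)
print(int(np.argmax(corrected)))
```

After subtraction, the residual has no component left along the fitted polynomial, so refitting a line to `corrected` gives coefficients of essentially zero.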

Batch Operations

Normalize Multiple Spectra

from xpectrass.utils import normalize_df

normalized_matrix = normalize_df(spectra_matrix, method="snv")
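Row-wise SNV over a whole matrix needs no Python loop. Compute each row's mean and standard deviation with `axis=1, keepdims=True` and let broadcasting handle the rest. The minimal sketch below assumes rows are spectra and columns are wavenumbers. The helper name is hypothetical, and the sketch does not show how `normalize_df` handles arrays internally.

```python
import numpy as np


def snv_rows(spectra):
    """Apply SNV to every row (spectrum) of a 2-D array in one vectorized step."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)   # shape (n_spectra, 1) for broadcasting
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


rng = np.random.default_rng(4)
matrix = rng.random((5, 100))                    # 5 spectra x 100 wavenumbers
normalized_matrix = snv_rows(matrix)
print(normalized_matrix.shape)
```

Note the axis. Per-spectrum normalization works along `axis=1`, whereas `mean_center` in the PCA section centers along `axis=0`.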

DataFrame Operations

from xpectrass.utils import normalize_df, mean_center

# Non-spectral metadata columns in this example dataset
metadata_cols = ["study", "sample_id", "environmental", "resolution"]

# Normalize a Polars DataFrame
normalized_df = normalize_df(
    df,
    method="snv",
    label_column="type",
    exclude_columns=metadata_cols,
)

# Mean-center the spectral matrix for PCA (drop metadata and the label column)
spectral_cols = [c for c in normalized_df.columns if c not in metadata_cols + ["type"]]
centered_matrix, mean = mean_center(normalized_df[spectral_cols].to_numpy(), return_mean=True)

Comparison

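| Method | Removes Offset | Removes Scale Difference | Preserves Shape | PCA Ready |
|---|---|---|---|---|
| SNV | ✓ | ✓ | ✓ | Needs centering |
| Vector | ✗ | ✓ | ✓ | Needs centering |
| MinMax | Partial | ✓ | ✓ | Needs centering |
| Area | ✗ | ✓ | ✓ | Needs centering |
| Mean Center | ✓ | ✗ | ✓ | ✓ |
| Auto-Scale | ✓ | ✓ | ✗ | ✓ |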
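The offset and scale columns for the per-spectrum methods can be checked numerically. The sketch below builds a synthetic spectrum, a purely rescaled copy, and a rescaled copy with an added offset. It then tests which methods map each copy back to the original result. The helpers are illustrative one-liners, not xpectrass functions.

```python
import numpy as np


def snv(x):
    return (x - x.mean()) / x.std()


def vector(x):
    return x / np.linalg.norm(x)


def area(x):
    return x / np.abs(x).sum()


rng = np.random.default_rng(5)
x = rng.random(120) + 0.1        # strictly positive synthetic spectrum
scaled = 2.0 * x                 # pure scale difference
shifted = 2.0 * x + 0.5          # scale difference plus additive offset

for name, f in [("snv", snv), ("vector", vector), ("area", area)]:
    print(f"{name:7s} scale removed: {np.allclose(f(x), f(scaled))}  "
          f"offset removed: {np.allclose(f(x), f(shifted))}")
```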

Recommendations

| Task | Recommended Method |
|---|---|
| General preprocessing | `snv` |
| Classification | `snv` or `vector` |
| PCA/PLS | `snv` + `mean_center` or `auto_scale` |
| Quantitative analysis | `peak` (internal standard) |
| Visual comparison | `minmax` |
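The PCA/PLS recommendation `snv` + `mean_center` can be sketched end to end in NumPy: SNV on each row, centering on each column, then PCA through a singular value decomposition. This is an illustrative pipeline under those assumptions, not how xpectrass computes PCA. If you use scikit-learn's `PCA` instead, it performs the column-centering step itself.

```python
import numpy as np


def snv_rows(spectra):
    """SNV on every row (spectrum)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def pca_scores(spectra, n_components=2):
    """SNV each spectrum, mean-center each wavenumber, then PCA via SVD."""
    X = snv_rows(np.asarray(spectra, dtype=float))
    X = X - X.mean(axis=0)                          # column-wise mean centering
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = S**2 / np.sum(S**2)                  # explained-variance ratios
    return scores, Vt[:n_components], explained[:n_components]


rng = np.random.default_rng(6)
spectra = rng.random((12, 80))                      # 12 synthetic spectra x 80 wavenumbers
scores, loadings, explained = pca_scores(spectra, n_components=3)
print(scores.shape, explained.round(3))
```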

Example

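import glob

import numpy as np
from xpectrass.utils import (
    normalize, normalize_df, mean_center,
    snv_detrend
)


def load_spectrum(path):
    # Hypothetical helper (not part of xpectrass): assumes a two-column CSV
    # (wavenumber, intensity) with one header row and returns the intensities
    return np.loadtxt(path, delimiter=",", skiprows=1)[:, 1]


# Single spectrum
spectrum = load_spectrum('sample.csv')

# SNV normalization
snv_spectrum = normalize(spectrum, method='snv')

# SNV + detrending (scatter correction)
snv_dt = snv_detrend(spectrum)

# Batch processing: "spectra/*.csv" is a placeholder path; every file must
# share the same wavenumber axis so that the stacked rows line up
files = sorted(glob.glob("spectra/*.csv"))
spectra = np.vstack([load_spectrum(f) for f in files])

# Normalize all
normalized = normalize_df(spectra, method="snv")

# Mean center for PCA
centered, mean = mean_center(normalized, return_mean=True)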