# Normalization

Normalization standardizes spectral intensities to enable comparison between samples measured under different conditions.

## Overview

### Using FTIRdataprocessing Class (Recommended)

The easiest way to apply normalization is through the `FTIRdataprocessing` class with built-in evaluation:

```python
from xpectrass import FTIRdataprocessing

# Initialize with your data
ftir = FTIRdataprocessing(df, label_column="type")

# Convert to absorbance first (normalization expects absorbance-scale spectra)
df_abs = ftir.convert(mode="to_absorbance", plot=False)

# Step 1: Evaluate all normalization methods to find the best one
norm_results = ftir.find_normalization_method(
    data=df_abs,
    methods="FTIR",
    n_splits=5,
)

# Step 2: View evaluation results
print(norm_results.head())

# Step 3: Apply the best method
best_method = norm_results.iloc[0]["method"]
ftir.normalize(data=df_abs, method=best_method, plot=False)

# Step 4: Access normalized data
normalized_df = ftir.df_norm
```

### Using Utility Functions Directly

For standalone use or custom pipelines:

```python
from xpectrass.utils import normalize, normalize_method_names

# See available methods
print(normalize_method_names())
# e.g. ['adaptive_regional', 'area', 'curvature_weighted', ..., 'snv', 'vector']

# Apply normalization to a single spectrum
normalized = normalize(intensities, method='snv')
```

## Available Methods

| Method | Formula | Use Case |
|--------|---------|----------|
| `snv` | (x - mean) / std | **Default** - removes scatter effects |
| `vector` | x / ‖x‖₂ | Compare spectral shapes |
| `minmax` | (x - min) / (max - min) | Scale to [0, 1] |
| `area` | x / sum(\|x\|) | Total area = 1 |
| `peak` | x / x[ref] | Normalize to reference peak |
| `range` | x / (max - min) | Preserve relative intensities |
| `max` | x / max(\|x\|) | Maximum = 1 |

## Method Details

### Standard Normal Variate (SNV) - Recommended

Centers and scales each spectrum to have mean=0 and std=1.
Effectively removes multiplicative scatter effects.

```python
normalized = normalize(intensities, method='snv')
# Result: mean ≈ 0, std ≈ 1
```

### Vector Normalization (L2)

Scales spectrum to unit length (Euclidean norm = 1).

```python
normalized = normalize(intensities, method='vector')
# Result: ||normalized||₂ = 1
```

### Min-Max Normalization

Scales values to a specified range (default [0, 1]).

```python
normalized = normalize(
    intensities,
    method='minmax',
    feature_range=(0, 1)
)
```

### Area Normalization

Scales so total absolute area equals 1.

```python
normalized = normalize(intensities, method='area')
# Result: sum(|normalized|) = 1
```

### Peak Normalization

Normalizes by intensity at a specific peak position.

```python
normalized = normalize(
    intensities,
    method='peak',
    peak_idx=1500  # Index of reference peak
)
```

## Scaling Methods for PCA/PLS

### Mean Centering

Essential preprocessing for PCA - centers each variable (wavenumber) across samples.

```python
from xpectrass.utils import mean_center

# Returns centered data and mean for reconstruction
centered, mean = mean_center(spectra_matrix, axis=0, return_mean=True)
```

### Auto-Scaling

Mean centering + unit variance scaling. Each variable has mean=0, std=1.

```python
from xpectrass.utils import auto_scale

scaled, mean, std = auto_scale(spectra_matrix, return_params=True)
```

### Pareto Scaling

Less aggressive than auto-scaling - divides by sqrt(std) instead of std.
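To make the difference from auto-scaling concrete, here is a minimal NumPy sketch. It is an illustration, not the xpectrass implementation: the helper name `pareto_scale_sketch`, the toy matrix, and the choice of the sample standard deviation (`ddof=1`) are all assumptions.

```python
import numpy as np

def pareto_scale_sketch(X):
    """Hypothetical helper: center each column, then divide by sqrt(std)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation
    # Guard against constant columns (std == 0) to avoid division by zero
    safe_std = np.where(std > 0, std, 1.0)
    return (X - mean) / np.sqrt(safe_std), mean, std

# Toy matrix: 4 samples x 3 variables; the last variable has 100x the spread
X_demo = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 12.0, 300.0],
    [3.0, 11.0, 200.0],
    [4.0, 13.0, 400.0],
])

pareto_demo, mean_demo, std_demo = pareto_scale_sketch(X_demo)
auto_demo = (X_demo - mean_demo) / std_demo  # auto-scaling, for comparison

# Auto-scaling gives every column unit spread; Pareto scaling leaves each
# column with spread sqrt(original std), so large-variance variables are
# damped but still carry more weight than small-variance ones.
print(auto_demo.std(axis=0, ddof=1))
print(pareto_demo.std(axis=0, ddof=1))
```

The library's own `pareto_scale` is called as follows: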
```python
from xpectrass.utils import pareto_scale

scaled, mean, std = pareto_scale(spectra_matrix, return_params=True)
```

## Detrending

Remove polynomial trends (often combined with SNV):

```python
from xpectrass.utils import detrend, snv_detrend

# Linear detrending
detrended = detrend(intensities, order=1)

# SNV + detrending (common combination)
snv_dt = snv_detrend(intensities, detrend_order=1)
```

## Batch Operations

### Normalize Multiple Spectra

```python
from xpectrass.utils import normalize_df

normalized_matrix = normalize_df(spectra_matrix, method="snv")
```

### DataFrame Operations

```python
from xpectrass.utils import normalize_df, mean_center

# Normalize Polars DataFrame
normalized_df = normalize_df(
    df,
    method="snv",
    label_column="type",
    exclude_columns=["study", "sample_id", "environmental", "resolution"],
)

# Mean-center spectral matrix for PCA
spectral_cols = [
    c for c in normalized_df.columns
    if c not in ["study", "sample_id", "type", "environmental", "resolution"]
]
centered_matrix, mean = mean_center(
    normalized_df[spectral_cols].to_numpy(),
    return_mean=True,
)
```

## Comparison

| Method | Removes Offset | Removes Scale Diff | Preserves Shape | PCA Ready |
|--------|---------------|-------------------|-----------------|-----------|
| SNV | ✓ | ✓ | ✓ | ✓ |
| Vector | ✗ | ✓ | ✓ | Needs centering |
| MinMax | Partial | ✓ | ✓ | Needs centering |
| Area | ✗ | ✓ | ✓ | Needs centering |
| Mean Center | ✓ | ✗ | ✓ | ✓ |
| Auto-Scale | ✓ | ✓ | ✗ | ✓ |

## Recommendations

| Task | Recommended Method |
|------|-------------------|
| General preprocessing | `snv` |
| Classification | `snv` or `vector` |
| PCA/PLS | `snv` + `mean_center` or `auto_scale` |
| Quantitative analysis | `peak` (internal standard) |
| Visual comparison | `minmax` |

## Example

```python
from xpectrass.utils import (
    normalize, normalize_df, mean_center, snv_detrend
)
import numpy as np

# Note: load_spectrum and files are placeholders for your own
# loading function and list of file paths; they are not defined here.

# Single spectrum
spectrum = load_spectrum('sample.csv')

# SNV normalization
snv_spectrum = normalize(spectrum, method='snv')

# SNV + detrending (scatter correction)
snv_dt = snv_detrend(spectrum)

# Batch processing
spectra = np.vstack([load_spectrum(f) for f in files])

# Normalize all
normalized = normalize_df(spectra, method="snv")

# Mean center for PCA
centered, mean = mean_center(normalized, return_mean=True)
```
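The per-spectrum formulas from the Available Methods table can also be checked in plain NumPy without the library. The sketch below is only an illustration under stated assumptions: the `*_sketch` function names and the synthetic spectrum are hypothetical, and SNV uses the population standard deviation (`ddof=0`), which may differ from the convention xpectrass uses.

```python
import numpy as np

def snv_sketch(x):
    """(x - mean) / std, per spectrum (assumption: ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def vector_sketch(x):
    """x / ||x||_2, so the result has unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def area_sketch(x):
    """x / sum(|x|), so the absolute values sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / np.abs(x).sum()

# Synthetic spectrum: one Gaussian band on a sloped baseline
grid = np.linspace(0.0, 1.0, 200)
base = np.exp(-((grid - 0.5) ** 2) / 0.005) + 0.2 * grid

# The same spectrum with a multiplicative gain and an additive offset
distorted = 2.5 * base + 0.3

# SNV maps both versions onto the same curve
print(np.allclose(snv_sketch(base), snv_sketch(distorted)))

# Vector normalization cancels a pure gain, but not the offset
print(np.allclose(vector_sketch(base), vector_sketch(2.5 * base)))
print(np.allclose(vector_sketch(base), vector_sketch(distorted)))
```

This mirrors the Comparison table: SNV removes both offset and scale differences, while vector and area normalization handle scale only and still need centering before PCA.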