Baseline Correction
Baseline correction removes instrumental artifacts and background signals from FTIR spectra.
Overview
The xpectrass baseline module wraps pybaselines to provide 50+ baseline correction algorithms through a unified interface.
Using FTIRdataprocessing Class (Recommended)
The easiest way to apply baseline correction is through the FTIRdataprocessing class:
from xpectrass import FTIRdataprocessing
from xpectrass.data import load_jung_2018
# Initialize with sample data
df = load_jung_2018().head(80)
ftir = FTIRdataprocessing(df=df, label_column="type")
# Prepare denoised input first (baseline evaluation expects processed spectra)
df_denoised = ftir._get_denoised_data(denoising_method="wavelet", plot=False)
# Step 1: Evaluate baseline methods
rfzn, nar, snr = ftir.find_baseline_method(
    data=df_denoised,
    flat_windows=[(1800, 1900), (2400, 2700)],
    baseline_methods="FTIR",
    n_samples=20,
    plot=False,
)
# Step 2: View evaluation metrics
print(ftir.rfzn_tbl) # Residual Flat-Zone Noise
print(ftir.nar_tbl) # Negative Area Ratio
print(ftir.snr_tbl) # Signal-to-Noise Ratio
# Step 3: Plot evaluation results
ftir.plot_rfzn_nar_snr(metric_name="RFZN")
# Step 4: Apply the best method
ftir.correct_baseline(data=df_denoised, method="asls", lam=1e6, plot=False)
# Step 5: Access corrected data
corrected_df = ftir.df_corr
Using Utility Functions Directly
For standalone use or custom pipelines:
from xpectrass.utils import baseline_correction, baseline_method_names
# See all available methods
print(baseline_method_names())
# Apply baseline correction to a single spectrum
# (intensities is a 1-D array of intensity values that you supply)
corrected = baseline_correction(intensities, method='airpls', lam=1e6)
Available Methods
Whittaker-Based Methods
| Method | Description | Best For |
|---|---|---|
| | Asymmetric Least Squares | General purpose |
| | Adaptive Iteratively Reweighted Penalized LS | Default choice for FTIR |
| | Asymmetrically Reweighted Penalized LS | Strong baselines |
| | Improved AsLS | Better peak preservation |
| | Peak-Screening AsLS | Sharp peaks |
| | Adaptive Smoothness Penalized LS | Variable smoothness |
Polynomial-Based Methods
| Method | Description |
|---|---|
| | Standard polynomial fit |
| | Modified polynomial |
| | Iterative modified polynomial |
| | Penalized polynomial |
| | Local regression |
Morphological Methods
| Method | Description |
|---|---|
| | Morphological opening |
| | Iterative morphological |
| | Morphological and mollification |
| | Rolling ball algorithm |
| | Top-hat transform |
Spline-Based Methods
| Method | Description |
|---|---|
| | Mixture model approach |
| | Iteratively reweighted spline quantile regression |
| | Penalized spline AsLS |
Custom Methods
| Method | Description |
|---|---|
| | Median filter baseline |
| | Adaptive minimum filter |
Function Reference
baseline_correction
corrected = baseline_correction(
    intensities,            # 1-D array of intensities
    method='airpls',        # Algorithm name
    window_size=101,        # For custom windowed filters
    poly_order=4,           # For polynomial methods
    clip_negative=True,     # Set negative values to 0
    return_baseline=False,  # If True, return a (corrected, baseline) tuple
    **kwargs                # Method-specific parameters
)
Common Parameters by Method
For Whittaker methods (asls, airpls, arpls, etc.):
lam: Smoothness parameter (typically 1e4 to 1e8). Higher = smoother baseline.
p: Asymmetry parameter (typically 0.001 to 0.1). Lower = less peak influence.
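To make the roles of lam and p concrete, here is a minimal NumPy sketch of the asymmetric least squares idea behind asls. It is not the xpectrass or pybaselines implementation: the function name asls_baseline, the dense solver, the iteration count, and the synthetic spectrum are all assumptions chosen for illustration.

```python
import numpy as np

def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Illustrative dense AsLS baseline (hypothetical helper, O(n^3) per solve)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n)
    D = np.diff(np.eye(n), n=2, axis=0)
    # lam scales the roughness penalty: larger lam gives a stiffer baseline
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline (likely peaks) get weight p,
        # points below it get weight 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z

# Synthetic spectrum (assumed data): linear drift plus one Gaussian peak
x = np.linspace(0.0, 1.0, 200)
true_baseline = 0.5 + 0.3 * x
y = true_baseline + np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)

baseline = asls_baseline(y, lam=1e5, p=0.01)
corrected = y - baseline
```

A real implementation would use sparse banded matrices rather than dense ones, since the dense solve becomes slow for spectra with thousands of points.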
For polynomial methods:
poly_order: Polynomial degree (typically 2-6)
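As a rough illustration of what poly_order controls, the sketch below fits a polynomial only to points inside baseline-only windows and evaluates it across the whole spectrum. This is a plain masked polynomial fit, not one of the library's polynomial methods. The helper name masked_poly_baseline, the window choices, and the synthetic data are assumptions.

```python
import numpy as np

def masked_poly_baseline(x, y, baseline_windows, poly_order=2):
    """Fit a polynomial to points inside baseline-only windows (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    for lo, hi in baseline_windows:
        mask |= (x >= lo) & (x <= hi)
    # Polynomial.fit rescales x internally, which keeps the fit well conditioned
    fit = np.polynomial.Polynomial.fit(x[mask], y[mask], deg=poly_order)
    return fit(x)

# Synthetic spectrum (assumed data): quadratic drift plus one Gaussian peak
x = np.linspace(400.0, 4000.0, 361)
true_baseline = 0.2 + 1e-4 * (x - 400.0) + 2e-8 * (x - 2000.0) ** 2
y = true_baseline + np.exp(-0.5 * ((x - 2900.0) / 30.0) ** 2)

baseline = masked_poly_baseline(x, y, [(400, 2500), (3300, 4000)], poly_order=2)
corrected = y - baseline
```

A degree that is too low cannot follow the drift, while a degree that is too high starts to bend into the peaks, which is why moderate values are typical.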
Evaluation
Compare baseline methods using the RFZN, NAR, and SNR metrics:
from xpectrass.utils import evaluate_baseline_correction_methods
# Define flat regions (known baseline-only areas)
flat_windows = [(2500, 2600), (3350, 3450)]
# Evaluate all methods
rfzn, nar, snr = evaluate_baseline_correction_methods(
    data=df,
    flat_windows=flat_windows,
    label_column="type",
    exclude_columns=["study", "sample_id", "environmental", "resolution"],
    baseline_methods=["asls", "airpls", "arpls"],
    n_samples=20,
    sample_selection="random",
)
# Lower RFZN and NAR (and higher SNR) = better baseline correction
print("Best methods by RFZN:", rfzn.mean().sort_values().head())
Metrics
| Metric | Full Name | Interpretation |
|---|---|---|
| RFZN | Residual Flat-Zone Noise | RMS of corrected signal in known baseline regions. Lower = better. |
| NAR | Negative Area Ratio | Fraction of negative area. Lower = better. |
| SNR | Signal-to-Noise Ratio | Peak height / noise. Higher = better. |
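The definitions in the table can be sketched in a few lines of NumPy. This is one plausible reading of them, not the code xpectrass uses. The function names are hypothetical, NAR is taken as negative area over total absolute area, and the SNR noise term is assumed to be the standard deviation inside the flat windows.

```python
import numpy as np

def flat_mask(wavenumbers, flat_windows):
    """Boolean mask selecting points inside any (low, high) flat window."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for lo, hi in flat_windows:
        mask |= (wavenumbers >= lo) & (wavenumbers <= hi)
    return mask

def rfzn(wavenumbers, corrected, flat_windows):
    """RMS of the corrected signal inside the flat windows."""
    flat = corrected[flat_mask(wavenumbers, flat_windows)]
    return float(np.sqrt(np.mean(flat ** 2)))

def nar(corrected):
    """Negative area as a fraction of total absolute area (assumed definition)."""
    total = np.abs(corrected).sum()
    negative = -corrected[corrected < 0].sum()
    return float(negative / total) if total > 0 else 0.0

def snr(wavenumbers, corrected, flat_windows):
    """Peak height over flat-window standard deviation (assumed noise estimate)."""
    noise = np.std(corrected[flat_mask(wavenumbers, flat_windows)])
    return float(corrected.max() / noise) if noise > 0 else float("inf")

# Tiny hand-made example (assumed values, not real FTIR data)
wavenumbers = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
corrected = np.array([1.0, -0.5, 0.3, -0.4, 0.5])
flat_windows = [(1900, 2600)]

rfzn_value = rfzn(wavenumbers, corrected, flat_windows)
nar_value = nar(corrected)
snr_value = snr(wavenumbers, corrected, flat_windows)
```

RFZN and SNR both depend on the choice of flat windows, so they are only comparable across methods when the same windows are used.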
Visualization
from xpectrass.utils import plot_baseline_correction_metric_boxes
# Visualize RFZN distribution for evaluated methods
plot_baseline_correction_metric_boxes(
    df=rfzn,
    metric_name="RFZN",
)
Recommendations for Plastics
| Plastic Type | Recommended Method | Notes |
|---|---|---|
| HDPE, LDPE | | Strong CH peaks, smooth baseline |
| PET | | Complex spectrum |
| PP | | Similar to PE |
| PS | | Aromatic features |
| PVC | | May have strong baseline drift |
Example
import numpy as np
from xpectrass.utils import baseline_correction
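# Load spectrum. load_spectrum is not defined in this snippet: it stands in for
# your own loader and should return a 1-D array of intensities, one value per
# entry in wavenumbers.
wavenumbers = np.linspace(400, 4000, 3751)
intensities = load_spectrum('HDPE1.csv')
# Apply baseline correction
corrected = baseline_correction(
    intensities,
    method='airpls',
    lam=1e6,
)
# Get the fitted baseline as well, for visualization
corrected, baseline = baseline_correction(
    intensities,
    method='airpls',
    lam=1e6,
    return_baseline=True,
)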