User Guide
Overview
This user guide provides comprehensive documentation for the xpectrass FTIR data processing and analysis library. The library is built around two main classes:
FTIRdataprocessing: Handles all preprocessing steps with evaluation and visualization
FTIRdataanalysis: Provides statistical analysis, dimensionality reduction, clustering, and machine learning
Each section covers:
Purpose: Why this feature is important
Available Methods: All implemented algorithms
Parameters: Configurable options
Examples: Code snippets and use cases
Best Practices: Recommendations for FTIR plastic classification
Quick Start
Basic Preprocessing Workflow
from xpectrass import FTIRdataprocessing
import pandas as pd
# Load your FTIR data
df = pd.read_csv("ftir_data.csv", index_col=0)
# Initialize the preprocessing pipeline
ftir = FTIRdataprocessing(
    df,
    label_column="type",
    wn_min=400,
    wn_max=4000,
)
# Step 1: Convert to absorbance (if needed)
ftir.convert(mode="to_absorbance", plot=True)
# Step 2: Remove atmospheric interference
ftir.exclude_interpolate(method="spline", plot=True)
# Step 3: Evaluate baseline methods, then apply the chosen one ("asls" here)
ftir.find_baseline_method(n_samples=50, plot=True)
ftir.correct_baseline(method="asls", plot=True)
# Step 4: Evaluate denoising methods, then apply the chosen one ("savgol" here)
ftir.find_denoising_method(n_samples=50, plot=True)
ftir.denoise_spect(method="savgol")
# Step 5: Evaluate and apply normalization
norm_results = ftir.find_normalization_method(data=ftir.df_denoised, n_splits=5)
best_norm = norm_results.iloc[0]["method"]
ftir.normalize(method=best_norm)
# Step 6: Compare all processing stages
ftir.plot_multiple_spec(sample="Sample1")
# Get the processed data
processed_df = ftir.df_norm
Basic Analysis Workflow
from xpectrass import FTIRdataanalysis
# Initialize analysis with processed data
analysis = FTIRdataanalysis(processed_df, label_column="type")
# Visualize spectra
analysis.plot_mean_spectra()
analysis.plot_heatmap()
# Dimensionality reduction
analysis.plot_pca()
analysis.plot_tsne()
analysis.plot_umap()
# Statistical analysis
analysis.perform_anova()
analysis.plot_correlation()
# Machine learning
analysis.ml_prepare_data(test_size=0.2)
results = analysis.run_all_models()
analysis.model_parameter_tuning(number_of_models=3)
Preprocessing Pipeline Order
The recommended preprocessing order is:
1. Data Validation → Ensure data quality
2. Conversion → Transmittance ↔ Absorbance conversion
3. Atmospheric Correction → Remove CO₂/H₂O interference
4. Baseline Correction → Remove instrumental artifacts
5. Denoising → Reduce high-frequency noise
6. Scatter Correction → Correct for scattering effects (optional)
7. Normalization → Standardize intensity scales
8. Derivatives → Enhance spectral features (optional)
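The ordering above can be illustrated on a single synthetic spectrum. The following is a minimal sketch using only NumPy, not the xpectrass implementation: the synthetic data, the straight-line baseline fit, the moving-average smoother, and the unit-norm scaling are all assumptions chosen for brevity, standing in for whichever methods the library actually applies at each step.

```python
import numpy as np

# Hypothetical synthetic transmittance spectrum (not library data)
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 1801)  # 2 cm^-1 spacing

# One Gaussian absorption band on a sloping baseline
band = 0.8 * np.exp(-0.5 * ((wavenumbers - 1700) / 20.0) ** 2)
baseline = 0.05 + 2e-5 * (wavenumbers - 400)
transmittance = 10.0 ** (-(band + baseline))  # fraction between 0 and 1
transmittance = transmittance + rng.normal(0, 0.002, wavenumbers.size)
transmittance = np.clip(transmittance, 1e-6, None)  # avoid log of non-positive values

# Conversion: A = -log10(T)
absorbance = -np.log10(transmittance)

# Baseline correction (assumed stand-in): fit a line to band-free regions, subtract it
band_free = np.abs(wavenumbers - 1700) > 200
coeffs = np.polyfit(wavenumbers[band_free], absorbance[band_free], deg=1)
corrected = absorbance - np.polyval(coeffs, wavenumbers)

# Denoising (assumed stand-in): 9-point moving average
window = 9
smoothed = np.convolve(corrected, np.ones(window) / window, mode="same")

# Normalization (assumed stand-in): scale to unit Euclidean norm
normalized = smoothed / np.linalg.norm(smoothed)

peak_position = wavenumbers[np.argmax(normalized)]
```

Running baseline correction before normalization matters here: if the sloping baseline were still present, it would contribute to the norm and change the relative height of the band.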
Key Features
Evaluation-First Approach
xpectrass takes an evaluation-first approach: for each major preprocessing step, you can evaluate multiple candidate methods and choose the one that performs best on your data:
Baseline correction: 50+ methods evaluated using RFZN, NAR, and SNR metrics
Denoising: 7 methods evaluated using spectral quality metrics
Normalization: 7+ methods evaluated using consistency and quality metrics
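The general idea of scoring several candidate methods and ranking them can be sketched as follows. This is an illustration only, assuming a simple signal-to-noise definition (peak height divided by the standard deviation in a band-free region) and moving-average smoothers as the candidates; the metrics and methods xpectrass actually uses may be defined differently.

```python
import numpy as np

def moving_average(y, window):
    """Smooth y with a boxcar kernel of the given length."""
    return np.convolve(y, np.ones(window) / window, mode="same")

def snr(y, signal_mask, noise_mask):
    """Assumed SNR definition: peak height over noise std in a band-free region."""
    return y[signal_mask].max() / y[noise_mask].std()

# Hypothetical noisy spectrum with a single band at 1700 cm^-1
rng = np.random.default_rng(1)
x = np.linspace(400, 4000, 1801)
clean = np.exp(-0.5 * ((x - 1700) / 20.0) ** 2)
noisy = clean + rng.normal(0, 0.05, x.size)

signal_mask = np.abs(x - 1700) < 50
noise_mask = (x > 3000) & (x < 3800)

# Candidate "methods": the raw spectrum plus three smoothing windows
candidates = {"raw": noisy}
for window in (5, 11, 21):
    candidates[f"moving_average_{window}"] = moving_average(noisy, window)

# Score every candidate, then rank from best to worst
scores = {name: snr(y, signal_mask, noise_mask) for name, y in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
best_method = ranking[0]
```

A single metric can be misleading: SNR alone tends to reward wide smoothing windows, which also flatten and broaden real peaks. Combining several metrics guards against that.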
Bundled Datasets
The library includes 6 pre-loaded FTIR plastic datasets from published studies:
from xpectrass.data import load_jung_2018, load_all_datasets
# Load a specific dataset
df = load_jung_2018()
# Load all datasets
all_data = load_all_datasets()
Comprehensive Machine Learning
Built-in support for 20+ classification models with:
Automatic hyperparameter tuning
SHAP explainability analysis
Cross-validation and performance metrics
Model comparison visualizations
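Comparing several classifiers under cross-validation can be sketched directly with scikit-learn. The example below is not the xpectrass workflow: the synthetic two-class "spectra", the three models, and the 5-fold stratified split are assumptions chosen to show the pattern in a few lines.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two classes that differ in band position
rng = np.random.default_rng(0)
x = np.linspace(400, 4000, 200)

def make_spectra(center, n):
    """Return n noisy spectra with one Gaussian band at the given wavenumber."""
    band = np.exp(-0.5 * ((x - center) / 60.0) ** 2)
    return band + rng.normal(0, 0.05, size=(n, x.size))

X = np.vstack([make_spectra(1700, 40), make_spectra(2900, 40)])
y = np.array([0] * 40 + [1] * 40)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Mean accuracy over 5 stratified folds for each model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
```

Stratified folds keep the class proportions the same in every split, which is useful when some plastic types have far fewer spectra than others.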
Main Classes
FTIRdataprocessing
The preprocessing class maintains state through the entire pipeline:
| Attribute | Description |
|---|---|
| | Original data |
| | After transmittance/absorbance conversion |
| | After atmospheric correction |
| | After baseline correction |
| df_denoised | After denoising |
| df_norm | After normalization |
| | After derivative calculation |
FTIRdataanalysis
The analysis class provides visualization, statistics, and machine learning:
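| Category | Methods |
|---|---|
| Visualization | plot_mean_spectra(), plot_heatmap() |
| Dimensionality Reduction | plot_pca(), plot_tsne(), plot_umap() |
| Clustering | |
| Statistics | perform_anova(), plot_correlation() |
| Machine Learning | ml_prepare_data(), run_all_models(), model_parameter_tuning() |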