User Guide

Overview

This user guide provides comprehensive documentation for the xpectrass FTIR data processing and analysis library. The library is built around two main classes:

  • FTIRdataprocessing: Handles all preprocessing steps with evaluation and visualization

  • FTIRdataanalysis: Provides statistical analysis, dimensionality reduction, clustering, and machine learning

Each section covers:

  • Purpose: Why this feature is important

  • Available Methods: All implemented algorithms

  • Parameters: Configurable options

  • Examples: Code snippets and use cases

  • Best Practices: Recommendations for FTIR plastic classification

Quick Start

Basic Preprocessing Workflow

from xpectrass import FTIRdataprocessing
import pandas as pd

# Load your FTIR data
df = pd.read_csv("ftir_data.csv", index_col=0)

# Initialize the preprocessing pipeline
ftir = FTIRdataprocessing(
    df,
    label_column="type",
    wn_min=400,
    wn_max=4000
)

# Step 1: Convert to absorbance (if needed)
ftir.convert(mode="to_absorbance", plot=True)

# Step 2: Remove atmospheric interference
ftir.exclude_interpolate(method="spline", plot=True)

# Step 3: Find and apply best baseline correction
ftir.find_baseline_method(n_samples=50, plot=True)
ftir.correct_baseline(method="asls", plot=True)

# Step 4: Find and apply best denoising
ftir.find_denoising_method(n_samples=50, plot=True)
ftir.denoise_spect(method="savgol")

# Step 5: Evaluate and apply normalization
norm_results = ftir.find_normalization_method(data=ftir.df_denoised, n_splits=5)
best_norm = norm_results.iloc[0]["method"]
ftir.normalize(method=best_norm)

# Step 6: Compare all processing stages
ftir.plot_multiple_spec(sample="Sample1")

# Get the processed data
processed_df = ftir.df_norm

Basic Analysis Workflow

from xpectrass import FTIRdataanalysis

# Initialize analysis with processed data
analysis = FTIRdataanalysis(processed_df, label_column="type")

# Visualize spectra
analysis.plot_mean_spectra()
analysis.plot_heatmap()

# Dimensionality reduction
analysis.plot_pca()
analysis.plot_tsne()
analysis.plot_umap()

# Statistical analysis
analysis.perform_anova()
analysis.plot_correlation()

# Machine learning
analysis.ml_prepare_data(test_size=0.2)
results = analysis.run_all_models()
analysis.model_parameter_tuning(number_of_models=3)

Preprocessing Pipeline Order

The recommended preprocessing order is:

1. Data Validation     → Ensure data quality
2. Conversion          → Transmittance ↔ Absorbance conversion
3. Atmospheric Corr.   → Remove CO₂/H₂O interference
4. Baseline Correction → Remove instrumental artifacts
5. Denoising           → Reduce high-frequency noise
6. Scatter Correction  → Correct for scattering effects (optional)
7. Normalization       → Standardize intensity scales
8. Derivatives         → Enhance spectral features (optional)
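The order above can be illustrated outside the library with NumPy and SciPy. The sketch below is a minimal, hypothetical stand-in for steps 2 to 8 on one synthetic spectrum. It assumes percent-transmittance input and treats 2280 to 2400 cm⁻¹ as the excluded CO₂ region. It uses the classic asymmetric least squares (AsLS) baseline, a Savitzky-Golay filter for denoising, vector (L2) normalization, and a Savitzky-Golay first derivative. Data validation and scatter correction are skipped, and none of the helper functions are part of the xpectrass API.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.signal import savgol_filter


def to_absorbance(transmittance_pct):
    """Hypothetical helper: percent transmittance -> absorbance, A = -log10(T / 100)."""
    t = np.clip(np.asarray(transmittance_pct, dtype=float) / 100.0, 1e-6, None)
    return -np.log10(t)


def exclude_interpolate(wn, y, regions=((2280.0, 2400.0),)):
    """Hypothetical helper: replace points inside each region by linear interpolation.

    Assumes wn is sorted in increasing order (required by np.interp).
    """
    mask = np.zeros(wn.shape, dtype=bool)
    for lo, hi in regions:
        mask |= (wn >= lo) & (wn <= hi)
    out = y.copy()
    out[mask] = np.interp(wn[mask], wn[~mask], y[~mask])
    return out


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Hypothetical helper: asymmetric least squares (AsLS) baseline."""
    n = y.size
    # Second-difference operator, shape (n - 2, n): penalizes baseline curvature
    d = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (d.T @ d)
    w = np.ones(n)
    for _ in range(n_iter):
        z = spsolve((sparse.diags(w, 0, shape=(n, n)) + penalty).tocsc(), w * y)
        # Points above the current baseline (bands) get weight p, points below get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic spectrum on an increasing wavenumber grid (cm^-1)
wn = np.linspace(400.0, 4000.0, 1801)
rng = np.random.default_rng(0)
true_abs = (
    0.8 * np.exp(-(((wn - 1715.0) / 15.0) ** 2))    # sample band
    + 0.3 * np.exp(-(((wn - 2350.0) / 10.0) ** 2))  # CO2-like interference
    + 0.1 + 5e-5 * (wn - 400.0)                     # sloping baseline
)
transmittance = 100.0 * 10.0 ** (-true_abs) + rng.normal(0.0, 0.05, wn.size)

a = to_absorbance(transmittance)                                     # 2. conversion
a_atm = exclude_interpolate(wn, a)                                   # 3. atmospheric
a_corr = a_atm - asls_baseline(a_atm)                                # 4. baseline
a_smooth = savgol_filter(a_corr, window_length=11, polyorder=3)      # 5. denoising
a_norm = a_smooth / np.linalg.norm(a_smooth)                         # 7. normalization
a_deriv = savgol_filter(a_norm, 11, 3, deriv=1, delta=wn[1] - wn[0])  # 8. derivative
```

Removing the baseline before normalizing matters here: otherwise the sloping background would contribute to the vector norm and distort the scaling of every band.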

Key Features

Evaluation-First Approach

Xpectrass follows an evaluation-first philosophy. For each major preprocessing step, you can evaluate multiple candidate methods on your own data and select the best-performing one:

  • Baseline correction: 50+ methods evaluated using RFZN, NAR, and SNR (signal-to-noise ratio) metrics

  • Denoising: 7 methods evaluated using spectral quality metrics

  • Normalization: 7+ methods evaluated using consistency and quality metrics
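The evaluation-first idea reduces to a loop: apply every candidate method, score each result, and rank. The sketch below shows that loop for denoising with NumPy, SciPy, and pandas. Because the data are synthetic, it scores candidates by RMSE against the known clean spectrum, a shortcut that is not available on real measurements (which is why reference-free metrics such as SNR are used in practice). The candidate list and the metric are illustrative assumptions, not the methods or metrics xpectrass uses internally.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

rng = np.random.default_rng(42)
x = np.arange(1000, dtype=float)
# Synthetic "clean" spectrum with two bands, plus white noise
clean = np.exp(-(((x - 300.0) / 20.0) ** 2)) + 0.6 * np.exp(-(((x - 650.0) / 20.0) ** 2))
noisy = clean + rng.normal(0.0, 0.02, x.size)

# Hypothetical candidate denoising methods (illustrative, not xpectrass's list)
candidates = {
    "none": lambda y: y,
    "savgol_w7": lambda y: savgol_filter(y, 7, 2),
    "savgol_w15": lambda y: savgol_filter(y, 15, 2),
    "savgol_w41": lambda y: savgol_filter(y, 41, 2),
    "moving_average_w9": lambda y: np.convolve(y, np.ones(9) / 9.0, mode="same"),
}

rows = []
for name, method in candidates.items():
    smoothed = method(noisy)
    # Score: root-mean-square error against the known clean signal
    rmse = float(np.sqrt(np.mean((smoothed - clean) ** 2)))
    rows.append({"method": name, "rmse": rmse})

# Rank candidates so the lowest error comes first
results = pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)
best = results.iloc[0]["method"]
```

The ranked DataFrame mirrors the `norm_results.iloc[0]["method"]` pattern in the Quick Start: the best candidate sits in the first row.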

Bundled Datasets

The library includes 6 pre-loaded FTIR plastic datasets from published studies:

from xpectrass.data import load_jung_2018, load_all_datasets

# Load a specific dataset
df = load_jung_2018()

# Load all datasets
all_data = load_all_datasets()

Comprehensive Machine Learning

Built-in support for 20+ classification models with:

  • Automatic hyperparameter tuning

  • SHAP explainability analysis

  • Cross-validation and performance metrics

  • Model comparison visualizations
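Cross-validated model comparison can be sketched with scikit-learn alone. This is not how `run_all_models` is implemented; it is a minimal illustration on synthetic spectra for three hypothetical polymer classes, each with one characteristic band. The model set, the 5-fold stratified split, and accuracy as the metric are assumptions chosen for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_points = 30, 200
grid = np.arange(n_points, dtype=float)
X_parts, y_parts = [], []
# Hypothetical classes, each defined by one band position plus noise
for label, center in [("PE", 50.0), ("PP", 100.0), ("PET", 150.0)]:
    band = np.exp(-(((grid - center) / 5.0) ** 2))
    X_parts.append(band + rng.normal(0.0, 0.05, (n_per_class, n_points)))
    y_parts += [label] * n_per_class
X = np.vstack(X_parts)
y = np.array(y_parts)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": SVC(kernel="linear"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Stratified folds keep the class proportions the same in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    rows.append({"model": name, "mean_accuracy": scores.mean(), "std_accuracy": scores.std()})

# Best model first
results = pd.DataFrame(rows).sort_values("mean_accuracy", ascending=False).reset_index(drop=True)
```

Stratification matters for plastics datasets where some polymer types have far fewer spectra than others; an unstratified fold could contain no examples of a rare class.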

Main Classes

FTIRdataprocessing

The preprocessing class maintains state through the entire pipeline, storing the result of each step in its own attribute:

Attribute      Description
df             Original data
converted_df   After transmittance/absorbance conversion
df_atm         After atmospheric correction
df_corr        After baseline correction
df_denoised    After denoising
df_norm        After normalization
df_deriv       After derivative calculation
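With one attribute per stage, each step reads the previous stage's attribute and writes a new one, leaving `df` and all earlier stages untouched for comparison. The class below is a hypothetical, stripped-down illustration of that pattern, not the real `FTIRdataprocessing`. It assumes rows are samples and columns are wavenumbers plus the label column, and it uses deliberately simple placeholder methods (minimum-offset baseline, Savitzky-Golay smoothing, vector normalization).

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


class MiniFTIRPipeline:
    """Hypothetical illustration of a pipeline that keeps every stage."""

    def __init__(self, df, label_column="type"):
        self.label_column = label_column
        self.df = df              # original data, never modified
        self.df_corr = None       # after baseline correction
        self.df_denoised = None   # after denoising
        self.df_norm = None       # after normalization

    def _spectra(self, frame):
        # Spectral columns only (everything except the label column)
        return frame.drop(columns=[self.label_column])

    def _with_labels(self, values, template):
        # Rebuild a labelled DataFrame with the template's index and columns
        out = pd.DataFrame(values, index=template.index,
                           columns=self._spectra(template).columns)
        out[self.label_column] = template[self.label_column]
        return out

    def correct_baseline(self):
        # Placeholder baseline: subtract each spectrum's minimum (offset only)
        x = self._spectra(self.df).to_numpy(dtype=float)
        self.df_corr = self._with_labels(x - x.min(axis=1, keepdims=True), self.df)
        return self

    def denoise(self, window=11, polyorder=3):
        x = self._spectra(self.df_corr).to_numpy(dtype=float)
        smoothed = savgol_filter(x, window, polyorder, axis=1)
        self.df_denoised = self._with_labels(smoothed, self.df_corr)
        return self

    def normalize(self):
        # Vector (L2) normalization of each spectrum
        x = self._spectra(self.df_denoised).to_numpy(dtype=float)
        self.df_norm = self._with_labels(x / np.linalg.norm(x, axis=1, keepdims=True),
                                         self.df_denoised)
        return self


# Usage on a small random dataset (4 samples x 50 wavenumbers)
wn = np.linspace(400, 4000, 50)
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.random((4, wn.size)) + 1.0,
                    index=[f"S{i}" for i in range(4)], columns=wn.round(1))
data["type"] = ["PE", "PE", "PP", "PP"]
pipe = MiniFTIRPipeline(data).correct_baseline().denoise().normalize()
```

The trade-off is one copy of the data per step in memory, in exchange for being able to inspect or plot any intermediate stage.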

FTIRdataanalysis

The analysis class provides visualization, statistics, and machine learning:

Category                   Methods
Visualization              plot_mean_spectra, plot_overlay_spectra, plot_heatmap, plot_cv
Dimensionality Reduction   plot_pca, plot_tsne, plot_umap, plot_plsda, plot_oplsda
Clustering                 plot_kmeans_clus, plot_hierarchical_clus
Statistics                 perform_anova, plot_correlation
Machine Learning           run_all_models, model_parameter_tuning, explain_by_shap
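As background for `plot_pca`, principal component analysis can be computed directly from a singular value decomposition of the mean-centred spectra. The sketch below uses NumPy only and two hypothetical classes that differ in one band position; it is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.arange(300, dtype=float)
# Two hypothetical classes that differ only in the position of one band
class_a = np.exp(-(((grid - 100.0) / 8.0) ** 2)) + rng.normal(0.0, 0.02, (20, grid.size))
class_b = np.exp(-(((grid - 180.0) / 8.0) ** 2)) + rng.normal(0.0, 0.02, (20, grid.size))
X = np.vstack([class_a, class_b])      # rows = spectra, columns = wavenumbers
labels = np.array(["A"] * 20 + ["B"] * 20)

# PCA via SVD of the mean-centred data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * S[:2]              # sample coordinates on PC1 and PC2
loadings = Vt[:2]                      # weight of each wavenumber in PC1 and PC2
explained = S**2 / np.sum(S**2)        # explained-variance ratio per component
```

The scores are what a PCA scatter plot shows; the loadings indicate which wavenumbers drive each component, which helps link class separation back to specific bands.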