Region Selection
Select or exclude specific wavenumber regions for focused analysis.
Overview
import polars as pl
from xpectrass.data import load_jung_2018
from xpectrass.utils import select_region, exclude_regions, FTIR_REGIONS
# Region utilities expect a Polars dataframe with 'sample' and 'label' columns
df_raw = load_jung_2018().head(100)
df = pl.from_pandas(
df_raw.rename(columns={"sample_id": "sample", "type": "label"})
)
# Predefined regions
print(FTIR_REGIONS.keys())
# fingerprint, ch_stretch, carbonyl, aromatic, ...
# Select regions
fingerprint_df = select_region(df, 'fingerprint')
Predefined Regions
Main Regions
Name |
Range (cm⁻¹) |
Description |
|---|---|---|
|
400-4000 |
Complete spectrum |
|
400-1500 |
Unique molecular signatures |
|
1500-4000 |
Functional group region |
Functional Groups
Name |
Range (cm⁻¹) |
Description |
|---|---|---|
|
2800-3100 |
C-H stretching |
|
1350-1480 |
C-H bending |
|
1650-1800 |
C=O stretch |
|
1400-1600 |
Aromatic ring |
|
3200-3600 |
O-H stretching |
|
1000-1300 |
C-O-C stretch |
Plastic-Specific
Name |
Range (cm⁻¹) |
Plastic |
|---|---|---|
|
700-750 |
PE identification |
|
1370-1380 |
PP CH₃ deformation |
|
690-760 |
PS benzene |
|
1710-1730 |
PET C=O |
|
600-700 |
PVC C-Cl |
Atmospheric
Name |
Range (cm⁻¹) |
Source |
|---|---|---|
|
2300-2400 |
CO₂ |
|
1350-1900 |
H₂O |
|
3550-3900 |
H₂O |
Functions
select_region
Select specific wavenumber ranges:
# By name
fingerprint = select_region(df, 'fingerprint')
# By range
ch_region = select_region(df, (2800, 3100))
# Multiple regions
selected = select_region(df, [(400, 1500), (2800, 3100)])
exclude_regions
Remove specific ranges:
# Exclude atmospheric
no_atm = exclude_regions(df, 'co2')
# Exclude multiple
clean = exclude_regions(df, [
(2300, 2400), # CO2
(3550, 3900) # H2O
])
exclude_atmospheric
Convenience function to exclude all atmospheric bands:
from xpectrass.utils import exclude_atmospheric
clean_df = exclude_atmospheric(df)
NumPy Functions
For working with arrays directly:
from xpectrass.utils import select_region_np, select_regions_np
# Single region
selected_int, selected_wn = select_region_np(
intensities, wavenumbers, start=400, end=1500
)
# Multiple regions
selected_int, selected_wn = select_regions_np(
intensities, wavenumbers,
regions=[(400, 1500), (2800, 3100)]
)
Analysis
Analyze intensity statistics across regions:
from xpectrass.utils import analyze_regions
stats = analyze_regions(df)
print(stats)
# region start_cm end_cm n_points mean_intensity ...
# 0 fingerprint 400 1500 1150 97.5 ...
# 1 ch_stretch 2800 3100 312 85.2 ...
Helper Functions
from xpectrass.utils import (
get_region_names, # List all region names
get_region_range, # Get range for named region
get_wavenumbers, # Extract wavenumber array from df
get_spectra_matrix # Extract spectra as numpy matrix
)
# Get wavenumber array
wavenumbers = get_wavenumbers(df)
# Get spectra matrix
spectra = get_spectra_matrix(df) # Shape: (n_samples, n_wavenumbers)
Example: Classification Regions
from xpectrass.utils import select_region, get_wavenumbers
# Key regions for plastic classification
classification_regions = [
(400, 1800), # Fingerprint + carbonyl
(2700, 3100) # CH stretch region
]
training_df = select_region(df, classification_regions)
print(f"Reduced from {len(get_wavenumbers(df))} to {len(get_wavenumbers(training_df))} features")