Region Selection

Select or exclude specific wavenumber regions for focused analysis.

Overview

import polars as pl
from xpectrass.data import load_jung_2018
from xpectrass.utils import select_region, exclude_regions, FTIR_REGIONS

# Region utilities expect a Polars dataframe with 'sample' and 'label' columns
df_raw = load_jung_2018().head(100)
df = pl.from_pandas(
    df_raw.rename(columns={"sample_id": "sample", "type": "label"})
)

# Predefined regions
print(FTIR_REGIONS.keys())
# fingerprint, ch_stretch, carbonyl, aromatic, ...

# Select regions
fingerprint_df = select_region(df, 'fingerprint')

Predefined Regions

Main Regions

Name

Range (cm⁻¹)

Description

full

400-4000

Complete spectrum

fingerprint

400-1500

Unique molecular signatures

functional

1500-4000

Functional group region

Functional Groups

Name

Range (cm⁻¹)

Description

ch_stretch

2800-3100

C-H stretching

ch_bend

1350-1480

C-H bending

carbonyl

1650-1800

C=O stretch

aromatic

1400-1600

Aromatic ring

oh_stretch

3200-3600

O-H stretching

ether

1000-1300

C-O-C stretch

Plastic-Specific

Name

Range (cm⁻¹)

Plastic

hdpe_ldpe

700-750

PE identification

pp_methyl

1370-1380

PP CH₃ deformation

ps_aromatic

690-760

PS benzene

pet_ester

1710-1730

PET C=O

pvc_ccl

600-700

PVC C-Cl

Atmospheric

Name

Range (cm⁻¹)

Source

co2

2300-2400

CO₂

h2o_bend

1350-1900

H₂O

h2o_stretch

3550-3900

H₂O

Functions

select_region

Select specific wavenumber ranges:

# By name
fingerprint = select_region(df, 'fingerprint')

# By range
ch_region = select_region(df, (2800, 3100))

# Multiple regions
selected = select_region(df, [(400, 1500), (2800, 3100)])

exclude_regions

Remove specific ranges:

# Exclude atmospheric
no_atm = exclude_regions(df, 'co2')

# Exclude multiple
clean = exclude_regions(df, [
    (2300, 2400),   # CO2
    (3550, 3900)    # H2O
])

exclude_atmospheric

Convenience function to exclude all atmospheric bands:

from xpectrass.utils import exclude_atmospheric

clean_df = exclude_atmospheric(df)

NumPy Functions

For working with arrays directly:

from xpectrass.utils import select_region_np, select_regions_np

# Single region
selected_int, selected_wn = select_region_np(
    intensities, wavenumbers, start=400, end=1500
)

# Multiple regions
selected_int, selected_wn = select_regions_np(
    intensities, wavenumbers,
    regions=[(400, 1500), (2800, 3100)]
)

Analysis

Analyze intensity statistics across regions:

from xpectrass.utils import analyze_regions

stats = analyze_regions(df)
print(stats)
#         region  start_cm  end_cm  n_points  mean_intensity  ...
# 0  fingerprint     400    1500      1150          97.5     ...
# 1   ch_stretch    2800    3100       312          85.2     ...

Helper Functions

from xpectrass.utils import (
    get_region_names,    # List all region names
    get_region_range,    # Get range for named region
    get_wavenumbers,     # Extract wavenumber array from df
    get_spectra_matrix   # Extract spectra as numpy matrix
)

# Get wavenumber array
wavenumbers = get_wavenumbers(df)

# Get spectra matrix
spectra = get_spectra_matrix(df)  # Shape: (n_samples, n_wavenumbers)

Example: Classification Regions

from xpectrass.utils import select_region, get_wavenumbers

# Key regions for plastic classification
classification_regions = [
    (400, 1800),    # Fingerprint + carbonyl
    (2700, 3100)    # CH stretch region
]

training_df = select_region(df, classification_regions)
print(f"Reduced from {len(get_wavenumbers(df))} to {len(get_wavenumbers(training_df))} features")