# Augmentations in linnaeus
This document provides a comprehensive overview of the augmentation pipeline in linnaeus, with special emphasis on the Selective Mixup and CutMix implementations and their interaction with hierarchical classification heads.
## Augmentation Architecture
The linnaeus augmentation system (`linnaeus/aug/`) handles transformations applied to data during training. It separates augmentations based on when they are applied:

- **Single-Sample Augmentations**: Applied during preprocessing within the data loading pipeline (`linnaeus/h5data/`). These include transformations like AutoAugment and Random Erasing. They operate on individual samples before they are batched.
- **Batch-Wise Augmentations**: Applied by the custom dataloader's `collate_fn` (`linnaeus/h5data/h5dataloader.py`) after samples are grouped into a batch. This includes Selective Mixup and CutMix.
## Augmentation Pipeline Components
Core augmentation logic is defined by abstract base classes in `linnaeus.aug.base.py`:

- `AugmentationPipeline`: Abstract base for single-sample augmentation sequences.
- `AutoAugmentBatch`: Base for AutoAugment implementations (applies a sequence of transformations based on a policy).
- `RandomErasing`: Base for Random Erasing (masks random patches).
- `SelectiveMixup`: Base for group-aware Mixup implementations.
- `SelectiveCutMix`: Base for group-aware CutMix implementations.
### CPU vs GPU Implementations
Most augmentations have both CPU (`linnaeus.aug.cpu.*`) and GPU (`linnaeus.aug.gpu.*`) implementations.

- Single-sample augmentations (AutoAugment, RandomErasing) can run on CPU (per-sample during preprocessing) or GPU (batch-oriented within `collate_fn`). The choice is configurable via `AUG.PIPELINE_DEVICE`. GPU mode offers dramatically improved throughput by eliminating Python overhead.
- Batch-wise augmentations (SelectiveMixup, SelectiveCutMix) can run on either CPU or GPU, configured via `SCHEDULE.MIX.USE_GPU`. GPU is generally preferred for speed when tensors are already on the GPU after collation.
The `AugmentationPipelineFactory` (`linnaeus.aug.factory.py`) creates the appropriate single-sample pipeline based on the configuration.
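For orientation, here is a minimal, purely hypothetical sketch of what such config-driven selection can look like. The class names and config stub below are illustrative placeholders, not the actual linnaeus internals.

```python
from types import SimpleNamespace

class CPUPipeline:
    """Placeholder standing in for a CPU, per-sample pipeline."""
    def __init__(self, cfg): self.device = "cpu"

class GPUPipeline:
    """Placeholder standing in for a GPU, batch-oriented pipeline."""
    def __init__(self, cfg): self.device = "gpu"

def build_single_sample_pipeline(cfg):
    """Hypothetical factory: pick the pipeline class from AUG.PIPELINE_DEVICE."""
    return GPUPipeline(cfg) if cfg.AUG.PIPELINE_DEVICE == "gpu" else CPUPipeline(cfg)

# Usage with a config stub mirroring the YAML key used in this document:
cfg = SimpleNamespace(AUG=SimpleNamespace(PIPELINE_DEVICE="gpu"))
print(type(build_single_sample_pipeline(cfg)).__name__)  # -> GPUPipeline
```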
## High-Performance GPU Augmentations with Kornia
When using GPU augmentations (`AUG.PIPELINE_DEVICE: 'gpu'`), linnaeus uses the industry-standard Kornia library for robust, maintainable augmentation pipelines. The GPU pipeline provides dramatic performance improvements over CPU-based augmentations by eliminating Python overhead and leveraging optimized CUDA kernels.
### Performance Benefits
The GPU augmentation pipeline delivers significant performance improvements:

- **~39% Step Time Reduction**: The GPU pipeline refactoring reduced step time from a ~1900ms baseline to ~1160ms
- **Eliminated Python Overhead**: Batch-oriented processing removes per-sample Python dispatch costs
- **Industry-Standard Operations**: Leverages Kornia's optimized CUDA implementations
- **Clean Performance Baseline**: Proper debug logging guards ensure accurate performance measurements
### Kornia Integration
The system uses a version-adaptive wrapper (`kornia_wrappers.py`) that handles API changes gracefully:
- **Policy Support**: Supports 'imagenet', 'cifar10', 'svhn', and 'original' policies
- **API Compatibility**: Automatically handles Kornia version differences
- **Graceful Fallback**: Falls back to legacy implementations if Kornia is unavailable
- **Error Handling**: Clear error messages and diagnostic information
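As a rough illustration of the kind of version-adaptive construction such a wrapper performs, the sketch below builds a Kornia AutoAugment op when it can be imported and signals a legacy fallback otherwise. It is a simplified assumption about the approach, not the actual `kornia_wrappers.py` code; the 'original' → 'imagenet' mapping matches the configuration example later in this document.

```python
import torch

def build_kornia_autoaugment(policy: str = "original"):
    """Sketch: return a Kornia AutoAugment op, or None to signal a legacy fallback."""
    try:
        from kornia.augmentation.auto import AutoAugment  # available in recent Kornia releases
    except ImportError:
        return None  # caller falls back to the legacy (non-Kornia) implementation
    # The 'original' policy is treated as an alias for Kornia's 'imagenet' policy.
    kornia_policy = "imagenet" if policy == "original" else policy
    return AutoAugment(policy=kornia_policy)

aug = build_kornia_autoaugment("original")
if aug is not None:
    images = torch.rand(4, 3, 224, 224)  # dummy batch in [0, 1]
    out = aug(images)                    # batch-oriented augmentation (CPU or GPU)
    print(out.shape)
```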
### `torch.compile` Support (Experimental)
Linnaeus includes experimental `torch.compile` support, though comprehensive testing has shown `torch.compile` is ineffective for stochastic augmentation pipelines:

```yaml
AUG:
  PIPELINE_DEVICE: 'gpu'
  GPU_COMPILE:
    ENABLED: false       # Default: disabled based on validation results
    BACKEND: 'inductor'  # Compilation backend (default: inductor)
    MODE: 'default'      # Compilation mode: 'default', 'reduce-overhead', or 'max-autotune'
```
**Important Note**: Extensive optimization work (v0.1.4a-e) definitively established that `torch.compile` cannot achieve kernel fusion for augmentation pipelines containing stochastic operations like RandomErasing and policy selection. The feature remains available for experimentation but yields negligible performance benefit.
### Capability Probing
The system includes intelligent compilation detection:

- **Automatic Detection**: Tests compilation compatibility at startup
- **Clear User Feedback**: Explicit messages when compilation fails or provides no benefit
- **Graceful Fallback**: Automatically disables compilation if ineffective
- **No Performance Impact**: Fallback to eager mode maintains full functionality
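A minimal sketch of this kind of probe is shown below. It assumes compilation problems surface as exceptions on the first compiled call; it is not the exact detection logic used by linnaeus.

```python
import torch

def probe_torch_compile(pipeline, sample: torch.Tensor,
                        backend: str = "inductor", mode: str = "default") -> bool:
    """Return True if a compiled run of `pipeline` succeeds, else False."""
    try:
        compiled = torch.compile(pipeline, backend=backend, mode=mode)
        with torch.no_grad():
            compiled(sample)          # first call triggers compilation
        return True
    except Exception:
        return False                  # fall back to eager mode

# Usage with a trivial stand-in pipeline and a dummy batch:
ok = probe_torch_compile(torch.nn.Identity(), torch.rand(2, 3, 64, 64))
print("compile usable:", ok)
```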
## Selective Mixup and CutMix
### Overview
The linnaeus system supports two group-aware mixing techniques, Selective Mixup and Selective CutMix, both designed for multi-task, hierarchical settings:

- **Selective Mixup** (`CPUSelectiveMixup`, `GPUSelectiveMixup`): Blends two images using interpolation.
- **Selective CutMix** (`CPUSelectiveCutMix`, `GPUSelectiveCutMix`): Pastes a rectangular region from one image onto another.
Key features of both techniques:

- **Group-Aware Pairwise Mixing**: Mixes only samples belonging to the same group ID. Group IDs are typically derived from a specific taxonomic rank (e.g., species, `taxa_L10`) via the `GroupedBatchSampler`. See Scheduling Documentation.
- **Chunk-Wise Metadata Handling**: For auxiliary metadata (`aux_info`), both techniques perform a "hard pick" for discrete chunks (derived from `DATA.META.COMPONENTS`) to maintain physical plausibility (e.g., choosing spatial coordinates from one sample or the other, not interpolating them); see the sketch after this list.
- **Null Sample Exclusion**: Can optionally exclude samples with null labels (`class_idx=0`) from being mixed using `SCHEDULE.MIX.EXCLUDE_NULL_SAMPLES: True`.
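To make the "hard pick" concrete, here is an illustrative, self-contained sketch (not the linnaeus `_mix_aux_info_chunkwise` implementation). Chunk boundaries are assumed to be `(start, end)` column ranges in the flattened `aux_info` tensor.

```python
import torch

def hard_pick_aux_chunks(aux_a: torch.Tensor, aux_b: torch.Tensor,
                         chunk_bounds: list[tuple[int, int]],
                         lam: float) -> torch.Tensor:
    """Copy each metadata chunk wholesale from one partner or the other.

    Chunks are never interpolated; the choice is biased by the mixing lambda,
    so a larger lam keeps more chunks from partner A.
    """
    mixed = aux_a.clone()
    for start, end in chunk_bounds:
        if torch.rand(()).item() > lam:              # take partner B's chunk
            mixed[:, start:end] = aux_b[:, start:end]
    return mixed

# Example: two chunks (spatial coords in columns 0-2, temporal in columns 3-5)
a, b = torch.zeros(4, 6), torch.ones(4, 6)
print(hard_pick_aux_chunks(a, b, [(0, 3), (3, 6)], lam=0.5))
```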
### Key Differences
| | Selective Mixup | Selective CutMix |
|---|---|---|
| **Image Mixing** | Blends entire image with interpolation: `lam * img1 + (1-lam) * img2` | Pastes a rectangular region from one image onto another |
| **Label Mixing** | Uses the same lambda for all labels: `lam * target1 + (1-lam) * target2` | Uses an area-adjusted lambda based on patch size: `(1 - patch_area/total_area) * target1 + (patch_area/total_area) * target2` |
| **Use Case** | More subtle feature blending | More aggressive feature combination |
| **Config** | Uses `SCHEDULE.MIX.MIXUP.ALPHA` | Uses `SCHEDULE.MIX.CUTMIX.ALPHA` and optional `SCHEDULE.MIX.CUTMIX.MINMAX` |
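The sketch below illustrates the area-adjusted lambda from the table for a single image pair. It is a simplified, pairwise illustration, not the batched, group-aware linnaeus implementation.

```python
import torch

def cutmix_pair(img1, img2, tgt1, tgt2, alpha: float = 1.0):
    """Pairwise CutMix sketch: paste a patch, then area-adjust the label lambda."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    h, w = img1.shape[-2:]
    cut = (1.0 - lam) ** 0.5                           # patch side fraction
    ch, cw = int(h * cut), int(w * cut)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - ch // 2, 0), min(cy + ch // 2, h)
    x1, x2 = max(cx - cw // 2, 0), min(cx + cw // 2, w)
    mixed = img1.clone()
    mixed[..., y1:y2, x1:x2] = img2[..., y1:y2, x1:x2]
    patch_frac = (y2 - y1) * (x2 - x1) / float(h * w)  # actual pasted area fraction
    lam_adj = 1.0 - patch_frac                         # (1 - patch_area/total_area)
    mixed_tgt = lam_adj * tgt1 + (1.0 - lam_adj) * tgt2
    return mixed, mixed_tgt

img = torch.rand(3, 224, 224)
one_hot = torch.eye(10)
out_img, out_tgt = cutmix_pair(img, torch.rand_like(img), one_hot[3], one_hot[7])
print(out_tgt)  # soft label whose weights match the pasted area
```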
### Implementation Details
- **Location**: Both are applied within `H5DataLoader.collate_fn`.
- **Input**: Both receive a batch containing `(images, targets, aux_info, meta_masks, group_ids)`.
- **Permutation**: Both generate an in-group permutation (`_get_ingroup_permutation`), ensuring sample `i` is only potentially mixed with another sample `j` if `group_ids[i] == group_ids[j]` and `group_ids[i] != -1`. Samples with `group_id == -1` (including those excluded due to null labels) are never mixed. A toy version of this permutation is sketched after this list.
- **Metadata Mixing**: Both use the same `_mix_aux_info_chunkwise` method to implement the hard-pick logic for metadata chunks.
- **Output**: Both return `(mixed_images, mixed_targets, mixed_aux_info, mixed_meta_masks)`.
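As referenced above, the following is an illustrative stand-in for the in-group permutation (not the actual `_get_ingroup_permutation` code): every index is shuffled only within its own group, and `group_id == -1` entries map to themselves.

```python
import torch

def ingroup_permutation(group_ids: torch.Tensor) -> torch.Tensor:
    """Shuffle indices only within groups; -1 entries keep their own index."""
    perm = torch.arange(group_ids.numel())
    for g in group_ids.unique().tolist():
        if g == -1:
            continue                                     # excluded samples stay unmixed
        idx = (group_ids == g).nonzero(as_tuple=True)[0]
        if idx.numel() > 1:
            perm[idx] = idx[torch.randperm(idx.numel())]
    return perm

group_ids = torch.tensor([5, 5, 5, 9, 9, -1])
print(ingroup_permutation(group_ids))  # e.g. tensor([2, 0, 1, 4, 3, 5])
```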
### Choosing Between Mixup and CutMix
When both `SCHEDULE.MIX.MIXUP.ENABLED` and `SCHEDULE.MIX.CUTMIX.ENABLED` are `True`, the system uses `SCHEDULE.MIX.SWITCH_PROB` to randomly select which technique to apply for each batch:

- For each batch where mixing should apply (based on `SCHEDULE.MIX.PROB`), a random number is generated.
- If the random number is less than `SWITCH_PROB`, CutMix is used; otherwise, Mixup is used.
- This dynamic switching creates a more varied augmentation strategy; a small sketch of the decision logic follows this list.
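A minimal sketch of that per-batch decision, assuming the two probabilities are simple scalars (the actual schedule resolves `SCHEDULE.MIX.PROB` per step):

```python
import random
from typing import Optional

def choose_mix_op(mix_prob: float, switch_prob: float) -> Optional[str]:
    """Decide per batch: no mixing, CutMix, or Mixup."""
    if random.random() >= mix_prob:
        return None                                        # skip mixing for this batch
    return "cutmix" if random.random() < switch_prob else "mixup"

print([choose_mix_op(mix_prob=0.8, switch_prob=0.5) for _ in range(5)])
```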
### Group-Based Batching (`GroupedBatchSampler`)
Both Selective Mixup and CutMix rely on the `GroupedBatchSampler` (`linnaeus/h5data/grouped_batch_sampler.py`) to create batches where samples likely share the same group ID. The sampler supports two modes:

- `strict-group` mode (default): Each batch contains only samples from a single group.
- `mixed-pairs` mode: Each batch contains pairs of samples from the same group, but different pairs can be from different groups. This allows for more diverse batches while still maintaining in-group mixing compatibility (a toy sketch of this pairing appears below).
The sampler uses the `group_ids` array corresponding to the currently active `MIX.GROUP_LEVELS` (e.g., `taxa_L10`).
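The following toy sketch illustrates the `mixed-pairs` idea (it is not the actual `GroupedBatchSampler`): same-group index pairs are formed first, then pairs from possibly different groups are packed into batches.

```python
import random
from collections import defaultdict

def mixed_pairs_batches(group_ids: list[int], batch_size: int) -> list[list[int]]:
    """Build batches from same-group index pairs.

    Leftover and -1 samples are simply dropped here to keep the toy example short.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(group_ids):
        if g != -1:
            by_group[g].append(i)
    pairs = []
    for idx in by_group.values():
        random.shuffle(idx)
        pairs += [idx[j:j + 2] for j in range(0, len(idx) - 1, 2)]
    random.shuffle(pairs)              # pairs from different groups may share a batch
    flat = [i for pair in pairs for i in pair]
    return [flat[k:k + batch_size] for k in range(0, len(flat), batch_size)]

print(mixed_pairs_batches([5, 5, 5, 5, 9, 9, 9, 9, -1], batch_size=4))
```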
### Standard Batch Sampler Option
As an alternative to `GroupedBatchSampler`, a standard batch sampler can be used by setting `DATA.SAMPLER.TYPE: 'standard'`. This disables mixing operations, as mixing requires grouping samples by their group IDs.
### Scheduled Level Switching (Currently Disabled)
The configuration allows specifying multiple `GROUP_LEVELS` and `LEVEL_SWITCH_STEPS`/`EPOCHS` to change the grouping criterion during training.
**IMPORTANT LIMITATION**: As detailed in Design Decisions, scheduled switching of the mixup group level is currently disabled.
- The fields `SCHEDULE.MIX.LEVEL_SWITCH_STEPS` and `SCHEDULE.MIX.LEVEL_SWITCH_EPOCHS` must be empty in the configuration. Providing values will result in a `NotImplementedError` at startup.
- The system will only use the first task key listed in `SCHEDULE.MIX.GROUP_LEVELS` for the entire training duration.
- This decision was made to resolve a circular dependency during schedule initialization related to determining the dataloader length.
### Excluding Null Samples
Using `SCHEDULE.MIX.EXCLUDE_NULL_SAMPLES: True` is crucial when training with taxonomy-aware loss functions or hierarchical heads.
- It calls `utils.aug.exclude_null_samples_from_mixup` before the main mixing logic.
- This function identifies samples with null labels (class index 0) for the specified `null_task_keys` (defaults to all tasks if not specified, though typically only the lowest rank matters).
- It sets the `group_id` of these null samples to `-1`.
- The `_get_ingroup_permutation` logic ignores samples with `group_id == -1`, effectively preventing them from being selected as mixing partners. A toy sketch of this exclusion follows.
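Below is an illustrative stand-in (not the actual `exclude_null_samples_from_mixup`), assuming a single task's integer labels where index 0 means "null":

```python
import torch

def exclude_null_groups(group_ids: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Give null-labelled samples group_id -1 so they are never chosen as mixing partners."""
    out = group_ids.clone()
    out[labels == 0] = -1
    return out

group_ids = torch.tensor([5, 5, 9, 9])
labels = torch.tensor([3, 0, 7, 7])            # sample 1 has a null label
print(exclude_null_groups(group_ids, labels))  # tensor([ 5, -1,  9,  9])
```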
### Best Practices for Mixing with Taxonomy-Aware Loss / Hierarchical Heads
When using taxonomy-aware components (like `TaxonomyAwareLabelSmoothingCE` or `HierarchicalSoftmaxHead`), maintaining the integrity of hierarchical relationships during mixing is essential. The following configuration ensures that mixed targets still effectively represent a single, valid taxonomic lineage:
1. Set `SCHEDULE.MIX.GROUP_LEVELS` to ONLY the lowest-rank taxonomic level in your task hierarchy (typically the species level, e.g., `['taxa_L10']`).
   - This guarantees that mixed pairs share identical labels at the lowest level.
   - Due to the tree structure, they must also share identical labels for all higher ranks.
2. Always set `SCHEDULE.MIX.EXCLUDE_NULL_SAMPLES: True`.
   - Prevents mixing known samples with samples having unknown classifications at the grouping level.
3. For better batch diversity, consider using `DATA.SAMPLER.GROUPED_MODE: 'mixed-pairs'`.
   - Allows more varied batches while still ensuring all mixed pairs come from the same group.
**Why This Works**: When these conditions are met, the interpolated `mixed_targets` dictionary, although containing soft labels between 0 and 1, still corresponds to a single valid taxonomic path for each sample in the batch. Loss functions like `TaxonomyAwareLabelSmoothingCE` can then safely use `argmax()` on the 2D mixed target tensor for a given task level to retrieve the correct integer class index needed for gathering the corresponding row from the smoothing matrix. Hierarchical heads process the mixed image features, but the loss is still computed against a target distribution that reflects a consistent taxonomic identity.
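A toy numeric illustration of this point (with made-up class counts): when both partners share the same class at a level, the mixed soft target still peaks at that class, whereas a hypothetical cross-group pair would not.

```python
import torch
import torch.nn.functional as F

num_classes, lam = 10, 0.3
onehot = lambda c: F.one_hot(torch.tensor(c), num_classes).float()

# In-group pair: both partners are class 7, so the mixed target still peaks at 7.
same = lam * onehot(7) + (1 - lam) * onehot(7)
# Hypothetical cross-group pair: the mixed target is split between 7 and 2.
diff = lam * onehot(7) + (1 - lam) * onehot(2)

print(same.argmax().item())  # 7 -> a single valid class index per level
print(diff.argmax().item())  # 2 -> argmax would silently favour one partner
```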
**Warning**: Deviating from this configuration (e.g., grouping by a higher rank like `taxa_L20`, allowing multiple group levels, or setting `EXCLUDE_NULL_SAMPLES=False`) can lead to ambiguous mixed targets that violate hierarchical constraints, potentially degrading performance, especially with taxonomy-aware components.
## Configuration Examples
### GPU Augmentations with Kornia (Recommended)
```yaml
AUG:
  PIPELINE_DEVICE: "gpu"      # GPU mode for high performance (~39% faster than CPU)
  AUTOAUG:
    POLICY: 'original'        # Maps to 'imagenet' policy in Kornia
    COLOR_JITTER: 0.4
  RANDOM_ERASE:
    PROB: 0.25
    AREA_RANGE: [0.02, 0.33]
    ASPECT_RATIO: [0.3, 3.3]
  GPU_COMPILE:
    ENABLED: false            # Recommended: keep disabled (torch.compile ineffective)
    BACKEND: 'inductor'
    MODE: 'default'
```
### CPU Augmentations (Legacy)
```yaml
AUG:
  PIPELINE_DEVICE: "cpu"      # CPU mode for compatibility (slower than GPU)
  AUTOAUG:
    POLICY: 'originalr'       # Choose an AutoAugment policy
    COLOR_JITTER: 0.4
  RANDOM_ERASE:
    PROB: 0.25
    MODE: 'pixel'
    AREA_RANGE: [0.02, 0.4]
```
### Standard BatchSampler (No Mixing)
```yaml
DATA:
  SAMPLER:
    TYPE: 'standard'   # Uses standard PyTorch BatchSampler (no mixing)
```
### Enabling Selective Mixup Only
```yaml
DATA:
  SAMPLER:
    TYPE: 'grouped'              # Required for mixing operations
    GROUPED_MODE: 'strict-group' # Each batch contains samples from a single group

SCHEDULE:
  MIX:
    # --- Grouping Configuration ---
    GROUP_LEVELS: ['taxa_L10']   # IMPORTANT: Only the first level is used. Must be lowest rank.
    # LEVEL_SWITCH_STEPS: []     # MUST BE EMPTY
    # LEVEL_SWITCH_EPOCHS: []    # MUST BE EMPTY
    MIN_GROUP_SIZE: 4            # Groups smaller than this aren't mixed
    EXCLUDE_NULL_SAMPLES: True   # IMPORTANT: Keep True for hierarchical consistency

    # --- Probability Scheduling ---
    PROB:
      ENABLED: True
      START_PROB: 1.0            # Probability at step 0
      END_PROB: 0.2              # Probability at END_FRACTION/STEPS
      # Choose ONE end definition:
      END_FRACTION: 0.5          # Reach END_PROB at 50% of total steps
      # END_STEPS: 0

    # --- Mixup Configuration ---
    MIXUP:
      ENABLED: True
      ALPHA: 0.8                 # Beta distribution alpha (e.g., 0.8 for standard mixup)

    # --- CutMix Configuration ---
    CUTMIX:
      ENABLED: False             # Disabled in this example

    # --- General Settings ---
    USE_GPU: True                # Perform mixing on GPU (requires tensors on GPU)

    # Note: Metadata chunk boundaries for "hard-pick" mixing are automatically
    # derived from the DATA.META.COMPONENTS configuration
```
### Enabling Both Mixup and CutMix with Mixed-Pairs Sampler
```yaml
DATA:
  SAMPLER:
    TYPE: 'grouped'             # Required for mixing operations
    GROUPED_MODE: 'mixed-pairs' # Pairs share a group, but different pairs can come from different groups

SCHEDULE:
  MIX:
    # --- Grouping Configuration ---
    GROUP_LEVELS: ['taxa_L10']  # IMPORTANT: Only the first level is used. Must be lowest rank.
    MIN_GROUP_SIZE: 4           # Groups smaller than this aren't mixed
    EXCLUDE_NULL_SAMPLES: True  # IMPORTANT: Keep True for hierarchical consistency

    # --- Probability Scheduling ---
    PROB:
      ENABLED: True
      START_PROB: 1.0           # Probability at step 0
      END_PROB: 0.2             # Final probability
      END_FRACTION: 0.5         # Reached at 50% of total steps

    # --- Mixup Configuration ---
    MIXUP:
      ENABLED: True
      ALPHA: 0.8                # Beta distribution alpha (e.g., 0.8 for standard mixup)

    # --- CutMix Configuration ---
    CUTMIX:
      ENABLED: True
      ALPHA: 1.0                # Beta distribution alpha for CutMix
      MINMAX: [0.2, 0.8]        # Optional: Min/max bounds for CutMix patch size

    # --- Switching Between Mixup and CutMix ---
    SWITCH_PROB: 0.5            # 50% chance of using CutMix when mixing is applied

    # --- General Settings ---
    USE_GPU: True               # Perform mixing on GPU (requires tensors on GPU)
```
Remember that mixing operations (Mixup and CutMix) are only available when using the `GroupedBatchSampler` (`DATA.SAMPLER.TYPE: 'grouped'`). Setting `DATA.SAMPLER.TYPE: 'standard'` will disable all mixing operations, regardless of the `SCHEDULE.MIX` configuration.