Linnaeus Architecture Overview
This document explains the Linnaeus model architecture system, focusing on its component design, configuration approach, and extensibility patterns.
Core Design: Component Registry and Building Blocks
Linnaeus uses a two-tiered approach for organizing neural network components:
1. Registered Components
These are the primary swappable units selected via configuration:
- Models: Complete architectures like `mFormerV1` and `mFormerV0`
- Classification Heads: Output layers mapping features to predictions (e.g., `LinearHead`, `ConditionalClassifierHead`)
- Registration: Registered with the factory system via decorators (`@register_model`, `@register_head`); a minimal sketch of a registered head follows below
- Selection: Chosen via configuration parameters (`MODEL.TYPE`, `MODEL.CLASSIFICATION.HEADS.<task>.TYPE`)
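For illustration, registering a new classification head might look like the sketch below. The import path for `register_head` and the constructor signature expected of heads are assumptions made for this example; consult the existing head implementations (e.g., `LinearHead`) for the real interface.

```python
import torch
import torch.nn as nn

# Assumed import path: the document only states that @register_head belongs to the factory system.
from linnaeus.models.model_factory import register_head


@register_head
class ScaledLinearHead(nn.Module):
    """Hypothetical head: a linear classifier with a learnable logit scale."""

    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features) * self.scale
```

Once registered, such a head would be selectable from configuration by setting `MODEL.CLASSIFICATION.HEADS.<task>.TYPE` to its registered name.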
2. Building Blocks
Fundamental, reusable modules that form the internal structures:
- Purpose: Encapsulate specific architectural components (attention blocks, MLP blocks, etc.)
- Examples: `RoPE2DMHSABlock`, `ConvNeXtBlock`, `Mlp`, `DropPath` (see the sketch below)
- Instantiation: Imported and used directly within model implementations
- Configuration: Parameters set within parent component's configuration section
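To make the "imported and used directly" pattern concrete, a stage inside a model might compose building blocks along these lines. The import path and the `Mlp`/`DropPath` constructor arguments shown are assumed typical signatures, not the exact Linnaeus ones.

```python
import torch
import torch.nn as nn

# Import path and constructor signatures below are assumptions for illustration.
from linnaeus.models.blocks import Mlp, DropPath


class ToyStage(nn.Module):
    """Hypothetical stage showing building blocks wired together inside a model."""

    def __init__(self, dim: int, mlp_ratio: float = 4.0, drop_path: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = Mlp(in_features=dim, hidden_features=int(dim * mlp_ratio))
        self.drop_path = DropPath(drop_path)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection with stochastic depth around the MLP block
        return x + self.drop_path(self.mlp(self.norm(x)))
```

The parent model would pass `dim`, `mlp_ratio`, and `drop_path` from its own configuration section, as described above.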
Example Architecture: mFormerV1
```
mFormerV1 (Registered Model)
├── ConvNeXtBlock stages (Building Blocks)
│   ├── Configuration in MODEL.CONVNEXT_STAGES
│   └── Parameters: depths, dims, drop_path_rate, etc.
├── RoPE2DMHSABlock stages (Building Blocks)
│   ├── Configuration in MODEL.ROPE_STAGES
│   └── Parameters: depths, num_heads, window_size, etc.
└── Classification heads (Registered Components)
    ├── Configuration in MODEL.CLASSIFICATION.HEADS
    └── One head per task, each with its own TYPE and parameters
```
Factory System (`model_factory.py`)
The factory pattern enables runtime component selection and instantiation:
- Registration: Maps string identifiers to component classes
- Purpose: Creates requested component instances based on configuration
- Primary registries:
  - `MODEL_REGISTRY`: Top-level model architectures
  - `HEAD_REGISTRY`: Classification output heads
  - `ATTENTION_REGISTRY`: Attention mechanisms (optional use)
  - `AGGREGATION_REGISTRY`: Feature aggregation methods (optional use)
  - `RESOLVER_REGISTRY`: Feature resolution strategies (optional use)
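These registries follow the standard registry pattern: a dictionary plus a decorator. The simplified sketch below shows the general mechanism; `build_model` is a hypothetical helper added here for illustration, and the real `model_factory.py` may differ in its details.

```python
# Simplified sketch of the registry pattern; not the actual model_factory.py.
MODEL_REGISTRY: dict[str, type] = {}


def register_model(cls: type) -> type:
    """Decorator: map the class name to the class so configs can refer to it by string."""
    MODEL_REGISTRY[cls.__name__] = cls
    return cls


def build_model(cfg):  # hypothetical helper, for illustration only
    """Look up MODEL.TYPE in the registry and instantiate the class from the config."""
    model_cls = MODEL_REGISTRY[cfg.MODEL.TYPE]
    return model_cls(cfg)
```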
Configuration System (YACS)
Configuration files (`*.yaml`) define model and experiment parameters:
```yaml
MODEL:
  TYPE: "mFormerV1"
  EMBED_DIM: 192
  META_DIMS: [16]
  CONVNEXT_STAGES:
    DEPTHS: [3, 3, 9, 3]
    DIMS: [192, 384, 768, 1536]
  ROPE_STAGES:
    DEPTHS: [3, 3, 3]
    NUM_HEADS: [6, 12, 24]
    WINDOW_SIZE: 7
  CLASSIFICATION:
    HEADS:
      task_taxonomy:
        TYPE: "LinearHead"
        NUM_CLASSES: 1000
```
Configuration handles:
- Component selection (`TYPE` fields)
- Hyperparameter setting (dimensions, depths, learning rates)
- Training dynamics (schedules, optimizers, augmentations)
- Environment setup (paths, devices, output locations)
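Because the configuration system is YACS, a config like the YAML example above is typically materialized by merging the file into a set of defaults. In the sketch below, the `get_cfg_defaults` helper and the file path are hypothetical; only the YACS calls (`merge_from_file`, `merge_from_list`, `freeze`) are standard.

```python
from yacs.config import CfgNode as CN


def get_cfg_defaults() -> CN:
    """Hypothetical defaults; the real defaults live in the Linnaeus config module."""
    cfg = CN(new_allowed=True)
    cfg.MODEL = CN(new_allowed=True)
    cfg.MODEL.TYPE = "mFormerV1"
    cfg.MODEL.EMBED_DIM = 192
    return cfg


cfg = get_cfg_defaults()
cfg.merge_from_file("configs/model/archs/mFormerV1/small.yaml")  # hypothetical path
cfg.merge_from_list(["MODEL.EMBED_DIM", 256])                    # CLI-style override
cfg.freeze()
```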
Model Extension Patterns
To create a new model architecture in Linnaeus:
1. Create the model class in the `linnaeus/models/` directory:

   ```python
   @register_model
   class NewArchitecture(BaseModel):
       """New architecture implementation."""

       def __init__(self, cfg):
           super().__init__(cfg)
           # Build internal structure using Building Blocks
           # Configure based on cfg parameters

       def forward(self, x, metadata=None):
           # Implement forward pass
           return output
   ```

2. Create configuration files in `configs/model/archs/NewArchitecture/`:

   ```yaml
   MODEL:
     TYPE: "NewArchitecture"
     # Architecture-specific parameters
   ```

3. Register with the factory by importing and applying the decorator used in step 1:

   ```python
   from linnaeus.models.model_factory import register_model
   ```
Core Available Models
mFormerV0
- Hybrid CNN-Transformer architecture
- Implements MetaFormer paradigm with RelativeAttention
- 3 model sizes: Small (15M), Medium (35M), Large (55M)
mFormerV1
- Enhanced hybrid architecture with improved attention
- Implements 2D RoPE (Rotary Position Embedding)
- Flash Attention compatible for faster training
- 4 model sizes: Small (18M), Medium (38M), Large (65M), XLarge (120M)
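mFormerV1's 2D RoPE can be understood as applying standard 1D rotary position embeddings separately along the row and column axes of the patch grid, rotating one half of each head's channels by the row index and the other half by the column index. The sketch below illustrates that general idea only; it is not the `RoPE2DMHSABlock` implementation, and the function names are hypothetical.

```python
import torch


def rope_1d(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Standard 1D RoPE: rotate consecutive channel pairs of x by angles pos * theta_i."""
    d = x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # (d/2,)
    angles = pos[:, None].float() * theta                              # (N, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def rope_2d(x: torch.Tensor, rows: torch.Tensor, cols: torch.Tensor) -> torch.Tensor:
    """Axial 2D RoPE: rotate the first half of channels by the row index, the second half by the column index.

    Assumes the channel dimension is divisible by 4; rows/cols give each token's grid coordinates.
    """
    half = x.shape[-1] // 2
    return torch.cat([rope_1d(x[..., :half], rows), rope_1d(x[..., half:], cols)], dim=-1)
```

In an attention block, such a rotation is applied to the query and key projections before the dot-product, which is what keeps the scheme compatible with fused kernels such as Flash Attention (the position encoding happens outside the attention kernel itself).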
Building Block Libraries
Linnaeus includes implementations of a variety of building blocks:
- Attention Mechanisms:
  - `RoPE2DMHSABlock`: 2D Rotary Position Embedding with Multi-Head Self Attention
  - `RelativeMHSA`: Relative position bias attention
  - `LinformerSelfAttention`: Linear attention for efficiency
  - Other specialized attention variants
- Convolution Blocks:
  - `ConvNeXtBlock`: Modern CNN block with depthwise convolutions
  - `MBConv`: Mobile inverted bottleneck convolution
- Common Components:
  - `Mlp`: Multi-layer perceptron with configurable activation
  - `DropPath`: Stochastic depth for regularization (see the sketch after this list)
  - `ProgressivePatchEmbed`: Progressive patch embedding
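As one concrete example from the list above, stochastic depth (`DropPath`) is generally implemented as a per-sample Bernoulli drop of the residual branch, rescaled so the expected activation is unchanged. This is a generic sketch, not the Linnaeus source:

```python
import torch
import torch.nn as nn


class DropPathSketch(nn.Module):
    """Stochastic depth: randomly drop the residual branch for whole samples during training."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dimensions
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = torch.bernoulli(torch.full(shape, keep_prob, device=x.device, dtype=x.dtype))
        return x * mask / keep_prob
```

During training, entire residual branches are skipped for a random subset of samples; at inference the module acts as an identity.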
Customization Guidelines
When extending Linnaeus:
- Create new Building Blocks for reusable architectural components
- Register new top-level Models for major architecture changes
- Register new Classification Heads for output transformations
- Use configuration for experiment-level parameter tuning
- Prefer code changes for deep architectural modifications