# Linnaeus Architecture Overview
This document explains the Linnaeus model architecture system, focusing on its component design, configuration approach, and extensibility patterns.
## Core Design: Component Registry and Building Blocks
Linnaeus uses a two-tiered approach for organizing neural network components:
### 1. Registered Components

These are the primary swappable units selected via configuration:

- Models: Complete architectures like `mFormerV1` and `mFormerV0`
- Classification Heads: Output layers mapping features to predictions (e.g., `LinearHead`, `ConditionalClassifierHead`)
- Registration: Registered with the factory system via decorators (`@register_model`, `@register_head`); see the sketch below
- Selection: Chosen via configuration parameters (`MODEL.TYPE`, `MODEL.CLASSIFICATION.HEADS.<task>.TYPE`)
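To make the register-then-select pattern concrete, here is a minimal sketch of defining and registering a custom head. The class name `MyTaxonHead`, its constructor arguments, and the import path are illustrative assumptions, not actual Linnaeus names:

```python
import torch.nn as nn

# Assumed import path for the decorator; check model_factory.py for the real one.
from linnaeus.models.model_factory import register_head

@register_head
class MyTaxonHead(nn.Module):  # hypothetical head, for illustration only
    """Maps backbone features to per-class logits."""

    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.fc(x)
```

Once registered, a config can select it per task by name, e.g. `TYPE: "MyTaxonHead"` under `MODEL.CLASSIFICATION.HEADS.<task>`.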
### 2. Building Blocks

Fundamental, reusable modules that form the internal structure of models:

- Purpose: Encapsulate specific architectural components (attention blocks, MLP blocks, etc.)
- Examples: `RoPE2DMHSABlock`, `ConvNeXtBlock`, `Mlp`, `DropPath`
- Instantiation: Imported and used directly within model implementations (see the sketch below)
- Configuration: Parameters set within the parent component's configuration section
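A compressed sketch of how a model might compose Building Blocks directly. The import path and the `ConvNeXtBlock` constructor arguments are assumptions for illustration, not the actual Linnaeus signatures:

```python
import torch.nn as nn

# Import path and constructor arguments are illustrative assumptions.
from linnaeus.models.blocks import ConvNeXtBlock

class TinyBackbone(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        depths = cfg.MODEL.CONVNEXT_STAGES.DEPTHS
        dims = cfg.MODEL.CONVNEXT_STAGES.DIMS
        # Building Blocks are instantiated directly and configured from the
        # parent model's config section; no registry lookup is involved.
        self.stage0 = nn.Sequential(
            *[ConvNeXtBlock(dim=dims[0]) for _ in range(depths[0])]
        )
```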
### Example Architecture: mFormerV1

```
mFormerV1 (Registered Model)
├── ConvNeXtBlock stages (Building Blocks)
│   ├── Configuration in MODEL.CONVNEXT_STAGES
│   └── Parameters: depths, dims, drop_path_rate, etc.
├── RoPE2DMHSABlock stages (Building Blocks)
│   ├── Configuration in MODEL.ROPE_STAGES
│   └── Parameters: depths, num_heads, window_size, etc.
└── Classification heads (Registered Components)
    ├── Configuration in MODEL.CLASSIFICATION.HEADS
    └── One head per task, each with its own TYPE and parameters
```
## Factory System (`model_factory.py`)
The factory pattern enables runtime component selection and instantiation:
- Registration: Maps string identifiers to component classes
- Purpose: Creates requested component instances based on configuration
- Primary registries (see the sketch below):
  - `MODEL_REGISTRY`: Top-level model architectures
  - `HEAD_REGISTRY`: Classification output heads
  - `ATTENTION_REGISTRY`: Attention mechanisms (optional use)
  - `AGGREGATION_REGISTRY`: Feature aggregation methods (optional use)
  - `RESOLVER_REGISTRY`: Feature resolution strategies (optional use)
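In essence, each registry is a string-to-class mapping populated by decorators. A minimal sketch of the pattern, assuming nothing beyond what the section describes (the actual `model_factory.py` may differ):

```python
# Minimal registry sketch; not the exact Linnaeus implementation.
MODEL_REGISTRY = {}

def register_model(cls):
    """Decorator: map the class name to the class object."""
    MODEL_REGISTRY[cls.__name__] = cls
    return cls

def build_model(cfg):
    """Instantiate whichever architecture the config names."""
    try:
        model_cls = MODEL_REGISTRY[cfg.MODEL.TYPE]
    except KeyError:
        raise ValueError(f"Unknown model type: {cfg.MODEL.TYPE}")
    return model_cls(cfg)
```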
## Configuration System (YACS)
Configuration files (`*.yaml`) define model and experiment parameters:

```yaml
MODEL:
  TYPE: "mFormerV1"
  EMBED_DIM: 192
  META_DIMS: [16]
  CONVNEXT_STAGES:
    DEPTHS: [3, 3, 9, 3]
    DIMS: [192, 384, 768, 1536]
  ROPE_STAGES:
    DEPTHS: [3, 3, 3]
    NUM_HEADS: [6, 12, 24]
    WINDOW_SIZE: 7
  CLASSIFICATION:
    HEADS:
      task_taxonomy:
        TYPE: "LinearHead"
        NUM_CLASSES: 1000
```
Configuration handles:
- Component selection (`TYPE` fields)
- Hyperparameter setting (dimensions, depths, learning rates)
- Training dynamics (schedules, optimizers, augmentations)
- Environment setup (paths, devices, output locations)
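A minimal sketch of how such a file is typically consumed with YACS. The `get_cfg_defaults()` helper and the file path are hypothetical; Linnaeus ships its own default config tree:

```python
from yacs.config import CfgNode as CN

# Hypothetical defaults; every key in the YAML must already exist in the
# defaults for YACS to merge it.
_C = CN()
_C.MODEL = CN()
_C.MODEL.TYPE = "mFormerV1"
_C.MODEL.EMBED_DIM = 192

def get_cfg_defaults() -> CN:
    return _C.clone()

cfg = get_cfg_defaults()
cfg.merge_from_file("experiment.yaml")  # illustrative path
cfg.freeze()                            # lock against accidental mutation
print(cfg.MODEL.TYPE)
```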
## Model Extension Patterns
To create a new model architecture in Linnaeus:
1. Create the model class in the `linnaeus/models/` directory:

   ```python
   from linnaeus.models.model_factory import register_model

   @register_model
   class NewArchitecture(BaseModel):
       """New architecture implementation."""

       def __init__(self, cfg):
           super().__init__(cfg)
           # Build internal structure using Building Blocks
           # Configure based on cfg parameters

       def forward(self, x, metadata=None):
           # Implement forward pass
           return output
   ```

2. Create configuration files in `configs/model/archs/NewArchitecture/`:

   ```yaml
   MODEL:
     TYPE: "NewArchitecture"
     # Architecture-specific parameters
   ```

3. Register with the factory:

   ```python
   from linnaeus.models.model_factory import register_model
   ```
## Core Available Models
### mFormerV0

- Hybrid CNN-Transformer architecture
- Implements the MetaFormer paradigm with `RelativeAttention`
- 3 model sizes: Small (15M), Medium (35M), Large (55M)
### mFormerV1

- Enhanced hybrid architecture with improved attention
- Implements 2D RoPE (Rotary Position Embedding); see the sketch below
- Flash Attention compatible for faster training
- 4 model sizes: Small (18M), Medium (38M), Large (65M), XLarge (120M)
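To give a feel for what 2D RoPE does, here is a generic sketch of the idea: rotate query/key channel pairs by angles proportional to token coordinates, using the row coordinate for one half of the channels and the column coordinate for the other. This is a textbook illustration of the technique, not the code inside `RoPE2DMHSABlock`:

```python
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (..., seq, dim) by angles pos * freq; dim must be even."""
    dim = x.shape[-1]
    freqs = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = pos.float()[:, None] * freqs[None, :]   # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin             # standard 2x2 rotation
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x: torch.Tensor, rows: torch.Tensor, cols: torch.Tensor) -> torch.Tensor:
    """2D variant: half the channels encode the row position, half the column.

    Assumes the channel count is divisible by 4.
    """
    half = x.shape[-1] // 2
    return torch.cat([rope_1d(x[..., :half], rows),
                      rope_1d(x[..., half:], cols)], dim=-1)
```

Applied to queries and keys before the attention dot product, rotations like these make attention scores depend only on relative (Δrow, Δcol) offsets between tokens.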
## Building Block Libraries
Linnaeus includes a variety of building-block implementations:
- Attention Mechanisms:
  - `RoPE2DMHSABlock`: 2D Rotary Position Embedding with Multi-Head Self-Attention
  - `RelativeMHSA`: Relative position bias attention
  - `LinformerSelfAttention`: Linear attention for efficiency
  - Other specialized attention variants
- Convolution Blocks:
  - `ConvNeXtBlock`: Modern CNN block with depthwise convolutions
  - `MBConv`: Mobile inverted bottleneck convolution
- Common Components:
  - `Mlp`: Multi-layer perceptron with configurable activation
  - `DropPath`: Stochastic depth for regularization (see the sketch below)
  - `ProgressivePatchEmbed`: Progressive patch embedding
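As a concrete example of one common component, here is a generic sketch of stochastic depth, the technique behind `DropPath` (not Linnaeus's exact implementation):

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drop an entire residual branch per sample."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over remaining dims.
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = (torch.rand(shape, device=x.device) < keep_prob).to(x.dtype)
        return x * mask / keep_prob  # rescale to preserve the expectation
```

Inside a block it typically wraps the residual branch, e.g. `x = x + drop_path(branch(x))`, so entire branches are skipped at random during training.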
## Customization Guidelines

When extending Linnaeus:

- Create new Building Blocks for reusable architectural components
- Register new top-level Models for major architecture changes
- Register new Classification Heads for output transformations
- Use configuration for experiment-level parameter tuning
- Prefer code changes for deep architectural modifications