[ ] find_unused_parameters must be True for hierarchical classifier heads (conditional, hierarchical softmax) when using GradNorm
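
One way to keep this flag from being forgotten is to derive it from the training config instead of setting it by hand. The sketch below assumes the item above refers to the find_unused_parameters argument of torch.nn.parallel.DistributedDataParallel. The helper name ddp_kwargs and the head-type strings are hypothetical and not taken from the codebase.

```python
# Head types that evaluate only part of the head per batch.
# These string names are hypothetical placeholders.
HIERARCHICAL_HEADS = {"conditional", "hierarchical_softmax"}


def ddp_kwargs(head_type: str, use_gradnorm: bool) -> dict:
    """Build keyword arguments intended for DistributedDataParallel.

    The flag is enabled only for the combination named in the note:
    a hierarchical classifier head together with GradNorm.
    """
    needs_flag = use_gradnorm and head_type in HIERARCHICAL_HEADS
    return {"find_unused_parameters": needs_flag}


# Intended use (requires torch and an initialized process group):
#   from torch.nn.parallel import DistributedDataParallel
#   model = DistributedDataParallel(model, **ddp_kwargs(head_type, use_gradnorm))
```

Deriving the value in one place means a new head type only has to be added to the set, and every training entry point picks up the correct setting.
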
Inference Testing Issues (June 2025):
[ ] Model registry mismatch: models registered as "mFormerV1" but inference configs expect "mFormerV1_sm" (size variants should be config-based, not separate registrations)
[ ] TaskPrediction schema changed: it now requires a temperature field (> 0), but the tests weren't updated
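
A minimal sketch of what both fixes could look like, under stated assumptions: the registry keeps a single "mFormerV1" entry and the size variant is resolved from the config, and TaskPrediction rejects a non-positive temperature at construction time. The names SIZE_PRESETS, register_model, build_model, the hidden_dim values, and the label field are hypothetical. Only "mFormerV1", the "sm" variant, TaskPrediction, and the temperature > 0 rule come from the notes above.

```python
from dataclasses import dataclass

# Hypothetical size presets; the real variants and their fields are not listed in the notes.
SIZE_PRESETS = {"sm": {"hidden_dim": 256}, "base": {"hidden_dim": 512}}

MODEL_REGISTRY = {}


def register_model(name):
    """Register a model class under a single name, with no size suffix."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("mFormerV1")
class MFormerV1:
    def __init__(self, hidden_dim):
        self.hidden_dim = hidden_dim


def build_model(config):
    """Look up the registered class, then apply the size variant from the config."""
    cls = MODEL_REGISTRY[config["model"]]
    preset = SIZE_PRESETS[config.get("size", "base")]
    return cls(**preset)


@dataclass
class TaskPrediction:
    label: str          # hypothetical field, stands in for the real payload
    temperature: float  # required and must be strictly positive

    def __post_init__(self):
        # "not > 0" also rejects NaN, which compares False against everything.
        if not self.temperature > 0:
            raise ValueError(f"temperature must be > 0, got {self.temperature}")
```

With this layout an inference config would carry model: "mFormerV1" and size: "sm" instead of the name "mFormerV1_sm", and test fixtures that build a TaskPrediction would need to pass a positive temperature.
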