Skip to content

Typus Models

This document describes the Pydantic models used in Typus for representing various data structures.

Classification Models

These DTOs are part of Typus' public contract. They are exported from typus/__init__.py, included in the generated schema surface, and consumed by downstream inference, UI, and reporting surfaces rather than re-defined in each consumer.

ClassificationResult

ClassificationResult is the canonical POL-980 v1.2.1 classification contract. It separates source identity, per-rank belief, score semantics, calibration provenance, decision policies, and policy-applied outcomes so model inference, curated taxon cards, synthetic demo caches, replayed caches, and test fixtures can share one envelope without pretending they mean the same thing.

Top-level fields:

  • schema_version: Literal["classification-result.v1"]
  • taxonomy_context: TaxonomyContext
  • provenance: ClassificationProvenance
  • input_context: ClassificationInputContext
  • consistency: ClassificationConsistency
  • ranks: list[RankBelief]
  • outcomes: list[DecisionOutcome] | None

Each RankBelief contains a discriminated candidates union:

  • TaxonCandidate for biological taxon scores.
  • RankNullCandidate for rank-level null or abstention mass.
  • ResidualBelowTaxonCandidate for "known to parent taxon, unresolved below" mass.

Every candidate carries score plus score_semantics. Probability-bearing scores are explicit (rank_softmax_probability, temperature_scaled_rank_probability, calibrated_rank_probability); authored cards, synthetic demo mass, conformal set membership, and display weights use different semantics so consumers do not silently treat every number as a calibrated probability.

outcomes are present exactly when provenance.decision_policies is non-empty. Validators enforce rank ordering, policy-id resolution, and CandidateRef resolution into the corresponding rank's candidates. Curated taxon cards must use authored_assertion_weight for every candidate.

Helper functions in typus.helpers.classification derive lineage/tree views and apply reference decision or calibration projections such as argmax, Chow thresholds, hierarchy repair, expected-utility cost-sensitive policy, and temperature scaling.

Cost-sensitive expected-utility policy

expected_utility_policy(result, costs) applies the POL-1304 hierarchy-aware Bayes-risk decision rule to probability-bearing RankBelief candidates. It appends DecisionPolicy(kind="cost_sensitive_policy"), records the selected cost profile in parameters["cost_matrix"], and emits terminal outcomes whose decision_score_semantics is policy_confidence.

The v0 helper is intentionally post-hoc: no training changes, no new runtime dependencies, and no live taxonomy service requirement. It greedily walks coarse-to-fine ranks, commits when expected utility exceeds abstention, and requires child-rank commits to remain descendants of the previously committed parent. The cost matrix separates three design levers:

  • specificity reward, with named shapes (linear, sqrt, log);
  • overclaim cost, scaled by committed rank depth;
  • wrong-branch cost, based on LCA rank-depth gap rather than raw graph edge count.

Typus ships three explicit v0 profiles:

  • v0_conservative: log specificity reward, higher overclaim and wrong-branch penalties.
  • v0_balanced: sqrt specificity reward, 5x overclaim default, and rank-depth LCA separation. This is the default.
  • v0_aggressive: linear specificity reward with lower penalties for smoke comparison and downstream tuning.

These defaults are calibration scaffolding, not a final ecological cost model. They are motivated by the current Insecta pilot distribution, where collapsed model-supported expert depths are mostly genus-safe, and should be tuned again against larger expert-calibrated data.

TaxonomyContext

Identifies the taxonomy source used to produce or interpret classification results.

  • source: str = "CoL2024": taxonomy source label.
  • version: str | None = None: optional source version or snapshot identifier.
  • root_taxon_ids: list[int] = []: optional root taxon IDs for this taxonomy slice.
  • null_taxon_ids_by_rank: dict[int, int] = {}: rank-specific synthetic null taxon IDs used for compatibility with older producers.

Deprecated legacy classification models

TaskPrediction and HierarchicalClassificationResult remain importable for a one-release migration window. They emit DeprecationWarning when constructed. HierarchicalClassificationResult.to_classification_result() converts the old per-rank prediction shape into a canonical raw ClassificationResult with source_kind = "model_inference", no calibration, no decision policies, and no outcomes.

TaskPrediction

Represents the deprecated top predictions for one taxonomic rank level.

  • rank_level: RankLevel: the rank this task predicts, such as species, genus, family, or another Typus rank level.
  • temperature: float: positive calibration temperature used for the task.
  • predictions: list[tuple[int, float]]: ordered (taxon_id, probability) pairs. Validation rejects values whose probabilities sum to more than 1.0, allowing a small 1e-6 tolerance for floating-point rounding.

HierarchicalClassificationResult

Bundles the taxonomy context with one or more rank-level prediction tasks.

  • taxonomy_context: TaxonomyContext: source context for the result.
  • tasks: list[TaskPrediction]: per-rank prediction outputs.
  • subtree_roots: set[int] | None = None: optional taxon IDs that constrained the candidate subtree for the classification run.

Example:

from typus import (
    HierarchicalClassificationResult,
    RankLevel,
    TaskPrediction,
    TaxonomyContext,
)

result = HierarchicalClassificationResult(
    taxonomy_context=TaxonomyContext(source="CoL2024", version="2024-12"),
    tasks=[
        TaskPrediction(
            rank_level=RankLevel.L10,
            temperature=1.0,
            predictions=[(123, 0.72), (456, 0.18)],
        )
    ],
    subtree_roots={789},
)

json_payload = result.to_json(indent=2)
canonical = result.to_classification_result()

Taxonomy summaries (v0.4.2+)

TaxonTrailNode

  • taxon_id: int
  • rank_level: RankLevel
  • scientific_name: str
  • vernacular_name: str | None

TaxonSummary

  • taxon_id, scientific_name, vernacular_name, rank_level
  • trail: list[TaxonTrailNode] ordered root → focal taxon
  • format_trail(separator=" → ", include_vernacular=True) convenience formatter for UI strings

PollinatorGroup (Enum)

Coarse groupings for high-level UI labels: Bee, Butterfly/Moth, Fly, Wasp, Beetle, Bird, Bat, Other.

Use pollinator_groups_for_ancestry(ancestry_ids) or the service helper pollinator_groups_for_taxon(taxon_id) to map a taxon's lineage to these buckets. The mapping is intentionally opinionated and not a substitute for detailed taxonomy.

Geometry Models

Canonical Geometry (v0.3.0+)

For new code, always use the canonical BBoxXYWHNorm type. See the Canonical Geometry documentation for full details.

The canonical bounding box format with enforced invariants:

  • Format: [x, y, width, height] (top-left origin)
  • Normalization: All values in [0, 1] range
  • Immutable: Cannot be modified after creation
  • Validated: Enforces coordinate bounds and non-finite rejection

Example:

from typus import BBoxXYWHNorm

# Canonical bbox covering 20%-70% horizontally, 10%-60% vertically
bbox = BBoxXYWHNorm(x=0.2, y=0.1, w=0.5, h=0.5)

# Convert to/from pixel coordinates
from typus import to_xyxy_px, from_xyxy_px

# To pixels (for 1920x1080 image)
x1, y1, x2, y2 = to_xyxy_px(bbox, W=1920, H=1080)
# Result: (384, 108, 1344, 648)

# From pixels
bbox_from_pixels = from_xyxy_px(384, 108, 1344, 648, W=1920, H=1080)

Legacy Geometry Types

The following types are maintained for backward compatibility but should not be used in new code.

BBoxFormat (Enum) - Legacy

Defines the format of a legacy bounding box's coordinates.

  • XYXY_REL: Relative coordinates representing [x_min, y_min, x_max, y_max], where values are fractions of image width/height.
  • XYXY_ABS: Absolute pixel coordinates representing [x_min, y_min, x_max, y_max].
  • CXCYWH_REL: Relative coordinates representing [center_x, center_y, width, height], where values are fractions of image width/height.
  • CXCYWH_ABS: Absolute pixel coordinates representing [center_x, center_y, width, height].

MaskEncoding (Enum)

Defines the encoding method for an instance mask.

  • RLE_COCO: Run-Length Encoding in COCO format. The data field in EncodedMask will be a string (or COCO RLE dict if pre-formatted).
  • POLYGON: Polygon representation as a list of [x,y] points. The data field in EncodedMask will be a List[List[float]].
  • PNG_BASE64: Base64 encoded PNG image representing the mask. The data field in EncodedMask will be a string.

BBox - Legacy

Legacy bounding box with multiple format support. Use BBoxXYWHNorm for new code.

  • coords: Tuple[float, float, float, float]: The coordinates of the bounding box.
  • fmt: BBoxFormat = BBoxFormat.XYXY_REL: The format of the coordinates. Defaults to relative XYXY.

EncodedMask

Represents an encoded instance mask.

  • data: str | List[List[float]]: The mask data, format depends on encoding.
  • encoding: MaskEncoding: The encoding method used for the mask.
  • bbox_hint: BBox | None = None: An optional bounding box associated with the mask, which can be useful as a hint for processing.

Example:

from typus.models.geometry import EncodedMask, MaskEncoding, BBox

# Polygon mask (list of [x,y] points)
polygon_mask = EncodedMask(
    data=[[10.0, 10.0], [50.0, 10.0], [50.0, 50.0], [10.0, 50.0]],
    encoding=MaskEncoding.POLYGON
)

# RLE mask (data is a string, details depend on COCO RLE specifics)
rle_mask = EncodedMask(
    data="someRLEString...",
    encoding=MaskEncoding.RLE_COCO,
    bbox_hint=BBox(coords=(10,10,50,50), fmt=BBoxFormat.XYXY_ABS)
)

Detection Models

These models are used to represent the output of object detection and instance segmentation systems.

InstancePrediction

Represents a single detected instance within an image.

  • instance_id: int: A unique identifier for the instance within the image (non-negative).
  • bbox: BBox: The bounding box of the detected instance.
  • mask: EncodedMask | None = None: The instance mask (optional).
  • score: float: The confidence score of the detection (between 0 and 1).
  • taxon_id: int | None = None: Optional taxonomic identifier for the instance.
  • classification: HierarchicalClassificationResult | None = None: Optional legacy hierarchical classification result for the instance. New classification producers should emit ClassificationResult directly while this field remains backward-compatible for older detection payloads.

Example:

from typus.models.geometry import BBox, BBoxFormat
from typus.models.detection import InstancePrediction

instance = InstancePrediction(
    instance_id=1,
    bbox=BBox(coords=(0.2, 0.3, 0.6, 0.7), fmt=BBoxFormat.XYXY_REL),
    score=0.92,
    taxon_id=1001
)

ImageDetectionResult

Represents the complete set of detections for a single image.

  • width: int: The width of the image in pixels.
  • height: int: The height of the image in pixels.
  • instances: List[InstancePrediction]: A list of all detected instances in the image.
  • taxonomy_context: TaxonomyContext | None = None: Optional context about the taxonomy used for classifications.

Example:

from typus.models.detection import ImageDetectionResult, InstancePrediction
from typus.models.geometry import BBox

img_result = ImageDetectionResult(
    width=1920,
    height=1080,
    instances=[
        InstancePrediction(
            instance_id=1,
            bbox=BBox(coords=(0.1, 0.1, 0.3, 0.3)),
            score=0.95,
            taxon_id=101
        ),
        InstancePrediction(
            instance_id=2,
            bbox=BBox(coords=(0.4, 0.4, 0.6, 0.6)),
            score=0.88,
            taxon_id=102
        )
    ]
)

# Serializing to JSON (uses CompactJsonMixin for camelCase and no Nones)
json_output = img_result.to_json(indent=2)
print(json_output)

Helper Utilities (typus.models.detection.utils)

to_coco()

Converts an ImageDetectionResult object into a COCO-style dictionary (primarily the "annotations" part).

  • image: ImageDetectionResult: The detection result to convert.
  • category_map: dict[int, int]: A mapping from Typus taxon_id to COCO category_id.

Example:

from typus.models.detection import ImageDetectionResult, InstancePrediction
from typus.models.geometry import BBox
from typus.models.detection.utils import to_coco

# (Assuming ImageDetectionResult 'img_result' is defined as above)
category_map = {101: 1, 102: 2} # typus taxon_id -> coco category_id
coco_annotations = to_coco(img_result, category_map)
# coco_annotations will be a dict like:
# {
#   "annotations": [
#     { "image_id": 0, "category_id": 1, "bbox": [...], "score": 0.95, "id": 1 },
#     { "image_id": 0, "category_id": 2, "bbox": [...], "score": 0.88, "id": 2 }
#   ]
# }

from_coco()

Converts a COCO-style JSON dictionary into a list of ImageDetectionResult objects.

  • coco: dict: The COCO JSON data (can contain information for multiple images).

Example:

from typus.models.detection.utils import from_coco

coco_json_data = {
    "images": [
        {"id": 1, "width": 1920, "height": 1080}
    ],
    "annotations": [
        {"image_id": 1, "id": 1, "category_id": 1, "bbox": [192, 108, 384, 216], "score": 0.95} # xywh_abs
    ],
    "categories": [{"id": 1, "name": "object"}]
}

typus_results = from_coco(coco_json_data)
if typus_results:
    img_result_from_coco = typus_results[0]
    print(f"Image dimensions: {img_result_from_coco.width}x{img_result_from_coco.height}")
    for instance in img_result_from_coco.instances:
        print(f"Instance {instance.instance_id}, BBox: {instance.bbox.coords}, Score: {instance.score}")

ExpandedTaxa ORM columns

Column Description
taxonID primary key
rankLevel numeric rank value
rank canonical rank name
name scientific name
taxonActive boolean active flag
commonName english common name
immediateAncestor_taxonID direct parent taxon ID
immediateAncestor_rankLevel rank level of the parent
immediateMajorAncestor_taxonID nearest major ancestor ID
immediateMajorAncestor_rankLevel rank level of that ancestor

Legacy trueParentID, majorParentID, path and ancestry columns were removed in v0.1.9.