expanded_taxa Table Documentation
Overview
The expanded_taxa table is a denormalized view that transforms iNaturalist's compact taxonomy representation into an expanded format optimized for efficient querying, filtering, and analysis. It "unpacks" the ancestry string from the original taxa table into discrete columns for each taxonomic rank level, enabling fast clade-based filtering and ancestor lookups without recursive string parsing.
Purpose and Design Philosophy
The table serves as the foundation for taxonomic data processing and export operations, providing: - Efficient clade filtering through indexed rank-level columns - Direct parent access via immediate ancestor columns - Fast taxonomic traversal without recursive queries - Common name integration from Catalog of Life Data Package (ColDP)
Core Table Schema
Primary Identification Columns
| Column | Type | Description |
|---|---|---|
| taxonID | integer | Primary key; unique taxon identifier from iNaturalist |
| rankLevel | double precision | Numeric indicator of the taxonomic rank (lower = higher in hierarchy) |
| rank | varchar(255) | Human-readable taxonomic rank label (e.g., "species", "genus") |
| name | varchar(255) | Scientific name of the taxon |
| taxonActive | boolean | Indicates whether the taxon is currently active in iNaturalist's taxonomy |
| commonName | varchar(255) | Primary common name (populated from ColDP integration) |
Immediate Ancestor Columns
These columns provide direct access to parent taxa in the taxonomic hierarchy, eliminating the need for recursive queries:
| Column | Type | Description |
|---|---|---|
| immediateMajorAncestor_taxonID | integer | Taxon ID of the immediate major ancestor (next major rank up) |
| immediateMajorAncestor_rankLevel | double precision | Rank level of the immediate major ancestor |
| immediateAncestor_taxonID | integer | Direct parent taxon ID in the taxonomy |
| immediateAncestor_rankLevel | double precision | Rank level of the direct parent |
Expanded Taxonomic Hierarchy Columns
For each standard taxonomic rank level, three columns are provided:
- L{level}_taxonID - Taxon ID at that rank level
- L{level}_name - Scientific name at that rank level
- L{level}_commonName - Common name at that rank level (from ColDP)
Complete Rank Level Mapping
| Level | Standard Rank | Column Prefix | Major Rank |
|---|---|---|---|
| 5 | subspecies | L5_ | No |
| 10 | species | L10_ | Yes |
| 11 | species group | L11_ | No |
| 12 | species subgroup | L12_ | No |
| 13 | species complex | L13_ | No |
| 15 | hybrid | L15_ | No |
| 20 | genus | L20_ | Yes |
| 24 | subgenus | L24_ | No |
| 25 | section | L25_ | No |
| 26 | subsection | L26_ | No |
| 27 | series | L27_ | No |
| 30 | tribe | L30_ | No |
| 32 | subtribe | L32_ | No |
| 33 | supertribe | L33_ | No |
| 33.5 | subfamily | L33_5_ | No |
| 34 | family | L34_ | Yes |
| 34.5 | epifamily | L34_5_ | No |
| 35 | superfamily | L35_ | No |
| 37 | infraorder | L37_ | No |
| 40 | order | L40_ | Yes |
| 43 | suborder | L43_ | No |
| 44 | infraorder | L44_ | No |
| 45 | superorder | L45_ | No |
| 47 | infraclass | L47_ | No |
| 50 | class | L50_ | Yes |
| 53 | subclass | L53_ | No |
| 57 | superclass | L57_ | No |
| 60 | subphylum | L60_ | No |
| 67 | phylum | L67_ | Yes |
| 70 | kingdom | L70_ | Yes |
Note: Levels 33.5 and 34.5 use underscore notation (L33_5_, L34_5_) in column names.
Indexing Strategy
The table employs strategic indexing for optimal query performance:
Primary and Core Indexes
- Primary key on
taxonID - B-tree index on
namefor scientific name lookups - B-tree index on
rankLevelfor rank-based filtering
Taxonomic Level Indexes (for clade filtering)
idx_expanded_taxa_l10_taxonid- Species-level filteringidx_expanded_taxa_l20_taxonid- Genus-level filteringidx_expanded_taxa_l30_taxonid- Tribe-level filteringidx_expanded_taxa_l40_taxonid- Order-level filteringidx_expanded_taxa_l50_taxonid- Class-level filteringidx_expanded_taxa_l60_taxonid- Subphylum-level filteringidx_expanded_taxa_l70_taxonid- Kingdom-level filtering
Immediate Ancestor Indexes
idx_immediate_ancestor_taxon_id- For parent lookupsidx_immediate_major_ancestor_taxon_id- For major ancestor traversal
Common Use Cases
Species-Level Filtering
-- Get all records for a specific species
SELECT * FROM expanded_taxa
WHERE L10_taxonID = 47219; -- Apis mellifera
Clade-Based Filtering
-- Get all bees (Anthophila epifamily)
SELECT * FROM expanded_taxa
WHERE L34_5_taxonID = 630955;
-- Get all birds (Aves class)
SELECT * FROM expanded_taxa
WHERE L50_taxonID = 3;
Ancestor Traversal
-- Find immediate parent
SELECT parent.*
FROM expanded_taxa child
JOIN expanded_taxa parent ON child.immediateAncestor_taxonID = parent.taxonID
WHERE child.taxonID = 47219;
Taxonomic Hierarchy Display
-- Show full taxonomic hierarchy for a species
SELECT
name as species_name,
L20_name as genus,
L34_name as family,
L40_name as order,
L50_name as class,
L67_name as phylum,
L70_name as kingdom
FROM expanded_taxa
WHERE taxonID = 47219;
Finding Taxa with Common Names
-- Find all taxa with common names at species level
SELECT taxonID, name, commonName
FROM expanded_taxa
WHERE rankLevel = 10
AND commonName IS NOT NULL;
Performance Considerations
- Use Indexed Columns: Always filter on indexed
LXX_taxonIDcolumns for clade queries - Avoid String Operations: Use numeric
taxonIDcomparisons rather than name matching when possible - Leverage Immediate Ancestors: Use the immediate ancestor columns for parent lookups instead of joins
- Batch Operations: For large updates, use appropriate transaction boundaries
Important Notes on Ancestry Column
The ancestry column has been deprecated in production databases in favor of:
- Immediate ancestor columns for direct parent relationships
- Expanded LXX_taxonID columns for efficient hierarchical queries
Modern implementations should NOT rely on the ancestry column. The typus library handles both legacy (with ancestry) and modern (without ancestry) database schemas transparently.
Integration with Typus Library
The typus library provides async taxonomy services that work with this table structure:
from typus import PostgresTaxonomyService
# Connect to a database with expanded_taxa
service = PostgresTaxonomyService("postgresql+asyncpg://user:pass@host/db")
# Get a taxon - works with or without ancestry column
taxon = await service.get_taxon(47219)
# Traverse hierarchy using parent relationships
async for child in service.children(630955, depth=2):
print(f"{child.scientific_name} (rank: {child.rank_level})")
Data Sources and Updates
- Primary Source: iNaturalist taxonomy exports
- Common Names: Catalog of Life Data Package (ColDP) via mapping tables
- Update Frequency: Synchronized with iNaturalist data releases
- Generation Process: Created via expand_taxa.sh script during ingestion pipeline
Version History
- v0r1: Added immediate ancestor columns, complete ColDP integration
- v0r0: Basic expanded taxonomy without common names or immediate ancestors
For implementation details and operational procedures, see the ingestion and ColDP integration documentation.