
Serving with LitServe

LitServe provides a lightweight way to expose the LinnaeusInferenceHandler (loaded from a local inference bundle or from the Hugging Face Hub by model identifier) as a REST API. After installing litserve (pip install litserve), create a small server script that loads the handler, wraps its predict method in a LitAPI class, and starts a LitServer.

# server.py
import litserve as ls
from linnaeus.inference.handler import LinnaeusInferenceHandler

# Option 1: Load from a local inference bundle
# handler = LinnaeusInferenceHandler.load_from_artifacts(
#     config_file_path="path/to/your_model_bundle/inference_config.yaml"
# )

# Option 2: Load from Hugging Face Hub (conceptual - adapt to actual API)
# This assumes the handler can be loaded directly or you have a helper function.
# See 'running_inference_with_pretrained_models.md' for more detailed loading.
# For example, if LinnaeusInferenceHandler has a method like 'load_from_hf_hub':
# handler = LinnaeusInferenceHandler.load_from_hf_hub(
#     hf_model_id="polli-caleb/linnaeus-aves-mformerV1_sm-v1"
# )
# For this example, we'll assume the handler is already loaded, e.g., from a local bundle:
handler = LinnaeusInferenceHandler.load_from_artifacts(
     config_file_path="path/to/your_model_bundle/inference_config.yaml" # Replace this
)
# Ensure 'handler' is correctly initialized using one of the methods above
# before starting the server defined below.

if handler is None:
    raise ValueError("LinnaeusInferenceHandler could not be loaded. Please check configuration.")

# LitServe expects the model to be wrapped in a LitAPI subclass that defines
# how requests are decoded, passed through the model, and encoded back.
class LinnaeusLitAPI(ls.LitAPI):
    def setup(self, device):
        # Reuse the handler loaded above; LitServe calls setup() once per worker.
        self.handler = handler

    def decode_request(self, request):
        # Forward the JSON payload to the handler unchanged; adapt this if the
        # handler expects a different input structure.
        return request

    def predict(self, inputs):
        return self.handler.predict(inputs)

    def encode_response(self, output):
        return output


if __name__ == "__main__":
    server = ls.LitServer(LinnaeusLitAPI(), accelerator="auto")
    # Expose the handler metadata at /info via the underlying FastAPI app
    # (LitServer exposes it as server.app).
    server.app.get("/info")(lambda: handler.info())
    server.run(port=8000)

With accelerator="auto" (the default), LitServe places the model on the best available device (CPU, CUDA, or MPS); concurrent requests can additionally be batched by passing max_batch_size (and optionally batch_timeout) to LitServer. The /info route returns the metadata produced by handler.info() so clients can discover the model's expected inputs and taxonomy details.
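
If the handler benefits from batched inference, the LitAPI can opt in explicitly. The sketch below is illustrative and builds on the LinnaeusLitAPI class from server.py: batch and unbatch simply keep the decoded requests as plain lists, and the per-item loop in predict is a placeholder assumption to be replaced with a real batched call if LinnaeusInferenceHandler supports one.

# Additions to server.py: enable request batching (illustrative sketch).
import litserve as ls


class BatchedLinnaeusLitAPI(LinnaeusLitAPI):
    def batch(self, inputs):
        # Collate decoded requests into a plain list.
        return list(inputs)

    def predict(self, batched_inputs):
        # Placeholder: one handler call per request; swap in a true batched
        # call if the handler exposes one.
        return [self.handler.predict(x) for x in batched_inputs]

    def unbatch(self, outputs):
        # Split the batched output back into per-request responses.
        return list(outputs)


if __name__ == "__main__":
    server = ls.LitServer(
        BatchedLinnaeusLitAPI(),
        accelerator="auto",
        max_batch_size=8,    # collect up to 8 concurrent requests per batch
        batch_timeout=0.05,  # wait at most 50 ms to fill a batch
    )
    server.run(port=8000)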
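
On the client side, both routes are plain HTTP. The snippet below uses the requests library; the "image" field and base64 encoding are illustrative assumptions rather than the handler's documented schema, so check the /info response for the actual expected inputs.

# client.py - minimal sketch; the payload shape is a hypothetical example.
import base64

import requests

BASE_URL = "http://localhost:8000"

# Discover the model's expected inputs and taxonomy metadata.
info = requests.get(f"{BASE_URL}/info").json()
print(info)

# Send one image for prediction (hypothetical "image" key, base64-encoded).
with open("example.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

response = requests.post(f"{BASE_URL}/predict", json=payload)
response.raise_for_status()
print(response.json())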