Inference & Deployment

Inference in v0.5.1 is strict SCM inference. There are no stochastic training thresholds, no gradient policies, and no hidden guard branches. A deployment receives numeric payloads plus masks and must treat those masks as the contract.

Stable Output Contract

Strict inference returns:

decoded, bottom_mask, gap_mask = result
Field        Meaning                                                                 Deployment use
decoded      Finite decoded payload when accepted; commonly NaN for bottom samples   Feed accepted samples to downstream code
bottom_mask  Authoritative fail-closed mask                                          Reject, abstain, or route to fallback
gap_mask     Finite samples where tau_infer <= |Q| < tau_train                       Warning only; monitor, or reject in conservative systems

bottom_mask and gap_mask should be disjoint. Do not treat gap_mask as bottom: a gap sample still carries a finite prediction, and the mask only warns that its denominator lies between the inference and training thresholds.
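A consumer can check the disjointness expectation before any routing logic runs. The sketch below works on plain Python lists of booleans; check_masks_disjoint is a hypothetical helper written for illustration, not part of zeroproofml.

```python
def check_masks_disjoint(bottom_mask, gap_mask):
    """Return the indices where both masks are set (hypothetical helper).

    An empty list means the two masks respect the disjointness contract.
    """
    if len(bottom_mask) != len(gap_mask):
        raise ValueError("masks must have the same length")
    return [i for i, (b, g) in enumerate(zip(bottom_mask, gap_mask)) if b and g]


# Illustrative masks for a batch of four samples.
bottom_mask = [True, False, False, False]
gap_mask = [False, True, False, False]

overlap = check_masks_disjoint(bottom_mask, gap_mask)
if overlap:
    raise RuntimeError(f"bottom_mask and gap_mask overlap at indices {overlap}")
```

Failing loudly on overlap is a design choice for this sketch; a production system might prefer to log the overlap and treat the affected samples as bottom.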

Strict Decode

from zeroproofml.inference import InferenceConfig, strict_inference

# P and Q are the model's projective outputs; Q is the denominator the thresholds apply to.
cfg = InferenceConfig(tau_infer=1e-6, tau_train=1e-4)
decoded, bottom_mask, gap_mask = strict_inference(P, Q, config=cfg)

Runtime rule:

|Q| < tau_infer -> bottom_mask=True
tau_infer <= |Q| < tau_train -> gap_mask=True

If tau_train is unknown or irrelevant to the deployment, omit it and monitor only the bottom rate.
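The runtime rule can be written out in plain Python to make the three regions explicit. This is an illustration only, not the library's implementation: classify_denominator and strict_decode_sketch are hypothetical names, and decoding an accepted sample as P / Q is an assumption about the projective representation.

```python
import math


def classify_denominator(q, tau_infer, tau_train=None):
    """Return (is_bottom, is_gap) for a single denominator value."""
    if tau_infer <= 0:
        raise ValueError("tau_infer must be positive")
    q_abs = abs(q)
    is_bottom = q_abs < tau_infer
    # Without tau_train there is no gap region to report.
    is_gap = tau_train is not None and tau_infer <= q_abs < tau_train
    return is_bottom, is_gap


def strict_decode_sketch(p_values, q_values, tau_infer, tau_train=None):
    """Apply the threshold rule sample by sample (hypothetical helper)."""
    decoded, bottom_mask, gap_mask = [], [], []
    for p, q in zip(p_values, q_values):
        is_bottom, is_gap = classify_denominator(q, tau_infer, tau_train)
        # Assumption: accepted samples decode as P / Q; bottom samples become NaN.
        decoded.append(math.nan if is_bottom else p / q)
        bottom_mask.append(is_bottom)
        gap_mask.append(is_gap)
    return decoded, bottom_mask, gap_mask


decoded, bottom_mask, gap_mask = strict_decode_sketch(
    p_values=[1.0, 2.0, 3.0],
    q_values=[1e-9, 5e-5, 0.5],
    tau_infer=1e-6,
    tau_train=1e-4,
)
print(bottom_mask)  # [True, False, False]
print(gap_mask)     # [False, True, False]
```

Because tau_infer is required to be positive, every sample that reaches the division has a nonzero denominator, so the sketch never divides by zero.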

Choosing tau_infer

Pick tau_infer from held-out denominator data, not from a training default.

from zeroproofml.metrics import tau_infer_sweep_from_q_abs, write_tau_infer_sweep

curves = tau_infer_sweep_from_q_abs(
    q_abs=q_abs_held_out,
    is_in_range=is_in_range,
    taus=[1e-6, 3e-6, 1e-5, 3e-5, 1e-4],
)
write_tau_infer_sweep(
    "results/tau_calibration",
    curves,
    provenance={"split": "held_out"},
)

Selection trade-off:

  • Larger tau_infer: fewer unsafe finite accepts near singularities, more rejection.
  • Smaller tau_infer: higher coverage, more risk near the denominator boundary.

Freeze the chosen value in InferenceConfig and ship it inside bundle metadata.
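To make the trade-off concrete, the sketch below computes, for each candidate threshold, the fraction of held-out samples that would be rejected as bottom and the remaining coverage. It does not reproduce tau_infer_sweep_from_q_abs; sweep_rejection_rates and the sample values are hypothetical.

```python
def sweep_rejection_rates(q_abs, taus):
    """Map each candidate tau to (bottom_rate, coverage) on held-out |Q| values."""
    if not q_abs:
        raise ValueError("q_abs must not be empty")
    n = len(q_abs)
    rates = {}
    for tau in sorted(taus):
        # A sample is rejected when its denominator magnitude falls below tau.
        n_bottom = sum(1 for value in q_abs if value < tau)
        bottom_rate = n_bottom / n
        rates[tau] = (bottom_rate, 1.0 - bottom_rate)
    return rates


# Illustrative held-out denominator magnitudes, not real calibration data.
q_abs_held_out = [2e-7, 5e-6, 2e-5, 8e-5, 3e-3, 0.1, 0.4, 0.9]

rates = sweep_rejection_rates(q_abs_held_out, taus=[1e-6, 1e-5, 1e-4])
for tau, (bottom_rate, coverage) in rates.items():
    print(f"tau_infer={tau:g} bottom_rate={bottom_rate:.3f} coverage={coverage:.3f}")
```

The bottom rate can only grow as the threshold grows, which is why the choice reduces to how much coverage a deployment is willing to give up for safety near the singularity.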

Wrapping A Model

from zeroproofml.inference import InferenceConfig, SCMInferenceWrapper

wrapped = SCMInferenceWrapper(
    model,
    config=InferenceConfig(tau_infer=1e-6, tau_train=1e-4),
).eval()

decoded, bottom_mask, gap_mask = wrapped(x)

In training mode, wrappers can pass projective outputs through for loss computation. In eval mode, they decode strictly.

ONNX Bundles

ONNX bundles are the preferred deployment handoff artifact:

from zeroproofml.inference import (
    export_bundle,
    load_onnx_runtime_bundle,
    run_bundle_reference_smoke_test,
    validate_bundle,
)

export_bundle(wrapped, "bundle_dir", example_inputs=(x_example,))
validate_bundle("bundle_dir")

runtime = load_onnx_runtime_bundle(
    "bundle_dir",
    providers=["CPUExecutionProvider"],
)
decoded, bottom_mask, gap_mask = runtime.run(x_numpy)

run_bundle_reference_smoke_test("bundle_dir", wrapped, (x_smoke,))

A stable bundle contains:

  • model.onnx
  • metadata.json

Common report artifacts beside the bundle:

  • VALIDATION_REPORT.md
  • VALIDATION_REPORT.summary.json
  • VALIDATION_REPORT.summary.svg

metadata.json records the schema version, tau_infer, optional tau_train, input/output signatures, batch-axis semantics, mask semantics, and the strict output order:

decoded, bottom_mask, gap_mask

TorchScript support remains a legacy compatibility path. Prefer ONNX for new deployments.

Fallback Patterns

The simplest deployment action is reject-on-bottom:

from zeroproofml.inference import reject_on_bottom

decoded_safe, accept_mask = reject_on_bottom(decoded, bottom_mask)

For conservative systems, reject the gap region too:

from zeroproofml.inference import reject_on_gap

decoded_safe, accept_mask = reject_on_gap(decoded, bottom_mask, gap_mask)

For robotics or control systems, route bottom or risky samples to an analytic solver:

from zeroproofml.inference import route_to_analytic_solver

resolved, route_mask = route_to_analytic_solver(
    decoded,
    bottom_mask=bottom_mask,
    analytic_solver=solver,
    inputs=x,
)

Keep the mask and the selected route in logs. Silent replacement of rejected values makes audits much harder.
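The routing-plus-logging pattern can be sketched without the library helpers. The code below picks a route per sample and records it, so no value is replaced silently. route_samples, the route labels, and the stand-in solver are hypothetical and do not mirror the signatures of reject_on_gap or route_to_analytic_solver.

```python
import json
import math


def route_samples(decoded, bottom_mask, gap_mask, inputs, analytic_solver,
                  reject_gap=False):
    """Return (resolved_values, route_log) with one log record per sample."""
    resolved, route_log = [], []
    for i, (value, is_bottom, is_gap) in enumerate(
        zip(decoded, bottom_mask, gap_mask)
    ):
        if is_bottom or (reject_gap and is_gap):
            # Rejected sample: ask the fallback solver instead of the model.
            route = "analytic_solver"
            value = analytic_solver(inputs[i])
        else:
            route = "model"
        resolved.append(value)
        route_log.append(
            {"index": i, "bottom": is_bottom, "gap": is_gap, "route": route}
        )
    return resolved, route_log


def stand_in_solver(x):
    # Placeholder for a real analytic solver; always returns 0.0 here.
    return 0.0


resolved, route_log = route_samples(
    decoded=[math.nan, 40000.0, 6.0],
    bottom_mask=[True, False, False],
    gap_mask=[False, True, False],
    inputs=[0.1, 0.2, 0.3],
    analytic_solver=stand_in_solver,
    reject_gap=True,
)
for record in route_log:
    print(json.dumps(record))
```

Each log record keeps both masks alongside the chosen route, so an audit can tell a bottom rejection from a conservative gap rejection.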

Monitoring

from zeroproofml.inference import StrictInferenceMonitor
from zeroproofml.utils.logging import JsonlLogger

events = JsonlLogger("strict_inference_events.jsonl")
monitor = StrictInferenceMonitor(
    bundle_id="robotics_rr_ik_v1",
    event_logger=events,
    acceptance_rate_drift_threshold=0.1,
)

monitor.update(bottom_mask, gap_mask)
rates = monitor.rates()
state = monitor.export_state(
    include_histograms=True,
    include_batch_summaries=True,
)

Monitor at least:

  • bottom rate
  • gap rate
  • acceptance rate
  • route/fallback rate
  • denominator minimums when available
  • drift against calibration or validation rates
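The rates in this list are simple ratios over the masks, and drift can be read as the distance between an observed rate and the rate recorded at calibration time. A minimal sketch follows; mask_rates, acceptance_drift, and the calibration value are hypothetical, and both the absolute-difference drift measure and counting gap samples as accepted are assumptions rather than a description of what StrictInferenceMonitor computes.

```python
def mask_rates(bottom_mask, gap_mask):
    """Compute per-batch bottom, gap, and acceptance rates from boolean masks."""
    n = len(bottom_mask)
    if n == 0 or n != len(gap_mask):
        raise ValueError("masks must be non-empty and of equal length")
    bottom_rate = sum(bottom_mask) / n
    gap_rate = sum(gap_mask) / n
    # Assumption: gap samples are still finite, so they count as accepted.
    return {
        "bottom_rate": bottom_rate,
        "gap_rate": gap_rate,
        "acceptance_rate": 1.0 - bottom_rate,
    }


def acceptance_drift(observed_rate, calibration_rate, threshold):
    """Return (drift, exceeded) using an absolute-difference drift measure."""
    drift = abs(observed_rate - calibration_rate)
    return drift, drift > threshold


# Illustrative batch of eight samples: one bottom, two in the gap region.
bottom_mask = [True] + [False] * 7
gap_mask = [False, True, True] + [False] * 5

rates = mask_rates(bottom_mask, gap_mask)
drift, drifted = acceptance_drift(
    rates["acceptance_rate"], calibration_rate=0.95, threshold=0.1
)
print(rates, drifted)
```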

Optional Provenance Diagnostics

The stable output contract is still (decoded, bottom_mask, gap_mask). v0.5.1 also has opt-in experimental diagnostics that split bottom decisions into richer explanations.

from zeroproofml.inference import InferenceConfig, strict_inference

cfg = InferenceConfig(
    tau_infer=1e-6,
    tau_train=1e-4,
    provenance="experimental",
    provenance_representation="split_masks",
)

result = strict_inference(P, Q, config=cfg)
decoded, bottom_mask, gap_mask = result

fault_mask = getattr(result, "fault_mask", None)
semantic_bottom_mask = getattr(result, "semantic_bottom_mask", None)
bottom_provenance = getattr(result, "bottom_provenance", None)

Rules for consumers:

  • Depend only on decoded, bottom_mask, and gap_mask for stable deployment.
  • Treat provenance fields as optional diagnostics.
  • Keep working when provenance fields are absent.
  • Do not change stable ONNX output names to expose experimental provenance.

The Q2 decision for v0.5.1 keeps provenance experimental until a review artifact and rerun show a clear downstream win without breaking the stable mask contract.

Direction-Aware Censoring

For three-way censoring problems, combine strict bottom gating with an optional direction head:

from zeroproofml.inference import decode_strict_censored_3way

decoded, bottom_mask, class_id = decode_strict_censored_3way(
    P.squeeze(-1),
    Q.squeeze(-1),
    tau_infer=1e-6,
    direction_logits=direction_logits,
)

Use this when a bottom output should still indicate below-limit versus above-limit, or a related task-specific direction.
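A plain-Python sketch of the idea: accepted samples decode normally, while bottom samples are censored but still carry a direction taken from the larger of two logits. The class encoding (0 accepted, 1 below-limit, 2 above-limit), the two-logit layout, and the P / Q decode are illustrative assumptions and may differ from what decode_strict_censored_3way returns.

```python
import math

# Hypothetical class encoding used only in this sketch.
ACCEPTED, BELOW_LIMIT, ABOVE_LIMIT = 0, 1, 2


def decode_3way_sketch(p_values, q_values, tau_infer, direction_logits):
    """Gate on |Q| < tau_infer, then label bottom samples with a direction."""
    if tau_infer <= 0:
        raise ValueError("tau_infer must be positive")
    decoded, bottom_mask, class_id = [], [], []
    for p, q, (below_logit, above_logit) in zip(
        p_values, q_values, direction_logits
    ):
        is_bottom = abs(q) < tau_infer
        bottom_mask.append(is_bottom)
        if is_bottom:
            # Censored sample: no finite value, but keep the direction.
            decoded.append(math.nan)
            class_id.append(
                BELOW_LIMIT if below_logit >= above_logit else ABOVE_LIMIT
            )
        else:
            decoded.append(p / q)
            class_id.append(ACCEPTED)
    return decoded, bottom_mask, class_id


decoded, bottom_mask, class_id = decode_3way_sketch(
    p_values=[1.0, 1.0, 3.0],
    q_values=[1e-9, -1e-8, 0.5],
    tau_infer=1e-6,
    direction_logits=[(2.0, -1.0), (-0.5, 1.5), (0.0, 0.0)],
)
print(bottom_mask, class_id)
```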

Reference Deployment

The reference robotics deployment runs train -> bundle -> strict inference -> fallback -> report:

python scripts/reference_robotics_deployment.py --device cpu --epochs 2 --n-samples 6000

It writes a self-contained directory under results/reference_deploy_robotics/ with the ONNX bundle, validation report, inference summary, strict-inference audit, and output contract.

The same path is importable:

from zeroproofml.reference_robotics_deployment import (
    ReferenceRoboticsDeploymentConfig,
    run_reference_robotics_deployment,
)

artifacts = run_reference_robotics_deployment(
    ReferenceRoboticsDeploymentConfig(device="cpu", epochs=2, n_samples=6000)
)
print(artifacts.bundle_model_path)

Deployment Checklist

  • Freeze InferenceConfig from held-out calibration data.
  • Export an ONNX bundle from the eval wrapper.
  • Run validate_bundle(...).
  • Run run_bundle_reference_smoke_test(...) against saved smoke inputs.
  • Regenerate the validation report with python -m zeroproofml.report bundle <bundle_dir>.
  • Confirm downstream consumers use bottom_mask and gap_mask explicitly.
  • Log route/fallback actions and acceptance-rate drift in production.