Experiments & Reproducibility
This page lists the currently supported reproduction entry points for v0.5.1. It separates auditable benchmark workflows from archived paper-era scripts and exploratory examples.
Quick Reproduction Path
The paper-facing replay contract lives in artifacts/paper_2026/. It records commands, configs, expected outputs, tolerance bands, checksums, and pinned container recipes.
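As a rough illustration of what checking a produced output against a checksum and a tolerance band could look like, here is a minimal sketch. The on-disk schema of the replay contract is not described on this page, so the entry keys `sha256`, `expected`, and `abs_tol`, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def within_tolerance(observed: float, expected: float, abs_tol: float) -> bool:
    """True when the observed value lies inside expected +/- abs_tol."""
    return abs(observed - expected) <= abs_tol


def check_replay(entry: dict, output_path: Path, observed_metric: float) -> dict:
    """Compare one produced output against a hypothetical contract entry.

    The keys "sha256", "expected", and "abs_tol" are assumptions made for
    this sketch, not the documented contract format.
    """
    return {
        "checksum_ok": sha256_of_file(output_path) == entry["sha256"],
        "metric_ok": within_tolerance(
            observed_metric, entry["expected"], entry["abs_tol"]
        ),
    }
```

A byte-exact checksum suits deterministic artifacts such as configs, while a tolerance band suits metrics that vary slightly across hardware or library versions.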
From a clean checkout:
python -m pip install -r requirements-paper.txt
python -m pip install -e .
make reproduce-paper
Run one domain:
make reproduce-dose
make reproduce-rf
make reproduce-ik
make reproduce-reference-robotics
Override output root or device:
make reproduce-dose REPRO_ROOT=results/tmp_repro REPRO_DEVICE=cpu
For paper-grade reruns, retain:
- the git commit
- artifacts/paper_2026/
- the produced run directory
- run-local manifest.json and provenance.json
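One way to capture the git commit alongside a rerun is sketched below. It only uses the standard `git rev-parse HEAD` and `git status --porcelain` commands. The helper names are hypothetical, and the harness may already record the same information in its own manifest.

```python
import subprocess
from pathlib import Path
from typing import Optional


def _git(args: list, cwd: Path) -> Optional[str]:
    """Run a git command; return stripped stdout, or None if it fails."""
    try:
        done = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or cwd is not inside a repository.
        return None
    return done.stdout.strip()


def describe_checkout(repo_root: Path) -> dict:
    """Return the current commit hash and whether the worktree is dirty."""
    commit = _git(["rev-parse", "HEAD"], repo_root)
    status = _git(["status", "--porcelain"], repo_root)
    return {
        "commit": commit,
        # Any porcelain output means uncommitted or untracked changes.
        "dirty": None if status is None else bool(status),
    }
```

Storing the result next to the run directory makes it easy to confirm later that a rerun came from a clean, known commit.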
Scientific Claim Benchmarks
Run smoke mode locally:
python -m zeroproofml.benchmarks dose --mode smoke --device cpu --seeds 1
python -m zeroproofml.benchmarks rf --mode smoke --device cpu --seeds 1
python -m zeroproofml.benchmarks ik --mode smoke --device cpu --seeds 1
Resume a paper-mode run:
python -m zeroproofml.benchmarks dose --mode paper --device cpu \
  --out-root results/benchmarks/dose/<run_dir> --resume
Regenerate a report:
python -m zeroproofml.report benchmark results/benchmarks/dose/<run_dir> --html-report
Expected output root:
results/benchmarks/<domain>/run_<timestamp>_<sha>/
Core artifacts:
- seed_*/per_seed_result.json
- manifest.json
- provenance.json
- resume_state.json
- RUN_REPORT.md
- optional RUN_REPORT.html
- aggregated/summary.json
- aggregated/paired_stats.json
- aggregated/benchmark_metrics.jsonl
- CLAIM_AUDIT.md
- regenerated figures under figures/
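Before archiving or comparing a run, it can help to confirm that the core artifacts are present. The sketch below assumes every listed file except the optional HTML report is required. That assumption and the helper name `missing_core_artifacts` are illustrative and not part of the package.

```python
from pathlib import Path

# File names taken from the artifact list above; treating all of them as
# mandatory is an assumption of this sketch.
REQUIRED_FILES = (
    "manifest.json",
    "provenance.json",
    "resume_state.json",
    "RUN_REPORT.md",
    "aggregated/summary.json",
    "aggregated/paired_stats.json",
    "aggregated/benchmark_metrics.jsonl",
    "CLAIM_AUDIT.md",
)


def missing_core_artifacts(run_dir: Path) -> list:
    """Return the expected artifacts that are absent from run_dir."""
    missing = [name for name in REQUIRED_FILES if not (run_dir / name).is_file()]
    # At least one per-seed result is expected under seed_*/.
    if not list(run_dir.glob("seed_*/per_seed_result.json")):
        missing.append("seed_*/per_seed_result.json")
    return missing
```

An empty list means the run directory has the expected layout. It says nothing about whether the file contents are valid.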
Domain Notes
| Domain | Focus | Extra artifacts |
|---|---|---|
| DOSE | Censoring, operating points, direction-aware bottoms | dose_operating_points, dose_pareto_front, diagnostics, direction-head summaries |
| RF | Resonator response, peak retention, extrapolation | signal traces, frequency-response SVGs, qualitative failure packs |
| IK | Robotics inverse kinematics near singularities | workspace heatmaps, determinant-stratified metrics, fallback route plots |
Use generated artifacts for papers and reviews. Avoid hand-curated notebook summaries when a benchmark report can be regenerated from stored JSON.
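To illustrate working from stored JSON rather than hand-curated summaries, the sketch below recomputes a mean and standard deviation across seeds. The schema of `per_seed_result.json` is not documented on this page, so the flat top-level metric key and the helper name are hypothetical. For real reports, prefer the `zeroproofml.report` command shown earlier.

```python
import json
import statistics
from pathlib import Path


def summarize_metric(run_dir: Path, metric: str) -> dict:
    """Aggregate one metric over all seed_*/per_seed_result.json files.

    Assumes each file is a JSON object with the metric as a top-level
    numeric key; this layout is an assumption of the sketch.
    """
    values = []
    for path in sorted(run_dir.glob("seed_*/per_seed_result.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        if metric in payload:
            values.append(float(payload[metric]))
    if not values:
        raise ValueError(f"no seed reports {metric!r} under {run_dir}")
    return {
        "n_seeds": len(values),
        "mean": statistics.fmean(values),
        # Sample standard deviation needs at least two seeds.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```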
Reference Robotics Deployment
End-to-end reference path:
python scripts/reference_robotics_deployment.py --device cpu --epochs 1 --n-samples 2000
Expected artifacts under results/reference_deploy_robotics/:
- output_contract.json
- inference_summary.json
- strict_inference_audit.json
- bundle/model.onnx
- bundle/metadata.json
- bundle/VALIDATION_REPORT.md
- bundle/VALIDATION_REPORT.summary.json
Importable path:
from zeroproofml.reference_robotics_deployment import (
    ReferenceRoboticsDeploymentConfig,
    load_reference_robotics_deployment_artifacts,
    run_reference_robotics_deployment,
)

artifacts = run_reference_robotics_deployment(
    ReferenceRoboticsDeploymentConfig(device="cpu", epochs=1, n_samples=2000)
)
same_run = load_reference_robotics_deployment_artifacts(artifacts.out_root)
Trajectory Evaluation
Generate a stratified RR IK trajectory dataset:
python scripts/generate_reference_robotics_trajectory_data.py \
  --n-trajectories 48 --steps-per-trajectory 16
Evaluate a policy:
from zeroproofml.reference_robotics_trajectory_eval import (
    evaluate_reference_robotics_trajectory_policy,
    make_reference_robotics_dls_policy,
)

summary = evaluate_reference_robotics_trajectory_policy(
    "results/reference_robotics_trajectory_eval/rr_trajectory_eval_dataset.json",
    make_reference_robotics_dls_policy(damping=0.05),
)
print(summary["aggregate"]["mean_tracking_error"])
The returned summaries include tracking error, fallback rates, joint-limit violations, chattering events, and latency-budget violations. Provenance-aware fallback splits are included when policies tag route kinds.
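For readers unfamiliar with the baseline, the following is a minimal sketch of one textbook damped least-squares (DLS) step for a planar 2R arm, dq = Jᵀ(JJᵀ + λ²I)⁻¹e. It is not the implementation behind `make_reference_robotics_dls_policy`. The unit link lengths and the function names are assumptions made for illustration.

```python
import math


def jacobian_2r(q1: float, q2: float, l1: float = 1.0, l2: float = 1.0):
    """Position Jacobian of a planar 2R arm (link lengths are assumed)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return (
        (-l1 * s1 - l2 * s12, -l2 * s12),
        (l1 * c1 + l2 * c12, l2 * c12),
    )


def dls_step(q1: float, q2: float, error, damping: float = 0.05):
    """Return the joint update dq = J^T (J J^T + damping^2 I)^-1 error."""
    (a, b), (c, d) = jacobian_2r(q1, q2)
    lam2 = damping ** 2
    # A = J J^T + lam2 * I is symmetric and positive definite for damping > 0.
    a11 = a * a + b * b + lam2
    a12 = a * c + b * d
    a22 = c * c + d * d + lam2
    det = a11 * a22 - a12 * a12
    ex, ey = error
    # Solve A y = error with the closed-form 2x2 inverse.
    y1 = (a22 * ex - a12 * ey) / det
    y2 = (-a12 * ex + a11 * ey) / det
    # dq = J^T y
    return (a * y1 + c * y2, b * y1 + d * y2)


# At the stretched-out singularity (q2 = 0) the update stays finite.
print(dls_step(0.0, 0.0, (0.0, 0.1)))
```

The damping term trades tracking accuracy for bounded joint updates near singularities, where det J = l1·l2·sin(q2) approaches zero and the undamped inverse would blow up.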
Downstream Pipeline Simulator
Use the downstream simulator to test whether reject flags, provenance labels, and direction labels survive multi-step handoffs:
from zeroproofml.downstream_pipeline import (
    DownstreamPipelineReferenceSample,
    build_downstream_pipeline_simulator,
    compare_downstream_pipeline_strategies,
    write_downstream_pipeline_report,
)

simulator = build_downstream_pipeline_simulator(
    "5-step",
    drop_reject_flag_probability=0.05,
    bad_downstream_behaviors=("json_roundtrip", "aggregate_mean"),
)
comparison = compare_downstream_pipeline_strategies(
    [
        DownstreamPipelineReferenceSample(
            decoded=(-3.0,),
            should_reject=True,
            provenance="semantic",
            direction_label="below",
            sample_id="censored_low",
        ),
    ],
    simulator,
)
write_downstream_pipeline_report("artifacts/composability", result=comparison)
This is an experimental harness for composability evidence, not a stable core SCM API.
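The failure mode the simulator probes can be seen in a toy, library-independent sketch: a handoff that serializes only decoded values loses the reject flag, so a later mean silently includes a value that should have been excluded. All names below are hypothetical and unrelated to the `zeroproofml.downstream_pipeline` API.

```python
import json
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Sample:
    decoded: float
    should_reject: bool
    sample_id: str


def json_roundtrip_values_only(samples) -> list:
    """A lossy handoff: only decoded values survive serialization."""
    return json.loads(json.dumps([s.decoded for s in samples]))


def aggregate_mean(values) -> float:
    """A downstream step that averages whatever values it receives."""
    return fmean(values)


def flag_preserving_mean(samples) -> dict:
    """Average only accepted samples and report how many were rejected."""
    accepted = [s.decoded for s in samples if not s.should_reject]
    return {
        "mean": fmean(accepted) if accepted else None,
        "n_rejected": len(samples) - len(accepted),
    }


samples = [
    Sample(-3.0, True, "censored_low"),
    Sample(1.0, False, "ok_a"),
    Sample(3.0, False, "ok_b"),
]
naive = aggregate_mean(json_roundtrip_values_only(samples))
aware = flag_preserving_mean(samples)
print(naive, aware)
```

The naive path averages the rejected value together with the valid ones, while the flag-preserving path keeps the rejection visible in its output.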
Examples Inventory
Recommended tutorial sequence:
- examples/01_quickstart.py
- examples/02_rational_layer.py
- examples/03_projective_mode.py
- examples/05_coverage_control.py
- examples/06_export_bundle.py
- examples/fru_strict_check_demo.py
Supported reference examples include bridge, autodiff, optimization, C++ bundle consumer, deployment workflows, and the 2R arm example.
Benchmark-helper directories include domain-specific example data and compatibility wrappers used by the scientific harness.
Archive or experimental paths include legacy Transreal-era scripts and non-promoted robotics side paths such as older 3R/6R examples.
Comparing Runs
Compare a new run with one or more baselines:
from zeroproofml.benchmarks import compare_benchmark_runs, load_benchmark_run

run = load_benchmark_run("results/benchmarks/dose/<new_run>")
comparison = compare_benchmark_runs(
    run,
    ["results/benchmarks/dose/<baseline_run>"],
)
print(comparison.to_dict())
Each run records git commit, package versions, arguments, dataset fingerprints, hardware metadata, checkpoint hashes, discovered bundle directories, and dirty-worktree state. Resumed runs preserve attempt history.
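When two runs disagree, a quick first check is whether their recorded provenance differs. The sketch below compares two already-loaded provenance dictionaries field by field. The key names are hypothetical, since the exact provenance schema is not given on this page, and `compare_benchmark_runs` may already surface similar information.

```python
# Hypothetical field names, chosen to mirror the metadata described above.
PROVENANCE_KEYS = (
    "git_commit",
    "package_versions",
    "dataset_fingerprints",
    "dirty_worktree",
)


def provenance_diff(new: dict, baseline: dict, keys=PROVENANCE_KEYS) -> dict:
    """Return the fields whose values differ between two provenance records."""
    return {
        key: {"new": new.get(key), "baseline": baseline.get(key)}
        for key in keys
        if new.get(key) != baseline.get(key)
    }


new_run = {"git_commit": "abc123", "dirty_worktree": False}
baseline_run = {"git_commit": "def456", "dirty_worktree": False}
print(provenance_diff(new_run, baseline_run))
```

An empty result suggests the two runs share the compared provenance fields, which narrows a metric difference down to seeds, hardware, or nondeterminism.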
Archived Workflows
Older paper-era scripts remain in the repository for historical reference, but current docs and reproducibility claims should use the benchmark harness, reference deployment, and paper bundle above.