Topic 9: TR‑Softmax and Policies
This topic describes ZeroProof’s TR‑safe softmax surrogate, design rationale, and policy options for extreme cases.
Why a TR‑Softmax Surrogate?
Classical softmax uses exp, which can overflow/underflow and propagate NaNs/Infs through autodiff. In ZeroProof, we build a softmax‑like function from rational TR operations to:
- Preserve totality (never throw),
- Avoid NaN/Inf propagation in REAL regions,
- Remain compatible with Mask‑REAL/Hybrid autodiff.
Implementation Sketch
- Shift by `max(logits)` to reduce dynamic range (log‑sum‑exp trick).
- Apply a monotone rational decay `r(z) = 1 / (1 + c·t + d·t²)`, where `t = max(0, −z)` (implemented via `t = (|z| − z)/2`).
- Normalize: `p_i = r_i / Σ_j r_j` using TR reductions (deterministic reductions optional via policy).
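As a rough illustration of the steps above, here is a minimal NumPy sketch. The function name and the constants `c` and `d` are assumptions for illustration only; ZeroProof's actual implementation is built from TR operations rather than plain floats.

```python
import numpy as np

def tr_softmax_sketch(logits, c=1.0, d=1.0):
    """Plain-float sketch of the TR-softmax surrogate (c, d are assumed)."""
    z = np.asarray(logits, dtype=float)
    z = z - np.max(z)                    # shift by max (log-sum-exp trick)
    t = (np.abs(z) - z) / 2.0            # t = max(0, -z) without branching
    r = 1.0 / (1.0 + c * t + d * t * t)  # monotone rational decay
    return r / np.sum(r)                 # normalize to a distribution

p = tr_softmax_sketch([2.0, 0.5, -1.0])
```

Because the decay is applied to `z = logits − max(logits)`, adding a constant to every logit leaves `z`, and hence the output, unchanged — the same shift invariance classical softmax has.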
Code: `zeroproof/layers/tr_softmax.py`.
Policy: One‑Hot on +∞ (Optional)
By default, if a logit is +∞, the shift‑by‑max can yield non‑REAL tags in the surrogate. This is TR‑safe and deterministic, but sometimes you want a strict one‑hot in the presence of +∞.
Enable the policy toggle to force a one‑hot distribution at the first +∞ index (deterministic tie‑break):
```python
from zeroproof.policy import TRPolicy, TRPolicyConfig

TRPolicyConfig.set_policy(TRPolicy(softmax_one_hot_infinity=True))
```
This preserves totality and avoids ambiguity when infinities appear in logits.
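In plain-float terms, the policy's semantics can be sketched as follows. The helper name is hypothetical; in ZeroProof this behavior is applied inside the TR‑softmax itself, not exposed as a standalone function.

```python
import math

def one_hot_on_pos_inf(logits):
    """Sketch of softmax_one_hot_infinity semantics: if any logit is +inf,
    return a one-hot at the FIRST +inf index (deterministic tie-break)."""
    for i, x in enumerate(logits):
        if x == math.inf:
            return [float(j == i) for j in range(len(logits))]
    return None  # no +inf present: the regular surrogate path applies

one_hot_on_pos_inf([1.0, math.inf, math.inf])  # → [0.0, 1.0, 0.0]
```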
Testing and Examples
- Unit tests: `tests/unit/test_tr_softmax.py` covers basic properties, shift invariance, extreme logits, and +∞/Φ cases (with and without the policy).
- Tiny classifier example: `examples/classification_demo.py` demonstrates training a 2‑class linear model using `tr_softmax` and TR autodiff.
Notes
- The surrogate uses only TR operations and remains differentiable in REAL regions. Mask‑REAL and Hybrid gradients apply as configured globally.
- For large products/sums in normalization, enable deterministic compensated reductions (policy flag) to reduce round‑off sensitivity.
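For reference, a compensated reduction of the kind the policy flag enables can be sketched as classic Kahan summation. This is a minimal illustration of the technique, not ZeroProof's actual reduction code.

```python
def kahan_sum(values):
    """Kahan compensated summation: tracks lost low-order bits in a
    separate compensation term, reducing round-off sensitivity and
    making the sequential result less dependent on operand magnitudes."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp          # apply correction from previous step
        t = total + y         # big + small: low-order bits of y may be lost
        comp = (t - total) - y  # recover what was lost
        total = t
    return total
```

Applied to the normalization denominator `Σ_j r_j`, this keeps the division `r_i / Σ_j r_j` stable even when the `r_j` span many orders of magnitude.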