Topic 3: Autodiff Modes (Mask‑REAL, Saturating, Hybrid)

This topic explains ZeroProof’s gradient semantics and how to select between Mask‑REAL, Saturating, and Hybrid modes, with code pointers and usage examples.

Goals

Preserve classical gradients on REAL paths.
Remain total and stable when forwards produce non‑REAL tags (±∞, Φ).
Provide bounded alternatives near poles without ε hacks.

Modes at a Glance

Mask‑REAL (default): If a forward node tag is non‑REAL, send zero gradient to its inputs/params.
Saturating: Replace singular growth terms with bounded, TR‑safe forms near poles.
Hybrid: Scheduler picks Saturating near poles and Mask‑REAL elsewhere; tracks q‑statistics for decisions.

APIs

Mode config/context: zeroproof/autodiff/grad_mode.py:1
Hybrid schedule/context: zeroproof/autodiff/hybrid_gradient.py:1
Backward pass rules: zeroproof/autodiff/backward.py:1
Tape and lifted ops: zeroproof/autodiff/gradient_tape.py:1, zeroproof/autodiff/tr_ops_grad.py:1

Mask‑REAL (Default)

Behavior

Rule: If node.forward.tag ∈ {PINF, NINF, PHI}, gradients to inputs/params are exactly zero.
Chain effect: Any path that encounters a non‑REAL node contributes zero to the Jacobian.

Code

Check: backward.py: MASK_REAL branch in backward_pass and compute_input_gradients.
Using default: no action required; GradientModeConfig defaults to MASK_REAL.

Usage

from zeroproof.autodiff import grad_mode
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real
from zeroproof.autodiff.tr_ops_grad import tr_div

x = TRNode.parameter(real(0.0))
y = tr_div(real(1.0), x)   # forward tag = PINF
# Backprop uses Mask‑REAL → grad wrt x is 0
y.backward()
print(x.gradient)  # REAL 0.0

Saturating Mode

Behavior

Bounded gradients near poles; replaces singular terms like 1/Q² with a capped TR‑safe form.
Keeps continuity of gradient magnitudes across pole neighborhoods.

Code

Enable: GradientModeConfig.set_mode(GradientMode.SATURATING) in grad_mode.py:1.
Div/log/sqrt/pow grad handlers call saturating_ops in backward.py:200.

Usage

from zeroproof.autodiff.grad_mode import GradientMode, GradientModeConfig
from zeroproof.autodiff.tr_ops_grad import tr_div
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

GradientModeConfig.set_mode(GradientMode.SATURATING)
GradientModeConfig.set_saturation_bound(1.0)

x = TRNode.parameter(real(1e-12))
y = tr_div(real(1.0), x)
y.backward()
print(x.gradient)  # finite, bounded by saturation rule

Hybrid Mode

Intuition

Early/far from poles: use Mask‑REAL’s stability guardrails.
Near poles: switch to Saturating to allow informative gradients.
The schedule controls a threshold δ(t) and tracks q‑statistics.

Code

Schedule and context: hybrid_gradient.py:1 (HybridGradientSchedule, HybridGradientContext).
Decision point: backward.py: DIV case → HybridGradientContext.should_use_saturating.
Mode flag: GradientMode.HYBRID in grad_mode.py:1.

Common Setup

from zeroproof.autodiff.hybrid_gradient import create_default_schedule
from zeroproof.autodiff.grad_mode import GradientMode, GradientModeConfig
from zeroproof.autodiff.tr_ops_grad import tr_div
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

schedule = create_default_schedule(aggressive=False, warmup_epochs=5, force_exploration=True)
for epoch in range(50):
    with schedule.apply(epoch):
        x = TRNode.parameter(real(1e-6))
        y = tr_div(real(1.0), x)
        y.backward()
        # Inspect context stats per batch/epoch
        from zeroproof.autodiff.hybrid_gradient import HybridGradientContext
        stats = HybridGradientContext.get_statistics()
        # stats includes q_min_batch, near_pole_ratio, saturating_ratio, etc.

Schedule Features

Warmup and transition epochs with linear/exponential/cosine decay.
Adaptive δ: Increase tolerance when batch q_min is very small.
Forced exploration: Schedule regions around detected poles for saturating activations.
Stats: q_min (batch/epoch), near_pole ratio, saturating vs mask‑real counts.

Notes

No ε in arithmetic: δ is a scheduler parameter, not part of core TR ops.
Deterministic decisions per backend given identical inputs and seeding.

Tape and Lifted Ops

Use gradient_tape() for functional differentiation and tr_ops_grad for graph building.
TRNode.parameter(...) marks trainable nodes; TRNode.constant(...) for constants.

Example: Value and Gradient

from zeroproof.autodiff.grad_funcs import tr_value_and_grad
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real
from zeroproof.autodiff.tr_ops_grad import tr_mul, tr_add

# f(x) = x*x + 2x + 1
def f(x):
    return tr_add(tr_add(tr_mul(x, x), tr_mul(real(2.0), x)), real(1.0))

val_and_grad = tr_value_and_grad(f)
x = TRNode.parameter(real(3.0))
val, grad = val_and_grad(x)
# val is REAL 16, grad is REAL 8

Gradient Checking Utilities

For finite‑difference checks on TR functions, use zeroproof/autodiff/grad_funcs.py:
- check_gradient(func, x, eps=1e-5) computes analytical vs numerical grad and relative error on REAL paths.
- tr_grad and tr_value_and_grad lift scalar functions into gradient evaluators for quick tests.
See tests under tests/unit/test_tr_autodiff.py and the robotics multi‑input gradcheck at tests/unit/test_robotics_gradcheck.py for usage patterns.

Practical Guidance

Default to Mask‑REAL for safety; enable Hybrid for tasks requiring pole learning.
Start with small warmup_epochs and moderate δ range (e.g., 1e‑2 → 1e‑6).
Use stats to tune saturating_bound and schedule aggressiveness.
For deterministic runs, fix seeds and backend deterministic flags (see spec repro checklist in complete_v2.md:1).

Cross‑References

Conceptual background: concept_250908.md:200 (Hybrid Schedule section)
Mask‑REAL guide: docs/autodiff_mask_real.md:1
Saturating guide: docs/saturating_grad_guide.md:1

Docs

Topic 3: Autodiff Modes (Mask‑REAL, Saturating, Hybrid)

Goals

Modes at a Glance

Mask‑REAL (Default)

Saturating Mode

Hybrid Mode

Tape and Lifted Ops

Gradient Checking Utilities

Practical Guidance

Cross‑References