Training Guide

This guide covers training neural networks with SCM semantics, including projective learning, gradient policies, and specialized loss functions.

Table of Contents

  • Projective Learning
  • Gradient Policies
  • Loss Functions
  • Training Loop
  • Adaptive Loss Strategies
  • Best Practices
  • Hyperparameter Defaults
  • Next Steps

Projective Learning

When to Use Projective Mode

Projective learning lifts rational subgraphs to homogeneous tuples ⟨N,D⟩, allowing training on a smooth manifold while preserving strict SCM semantics at inference.

Use projective mode when:

  • You are training rational heads that should avoid instantiating ⊥ during optimization
  • Gradient dead zones around Q ≈ 0 hurt convergence
  • Outputs are safety-critical and distinguishing +∞ vs −∞ matters
  • You need smooth gradients through potential singularities

Skip projective mode when:

  • You are working with simple SCM operations
  • Singularities are rare in the training data
  • The model architecture has no rational bottlenecks

How It Works

Encoding:

φ(x) = ⟨x, 1⟩     for finite values
φ(⊥) = ⟨1, 0⟩     for bottom

Decoding:

φ⁻¹(N, D) = N/D   when |D| ≥ τ_infer
φ⁻¹(N, D) = ⊥     when |D| < τ_infer
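
A minimal sketch of the decode step, assuming PyTorch tensors and that ⊥ is reported as a boolean mask alongside the decoded values (decode_projective is an illustrative name, not part of the package):

import torch

def decode_projective(N: torch.Tensor, D: torch.Tensor, tau_infer: float = 1e-6):
    """Map homogeneous tuples back to values; flag |D| < tau_infer as ⊥."""
    bottom_mask = D.abs() < tau_infer
    # Divide only where the denominator is safe; masked entries are placeholders.
    safe_D = torch.where(bottom_mask, torch.ones_like(D), D)
    return N / safe_D, bottom_mask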

Detached Renormalization:

To prevent overflow without altering the represented value:

S = sg(√(N² + D²) + γ)
(N', D') ← (N/S, D/S)

The stop_gradient (sg) operator ensures the optimizer learns the direction of the tuple, not its magnitude. This creates "ghost gradients" that flow smoothly even when D → 0.
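
A minimal sketch of this renormalization, assuming PyTorch and using detach() as the stop-gradient:

import torch

def renormalize(N: torch.Tensor, D: torch.Tensor, gamma: float = 1e-9):
    """Rescale ⟨N, D⟩ by a detached norm so only the direction receives gradients."""
    S = (torch.sqrt(N * N + D * D) + gamma).detach()  # S = sg(√(N² + D²) + γ)
    return N / S, D / S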

Integration Steps

  1. Lift targets to projective tuples:

from zeroproof.training.targets import lift_targets

# Finite targets: y_finite → ⟨y_finite, 1⟩
# Infinite targets: ±inf → ⟨±1, 0⟩
targets_n, targets_d = lift_targets(y_true)

  2. Use the PROJECT gradient policy in projective regions:

from zeroproof.autodiff.policies import GradientPolicy, gradient_policy

with gradient_policy(GradientPolicy.PROJECT):
    loss.backward()

  3. Combine specialized losses (see the Loss Functions section).

  4. Decode at boundaries and monitor coverage.

Gap Region

Training uses stochastic thresholds (τ_train_min, τ_train_max) to avoid learning a brittle boundary. Inference uses a fixed τ_infer.

When τ_train > τ_infer, the interval [τ_infer, τ_train) is the gap region where inference returns a finite value but the denominator is numerically risky.

from zeroproof.inference import strict_inference, InferenceConfig

decoded, bottom_mask, gap_mask = strict_inference(
    N, D,
    config=InferenceConfig(tau_infer=1e-6, tau_train=1e-4)
)

Monitor gap_mask.sum() to track how often predictions fall in this uncertain zone.
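
For example, a hypothetical monitoring snippet (the 5% alert threshold is an arbitrary choice):

# Fraction of predictions that decode to a finite value but sit in the gap region.
gap_rate = gap_mask.float().mean().item()
if gap_rate > 0.05:
    print(f"Warning: {gap_rate:.1%} of predictions fall in the gap region")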

Gradient Policies

Gradient policies control how backpropagation interacts with ⊥. Available in zeroproof.autodiff.policies.

Policy Options

  • CLAMP: zeroes gradients on ⊥ paths and clamps finite gradients to [-1, 1]. Default for SCM-only graphs.
  • PROJECT: masks gradients when the forward value is ⊥. Use for projective heads and points at infinity.
  • REJECT: always zero gradient. Use when learning only through coverage/rejection losses.
  • PASSTHROUGH: gradients propagate through ⊥. Debugging only.

Usage

from zeroproof.autodiff.policies import GradientPolicy, gradient_policy

# Global policy
with gradient_policy(GradientPolicy.PROJECT):
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()

# Per-layer policy (advanced)
from zeroproof.autodiff.policies import register_policy
register_policy(my_rational_layer, GradientPolicy.PROJECT)

Design Notes

  • Policies are deterministic and XLA/TorchScript compatible
  • No Python-side branching on tensors
  • Projective mode pairs PROJECT with detached renormalization
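
As an illustration only (not the library's implementation), a PROJECT-style mask can be expressed with tensor ops alone, so the forward value is preserved while gradients on ⊥ entries are dropped; mask_bottom_gradients is a hypothetical helper:

import torch

def mask_bottom_gradients(x: torch.Tensor, bottom_mask: torch.Tensor) -> torch.Tensor:
    """Forward: identity. Backward: zero gradient wherever bottom_mask is True."""
    keep = (~bottom_mask).to(x.dtype)
    # The first term carries gradients only on kept entries; the detached second
    # term restores the forward value on masked entries without contributing gradients.
    return x * keep + (x * (1.0 - keep)).detach()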

Loss Functions

ZeroProofML combines multiple losses to stabilize training and preserve orientation information.

1. Implicit Loss

Cross-product form that avoids direct division:

from zeroproof.losses.implicit import implicit_loss

# For projective outputs (N, D) and targets (Y_n, Y_d)
loss_fit = implicit_loss(N, D, Y_n, Y_d)

Formula:

E = (N · Y_d - D · Y_n)²
L_fit = mean(E / (sg(D² Y_d² + N² Y_n²) + γ))
  • Scale-invariant
  • Numerically stable when D → 0
  • Default γ = 1e-9
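
A direct transcription of the formula above as a sketch (the packaged implicit_loss may differ in details such as masking or reduction):

import torch

def implicit_loss_sketch(N, D, Y_n, Y_d, gamma: float = 1e-9):
    """Cross-product fit: compares N/D to Y_n/Y_d without dividing."""
    E = (N * Y_d - D * Y_n) ** 2
    scale = (D ** 2 * Y_d ** 2 + N ** 2 * Y_n ** 2).detach() + gamma  # sg(...) + γ
    return (E / scale).mean()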

2. Margin Loss

Encourages denominators to stay away from zero:

from zeroproof.losses.margin import margin_loss

loss_margin = margin_loss(D, tau_train=1e-4)

Formula:

L_margin = mean(max(0, τ_train - |D|)²)
  • Penalizes denominators approaching τ_train
  • Can be masked to finite paths only
  • Default λ_margin = 0.1
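
A sketch of this hinge in PyTorch (illustrative only; the packaged margin_loss can additionally be masked to finite paths):

import torch

def margin_loss_sketch(D: torch.Tensor, tau_train: float = 1e-4):
    """Quadratic hinge that activates when |D| drops below tau_train."""
    return torch.clamp(tau_train - D.abs(), min=0.0).pow(2).mean()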

3. Sign Consistency Loss

Disambiguates +∞ vs -∞ using projective cosine similarity:

from zeroproof.losses.sign import sign_consistency_loss

loss_sign = sign_consistency_loss(N, D, Y_n, Y_d, tau_sing=1e-3)

Formula:

L_sign = 𝟙(|Y_d| < τ_sing) · (1 - (N·Y_n + D·Y_d) / (‖(N,D)‖ ‖(Y_n,Y_d)‖))
  • Only applied to singular targets (|Y_d| < τ_sing)
  • Aligns orientation in projective space
  • Default λ_sign = 1.0
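
A sketch of this term, averaged over the batch (the eps guard and the mean reduction are assumptions, not necessarily what sign_consistency_loss does):

import torch

def sign_consistency_sketch(N, D, Y_n, Y_d, tau_sing: float = 1e-3, eps: float = 1e-12):
    """Cosine alignment of ⟨N, D⟩ with the target tuple, applied to singular targets."""
    singular = (Y_d.abs() < tau_sing).to(N.dtype)          # 𝟙(|Y_d| < τ_sing)
    dot = N * Y_n + D * Y_d
    norms = torch.sqrt(N * N + D * D) * torch.sqrt(Y_n * Y_n + Y_d * Y_d) + eps
    return (singular * (1.0 - dot / norms)).mean()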

4. Coverage & Rejection Loss

Monitor and penalize low coverage (fraction of finite predictions):

from zeroproof.losses.coverage import rejection_loss

# Compute coverage
coverage = (bottom_mask.logical_not()).float().mean()

# Penalize if below threshold
loss_rej = rejection_loss(coverage, target_coverage=0.95)
  • Adaptive sampling can increase coverage over time
  • Early stopping when coverage stagnates
  • Default target: 95%
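
One simple shape for such a penalty is a one-sided quadratic; this is an assumption for illustration, not necessarily what rejection_loss computes:

import torch

def rejection_penalty_sketch(coverage: torch.Tensor, target_coverage: float = 0.95):
    """Penalize coverage only when it falls below the target."""
    return torch.clamp(target_coverage - coverage, min=0.0) ** 2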

Combined Objective

from zeroproof.training.loss import SCMTrainingLoss

loss_fn = SCMTrainingLoss(
    lambda_margin=0.1,
    lambda_sign=1.0,
    lambda_rejection=0.01,
    tau_train=1e-4,
    tau_sing=1e-3,
    gamma=1e-9
)

total_loss = loss_fn(outputs=(N, D), targets=(Y_n, Y_d))

Training Loop

Using SCMTrainer

The reference trainer handles target lifting, gradient policies, and coverage monitoring:

from zeroproof.training import SCMTrainer, TrainingConfig

trainer = SCMTrainer(
    model=model,
    optimizer=optimizer,
    loss_fn=loss_fn,
    train_loader=train_loader,
    val_loader=val_loader,  # optional
    config=TrainingConfig(
        max_epochs=100,
        gradient_policy=GradientPolicy.PROJECT,
        coverage_threshold=0.90,
        coverage_patience=10,
        use_amp=True,  # mixed precision
        grad_accumulation_steps=1,
        tau_train_min=1e-4,
        tau_train_max=1e-4
    )
)

history = trainer.fit()

Manual Training Loop

For custom workflows:

from zeroproof.autodiff.policies import gradient_policy, GradientPolicy
from zeroproof.training.targets import lift_targets

model.train()
for epoch in range(num_epochs):
    for batch_x, batch_y in train_loader:
        # Lift targets to projective tuples
        Y_n, Y_d = lift_targets(batch_y)

        # Forward pass
        N, D = model(batch_x)  # projective outputs

        # Compute losses
        loss_fit = implicit_loss(N, D, Y_n, Y_d)
        loss_margin = margin_loss(D, tau_train=1e-4)
        loss_sign = sign_consistency_loss(N, D, Y_n, Y_d)

        total_loss = loss_fit + 0.1*loss_margin + 1.0*loss_sign

        # Backward with gradient policy
        optimizer.zero_grad()
        with gradient_policy(GradientPolicy.PROJECT):
            total_loss.backward()
        optimizer.step()

        # Monitor coverage
        _, bottom_mask, _ = strict_inference(N, D)
        coverage = (~bottom_mask).float().mean()
        print(f"Coverage: {coverage:.3f}")

Adaptive Loss Strategies

Coverage Control

Gradually increase target coverage as training progresses:

from zeroproof.training.adaptive import AdaptiveCoverageScheduler

scheduler = AdaptiveCoverageScheduler(
    initial_coverage=0.80,
    target_coverage=0.95,
    warmup_epochs=20
)

for epoch in range(num_epochs):
    target_cov = scheduler.step(epoch)
    loss_rej = rejection_loss(current_coverage, target_coverage=target_cov)
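
Such a schedule can be as simple as linear interpolation over the warmup period; the sketch below assumes that behavior and is not necessarily how AdaptiveCoverageScheduler is implemented:

def linear_coverage_schedule(epoch: int, initial: float = 0.80,
                             target: float = 0.95, warmup_epochs: int = 20) -> float:
    """Ramp the coverage target linearly from `initial` to `target` during warmup."""
    if epoch >= warmup_epochs:
        return target
    return initial + (target - initial) * epoch / warmup_epochs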

Threshold Perturbation

Perturb thresholds per batch to avoid brittle boundaries:

from zeroproof.training.thresholds import perturbed_threshold

for batch in train_loader:
    tau = perturbed_threshold(
        tau_train_min=1e-4,
        tau_train_max=2e-4,
        mode='uniform'  # or 'log_uniform'
    )
    loss_margin = margin_loss(D, tau_train=tau)
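
For reference, a log-uniform draw between the two bounds could look like the following sketch (an illustration of the 'log_uniform' mode, not the library's exact sampler):

import math
import random

def sample_log_uniform(tau_min: float = 1e-4, tau_max: float = 2e-4) -> float:
    """Sample a threshold uniformly in log-space between tau_min and tau_max."""
    return math.exp(random.uniform(math.log(tau_min), math.log(tau_max)))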

Best Practices

  1. Start with SCM mode before adding projective complexity
  2. Monitor coverage throughout training; early stop if stagnant
  3. Use sign consistency for all singular targets (±∞)
  4. Keep τ_train_min and τ_train_max close unless you need strong perturbations
  5. Log threshold distributions to understand near-singular exposure
  6. Validate on strict inference mode with τ_infer threshold
  7. Check gap_mask in production; reject predictions in the gap if needed

Hyperparameter Defaults

Based on Physics Trinity benchmarks (see scm/paper_2601.tex):

  • γ (implicit loss stability): default 1e-9, range [1e-12, 1e-6]
  • τ_train (margin threshold): default 1e-4, range [1e-6, 1e-3]
  • τ_infer (strict inference): default 1e-6, range [1e-8, 1e-4]
  • τ_sing (sign label tolerance): default 1e-3, range [1e-4, 1e-2]
  • λ_margin: default 0.1, range [0.01, 1.0]
  • λ_sign: default 1.0, range [0.1, 10.0]
  • λ_rejection: default 0.01, range [0.001, 0.1]

Next Steps