Training Guide
This guide covers training neural networks with SCM semantics, including projective learning, gradient policies, and specialized loss functions.
Table of Contents

- Projective Learning
- Gradient Policies
- Loss Functions
- Training Loop
- Adaptive Loss Strategies
- Best Practices
- Hyperparameter Defaults
- Next Steps
Projective Learning
When to Use Projective Mode
Projective learning lifts rational subgraphs to homogeneous tuples ⟨N,D⟩, allowing training on a smooth manifold while preserving strict SCM semantics at inference.
Use projective mode when:
- Training rational heads that should avoid instantiating ⊥ during optimization
- Gradient dead zones around `Q ≈ 0` hurt convergence
- Safety-critical outputs where distinguishing `+∞` vs `−∞` matters
- You need smooth gradients through potential singularities
Skip projective mode when:
- Working with simple SCM operations
- Singularities are rare in training data
- Model architecture doesn't have rational bottlenecks
How It Works
Encoding:

```
φ(x) = ⟨x, 1⟩   for finite values
φ(⊥) = ⟨1, 0⟩   for bottom
```

Decoding:

```
φ⁻¹(N, D) = N/D   when |D| ≥ τ_infer
φ⁻¹(N, D) = ⊥     when |D| < τ_infer
```

Detached Renormalization:

To prevent overflow without altering the represented value:

```
S = sg(√(N² + D²) + γ)
(N', D') ← (N/S, D/S)
```
The stop_gradient (sg) operator ensures the optimizer learns the direction of the tuple, not its magnitude. This creates "ghost gradients" that flow smoothly even when D → 0.
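The snippet below is a minimal PyTorch sketch of this encode → renormalize → decode path. The function names are illustrative (not part of the zeroproof API), and ⊥ is represented as NaN purely for demonstration:

```python
import torch

TAU_INFER = 1e-6
GAMMA = 1e-9

def lift(x):
    """Encode finite x as ⟨x, 1⟩, ±inf as ⟨±1, 0⟩, and NaN (⊥) as ⟨1, 0⟩."""
    finite = torch.isfinite(x)
    sign = torch.where(torch.isnan(x), torch.ones_like(x), torch.sign(x))
    n = torch.where(finite, x, sign)
    d = finite.to(x.dtype)
    return n, d

def renormalize(n, d):
    """Detached renormalization: scale by a stop-gradient magnitude S."""
    s = ((n**2 + d**2).sqrt() + GAMMA).detach()
    return n / s, d / s

def decode(n, d):
    """Strict decoding: n/d where |d| ≥ τ_infer, ⊥ (NaN here) elsewhere."""
    bottom = d.abs() < TAU_INFER
    decoded = torch.where(bottom, torch.full_like(n, float("nan")), n / d)
    return decoded, bottom
```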
Integration Steps
- Lift targets to projective tuples:

  ```python
  from zeroproof.training.targets import lift_targets

  # Finite targets: y_finite → ⟨y_finite, 1⟩
  # Infinite targets: ±inf → ⟨±1, 0⟩
  targets_n, targets_d = lift_targets(y_true)
  ```

- Use the PROJECT gradient policy in projective regions:

  ```python
  from zeroproof.autodiff.policies import GradientPolicy, gradient_policy

  with gradient_policy(GradientPolicy.PROJECT):
      loss.backward()
  ```

- Combine specialized losses (see the Loss Functions section)
- Decode at boundaries and monitor coverage
Gap Region
Training uses stochastic thresholds (τ_train_min, τ_train_max) to avoid learning a brittle boundary. Inference uses a fixed τ_infer.
When τ_train > τ_infer, the interval [τ_infer, τ_train) is the gap region where inference returns a finite value but the denominator is numerically risky.
```python
from zeroproof.inference import strict_inference, InferenceConfig

decoded, bottom_mask, gap_mask = strict_inference(
    N, D,
    config=InferenceConfig(tau_infer=1e-6, tau_train=1e-4)
)
```
Monitor gap_mask.sum() to track how often predictions fall in this uncertain zone.
Gradient Policies
Gradient policies control how backpropagation interacts with ⊥. Available in zeroproof.autodiff.policies.
Policy Options
| Policy | Behavior | Use When |
|---|---|---|
| CLAMP | Zeroes gradients on ⊥ paths; clamps finite gradients to [-1, 1] | Default for SCM-only graphs |
| PROJECT | Masks gradients when forward value is ⊥ | Projective heads, points at infinity |
| REJECT | Always zero gradient | Learning through coverage/rejection losses only |
| PASSTHROUGH | Gradients propagate through ⊥ | Debugging only |
Usage
```python
from zeroproof.autodiff.policies import GradientPolicy, gradient_policy

# Global policy
with gradient_policy(GradientPolicy.PROJECT):
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()

# Per-layer policy (advanced)
from zeroproof.autodiff.policies import register_policy
register_policy(my_rational_layer, GradientPolicy.PROJECT)
```
Design Notes
- Policies are deterministic and XLA/TorchScript compatible
- No Python-side branching on tensors
- Projective mode pairs `PROJECT` with detached renormalization
Loss Functions
ZeroProofML combines multiple losses to stabilize training and preserve orientation information.
1. Implicit Loss
Cross-product form that avoids direct division:
```python
from zeroproof.losses.implicit import implicit_loss

# For projective outputs (N, D) and targets (Y_n, Y_d)
loss_fit = implicit_loss(N, D, Y_n, Y_d)
```

Formula:

```
E = (N · Y_d - D · Y_n)²
L_fit = mean(E / (sg(D² Y_d² + N² Y_n²) + γ))
```

- Scale-invariant
- Numerically stable when `D → 0`
- Default `γ = 1e-9`
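As a reference, here is a minimal PyTorch sketch of the formula above; the function name is illustrative, and in practice you would call `implicit_loss` from the library:

```python
import torch

def implicit_loss_sketch(N, D, Y_n, Y_d, gamma=1e-9):
    """Cross-product fit: (N·Y_d − D·Y_n)² divided by a detached normalizer."""
    E = (N * Y_d - D * Y_n) ** 2
    # sg(...) -> .detach(): the normalizer contributes no gradient,
    # which keeps the loss scale-invariant in ⟨N, D⟩.
    scale = (D**2 * Y_d**2 + N**2 * Y_n**2).detach() + gamma
    return (E / scale).mean()
```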
2. Margin Loss
Encourages denominators to stay away from zero:
```python
from zeroproof.losses.margin import margin_loss

loss_margin = margin_loss(D, tau_train=1e-4)
```

Formula:

```
L_margin = mean(max(0, τ_train - |D|)²)
```

- Penalizes denominators approaching `τ_train`
- Can be masked to finite paths only
- Default `λ_margin = 0.1`
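A corresponding sketch of the margin term (again illustrative, assuming a PyTorch backend):

```python
import torch

def margin_loss_sketch(D, tau_train=1e-4):
    """Quadratic hinge pushing |D| above τ_train."""
    return torch.clamp(tau_train - D.abs(), min=0.0).pow(2).mean()
```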
3. Sign Consistency Loss
Disambiguates +∞ vs -∞ using projective cosine similarity:
```python
from zeroproof.losses.sign import sign_consistency_loss

loss_sign = sign_consistency_loss(N, D, Y_n, Y_d, tau_sing=1e-3)
```

Formula:

```
L_sign = 𝟙(|Y_d| < τ_sing) · (1 - (N·Y_n + D·Y_d) / (‖(N, D)‖ ‖(Y_n, Y_d)‖))
```

- Only applied to singular targets (`|Y_d| < τ_sing`)
- Aligns orientation in projective space
- Default `λ_sign = 1.0`
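A sketch of the masked cosine term, assuming a PyTorch backend; averaging over the singular targets only is an assumption about how the reduction is done:

```python
import torch
import torch.nn.functional as F

def sign_consistency_sketch(N, D, Y_n, Y_d, tau_sing=1e-3):
    """1 − cosine similarity between (N, D) and (Y_n, Y_d) on singular targets."""
    pred = torch.stack([N, D], dim=-1)
    target = torch.stack([Y_n, Y_d], dim=-1)
    cos = F.cosine_similarity(pred, target, dim=-1)
    singular = (Y_d.abs() < tau_sing).to(cos.dtype)
    # Average the penalty over singular targets only (zero if the batch has none).
    return (singular * (1.0 - cos)).sum() / singular.sum().clamp_min(1.0)
```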
4. Coverage & Rejection Loss
Monitor and penalize low coverage (fraction of finite predictions):
```python
from zeroproof.losses.coverage import rejection_loss

# Compute coverage (fraction of finite predictions)
coverage = bottom_mask.logical_not().float().mean()

# Penalize if below threshold
loss_rej = rejection_loss(coverage, target_coverage=0.95)
```
- Adaptive sampling can increase coverage over time
- Early stopping when coverage stagnates
- Default target: 95%
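For intuition only, one plausible form of the penalty is a squared hinge on the coverage shortfall. This is an assumption for illustration and not necessarily how `rejection_loss` is implemented:

```python
import torch

def rejection_loss_sketch(coverage, target_coverage=0.95):
    """Assumed form: penalize only when coverage falls below the target."""
    shortfall = torch.clamp(target_coverage - coverage, min=0.0)
    return shortfall ** 2
```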
Combined Objective
```python
from zeroproof.training.loss import SCMTrainingLoss

loss_fn = SCMTrainingLoss(
    lambda_margin=0.1,
    lambda_sign=1.0,
    lambda_rejection=0.01,
    tau_train=1e-4,
    tau_sing=1e-3,
    gamma=1e-9,
)

total_loss = loss_fn(outputs=(N, D), targets=(Y_n, Y_d))
```
Training Loop
Using SCMTrainer
The reference trainer handles target lifting, gradient policies, and coverage monitoring:
```python
from zeroproof.training import SCMTrainer, TrainingConfig

trainer = SCMTrainer(
    model=model,
    optimizer=optimizer,
    loss_fn=loss_fn,
    train_loader=train_loader,
    val_loader=val_loader,  # optional
    config=TrainingConfig(
        max_epochs=100,
        gradient_policy=GradientPolicy.PROJECT,
        coverage_threshold=0.90,
        coverage_patience=10,
        use_amp=True,  # mixed precision
        grad_accumulation_steps=1,
        tau_train_min=1e-4,
        tau_train_max=1e-4,
    ),
)

history = trainer.fit()
```
Manual Training Loop
For custom workflows:
```python
from zeroproof.autodiff.policies import gradient_policy, GradientPolicy
from zeroproof.inference import strict_inference
from zeroproof.losses.implicit import implicit_loss
from zeroproof.losses.margin import margin_loss
from zeroproof.losses.sign import sign_consistency_loss
from zeroproof.training.targets import lift_targets

model.train()
for epoch in range(num_epochs):
    for batch_x, batch_y in train_loader:
        # Lift targets to projective tuples
        Y_n, Y_d = lift_targets(batch_y)

        # Forward pass (projective outputs)
        N, D = model(batch_x)

        # Compute losses
        loss_fit = implicit_loss(N, D, Y_n, Y_d)
        loss_margin = margin_loss(D, tau_train=1e-4)
        loss_sign = sign_consistency_loss(N, D, Y_n, Y_d)
        total_loss = loss_fit + 0.1 * loss_margin + 1.0 * loss_sign

        # Backward with gradient policy
        optimizer.zero_grad()
        with gradient_policy(GradientPolicy.PROJECT):
            total_loss.backward()
        optimizer.step()

    # Monitor coverage on the last batch of the epoch
    _, bottom_mask, _ = strict_inference(N, D)
    coverage = (~bottom_mask).float().mean()
    print(f"Coverage: {coverage:.3f}")
```
Adaptive Loss Strategies
Coverage Control
Gradually increase target coverage as training progresses:
```python
from zeroproof.training.adaptive import AdaptiveCoverageScheduler

scheduler = AdaptiveCoverageScheduler(
    initial_coverage=0.80,
    target_coverage=0.95,
    warmup_epochs=20,
)

for epoch in range(num_epochs):
    target_cov = scheduler.step(epoch)
    loss_rej = rejection_loss(current_coverage, target_coverage=target_cov)
```
Threshold Perturbation
Perturb thresholds per batch to avoid brittle boundaries:
```python
from zeroproof.training.thresholds import perturbed_threshold

for batch in train_loader:
    tau = perturbed_threshold(
        tau_train_min=1e-4,
        tau_train_max=2e-4,
        mode='uniform',  # or 'log_uniform'
    )
    loss_margin = margin_loss(D, tau_train=tau)
```
Best Practices
- Start with SCM mode before adding projective complexity
- Monitor coverage throughout training; early stop if stagnant
- Use sign consistency for all singular targets (±∞)
- Keep τ_train_min and τ_train_max close unless you need strong perturbations
- Log threshold distributions to understand near-singular exposure
- Validate in strict inference mode with the `τ_infer` threshold
- Check `gap_mask` in production; reject predictions in the gap if needed (see the sketch below)
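A short sketch of that production check, built on `strict_inference` as shown earlier; the abstain wrapper itself is illustrative:

```python
from zeroproof.inference import strict_inference, InferenceConfig

def predict_or_abstain(model, x, tau_infer=1e-6, tau_train=1e-4):
    """Return decoded predictions plus a mask of inputs to reject."""
    N, D = model(x)
    decoded, bottom_mask, gap_mask = strict_inference(
        N, D, config=InferenceConfig(tau_infer=tau_infer, tau_train=tau_train)
    )
    # Reject anything that decoded to ⊥ or fell into the gap region.
    reject = bottom_mask | gap_mask
    return decoded, reject
```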
Hyperparameter Defaults
Based on Physics Trinity benchmarks (see scm/paper_2601.tex):
| Parameter | Default | Range |
|---|---|---|
| γ (implicit loss stability) | 1e-9 | [1e-12, 1e-6] |
| τ_train (margin threshold) | 1e-4 | [1e-6, 1e-3] |
| τ_infer (strict inference) | 1e-6 | [1e-8, 1e-4] |
| τ_sing (sign label tolerance) | 1e-3 | [1e-4, 1e-2] |
| λ_margin | 0.1 | [0.01, 1.0] |
| λ_sign | 1.0 | [0.1, 10.0] |
| λ_rejection | 0.01 | [0.001, 0.1] |
Next Steps
- Inference & Deployment - Deploy with strict SCM semantics
- Development Guide - Debug logging and verification
- API Reference - Full API documentation