
Topic 4: Layers & Variants (TR‑Rational, TR‑Norm, Enhanced)

This topic covers ZeroProof’s core layers, how they differ, and when to use each. For API details see docs/layers.md.

TR‑Rational (P/Q)

  • Purpose: Learn rational functions with explicit poles; total under TR semantics.
  • Form: y = P_θ(x) / Q_φ(x), with the leading coefficient of Q fixed to 1 for identifiability.
  • Tags: REAL when Q≠0; ±∞ when Q=0 and P≠0 (sign taken from P); Φ when P=Q=0.
  • Gradients: Mask‑REAL (default); gradients are zeroed whenever the forward tag is non‑REAL.
  • Stability aids: L2 regularization on φ; optional L1 projection bounding ||φ||₁.
  • Code: zeroproof/layers/tr_rational.py:1.

Usage

from zeroproof.layers import TRRational, ChebyshevBasis
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

layer = TRRational(d_p=3, d_q=2, basis=ChebyshevBasis())
y, tag = layer.forward(TRNode.constant(real(0.2)))
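
Checking the returned tag makes the total semantics concrete. A minimal sketch reusing the layer above; the import path for the TRTag enum is an assumption here (adjust to your build):

from zeroproof.core import TRTag  # assumed import path for the tag enum

# At a non-pole input the tag is REAL; near a learned pole you may see
# PINF/NINF (Q=0, P≠0) or PHI (P=Q=0).
y, tag = layer.forward(TRNode.constant(real(0.2)))
if tag == TRTag.REAL:
    print("finite output")
else:
    print("non-REAL tag:", tag)  # Mask-REAL zeroes gradients on this path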

Choosing a basis

  • Monomial: simple, good for low degree.
  • Chebyshev: stable on bounded intervals; recommended default.
  • Fourier: for periodic signals (if available in your build).
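
Swapping bases is just a constructor argument; a minimal sketch using the bases named above (use a Fourier basis only if your build exports one):

from zeroproof.layers import TRRational, MonomialBasis, ChebyshevBasis

simple = TRRational(d_p=2, d_q=1, basis=MonomialBasis())   # low-degree fits
stable = TRRational(d_p=3, d_q=2, basis=ChebyshevBasis())  # recommended default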

TR‑Norm (Epsilon‑Free Normalization)

  • Purpose: Batch/layer normalization with ε→0⁺ semantics; deterministically handles σ²=0.
  • Behavior: If σ²>0 → classical normalization; if σ²=0 → bypass to β.
  • Stats: μ and σ² are computed with DROP_NULL reduction over the REAL‑only subset.
  • Gradients: Regular branch = classical; bypass branch = ∂ŷ/∂x=0, ∂ŷ/∂β=1, ∂ŷ/∂γ=0.
  • Code: zeroproof/layers/tr_norm.py:1.
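
A minimal usage sketch. TRNorm is exported from zeroproof.layers like the other layers here, but the constructor argument and input layout shown are assumptions; check zeroproof/layers/tr_norm.py for the exact signature:

from zeroproof.layers import TRNorm
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

# Assumed constructor and input layout: a batch of rows, one TRNode per feature.
norm = TRNorm(num_features=2)
batch = [[TRNode.constant(real(v)) for v in row]
         for row in [[1.0, 2.0], [3.0, 2.0]]]
out = norm.forward(batch)  # the second feature is constant (sigma^2 = 0),
                           # so it deterministically bypasses to beta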

Enhanced & Variant Layers

When you need more control or explicit pole learning:

  • SaturatingTRRational: The same P/Q layer with the Saturating gradient mode baked in.
    • Code: zeroproof/layers/saturating_rational.py:1.
  • HybridTRRational: Integrates the Hybrid gradient schedule (Mask‑REAL far from poles, Saturating near poles) and optional Q tracking; a construction sketch follows this list.
    • Code: zeroproof/layers/hybrid_rational.py:1.
  • HybridRationalWithPoleHead: Adds an auxiliary pole‑detection head to localize Q≈0.
    • Code: zeroproof/layers/hybrid_rational.py:180.
  • EnhancedTRRational / EnhancedTRRationalMulti: Integrate pole detection and regularization, with multi‑output options.
    • Code: zeroproof/layers/enhanced_rational.py:1.
  • TagAwareRational / TagAwareMultiRational: Add tag‑aware losses and outputs for training that supervises the tag distribution.
    • Code: zeroproof/layers/tag_aware_rational.py:1.
  • PoleAwareRational / FullyIntegratedRational: End‑to‑end stacks with pole metrics and regularizers.
    • Code: zeroproof/layers/pole_aware_rational.py:1.
  • EnhancedPoleDetectionHead and regularizer components, for custom assemblies.
    • Code: zeroproof/layers/enhanced_pole_detection.py:1.
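
The construction sketch referenced above. The constructor is assumed to mirror TRRational's (d_p, d_q, basis), with the hybrid schedule and Q tracking left at their defaults; see hybrid_rational.py for the actual parameters:

from zeroproof.layers import HybridTRRational, ChebyshevBasis
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

# Assumed constructor shape; the layer switches Mask-REAL <-> Saturating
# according to the hybrid schedule, not per call.
layer = HybridTRRational(d_p=3, d_q=2, basis=ChebyshevBasis())
y, tag = layer.forward(TRNode.constant(real(0.9)))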

Multi‑Input / Multi‑Output Rationals

Some tasks (e.g., robotics IK) take vector inputs and produce multi‑dimensional outputs while sharing pole structure.

  • TRRationalMulti: Multiple outputs with a shared denominator Q (optional) and separate numerators.
    • Use when outputs share the same pole locations (shared_Q=True).
    • Code: zeroproof/layers/tr_rational.py:408 (class TRRationalMulti).
  • TRMultiInputRational: Small TR‑MLP front end (R^D→K features) feeding TR‑Rational heads for multi‑output.
    • Forward (vector): forward([TRNode|float]*input_dim) -> List[(TRNode, TRTag)] (len = n_outputs).
    • Structured forward: forward_fully_integrated(...) -> {outputs, tags, Q_abs_list?, pole_score?}.
    • Supports enabling a simple pole head; provides regularization_loss() across heads.
    • Code: zeroproof/layers/multi_input_rational.py:1.

Usage (4D→2D)

from zeroproof.layers import TRMultiInputRational, MonomialBasis

model = TRMultiInputRational(
    input_dim=4,
    n_outputs=2,
    d_p=3,
    d_q=2,
    basis=MonomialBasis(),
    hidden_dims=[8],
    shared_Q=True,
    enable_pole_head=False,
)

# Vector forward (floats or TRNodes)
outs = model.forward([0.1, -0.2, 0.05, -0.05])  # [(TRNode, TRTag), (TRNode, TRTag)]
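
The structured forward and the per-head regularizer listed above can be combined. Keys flagged with "?" (Q_abs_list, pole_score) appear only when the corresponding features are enabled; the assumption here is that forward_fully_integrated takes the same vector input as forward:

# Structured forward: a dict with outputs/tags plus optional diagnostics.
result = model.forward_fully_integrated([0.1, -0.2, 0.05, -0.05])
outputs, tags = result["outputs"], result["tags"]

# Aggregate regularization across heads (add this to your training loss).
reg = model.regularization_loss()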

When to use which

  • Prefer TRRationalMulti when outputs share Q and the input is scalar or already projected.
  • Prefer TRMultiInputRational when inputs are vectors and you want a lightweight front end with shared Q across heads.
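
For the first case, a construction sketch; apart from shared_Q, the argument names are assumptions patterned on TRRational (see tr_rational.py:408 for the actual signature):

from zeroproof.layers import TRRationalMulti, ChebyshevBasis
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

# Two numerators over one shared denominator Q: both outputs share pole locations.
multi = TRRationalMulti(n_outputs=2, d_p=3, d_q=2,
                        basis=ChebyshevBasis(), shared_Q=True)
outs = multi.forward(TRNode.constant(real(0.2)))  # assumed: list of (TRNode, TRTag)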

Practical Patterns

  • Start simple: TRRational + Chebyshev basis; monitor tag distribution and q_min.
  • For pole learning: switch to HybridTRRational with a gentle schedule; enable Q tracking to tune δ.
  • Add a pole head when you have labels/weak‑labels for singularities; combine with coverage control in training.
  • For normalization without ε: use TRNorm or TRLayerNorm to avoid tuning eps.
  • Benchmark training: the Hybrid trainer prints and records per‑epoch timings (avg_step_ms, data_time_ms, optim_time_ms, batches), returned in training summaries under bench_history; adjust logging cadence via log_interval (CLI --log_every). A sketch of reading these fields follows this list.
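
The summary dict below is illustrative (its shape is inferred from the field names above), not real trainer output:

# Illustrative shape of a training summary; values are placeholders.
summary = {"bench_history": [
    {"avg_step_ms": 12.3, "data_time_ms": 1.1, "optim_time_ms": 4.2, "batches": 50},
]}
for epoch in summary["bench_history"]:
    print(epoch["avg_step_ms"], epoch["data_time_ms"],
          epoch["optim_time_ms"], epoch["batches"])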

Interactions with Autodiff

  • All layers use lifted TR ops (tr_ops_grad) and integrate with TRNode.
  • Autodiff modes apply as configured globally (Mask‑REAL default, Saturating, or Hybrid via schedules).
  • Code references: zeroproof/autodiff/tr_ops_grad.py:1, zeroproof/autodiff/backward.py:1.
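
A gradient sketch; the backward() call on the output node is an assumption (see zeroproof/autodiff/backward.py for the real entry point). Under Mask-REAL, a non-REAL forward tag zeroes the pulled-back gradients:

from zeroproof.layers import TRRational, ChebyshevBasis
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

layer = TRRational(d_p=3, d_q=2, basis=ChebyshevBasis())
x = TRNode.constant(real(0.2))
y, tag = layer.forward(x)
y.backward()  # assumed API; gradients propagate through the lifted TR ops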

Diagnostics to Track

  • q_min (batch/epoch): from Hybrid context or via layer Q tracking.
  • Tag distribution: counts of REAL, PINF, NINF, PHI during training.
  • Near‑pole ratio: fraction of samples triggering Saturating under Hybrid.
  • Pole localization metrics: use utils/pole_metrics.py if applicable.
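
A tag-distribution counter over forward results; a minimal sketch using only the APIs shown in the Usage sections above:

from collections import Counter
from zeroproof.layers import TRRational, ChebyshevBasis
from zeroproof.autodiff.tr_node import TRNode
from zeroproof.core import real

layer = TRRational(d_p=3, d_q=2, basis=ChebyshevBasis())
counts = Counter()
for v in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    _, tag = layer.forward(TRNode.constant(real(v)))
    counts[tag] += 1
print(counts)  # expect mostly REAL away from learned poles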

See Also

  • Doc: docs/layers.md:1 for extended explanation and examples.
  • Concepts: docs/topics/02_foundations.md:1 for arithmetic rules.
  • Autodiff: docs/topics/03_autodiff_modes.md:1 for mode selection.