How It Works

Prefer a deep dive? Download the full article as a PDF or read it on Zenodo.

How ZeroProofML Works

The 3-minute story

Some machine learning problems have mathematical singularities: points where the function being learned genuinely goes to infinity or becomes undefined. Learning inverse kinematics for robot arms is one example: when the arm approaches full extension, the Jacobian matrix loses rank and calculating joint angles requires dividing by numbers approaching zero. Traditional approaches either smooth away the singularity (losing the sharp behavior you need to learn) or add small ε constants to denominators (introducing position-dependent bias). If your problem involves learning rational functions P(x)/Q(x) where Q can approach zero, or other mathematical structures with poles, ZeroProofML provides an alternative.
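
To make the ε bias concrete, here is a small standalone illustration (plain NumPy, not part of ZeroProofML) for a planar 2R arm, whose Jacobian determinant det(J) = l1·l2·sin(θ2) vanishes at full extension. The constants below are arbitrary; the point is that the ε-regularized inverse 1/(det(J)+ε) departs further from the true 1/det(J) exactly where precision matters most.

import numpy as np

# Planar 2R arm: det(J) = l1 * l2 * sin(theta2), which vanishes as the
# elbow straightens (theta2 -> 0). All constants here are illustrative.
l1, l2 = 1.0, 1.0
eps = 1e-3  # a typical epsilon-regularization constant

for theta2 in [0.5, 0.1, 0.01, 0.001]:
    det_j = l1 * l2 * np.sin(theta2)
    exact = 1.0 / det_j                 # the quantity differential IK needs
    regularized = 1.0 / (det_j + eps)   # epsilon-regularized version
    rel_bias = (exact - regularized) / exact
    print(f"theta2={theta2:6.3f}  det(J)={det_j:.4f}  "
          f"1/det={exact:10.1f}  1/(det+eps)={regularized:10.1f}  "
          f"bias={rel_bias:.1%}")

At θ2 = 0.001 the regularized value is roughly half the true one: the bias is position-dependent and largest near the singularity, which is exactly the problem described above.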

The approach: extend arithmetic rather than avoid singularities

Standard neural networks operate over real numbers ℝ where division by zero is undefined—it produces NaN, gradients break, and training becomes unstable. We extend this to transreal arithmetic T = ℝ ∪ {+∞, −∞, Φ}, where every operation returns a tagged value:

  • REAL — ordinary finite numbers (the vast majority of computations)
  • ±∞ — signed infinities when denominators approach zero (1/0 = +∞, −1/0 = −∞)
  • Φ — nullity for indeterminate forms like 0/0

This isn't just error handling—it's a complete arithmetic system with well-defined rules. When Q(x) → 0 in a rational layer, the framework computes the appropriate infinite value with correct sign rather than crashing or returning an arbitrary large number. Tags propagate through the computation graph, and gradients are handled differently based on these tags: exact derivatives in REAL regions, bounded surrogates near singularities.
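
To illustrate the tagged arithmetic, here is an expository sketch. The tag names and the (value, tag) tuple are assumptions made for this example, not ZeroProofML's internal representation.

# Expository sketch of transreal division with value tags.
# Tag names and the (value, tag) tuple are illustrative assumptions,
# not the library's actual data structures.
REAL, PINF, NINF, PHI = "REAL", "PINF", "NINF", "PHI"

def tr_div(p, q):
    """Return (value, tag) for p / q under transreal rules."""
    if q != 0.0:
        return p / q, REAL           # ordinary finite result
    if p > 0.0:
        return float("inf"), PINF    # 1/0 = +inf
    if p < 0.0:
        return float("-inf"), NINF   # -1/0 = -inf
    return float("nan"), PHI         # 0/0 = nullity (indeterminate)

print(tr_div(1.0, 2.0))   # (0.5, 'REAL')
print(tr_div(1.0, 0.0))   # (inf, 'PINF')
print(tr_div(-1.0, 0.0))  # (-inf, 'NINF')
print(tr_div(0.0, 0.0))   # (nan, 'PHI')

Every case returns a well-defined, tagged result, so downstream code (including the backward pass) can branch on the tag instead of encountering a NaN it cannot interpret.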

Why this helps (for problems with the right structure)

Neural networks built from standard activations (ReLU, tanh, sigmoid) compute continuous, finite-valued functions, so they fundamentally cannot represent true poles: they learn approximations that systematically underestimate gradients near singularities, creating "dead zones" where learning plateaus. Rational layers P(x)/Q(x) have the architectural capacity to represent poles, but a naive implementation leads to gradient explosion when Q → 0.

ZeroProofML solves this through tag-aware automatic differentiation. When the forward pass tags a value as ±∞ or Φ, the backward pass applies different gradient rules: either masking gradients through non-REAL paths (preventing exploding updates) or using bounded saturating surrogates (maintaining descent direction while capping magnitude). This keeps optimization stable while preserving the learned pole structure. A hybrid switching policy with hysteresis determines when to use exact gradients (far from singularities) versus bounded approximations (near them), ensuring only a finite number of mode transitions during training.
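
As a rough sketch of what such a gradient policy can look like (illustrative only; the threshold, cap, and exact rules ZeroProofML uses will differ), consider the partial derivative of y = P/Q with respect to Q, which is -P/Q² and explodes as Q → 0.

# Illustrative sketch of a tag-aware gradient rule, not the library's code.
# For y = P / Q, dy/dQ = -P / Q**2, which blows up as Q -> 0.
def grad_wrt_q(p, q, tag, q_switch=1e-2, grad_cap=1e4):
    if tag != "REAL":
        return 0.0                      # mask: no update through inf / nullity
    exact = -p / q**2
    if abs(q) >= q_switch:
        return exact                    # far from the pole: exact derivative
    # Near the pole: bounded saturating surrogate, same sign, capped magnitude.
    return max(-grad_cap, min(grad_cap, exact))

The hysteresis mentioned above (typically meaning separate enter and exit thresholds rather than the single q_switch used here) keeps the optimizer from flipping between exact and surrogate gradients on successive steps.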

Validated results and limitations

On robot inverse kinematics (2R, 3R, 6R manipulators):

  • 30–47% error reduction specifically in near-singularity regions where |det(J)| < 10⁻⁴
  • Overall performance similar to baselines elsewhere in the workspace (this is expected: the improvement is localized)
  • 12× faster training than a 5-member ε-ensemble
  • Deterministic, bit-reproducible results across runs given fixed seeds

These results matter when your application operates in those critical regions. For a robot doing assembly at maximum reach, a 30–47% error reduction in precisely those configurations is substantial. For a robot working in the middle of its workspace, ZeroProofML offers no advantage: you're adding complexity for no benefit.

When this matters / doesn't matter

Good candidates:

  • Learning rational functions or their approximations
  • Inverse problems with mathematical singularities (kinematics, some PDEs)
  • Physics models with known pole structure
  • Cases where ε-regularization introduces bias you can measure
  • Applications requiring deterministic behavior for certification

Poor candidates:

  • Classification, computer vision, natural language processing
  • Any problem without division operations or rational structure
  • Smooth optimization where singularities don't appear
  • Cases where standard methods already work fine

Most neural networks don't encounter division-by-zero issues. Transformers, CNNs, ResNets—they work without this. ZeroProofML addresses a specific class of problems. If you're not hitting numerical instabilities near mathematical singularities, you probably don't need it.

What this means practically

If your research or application involves learning functions with poles, you'd replace standard layers with rational layers:

from zeroproofml import TRRationalLayer

# Instead of: y = some_mlp(x)
# Use a rational layer with learned P and Q polynomials
layer = TRRationalLayer(degree_p=3, degree_q=2)
y, tag = layer(x)

# tag tells you: REAL, PINF, NINF, or Φ
# Gradients are handled automatically based on the tag

The framework integrates with PyTorch's autograd—you don't rewrite your training loop. But you do need to think about: Where are my singularities? Do I have sufficient training coverage near them? Is my problem actually rational in structure?
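
For orientation, here is a minimal training-loop sketch. It assumes the TRRationalLayer interface shown above, that the layer behaves as a standard torch.nn.Module, and placeholder tensor shapes; the real API, constructor arguments, and recommended loss handling may differ.

import torch
from zeroproofml import TRRationalLayer  # interface as shown above

# Placeholder data for a small regression problem (shapes are arbitrary).
x = torch.randn(256, 2)        # e.g., end-effector coordinates
y_true = torch.randn(256, 2)   # e.g., target joint angles

layer = TRRationalLayer(degree_p=3, degree_q=2)            # assumed nn.Module
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)

for step in range(1000):
    y_pred, tag = layer(x)                 # forward pass returns values + tags
    loss = torch.nn.functional.mse_loss(y_pred, y_true)
    optimizer.zero_grad()
    loss.backward()                        # tag-aware gradient rules apply here
    optimizer.step()

The structure is an ordinary PyTorch loop; what changes is the layer itself and the fact that its backward pass follows the tag-aware rules described earlier rather than raw 1/Q gradients.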

Open questions

We've validated one domain thoroughly (inverse kinematics) and hypothesize this applies to: physics-informed neural networks learning singular solutions, quantitative finance problems with correlation matrix singularities, power systems near voltage collapse, and other areas with rational mathematical structure. But these remain untested. We're actively looking for collaborators with domain expertise and real problems to determine where else this helps—and importantly, where it doesn't. Mapping the boundaries of applicability is as valuable as success stories.

The code is open source. If you're working with mathematical singularities and current approaches (ε-regularization, smoothing, ensembles) aren't satisfactory, try it on your problem and let us know what you find. Honest negative results advance understanding too.