Why Your Neural Network Can't Learn 1/x (And What We Did About It)

Try this experiment: train a standard neural network to approximate f(x) = 1/x on the interval [-2, 2]. Use whatever architecture you like—dense layers, ReLU activations, plenty of capacity. Train until convergence. Now plot the results near x = 0.
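If you'd rather not set this up from scratch, here is a minimal sketch of the experiment in PyTorch. The architecture and hyperparameters are illustrative choices, not a prescription; any similar setup shows the same effect.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sample 1/x on [-2, 2], masking out a small window around x = 0
# so the training targets stay finite.
x = torch.linspace(-2.0, 2.0, 2001)
x = x[x.abs() > 1e-2].unsqueeze(1)
y = 1.0 / x

net = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

# Probe near the pole: instead of diverging, the network flattens
# into a plateau between the two branches.
probe = torch.tensor([[-0.1], [-0.02], [0.02], [0.1]])
print(net(probe).detach().squeeze().tolist())
```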
You'll see something frustrating: while your network captures the behavior perfectly at x = ±2, ±1, ±0.5, it creates a smooth, incorrect plateau exactly where the function should shoot to infinity. The network has learned to give up. This isn't a matter of training longer or adding more parameters: a network built from continuous activations is a finite composition of continuous maps, hence itself continuous and bounded on any compact interval, so it fundamentally cannot represent a pole. It will always smooth the singularity away, creating what we call a "dead zone" where gradients flatline and learning stops.
This isn't just a mathematical curiosity. In robotics inverse kinematics, these dead zones appear exactly where you need accuracy most: when a robot arm approaches full extension and the Jacobian matrix loses rank. In our experiments on 2-revolute and 3-revolute manipulators, standard networks would predict joint angles confidently but incorrectly in these critical configurations, with errors 30-47% higher than necessary in regions where |det(J)| < 10⁻⁴. The robot "thinks" it knows what to do, but the mathematics has broken down.
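To see where the mathematics breaks down, consider the textbook Jacobian of a planar 2R arm (this is standard kinematics, not code from our benchmark). Its determinant reduces to l₁l₂·sin(θ₂), which vanishes exactly at full extension, θ₂ = 0:

```python
import numpy as np

def jacobian_2r(theta1, theta2, l1=1.0, l2=1.0):
    """Planar 2R manipulator Jacobian (end-effector velocity vs. joint rates)."""
    s1, c1 = np.sin(theta1), np.cos(theta1)
    s12, c12 = np.sin(theta1 + theta2), np.cos(theta1 + theta2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [ l1 * c1 + l2 * c12,  l2 * c12],
    ])

# det(J) = l1 * l2 * sin(theta2): rank is lost at full extension (theta2 = 0).
for theta2 in [1.0, 0.1, 0.01, 0.0]:
    print(f"theta2={theta2:5.2f}  det(J)={np.linalg.det(jacobian_2r(0.3, theta2)):.6f}")
```

Any model that implicitly inverts this map has to cope with 1/det(J) blowing up along that entire slice of configuration space.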
Our solution combines two ideas. First, we use rational layers: networks that compute P(x)/Q(x), where both numerator and denominator are learned polynomials. This gives the architecture the capacity to represent poles: when Q(x) → 0, the output can legitimately diverge. Second, we extend the arithmetic system itself using transreal numbers, where division by zero produces tagged results (±∞ for signed infinities, Φ for indeterminate forms like 0/0) rather than NaN errors. During training, we track these tags and use hybrid gradient policies: exact gradients away from singularities, bounded surrogates near them. This keeps optimization stable while preserving the mathematical structure.
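Here is a minimal sketch of how the two pieces fit together, assuming PyTorch. To be clear about what is and isn't from our system: RationalLayer, HybridDivide, and the threshold TAU are illustrative names and values, and the forward pass leans on IEEE ±inf/NaN where a full transreal implementation would carry explicit ±∞/Φ tags. Only the hybrid gradient policy is shown in the shape described above.

```python
import torch
import torch.nn as nn

TAU = 1e-3  # illustrative threshold for "near a singularity"

class HybridDivide(torch.autograd.Function):
    """Division with exact gradients away from Q ~ 0, bounded surrogates near it."""

    @staticmethod
    def forward(ctx, p, q):
        ctx.save_for_backward(p, q)
        # True quotient in the forward pass. IEEE floats give +/-inf for c/0
        # and NaN for 0/0; a transreal implementation would tag these as
        # signed infinities and Phi instead of relying on NaN.
        return p / q

    @staticmethod
    def backward(ctx, grad_out):
        p, q = ctx.saved_tensors
        # Clamp |Q| away from zero for the backward pass only. Where
        # |Q| >= TAU this is the exact quotient rule; where |Q| < TAU it
        # becomes a bounded surrogate, keeping gradient norms finite.
        sign = torch.where(q >= 0, torch.ones_like(q), -torch.ones_like(q))
        q_safe = sign * q.abs().clamp(min=TAU)
        grad_p = grad_out / q_safe
        grad_q = -grad_out * p / q_safe.pow(2)
        return grad_p, grad_q

class RationalLayer(nn.Module):
    """Computes P(x)/Q(x) with learned polynomial coefficients."""

    def __init__(self, p_degree=3, q_degree=2):
        super().__init__()
        self.p_coef = nn.Parameter(0.1 * torch.randn(p_degree + 1))
        q_init = torch.zeros(q_degree + 1)
        q_init[0] = 1.0  # start with Q ~ 1 so early training is well-conditioned
        self.q_coef = nn.Parameter(q_init + 0.01 * torch.randn(q_degree + 1))

    @staticmethod
    def _poly(coef, x):
        # Horner evaluation; coef[k] is the coefficient of x**k.
        out = torch.zeros_like(x)
        for c in reversed(coef):
            out = out * x + c
        return out

    def forward(self, x):
        return HybridDivide.apply(self._poly(self.p_coef, x),
                                  self._poly(self.q_coef, x))
```

The design choice worth noting is that the clamp lives only in backward: the forward pass stays mathematically honest about the pole, while the optimizer never sees an unbounded gradient.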
The results on inverse kinematics were substantial: mean squared error dropped from 0.0032 to 0.0022 in the most critical near-singularity bucket, with training 12× faster than ensemble approaches that try multiple ε-regularization values. More importantly, the system behaves deterministically: given the same inputs and random seed, it produces identical outputs across runs, which is critical for safety certification in robotics. The framework maintains bounded gradient norms even at singularities, preventing the training instabilities that typically occur when neural networks encounter mathematical edge cases.
But here's what we don't know yet: where else does this matter? Inverse kinematics has a clear rational structure (Jacobian inverses are ratios of determinants), making it a natural fit. We suspect applications exist in computational physics (learning solutions with known singularities), quantitative finance (correlation matrices losing rank during crises), and power systems (voltage collapse analysis). However, these remain untested hypotheses. Not every singularity is a simple pole, and not every pole-learning problem benefits from this approach over domain-specific methods. If you work with functions that blow up, go to zero, or become undefined in ways that matter to your application, we'd genuinely like to know whether this helps—or why it doesn't. The code is open source, and negative results are just as valuable as positive ones.