Topic 6: Sampling & Curriculum (Near‑Pole Focus)
Strategies to expose the model to informative near‑pole data without destabilizing training.
Why Sampling Matters
- Poles are defined by Q(x)=0; informative gradients live near pole neighborhoods.
- Uniform sampling under‑represents near‑poles; importance/active sampling boosts coverage of hard regions.
Importance Sampling (1/|Q|^p)
- Weight samples by 1/(|Q(x)|^power), with clipping for stability.
- Temperature and max_weight guard distribution sharpness.
- Hybrid batches: mix importance and uniform subsets.
- Code:
zeroproof/training/sampling_diagnostics.py:1(ImportanceSampler, ImportanceSamplerConfig).
Usage
from zeroproof.training.sampling_diagnostics import ImportanceSampler, ImportanceSamplerConfig
sampler = ImportanceSampler(ImportanceSamplerConfig(weight_power=2.0, max_weight=100.0))
# Given pools x_pool, q_pool (tensors)
batch_x, idx = sampler.sample_batch(x_pool, q_pool, batch_size=128)
Active Sampling (Grid Refinement)
- Maintain a grid over the input domain; refine where |Q| is small.
- Controls: refinement_threshold, max_refinement_level, bounds, memory caps.
- Tracks refinement history and q statistics.
- Code:
zeroproof/training/sampling_diagnostics.py:200(ActiveSampler, ActiveSamplerConfig).
Hybrid Strategies
- Combine uniform + importance; or alternate active refinement epochs.
- Balance exploration (coverage of space) and exploitation (near‑poles).
Curriculum Learning
- Stage difficulty: far → mid → near pole regions based on |Q| bands.
- Works well with Hybrid gradient schedules to avoid early instability.
- Reference concepts:
concept_250908.md: Part VI.
Diagnostics
- Monitor q_min (batch/epoch), mean |Q|, near‑pole ratio, sample diversity.
- DiagnosticMonitor exports histories and JSON snapshots.
- Code:
zeroproof/training/sampling_diagnostics.py:480(DiagnosticMonitor).
Practical Tips
- Clip |Q| with min_q_abs in weights to avoid extreme peaking.
- Keep an importance_batch_ratio < 1 to retain coverage of easy regions.
- Use adaptive_delta in Hybrid schedule to align sampler focus and gradient mode.
- Persist sampler state for reproducibility when needed.
See Also
- Autodiff Hybrid:
docs/topics/03_autodiff_modes.md:1 - Training Policies:
docs/topics/05_training_policies.md:1 - Conceptual background:
concept_250908.md:1(Sampling strategies)