Optimizer & LR Scheduler Explorer

Compare how SGD, Momentum, RMSProp, and Adam navigate loss surfaces. Switch between 2D view (1 weight) and 3D view (2 weights) to see how difficulty scales with dimensionality.

1 weight, 1D search. The optimizer moves a single value w along the curve. Every update is w ← w − lr(t) × ∂L/∂w, where lr(t) is the scheduled learning rate at step t. Clean and easy to follow step by step.
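That 1D update loop can be sketched in a few lines. The double-well loss below is an illustrative stand-in for the demo's curve (it has two basins, at w = −2 and w = +2), and the lr and step count mirror the demo's defaults, not its internals:

```python
def grad(w):
    # Gradient of the illustrative loss L(w) = w**4/4 - 2*w**2,
    # a double-well with minima at w = -2 and w = +2.
    return w**3 - 4*w

def descend(w, lr=0.02, steps=200):
    # The update the demo animates: w <- w - lr * dL/dw.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w
```

Starting from w = 5.00 (the demo's default start position), `descend(5.0)` slides down into the nearest basin at w = 2.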
2 weights, 2D search. Now the optimizer must navigate a full surface. Gradients interact across both dimensions — ridges, saddle points, and curved valleys all appear. Naive SGD can zigzag badly while Adam cuts more directly. Real networks have millions of dimensions; this is just two.
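The zigzag-versus-direct contrast shows up even on a simple elongated valley. This sketch compares plain SGD with Adam on an anisotropic quadratic; the loss, learning rates, and step counts are illustrative assumptions, not the demo's configuration:

```python
import math

def grad2(w):
    # Elongated quadratic valley L(w) = 0.5*(w0**2 + 25*w1**2):
    # curvature differs 25x between axes, so plain SGD must use a
    # small lr for the steep axis and crawls along the flat one.
    return (w[0], 25.0 * w[1])

def sgd(w, lr=0.03, steps=200):
    for _ in range(steps):
        g = grad2(w)
        w = (w[0] - lr * g[0], w[1] - lr * g[1])
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m, v = [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        g = grad2(w)
        new = []
        for i in range(2):
            m[i] = b1 * m[i] + (1 - b1) * g[i]        # momentum (1st moment)
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2   # scale (2nd moment)
            mhat = m[i] / (1 - b1 ** t)               # bias correction
            vhat = v[i] / (1 - b2 ** t)
            new.append(w[i] - lr * mhat / (math.sqrt(vhat) + eps))
        w = tuple(new)
    return w
```

Because Adam divides each coordinate's step by its own gradient scale, the 25× curvature mismatch matters far less, and it moves toward the minimum more directly than SGD does.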
Optimizers — click to toggle
LR schedule
Presets
lr = 0.02 · lr = 0.12 · lr = 0.50 · lr = 1.50
Base learning rate — 0.120
Start position — w = 5.00
Speed
global minimum
local minimum
amber = gradient direction  |  teal = step taken
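The lr(t) in the update rule comes from the selected LR schedule. A minimal sketch of two common choices, constant and cosine decay (the function names and parameters are illustrative, not the demo's internals):

```python
import math

def constant(base_lr):
    # lr(t) is the same at every step.
    return lambda t: base_lr

def cosine_decay(base_lr, total_steps, min_lr=0.0):
    # Anneals smoothly from base_lr at t = 0 to min_lr at t = total_steps.
    def lr(t):
        frac = min(t / total_steps, 1.0)
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * frac))
    return lr
```

With the demo's default base rate, `cosine_decay(0.12, 100)` starts at 0.12, passes 0.06 at the halfway point, and reaches 0 on the final step.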