Compare how SGD, Momentum, RMSProp, and Adam navigate loss surfaces. Switch between 2D view (1 weight) and 3D view (2 weights) to see how difficulty scales with dimensionality.
1 weight, 1D search. The optimizer moves a single value w along the curve. Every update is w ← w − lr(t) × ∂L/∂w, where lr(t) is the learning rate at step t. Clean and easy to follow step by step.
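The 1D update rule can be sketched in a few lines. This is a minimal illustration, not the demo's own code; the loss L(w) = (w − 3)² and the starting point are assumptions chosen so the minimum sits at w = 3.

```python
# Assumed toy loss: L(w) = (w - 3)^2, minimized at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)  # dL/dw

w, lr = 0.0, 0.1            # start away from the minimum
for t in range(100):
    w -= lr * grad(w)       # w <- w - lr * dL/dw

print(round(w, 4))          # converges to the minimum at 3.0
```

Each step multiplies the distance to the minimum by (1 − 2·lr), so with lr = 0.1 the error shrinks by a factor of 0.8 per update.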
2 weights, 2D search. Now the optimizer must navigate a full surface. Gradients interact across both dimensions — ridges, saddle points, and curved valleys all appear. Naive SGD can zigzag badly, while Adam cuts more directly. Real networks have millions of dimensions; this demo shows just two.
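The zigzag-versus-direct behavior can be reproduced on a toy surface. This sketch assumes an elongated quadratic valley, L(w) = ½(w₁² + 25·w₂²), which is not from the demo itself; the steep w₂ direction makes plain SGD oscillate, while Adam rescales each dimension by its gradient history.

```python
import numpy as np

# Assumed toy loss: L(w) = 0.5 * (w1^2 + 25 * w2^2), minimum at the origin.
scales = np.array([1.0, 25.0])
def grad(w):
    return scales * w  # gradient of the quadratic, per dimension

w_sgd = np.array([2.0, 1.0])
w_adam = w_sgd.copy()
lr = 0.05
m, v = np.zeros(2), np.zeros(2)            # Adam's first/second moment estimates
b1, b2, eps = 0.9, 0.999, 1e-8

for t in range(1, 201):
    # Plain SGD: same step size in every direction -> zigzags along w2.
    w_sgd -= lr * grad(w_sgd)

    # Adam: per-dimension step normalized by the gradient's running magnitude.
    g = grad(w_adam)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    mhat = m / (1 - b1 ** t)               # bias correction
    vhat = v / (1 - b2 ** t)
    w_adam -= lr * mhat / (np.sqrt(vhat) + eps)

print(np.linalg.norm(w_sgd), np.linalg.norm(w_adam))
```

Printing the intermediate iterates (rather than just the endpoints) shows SGD flipping sign along w₂ each step while Adam takes nearly uniform steps toward the origin.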