In Nesterov momentum, we change where the gradient is computed. We first make a big jump in the direction of the previously accumulated gradient (the velocity). Then we measure the gradient at this new, look-ahead position and correct the update accordingly.
This correction keeps the update from moving too quickly, as it can with ordinary momentum, and so tends to produce fewer oscillations as gradient descent converges.
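
To make the difference concrete, here is a minimal sketch that puts the two update rules side by side. It is only an illustration under stated assumptions: the objective f(x) = 0.5 * x**2, the function names (`grad`, `momentum_step`, `nesterov_step`, `run`), and the default values `lr=0.1` and `mu=0.9` are all hypothetical choices, not taken from any particular library.

```python
def grad(x):
    # Gradient of the assumed toy objective f(x) = 0.5 * x**2.
    return x


def momentum_step(x, v, lr=0.1, mu=0.9):
    # Ordinary momentum: the gradient is measured at the current position.
    g = grad(x)
    v_new = mu * v - lr * g
    return x + v_new, v_new


def nesterov_step(x, v, lr=0.1, mu=0.9):
    # Nesterov momentum: first jump along the accumulated velocity...
    lookahead = x + mu * v
    # ...then measure the gradient at that look-ahead position...
    g = grad(lookahead)
    # ...and use it to correct the velocity before moving.
    v_new = mu * v - lr * g
    return x + v_new, v_new


def run(step, x0=1.0, steps=200):
    # Apply `step` repeatedly from x0 with zero initial velocity.
    x, v = x0, 0.0
    for _ in range(steps):
        x, v = step(x, v)
    return x
```

The only difference between the two functions is the point at which `grad` is evaluated. Calling `run(momentum_step)` and `run(nesterov_step)` with the same settings lets you compare how close each one gets to the minimum at 0 on this toy problem; behaviour on other objectives may differ.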