Stochastic gradient descent

We can further optimize the training process with a simple change. With basic (or batch) gradient descent, we calculate the adjustment by looking at the entire dataset. Therefore, the next obvious step for optimization is: can we calculate the adjustment by looking at less than the entire dataset?

As it turns out, the answer is yes! Because we train the network over many iterations, the gradient will be recomputed many times anyway, so we can afford to estimate it from fewer examples each time, even from a single example. By performing fewer calculations for each network update, we significantly reduce the amount of computation required, which means faster training times. This is essentially a stochastic approximation to gradient descent, hence the name.
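To make the idea concrete, here is a minimal sketch of the single-example variant, using an assumed toy linear model fitted with a squared-error loss (the dataset, learning rate, and variable names are illustrative, not taken from the text). The key difference from batch gradient descent is that each update uses the gradient of one example only, rather than a sum over the whole dataset.

import numpy as np

# Toy dataset: fit y = 3x + 2 with a single linear unit (assumed example).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0   # parameters
lr = 0.1          # learning rate
epochs = 20

for epoch in range(epochs):
    # Shuffle so each pass visits the examples in a different order.
    order = rng.permutation(len(X))
    for i in order:
        x_i, y_i = X[i, 0], y[i]
        pred = w * x_i + b
        err = pred - y_i
        # Gradient of the squared error for this single example only --
        # this is the "stochastic" part: no sum over the entire dataset.
        grad_w = 2.0 * err * x_i
        grad_b = 2.0 * err
        w -= lr * grad_w
        b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w=3, b=2

Each individual update is noisier than a full-batch update, but there are many more of them per pass over the data, which is the trade-off the paragraph above describes.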
