What is Gradient Descent?
Gradient descent is an optimization algorithm often used for finding the weights or coefficients of machine learning algorithms, such as artificial neural networks and logistic regression.
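To make the idea concrete, here is a minimal sketch of the core update rule on a toy one-dimensional objective. The function being minimized, the helper name, and the learning rate are illustrative assumptions for the example, not part of any particular library.

```python
import numpy as np

def gradient_descent_step(weights, gradient, learning_rate=0.01):
    """Apply one gradient descent update: move the weights against the gradient."""
    return weights - learning_rate * gradient

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w = gradient_descent_step(w, grad, learning_rate=0.1)
print(w)  # approaches [3.0], the minimum of the toy objective
```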
Contrasting the 3 Types of Gradient Descent
Gradient descent can vary in terms of the number of training patterns used to calculate the error that is, in turn, used to update the model.
The number of patterns used to calculate the error influences how stable the gradient is that is used to update the model. We will see that there is a tension in gradient descent configurations between computational efficiency and the fidelity of the error gradient.
The three main flavors of gradient descent are batch, stochastic, and mini-batch.
Let’s take a closer look at each.
What is Stochastic Gradient Descent?
Stochastic gradient descent, often abbreviated SGD, is a variation of the gradient descent algorithm that calculates the error and updates the model for each example in the training dataset.
The update of the model for each training example means that stochastic gradient descent is often called an online machine learning algorithm.
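As a rough illustration, the sketch below runs one epoch of per-example updates on a tiny synthetic linear regression problem. The `sgd_epoch` helper, the squared-error gradient, the data, and the learning rate are all assumptions made for this example.

```python
import numpy as np

def sgd_epoch(X, y, weights, learning_rate=0.05):
    """One epoch of stochastic gradient descent: the model is updated after every example."""
    indices = np.random.permutation(len(X))  # shuffle so updates are not order-dependent
    for i in indices:
        prediction = X[i] @ weights
        error = prediction - y[i]
        gradient = error * X[i]  # gradient of the squared error for a single example
        weights = weights - learning_rate * gradient
    return weights

# Toy linear data: y = 2 * x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
w = np.zeros(1)
for _ in range(50):
    w = sgd_epoch(X, y, w)
print(w)  # approaches [2.0]
```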
What is Batch Gradient Descent?
Batch gradient descent is a variation of the gradient descent algorithm that calculates the error for each example in the training dataset, but only updates the model after all training examples have been evaluated.
One cycle through the entire training dataset is called a training epoch. Therefore, it is often said that batch gradient descent performs model updates at the end of each training epoch.
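For contrast, here is a comparable sketch in which the error is accumulated over the whole (toy) training set and a single update is made at the end of each epoch. Again, the helper name, data, and learning rate are illustrative assumptions.

```python
import numpy as np

def batch_gd_epoch(X, y, weights, learning_rate=0.1):
    """One epoch of batch gradient descent: the error is computed over the whole
    training set and the model is updated once at the end of the epoch."""
    predictions = X @ weights
    errors = predictions - y
    gradient = X.T @ errors / len(X)  # average gradient over all training examples
    return weights - learning_rate * gradient

# Same toy linear data: y = 2 * x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
w = np.zeros(1)
for _ in range(200):
    w = batch_gd_epoch(X, y, w)
print(w)  # approaches [2.0]
```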
What is Mini-Batch Gradient Descent?
Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches that are used to calculate model error and update model coefficients.
Implementations may choose to sum the gradient over the mini-batch or to take the average of the gradient, which further reduces the variance of the gradient.
Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent. It is the most common implementation of gradient descent used in the field of deep learning.
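A minimal sketch of this idea, using the same toy data and an assumed batch size of 2, might look like the following; averaging the gradient over each mini-batch is what reduces its variance, as noted above.

```python
import numpy as np

def minibatch_gd_epoch(X, y, weights, batch_size=2, learning_rate=0.05):
    """One epoch of mini-batch gradient descent: the training set is split into
    small batches and the model is updated once per batch."""
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        errors = X[batch] @ weights - y[batch]
        gradient = X[batch].T @ errors / len(batch)  # average gradient over the mini-batch
        weights = weights - learning_rate * gradient
    return weights

# Same toy linear data: y = 2 * x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
w = np.zeros(1)
for _ in range(100):
    w = minibatch_gd_epoch(X, y, w)
print(w)  # approaches [2.0]
```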
How to Configure Mini-Batch Gradient Descent
Mini-batch gradient descent is the recommended variant of gradient descent for most applications, especially in deep learning.
Mini-batch sizes, commonly called “batch sizes” for brevity, are often tuned to an aspect of the computational architecture on which the implementation is being executed, such as a power of two that fits the memory requirements of the GPU or CPU hardware: 32, 64, 128, 256, and so on.
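For example, a dataset might be sliced into batches of an assumed size of 32 as in the sketch below; the dataset shape and resulting counts are purely illustrative.

```python
import numpy as np

# A common starting point is a power-of-two batch size such as 32,
# adjusted up or down to fit hardware memory and training speed.
batch_size = 32
X = np.random.rand(1000, 10)  # hypothetical dataset: 1000 examples, 10 features

num_batches = int(np.ceil(len(X) / batch_size))
batches = [X[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]
print(num_batches, batches[0].shape, batches[-1].shape)  # 32 batches; the last holds the remaining 8 examples
```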
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Related Posts
- Gradient Descent for Machine Learning
- How to Implement Linear Regression with Stochastic Gradient Descent from Scratch with Python
Additional Reading
- Stochastic gradient descent on Wikipedia
- Online machine learning on Wikipedia
- An overview of gradient descent optimization algorithms
- Practical recommendations for gradient-based training of deep architectures, 2012
- Efficient Mini-batch Training for Stochastic Optimization, 2014
- In deep learning, why don’t we use the whole training set to compute the gradient? on Quora
- Optimization Methods for Large-Scale Machine Learning, 2016