Backpropagation in Machine Learning
What is Backpropagation?
Backpropagation is the process of updating the weights and biases of a neural network to minimize the error of the model’s predictions.
There are two elements to backpropagation in machine learning:
- Cost Function
- Gradient Descent
The Formula for the Cost Function
For regression models, we often use the Mean Squared Error (MSE) as the cost function. The MSE essentially tells us how wrong the model's outputs were.
Let’s break it down…
In the case of one training example, we square the difference between the actual value the model produced and the expected (true) value. We then repeat this for all the training examples (n = number of training examples) and calculate the average of these squared differences:

MSE = (1/n) * Σ (Actual Value - Expected Value)²

where the sum runs over all n training examples.
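To make the formula concrete, here is a minimal sketch of the MSE calculation in plain Python (the four example values are made up):

```python
# Minimal sketch of the MSE cost function (the values are made up).
actual = [2.5, 0.0, 2.1, 7.8]      # the model's outputs for 4 training examples
expected = [3.0, -0.5, 2.0, 7.0]   # the expected (true) outputs

n = len(actual)
mse = sum((a - e) ** 2 for a, e in zip(actual, expected)) / n
print(mse)  # the average of the squared differences
```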
Now, suppose there are four training examples. For each one we calculate the squared error, and the average of those gives the MSE. If we plot the input against the expected output, and the input against the actual output, for each training example, at first we would get something like this:
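For instance, a rough matplotlib sketch with made-up data for four training examples produces this kind of picture:

```python
import matplotlib.pyplot as plt

# Made-up data for four training examples (purely illustrative).
inputs = [1.0, 2.0, 3.0, 4.0]
expected = [2.0, 4.0, 6.0, 8.0]   # expected (true) outputs
actual = [1.2, 3.1, 4.4, 6.0]     # what an untrained model might output

plt.scatter(inputs, expected, label="Expected output")
plt.scatter(inputs, actual, label="Actual output")
plt.xlabel("Input")
plt.ylabel("Output")
plt.legend()
plt.show()
```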
In this graph, you can see how far off the model is from the expected output (i.e. the error, which is what the cost function measures). Our goal is to minimize the error of the model so that we can improve its performance. The process of minimizing the cost function is known as gradient descent.
So, how do we actually minimize the cost function? Since the weights and biases of the network affect the output, we can update the values of the weights and biases to minimize the cost function. But should we increase or decrease the weights / biases?
In order to answer this question, it helps to visualize the graph of the cost function plotted against a single weight:
From the starting point, you could move either way along the curve. Wait, why would you ever move up? You obviously have to move down to reach the minimum! That's true… but the algorithm doesn't know in advance whether increasing the weight or decreasing it is the move that takes the cost downhill toward the minimum.
That's why we have to use a bit of calculus. By calculating the slope of the cost function with respect to the weight, we can figure out whether to increase or decrease it: if the slope is positive, decreasing the weight lowers the cost, and if the slope is negative, increasing the weight lowers the cost. But wait. This function is not linear, so its slope changes from point to point, and the cost depends on many weights and biases at once. Therefore, we have to calculate the partial derivative of the cost function with respect to each weight to get the slope at our current position; we cannot simply use the rise-over-run slope formula we learned in Algebra 1.
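To make that concrete, here is a small sketch that computes this partial derivative for an assumed one-weight model (prediction = w * x, with made-up data) and uses its sign to decide which way to move:

```python
# Sketch: partial derivative of the MSE with respect to a single weight w,
# assuming the model is prediction = w * x (made-up data, no bias term).
inputs = [1.0, 2.0, 3.0, 4.0]
expected = [2.0, 4.0, 6.0, 8.0]    # expected outputs (here, y = 2x)
w = 0.5                            # current weight

n = len(inputs)
# d(MSE)/dw = (2/n) * sum((w*x - y) * x)
grad_w = (2 / n) * sum((w * x - y) * x for x, y in zip(inputs, expected))

print(grad_w)  # negative here, so the cost decreases if we increase w
```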
Ok, so we now know whether to move up or down. But on each iteration of gradient descent, how far should we move? This is controlled by the learning rate: how big of a step we take toward the minimum of the cost function.
Finally, we now have enough information to determine how much to update the weights by in order to minimize the cost function.
W_new = W_old - Learning Rate * Partial Derivative of the Cost Function with respect to W
We do the same thing for the biases:
B_new = B_old - Learning Rate * Partial Derivative of the Cost Function with respect to B
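Putting it all together, here is a minimal gradient descent sketch for an assumed one-weight, one-bias model (prediction = w * x + b) on made-up data, applying exactly these two update rules:

```python
# Minimal gradient descent sketch, assuming the model prediction = w * x + b.
inputs = [1.0, 2.0, 3.0, 4.0]
expected = [3.0, 5.0, 7.0, 9.0]    # made-up targets (y = 2x + 1)

w, b = 0.0, 0.0
learning_rate = 0.05
n = len(inputs)

for step in range(500):
    # Partial derivatives of the MSE cost with respect to w and b.
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(inputs, expected))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(inputs, expected))

    # The update rules from above.
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

print(w, b)  # should end up close to 2 and 1
```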
Recap: Backpropagation consists of two parts, calculating the cost function (MSE for regression models) and gradient descent, and the ultimate goal is to update the weights and biases so that our model performs well when fed data.
That’s it! I hope you learned what backpropagation is and most of the math behind it — stay tuned for more articles!