Neural Network Backpropagation
3/22/2021
The equation unveils the nature of how the neural network learns.
After forward propagation in a neural network, when all the inputs have been multiplied by weights, added to biases, and passed through activation functions, the network produces a final prediction for the target value. However, the neural network can't get anywhere without the feedback engine that adjusts the weights and biases. The backpropagation equation provides the basis for how weights and biases are adjusted in the most efficient way to reduce the cost function. This brief article provides the mathematical reasoning behind how the beautiful backpropagation equation is achieved.

Let us consider the simplest neural network: a single input, an arbitrary number of hidden layers with one neuron each, and a single output. The following notation will be used: w is the weight, b is the bias, A is the activation function, a_prev is the output of the previous neuron, z = w · a_prev + b is the weighted sum, a = A(z) is the neuron's output, and y is the target value.

In this case, the cost function c is simply c = (a − y)², the squared difference between the predicted final value and the target value. It is important to remember that

a = A(w · a_prev + b)

where A represents any activation function, like the sigmoid or ReLU functions, w represents the weight, and b represents the bias. The equation simply states that the output of a neuron is equal to the activation function applied to the product of the weight and the output of the neuron before it, plus the bias.

We want to find how a change in w, the weight, corresponds with a change in c, our cost function. Phrased mathematically, it is a small change in the cost function divided by a small change in the weight: ∂c/∂w. Because the cost function is a function of the weight, a change in the weight corresponds with a change in the cost function. A change in w directly changes z (the weighted sum), whose change in value directly changes a (the activation output), whose final value directly changes c. By the chain rule:

∂c/∂w = (∂z/∂w)(∂a/∂z)(∂c/∂a)

Now that we have the derivative written as a product of smaller derivatives, we can compute them. With knowledge of the cost function and the power rule, we can compute the last derivative (the change in the cost function with respect to the activation output) to be the derivative of the cost function:

∂c/∂a = 2(a − y)

Next, we can compute the second-to-last derivative (the change in the output of the activation function with respect to the weighted sum, its input) to simply be the derivative of the activation function. In this case, we will use the sigmoid function as the activation function, since its derivative is easy to find:

∂a/∂z = A′(z) = sigmoid(z)(1 − sigmoid(z))

Lastly, we can compute the first derivative, the change in the weighted sum with respect to the weight. Since the weighted sum is simply a linear function of the weight, the derivative works out to be the output of the previous neuron:

∂z/∂w = a_prev

Now that we have all three derivatives, we can find the value of the full derivative, the change in the cost function with respect to the weight, by multiplying them together:

∂c/∂w = a_prev · sigmoid(z)(1 − sigmoid(z)) · 2(a − y)

We've achieved a numerical indicator of the change in the cost function c with respect to the weight w. This value tells us how much, and in which direction, a given weight w should change to produce the biggest reduction in the cost function c, and it is the basis of backpropagation. It tells the weights how much they need to increase or decrease for the maximum decrease in the cost function.

Even as neural networks grow more and more complicated, from small networks to massive convolutional neural networks with countless parameters, the models that recognize cancer, drive cars (CNNs), and produce music (GANs) all rely on the humble backpropagation equation to keep them running.
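To make the derivation concrete, here is a minimal sketch in Python of the single-neuron case described above, assuming the sigmoid activation and the squared-error cost. The specific numbers and the names a_prev, target, and lr are illustrative assumptions, and the finite-difference check is an addition for verification rather than part of the original derivation.

```python
# A minimal sketch of the single-neuron backpropagation derivation above,
# assuming a sigmoid activation and the squared-error cost c = (a - y)^2.
# The values of a_prev, w, b, target, and lr are arbitrary illustrations.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Forward pass: z = w * a_prev + b, a = sigmoid(z), c = (a - target)^2
a_prev, w, b, target = 0.5, 0.8, 0.1, 1.0
z = w * a_prev + b
a = sigmoid(z)
cost = (a - target) ** 2

# Chain rule: dc/dw = dz/dw * da/dz * dc/da
dc_da = 2 * (a - target)   # derivative of the cost w.r.t. the activation output
da_dz = sigmoid_prime(z)   # derivative of the activation w.r.t. the weighted sum
dz_dw = a_prev             # derivative of the weighted sum w.r.t. the weight
dc_dw = dz_dw * da_dz * dc_da

# Numerical check: the analytic gradient should match a finite difference
eps = 1e-6
cost_plus = (sigmoid((w + eps) * a_prev + b) - target) ** 2
numeric = (cost_plus - cost) / eps
print(f"analytic dc/dw = {dc_dw:.6f}, numeric dc/dw = {numeric:.6f}")

# Gradient-descent step: nudge the weight against the gradient to reduce the cost
lr = 0.1
w -= lr * dc_dw
```

Running the sketch shows the chain-rule gradient agreeing with the finite-difference estimate, and the final line performs the weight update that backpropagation feeds into gradient descent.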
If you enjoyed, feel free to check out some of my other work.

Written by Andre Ye.