A perceptron is the fundamental unit of a neural network. Given that many of the most accurate machine learning models in today’s world are neural networks, the importance of the perceptron is hard to overstate. Before diving into deep learning, one must learn how neural networks are built from a very basic level. In this article, I shall explain the working and the theory behind the building block of some of the world’s most powerful models.
A perceptron is a linear, binary classifier. It is modeled on a neuron in our body. It is an excellent way to make a quick yes-or-no decision from a weighted combination of the input attributes. A perceptron has the following important features:
- Input values (x): The values of attributes in the dataset.
- Weights (w): The weights assigned to each input.
- Activation function: The formula that turns the weighted inputs into the output. It is what defines a perceptron.
- Threshold value (t): The value against which the weighted sum is compared to make the binary classification.
The perceptron works on one simple principle: each weight represents the importance of its input. The lower the weight, the less that input influences the result. This keeps unimportant or noisy inputs from dominating the decision. The fun thing about perceptrons is that they make simple predictions which, taken together, lead up to a very accurate prediction on a much larger picture.
Working: The perceptron function
With the above in place, we dive into the working of a perceptron. The activation function is at the heart of this working. First, we multiply each input by its weight and add up the products to get the weighted sum. Now, we can have a crack at the activation function.
Normally it returns zero if the weighted sum of inputs is less than or equal to the threshold value and one otherwise. The threshold value (t) is conventionally set to zero for ease of calculation, and the algorithm adjusts the weights (usually including an extra bias weight) to compensate. Thus, we do not have to pick the threshold by hand. Now, we have a simple mathematical model to deal with. The binary classification has the following outputs for the mentioned conditions:
- If the weighted sum of inputs (the summation of w·x) is greater than the threshold value (conventionally zero), the output is one
- If the weighted sum of inputs (the summation of w·x) is less than or equal to the threshold value (conventionally zero), the output is zero
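The two conditions above can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the example weights and inputs are my own assumptions, not part of any library.

```python
def weighted_sum(weights, inputs):
    """Multiply each input by its weight and add up the products."""
    return sum(w * x for w, x in zip(weights, inputs))


def hardlim(z, threshold=0.0):
    """Return 1 if z is strictly greater than the threshold, else 0."""
    return 1 if z > threshold else 0


def predict(weights, inputs, threshold=0.0):
    """Perceptron output: hardlim applied to the weighted sum."""
    return hardlim(weighted_sum(weights, inputs), threshold)


# Hypothetical example values, chosen so the sum lands exactly on the threshold.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 1.0, 0.25]

print(weighted_sum(weights, inputs))  # 0.5 - 1.0 + 0.5 = 0.0
print(predict(weights, inputs))       # 0, because 0.0 is not greater than the threshold
```

Note that a weighted sum exactly equal to the threshold gives zero, which matches the second condition.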
This function is called the hardlim or perceptron function. We shall have a look at the sigmoid function very shortly. The inputs are pre-defined. We have talked about the activation function, and the threshold value is normally zero, but the weights are still a big question mark. We solve this problem by assigning arbitrary weights and then updating them based on the error at the output end. This is known as the perceptron learning rule.
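Here is a rough sketch of the learning rule in Python, using its usual textbook form: each weight is nudged by learning rate × (target − output) × input. The learning rate, the extra bias weight (which stands in for the threshold), the number of epochs and the tiny AND-gate dataset are all assumptions I picked for illustration.

```python
def predict(weights, bias, inputs):
    """Hardlim perceptron with the threshold fixed at zero and a bias weight."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def train(samples, learning_rate=1.0, epochs=20):
    """Perceptron learning rule: update the weights by the output error."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs  # arbitrary starting weights (zeros here)
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: the logical AND of two binary inputs.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])  # [0, 0, 0, 1]
```

When a prediction is already correct the error is zero and nothing changes. The weights only move when the perceptron gets a sample wrong.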
The sigmoid function
This is where things get a little more complex. The hardlim function is great for making binary predictions, but a huge problem with it is that it is not differentiable at the threshold value. Its graph is a flat line at zero up to the threshold and a flat line at one after it.
You can see that at the threshold value, it jumps from zero to one. As a solution to this, we can use the sigmoid function instead. It is more useful for mapping the probability of an event happening instead of just making the prediction. Just think about it: if the weighted sum is much higher than the threshold value, the output is almost certainly one. But if it is only slightly more than the threshold value, we are less sure that the output should be one, and vice versa. For this, we introduce the sigmoid function.
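For reference, the standard logistic sigmoid can be written as below. The symbol z for the weighted sum measured from the threshold is my own notation, not something defined earlier in this article.

```latex
z = \sum_i w_i x_i - t,
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\sigma(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}
```

As z grows large and positive, the output approaches one. As z grows large and negative, it approaches zero.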
This gives us a probability instead of a decision. The principle is to not hurry towards a decision but to gain a more accurate estimate of how likely the prediction is. At exactly the threshold value, the probability is half. This solves the problem of having an output of zero at the threshold value, which we had with the hardlim function.
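If you would like to see it in code, here is a small sketch of a sigmoid perceptron in Python. The function names are hypothetical, and splitting the calculation into two branches is just one common way to avoid overflow for very negative sums.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written so that exp() never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, inputs, threshold=0.0):
    """Probability-like output: sigmoid of the weighted sum measured from the threshold."""
    z = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return sigmoid(z)


print(sigmoid(0.0))          # 0.5, exactly at the threshold
print(sigmoid(4.0) > 0.98)   # True, well above the threshold
print(sigmoid(-4.0) < 0.02)  # True, well below the threshold
```

To get back to a hard decision, you can still compare the result with 0.5.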
If the sigmoid function jibber jabber scares you, do not worry. You have learned the basics of a perceptron, thereby getting one step closer to mastering deep learning. The trick to learning machine learning is to start from the bottom and not always rely on coding.
It might interest you that the perceptron is about as far as we have gotten with firm mathematical guarantees in deep learning. Beyond it, we do not have a mathematically proven way of finding out how many perceptrons to use and in what context. We do not know in advance how many layers a neural network should have, so these choices are mostly made by experiment. One reason often given for this ignorance is that a neural network is loosely modeled on our nervous system, and we do not fully know how our brain works! The functioning of the brain remains largely a mystery to medics and researchers across the world. Until we solve that, the perceptron is the only foolproof deep learning model we have got.