Bias Variance Trade Off
If you are a Machine Learning beginner, you might have heard the term “Bias Variance Trade Off”. The term might feel heavy and hard to understand, but trust me, it is very simple. This concept applies to all Machine Learning and Deep Learning models, not just one particular model.
We will dive into what Bias and Variance really mean, but first let us look at the set of points plotted on the graph below:
Let us say that we have a two-dimensional dataset where, based on a point’s x-axis value, we have to predict its y-axis value. As usual, we divide our dataset into two sets: a Training Set and a Testing Set. The training points are the blue circles and the testing points are the green diamonds.
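To make this concrete, here is a minimal sketch of such a setup in Python. The sinusoidal shape of the curve, the noise level, and the 70/30 split are purely illustrative assumptions, not anything fixed by the concept:

```python
# A minimal sketch of the setup described above: a synthetic 2-D dataset
# with a curved (here, assumed sinusoidal) pattern, split into a training
# set and a testing set.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
x = rng.uniform(0, 6, size=80)                  # x-axis values
y = np.sin(x) + rng.normal(0, 0.25, size=80)    # curved pattern plus noise

# Blue circles (training) and green diamonds (testing) in the graph above
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)
```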
Looking at the graph, we see that the data follows a curved pattern, and we would want a model that generalizes this curve. Let me show you what I mean.
Let’s say we have these three models:
Model 1: A very simple model which tries to fit a line in the data.
Model 2: A medium-complexity model which tries to generalize the data.
Model 3: A very complex model which tries to learn all the training points.
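As a hedged sketch, we can imitate these three models with polynomial regression of increasing degree. The specific degrees (1, 4 and 15) are my own illustrative choices:

```python
# Three polynomial regression models of increasing complexity, standing in
# for Model 1 (a line), Model 2 (medium) and Model 3 (very complex).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, size=60))
y = np.sin(x) + rng.normal(0, 0.25, size=60)
X = x.reshape(-1, 1)

models = {
    "Model 1 (simple line)":  make_pipeline(PolynomialFeatures(1), LinearRegression()),
    "Model 2 (medium)":       make_pipeline(PolynomialFeatures(4), LinearRegression()),
    "Model 3 (very complex)": make_pipeline(PolynomialFeatures(15), LinearRegression()),
}
for name, model in models.items():
    model.fit(X, y)
    # R^2 on the training data: the complex model will score near-perfectly here
    print(name, "training R^2:", round(model.score(X, y), 3))
```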
Now, which of these models would you like to have? Of course, Model 2, right? But let’s discuss the details of the matter.
Bias
Bias is defined as “the inability of a model to learn the patterns of a dataset”, which basically means that the model is unable to capture the true relationship in the data because it is biased towards its own overly simple assumptions.
In this case, Model 1 can be said to have very high bias because it tries to fit a straight line instead of a curve, showing its inability to capture the curvature of the dataset.
On the other hand, Model 3 is trying very hard to go through all the training points. We say that this model has very low bias because it tries to learn every single training data point.
Model 2 plays it safe: it does not pass through every training point, yet it captures the true relationship of the data. We say that this model has medium bias, sitting between Model 1 and Model 3.
Variance
Model 3 tries its best to learn all the training points but, in doing so, it strays far from the testing points. We say that this model has high variance because its predictions change drastically with small changes in the input value, compared to Model 2, which has medium variance.
On the other hand, Model 1 can be said to have low variance: although it misses both the training and the testing points, its predictions change the least for a small change in the input value.
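One rough way to see this bias/variance behaviour in practice is to compare training error and testing error for each model: high error on both points to high bias, while a low training error paired with a much higher testing error points to high variance. A small sketch, again with assumed data and degrees:

```python
# Compare training MSE and testing MSE for the three assumed models.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
x = rng.uniform(0, 6, size=100)
y = np.sin(x) + rng.normal(0, 0.25, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.3, random_state=1
)

for degree in (1, 4, 15):  # Model 1, Model 2, Model 3
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # degree 1: both errors high (bias); degree 15: large train/test gap (variance)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```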
Which model should we choose?
To summarize,
Model 3 has Low Bias but High Variance. It tries to learn all the training points, but it fails to predict the testing points accurately. You can think of this model as a student who has mugged up the syllabus: when he is asked questions in a different manner, he fails to answer them. We surely don’t want a model that memorizes the training points. We say that the model is Overfitting.
Model 1 has High Bias and Low Variance. You can think of this model as a casual guy who tried to complete the syllabus one night before the exam and ultimately forgot everything in the exam hall. We definitely don’t want this model either. We say that the model is Underfitting.
Model 2 seems like the right choice because it has some bias and some variance, but it captures the curved nature of the data points. It is not perfect at predicting the testing points, but it is not bad at it either, so it is able to generalize from the dataset. This is the guy who has not mugged up the syllabus but has understood the concepts.
That is what Machine Learning Engineers do. They intentionally let some bias into the model so that it does not mug up the training points. We really do not want a model whose training accuracy is high but whose testing accuracy is poor. We try to find the sweet spot between the bias and the variance of the model, hence the term “Bias Variance Trade Off”.
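A common way to hunt for this sweet spot is to sweep model complexity and pick the setting with the lowest validation error. Here is a small sketch using polynomial degree as an assumed stand-in for complexity and 5-fold cross-validation as the judge:

```python
# Sweep polynomial degree and pick the one with the lowest
# cross-validated MSE: a simple search for the "sweet spot".
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
x = rng.uniform(0, 6, size=100)
y = np.sin(x) + rng.normal(0, 0.25, size=100)
X = x.reshape(-1, 1)

scores = {}
for degree in range(1, 16):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # sklearn reports negated MSE, so flip the sign back
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    scores[degree] = mse

best = min(scores, key=scores.get)
print("best degree (lowest validation MSE):", best)
```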
But determining how much bias and how much variance a model should have is very hard, and it often relies on the experience of the ML Engineer. Of course, we do have techniques which help regulate this trade-off.
For example:
- Regularization
- Bagging
- Boosting
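As a quick taste of one of these, here is a hedged sketch of regularization: Ridge regression penalizes large coefficients, deliberately adding a little bias to the very complex model in order to cut its variance. The alpha value is an illustrative assumption; bagging and boosting would instead use ensemble estimators such as scikit-learn’s BaggingRegressor or GradientBoostingRegressor:

```python
# The same very complex model (degree 15), with and without regularization.
# Ridge shrinks the coefficients, which tames the wild wiggles of the
# unregularized fit.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
x = rng.uniform(0, 6, size=60)
y = np.sin(x) + rng.normal(0, 0.25, size=60)
X = x.reshape(-1, 1)

plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X, y)

# Regularization keeps the coefficients (and hence the curve) much calmer
print("largest plain coefficient:", np.abs(plain[-1].coef_).max())
print("largest ridge coefficient:", np.abs(ridge[-1].coef_).max())
```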
This concept of the “Bias Variance Trade Off” is intuitive, and a beginner might have applied it without even knowing that the term exists. Now a beginner can apply techniques like these to regulate the trade-off instead of manually changing model parameters to find the sweet spot.
Hope you have learnt something useful from this blog. Please consider clapping and sharing it. Thank you.
References:
ML Mentorship Programme, CAMPUS{X}, Kolkata — Nitish Singh.