Distances in Machine Learning
If you have ever wondered what the distance parameter (highlighted in the image below) is, and which types of distance metrics are used in Machine Learning, this blog will help you.
The different types of distances used are:
- Euclidean Distance
- Manhattan Distance
- Minkowski Distance
- L1, L2 and Lp norms
- Hamming Distance
- Cosine Similarity and Cosine Distance
I am sure that, as an ML beginner, you have heard of some of these but may not have fully understood all of them.
Euclidean Distance
This is the most widely known distance metric, and most of us have been studying it since school. For two points, P1(x1, y1) and P2(x2, y2), in a 2-dimensional coordinate space, the Euclidean Distance between the two points is,
$$d(P_1, P_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
For two points A = (a1, ..., an) and B = (b1, ..., bn) in n-D space,
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (b_i - a_i)^2}$$
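Here is a minimal sketch of the Euclidean distance in plain Python. The function name `euclidean_distance` and the sample points are my own choices for illustration, not something taken from a particular library.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square the difference along each axis, add them up, take the square root.
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((1, 2), (4, 6)))        # 2-D example, prints 5.0
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3-D example, prints 3.0
```

The same function works for any number of dimensions because it simply loops over the coordinates.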
Manhattan Distance
This is another distance that is widely used in Machine Learning. For two points, P1(x1, y1) and P2(x2, y2), in a 2-D coordinate space, the Manhattan Distance between them is,
$$d(P_1, P_2) = |x_2 - x_1| + |y_2 - y_1|$$
For two points A = (a1, ..., an) and B = (b1, ..., bn) in n-D space,
$$d(A, B) = \sum_{i=1}^{n} |b_i - a_i|$$
It is simply the sum of the absolute differences between the coordinates of the two points along each axis.
But why is it called Manhattan? Imagine you are in a planned city that is divided into blocks. What is the distance from one intersection to another? Or rather, what directions would a resident of that city give you? They might answer, “Go 6 blocks straight and 6 blocks left”. You cannot cut diagonally through the buildings, so the distance you actually travel is the sum of the blocks along each direction: 6 + 6 = 12 blocks.
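The city-block picture can be sketched in a few lines of Python. The function name `manhattan_distance` is my own, and placing the start at the origin with "left" as the negative x direction is just an assumption to make the example concrete.

```python
def manhattan_distance(a, b):
    """Sum of the absolute coordinate differences between two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(bi - ai) for ai, bi in zip(a, b))

start = (0, 0)
end = (-6, 6)  # assumed layout: 6 blocks left (x) and 6 blocks straight (y)
print(manhattan_distance(start, end))  # prints 12
```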
But when is it preferred over Euclidean Distance?
It is often preferred when the dimensionality of the space becomes very large. Manhattan distance only sums absolute differences, so it avoids the squaring and the square root that Euclidean distance needs, which makes it slightly cheaper to compute. In high-dimensional spaces, Euclidean distances between points also tend to become very similar to one another, and Manhattan distance often separates near and far points somewhat better.
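Minkowski Distance
Only the name of this distance is complicated. It is simply a generalization of the Euclidean and Manhattan Distances.
If you have looked carefully at the equations of the Euclidean and Manhattan distances, you will have noticed that the only difference between them is the power that the coordinate differences are raised to (2 for Euclidean Distance and 1 for Manhattan Distance), together with the matching root.
Minkowski Distance raises the absolute differences to a general power ‘p’ and takes the p-th root of their sum. For two points, P1(x1, y1) and P2(x2, y2), in 2-D, the equation is,
$$d(P_1, P_2) = \left( |x_2 - x_1|^p + |y_2 - y_1|^p \right)^{1/p}$$
The general form of Minkowski Distance for two points A = (a1, ..., an) and B = (b1, ..., bn) in n-D is,
$$d(A, B) = \left( \sum_{i=1}^{n} |b_i - a_i|^p \right)^{1/p}$$
Setting p = 1 gives the Manhattan Distance and p = 2 gives the Euclidean Distance.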
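To see the generalization in action, here is a small sketch of the Minkowski distance. The function name `minkowski_distance` and the check that p is at least 1 are my own choices for illustration.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Raise each absolute difference to the power p, sum, then take the p-th root.
    return sum(abs(bi - ai) ** p for ai, bi in zip(a, b)) ** (1 / p)

a, b = (1, 2), (4, 6)
print(minkowski_distance(a, b, 1))  # same as Manhattan, prints 7.0
print(minkowski_distance(a, b, 2))  # same as Euclidean, prints 5.0
```

Changing only the value of `p` switches between the two distances discussed earlier, which is why many libraries expose it as a single parameter.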
L1, L2 and Lp norms
These are just other names for the same distances, borrowed from the norms of a vector. The distance between two points is the norm of the difference between them.
L1 indicates that we are using Manhattan distance (the 1 is the power that the differences are raised to).
L2 indicates Euclidean distance (the power is 2).
Lp indicates Minkowski Distance (the power is p).
Hamming Distance
This distance is quite different from the distances that we have seen so far. It compares two strings and measures how different they are.
Let’s take an example to understand it properly. Consider the following two strings, which are of the same length:
Hamming Distance is the number of positions at which the corresponding characters of the two strings differ. In the above case, there are three mismatches, so the Hamming Distance is 3.
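Counting mismatches position by position is easy to sketch in Python. The function name `hamming_distance` is my own, and the two strings "karolin" and "kathrin" are a stand-in pair picked for illustration because they also differ in exactly three positions.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    # Compare the characters pairwise and count the mismatches.
    return sum(cs != ct for cs, ct in zip(s, t))

# k-a-r-o-l-i-n vs k-a-t-h-r-i-n: positions 3, 4 and 5 differ.
print(hamming_distance("karolin", "kathrin"))  # prints 3
```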
Where is it used? It is generally used for Error Detection and Correction. To count the errors in a received word this way, however, we need the correct word to compare it against.
Where is it useful in Machine Learning? Well, let’s say you are building a model that identifies spelling mistakes in a piece of text. Hamming distance can serve as one signal for it, for example by comparing each word with dictionary words of the same length and flagging the near matches. And who knows what more can be done with this distance!
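As a rough sketch of the spelling idea, the snippet below looks up the same-length words that are closest to a misspelled word. This is a plain lookup rather than a trained model, and the function `closest_words` and the tiny vocabulary are hypothetical, made up only for this example.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(cs != ct for cs, ct in zip(s, t))

def closest_words(word, vocabulary):
    """Same-length vocabulary words with the smallest Hamming distance to word."""
    candidates = [w for w in vocabulary if len(w) == len(word)]
    if not candidates:
        return []
    best = min(hamming_distance(word, w) for w in candidates)
    return [w for w in candidates if hamming_distance(word, w) == best]

vocabulary = ["distance", "instance", "metric", "vector"]  # hypothetical word list
print(closest_words("distence", vocabulary))  # prints ['distance']
```

Because Hamming distance needs strings of equal length, this only catches substituted letters, not missing or extra ones.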
Cosine Similarity and Cosine Distance
Have you ever wondered how Amazon and Flipkart recommend similar products when you buy one? Measures such as Cosine Distance are commonly used to build recommendation engines like these. But how do these distances help? Let’s start by discussing Cosine Similarity (cos-sim) first.
Basically, cos-sim = cos(theta), where theta is the angle between the vectors drawn from the origin to the two points. Refer to the diagram below.
So the cos-sim always lies between -1 and 1 (both included), because cos(theta) lies between -1 and 1.
The greater the cos-sim, the greater the similarity between the two points. Let’s take two points where one is placed on the x-axis and the other on the y-axis.
The cos-sim = cos(theta) = cos 90° = 0. This means that there is very little similarity between these two points.
Similarly, if both points are on the x-axis, one on the right-hand side of the origin and the other on the left, the cos-sim will be -1 as cos 180° = -1, which means that the points are not similar at all.
If two points are on the same axis and on the same side of the origin, the cos-sim is cos 0° = 1, so they are as similar as they can be, regardless of how far each one is from the origin.
Cosine Distance is simply 1 - cos-sim, so it ranges from 0 to 2. That means if the two points are similar, they will have a small distance. But if the two points are dissimilar, they will have a large distance.
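In practice the angle is not measured directly. The cosine is computed from the dot product of the two vectors divided by the product of their lengths. Here is a minimal sketch with function names of my own choosing, checked against the three cases described above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1 - cosine_similarity(a, b)

print(cosine_similarity((1, 0), (0, 1)))   # x-axis vs y-axis, prints 0.0
print(cosine_similarity((1, 0), (-1, 0)))  # opposite sides of the origin, prints -1.0
print(cosine_similarity((1, 0), (3, 0)))   # same axis, same side, prints 1.0
print(cosine_distance((1, 0), (3, 0)))     # prints 0.0
```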
Let’s make this concrete with an example. Let’s say we are building a recommendation engine for Flipkart. The first thing we do is assign tags to every product. Let’s say Smartphones and Mobile Covers have the tag “Electronics” and a Guitar has the tag “Instrument”. Then we draw a coordinate space with one axis per tag. Let’s say we put “Electronics” on the x-axis and “Instrument” on the y-axis. And then we place the products in the coordinate space.
If you find the cosine distance between Guitar and Smartphone, it will be 1, as 1 - cos 90° = 1 - 0 = 1.
If you find the cosine distance between Smartphone and Mobile Cover, it will be 0 (zero), as 1 - cos 0° = 1 - 1 = 0.
Since the distance between Guitar and Smartphone is greater than the distance between Smartphone and Mobile Cover, a person buying a Smartphone will get a recommendation to buy a Mobile Cover rather than a Guitar, because the Smartphone and the Mobile Cover are similar to each other.
So Cosine Distances are measured to find out how similar two products are. Here, we discussed only 2 dimensions. But the number of dimensions grows with the number of tags, and we might assign multiple tags to a product to improve the quality of our recommendations.
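The tag example can be sketched as follows. The tag vectors (1 for a tag the product has, 0 otherwise) and the `recommend` function are hypothetical and only illustrate the idea. A real recommendation engine would involve far more than this.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return 1 - dot / (norm_a * norm_b)

# Hypothetical tag vectors in the order (Electronics, Instrument).
products = {
    "Smartphone": (1, 0),
    "Mobile Cover": (1, 0),
    "Guitar": (0, 1),
}

def recommend(bought, products):
    """Other products sorted from smallest to largest cosine distance to `bought`."""
    target = products[bought]
    others = [name for name in products if name != bought]
    return sorted(others, key=lambda name: cosine_distance(target, products[name]))

print(recommend("Smartphone", products))  # prints ['Mobile Cover', 'Guitar']
```

Adding more tags only makes the vectors longer. The distance computation stays the same.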
The main question now is when each of these distances should be used. Well, it really depends on the algorithm and the data that you are working with. Euclidean Distance is a reasonable default, but if the number of dimensions is very large, you can try Manhattan distance to see if your accuracy improves. Cosine Similarity and Distance can be used to build systems where you want to find similar items among millions of items. But in Machine Learning, it often boils down to trying the candidate distances and seeing which one gives better accuracy on your data. Next time you see a distance mentioned in a model’s parameters, try tweaking that parameter to achieve better results.
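To show what "trying the distances" can look like, here is a toy sketch of a nearest-neighbour classifier whose distance is controlled by a single parameter `p`. The data points, labels and function names are all made up for illustration, so the scores say nothing about real datasets.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(bi - ai) ** p for ai, bi in zip(a, b)) ** (1 / p)

def nearest_label(query, points, labels, p):
    """Label of the training point closest to `query` under order-p distance."""
    distances = [minkowski_distance(query, pt, p) for pt in points]
    return labels[distances.index(min(distances))]

def accuracy(train_x, train_y, test_x, test_y, p):
    """Fraction of test points whose nearest training point has the right label."""
    correct = sum(
        nearest_label(x, train_x, train_y, p) == y for x, y in zip(test_x, test_y)
    )
    return correct / len(test_x)

# Made-up toy data: two small clusters labelled "a" and "b".
train_x = [(0, 0), (1, 1), (5, 5), (6, 5)]
train_y = ["a", "a", "b", "b"]
test_x = [(0, 1), (5, 6)]
test_y = ["a", "b"]

for p in (1, 2, 3):
    print(p, accuracy(train_x, train_y, test_x, test_y, p))
```

On data this simple every choice of `p` classifies both test points correctly. On real data the scores can differ, and that difference is what you would compare.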