Visualizing the Confusion Matrix
Summary of the terminology in a Confusion Matrix
A confusion matrix, in its most common form, is a 2×2 matrix built for binary classification problems. It is an important starting tool for understanding how well a binary classifier is performing, and it yields a number of metrics that can be analysed and compared.
Here, I present an intuitive visualization, since the definitions often get confusing.
How to read the visualization?
Before we go ahead and read the visualization, let us remember the definitions.

True Negatives: All samples that were predicted as negative and were truly negative

False Negatives: All samples that were predicted as negative and were in fact positive

True Positives: All samples that were predicted as positive and were truly positive

False Positives: All samples that were predicted as positive and were in fact negative
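The four definitions above can be made concrete with a small counting routine. This is a minimal sketch rather than part of the original visualization: the function name `confusion_counts`, the 0/1 label encoding (1 for positive, 0 for negative) and the toy labels are all assumptions chosen for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels.

    Assumes labels are encoded as 1 (positive) and 0 (negative).
    """
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive, truly positive
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted positive, in fact negative
        elif predicted == 0 and actual == 1:
            fn += 1  # predicted negative, in fact positive
        else:
            tn += 1  # predicted negative, truly negative
    return tn, fp, fn, tp


# Hypothetical toy data: true labels and a classifier's predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print(tn, fp, fn, tp)  # 3 1 1 3
```

Every sample falls into exactly one of the four cells, so the counts always sum to the number of samples.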
Now, each ray in the visualization above is labelled with the name of the metric that we are going to measure. The start point of each ray represents the numerator of that metric, and the span of the ray covers the adjacent terms whose sum forms the denominator. Note that each metric is essentially a fraction.
Let us read the most popular ones from the visualization.
$\text{Recall} = \frac{TP}{TP+FN}$
$\text{Precision} = \frac{TP}{TP+FP}$
When are they useful?
These metrics come in handy when trying to determine the best threshold to separate the positive class from the negative class in a binary classification problem.
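To see how the threshold choice moves the two metrics, one can recompute precision and recall at a few candidate thresholds. The sketch below is illustrative only: the helper `precision_recall_at`, the rule "score >= threshold means positive", the fallback values used when a denominator is zero, and the toy scores are all assumptions, not something taken from the original analysis.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when samples with score >= threshold are predicted positive.

    Assumes labels are 1 (positive) and 0 (negative).
    """
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Assumed conventions for empty denominators: precision 1.0, recall 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical toy data: true labels and predicted scores.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

for threshold in (0.2, 0.5, 0.8):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.2 precision=0.60 recall=1.00
# threshold=0.5 precision=0.67 recall=0.67
# threshold=0.8 precision=1.00 recall=0.33
```

On this toy data, raising the threshold trades recall for precision, which is the behaviour one looks for when picking an operating point.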
For instance, a popular tradeoff is the precision-recall tradeoff, which is visualized in the graph below. Precision tends to be more jagged by nature: its denominator, TP + FP, changes with the threshold, whereas recall's denominator, TP + FN, stays fixed.
More simply, we might just plot a precision vs. recall curve. This curve shows that we still have scope for improvement towards the right, where precision dips suddenly as recall increases.
Another popular curve is the ROC curve, which plots the True Positive Rate against the False Positive Rate. It can also be seen as Sensitivity vs. 1 - Specificity. The closer this curve is to the top-left corner, the better the classifier. Conversely, the closer the curve is to the diagonal, the closer the classifier is to random guessing.
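The points of a ROC curve can be traced by lowering the threshold one score at a time and recording the False Positive Rate, FP / (FP + TN), and the True Positive Rate, TP / (TP + FN), at each step. The following is a rough sketch under stated assumptions: the helper names `roc_points` and `auc`, the "score >= threshold" rule, the trapezoidal area estimate and the toy data are all illustrative, and the code assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return a list of (fpr, tpr) points, from the strictest threshold to the loosest.

    Assumes labels are 1 (positive) and 0 (negative), with both classes present.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve, via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical toy data: true labels and predicted scores.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

points = roc_points(y_true, scores)
print(points[0], points[-1])  # (0.0, 0.0) (1.0, 1.0)
print(round(auc(points), 3))  # 0.889
```

An area of 0.5 corresponds to the diagonal of a random classifier, and an area of 1.0 to a curve that passes through the top-left corner.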
Which of these is most useful depends on the data at hand, but these curves should be a good starting point for analysing the first binary classifier that one builds.