Visualizing the Confusion Matrix

A summary of the terminology in the confusion matrix


The confusion matrix is a matrix built for binary classification problems. It is an important starting tool for understanding how well a binary classifier is performing, and it provides a number of metrics that can be analysed and compared.

Here, I present an intuitive visualization, since the definitions get confusing most of the time.

The Confusion Matrix

How to read the visualization?

Before we go ahead and read the visualization, let us remember the definitions.

Now, each ray in the visualization above is labelled with the name of the metric it measures: the start point of the ray represents the numerator of that metric, and the span of the ray represents its denominator, i.e. the sum of the terms it covers. Note that each metric is essentially a fraction.

Let us read the most popular ones from the visualization.

$$\text{Recall} = \frac{TP}{TP+FN} \qquad \text{Precision} = \frac{TP}{TP+FP}$$
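
As a quick sanity check, here is a minimal sketch in plain NumPy that computes the four cells of the confusion matrix and the two metrics above; the `y_true` and `y_pred` arrays are made-up values for illustration, not data from this post.

```python
import numpy as np

# Made-up labels and hard 0/1 predictions, purely for illustration.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative
fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, actually positive
tn = np.sum((y_pred == 0) & (y_true == 0))  # predicted negative, actually negative

recall = tp / (tp + fn)     # of all actual positives, how many did we catch?
precision = tp / (tp + fp)  # of all predicted positives, how many are correct?

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"recall={recall:.2f} precision={precision:.2f}")
```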

When are they useful?

These metrics come in handy when trying to determine the best threshold for separating the positive class from the negative class in a binary classification problem.
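
For illustration, here is a rough sketch of such a threshold sweep, assuming a model that outputs probability scores; the `y_true` and `scores` arrays and the threshold grid are arbitrary choices made for this example.

```python
import numpy as np

# Made-up labels and predicted probabilities, purely for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.65, 0.55, 0.45, 0.9, 0.3])

for threshold in np.arange(0.1, 1.0, 0.2):
    # Predict positive whenever the score clears the threshold.
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```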

For instance, a popular trade-off is the precision-recall trade-off, realized in the graph below. Precision tends to be more wriggly by nature: as the threshold increases, recall can only decrease, while precision can move in either direction whenever a few predictions flip.

More simply, we might just look at a precision vs. recall curve. In this example, the curve shows that there is still scope for improvement towards the right, since precision suddenly dips as recall increases.
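
If scikit-learn and matplotlib are available, a precision vs. recall curve takes only a few lines; the sketch below uses `precision_recall_curve` on the same made-up labels and scores as before.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Made-up labels and predicted probabilities, purely for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.65, 0.55, 0.45, 0.9, 0.3])

# precision/recall pairs for every candidate threshold.
precision, recall, _ = precision_recall_curve(y_true, scores)

plt.plot(recall, precision, marker=".")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision vs. Recall")
plt.show()
```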

Another popular curve is the ROC curve, which plots the True Positive Rate against the False Positive Rate; equivalently, Sensitivity vs. 1 - Specificity. The closer this curve is to the top-left corner, the better the classifier. Conversely, the closer the curve is to the diagonal, the more likely the classifier is to be no better than a random one.
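
Similarly, here is a rough sketch of an ROC curve, again assuming scikit-learn and matplotlib and reusing the made-up data; `roc_curve` gives the points and `roc_auc_score` summarizes the curve as a single number.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up labels and predicted probabilities, purely for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.65, 0.55, 0.45, 0.9, 0.3])

fpr, tpr, _ = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

plt.plot(fpr, tpr, label=f"classifier (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random classifier")
plt.xlabel("False Positive Rate (1 - Specificity)")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()
```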

Which metric is most useful depends on the problem and the data, but these curves should be a good starting point when analysing the first binary classifier one builds.