A confusion matrix is a table that compares predicted labels against
actual labels; this post looks at the binary classification case.
It is an important starting tool for understanding how well a binary
classifier is performing, and it yields a whole set of metrics to
analyse and compare.

Here, I present an intuitive visualization, since the definitions often
get confusing.

How to read the visualization?

Before we go ahead and read the visualization, let us remember the definitions.

True Negatives - All samples that were predicted as negative and
were truly negative.

False Negatives - All samples that were predicted as negative but
were in fact positive.

True Positives - All samples that were predicted as positive and
were truly positive.

False Positives - All samples that were predicted as positive but
were in fact negative.
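
To make the four definitions concrete, here is a minimal sketch that counts them from a list of actual labels and a list of predicted labels. The function name `confusion_counts`, the 1 = positive / 0 = negative encoding, and the toy data are all assumptions made for illustration; they are not part of the visualization.

```python
def confusion_counts(y_true, y_pred):
    """Count (TN, FP, FN, TP) for binary labels, assuming 1 = positive, 0 = negative."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive, truly positive
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted positive, in fact negative
        elif predicted == 0 and actual == 1:
            fn += 1  # predicted negative, in fact positive
        else:
            tn += 1  # predicted negative, truly negative
    return tn, fp, fn, tp


# Hypothetical toy data, made up for this example only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print(tn, fp, fn, tp)  # 3 1 1 3
```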

Now, each ray in the visualization above is labelled with the name of the
metric it measures. The starting point of the ray is the numerator of that
metric, and the span of the ray covers the adjacent terms that are summed
to form the denominator. Note that each metric is essentially a fraction.

Let us read the most popular ones from the visualization.

$\text{Recall} = \frac{TP}{TP+FN}$

$\text{Precision} = \frac{TP}{TP+FP}$
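
As a quick numeric check of the two formulas, the sketch below evaluates them on made-up counts. The function names and the counts are hypothetical, and returning 0.0 when a denominator is zero is just one possible convention, assumed here to avoid a division error.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 if there are no actual positives (assumed convention)."""
    denominator = tp + fn
    return tp / denominator if denominator else 0.0


def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 if nothing was predicted positive (assumed convention)."""
    denominator = tp + fp
    return tp / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 30, 10, 20

print(recall(tp, fn))     # 30 / (30 + 20) = 0.6
print(precision(tp, fp))  # 30 / (30 + 10) = 0.75
```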

When are they useful?

These metrics come in handy when trying to determine the best threshold
for separating the positive class from the negative class in a binary
classification problem.

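For instance, a popular trade-off is the precision-recall trade-off, which
is shown in the graph below. Precision tends to be the more jagged of the
two curves: as the threshold moves, both its numerator ($TP$) and its
denominator ($TP+FP$) change, whereas the denominator of recall ($TP+FN$)
is the fixed number of actual positives.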

More simply, we might just plot a Precision vs. Recall curve. This curve
shows that we still have scope for improvement towards the right, where
precision suddenly dips as recall increases.
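
The trade-off can be sketched by sweeping a threshold over classifier scores and recomputing both metrics each time. Everything here is an assumption for illustration: the function name, the toy scores, the rule "predict positive when score >= threshold", and the convention that precision is 1.0 when nothing is predicted positive.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when samples with score >= threshold are predicted positive."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    # Assumed conventions for empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and scores, made up for this example only.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

for threshold in [0.2, 0.5, 0.8]:
    p, r = precision_recall_at(y_true, scores, threshold)
    print(threshold, round(p, 2), round(r, 2))
# 0.2 0.6 1.0
# 0.5 0.67 0.67
# 0.8 1.0 0.33
```

On this toy data, raising the threshold buys precision at the cost of recall, which is the trade-off in question.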

Another popular curve is the ROC curve, which plots the True Positive
Rate against the False Positive Rate. It can also be seen as
Sensitivity vs. 1-Specificity. The closer this curve is to the
top-left corner, the better the classifier. Conversely, the closer
the curve is to the diagonal line, the closer the classifier is to
random guessing.
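
As a rough sketch of how the points of such a curve can be computed, the code below uses every distinct score as a threshold and then estimates the area under the curve with the trapezoidal rule. The function names and toy data are hypothetical, and the sketch assumes that both classes are present in `y_true`; otherwise one of the rates would divide by zero.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, from strictest to loosest."""
    positives = sum(y_true)               # assumes labels are 0/1
    negatives = len(y_true) - positives   # assumes both classes are present
    points = [(0.0, 0.0)]                 # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(y_true, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(y_true, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores, made up for this example only.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

points = roc_points(y_true, scores)
print(round(area_under_curve(points), 3))  # 0.889
```

An area of 1.0 corresponds to a curve that hugs the top-left corner, while the diagonal gives 0.5.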

Which of these is most useful depends on the data at hand, but these
curves should be a good starting point for analysing the first
binary classifier that one builds.