Confusion Matrix in Machine Learning
In the field of machine learning, and specifically the problem of
statistical classification, a confusion matrix, also known as an error
matrix, is a table that is often used to describe the performance of a
classification model (or "classifier") on a set of test data for which the
true values are known. It allows the performance of an algorithm to be
visualized.
It also allows easy identification of confusion between classes, for
example, one class being commonly mislabelled as the other. Most performance
measures are computed from the confusion matrix.
This article covers:
1. What the confusion matrix is and why you need to use it.
2. How to calculate a confusion matrix for a 2-class classification problem
from scratch.
3. How to create a confusion matrix in Python.
Confusion Matrix:
A confusion matrix is a summary of prediction results on a
classification problem.
The number of correct and incorrect predictions is summarized with count
values and broken down by each class. This is the key to the confusion
matrix.
The confusion matrix shows the ways in which your classification model is
confused when it makes predictions.
It gives us insight not only into the errors being made by a classifier but,
more importantly, into the types of errors that are being made.
Here,
• Class 1: Positive
• Class 2: Negative
Definition of the Terms:
• Positive (P): Observation is positive (for example: is an
apple).
• Negative (N): Observation is not positive (for example:
is not an apple).
• True Positive (TP): Observation is positive, and is
predicted to be positive.
• False Negative (FN): Observation is positive, but is
predicted negative.
• True Negative (TN): Observation is negative, and is
predicted to be negative.
• False Positive (FP): Observation is negative, but is
predicted positive.
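To make these four terms concrete, here is a minimal sketch that counts TP, FN, TN and FP for a 2-class problem (1 = positive, 0 = negative), using the same actual and predicted lists that appear in the Python script at the end of this article:

# Count TP, FN, TN and FP for a binary problem (1 = positive, 0 = negative).
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

print(TP, FN, TN, FP)   # 3 1 4 2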
Classification Rate/Accuracy:
Classification Rate or Accuracy is given by the relation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
However, there are problems with accuracy. It assumes equal costs for both
kinds of errors. A 99% accuracy can be excellent, good, mediocre, poor or
terrible depending on the problem.
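As a minimal sketch, accuracy can be computed directly from the four counts; the values below (TP = 3, FN = 1, TN = 4, FP = 2) are taken from the previous sketch and are purely illustrative:

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
TP, FN, TN, FP = 3, 1, 4, 2          # illustrative counts from the sketch above
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)                      # 0.7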
Recall:
Recall can be defined as the ratio of the total number of correctly
classified positive examples to the total number of positive examples. High
recall indicates that the class is correctly recognized (a small number of FN).
Recall is given by the relation:
Recall = TP / (TP + FN)
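A minimal sketch with the same illustrative counts:

# Recall = TP / (TP + FN)
TP, FN = 3, 1                        # illustrative counts
recall = TP / (TP + FN)
print(recall)                        # 0.75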
Precision:
To get the value of precision we divide the total number of correctly
classified positive examples by the total number of predicted positive
examples. High precision indicates that an example labelled as positive is
indeed positive (a small number of FP).
Precision is given by the relation:
Precision = TP / (TP + FP)
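And the corresponding sketch for precision, again with the illustrative counts:

# Precision = TP / (TP + FP)
TP, FP = 3, 2                        # illustrative counts
precision = TP / (TP + FP)
print(precision)                     # 0.6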
High recall, low precision: This means that most of the
positive examples are correctly recognized (low FN) but there are a lot of
false positives.
Low recall, high precision: This shows that we miss a lot of
positive examples (high FN) but those we predict as positive are indeed
positive (low FP).
F-measure:
Since we have two measures (Precision and Recall), it helps to have a single
measurement that represents both of them. We calculate an F-measure, which
uses the Harmonic Mean in place of the Arithmetic Mean as it punishes extreme
values more:
F-measure = (2 * Recall * Precision) / (Recall + Precision)
The F-measure will always be nearer to the smaller value of
Precision or Recall.
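A minimal sketch combining the illustrative precision and recall values computed above:

# F-measure = (2 * Recall * Precision) / (Recall + Precision)
precision, recall = 0.6, 0.75        # illustrative values from the sketches above
f_measure = (2 * recall * precision) / (recall + precision)
print(round(f_measure, 2))           # 0.67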
Let’s consider an example now, in which we have infinitely many data
elements of class B and a single element of class A, and the model predicts
class A for every instance in the test data.
Here,
Precision: 0.0
Recall: 1.0
Now:
Arithmetic mean: 0.5
Harmonic mean: 0.0
Taking the arithmetic mean, the model would appear to be 50% correct, despite
this being the worst possible outcome! Taking the harmonic mean instead, the
F-measure is 0.
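A minimal sketch of this comparison (the harmonic mean of two values x and y is 2xy / (x + y); the zero-denominator case is handled explicitly):

# Compare the arithmetic and harmonic means of precision and recall
# for the degenerate classifier described above.
precision, recall = 0.0, 1.0

arithmetic_mean = (precision + recall) / 2
harmonic_mean = 0.0 if precision + recall == 0 else (2 * precision * recall) / (precision + recall)

print(arithmetic_mean)   # 0.5
print(harmonic_mean)     # 0.0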
Example to interpret a confusion matrix:
To simplify the confusion matrix used in this example, I have added all the
terms like TP, FP, etc. and the row and column totals in the following table:

                     Predicted Positive   Predicted Negative   Total
Actual Positive          TP = 100             FN = 5            105
Actual Negative          FP = 10              TN = 50            60
Total                      110                  55              165
Now,
Classification Rate/Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (100 + 50) / (100 + 5 + 10 + 50) ≈ 0.91
Recall: Recall gives us an idea of how often the model predicts yes
when the actual value is yes.
Recall = TP / (TP + FN) = 100 / (100 + 5) ≈ 0.95
Precision: Precision tells us how often the prediction is correct
when the model predicts yes.
Precision = TP / (TP + FP) = 100 / (100 + 10) ≈ 0.91
F-measure:
F-measure = (2 * Recall * Precision) / (Recall + Precision) = (2 * 0.95 * 0.91) / (0.95 + 0.91) ≈ 0.93
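The arithmetic above can be checked with a short sketch using the counts from the table (TP = 100, FN = 5, FP = 10, TN = 50):

# Verify the worked example: TP = 100, FN = 5, FP = 10, TN = 50.
TP, FN, FP, TN = 100, 5, 10, 50

accuracy  = (TP + TN) / (TP + TN + FP + FN)
recall    = TP / (TP + FN)
precision = TP / (TP + FP)
f_measure = (2 * recall * precision) / (recall + precision)

print(round(accuracy, 2))    # 0.91
print(round(recall, 2))      # 0.95
print(round(precision, 2))   # 0.91
print(round(f_measure, 2))   # 0.93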
Here is a Python code example which shows how to create a confusion matrix
for a model's predictions. For this, we need to import the confusion_matrix
function from the sklearn library, which helps us create the confusion matrix.
Below is the Python implementation of the above explanation:
Note that this program might not run on the GeeksforGeeks IDE, but it can run
easily on your local Python interpreter, provided you have installed the
required libraries.
# Python script for confusion matrix creation.
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# True labels and the model's predictions.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Build the confusion matrix and the summary metrics.
results = confusion_matrix(actual, predicted)

print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual, predicted))
print('Report : ')
print(classification_report(actual, predicted))
OUTPUT ->
Confusion Matrix :
[[4 2]
 [1 3]]
Accuracy Score : 0.7
Report :
             precision    recall  f1-score   support

          0       0.80      0.67      0.73         6
          1       0.60      0.75      0.67         4

avg / total       0.72      0.70      0.70        10
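For a 2-class problem, the four counts discussed earlier can also be unpacked directly from the sklearn result. A minimal sketch, reusing the same actual and predicted lists as above:

# Unpack TN, FP, FN, TP from the binary confusion matrix.
from sklearn.metrics import confusion_matrix

actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# For a 2-class problem, confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print('TP:', tp, 'FN:', fn, 'TN:', tn, 'FP:', fp)   # TP: 3 FN: 1 TN: 4 FP: 2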