Docs return description of binary_confusion_matrix incorrect #183

Open
ChrisBog-MV opened this issue Sep 18, 2023 · 2 comments

Comments

ChrisBog-MV commented Sep 18, 2023

Not sure if I'm being daft here, but I think this line is incorrect:

Compute binary confusion matrix, a 2 by 2 tensor with counts ( (true positive, false negative) , (false positive, true negative) )

It says that the returned tensor contains the values:

( (true positive, false negative) , (false positive, true negative) )

But if you look at the example in the docs, it shows:

>>> input = torch.tensor([0, 1, 0.7, 0.6])
>>> target = torch.tensor([0, 1, 1, 0])
>>> binary_confusion_matrix(input, target)
tensor([[1, 1],
        [0, 2]])

For those inputs, I count:

tn = 1  # index 0
tp = 2  # indices 1, 2
fp = 1  # index 3
fn = 0

So I believe that would mean the actual tensor being returned is either [[fp, tn], [fn, tp]] or [[tn, fp], [fn, tp]]. From my own experiments, I'm pretty sure it's the latter.

Been scratching my head all morning about why my results look wrong and I think this is why.
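
A quick way to see the mismatch is to cross-check against scikit-learn, whose confusion_matrix documents the [[tn, fp], [fn, tp]] layout. This is just a sketch (it assumes torcheval and scikit-learn are installed, and takes the default threshold to be 0.5, as the docs example suggests):

import torch
from sklearn.metrics import confusion_matrix
from torcheval.metrics.functional import binary_confusion_matrix

input = torch.tensor([0, 1, 0.7, 0.6])
target = torch.tensor([0, 1, 1, 0])

# torcheval maps the 0.7 and 0.6 scores to predicted 1s, so predictions are [0, 1, 1, 1]
print(binary_confusion_matrix(input, target))
# tensor([[1, 1],
#         [0, 2]])

# sklearn's documented binary layout is [[tn, fp], [fn, tp]] -- same numbers
print(confusion_matrix(target.numpy(), (input >= 0.5).int().numpy()))
# [[1 1]
#  [0 2]]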

bobakfb (Contributor) commented Sep 18, 2023

Hi @ChrisBog-MV, thanks for pointing this out!

@JKSenthil I went through and looked at this. The issue is that we are following sklearn's convention of using

TN | FP
FN | TP

which is not standard for binary confusion matrices.

I would suggest we flip the rows and columns here with a bit of slicing:

return _confusion_matrix_compute(matrix, normalize)[[1,0]][:, [1,0]]

Tests and docstring examples will need to be updated as well. But basically we will then return

TP | FN
FP | TN

which is standard.
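
As a rough sketch of what that double indexing does (standalone, not the actual torcheval internals, since _confusion_matrix_compute is a private helper; here the same indexing is applied to a plain 2x2 tensor):

import torch

# sklearn-style layout currently returned by binary_confusion_matrix:
# [[TN, FP],
#  [FN, TP]]
matrix = torch.tensor([[1, 1],
                       [0, 2]])

# [[1, 0]] reverses the rows, [:, [1, 0]] reverses the columns,
# giving the standard [[TP, FN], [FP, TN]] layout
print(matrix[[1, 0]][:, [1, 0]])
# tensor([[2, 0],
#         [1, 1]])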

anordin95 commented Jan 6, 2024

Hi!

+1. Also, I made an example that illustrates the problem by using a unique expected count for each category (i.e. FP, FN, etc.):

confusion_matrix_example.py

import torch
import torcheval.metrics
import numpy as np

binary_confusion_matrix_metric = torcheval.metrics.BinaryConfusionMatrix()
# Intentionally construct a dataset that should evaluate to these counts:
# TP: 4; TN: 3; FP: 2; FN: 1.
predictions_and_labels = np.array([
    # False negatives:
    (0, 1),
    # False positives:
    (1, 0),
    (1, 0),
    # True negatives:
    (0, 0),
    (0, 0),
    (0, 0),
    # True positives:
    (1, 1),
    (1, 1),
    (1, 1),
    (1, 1),
])

predictions = predictions_and_labels[:, 0]
labels = predictions_and_labels[:, 1]

binary_confusion_matrix_metric.update(torch.from_numpy(predictions), torch.from_numpy(labels))
binary_confusion_matrix = binary_confusion_matrix_metric.compute()
print(binary_confusion_matrix)

# The torcheval docs indicate the return value should be:
# ( (true positive, false negative) , (false positive, true negative) )
# So, we'd expect ( (4, 1), (2, 3) ).

# However, we don't observe what we expect!
# Instead I think the docs should say:
# ( (true negative, false positive) , (false negative, true positive) )

$ python confusion_matrix_example.py

tensor([[3., 2.],
        [1., 4.]])

PS: a torch.flip(binary_confusion_matrix, dims=[0, 1]) would also do the trick and perhaps be a bit more readable.
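
For reference, a quick check of that torch.flip suggestion (just a sketch, using the same [[TN, FP], [FN, TP]] tensor printed above):

import torch

# [[TN, FP], [FN, TP]] as currently returned
matrix = torch.tensor([[3., 2.],
                       [1., 4.]])

# Reversing both dimensions yields the standard [[TP, FN], [FP, TN]] layout
print(torch.flip(matrix, dims=[0, 1]))
# tensor([[4., 1.],
#         [2., 3.]])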
