[fix] correct logic k-NN algos kneighbors() call when algorithm='brute' and fit with GPU #2056

Open
wants to merge 15 commits into main

Conversation

icfaust
Contributor

@icfaust icfaust commented Sep 17, 2024

Description

Calling a k-NN estimator's kneighbors() after fitting on GPU with algorithm='brute' assumes that the model was fit using daal4py, and fails to yield the neighbors. This adds corrections to that check, and a test for kneighbors without input.

kneighbors is a bit special because it doesn't require an input to yield numerical results (it re-uses the fitted X values). The fit can occur in either the daal4py backend or the onedal backend. The checks in predict and kneighbors now determine which train object was generated and use it. A minimal reproduction sketch is given below.
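
A minimal reproduction sketch of the affected pattern (illustrative, not taken from the PR): it assumes a SYCL GPU device is available and uses sklearnex's `config_context(target_offload="gpu")` for GPU execution; the relevant call is the final `kneighbors()` with the default X=None.

```python
# Hypothetical sketch: assumes a SYCL GPU device and sklearnex GPU offload.
import numpy as np
from sklearnex import config_context
from sklearnex.neighbors import NearestNeighbors

X_train = np.array([[0.0, 0.0], [0.0, 0.5], [1.0, 1.0]])

with config_context(target_offload="gpu"):
    neigh = NearestNeighbors(n_neighbors=2, algorithm="brute").fit(X_train)

    # With an explicit query, the dtype and queue are taken from the input.
    ind = neigh.kneighbors(X_train, 2, return_distance=False)

    # With the default X=None the fitted data are reused; before this fix,
    # the brute-force GPU fit could be routed through the daal4py check
    # and fail to return neighbors.
    ind_default = neigh.kneighbors(return_distance=False)
```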


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added respective label(s) to the PR if I have permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.

@icfaust
Contributor Author

icfaust commented Sep 17, 2024

/intelci: run

@icfaust
Contributor Author

icfaust commented Sep 18, 2024

onedal knn implementation is a nightmare

@icfaust icfaust marked this pull request as ready for review September 19, 2024 04:57
@icfaust
Contributor Author

icfaust commented Sep 20, 2024

/intelci: run

Contributor

@samir-nasibli samir-nasibli left a comment

Nice work @icfaust.
Just a question: could we validate the perf, since the flow has changed a little bit?

onedal/neighbors/neighbors.py (outdated review thread, resolved)
@icfaust
Contributor Author

icfaust commented Sep 25, 2024

/intelci: run

@icfaust
Contributor Author

icfaust commented Sep 25, 2024

/intelci: run

@icfaust icfaust added the bug Something isn't working label Oct 7, 2024
Contributor

@samir-nasibli samir-nasibli left a comment

@icfaust I am good with the changes proposed. Just minor comments.
I recommend waiting for CI to come back for the last validation.

result = neigh.kneighbors(test, 2, return_distance=False)
result = _as_numpy(result)
assert "sklearnex" in neigh.__module__
assert_allclose(result, [[2, 0]])
result = neigh.kneighbors()
Contributor

Just for my understanding: could you please explain why this is required after the assertion call?

Contributor Author

All uses of kneighbors in test_neighbors pass a value to the kwarg X. The datatype and queue are always extracted from it. However, that doesn't cover the case where the default X=None is used. This addition makes sure that the default use of kneighbors runs, though the output is not checked via assert_allclose. It closes a testing gap; see the sketch below.
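
For context, a small sketch of the default-X behavior using stock scikit-learn (not the PR's test code): with X=None the fitted data are queried against themselves and each sample is excluded from its own neighbor list, so there is no user-supplied input to take the dtype or queue from.

```python
# Illustrative only: stock scikit-learn semantics of kneighbors(X=None).
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0], [1.0], [3.0]])
neigh = NearestNeighbors(n_neighbors=1).fit(X)

# Explicit query: each training point is its own nearest neighbor.
print(neigh.kneighbors(X, return_distance=False))  # indices 0, 1, 2

# Default X=None: the fitted data are reused and each point is excluded
# from its own neighbors; no input is available to infer dtype/queue.
print(neigh.kneighbors(return_distance=False))     # indices 1, 0, 1
```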

Contributor
@samir-nasibli samir-nasibli Oct 7, 2024

@icfaust thank you! I think it makes sense to add a note in the comments, since it's not that obvious in the code base.

Contributor Author

Will do!

Labels
bug Something isn't working
Projects
None yet
2 participants