Compute principal components slow on Windows #3398

Open
zm711 opened this issue Sep 11, 2024 · 4 comments
Labels
continuous integration Related to CI performance Performance issues/improvements qualitymetrics Related to qualitymetrics module

Comments

@zm711
Collaborator

zm711 commented Sep 11, 2024

#3249

Based on @chrishalcrow's testing, computing PCA on Windows is an extremely slow step in our test suite. I know the current implementation goes straight to ProcessPoolExecutor, so maybe we need to revisit this, and I can test locally on Windows? @alejoe91?

@zm711 zm711 added qualitymetrics Related to qualitymetrics module continuous integration Related to CI performance Performance issues/improvements labels Sep 11, 2024
@alejoe91
Member

Thanks for writing this up @zm711.

I think that the problem could also be an interaction between processes and threads. Sklearn will by default try to max out the number of threads, and we add our own layer of process parallelization on top of that, so the two can end up competing for the same cores. In the ChunkRecordingExecutor we have an additional max_threads_per_process arg, but the machinery there is a bit more complicated. I think we should give it a try and see if it fixes the issue.
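To make the idea concrete, here is a minimal sketch of capping the native thread pools inside each worker process. It is not the SpikeInterface implementation: `threads_per_process`, `fit_pca_capped` and `run_pca_in_parallel` are hypothetical names made up for illustration, and it assumes `scikit-learn` and `threadpoolctl` are installed. Only `ProcessPoolExecutor`, `sklearn.decomposition.PCA` and `threadpoolctl.threadpool_limits` are real library calls.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def threads_per_process(n_jobs, n_cpus=None):
    """Share the available cores between worker processes, at least 1 thread each."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return max(1, n_cpus // max(1, n_jobs))


def fit_pca_capped(task):
    """Worker (hypothetical): fit one PCA with the native thread pools capped."""
    # Imported inside the worker so the BLAS/OpenMP libraries are loaded in this
    # process before threadpoolctl tries to limit them.
    from sklearn.decomposition import PCA
    from threadpoolctl import threadpool_limits

    data, n_components, max_threads = task
    # The cap only applies inside the `with` block, in this process.
    with threadpool_limits(limits=max_threads):
        return PCA(n_components=n_components).fit_transform(data)


def run_pca_in_parallel(datasets, n_components=5, n_jobs=4):
    """Fit one PCA per array in `datasets` using `n_jobs` worker processes."""
    max_threads = threads_per_process(n_jobs)
    tasks = [(data, n_components, max_threads) for data in datasets]
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(fit_pca_capped, tasks))
```

On Windows, worker processes are spawned rather than forked, so a call such as `run_pca_in_parallel(list_of_arrays, n_jobs=4)` needs to sit under an `if __name__ == "__main__":` guard. Whether this actually removes the slowdown reported here is untested; it is only one way to check the oversubscription hypothesis.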

@zm711
Collaborator Author

zm711 commented Sep 12, 2024

Let me link the issue where Chris saw this happening on his Windows machine too, with newer versions of sklearn but not older ones.
#2817

@zm711
Collaborator Author

zm711 commented Sep 12, 2024

But it would be cool if that speeds things up on Windows, since this is a big workflow and testing bottleneck. I haven't dug deeply into the PCA code to see how complicated it would be :)

@HWWiggins

Please do try to reduce the execution time of the PCA computation in the export to Phy process. That will be a significant help to the compute pipeline on Windows machines. Thank you.
