ParallelRunner hangs on Linux Server #4176
Comments
Can you provide some more context? If possible, please share a simplified version of the repository so that we can try to reproduce the problem locally.
@noklam Yeah, of course. Let me try to put together something simple that will hang on the server and then I'll share the repo with you.
@noklam Okay, I figured out why it is not working. I just don't understand why it fails on Linux but works on Windows. Here is a simple example: https://github.com/Dekermanjian/test-parallel-runner The ParallelRunner hangs on the Linux server because I am loading a parquet file in my settings.py file. When I load that file in the simple example, the ParallelRunner hangs at the loading dataset stage. If you comment that line out (line 6), it works. You can generate the data by running the notebook I created. Sorry, let me add the command to run: kedro run --runner=ParallelRunner -p data_processing
Hey @noklam, were you able to reproduce the issue? I am wondering if this is just happening on my end because of how the Linux server I am using is set up.
Hey @Dekermanjian, I was trying to reproduce this but I lack the input data. Would you be able to commit it to your test project repo or share it with us? (Assuming that it is sanitised and shareable.)
@ankatiyar, yes, I can do that now. I am sorry, I forgot to adjust the .gitignore before pushing. Okay, I have now pushed the data to the repo. I noticed a couple of things while testing further. I noticed that in
@Dekermanjian Thanks for the quick response. I have been able to reproduce it on Gitpod, which runs Linux, but it runs just fine locally on my Mac M1. It also works when I use
Okay, perfect! That is also what I am experiencing. In my actual project, I create a dynamic pipeline that runs a model on patient-level data every hour (one pipeline per patient). Some patients don't have any new data between hours, so I read in a file in Thank you for taking the time to reproduce this issue, @ankatiyar
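The comments above trace the hang to a file being read at import time in settings.py. One way to test whether the import-time read is the trigger is to defer the read until the data is first needed. The sketch below is an untested idea, not a confirmed fix: `load_lookup` is a hypothetical helper, and a plain text read stands in for the parquet read so that the example runs with the standard library only.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_lookup(path: str) -> tuple[str, ...]:
    """Read the lookup file on the first call and reuse the cached result afterwards.

    Hypothetical helper: in the real project the body would be the parquet
    read that currently runs at import time in settings.py.
    """
    # Stand-in for the parquet read; returns one entry per line of the file.
    return tuple(Path(path).read_text().splitlines())
```

Because nothing is read at import time, importing the module in a worker process does not touch the file. If the first call still happens in the parent process before the workers start, the behaviour may be unchanged, so this is mainly useful for narrowing down the cause.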
Description
I have a pipeline that I would like to run using the ParallelRunner. When I run this pipeline on my local Windows machine, it works just fine. However, when I run the exact same pipeline on a Linux server (Rocky Linux), it hangs at the loading datasets stage.
Kedro version (pip show kedro or kedro -V): 0.19.8
Python version (python -V): 3.11.8
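A possible line of investigation, not confirmed anywhere in this thread: Python's multiprocessing module uses different default start methods per platform. On Python 3.11, Linux defaults to fork, while Windows and macOS default to spawn. That matches the pattern reported here (hangs on Linux, works on Windows and on a Mac). The small sketch below only reports which start method the current interpreter uses; `start_method_report` is a hypothetical helper name and it does not touch Kedro at all.

```python
import multiprocessing as mp
import sys


def start_method_report() -> dict:
    """Collect the platform name and the multiprocessing start methods in use."""
    return {
        "platform": sys.platform,
        # Start method that new worker processes will use by default.
        "default_start_method": mp.get_start_method(),
        # All start methods this platform supports.
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(start_method_report())
```

Running this on both the Windows machine and the Linux server would show whether the two environments start worker processes differently, which could help the maintainers narrow down the cause.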