Unable to recognize GPU address - Spark Distributor Tensorflow #171

Open
ghost opened this issue Sep 22, 2020 · 0 comments

ghost commented Sep 22, 2020

I have a local Spark 3.0 cluster made up of 3 machines. Two of the machines have 2 NVIDIA GPUs, and the third is the Spark client/master, which has no NVIDIA GPU.
When I start the cluster, the dashboard shows the GPUs as resources.
I'm trying to run the example posted on the Spark TensorFlow Distributor page.
I create the Spark context like this:

import pyspark

sc = pyspark.SparkContext(
    master="spark://192.168.1.113:7077",
    appName="Spark GPU",
)

After that, I can see the GPUs listed as executor resources.

However, when I run the following:

MirroredStrategyRunner(num_slots=8).run(train)

it fails with the following error:

raise ValueError(f'Found GPU addresses {addresses} which '
ValueError: Found GPU addresses [''] which are not all in the correct format for CUDA_VISIBLE_DEVICES, which requires integers with no zero padding.
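
For reference, the format rule named in the error message (integers with no zero padding) can be sketched roughly as below. This is only an illustration of the rule as the message states it, not the library's actual code; `is_valid_cuda_address` and `invalid_addresses` are hypothetical helper names.

```python
import re

# Hypothetical helpers illustrating the rule in the error message:
# an address must be a non-negative integer with no zero padding,
# e.g. "0", "1", "10" are accepted; "", "01", "gpu0" are not.
_ADDRESS_RE = re.compile(r"(0|[1-9][0-9]*)")


def is_valid_cuda_address(address: str) -> bool:
    """Return True if the address is usable in CUDA_VISIBLE_DEVICES."""
    return _ADDRESS_RE.fullmatch(address) is not None


def invalid_addresses(addresses):
    """Return the subset of addresses that break the format rule."""
    return [a for a in addresses if not is_valid_cuda_address(a)]
```

With the value from the error above, `invalid_addresses([''])` returns `['']`: the single address handed to the task is an empty string, so the check fails before any GPU is selected.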

I'm not sure why it wasn't able to detect the GPUs on the remote machines.
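
One thing that may be worth checking is what the GPU discovery step reports on each worker. Spark expects a discovery script to print JSON with a `name` and an `addresses` array, for example `{"name": "gpu", "addresses": ["0", "1"]}`. Whether the workers here use a discovery script at all is an assumption, since the setup above does not say how the GPUs were configured. The sketch below takes such output as a string and flags any address that is empty or not a plain integer; `check_discovery_output` is a hypothetical helper, not part of Spark or the distributor.

```python
import json


def check_discovery_output(raw: str):
    """Parse discovery-script style JSON and report malformed addresses.

    Hypothetical helper. Assumes `raw` looks like
    {"name": "gpu", "addresses": ["0", "1"]} with string addresses.
    Returns (resource name, all addresses, malformed addresses).
    """
    info = json.loads(raw)
    addresses = info.get("addresses", [])
    bad = [
        a for a in addresses
        # Valid: ASCII digits only, and no leading zero unless it is "0".
        if not (a.isascii() and a.isdigit() and (a == "0" or not a.startswith("0")))
    ]
    return info.get("name"), addresses, bad
```

Running this against the output captured on each GPU machine would show whether an empty address like the `''` in the error is already present at discovery time.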
