[rank0]: Traceback (most recent call last):
[rank0]:   File "dlrm_main.py", line 729, in <module>
[rank0]:     invoke_main()  # pragma: no cover
[rank0]:   File "dlrm_main.py", line 725, in invoke_main
[rank0]:     main(sys.argv[1:])
[rank0]:   File "dlrm_main.py", line 710, in main
[rank0]:     train_val_test(
[rank0]:   File "dlrm_main.py", line 482, in train_val_test
[rank0]:     _train(
[rank0]:   File "dlrm_main.py", line 429, in _train
[rank0]:     pipeline.progress(batched_iterator)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/train_pipeline/train_pipelines.py", line 306, in progress
[rank0]:     self.fill_pipeline(dataloader_iter)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/train_pipeline/train_pipelines.py", line 290, in fill_pipeline
[rank0]:     self._init_pipelined_modules(
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/train_pipeline/train_pipelines.py", line 388, in _init_pipelined_modules
[rank0]:     self._pipeline_model(batch, context, pipelined_forward)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/train_pipeline/train_pipelines.py", line 367, in _pipeline_model
[rank0]:     self.start_sparse_data_dist(batch, context)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/train_pipeline/train_pipelines.py", line 441, in start_sparse_data_dist
[rank0]:     _start_data_dist(self._pipelined_modules, batch, context)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/train_pipeline/utils.py", line 435, in _start_data_dist
[rank0]:     context.input_dist_splits_requests[forward.name] = module.input_dist(
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/embeddingbag.py", line 1021, in input_dist
[rank0]:     awaitables.append(input_dist(features_by_shard))
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/sharding/rw_sharding.py", line 316, in forward
[rank0]:     ) = bucketize_kjt_before_all2all(
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torchrec/distributed/embedding_sharding.py", line 241, in bucketize_kjt_before_all2all
[rank0]:     ) = torch.ops.fbgemm.block_bucketize_sparse_features(
[rank0]:   File "/home/john/miniconda3/envs/tfrecsys/lib/python3.8/site-packages/torch/_ops.py", line 1061, in __call__
[rank0]:     return self_._op(*args, **(kwargs or {}))
[rank0]: RuntimeError: CUDA error: no kernel image is available for execution on the device
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
However, PyTorch can see the GPUs:
Hi @uncle-sann, I think the `CUDA error: no kernel image is available for execution on the device` error you are running into indicates that your versions of PyTorch and CUDA are incompatible. Please refer to pytorch/pytorch#31285. Thanks!
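One way to narrow this down is to compare the GPU's compute capability with the architectures the installed build was compiled for. The sketch below is only an illustration, not an official diagnostic: `arch_supported` is a hypothetical helper, and its matching rules are a simplification of CUDA's compatibility rules (an `sm_XY` image is assumed to run on devices with the same major version and an equal or higher minor version, a `compute_XY` PTX entry on any equal or newer device, and suffixed entries such as `sm_90a` only on an exact match). Note also that the failing call in the traceback is `torch.ops.fbgemm.block_bucketize_sparse_features`, which comes from the separately built `fbgemm_gpu` package, so a PyTorch build that passes this check does not rule out a mismatch in that package.

```python
import re
from typing import List, Tuple

# Matches entries such as "sm_86", "compute_90" or "sm_90a".
_ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})([a-z]?)$")


def arch_supported(capability: Tuple[int, int], arch_list: List[str]) -> bool:
    """Hypothetical helper: does arch_list contain a kernel image usable
    on a device with the given (major, minor) compute capability?"""
    for arch in arch_list:
        match = _ARCH_RE.match(arch)
        if match is None:
            continue  # skip entries this sketch does not understand
        kind, digits, suffix = match.groups()
        built = (int(digits[:-1]), int(digits[-1]))
        if suffix:
            # Architecture-specific builds: assume exact match only.
            if built == capability:
                return True
            continue
        if kind == "sm" and built[0] == capability[0] and built[1] <= capability[1]:
            return True  # binary image for the same major version
        if kind == "compute" and built <= capability:
            return True  # PTX that can be JIT-compiled for a newer device
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
        print("device capability:", cap)
        print("architectures in this build:", archs)
        print("usable kernel image found:", arch_supported(cap, archs))
    else:
        print("PyTorch with a visible CUDA device is required for the live check.")
```

If the check reports no usable image, installing a build that targets the GPU's architecture is the usual next step. If it reports a match and the error persists, the `fbgemm_gpu` wheel is the more likely place to look.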