What did you find confusing? Please describe.
I was searching for documentation on distributed training with custom Docker containers. The current documentation explains how to create or extend containers for distributed training, including a guide to installing the required modules, but it does not explain how to configure the Estimator class or any other launch parameters to start distributed training, as it does for the PyTorch and TensorFlow classes.
Describe how documentation can be improved
Add text that describes how to launch distributed training after creating or extending the Docker image.
Add it to these sections:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api (the anchor in this link contains a typo that should also be fixed: "skd" instead of "sdk")
https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-bring-your-own-container