Use Case Description:
This example demonstrates how to use TensorBoard with Amazon SageMaker JumpStart to visualize training metrics, such as loss curves, while training a LLaMA3 model (the 8B variant, chosen for testing purposes). The training script writes the loss values as TensorBoard logs so that the curves can be monitored during training.
Steps to Use TensorBoard with SageMaker JumpStart
Set up your SageMaker environment:
Launch a SageMaker notebook instance whose IAM execution role has permission to use SageMaker JumpStart and to read from and write to your S3 bucket.
Install TensorBoard:
On your SageMaker notebook, install TensorBoard if it is not already installed:

    pip install tensorboard

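To check from inside a notebook cell whether the install is needed at all, something like the following sketch could be used. The helper name `is_installed` is an assumption made for this example; it relies only on the standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# Only suggest the install step when the package is actually missing.
if not is_installed("tensorboard"):
    print("tensorboard is missing; run: pip install tensorboard")
```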
Select a Model from SageMaker JumpStart:
Use SageMaker JumpStart to find a pre-trained LLaMA3 model (8B) to fine-tune, or start training from scratch.
Example:

    from sagemaker import script_uris
    from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

    # List available JumpStart models (search for LLaMA3; IDs may be spelled "llama-3")
    models = list_jumpstart_models()
    print([model for model in models if "llama3" in model.lower().replace("-", "")])

    # Retrieve the training script URI; use a model ID from the list printed above
    # ("huggingface-llama3-8B" is a placeholder and may not be a valid ID)
    training_uri = script_uris.retrieve(
        model_id="huggingface-llama3-8B", model_version="*",
        script_scope="training", region="us-west-2",
    )

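A plain substring search can miss models whose IDs spell the name with hyphens. The sketch below shows one way to make the search tolerant of that. The function `find_models` and the IDs in `example_ids` are hypothetical and only illustrate the matching logic; they are not real output from `list_jumpstart_models()`.

```python
def find_models(model_ids, keyword):
    """Return the IDs that contain keyword, ignoring case, hyphens, and underscores."""

    def normalize(text):
        return text.lower().replace("-", "").replace("_", "")

    needle = normalize(keyword)
    return [model_id for model_id in model_ids if needle in normalize(model_id)]


# Hypothetical IDs for illustration only, not real JumpStart model IDs.
example_ids = ["example-llama-3-8b", "example-llama3-70b", "example-bert-base"]
matches = find_models(example_ids, "llama3")
print(matches)
```

In a notebook, the list returned by `list_jumpstart_models()` would be passed in place of `example_ids`.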
Prepare the Dataset:
Use your own dataset stored in S3. Ensure the dataset is formatted correctly for the LLaMA3 model. For example:

    dataset_s3_uri = "s3://your-bucket/your-dataset/"

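A malformed dataset URI usually only surfaces once the training job has started, so it can help to validate it up front. The following is a minimal sketch using the standard library; `split_s3_uri` is a hypothetical helper name, and the check covers only the URI syntax, not whether the bucket or objects exist.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split an s3:// URI into (bucket, key_prefix); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://your-bucket/your-dataset/")
print(bucket, prefix)
```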
Modify the Training Script:
Adapt the SageMaker training script to log metrics in a TensorBoard-compatible format. For example, add TensorBoard logging using torch.utils.tensorboard.SummaryWriter:

    from torch.utils.tensorboard import SummaryWriter

    # Local directory for TensorBoard event files inside the training container
    writer = SummaryWriter(log_dir="/opt/ml/output/tensorboard")

    # num_epochs, train_dataloader, and model come from your existing training script
    for epoch in range(num_epochs):
        for batch_idx, batch in enumerate(train_dataloader):
            loss = model.training_step(batch)
            # The step index counts batches across all epochs
            writer.add_scalar("Loss/train", loss.item(), epoch * len(train_dataloader) + batch_idx)

    writer.close()  # Flush pending events before the script exits

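The third argument to `add_scalar` is the global step, which places each loss value on the x-axis. The sketch below isolates that arithmetic so it can be checked without a model or a dataloader. The helper `global_step` and the value `steps_per_epoch = 3` are assumptions made for the illustration; in the loop above, `steps_per_epoch` corresponds to `len(train_dataloader)`.

```python
def global_step(epoch: int, batch_idx: int, steps_per_epoch: int) -> int:
    """Number of batches processed before this one, counted across all epochs."""
    return epoch * steps_per_epoch + batch_idx


steps_per_epoch = 3  # stands in for len(train_dataloader)
steps = [
    global_step(epoch, batch_idx, steps_per_epoch)
    for epoch in range(2)
    for batch_idx in range(steps_per_epoch)
]
print(steps)  # [0, 1, 2, 3, 4, 5]
```

Because the index keeps increasing from one epoch to the next, TensorBoard draws a single continuous curve instead of overlaying each epoch at the same x positions.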
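Launch Training in SageMaker:
Start the training job on SageMaker with TensorBoard configured to log outputs to S3.
Example:

    from sagemaker.debugger import TensorBoardOutputConfig
    from sagemaker.pytorch import PyTorch

    # Sync TensorBoard event files from the training container to S3
    tensorboard_output_config = TensorBoardOutputConfig(
        s3_output_path="s3://your-bucket/tensorboard-logs/",
        container_local_output_path="/opt/ml/output/tensorboard",
    )

    # Define the estimator
    pytorch_estimator = PyTorch(
        entry_point="train.py",  # Your training script
        source_dir="src",  # Directory containing the training script
        role="SageMakerRole",  # Name or ARN of your IAM execution role
        instance_count=1,
        instance_type="ml.p3.16xlarge",  # Adjust based on LLaMA3 size
        framework_version="1.12.1",
        py_version="py38",
        hyperparameters={
            "epochs": 5,
            "batch_size": 16,
        },
        output_path="s3://your-bucket/output/",  # Model artifacts, kept apart from the TensorBoard logs
        tensorboard_output_config=tensorboard_output_config,
    )

    # Start training
    pytorch_estimator.fit({"train": dataset_s3_uri})
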
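When several training jobs write to the same S3 location, giving each job its own prefix keeps the runs separate in TensorBoard. The sketch below builds a timestamped job name and a matching log prefix. Both helper names and the `llama3-tb` base name are hypothetical; the only assumption about SageMaker is that job names may contain letters, digits, and hyphens.

```python
from datetime import datetime, timezone


def make_job_name(base, now=None):
    """Append a UTC timestamp so that every training job gets a unique name."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"


def tensorboard_log_uri(bucket, base_prefix, job_name):
    """Build a per-job S3 prefix for TensorBoard event files."""
    return f"s3://{bucket}/{base_prefix.strip('/')}/{job_name}/"


job_name = make_job_name("llama3-tb")
log_uri = tensorboard_log_uri("your-bucket", "tensorboard-logs/", job_name)
print(log_uri)
```

The resulting name could be passed as `job_name` to `fit`, and the URI used wherever the job's TensorBoard output location is configured.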
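Access TensorBoard Logs:
After training, download the TensorBoard logs from S3 to your local machine, or view them directly in SageMaker Studio.
Download the logs, then start TensorBoard and point it to the local logs directory. Pointing --logdir at an s3:// path works only if your TensorBoard installation has S3 filesystem support; syncing to a local directory first avoids that dependency:

    aws s3 sync s3://your-bucket/tensorboard-logs/ ./tensorboard-logs/
    tensorboard --logdir=./tensorboard-logs/
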
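If TensorBoard starts but shows no data, a quick check is whether the logs directory contains any event files at all. SummaryWriter names these files `events.out.tfevents.` followed by a timestamp and host name. The sketch below searches a directory tree for them; `find_event_files` is a hypothetical helper, and the temporary directory only stands in for a downloaded logs folder.

```python
import tempfile
from pathlib import Path


def find_event_files(log_dir):
    """Return all TensorBoard event files below log_dir, sorted by path."""
    return sorted(Path(log_dir).rglob("events.out.tfevents.*"))


# Demo on a throwaway directory that mimics a downloaded logs folder.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run-1"
    run_dir.mkdir()
    (run_dir / "events.out.tfevents.1700000000.host").write_bytes(b"")
    (run_dir / "notes.txt").write_text("not an event file")
    found_names = [path.name for path in find_event_files(tmp)]

print(found_names)
```

An empty result for the real logs directory suggests that the training job never wrote or uploaded its event files.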
Monitor Loss Curves:
Open the TensorBoard web UI (e.g., http://localhost:6006/), and you should see the loss curves and other metrics.
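The scalar charts in TensorBoard offer a smoothing slider, so the displayed curve can differ from the raw logged values. The sketch below shows exponential-moving-average smoothing with bias correction as a general illustration of that kind of smoothing. It is written from the general technique and is not taken from TensorBoard's source; the function name `smooth` and the sample values are assumptions.

```python
def smooth(values, weight=0.6):
    """Exponential moving average with bias correction; weight must be in [0, 1)."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    running = 0.0
    for count, value in enumerate(values, start=1):
        running = running * weight + (1.0 - weight) * value
        # Correct the bias introduced by starting the average at zero.
        smoothed.append(running / (1.0 - weight ** count))
    return smoothed


raw_losses = [2.0, 1.5, 1.8, 1.2, 1.0]  # made-up values for illustration
print(smooth(raw_losses, weight=0.6))
```

A weight of 0 returns the raw values unchanged, and weights closer to 1 produce a flatter curve that reacts more slowly to new values.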
Involved Services:
SageMaker JumpStart: Model training and deployment.
TensorBoard: Visualization of training metrics.
S3: Storage of datasets and TensorBoard logs.
Dataset:
Use your custom dataset uploaded to an S3 bucket (e.g., s3://your-bucket/your-dataset/).
With this setup, you can monitor loss curves and other training metrics while training with SageMaker JumpStart and TensorBoard.