Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too much time spent in 'Other' category in execution graph #779

Open
shradhasehgal opened this issue Jul 6, 2023 · 0 comments
Open

Too much time spent in 'Other' category in execution graph #779

shradhasehgal opened this issue Jul 6, 2023 · 0 comments
Labels
bug Something isn't working plugin PyTorch Profiler TensorBoard Plugin related

Comments

@shradhasehgal
Copy link

Hi, could you clarify what all the 'Other' category covers in the execution time graph? The script below spends nearly 60% of the time in the 'Other' category, but it is just performing PyTorch dataloading and model training.

I read the docs and it seems the existing categories provide good coverage (kernel, memcpy, communication, runtime, dataloader, CPU exec), so could you provide some examples for what kind of things the 'other' category could cover?

I am trying to understand what it is in the below script that causes it to spend majority of its time in 'Other' operations.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.profiler import ProfilerActivity, profile, schedule

device = "cuda" if torch.cuda.is_available() else "cpu"
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
training_data = datasets.FashionMNIST(
        root='./data',
        train=True,
        download=True,
        transform=ToTensor())

dataloader = DataLoader(
        training_data,
        batch_size=32)

model: nn.Module = NeuralNetwork()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model.to(device)
model.train()


with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(skip_first=1, wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('/tmp/logs'),
    record_shapes=True,
    with_stack=True,
    profile_memory=True) as p:

    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        output = model(X)
        loss = loss_fn(output, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        p.step()
Screenshot 2023-07-05 at 8 36 11 PM
@aaronenyeshi aaronenyeshi added bug Something isn't working plugin PyTorch Profiler TensorBoard Plugin related labels Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working plugin PyTorch Profiler TensorBoard Plugin related
Projects
None yet
Development

No branches or pull requests

2 participants