Input tensor is not an XLA tensor on AWS Trainium instance #8510

Open
JmeanJmy opened this issue Dec 20, 2024 · 1 comment

Comments


JmeanJmy commented Dec 20, 2024

Hi team, I'm currently testing my training job on an AWS Trainium instance. I encountered the error Input tensor is not an XLA tensor: torch.FloatTensor when using the PyTorch Conv1d/Linear modules. I've confirmed that the input tensor has been moved to XLA, as I explicitly called .to(xm.xla_device()) before passing the input tensor to the module's forward method. However, I found that the error is actually caused by the weight and bias created inside those PyTorch modules, e.g. here: https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/conv.py#L375. I printed the device for self.weight and self.bias and they are on cpu. I had to modify the Conv1d source code to resolve the issue, e.g.:

def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    # Move the parameters to the same device as the input (the XLA device);
    # nn.Module has no `device` attribute, so use the input's device instead
    weight = weight.to(input.device)
    if bias is not None:
        bias = bias.to(input.device)

    if self.padding_mode != 'zeros':
        return F.conv1d(
            F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
            weight, bias, self.stride, _single(0), self.dilation, self.groups
        )
    return F.conv1d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)

Does anyone know how to make sure those parameters end up on the XLA device?
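To diagnose this kind of problem, it helps to print where each parameter of the module actually lives before and after moving it. A minimal sketch (runs on CPU, no Trainium/torch_xla required):

```python
import torch.nn as nn

# A freshly constructed module keeps its parameters on CPU until
# it is explicitly moved with .to(device)
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3)

# named_parameters() covers both weight and bias, so nothing is missed
for name, param in conv.named_parameters():
    print(name, param.device)
```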


radna0 commented Dec 25, 2024

Speaking from experience @JmeanJmy, Input tensor is not an XLA tensor is somewhat misleading: it can mean either the model or a tensor is not on the XLA device. Have you tried simply moving Conv.weight and Conv.bias (or better, the whole Conv module) to the device from outside the source code? Something this trivial should not require modifying PyTorch's source.
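Calling `.to(device)` on the module moves its weight and bias along with it, which is usually all that's needed. A minimal sketch, using CPU as a stand-in device so it runs anywhere (on Trainium you would use `xm.xla_device()` from `torch_xla.core.xla_model` instead):

```python
import torch
import torch.nn as nn

# Stand-in for xm.xla_device() so the example runs without torch_xla
device = torch.device("cpu")

# Moving the module moves self.weight and self.bias with it,
# so no modification of _conv_forward is needed
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3).to(device)
x = torch.randn(2, 4, 16).to(device)

out = conv(x)
assert conv.weight.device == x.device  # parameters follow the module
```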
