Add tutorial inductor on Windows CPU #3062

Merged: 18 commits, merged on Sep 30, 2024
128 changes: 128 additions & 0 deletions prototype_source/inductor_windows_cpu.rst
@@ -0,0 +1,128 @@
How to use TorchInductor on Windows CPU
=======================================

**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_



TorchInductor is a compiler backend that transforms FX graphs generated by TorchDynamo into highly optimized C++/Triton kernels.
This tutorial guides you through using TorchInductor on Windows CPU.

.. grid:: 2

    .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
       :class-card: card-prerequisites

       * How to compile and execute a Python function with PyTorch, optimized for Windows CPU
       * Basics of TorchInductor's optimization using C++/Triton kernels

    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
       :class-card: card-prerequisites

       * PyTorch v2.5 or later
       * Microsoft Visual C++ (MSVC)
       * Miniforge for Windows

Install the Required Software
-----------------------------

First, install the required software. TorchInductor needs a C++ compiler to build its optimized kernels.
In this example, we use Microsoft Visual C++ (MSVC).

1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.

2. During installation, select **Desktop Development with C++** in the **Desktop & Mobile** section of the **Workloads** tab, then install the software.

   .. note::

      For better performance, we recommend the `Clang <https://github.com/llvm/llvm-project/releases>`__ or `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`__ C++ compilers.
      See `Using an Alternative Compiler for Better Performance`_.

3. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__.

Set Up the Environment
----------------------

#. Open a Command Prompt (``cmd.exe``).
#. Activate ``MSVC`` with the following command. Adjust the path if your Visual Studio version or edition differs from 2022 Community:

   .. code-block:: sh

      "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

#. Activate ``conda`` with the following command. Adjust the path if you installed Miniforge in a different location:

   .. code-block:: sh

      "C:/ProgramData/miniforge3/Scripts/activate.bat"

#. Create and activate a custom conda environment:

   .. code-block:: sh

      conda create -n inductor_cpu_windows python=3.10 -y
      conda activate inductor_cpu_windows

#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later.
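
As an optional check that is not part of the original steps, the minimal Python sketch below reports common setup problems before you try ``torch.compile``. It assumes the default MSVC setup, in which ``vcvars64.bat`` puts ``cl.exe`` on ``PATH`` for the current ``cmd.exe`` session. The helper names ``parse_version`` and ``check_environment`` are hypothetical, and the script uses only the Python standard library, so it runs even when PyTorch is missing.

.. code-block:: python

   import platform
   import shutil
   from importlib import metadata


   def parse_version(text):
       """Turn a version string such as '2.5.1+cpu' into a tuple like (2, 5, 1)."""
       core = text.split("+")[0]  # drop local build tags such as '+cpu'
       numbers = []
       for piece in core.split(".")[:3]:
           # keep only the leading digits, so '0a0' or '0rc1' becomes 0
           digits = ""
           for ch in piece:
               if not ch.isdigit():
                   break
               digits += ch
           numbers.append(int(digits) if digits else 0)
       return tuple(numbers)


   def check_environment(min_torch=(2, 5)):
       """Return a list of problems; an empty list means the basics look fine."""
       problems = []
       if platform.system() != "Windows":
           problems.append("this tutorial targets Windows, but the OS is " + platform.system())
       # vcvars64.bat adds MSVC's cl.exe to PATH for the current cmd.exe session
       if shutil.which("cl") is None:
           problems.append("cl.exe not found on PATH; run vcvars64.bat first")
       try:
           torch_version = metadata.version("torch")
       except metadata.PackageNotFoundError:
           problems.append("PyTorch is not installed in this environment")
       else:
           if parse_version(torch_version)[:2] < tuple(min_torch):
               wanted = ".".join(str(n) for n in min_torch)
               problems.append(f"PyTorch {torch_version} is older than {wanted}")
       return problems


   if __name__ == "__main__":
       issues = check_environment()
       for issue in issues:
           print("Problem:", issue)
       if not issues:
           print("The environment looks ready for TorchInductor.")

Run the script from the same ``cmd.exe`` session in which you activated MSVC and the conda environment; a new window does not inherit those settings.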

Using TorchInductor on Windows CPU
----------------------------------

Here’s a simple example that demonstrates how to use TorchInductor:

.. code-block:: python

   import torch

   def foo(x, y):
       a = torch.sin(x)
       b = torch.cos(y)
       return a + b

   # torch.compile uses TorchInductor as its default backend
   opt_foo1 = torch.compile(foo)
   print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

Because the inputs are random, the values change on every run, but the output looks similar to the following:

.. code-block:: sh

   tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
             1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
           [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
             5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
           [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
             6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
           [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
             8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
           [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
             8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
           [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
             9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
           [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
             1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
           [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
             9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
           [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
            -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
           [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
             1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])

Using an Alternative Compiler for Better Performance
----------------------------------------------------

To improve TorchInductor performance on Windows, you can use the Intel Compiler or the LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first.

Intel Compiler
^^^^^^^^^^^^^^

#. Download and install the Windows version of the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`__.
#. Select it as the TorchInductor compiler by setting the ``CXX`` environment variable: ``set CXX=icx-cl``.

LLVM Compiler
^^^^^^^^^^^^^

#. Download and install the win64 version of the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_.
#. Select it as the TorchInductor compiler by setting the ``CXX`` environment variable: ``set CXX=clang-cl``.
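
The following minimal sketch, which is not part of the original tutorial, reports which compiler the ``CXX`` variable selects and whether that compiler is on ``PATH``. It assumes that when ``CXX`` is unset, TorchInductor falls back to MSVC's ``cl``, consistent with the default setup in this tutorial. ``resolve_cxx`` and ``KNOWN_COMPILERS`` are hypothetical names. Run it from the same ``cmd.exe`` session in which you ran ``set CXX=...``, because ``set`` only affects the current session.

.. code-block:: python

   import os
   import shutil

   # Compilers mentioned in this tutorial; keys are the values you would put in CXX.
   KNOWN_COMPILERS = {
       "cl": "Microsoft Visual C++ (MSVC)",
       "icx-cl": "Intel Compiler",
       "clang-cl": "LLVM Compiler",
   }


   def resolve_cxx(env=None):
       """Return (name, description, path) for the compiler selected via CXX.

       Assumption: an unset or empty CXX means MSVC's cl, as in this
       tutorial's default setup.
       """
       env = os.environ if env is None else env
       name = env.get("CXX", "cl").strip() or "cl"
       description = KNOWN_COMPILERS.get(name, "unrecognized compiler")
       path = shutil.which(name)  # None if the compiler is not on PATH
       return name, description, path


   if __name__ == "__main__":
       name, description, path = resolve_cxx()
       if path is None:
           print(f"CXX={name} ({description}) was not found on PATH.")
       else:
           print(f"TorchInductor should build kernels with {description}: {path}")

Passing a plain dictionary as ``env`` lets you check a configuration without changing the real environment, for example ``resolve_cxx({"CXX": "clang-cl"})``.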

Conclusion
----------

In this tutorial, we learned how to use TorchInductor on Windows CPU with PyTorch. We also discussed how the Intel Compiler
and the LLVM Compiler can further improve performance.
8 changes: 8 additions & 0 deletions prototype_source/prototype_index.rst
@@ -217,6 +217,13 @@ Prototype features are not available as part of binary distributions like PyPI o
   :link: ../prototype/inductor_cpp_wrapper_tutorial.html
   :tags: Model-Optimization

.. customcarditem::
   :header: Inductor Windows CPU Tutorial
   :card_description: Speed up your models with Inductor on Windows CPU
   :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
   :link: ../prototype/inductor_windows_cpu.html
   :tags: Model-Optimization

.. Distributed
.. customcarditem::
   :header: Flight Recorder Tutorial
@@ -249,6 +256,7 @@ Prototype features are not available as part of binary distributions like PyPI o
   prototype/flight_recorder_tutorial.html
   prototype/graph_mode_dynamic_bert_tutorial.html
   prototype/inductor_cpp_wrapper_tutorial.html
   prototype/inductor_windows_cpu.html
   prototype/pt2e_quantizer.html
   prototype/pt2e_quant_ptq.html
   prototype/pt2e_quant_qat.html