pytorch · svekars · Sep 30, 2024 · Sep 26, 2024 · Sep 26, 2024 · Sep 26, 2024
diff --git a/prototype_source/inductor_windows_cpu.rst b/prototype_source/inductor_windows_cpu.rst
@@ -0,0 +1,128 @@
+How to use TorchInductor on Windows CPU
+=======================================
+
+**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_
+
+
+
+TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels.
+This tutorial will guide you through the process of using TorchInductor on a Windows CPU.
+
+.. grid:: 2
+
+    .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
+       :class-card: card-prerequisites
+
+       * How to compile and execute a Python function with PyTorch, optimized for Windows CPU
+       * The basics of TorchInductor's optimization using C++/Triton kernels
+
+    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
+       :class-card: card-prerequisites
+
+       * PyTorch v2.5 or later
+       * Microsoft Visual C++ (MSVC)
+       * Miniforge for Windows
+
+Install the Required Software
+-----------------------------
+
+First, let's install the required software. TorchInductor needs a C++ compiler to build the kernels it generates.
+This tutorial uses Microsoft Visual C++ (MSVC).
+
+1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.
+
+2. During the installation, select **Desktop Development with C++** in the **Desktop & Mobile** section of the **Workloads** table, then complete the installation.
+
+.. note::
+
+     For better performance, we recommend using `Clang <https://github.com/llvm/llvm-project/releases>`_ or the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`_ as the C++ compiler.
+     See `Using an Alternative Compiler for Better Performance <#using-an-alternative-compiler-for-better-performance>`_.
+
+3. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__.
+
+Set Up the Environment
+----------------------
+
+#. Open the command line environment via ``cmd.exe``.
+#. Activate ``MSVC`` with the following command:
+
+   .. code-block:: sh
+
+    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
+
+#. Activate ``conda`` with the following command:
+
+   .. code-block:: sh
+
+    "C:/ProgramData/miniforge3/Scripts/activate.bat"
+
+#. Create and activate a custom conda environment:
+
+   .. code-block:: sh
+
+    conda create -n inductor_cpu_windows python=3.10 -y
+    conda activate inductor_cpu_windows
+
+#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later.
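
Before compiling anything, it can help to confirm that the environment matches the steps above. The following is a minimal sketch, not part of PyTorch: the helper names ``find_compiler`` and ``report_environment`` are hypothetical, and the sketch assumes that the MSVC compiler driver is available as ``cl`` on ``PATH`` once ``vcvars64.bat`` has been run.

```python
import importlib.util
import os
import shutil


def find_compiler(candidates=("cl", "clang-cl", "icx-cl")):
    """Return the first compiler name from `candidates` found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


def report_environment():
    """Collect basic facts about the current environment (hypothetical helper)."""
    return {
        "compiler_on_path": find_compiler(),
        "CXX": os.environ.get("CXX"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

If ``compiler_on_path`` is reported as ``None``, the MSVC activation step was most likely skipped in the current ``cmd.exe`` session.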
+
+Using TorchInductor on Windows CPU
+----------------------------------
+
+The following example compiles a small function with ``torch.compile``, which uses TorchInductor as its default backend:
+
+.. code-block:: python
+
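+    import torch
+
+    def foo(x, y):
+        # Note: y is accepted but not used; the result depends only on x.
+        a = torch.sin(x)
+        b = torch.cos(x)
+        return a + b
+
+    # Compile the function; the first call triggers code generation.
+    opt_foo1 = torch.compile(foo)
+    print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))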
+
+The code above prints output similar to the following. Because the inputs are random, the exact values differ on each run:
+
+.. code-block:: sh
+
+    tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
+            1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
+            [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
+            5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
+            [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
+            6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
+            [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
+            8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
+            [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
+            8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
+            [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
+            9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
+            [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
+            1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
+            [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
+            9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
+            [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
+            -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
+            [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
+            1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])
+
+Using an Alternative Compiler for Better Performance
+----------------------------------------------------
+
+To improve Inductor performance on Windows, you can use the Intel Compiler or the LLVM Compiler instead of MSVC. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first.
+
+Intel Compiler
+^^^^^^^^^^^^^^
+
+#. Download and install the Windows version of the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_.
+#. Select it as the compiler for Inductor by setting the ``CXX`` environment variable: ``set CXX=icx-cl``.
+
+LLVM Compiler
+^^^^^^^^^^^^^
+
+#. Download and install the win64 version of the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_.
+#. Select it as the compiler for Inductor by setting the ``CXX`` environment variable: ``set CXX=clang-cl``.
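
The ``CXX`` variable can also be set from Python instead of from ``cmd.exe``. The sketch below is an illustration only: ``select_inductor_compiler`` is a hypothetical helper, and it assumes that setting ``CXX`` in ``os.environ`` has the same effect as the ``set CXX=...`` commands above. To stay on the safe side, it sets the variable before ``torch`` is imported.

```python
import os
import shutil


def select_inductor_compiler(preferred=("icx-cl", "clang-cl"), fallback="cl"):
    """Set CXX to the first preferred compiler found on PATH, else to `fallback`.

    Hypothetical helper: the compiler names mirror the ``set CXX=...`` commands.
    """
    for name in preferred:
        if shutil.which(name) is not None:
            os.environ["CXX"] = name
            return name
    os.environ["CXX"] = fallback
    return fallback


if __name__ == "__main__":
    chosen = select_inductor_compiler()
    print(f"CXX set to: {chosen}")
    # Assumption: import torch and call torch.compile only after CXX is set,
    # so the setting is in place before any kernel is built.
```

Falling back to ``cl`` keeps the MSVC setup from the earlier sections working when neither alternative compiler is installed.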
+
+Conclusion
+----------
+
+In this tutorial, we learned how to use Inductor on Windows CPU with PyTorch. We also discussed
+how the Intel Compiler and the LLVM Compiler can further improve performance.
diff --git a/prototype_source/prototype_index.rst b/prototype_source/prototype_index.rst
@@ -217,6 +217,13 @@ Prototype features are not available as part of binary distributions like PyPI o
    :link: ../prototype/inductor_cpp_wrapper_tutorial.html
    :tags: Model-Optimization
 
+.. customcarditem::
+   :header: Inductor Windows CPU Tutorial
+   :card_description: Speed up your models with Inductor on Windows CPU
+   :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
+   :link: ../prototype/inductor_windows_cpu.html
+   :tags: Model-Optimization
+
 .. Distributed
 .. customcarditem::
    :header: Flight Recorder Tutorial
@@ -249,6 +256,7 @@ Prototype features are not available as part of binary distributions like PyPI o
    prototype/flight_recorder_tutorial.html
    prototype/graph_mode_dynamic_bert_tutorial.html
    prototype/inductor_cpp_wrapper_tutorial.html
+   prototype/inductor_windows_cpu.html
    prototype/pt2e_quantizer.html
    prototype/pt2e_quant_ptq.html
    prototype/pt2e_quant_qat.html