How to install and run CUDA-aware MPI with Numba and send device (GPU) memory via MPI
- Install Ubuntu on a USB stick (link)
- Eject the USB stick once it has been written
- Use Disk Management in Windows to free up space (>100GB) for Ubuntu
- Turn off the computer
- Insert the USB stick
- Repeatedly tap F2 to enter BIOS
- In the BIOS go to the boot menu and select the USB stick
- Select "Ubuntu (Safe Graphics)". The regular "Ubuntu" option doesn't render the installation process with my RTX 3090 graphics card
- Option B: Start Linux normally. If it freezes, press "e" on the Ubuntu entry and add
nomodeset
at the end of the line starting with "linux",
then press Ctrl+X to continue (reference link)
- Select the custom installation type (not "alongside Windows Boot Manager"), because we manually partitioned the free space earlier
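For reference, the GRUB kernel line being edited might look roughly like the following (the paths are elided and machine-specific; only nomodeset is added):

```text
linux /boot/vmlinuz-... root=UUID=... ro quiet splash nomodeset
```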
- Click the "free space" we made earlier
- Click the + icon to create a partition
- Set the mount point to "/", since we'll use only one mount point for everything (explanation). Note the amount of space you're allocating
- Scroll down and select the partition you just made (e.g. "/dev/nvme0n1p5"), which you can identify by the amount of space you chose in the previous step
- Install stuff
- sudo apt-get install make
- sudo apt-get install gcc g++
- sudo apt-get install python3.8
- sudo apt-get install pip
- Install CUDA using the Debian installer or runfile installer (installation guide)
- Update path
- export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
- export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
- export CUDA_HOME=/usr/local/cuda-11.4
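To make these settings persist across shells, the exports can be appended to ~/.bashrc (a sketch, assuming CUDA 11.4 in the default install location; adjust the version to match yours):

```shell
# Sketch of ~/.bashrc additions; the cuda-11.4 path is an assumption.
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-11.4
```

Afterwards, `nvcc --version` should print the toolkit version if the PATH is right.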
- Install Open MPI
- Download openmpi-4.1.1.tar.gz
- tar -xzf openmpi-4.1.1.tar.gz
- cd openmpi-4.1.1
- ./configure --with-cuda=/usr/local/cuda-11.4
- sudo make all install
- If anything fails along the way (e.g. make is missing), delete the folder, extract the tarball again, and repeat
- At this point you should be able to run
mpicc
and
mpiexec
. If not, you may need to add them to your PATH (check where the install step put them)
- You may also need to set
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
where /usr/local/lib is where Open MPI installed its libraries, if you get an error such as
mpiexec: error while loading shared libraries: libopen-rte.so.40: cannot open shared object file: No such file or directory
- Verify CUDA support:
ompi_info --parsable -l 9 --all | grep mpi_built_with_cuda_support:value
should show
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
if your MPI has CUDA support
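The Open MPI build steps above can be sketched as one script (the download URL and CUDA path are assumptions for version 4.1.1 / CUDA 11.4; adjust for your versions):

```shell
# Sketch: build a CUDA-aware Open MPI from source.
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
tar -xzf openmpi-4.1.1.tar.gz
cd openmpi-4.1.1
./configure --with-cuda=/usr/local/cuda-11.4   # enables CUDA-aware transports
sudo make all install                          # installs under /usr/local by default
```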
- Install anaconda
- Download from the anaconda website
- bash ./Anaconda3-2021.11-Linux-x86_64.sh
- You probably don't want Anaconda to initialize at startup, as this sets aliases for pip and python
- Install numba
- Make sure CUDA_HOME points to the CUDA installation you built Open MPI with (numba cudatoolkit installation reference): export CUDA_HOME=/usr/local/cuda-11.4
- conda install numba
- conda install cudatoolkit
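As a quick sanity check (a sketch; the anaconda path is whatever you chose during install), you can ask Numba whether it sees the GPU:

```shell
# Prints the detected CUDA devices, or an error if Numba can't find one.
~/anaconda/bin/python3 -c "from numba import cuda; cuda.detect()"
```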
- Run with CUDA-aware MPI: you can send Numba DeviceNDArrays over MPI, so device memory can be transferred without staging through the host
- Note that you'll have to run with conda's version of python3 (e.g. ~/anaconda/bin/python3) and install packages with conda's version of pip (e.g. ~/anaconda/bin/pip3)
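One pitfall: pip builds mpi4py against whichever mpicc it finds first, and if that isn't the Open MPI we just built, mpi4py won't be CUDA-aware. mpi4py's build honors the MPICC environment variable, so one way to point it at the right compiler wrapper (a sketch; the wrapper path is an assumption, check `which mpicc`) is:

```shell
# Assumption: Open MPI installed its wrappers under /usr/local/bin.
env MPICC=/usr/local/bin/mpicc ~/anaconda/bin/pip3 install --no-cache-dir mpi4py
```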
# ~/anaconda/bin/pip3 install mpi4py
from mpi4py import MPI
from numba import cuda
import numpy as np

@cuda.jit()
def kernel(array_on_gpu):
    array_on_gpu[0] = 0.5  # FAST!

def main():
    rank = MPI.COMM_WORLD.Get_rank()
    if rank == 0:
        input_array = np.zeros((100,), dtype=np.float64)
        gpu_input_array = cuda.to_device(input_array)
        MPI.COMM_WORLD.send(gpu_input_array.get_ipc_handle(), dest=1)
    else:
        handle = MPI.COMM_WORLD.recv(source=0)
        received_gpu_input_array = handle.open()  # FAST
        # received_gpu_input_array.copy_to_host()  # SLOW
        kernel[32, 32](received_gpu_input_array)
        # handle.close()  # SLOW
        print("Success!")

# mpirun -np 2 ~/anaconda/bin/python3 main.py
if __name__ == "__main__":
    main()