
tiny-cuda-nn wheel does not build in Docker image (it loops indefinitely without failing) #475

Open
violetamenendez opened this issue Oct 31, 2024 · 1 comment



Hi,

I am trying to create a Docker image for nerfstudio based on this one: https://hub.docker.com/layers/dromni/nerfstudio/1.1.4/images/sha256-ff0107a7db96bb8ee29c638729328b832b268b890c50f2a2ff25988bb84d4f75?context=explore

But the tiny-cuda-nn wheel build runs indefinitely, neither failing nor succeeding, until the CI job times out.

I am following the installation instructions from nerfstudio here: https://github.com/nerfstudio-project/nerfstudio?tab=readme-ov-file#dependencies
These coincide with the instructions in this tiny-cuda-nn repo. In fact, when I use a previous Docker image version, dromni/nerfstudio:0.1.16, with older versions of the libraries and CUDA 11.7, it all works fine. The problematic Dockerfile is:

FROM dromni/nerfstudio:1.1.4
WORKDIR /
USER root
# Setup NeRFStudio
RUN cd /workspace && git clone https://github.com/nerfstudio-project/nerfstudio.git && \
    cd /workspace/nerfstudio && \
    pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118 && \
    pip install ninja gsplat && \
    pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch && \
    pip install --upgrade pip setuptools && \
    pip install -e .
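
In case it helps narrow things down, a variant of the Dockerfile that constrains the tiny-cuda-nn build might at least bound the compile time. This is only a sketch, not a verified fix: the TCNN_CUDA_ARCHITECTURES value (89, i.e. an RTX 40-series GPU) is an assumption about the target hardware, and MAX_JOBS caps the parallel compiler jobs spawned by PyTorch's extension builder.

```dockerfile
FROM dromni/nerfstudio:1.1.4
USER root
# Build tiny-cuda-nn for a single GPU architecture instead of every one,
# and cap the number of parallel nvcc/c++ jobs. Both variables are read
# at pip-install time; 89 is an assumed target, adjust to your GPU.
ENV TCNN_CUDA_ARCHITECTURES=89
ENV MAX_JOBS=4
RUN pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```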

If I remove the installation of tiny-cuda-nn, everything else builds perfectly fine. Otherwise I get this log:

#5 174.6 Collecting git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
#5 174.6   Cloning https://github.com/NVlabs/tiny-cuda-nn/ to /tmp/pip-req-build-_rc_iady
#5 174.6   Running command git clone --filter=blob:none --quiet https://github.com/NVlabs/tiny-cuda-nn/ /tmp/pip-req-build-_rc_iady
#5 176.4   Resolved https://github.com/NVlabs/tiny-cuda-nn/ to commit c91138bcd4c6877c8d5e60e483c0581aafc70cce
#5 176.4   Running command git submodule update --init --recursive -q
#5 183.6   Preparing metadata (setup.py): started
#5 187.7   Preparing metadata (setup.py): finished with status 'done'
#5 187.9 Collecting ninja
#5 188.0   Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
#5 188.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 KB 2.4 MB/s eta 0:00:00
#5 188.2 Building wheels for collected packages: tinycudann
#5 188.2   Building wheel for tinycudann (setup.py): started
#5 278.6   Building wheel for tinycudann (setup.py): still running...
#5 592.7   Building wheel for tinycudann (setup.py): still running...
#5 777.5   Building wheel for tinycudann (setup.py): still running...
#5 1176.3   Building wheel for tinycudann (setup.py): still running...
#5 1270.0   Building wheel for tinycudann (setup.py): still running...
#5 1651.6   Building wheel for tinycudann (setup.py): still running...
#5 1917.0   Building wheel for tinycudann (setup.py): still running...
#5 2252.7   Building wheel for tinycudann (setup.py): still running...
#5 2339.5   Building wheel for tinycudann (setup.py): still running...
#5 2701.9   Building wheel for tinycudann (setup.py): still running...
#5 2940.4   Building wheel for tinycudann (setup.py): still running...
#5 3287.7   Building wheel for tinycudann (setup.py): still running...
#5 CANCELED
context canceled
ERROR: Job failed: execution took longer than 1h0m0s seconds

I passed the --verbose flag to pip. Early on I get one NumPy error (which does not make the job fail), and then the build loops through compiler warnings while the wheel tries to build indefinitely:

Numpy:

#6 156.8 Building wheels for collected packages: tinycudann
#6 156.8   Building wheel for tinycudann (setup.py): started
#6 156.8   Running command python setup.py bdist_wheel
#6 157.8 
#6 157.8   A module that was compiled using NumPy 1.x cannot be run in
#6 157.8   NumPy 2.1.2 as it may crash. To support both 1.x and 2.x
#6 157.8   versions of NumPy, modules must be compiled with NumPy 2.0.
#6 157.8   Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
#6 157.8 
#6 157.8   If you are a user of the module, the easiest solution will be to
#6 157.8   downgrade to 'numpy<2' or try to upgrade the affected module.
#6 157.8   We expect that some modules will need time to support NumPy 2.
#6 157.8 
#6 157.8   Traceback (most recent call last):  File "<string>", line 2, in <module>
#6 157.8     File "<pip-setuptools-caller>", line 34, in <module>
#6 157.8     File "/tmp/pip-req-build-cm6ig4ie/bindings/torch/setup.py", line 9, in <module>
#6 157.8       import torch
#6 157.8     File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1382, in <module>
#6 157.8       from .functional import *  # noqa: F403
#6 157.8     File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 7, in <module>
#6 157.8       import torch.nn.functional as F
#6 157.8     File "/usr/local/lib/python3.10/dist-packages/torch/nn/__init__.py", line 1, in <module>
#6 157.8       from .modules import *  # noqa: F403
#6 157.8     File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/__init__.py", line 35, in <module>
#6 157.8       from .transformer import TransformerEncoder, TransformerDecoder, \
#6 157.8     File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py", line 20, in <module>
#6 157.8       device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
#6 157.8   /usr/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
#6 157.8     device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
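
The NumPy part at least looks explainable: the torch 2.1.2 wheel was built against NumPy 1.x, while the image apparently has NumPy 2.1.2, which matches the warning text above. The minimal check below is purely illustrative (numpy_is_v2 is a hypothetical helper, not part of any library); if the mismatch matters, the suggested fix from the warning would be pinning `pip install "numpy<2"` before building.

```python
def numpy_is_v2(version_string: str) -> bool:
    """Return True when a NumPy version string belongs to the 2.x series,
    i.e. the series the torch wheel above fails to initialize against."""
    return int(version_string.split(".")[0]) >= 2

# The version reported in the log vs. a 1.x release.
print(numpy_is_v2("2.1.2"))   # True
print(numpy_is_v2("1.26.4"))  # False
```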

Warnings loop:

#6 189.3   [6/10] /usr/local/cuda/bin/nvcc  -I/tmp/pip-req-build-cm6ig4ie/include -I/tmp/pip-req-build-cm6ig4ie/dependencies -I/tmp/pip-req-build-cm6ig4ie/dependencies/cutlass/include -I/tmp/pip-req-build-cm6ig4ie/dependencies/cutlass/tools/util/include -I/tmp/pip-req-build-cm6ig4ie/dependencies/fmt/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-cm6ig4ie/src/object.cu -o /tmp/pip-req-build-cm6ig4ie/bindings/torch/src/object.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++17 --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -DTCNN_PARAMS_UNALIGNED -DTCNN_MIN_GPU_ARCH=90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_90_C -D_GLIBCXX_USE_CXX11_ABI=0
#6 189.3   /tmp/pip-req-build-cm6ig4ie/dependencies/fmt/include/fmt/core.h(288): warning #1675-D: unrecognized GCC pragma
#6 189.3 
#6 189.3   /tmp/pip-req-build-cm6ig4ie/dependencies/fmt/include/fmt/core.h(288): warning #1675-D: unrecognized GCC pragma
#6 189.3 
#6 241.3   [7/10] c++ -MMD -MF /tmp/pip-req-build-cm6ig4ie/bindings/torch/build/temp.linux-x86_64-3.10/tinycudann/bindings.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-req-build-cm6ig4ie/include -I/tmp/pip-req-build-cm6ig4ie/dependencies -I/tmp/pip-req-build-cm6ig4ie/dependencies/cutlass/include -I/tmp/pip-req-build-cm6ig4ie/dependencies/cutlass/tools/util/include -I/tmp/pip-req-build-cm6ig4ie/dependencies/fmt/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-cm6ig4ie/bindings/torch/tinycudann/bindings.cpp -o /tmp/pip-req-build-cm6ig4ie/bindings/torch/build/temp.linux-x86_64-3.10/tinycudann/bindings.o -std=c++17 -DTCNN_PARAMS_UNALIGNED -DTCNN_MIN_GPU_ARCH=90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_90_C -D_GLIBCXX_USE_CXX11_ABI=0
#6 241.3   In file included from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/Exceptions.h:14,
#6 241.3                    from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
#6 241.3                    from /usr/local/lib/python3.10/dist-packages/torch/include/torch/extension.h:9,
#6 241.3                    from /tmp/pip-req-build-cm6ig4ie/bindings/torch/tinycudann/bindings.cpp:34:
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<tcnn::cpp::LogSeverity>’:
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h:2170:7:   required from ‘class pybind11::enum_<tcnn::cpp::LogSeverity>’
#6 241.3   /tmp/pip-req-build-cm6ig4ie/bindings/torch/tinycudann/bindings.cpp:283:52:   required from here
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h:1496:7: warning: ‘pybind11::class_<tcnn::cpp::LogSeverity>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
#6 241.3    1496 | class class_ : public detail::generic_type {
#6 241.3         |       ^~~~~~
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<tcnn::cpp::Precision>’:
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h:2170:7:   required from ‘class pybind11::enum_<tcnn::cpp::Precision>’
#6 241.3   /tmp/pip-req-build-cm6ig4ie/bindings/torch/tinycudann/bindings.cpp:292:48:   required from here
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h:1496:7: warning: ‘pybind11::class_<tcnn::cpp::Precision>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<tcnn::cpp::Context>’:
#6 241.3   /tmp/pip-req-build-cm6ig4ie/bindings/torch/tinycudann/bindings.cpp:309:45:   required from here
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h:1496:7: warning: ‘pybind11::class_<tcnn::cpp::Context>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<Module>’:
#6 241.3   /tmp/pip-req-build-cm6ig4ie/bindings/torch/tinycudann/bindings.cpp:316:32:   required from here
#6 241.3   /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/pybind11.h:1496:7: warning: ‘pybind11::class_<Module>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]

I have attached a longer log output for more context: tiny-cuda-nn-wheel-docker-log.txt

I cannot really make much sense of these logs, and I have run out of ideas on how to debug this, so any help is much appreciated.
Thank you!

@j-nordling

I am running into the same issue
