[BUG] Basic linux commands (nano, htop) cause segmentation faults in a dev image #489

Open
rohitdwivedula opened this issue Aug 14, 2022 · 1 comment
Labels: "? - Needs Triage" (Need team to review and classify), "bug" (Something isn't working)

Describe the bug
This is the problematic Docker image: rapidsai/rapidsai-core-dev:22.06-cuda11.5-devel-ubuntu20.04-py3.9. After pulling this image and running a container using the instructions given in the release selector, basic commands like nano or htop fail on the command line with `Segmentation fault (core dumped)`.

Steps/Code to reproduce bug

docker pull rapidsai/rapidsai-core-dev:22.06-cuda11.5-devel-ubuntu20.04-py3.9
sudo docker run --gpus all --rm -it  --shm-size=1g --ulimit memlock=-1  -p 8888:8888 -p 8787:8787 -p 8786:8786     rapidsai/rapidsai-core-dev:22.06-cuda11.5-devel-ubuntu20.04-py3.9

This opens a terminal inside the container. In the container, try the following:

(rapids) root@bd7436c2aa8b:/rapids/notebooks# cd
(rapids) root@bd7436c2aa8b:~# touch a b c d
(rapids) root@bd7436c2aa8b:~# ls
a  b  c  d
(rapids) root@bd7436c2aa8b:~# watch -n 1 ls
Segmentation fault (core dumped)

Additionally, tools installed afterwards, such as nano or htop, crash the same way:

$ apt-get update
$ apt-get install nano htop
$ nano a
Segmentation fault (core dumped)
$ htop
Segmentation fault (core dumped)

Expected behavior
People using this Docker image should be able to use basic tools like these without crashes.

Additional Context:
I installed GDB and ran nano and htop under it to see where the crash happens:

$ gdb /usr/bin/nano 
+ gdb /usr/bin/nano
(gdb) run
Starting program: /usr/bin/nano
warning: Error disabling address space randomization: Operation not permitted

Program received signal SIGSEGV, Segmentation fault.
0x00007f195a257b9f in termattrs_sp () from /opt/conda/envs/rapids/lib/libncursesw.so.6
(gdb) run
Starting program: /usr/bin/htop
warning: Error disabling address space randomization: Operation not permitted

Program received signal SIGSEGV, Segmentation fault.
0x00007f0d45772b9f in termattrs_sp () from /opt/conda/envs/rapids/lib/libncursesw.so.6
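
Both traces show the crash inside libncursesw.so.6 from the conda environment, not the system copy of the library. A minimal sketch for checking which curses library each tool resolves, under the assumption (not confirmed in this report) that the image sets LD_LIBRARY_PATH to include /opt/conda/envs/rapids/lib. `curses_libs` is a hypothetical helper name, not something shipped in the image:

```shell
# Hypothetical helper: list the ncurses/tinfo libraries a binary would load.
# ldd honours the current LD_LIBRARY_PATH, so the result depends on the environment.
curses_libs() {
    ldd "$1" 2>/dev/null | grep -E 'libncurses|libtinfo' || true
}

# Show the library search path override, if any (assumed to be set by the image).
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"

for bin in /usr/bin/nano /usr/bin/htop /usr/bin/watch; do
    [ -x "$bin" ] || continue
    echo "== $bin"
    curses_libs "$bin"
    # Same lookup in a subshell with the override removed, for comparison.
    echo "-- with LD_LIBRARY_PATH unset:"
    ( unset LD_LIBRARY_PATH; curses_libs "$bin" )
done
```

If the second listing resolves to a system path instead of /opt/conda, running a tool with the variable removed for that one command (for example `env -u LD_LIBRARY_PATH nano a`) might work around the crash. This has not been tested here.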

Also, this seems to affect the development images only. For example, running nano/htop/watch in the base container causes no issues. Docker run command used for the base container:

docker run --gpus all --rm -it     --shm-size=1g --ulimit memlock=-1     rapidsai/rapidsai-core:22.06-cuda11.5-base-ubuntu20.04-py3.9

Environment details (please complete the following information):

  • Environment location: Cloud (Azure) - though it's a dedicated VM.
  • Method of install: Docker (docker pull and run commands provided above)

@rohitdwivedula (author) commented:

Some additional information about the machine I ran it on:

$ lsb_release -a 
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04 
Codename:       focal
$ nvidia-smi
NVIDIA-SMI 470.129.06   
Driver Version: 470.129.06   
CUDA Version: 11.4
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
$ apt list | grep nvidia-docker
nvidia-docker2/bionic,bionic,now 2.11.0-1 all [installed]
