add runhouse base image #1318

Open · jlewitt1 wants to merge 1 commit into main

jlewitt1 (Collaborator) commented Oct 9, 2024

No description provided.

jlewitt1 force-pushed the create-rh-image branch 4 times, most recently from 416dad2 to 27a0173 on October 9, 2024 17:54
Review thread on these lines of the Dockerfile:

fi

# Keep the container running, allowing SkyPilot to connect and run commands inside the container
CMD ["bash"]
Contributor:

Is this really necessary? It doesn't feel standard.

Collaborator (author):

Yeah, fair point, it's not very standard. But we do need to keep the container up, rather than letting it exit, while Sky runs the provisioning/setup step. It's probably better to run supervisord as the main process.
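A minimal sketch of what "supervisord as the main process" could look like, written as a hypothetical entrypoint script. The config path, the `SUPERVISOR_CONF` variable, the function names, and the `sleep infinity` fallback are all assumptions for illustration, not part of this PR. The idea is that the entrypoint `exec`s a process that stays in the foreground (`supervisord -n`), so the container keeps running after startup.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint sketch: choose a long-running foreground process to
# be the container's main process (PID 1), so the container stays up while
# SkyPilot provisions it and runs setup commands via `docker exec`.

# Assumed config path (the default used by the Debian/Ubuntu supervisor package).
SUPERVISOR_CONF=${SUPERVISOR_CONF:-/etc/supervisor/supervisord.conf}

# Print the command to run, one argument per line.
main_process_cmd() {
  if command -v supervisord >/dev/null 2>&1; then
    # -n (--nodaemon) keeps supervisord in the foreground instead of forking.
    printf '%s\n' supervisord -n -c "$SUPERVISOR_CONF"
  else
    # Assumed fallback: block forever (GNU coreutils `sleep` accepts "infinity").
    printf '%s\n' sleep infinity
  fi
}

# Replace the shell with the chosen process so it becomes PID 1 and receives
# the signals sent by `docker stop`. Call this as the last line of the entrypoint.
run_main_process() {
  local -a cmd
  mapfile -t cmd < <(main_process_cmd)
  exec "${cmd[@]}"
}
```

In the Dockerfile, this script would then replace `CMD ["bash"]` as the container's command, ending with a call to `run_main_process`. Using `exec` rather than running supervisord as a child keeps signal handling simple, since there is no intermediate shell between Docker and the process that must stay alive.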

Contributor:

I don't really understand: isn't the SkyPilot work happening after the container is already built? What happens if we try to launch a Docker cluster without any line here?

Collaborator (author):

Without that line (or something similar), the container would exit, since no process would be keeping it alive. Sky does seem to run commands inside the container after it has started; here is a snippet from the provisioning logs:

Error response from daemon: Container 77772152c4654088ac87318f08620c24aba7aef619fa744ff72b99668971d907 is not running
I 10-09 19:33:56 instance_setup.py:88] _initialize_docker: Retrying in 1.3 seconds, due to Command docker exec sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (sudo service ssh start)'"'"' failed with return code 1.
I 10-09 19:33:56 instance_setup.py:88] Failed to run docker setup commands
D 10-09 19:33:57 docker_utils.py:154] + command -v docker || echo 'NoExist'
/usr/bin/docker
D 10-09 19:33:58 docker_utils.py:154] + docker inspect -f "{{.State.Running}}" sky_container || true
false
D 10-09 19:33:58 docker_utils.py:154] + docker start sky_container
sky_container
D 10-09 19:33:59 docker_utils.py:154] + docker exec sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (sudo service ssh start)'"'"'
Error response from daemon: Container 77772152c4654088ac87318f08620c24aba7aef619fa744ff72b99668971d907 is not running
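The log is consistent with the main process exiting right after startup: `docker inspect` reports `false`, `docker start` succeeds, and the next `docker exec` still finds the container stopped. Below is a rough local illustration of that lifecycle rule, assuming the container's stdin is not kept open (for example, no `-i`/`-t` on `docker run`; how SkyPilot actually starts the container isn't shown in these logs). With stdin at EOF, `bash` exits immediately, while a foreground process such as `sleep` keeps running. The helper `still_running_after` is hypothetical and relies on GNU coreutils `timeout`, which exits with status 124 when it has to stop the command itself.

```shell
#!/usr/bin/env bash
# Sketch: a container stays up only while its main process (PID 1) runs.
# still_running_after is a hypothetical helper: it runs a command with stdin at
# EOF (like a detached container started without -i) and reports whether the
# command was still alive after the given number of seconds.
still_running_after() {
  local secs=$1
  shift
  local status=0
  # GNU `timeout` exits with 124 when it had to stop the command itself.
  timeout "$secs" "$@" </dev/null >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 124 ]; then
    echo "running"
  else
    echo "exited"
  fi
}

# bash has nothing to read on stdin, so it exits at once (like CMD ["bash"]).
still_running_after 1 bash        # prints "exited"
# A long-running foreground process keeps the "container" alive.
still_running_after 1 sleep 30    # prints "running"
```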
