Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1802] Fail the celeborn master/worker start if CELEBORN_CONF_DIR is not directory #3030

Closed
wants to merge 3 commits into from

Conversation

turboFei
Copy link
Member

@turboFei turboFei commented Dec 25, 2024

What changes were proposed in this pull request?

Fail the celeborn master/worker start if CELEBORN_CONF_DIR is not directory. Otherwise, the process would run into unexpected status.

Why are the changes needed?

In the celeborn-daemon.sh , if we specify the --config <conf-dir> option. It would fail the master/worker start if the conf-dir is not a directory, likes the systemctl ConditionPathExists=$CELEBORN_CONF_DIR requirement check.

echo " <conf-dir> Override CELEBORN_CONF_DIR"

if [ "$1" == "--config" ]
then
shift
conf_dir="$1"
if [ ! -d "$conf_dir" ]
then
echo "ERROR : $conf_dir is not a directory"
usage
exit 1
else

But before this PR, for the start master/worker scripts, it did not check if the CELEBORN_CONF_DIR is dirctory because the scripts did not leverage --config <conf-dir> option.

In this PR, we check the final CELEBORN_CONF_DIR before start celeborn, so that all the scripts would check if the CELEBORN_CONF_DIR is a directory before start.

Does this PR introduce any user-facing change?

Yes, it would fail the start if CELEBORN_CONF_DIR is not a directory.

How was this patch tested?

image

@turboFei turboFei changed the title [CELEBORN-1802] Fail the celeborn master/worker start on CELEBORN_CONF_DIR does not exists [CELEBORN-1802] Fail the celeborn master/worker start on CELEBORN_CONF_DIR does not exist Dec 25, 2024
@turboFei turboFei changed the title [CELEBORN-1802] Fail the celeborn master/worker start on CELEBORN_CONF_DIR does not exist [CELEBORN-1802] Fail the celeborn master/worker start on CELEBORN_CONF_DIR is not directory Dec 25, 2024
@turboFei turboFei marked this pull request as draft December 25, 2024 02:41
@turboFei turboFei changed the title [CELEBORN-1802] Fail the celeborn master/worker start on CELEBORN_CONF_DIR is not directory [CELEBORN-1802] Fail the celeborn master/worker start if CELEBORN_CONF_DIR is not directory Dec 25, 2024
@turboFei turboFei marked this pull request as ready for review December 25, 2024 02:44
@turboFei
Copy link
Member Author

turboFei commented Dec 25, 2024

Maybe I can use the below commands in systemctl.d/supervisor.d directly.

$CELEBORN_HOME/sbin/celeborn-daemon.sh --config $CELEBORN_CONF_DIR start org.apache.celeborn.service.deploy.master.Master

$CELEBORN_HOME/sbin/celeborn-daemon.sh --config $CELEBORN_CONF_DIR start org.apache.celeborn.service.deploy.master.Worker 1

@turboFei turboFei closed this Dec 25, 2024
@turboFei
Copy link
Member Author

turboFei commented Dec 25, 2024

reopen it incase someone think it is necessary

@turboFei turboFei reopened this Dec 25, 2024
@turboFei turboFei marked this pull request as draft December 25, 2024 03:46
@turboFei turboFei marked this pull request as ready for review December 25, 2024 04:09
@turboFei turboFei requested a review from pan3793 December 25, 2024 04:09
then
echo "ERROR : CELEBORN_CONF_DIR: $CELEBORN_CONF_DIR is not a directory"
exit 1
fi
Copy link
Member Author

@turboFei turboFei Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The celeborn-daemon.sh would load the celenorn-env.sh after checking the --config option.

. "${CELEBORN_HOME}/sbin/load-celeborn-env.sh"

So we just check it before run the start command.

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense to me, also cc @FMX @RexXiong

Copy link
Contributor

@RexXiong RexXiong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @pan3793

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks. Merged into main(v0.6.0).

@FMX FMX closed this in 1b3bd6e Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants