Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fence_sbd: if sbd devices are not specified with option, read SBD_DEVICE #591

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

liangxin1300
Copy link

@liangxin1300 liangxin1300 commented Jul 26, 2024

from environment

Copy link

knet-jenkins bot commented Jul 26, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/1/input

@wenningerk
Copy link
Contributor

Guess we have to carefully think if this behavior is desirable.
Since it is possible to have nodes that do support running the SBD daemon (e.g. proper watchdog-device) and others that don't I'd rather have the fence-agent behave the same way regardless on which node it is running.
On the other hand if the SBD devices are available e.g. under different names on the nodes this would be an easy way to simplify the configuration in the CIB.
What was your use-case that triggered the idea?

Klaus

@liangxin1300
Copy link
Author

What was your use-case that triggered the idea?

  • keep the same with external/sbd
  • If the sbd devices change (are removed or replaced), we have to remember to change fence_sbd's parameter. That might be inconsistent with /etc/sysconfig/sbd

@oalbrigt
Copy link
Collaborator

You have to run make xml-upload and attach the updated test metadata to avoid CI failing.

Copy link

knet-jenkins bot commented Jul 26, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/2/input

Copy link
Member

@gao-yan gao-yan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @liangxin1300 !

Guess we have to carefully think if this behavior is desirable. Since it is possible to have nodes that do support running the SBD daemon (e.g. proper watchdog-device) and others that don't I'd rather have the fence-agent behave the same way regardless on which node it is running.

It still makes sense to respect explicitly specified "devices" especially since there might be the cases where some of the cluster nodes are not running sbd daemon but can be a fencing executor. But OTOH IMO, it also makes sense to be possible to omit the parameter for the normal cases where all nodes are configured to be running sbd daemon. So that by falling back to sysconfig, it'll avoid unnecessary duplication or even inconsistency with what's configured in the sysconfig.

agents/sbd/fence_sbd.py Outdated Show resolved Hide resolved
Copy link

knet-jenkins bot commented Sep 25, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/3/input

@liangxin1300 liangxin1300 force-pushed the 20240726_read_sbd_device_from_config branch from f542fab to 4aff0dc Compare September 25, 2024 09:42
Copy link

knet-jenkins bot commented Sep 25, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/4/input

configure.ac Outdated Show resolved Hide resolved
@liangxin1300 liangxin1300 force-pushed the 20240726_read_sbd_device_from_config branch from 4aff0dc to f4222cb Compare September 25, 2024 13:43
Copy link

knet-jenkins bot commented Sep 25, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/5/input

@liangxin1300 liangxin1300 force-pushed the 20240726_read_sbd_device_from_config branch from f4222cb to bda9d27 Compare September 25, 2024 22:47
Copy link

knet-jenkins bot commented Sep 25, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/6/input

@wenningerk
Copy link
Contributor

Puh ... that is a while back but was there a reason to explicitly parse sbd-config? Not 100% sure but I guess systemd is executing it as shell-script (if not we would at least have consistent behavior) so we might do here as well. And shouldn't it even be in the environment of pacemaker and thus be inherited to the fence-agents?
To be on the safe side we could as well check if sbd is enabled on the node before we use a config-file that isn't used otherwise.
Up to now not specifying devices is caught as configuration error. Not to change that behavior we might think of some explicit content here - like 'auto'. Maybe even something that lists the nodes where the config should be taken from sbd-config ... could even be combined with explicit devices for other nodes. Maybe some brainstorming in that direction makes sense ...

@gao-yan
Copy link
Member

gao-yan commented Sep 26, 2024

Puh ... that is a while back but was there a reason to explicitly parse sbd-config?

SBD is not an easy thing to set up. It involves setups of watchdog & block devices, on-disk metadata, sysconfig file, CIB configuration... When the other sbd fence agent was implemented, I believe it was exactly for the purpose of easing the setup and more importantly ensuring the consistency by reading already known configuration from sysconfig rather than unnecessarily repeating them in the CIB.

Not 100% sure but I guess systemd is executing it as shell-script

Unlikely so. It seems to do the parsing by itself:
https://github.com/systemd/systemd/blob/5acca1b88551633c91ae602351f9a23af178ce6b/src/basic/env-file.c#L22

(if not we would at least have consistent behavior) so we might do here as well.

The python module "configparser" could be a helper. Although it was originally meant for parsing ini files, but apparently it's used by us as well such as:

https://github.com/ClusterLabs/resource-agents/blob/f2932e218b912761c4374fd592dbaa6225abe6ae/tools/nfsconvert.in#L282-L284

And shouldn't it even be in the environment of pacemaker and thus be inherited to the fence-agents?

It'd probably be asking too much of pacemaker if it was purely for the purpose of passing on the environment variables to the fence agent? It might be more convincing if pacemaker itself should care about any of the sbd settings.

To be on the safe side we could as well check if sbd is enabled on the node before we use a config-file that isn't used otherwise.

Sounds sensible.

Up to now not specifying devices is caught as configuration error. Not to change that behavior we might think of some explicit content here - like 'auto'.

It might arguably be a change of behavior but it wouldn't affect existing setups. I believe all of them have always had "devices" explicitly configured.

I'd rather not unnecessarily complicate this. This agent handles fencing via disk-based sbd. It seems very strait-forward to me that it should try finding the device list from sysconfig if "devices" is not explicitly specified. Of course if SBD_DEVICE is not even configured in sysconfig, an error should be dropped.

Maybe even something that lists the nodes where the config should be taken from sbd-config ... could even be combined with explicit devices for other nodes.

Separate sbd resources configured for different nodes with specific "devices" or without it, together with location constraints should do?

@liangxin1300 liangxin1300 force-pushed the 20240726_read_sbd_device_from_config branch from bda9d27 to 3afaf47 Compare September 26, 2024 14:28
Copy link

knet-jenkins bot commented Sep 26, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/7/input

@oalbrigt
Copy link
Collaborator

retest this please

@wenningerk
Copy link
Contributor

And shouldn't it even be in the environment of pacemaker and thus be inherited to the fence-agents?

It'd probably be asking too much of pacemaker if it was purely for the purpose of passing on the environment variables to the fence agent? It might be more convincing if pacemaker itself should care about any of the sbd settings.

I guess that was a misunderstanding. I meant that the environment that is currently passed to the fence-agents by pacemaker should already contain the definitions from /etc/sysconfig/sbd. Didn't check though.

@gao-yan
Copy link
Member

gao-yan commented Sep 27, 2024

I guess that was a misunderstanding. I meant that the environment that is currently passed to the fence-agents by pacemaker should already contain the definitions from /etc/sysconfig/sbd. Didn't check though.

Ah, you are right. Somehow I forgot pacemaker already retrieves sbd sysconfig. In that case, it'd be even better to just reference the environment variable SBD_DEVICE in here.

@liangxin1300 liangxin1300 force-pushed the 20240726_read_sbd_device_from_config branch from 3afaf47 to 794b26e Compare September 27, 2024 11:24
Copy link

knet-jenkins bot commented Sep 27, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/9/input

@liangxin1300 liangxin1300 force-pushed the 20240726_read_sbd_device_from_config branch from 794b26e to 744d534 Compare September 27, 2024 11:27
Copy link

knet-jenkins bot commented Sep 27, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-591/10/input

@oalbrigt
Copy link
Collaborator

The CI fail has nothing to do with the code, so no need to worry about it.

@wenningerk
Copy link
Contributor

wenningerk commented Sep 28, 2024

Was thinking of a smart way to check for a locally running sbd daemon.
Well one more reason why it would make sense for the sbd-binary running in cmdline-mode to talk to the sbd-binary running as daemon. Unfortunately for now there is nothing and I don't see a way of using the cmdline-tool as is to check in a generic way. Actually I'd like to move the whole logic to sbd - see if there is a daemon to talk to otherwise handle in the cmdline-tool. We could do that with the whole disk-interaction so that the fence-agent doesn't have to deal with it together with all the nastyness of hanging block-devices. Unfortunately that would be a major overhaul but it might pay off in the future to have a stable interface to the cmdline-tool and have that handle everything internally.
What stays is doing it as pacemaker does - take the pid-file and check if it is running. (We already know the sbd-binary and I don't think sbd is gonna split into daemon & client binary any time soon. Only dependency we pull in is name/location of the pid-file.)
Other thing would be to check with systemd if sbd is enabled. (Has to handle the freebsd-case. But anyway we should think of a way of not touching the whole agent for a freebsd build as sbd is anyway not supported.)

@gao-yan
Copy link
Member

gao-yan commented Sep 30, 2024

Was thinking of a smart way to check for a locally running sbd daemon. Well one more reason why it would make sense for the sbd-binary running in cmdline-mode to talk to the sbd-binary running as daemon. Unfortunately for now there is nothing and I don't see a way of using the cmdline-tool as is to check in a generic way. Actually I'd like to move the whole logic to sbd - see if there is a daemon to talk to otherwise handle in the cmdline-tool. We could do that with the whole disk-interaction so that the fence-agent doesn't have to deal with it together with all the nastyness of hanging block-devices. Unfortunately that would be a major overhaul but it might pay off in the future to have a stable interface to the cmdline-tool and have that handle everything internally.

But as long as there should be the use cases where a node is not running sbd daemon but could be an sbd fencing executor, either fence agent or sbd CLI should be able to handle hanging block devices by itself anyway such as ClusterLabs/sbd#148 ?

What stays is doing it as pacemaker does - take the pid-file and check if it is running. (We already know the sbd-binary and I don't think sbd is gonna split into daemon & client binary any time soon. Only dependency we pull in is name/location of the pid-file.) Other thing would be to check with systemd if sbd is enabled. (Has to handle the freebsd-case. But anyway we should think of a way of not touching the whole agent for a freebsd build as sbd is anyway not supported.)

Checking with systemd might be more straight-forward but of course technically less generic and more of a dependency. OTOH doing it via sbd.pid would of course require logic in configure.ac dealing with $(runstatedir) or ${localstatedir}/run ...

@wenningerk
Copy link
Contributor

Guess we should have checking for the daemon in this PR. @oalbrigt what would you prefer?

@oalbrigt
Copy link
Collaborator

oalbrigt commented Oct 8, 2024

Yeah. I guess it would make sense to check if sbd is configured and running in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants