Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE-4335: Watchdog to ignore stale pidfile #2841

Merged
merged 1 commit into from
Feb 22, 2024

Conversation

mikeweilgart
Copy link
Contributor

Fixes CFE-4335.

@mikeweilgart
Copy link
Contributor Author

I don't have an easy way to test it, but in isolation the code seems to work.

/proc exists at least on all Linux + Solaris, though I'm unable to find confirmation that it's required by POSIX. For the basic point of, "how old is this process?" (checking mtime with ls -t) it should work.

Options to head and ls only include those specified by POSIX.

Hat tip to https://stackoverflow.com/q/40205638/5419599 where I got the idea for this code.

@mikeweilgart
Copy link
Contributor Author

Does not work on MacOS, not sure how broadly watchdog is intended to work.

@mikeweilgart
Copy link
Contributor Author

Also, shouldn't the pidfile be in /var/run/ ? That exists on all modern unixes, including macos, and gets wiped out on boot which is the most common reason for stale pidfiles generally (existing past boot). Reference: https://www.pathname.com/fhs/2.2/fhs-5.13.html

Copy link
Contributor

@vpodzime vpodzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@nickanderson nickanderson changed the title Watchdog to ignore stale pidfile CFE-4335: Watchdog to ignore stale pidfile Feb 22, 2024
@nickanderson
Copy link
Member

@flynn1973 Can you confirm if this change improves the situation for you?

@flynn1973
Copy link
Contributor

flynn1973 commented Feb 22, 2024

a quick run of the changed script shows a missing "then" ;-)

if [ "$actual_process" = "$newer" ];

error:

root@aixtest01: /root # /var/cfengine/bin/watchdog_changed
/var/cfengine/bin/watchdog_changed[52]: syntax error at line 64 : `else' unexpected

after adding it the missing statement the script runs without errors....all in all this change will definitely improve the situation..good idea.

@nickanderson
Copy link
Member

@flynn1973 Patched.

@nickanderson
Copy link
Member

@mikeweilgart do you want to squash these commits together and amend the commit body to include something like

Ticket: CFE-4335
Changelog: AIX watchdog now handles stale pids

@nickanderson
Copy link
Member

@mikeweilgart thanks, one more thing, can you re-word the commit subject to be past tense?

Something like "Made AIX watchdog ignore stale pid files".

@nickanderson
Copy link
Member

@mikeweilgart I think this watchdog is specific to AIX. It might work on Linux as well, but the Linux one is much more simple.

The aix watchdog uses this template here: https://github.com/cfengine/masterfiles/blob/master/cfe_internal/core/watchdog/watchdog.cf#L174-L179

and the linux watchdog, well the watchdog used when /etc/cron.d is present uses https://github.com/cfengine/masterfiles/blob/master/cfe_internal/core/watchdog/watchdog.cf#L225-L227

Would be good to combine them, but that would be separate work.

Ticket: CFE-4335
Changelog: AIX watchdog now handles stale pids

Code uses /proc opportunistically only if it exists; this improves
accuracy for systems with /proc and doesn't affect other systems.
(Turns out this doesn't matter as the template is only for AIX which
does have /proc)

This leaves in place a race condition if many watchdog processes are
started in very short succession, since the pidfile is not atomically
checked and updated.  For purposes of the watchdog this is probably good
enough.  (See https://stackoverflow.com/a/688365/5419599 for more on
this.)
@nickanderson
Copy link
Member

Thanks @mikeweilgart !

@nickanderson nickanderson merged commit 26d781d into cfengine:master Feb 22, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants