Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: implement a proper shutdown mechanism #4578

Merged
merged 3 commits into from
Jul 17, 2024

Conversation

jiceatscion
Copy link
Contributor

@jiceatscion jiceatscion commented Jul 16, 2024

The "running" flag is now accessed as an atomic (so with sequential ordering per the Go memory model). Added a shutdown entry point which cannot race with initialization.
Fixes #4116

@jiceatscion
Copy link
Contributor Author

This change is Reviewable

@jiceatscion jiceatscion changed the title router: Implement a proper shutdown mechanism router: implement a proper shutdown mechanism Jul 16, 2024
@jiceatscion jiceatscion requested a review from oncilla July 16, 2024 16:34
Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @oncilla)

@jiceatscion jiceatscion merged commit 8ba6b38 into scionproto:master Jul 17, 2024
5 checks passed
@jiceatscion jiceatscion deleted the fix4116 branch July 17, 2024 09:28
@jcp19
Copy link
Contributor

jcp19 commented Jul 17, 2024

Sorry for being late to this discussion. I wonder if there is still the possibility for a kind of ABA problem here.

In the scenario where we have a router d being initialized in one thread and started with a call to Run, it would be possible for a separate thread to call d.Shutdown() and then call one of the setters like AddExternalInterface with a new external interface (such that we do not get an alreadySet error). If we get an unfortunate timing, the calls to Shutdown and AddExternalInterface from the second thread might occur in between calls to d.IsRunning() in another thread that performs routing. Thus, it may be possible that we try to modify one of the fields of d (e.g., d.external) while other threads may access them. This would be problematic because the data structures stored in these fields are not thread-safe.

This scenario is definitely unlikely, but I think there is still the possibility of a race here. If you agree with this, then I think that a valid fix would be to replace the running flag by a field named state comprised of an atomic integer that stores one of three possible values: Initializing (the default value), Running, and Finished. All setters should guarantee that the state is Initializing; Shutdown() sets the state to Finished. This should avoid the race that I think might happen.

PS: I wonder if it would make sense to call d.Shutdown() after <-ctx.Done() in the Run method as well, such that all forwarding threads terminate after a client asks the router to terminate.

@jiceatscion
Copy link
Contributor Author

I agree. I don't much see a real use case for sharing the management of a router between multiple threads but it is theoretically possible. Multiple threads could also fight over interfaces during initialization, which is also unlikely to produce sensible results. Note that after a call to Shutdown, the router is committed to terminating, so there's not much incentive in using Shutdown() as a way to allow reconfigurations. As things stand at the moment, the shutdown mechanism only exists so unit tests can terminate the router. I almost did what you described (i.e. more than two states) and then stopped myself...YAGNI.

But if that loose end bothers you, you are mot welcome to file a new issue (or re-open yours) and propose such an improvement. It shouldn't add much complexity, so I won't object on those grounds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possibly missing synchronization in the DataPlane.Run method
3 participants