-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend range of communicator IDs for the RDMA protocol #367
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
AmedeoSapio
added
the
BuildTriggerRequest
CI build will be triggered when this label is set
label
Mar 29, 2024
bwbarrett
requested changes
Mar 29, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I only reviewed the last two commits, since the other ones are covered in a different PR. My assumption is that this PR will rebase once that one is committed.
rauteric
reviewed
Mar 29, 2024
AmedeoSapio
removed
the
BuildTriggerRequest
CI build will be triggered when this label is set
label
Mar 30, 2024
AmedeoSapio
force-pushed
the
extend_comm_id
branch
from
March 30, 2024 05:42
7092add
to
56ae205
Compare
bwbarrett
requested changes
Mar 30, 2024
AmedeoSapio
force-pushed
the
extend_comm_id
branch
from
March 30, 2024 18:14
56ae205
to
67b905e
Compare
AmedeoSapio
added
the
BuildTriggerRequest
CI build will be triggered when this label is set
label
Mar 30, 2024
rauteric
reviewed
Mar 30, 2024
This is changing the msgbuff so that it can support any number of bits for the sequence number, not just 16. This is in preparation to increase the range of comm IDs, which requires changing the number of bits used for the msg_seq_num. To fit a larger comm ID in the immediate data, the message sequence number bit-width must be reduced. With the 16 bits msg_seq_num the msgbuff relied on the wraparound of 16 bits integers, so to reduce the number of bits of msg_seq_num we need to make the wrapping around explicit and implement a distance function to deal with the wraparound case. This makes the msgbuff more generic, as it can now support any msg_seq_num space and any number of inflight messages, as long as they are powers of 2. This also extended the msgbuff unit test to test the case when message sequence numbers wrap around. Signed-off-by: Amedeo Sapio <[email protected]>
Changing the number of bits used for the communicator ID from 12 to 18 to reduce the chance of running out of comm IDs. To fit a larger comm ID in the immediate data, the message sequence number has been reduced from 16 bits to 10, which is still more than enough since usually the number of inflight messages is not that high. The msg_seq_num space can be further reduced if needed. Signed-off-by: Amedeo Sapio <[email protected]>
AmedeoSapio
force-pushed
the
extend_comm_id
branch
from
March 30, 2024 22:15
67b905e
to
382d001
Compare
rauteric
approved these changes
Mar 31, 2024
bwbarrett
approved these changes
Mar 31, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue
The RDMA protocol is currently using 12 bits for the communicator ID, limiting the number of communicators to 4K, which can be hit in large deployments.
Description of changes:
This PR is hanging the number of bits used for the communicator ID from 12 to 18 to reduce
the chance of running out of comm IDs. To fit the comm ID in the immediate data,
the message sequence number has been reduced from 16 bits to 10, which is still
more than enough since usually the number of inflight messages is not that high.
The msg_seq_num space can be further reduced if needed.
With the 16 bits msg_seq_num the msgbuff relied on the wraparound of 16 bits integers,
so to reduce the number of bits of msg_seq_num we need to make the wrapping around
explicit and implement a subtraction to deal with the wraparound case. This makes
the msgbuff more generic, as it can now support any msg_seq_num space and any
number of inflight messages (not only powers of 2).
The msgbuff unit test has also been extended to test the case when message sequence numbers wrap around.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.