Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move rdma control channel traffic to its own endpoint #321

Closed
wants to merge 3 commits into from

Conversation

bwbarrett
Copy link
Contributor

This is a series of cleanup patches that includes splitting some of the initialization of Libfabric between the send/recv and rdma transports, with the end goal of making it easier to move the control channel data from the data endpoint to its own endpoint (always on "NIC 0"), so that we can eventually increase the priority level of control messages over data messages (when Libfabric/rdma-core supports the service level feature through to the card).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@bwbarrett bwbarrett requested a review from a team January 23, 2024 18:43
Copy link
Contributor

@rauteric rauteric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't finish reviewing yet but left comments on the last commit.

include/nccl_ofi_rdma.h Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Show resolved Hide resolved
src/nccl_ofi_rdma.c Show resolved Hide resolved
src/nccl_ofi_rdma.c Show resolved Hide resolved
@bwbarrett bwbarrett added the BuildTriggerRequest CI build will be triggered when this label is set label Jan 25, 2024
rajachan
rajachan previously approved these changes Jan 25, 2024
Copy link
Member

@rajachan rajachan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of changes, so I'd need to read through again to digest it all, but the primary change of moving to rail 0 for ctrl and the ofiutil changes looks good to me.

src/nccl_ofi_net.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
@bwbarrett
Copy link
Contributor Author

rebased against the master branch and resolved conflicts from the id pool changes.

src/nccl_ofi_net.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Show resolved Hide resolved
@bwbarrett bwbarrett marked this pull request as draft February 19, 2024 20:54
@bwbarrett bwbarrett marked this pull request as ready for review February 20, 2024 19:07
@bwbarrett
Copy link
Contributor Author

Fixed an mr_mode initialization error in the send/recv path that was causing the P4 failures.

/* look for control messages and then retry the message search
to avoid unnecessary polling / queueing. */
if (OFI_UNLIKELY(!polled_cq && !have_ctrl)) {
ofi_process_cq_rail(ep, &ep->control_rail);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check return value

@@ -4659,6 +4661,14 @@ static int send(nccl_net_ofi_send_comm_t *send_comm, void *data, int size, int t
goto error;
}

/* look for control messages and then retry the message search
to avoid unnecessary polling / queueing. */
if (OFI_UNLIKELY(!polled_cq && !have_ctrl)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this unlikely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's going to take forever, so optimize for the fast path.

@AvivBenchorin AvivBenchorin added BuildTriggerRequest CI build will be triggered when this label is set and removed BuildTriggerRequest CI build will be triggered when this label is set labels Feb 20, 2024
@bwbarrett bwbarrett marked this pull request as draft February 20, 2024 23:41
@bwbarrett
Copy link
Contributor Author

Converting this PR to a draft. #344 includes all but the last two patches, which actually create the control QP/CQ. Those two patches impact performance in a way that needs more work. Leaving this PR so that we don't lose the work, but in the mean time, we'll get the refactoring code in as part of #344.

@a-szegel a-szegel removed the BuildTriggerRequest CI build will be triggered when this label is set label Jun 3, 2024
@a-szegel
Copy link
Contributor

a-szegel commented Jun 3, 2024

Please merge master.

For the long message RDMA protocol, we want to make sure that we
never starve the sender for data to move, which means prioritizing
control messages from the receiver to the sender.  This patch moves
both the communicator setup and recv control messages to a new
endpoint, which is always on device rail 0.  Future patches will
optimize polling of the control message cq in the send path
and setting priority bits on the control cq.

Signed-off-by: Brian Barrett <[email protected]>
If a match isn't found for the current send, poll the control cq
to see if the match can be found.  While this extends the current
send() call, it potentially lowers the time until data transfer
starts.

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett
Copy link
Contributor Author

This is now so hopelessly out of date that I'm going to close the PR. @rajachan is working on refactoring to work with HEAD of master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants