Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdma: support NCCL multi-recv interface #348

Closed
wants to merge 6 commits into from

Conversation

rauteric
Copy link
Contributor

The multi-recv interface allows aggregating up to 8 receive requests in a single request. The changes include msgbuff changes to be tag-aware.

  • Temporarily disables eager; it will be re-enabled in a future commit.
  • Makes connect() blocking; we need to better understand why this is necessary.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

include/nccl_ofi.h Outdated Show resolved Hide resolved
include/nccl_ofi_rdma.h Outdated Show resolved Hide resolved
src/nccl_ofi_api.c Outdated Show resolved Hide resolved
@AvivBenchorin AvivBenchorin added BuildTriggerRequest CI build will be triggered when this label is set and removed BuildTriggerRequest CI build will be triggered when this label is set labels Feb 22, 2024
@@ -59,7 +59,8 @@ extern "C" {
#define MIN_TAG_BITS_FOR_RING_ID (32 + 1)

/* Maximum number of grouped receives */
#define NCCL_OFI_MAX_RECVS 1
#define NCCL_OFI_MAX_RECVS 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're going to break out the two protocols (which makes sense), then we should move these into their corresponding header files. And we really shouldn't leave something named as NCCL_OFI_MAX_RECVS when it's only sometimes right.

destination buffer */
typedef struct nccl_net_ofi_rdma_ctrl_msg {
typedef struct nccl_net_ofi_rdma_ctrl_msg_entry {
int multi_recv_tag;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multirecv tag should go at the end for padding reasons.

include/nccl_ofi_rdma.h Outdated Show resolved Hide resolved
src/nccl_ofi_msgbuff.c Outdated Show resolved Hide resolved
@@ -25,7 +26,7 @@ nccl_ofi_msgbuff_t *nccl_ofi_msgbuff_init(uint16_t buffer_size)
goto error;
}
msgbuff->buff_size = buffer_size;
if (!(msgbuff->buff = malloc(sizeof(nccl_ofi_msgbuff_elem_t)*buffer_size))) {
if (!(msgbuff->buff = calloc((4*buffer_size), sizeof(nccl_ofi_msgbuff_elem_t)))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this magic constant?

NCCL_OFI_WARN("Error locking mutex");
return NCCL_OFI_MSGBUFF_ERROR;
}
NCCL_OFI_WARN("Error locking mutex");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

own commit.

nccl_ofi_msgbuff_result_t ret = NCCL_OFI_MSGBUFF_ERROR;

*msg_idx_status = nccl_ofi_msgbuff_get_idx_status(msgbuff, msg_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems unrelated to multirecv; more details on this change would be good.

In the current implementation of connect/accept, it is possible for
`accept` to complete (i.e., return a non-NULL communicator) after the
corresponding `connect` returned a NULL communicator (while waiting for
a completion for the connection message). This is a strange semantic,
and evidently causes NCCL to be unhappy, particularly in the multi-recv
case (which is being added in a future commit).

So, after sending the connect message, defer waiting for completion;
block when closing the send comm if necessary.

Signed-off-by: Eric Raut <[email protected]>
These tests do not support multi-recv.

Signed-off-by: Eric Raut <[email protected]>
@rauteric
Copy link
Contributor Author

rauteric commented Feb 24, 2024

Addressed the major points of Brian's review: splitting into more commits and removing abstraction-breaking changes from msgbuff.

Some stylistic things are still WIP.

Also rebased on master.

The previous implementation of the ring unit test posts sends before
receives, which is incompatible with the current multi-recv behavior of
rejecting sends until ctrl information is avaiable.

Signed-off-by: Eric Raut <[email protected]>
msgbuff functions accept tag and multi-recv information. RDMA protocol
code is updated to pass dummy values for these fields. Multi-recv
support for RDMA protocol will be an upcoming commit.

Signed-off-by: Eric Raut <[email protected]>
The multi-recv interface allows aggregating up to 8 receive requests in
a single request.

This commit does not yet advertise support for multi-recv to NCCL.

* Temporarily disables eager; it will be re-enabled in a future commit.

Signed-off-by: Eric Raut <[email protected]>
RDMA protocol will now support up to 8 multi-recv buffers at a time.

Signed-off-by: Eric Raut <[email protected]>
@rauteric
Copy link
Contributor Author

"Fixed" hang in ring unit test: 9207e74

@a-szegel
Copy link
Contributor

a-szegel commented Jun 3, 2024

Please merge master.

@a-szegel a-szegel removed the BuildTriggerRequest CI build will be triggered when this label is set label Jun 3, 2024
@rauteric
Copy link
Contributor Author

rauteric commented Oct 2, 2024

Going to close this PR for now. This branch will need to be rebased, and we need to address the perf issues here before adding this feature.

@rauteric rauteric closed this Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants