Commit

Advertise multi-recv support to NCCL for RDMA protocol
RDMA protocol will now support up to 8 multi-recv buffers at a time.

Signed-off-by: Eric Raut <[email protected]>
rauteric committed Feb 25, 2024
1 parent 1278cf6 commit 5fddb2e
Showing 3 changed files with 6 additions and 2 deletions.
3 changes: 2 additions & 1 deletion include/nccl_ofi.h
@@ -59,7 +59,8 @@ extern "C" {
#define MIN_TAG_BITS_FOR_RING_ID (32 + 1)

/* Maximum number of grouped receives */
-#define NCCL_OFI_MAX_RECVS 1
+#define NCCL_OFI_MAX_RECVS 8
+#define NCCL_OFI_MAX_RECVS_SENDRECV 1

/*
* This defines a higher value than maximum inflight requests supported by NCCL
2 changes: 1 addition & 1 deletion src/nccl_ofi_net.c
@@ -331,7 +331,7 @@ static int set_nic_props_default(int dev_id, struct fi_info *nic_prov,
* impacted with this feature as NCCL doesn't aggregate receives from
* same source.
*/
-props->max_group_receives = NCCL_OFI_MAX_RECVS;
+props->max_group_receives = NCCL_OFI_MAX_RECVS_SENDRECV;

if (support_gdr == GDR_SUPPORTED) {
props->hmem_support = true;
3 changes: 3 additions & 0 deletions src/nccl_ofi_rdma.c
@@ -597,6 +597,9 @@ static inline int get_properties(nccl_net_ofi_device_t *base_dev,
struct fi_info *info = device->device_rails[0].info;
int ret = nccl_net_ofi_info_properties(info, dev_id, base_dev->plugin->num_devs, props);

+/* Multi-recv adjustment */
+props->max_group_receives = NCCL_OFI_MAX_RECVS;
+
/* Scale speed by the total number of rails. Assume that all
* rails have the same speed. */
if (ret == 0) {
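The net effect of the three hunks can be sketched in isolation: the shared default path keeps the conservative limit of 1, and the RDMA path raises it to 8 afterwards. The following is a minimal illustration only, not the plugin's code. `struct example_props` and both `example_*` functions are hypothetical stand-ins invented for this sketch; only the two macro names and their values come from the diff. Unlike the real `get_properties`, the sketch checks the return value before applying the override, which is a choice made here for clarity.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the diff to include/nccl_ofi.h. */
#define NCCL_OFI_MAX_RECVS 8
#define NCCL_OFI_MAX_RECVS_SENDRECV 1

/* Hypothetical, stripped-down stand-in for the plugin's properties
 * struct; only the field touched by this commit is modeled. */
struct example_props {
	int max_group_receives;
};

/* Models the shared default path (set_nic_props_default in the diff):
 * the SENDRECV limit of 1 is the baseline for every protocol. */
static int example_set_default_props(struct example_props *props)
{
	if (props == NULL)
		return -1;

	props->max_group_receives = NCCL_OFI_MAX_RECVS_SENDRECV;
	return 0;
}

/* Models the RDMA get_properties path: start from the shared defaults,
 * then apply the multi-recv adjustment so that up to 8 grouped
 * receives are advertised. */
static int example_rdma_get_properties(struct example_props *props)
{
	int ret = example_set_default_props(props);
	if (ret != 0)
		return ret;

	/* Multi-recv adjustment */
	props->max_group_receives = NCCL_OFI_MAX_RECVS;

	/* The RDMA limit should never be below the shared default. */
	assert(props->max_group_receives >= NCCL_OFI_MAX_RECVS_SENDRECV);
	return 0;
}
```

Keeping the default at `NCCL_OFI_MAX_RECVS_SENDRECV` means any protocol that does not explicitly opt in continues to advertise a single receive per group; only the RDMA path is changed by this commit.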