
Torchrec sharding plan setting #2642

Open
shan-jiang-faire opened this issue Dec 17, 2024 · 2 comments

@shan-jiang-faire

I'm using TorchRec for model parallelization. Is it possible to manually configure a group of sparse features so they are sharded on the same machine? For example, I have 3 sparse features A, B, and C. They interact frequently in my model, so I'd like them to be on the same GPU to reduce communication between machines. Is there a way to do this in TorchRec?

Thanks very much!

@sarckk
Member

sarckk commented Dec 20, 2024

Hi @shan-jiang-faire, I'm assuming these 3 features are in separate embedding tables? AFAIK, if you're using the TorchRec planner to generate a sharding plan, there isn't an easy way to enforce that all 3 tables end up on the same GPU/rank. However, you can use the construct_module_sharding_plan API [source] to manually define a sharding plan in which the 3 tables are table-wise sharded on the same rank.

Example:

from torchrec.distributed.sharding_plan import (
    construct_module_sharding_plan,
    table_wise,
)

# Place all three tables on rank 0 with table-wise sharding so they share a GPU.
plan = construct_module_sharding_plan(
    ebc,
    {
        "table_1": table_wise(rank=0),
        "table_2": table_wise(rank=0),
        "table_3": table_wise(rank=0),
    },
)
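
For reference, a minimal sketch of applying such a manually constructed plan, assuming `model` is the unsharded model that holds the EmbeddingBagCollection under the attribute name "ebc" and that the process group and CUDA device have already been initialized (the names `model` and "ebc" are placeholders for your own module path):

import torch
from torchrec.distributed.model_parallel import DistributedModelParallel
from torchrec.distributed.types import ShardingPlan

# Wrap the module-level plan in a ShardingPlan keyed by the module's
# fully qualified name inside the model, then pass it to DMP so the
# three tables are co-located on rank 0 as specified above.
sharded_model = DistributedModelParallel(
    module=model,
    plan=ShardingPlan(plan={"ebc": plan}),
    device=torch.device("cuda"),
)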

Would that work for your use case?

@shan-jiang-faire
Author

Thank you very much! Yes, they are in separate tables. I think this is what I need. Let me try it. Thanks again!
