Granular layer selection during Pipeline Parallelism #598

bhuvan777 · 2024-10-03T15:06:18Z

When configuring pipeline splitting by specifying exact layers in the config (--experimental.pipeline_parallel_split_points), we are unable to use sub-layers as split points (e.g., layer.4.attn.qvw). If we attempt to do so, all layers are allocated to device rank 0, leaving device rank 1 without any layers (assuming we are using 2 devices). To avoid this, we need to specify the layer as a whole (e.g., layer.4).

Is there a specific reason for this limitation, or could we expect support for more granular layer-level assignment (e.g., sub-layers like layer.4.attn.qvw) in future updates?
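
To make the reported behavior concrete, here is a minimal sketch of one way such an outcome could arise, assuming a splitter that only compares split points against top-level block names. This is an illustration, not torchtitan's actual implementation; `assign_stages` is a hypothetical helper and the 8-block model is made up.

```python
def assign_stages(block_names, split_points):
    """Hypothetical helper: partition top-level blocks into pipeline stages.

    A new stage starts at each block whose name exactly matches the next
    split point. Names that never match are silently ignored.
    """
    stages = [[]]
    remaining = list(split_points)
    for name in block_names:
        if remaining and name == remaining[0]:
            stages.append([])  # open a new stage at this block
            remaining.pop(0)
        stages[-1].append(name)
    return stages


# Assumed toy model with 8 top-level blocks named layer.0 ... layer.7.
blocks = [f"layer.{i}" for i in range(8)]

# A whole-block split point matches and yields two stages.
whole_block = assign_stages(blocks, ["layer.4"])

# A sub-layer split point never equals a top-level block name,
# so every block stays in the first stage.
sub_layer = assign_stages(blocks, ["layer.4.attn.qvw"])

print(len(whole_block), len(sub_layer))
```

Under this assumption, the whole-block split point puts layer.0 through layer.3 on the first stage and layer.4 through layer.7 on the second, while the sub-layer split point leaves the second stage empty, which matches the symptom described above.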

tianyu-l · 2024-10-03T18:21:43Z

cc: @H-Huang @wconstab

H-Huang · 2024-10-04T18:49:09Z

Hi @bhuvan777, right, we currently only support splitting at the Transformer block level. Splitting at a finer granularity is possible, but it would require more code and make the pipeline parallel code a bit more complex. If we split at the block level, it is simpler to determine which input/output activations need to be passed through send/recv during pipeline parallelism.

Curious what your particular use case is for splitting up a block?
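
As a rough illustration of the point about activations, the sketch below compares a split at a block boundary with a split inside a block. It assumes a block with residual connections around its attention and MLP sub-layers and uses scalar stand-ins for tensors; all function names are hypothetical and none of this is torchtitan code.

```python
def attn(h):
    # Stand-in for the attention sub-layer.
    return 2.0 * h


def mlp(h):
    # Stand-in for the feed-forward sub-layer.
    return h + 1.0


def block(x):
    h = x + attn(x)    # residual connection around attention
    return h + mlp(h)  # residual connection around the MLP


# Split at a block boundary: a single activation crosses stages.
def stage0_block_split(x):
    return (block(x),)


def stage1_block_split(payload):
    (h,) = payload
    return block(h)


# Split inside the block, right after attention: the residual input
# must be sent along with the attention output.
def stage0_mid_split(x):
    return (x, attn(x))


def stage1_mid_split(payload):
    x, a = payload
    h = x + a          # finish the first residual connection
    return h + mlp(h)


x = 1.0
# The mid-block split reproduces a single full block's output...
assert stage1_mid_split(stage0_mid_split(x)) == block(x)
# ...but needs two values in the send/recv payload instead of one.
print(len(stage0_block_split(x)), len(stage0_mid_split(x)))
```

In this toy setup the block-boundary payload always has the same shape (one hidden state), whereas the mid-block payload depends on where inside the block the cut falls, which is the kind of extra bookkeeping that finer-grained splitting would need.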

tianyu-l added the "question" label (Further information is requested) · Oct 4, 2024
