When configuring pipeline splitting by specifying exact layers in the config (`--experimental.pipeline_parallel_split_points`), we are unable to split at sub-layers (e.g., `layer.4.attn.qvw`). If we attempt to do so, all layers are allocated to device rank 0, leaving device rank 1 without any layers (assuming we are using 2 devices). To avoid this, we need to specify the layer as a whole (e.g., `layer.4`).
Is there a specific reason for this limitation, or can we expect support for more granular assignment (e.g., sub-layers like `layer.4.attn.qvw`) in future updates?
Hi @bhuvan777, right, we currently only support splitting at the Transformer block level. Splitting at a finer granularity is possible, but it would require more code and make the pipeline parallel code more complex. When we split at the block level, it is simpler to determine which input/output activations need to be passed through send/recv during pipeline parallelism.
Curious: what is your particular use case for splitting up a block?
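To illustrate the block-level behavior described above, here is a minimal pure-Python sketch. It is not the library's implementation: `assign_blocks_to_stages` and `unmatched_split_points` are hypothetical helpers, and the assumption that a split point only takes effect when it exactly matches a block name is inferred from the symptom reported in this issue, not confirmed from the source code.

```python
from typing import List


def assign_blocks_to_stages(block_names: List[str], split_points: List[str]) -> List[List[str]]:
    """Toy partitioner: start a new stage at each block whose name exactly
    matches a split point. (Hypothetical helper, for illustration only.)"""
    stages: List[List[str]] = [[]]
    for name in block_names:
        # Open a new stage only on an exact match, and never leave a stage empty.
        if name in split_points and stages[-1]:
            stages.append([])
        stages[-1].append(name)
    return stages


def unmatched_split_points(block_names: List[str], split_points: List[str]) -> List[str]:
    """Return the split points that do not name a whole block."""
    known = set(block_names)
    return [sp for sp in split_points if sp not in known]


# Eight blocks named as in this issue ("layer.N").
blocks = [f"layer.{i}" for i in range(8)]

block_level = assign_blocks_to_stages(blocks, ["layer.4"])
sub_layer = assign_blocks_to_stages(blocks, ["layer.4.attn.qvw"])

print(len(block_level))  # 2 stages: layer.0-3 and layer.4-7
print(len(sub_layer))    # 1 stage: the sub-layer name never matches a block
print(unmatched_split_points(blocks, ["layer.4.attn.qvw"]))  # ['layer.4.attn.qvw']
```

In this sketch the sub-layer split point matches no block, so every block stays in the first stage, which resembles the "all layers on rank 0" result described in the issue. A check like `unmatched_split_points` could be used to reject such a config early instead of silently producing a single stage.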