Granular layer selection during Pipeline Parallelism #598

Open
bhuvan777 opened this issue Oct 3, 2024 · 2 comments
Labels
question Further information is requested

Comments

@bhuvan777

When configuring pipeline splitting by specifying exact layers in the config (--experimental.pipeline_parallel_split_points), we are unable to split at sub-layers (e.g., layer.4.attn.qvw). If we attempt to do so, all layers are allocated to device rank 0, leaving device rank 1 without any layers (assuming we are using 2 devices). To avoid this, we need to specify the layer as a whole (e.g., layer.4).

Is there a specific reason for this limitation, or could we expect support for more granular layer-level assignment (e.g., sub-layers like layer.4.attn.qvw) in future updates?
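The behaviour described above is what one would see if split points were matched by exact string equality against block names. The following is a minimal sketch under that assumption; `assign_blocks` and the `layer.N` names are hypothetical and this is not the project's actual splitting code.

```python
# Hypothetical sketch, NOT the project's implementation. It assumes each split
# point is compared by exact string equality against top-level block names.

def assign_blocks(block_names, split_points):
    """Return one list of block names per pipeline stage."""
    boundaries = [None, *split_points, None]
    stages = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        keep = start is None  # the first stage keeps blocks from the beginning
        kept = []
        for name in block_names:
            if name == start:
                keep = True   # this stage begins at `start` (inclusive)
            if name == stop:
                keep = False  # this stage ends at `stop` (exclusive)
            if keep:
                kept.append(name)
        stages.append(kept)
    return stages


blocks = [f"layer.{i}" for i in range(8)]

# Block-level split point: the name matches, so both stages get blocks.
whole = assign_blocks(blocks, ["layer.4"])
# Sub-layer split point: the name never equals any block name.
sub = assign_blocks(blocks, ["layer.4.attn.qvw"])

print([len(s) for s in whole])  # [4, 4]
print([len(s) for s in sub])    # [8, 0]
```

Under this assumption, a sub-layer name never matches a block name, so the first stage never stops and the second stage never starts.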

@tianyu-l
Contributor

tianyu-l commented Oct 3, 2024

cc: @H-Huang @wconstab

tianyu-l added the question label Oct 4, 2024
@H-Huang
Member

H-Huang commented Oct 4, 2024

Hi @bhuvan777, right, we currently only support splitting at the Transformer block level. Splitting at a finer granularity is possible, but it would require more code and make the pipeline parallel code a bit more complex. If we split at the block level, it is simpler to determine which input/output activations need to be passed in send/recv during pipeline parallelism.
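To illustrate the point about activations, here is a small sketch that lists which intermediate values would have to be sent to the next stage for different cut positions. It assumes a generic pre-norm Transformer block with residual connections; the op and value names (`BLOCK_OPS`, `wq`, `sdpa`, and so on) are illustrative and do not correspond to the project's module names.

```python
# Toy pre-norm Transformer block as an ordered list of (op, inputs, output).
# This is an illustrative graph, not the project's model definition.
BLOCK_OPS = [
    ("attn_norm", ["x"], "xn"),
    ("wq", ["xn"], "q"),
    ("wk", ["xn"], "k"),
    ("wv", ["xn"], "v"),
    ("sdpa", ["q", "k", "v"], "a"),
    ("wo", ["a"], "o"),
    ("add1", ["x", "o"], "h"),      # first residual connection
    ("ffn_norm", ["h"], "hn"),
    ("ffn", ["hn"], "f"),
    ("add2", ["h", "f"], "y"),      # second residual connection, block output
]


def live_after(ops, cut, block_input="x", block_output="y"):
    """Values available right after op `cut` that are still needed later."""
    idx = [name for name, _, _ in ops].index(cut)
    available = {block_input} | {out for _, _, out in ops[: idx + 1]}
    needed = {inp for _, inputs, _ in ops[idx + 1:] for inp in inputs}
    needed.add(block_output)  # the block output is consumed by the next block
    return sorted(available & needed)


for cut in ["wq", "sdpa", "add1", "add2"]:
    print(cut, live_after(BLOCK_OPS, cut))
# wq ['q', 'x', 'xn']
# sdpa ['a', 'x']
# add1 ['h']
# add2 ['y']
```

In this toy graph, cutting at the block boundary (`add2`) leaves a single tensor to send, while cutting inside the attention sub-layer leaves several, including the residual stream.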

Curious, what is your particular use case for splitting up a block?
