
feat/refactor partition strategy #13

Merged

Conversation

huangting4201
Collaborator

Motivation

Refactor the partition strategy for data (sequence), weights, gradients, and optimizer states (os).

1. size: int, the size of weight parallel.
2. overlap: bool, enable/disable all_gather/reduce_scatter communication overlap, defaults to False.
3. memory_pool: bool, enable/disable memory pool, defaults to False.
"""
parallel = dict(
    zero1=dict(size=8, fsdp=False),
Contributor

Since we are already using our own wp, should we hide the fsdp option and leave it out of the example config?

Collaborator Author

That works.

Collaborator Author

Updated in 62a665d.

    pipeline=dict(size=1, interleaved_overlap=True),
    sequence_parallel=False,
    weight=dict(size=1, overlap=True, memory_pool=True),
)
Contributor

Should we add an example with WP enabled?

Contributor

And a test case as well.

Collaborator Author

The current config 7B_sft.py already enables wp; we can add a test case for it.

Collaborator Author

Oh, Peng means adding an example where wp size is greater than 1.

Collaborator Author

Added in 62a665d.
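
For illustration, a config with weight parallel size greater than 1 might look like the sketch below; it reuses the keys shown in the hunk above (size, overlap, memory_pool) and drops the fsdp key per the discussion. The weight size of 4 is only an assumed example value, not taken from the PR.

# Illustrative sketch only; the weight size of 4 is an assumed example value.
parallel = dict(
    zero1=dict(size=8),  # fsdp key omitted, per the review discussion above
    pipeline=dict(size=1, interleaved_overlap=True),
    sequence_parallel=False,
    weight=dict(size=4, overlap=True, memory_pool=True),  # weight parallel across 4 ranks
)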

module_shapes: Dict[str, torch.Size] = None


class MemoryPool:
Contributor

@huangting4201 @mwiacx After upgrading to the latest PyTorch (assuming the version that uses the VMM API goes from an RC to an official release), could the memory pool be dropped?
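
For context, a weight-parallel memory pool of this kind typically pre-allocates all-gather buffers keyed by module shape and reuses them across layers and steps. The sketch below only illustrates that general idea under stated assumptions; it is not the MemoryPool implemented in this PR.

from typing import Dict

import torch


class MemoryPoolSketch:
    """Minimal sketch (assumption, not the PR's MemoryPool): keep one
    pre-allocated buffer per named module shape and hand it out on request,
    instead of allocating a fresh all-gather output tensor every step."""

    def __init__(self, module_shapes: Dict[str, torch.Size], dtype=torch.bfloat16, device="cpu"):
        # One buffer per named module shape; callers all-gather weights into it.
        self._buffers = {
            name: torch.empty(shape, dtype=dtype, device=device)
            for name, shape in module_shapes.items()
        }

    def get_buffer(self, name: str) -> torch.Tensor:
        # Reuse the cached buffer rather than allocating a new one.
        return self._buffers[name]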

sunpengsdu merged commit ae5a7ee into InternLM:develop on Feb 1, 2024
13 checks passed
    expert_parallel_size (int): Size of expert parallel.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.rank_num_per_group = self.tensor_parallel_size * self.pipeline_parallel_size
        self.num_group = self.world_size // self.rank_num_per_group
        self.num_tensor_parallel_group = self.world_size // self.tensor_parallel_size
Contributor

QiaolingChen00 · Feb 4, 2024

Why don't we need to account for pp here anymore?
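
For reference, the group-size arithmetic in the hunk above works out as follows; the concrete values here (world_size = 16, tp = 2, pp = 2) are assumed example numbers, not taken from the PR.

# Assumed example values, plugged into the __init__ arithmetic shown above.
world_size = 16
tensor_parallel_size = 2
pipeline_parallel_size = 2

# pp is still folded into rank_num_per_group via the tp * pp product.
rank_num_per_group = tensor_parallel_size * pipeline_parallel_size  # 4
num_group = world_size // rank_num_per_group                        # 4
num_tensor_parallel_group = world_size // tensor_parallel_size      # 8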


    def _get_expert_parallel_ranks(self):
        """
        Create expert and data parallel groups.
        Example: world_size = 8, model_parallel_size = 2, expert_parallel_size = 2
        Example: world_size = 8, tensor_parallel_size = 2, expert_parallel_size = 2
Contributor

What layout does EP follow here?
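
As a point of reference only: for the docstring's example (world_size = 8, tensor_parallel_size = 2, expert_parallel_size = 2), one common layout splits each data-parallel group into expert-parallel subgroups, as sketched below. This is an assumed, DeepSpeed-style convention for illustration; the exact grouping this PR implements may differ.

# Assumed layout for world_size = 8, tp = 2, ep = 2 (illustration only).
world_size, tp, ep = 8, 2, 2
dp = world_size // tp  # 4 data-parallel ranks per tensor-parallel index

# Ranks sharing a tensor-parallel index form one data-parallel group...
data_parallel_groups = [[i + j * tp for j in range(dp)] for i in range(tp)]
# -> [[0, 2, 4, 6], [1, 3, 5, 7]]

# ...which is then chunked into expert-parallel groups of size ep.
expert_parallel_groups = [
    group[k:k + ep] for group in data_parallel_groups for k in range(0, dp, ep)
]
# -> [[0, 2], [4, 6], [1, 3], [5, 7]]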

)


class ISPLinear(ColumnParallelLinear):
Contributor

Is the dedicated Linear for ISP there so that the communicator can be attached?
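
For context, the usual reason to give ISP its own Linear subclass is so that each layer can carry a handle to the weight-parallel communicator, which then all-gathers the sharded weight before forward and frees it afterwards. The snippet below is only a rough sketch of that idea (the class and method names are hypothetical), not the ISPLinear in this PR.

import torch.nn as nn


class ISPLinearSketch(nn.Linear):
    """Rough sketch (assumption): a linear layer that exposes a hook for a
    weight-parallel communicator, so the communicator can find these layers
    and schedule all_gather / reduce_scatter around their forward/backward."""

    def register_communicator(self, communicator) -> None:
        # Hypothetical hook; the real wiring in the PR may differ.
        self._isp_communicator = communicator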
