[DataPipe] incremental shuffle #404

Open
wants to merge 2 commits into base: main

tmbdev (Contributor) commented May 13, 2022

This PR adds a filter that performs inline shuffling. Unlike the default shuffle, the incremental shuffle permits fast startup by having both a minimum and a maximum buffer size. As soon as more than the minimum number of samples have been read into the buffer, samples are passed on to the next stage.

This filter is commonly used with WebDataset to shuffle training samples inline.
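
For readers who want a concrete picture of the idea, here is a minimal sketch in plain Python. It is not the code in this PR: the function name `incremental_shuffle`, the parameters `min_size`, `max_size`, and `seed`, and the choice to read up to two samples per emitted sample while the buffer is growing are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def _pop_random(buf: List[T], rng: random.Random) -> T:
    """Remove and return a uniformly chosen element in O(1)."""
    i = rng.randrange(len(buf))
    buf[i], buf[-1] = buf[-1], buf[i]
    return buf.pop()


def incremental_shuffle(
    source: Iterable[T],
    min_size: int = 100,
    max_size: int = 1000,
    seed: Optional[int] = None,
) -> Iterator[T]:
    """Hypothetical sketch of a shuffle buffer with a minimum and maximum size."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    # Keep the minimum strictly below the maximum so the buffer never
    # holds more than max_size samples.
    min_size = min(min_size, max_size - 1)
    rng = random.Random(seed)
    it = iter(source)
    buf: List[T] = []
    for sample in it:
        buf.append(sample)
        # While below the maximum, read one extra sample so the buffer
        # keeps growing even after output has started.
        if len(buf) < max_size:
            try:
                buf.append(next(it))
            except StopIteration:
                pass
        # Start emitting once more than min_size samples are buffered.
        if len(buf) > min_size:
            yield _pop_random(buf, rng)
    # Source exhausted: drain the remaining samples in random order.
    while buf:
        yield _pop_random(buf, rng)
```

With `min_size=4` the first sample comes out after only six reads, while the buffer continues to fill toward `max_size` in the background of normal iteration. Early outputs are therefore less thoroughly mixed than later ones, which is the trade-off for fast startup.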

@facebook-github-bot added the CLA Signed label May 13, 2022
msaroufim (Member) commented May 14, 2022

@msaroufim msaroufim requested review from ejguan and msaroufim May 17, 2022 00:09
@VitalyFedyunin VitalyFedyunin changed the title incremental shuffle [DataPipe] incremental shuffle May 19, 2022
@VitalyFedyunin VitalyFedyunin self-requested a review May 19, 2022 20:05
Labels
CLA Signed
3 participants