Measure times #439
Conversation
This pull request was exported from Phabricator. Differential Revision: D46853794 |
Force-pushed from ea251ca to 04b08fb
Summary: Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored on the state (the unit might be a better place). They can be accessed at the start of the next iteration; this is because we want to capture the time of the whole iteration, including callbacks, so the values cannot be read from within callbacks. Another possibility would be to keep a list of all the iteration and data times; it is unclear whether that would be preferable.

Reviewed By: daniellepintz
Differential Revision: D46853794
fbshipit-source-id: 439a9ae17bf867f5722cd86f21cd42599966bd9c
Force-pushed from 04b08fb to 933cb21
Summary: Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. The approach was reworked slightly, since the state did not seem the right place to store them. With the changes, it is more natural to follow the pattern used for progress and store the information in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints. Also, rather than only the last value, all values are stored.

Reviewed By: daniellepintz
Differential Revision: D46853794
fbshipit-source-id: 888c9559c48961eccb1f9a6ecbd976cecdb90100
Force-pushed from 933cb21 to 2087ed2
Summary: Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are recorded using a timer that does not synchronise with CUDA and that records only these two values:
- Iteration time is recorded in the training loop for all jobs.
- Data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

The approach was reworked slightly, since the state did not seem the right place to store them. With the changes, it is more natural to follow the pattern used for progress and store the information in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints. Also, rather than only the last value, all values are stored.

Differential Revision: D46853794
fbshipit-source-id: 64a4f55f8fb0985831c0cd94e93eb7be67a30faa
Force-pushed from 2087ed2 to 90b75d4
Force-pushed from 90b75d4 to 99aa3b0
Summary: Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are recorded using a timer that does not synchronise with CUDA and that records only these two values:
- Iteration time is recorded in the training loop for all jobs.
- Data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

The approach was reworked slightly, since the state did not seem the right place to store them. With the changes, it is more natural to follow the pattern used for progress and store the information in a Stateful, which has the additional advantage that all values can be saved and restored with checkpoints. Also, rather than only the last value, the last LOWER_BOUND values are stored. This bound is hardcoded in the torchtnt state to avoid bloating the parameters; intuitively, the last 1e4 values for each timer action should be enough for monitoring purposes.

Reviewed By: daniellepintz, ananthsub
Differential Revision: D46853794
fbshipit-source-id: 6d5c5d4b2ab3372c606f60a4c12f2a5e0092e867
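The bounded, non-synchronizing, checkpointable timer described in this summary could be sketched roughly as follows. All names here (`BoundedTimer`, `recorded_durations`, `MAX_RECORDED_VALUES`) are illustrative assumptions, not the actual torchtnt implementation:

```python
import time
from collections import deque
from typing import Any, Deque, Dict

# Hypothetical stand-in for the summary's hardcoded LOWER_BOUND (~1e4 values).
MAX_RECORDED_VALUES = 10_000


class BoundedTimer:
    """Records wall-clock intervals with no CUDA synchronization,
    keeping only the most recent MAX_RECORDED_VALUES per action."""

    def __init__(self) -> None:
        self.recorded_durations: Dict[str, Deque[float]] = {}
        self._start_times: Dict[str, float] = {}

    def start(self, action: str) -> None:
        # perf_counter is a monotonic host clock; no device sync happens here.
        self._start_times[action] = time.perf_counter()

    def stop(self, action: str) -> None:
        elapsed = time.perf_counter() - self._start_times.pop(action)
        bucket = self.recorded_durations.setdefault(
            action, deque(maxlen=MAX_RECORDED_VALUES)
        )
        bucket.append(elapsed)  # oldest value is evicted once the bound is hit

    # Stateful protocol: recorded values travel with checkpoints.
    def state_dict(self) -> Dict[str, Any]:
        return {k: list(v) for k, v in self.recorded_durations.items()}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.recorded_durations = {
            k: deque(v, maxlen=MAX_RECORDED_VALUES) for k, v in state.items()
        }
```

The training loop would wrap each step in `start("train_iteration")` / `stop("train_iteration")`, and the data-fetch path in a `"data_wait_time"` pair, matching the two actions described above.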
Force-pushed from 99aa3b0 to c836b76
Codecov Report
@@            Coverage Diff             @@
##           master     #439      +/-   ##
==========================================
+ Coverage   86.95%   87.02%   +0.07%
==========================================
  Files         106      106
  Lines        8407     8455      +48
==========================================
+ Hits         7310     7358      +48
  Misses       1097     1097
... and 26 files with indirect coverage changes
Force-pushed from c836b76 to e8c04dd
Force-pushed from e8c04dd to 64da29c
Force-pushed from 64da29c to 1fab468
Summary:
Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored on the state (the unit might be a better place).
They can be accessed at the start of the next iteration; this is because we want to capture the time of the whole iteration, including callbacks, so the values cannot be read from within callbacks.
Another possibility would be to keep a list of all the iteration and data times; it is unclear whether that would be preferable.
Differential Revision: D46853794