
Measure times #439

Closed

miqueljubert wants to merge 1 commit.
Conversation

miqueljubert

Summary:
Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored on the state; I'm not sure whether the unit would be a better place.

They can be accessed at the start of the next iteration: since we want to capture the time of the whole iteration, including callbacks, the values cannot be read from within the callbacks of the same iteration.
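A minimal sketch of how a callback could consume these counters at the start of the following iteration. The hook follows torchtnt's callback style, but the `last_iteration_time_s` and `last_data_wait_time_s` attributes are hypothetical placeholders for wherever the counters end up living:

```python
# Hypothetical sketch: read the *previous* iteration's timings at the start
# of the next one. The timing attribute names below are placeholders, not
# the actual torchtnt API.
from torchtnt.framework.callback import Callback


class IterationTimeLogger(Callback):
    def on_train_step_start(self, state, unit) -> None:
        iter_time = getattr(state, "last_iteration_time_s", None)
        data_time = getattr(state, "last_data_wait_time_s", None)
        if iter_time is not None and data_time is not None:
            # By now the full previous iteration, callbacks included, is done.
            print(f"prev iter: {iter_time:.4f}s (data wait: {data_time:.4f}s)")
```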

Another possibility would be to keep a list of all the iteration times and data times; I'm not sure whether that would be preferable.

Differential Revision: D46853794

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Jul 17, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored on the state; I'm not sure whether the unit would be a better place.

They can be accessed at the start of the next iteration: since we want to capture the time of the whole iteration, including callbacks, the values cannot be read from within the callbacks of the same iteration.

Another possibility would be to keep a list of all the iteration times and data times; I'm not sure whether that would be preferable.

Reviewed By: daniellepintz

Differential Revision: D46853794

fbshipit-source-id: 439a9ae17bf867f5722cd86f21cd42599966bd9c
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 7, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored on the state; I'm not sure whether the unit would be a better place.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, all of them are stored.
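A minimal sketch of what a Stateful container for the timings could look like, mirroring how progress is handled; the class and its attribute names are hypothetical, but implementing `state_dict`/`load_state_dict` is what lets checkpointing save and restore the values:

```python
# Hypothetical sketch of a Stateful timing container, analogous to Progress.
# state_dict/load_state_dict make the recorded values checkpointable.
from typing import Any, Dict, List


class TimingRecorder:
    def __init__(self) -> None:
        self.iteration_times: List[float] = []
        self.data_wait_times: List[float] = []

    def record(self, iteration_time_s: float, data_wait_time_s: float) -> None:
        self.iteration_times.append(iteration_time_s)
        self.data_wait_times.append(data_wait_time_s)

    def state_dict(self) -> Dict[str, Any]:
        return {
            "iteration_times": list(self.iteration_times),
            "data_wait_times": list(self.data_wait_times),
        }

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.iteration_times = list(state_dict["iteration_times"])
        self.data_wait_times = list(state_dict["data_wait_times"])
```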

Reviewed By: daniellepintz

Differential Revision: D46853794

fbshipit-source-id: 888c9559c48961eccb1f9a6ecbd976cecdb90100
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 7, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored on the state; I'm not sure whether the unit would be a better place.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, all of them are stored.

Differential Revision: D46853794

fbshipit-source-id: 87d0cd58963c6cfd15c2e497801ed993668b6d29
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 8, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored and recorded using a timer that does not synchronise with CUDA and that only records these two values.

- iteration time is recorded in the training loop for all jobs.
- data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable, as in the sketch below.
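A minimal sketch of that user-side instrumentation, assuming nothing beyond the standard library: `time.perf_counter` is a plain monotonic CPU clock, so no CUDA synchronisation is involved. The wrapper class and its attribute names are illustrative, not part of the tnt API.

```python
# Hypothetical sketch: wrap a data iterable and accumulate the time spent
# blocked on next(). perf_counter is a monotonic CPU clock, so this never
# synchronises with CUDA. Names here are illustrative, not tnt API.
import time


class TimedIterable:
    def __init__(self, iterable):
        self._iterable = iterable
        self.data_wait_times = []  # seconds blocked on data, per batch

    def __iter__(self):
        it = iter(self._iterable)
        while True:
            start = time.perf_counter()
            try:
                batch = next(it)
            except StopIteration:
                return
            self.data_wait_times.append(time.perf_counter() - start)
            yield batch
```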

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, all of them are stored.

Differential Revision: D46853794

fbshipit-source-id: 64a4f55f8fb0985831c0cd94e93eb7be67a30faa
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

1 similar comment

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 18, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored and recorded using a timer that does not synchronise with CUDA and that only records these two values.

- iteration time is recorded in the training loop for all jobs.
- data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, the last LOWER_BOUND values are stored. I hardcoded this in the torchtnt state to avoid bloating the parameters, since intuitively the last 1e4 values per timer action should be enough for monitoring purposes.
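A minimal sketch of the bounded storage described above: a `deque` with `maxlen` silently drops the oldest measurement once the cap is reached, so memory stays constant however long training runs. The constant name is made up; only the 1e4 figure comes from the description:

```python
# Sketch: keep only the last N measurements per timer action.
# The constant name is hypothetical; 10_000 mirrors the 1e4 bound above.
from collections import defaultdict, deque

MAX_RECORDED_VALUES = 10_000

recorded_durations = defaultdict(lambda: deque(maxlen=MAX_RECORDED_VALUES))

# Oldest entries are dropped automatically once the cap is hit.
recorded_durations["train_iteration_time"].append(0.0123)
recorded_durations["data_wait_time"].append(0.0041)
```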

Reviewed By: daniellepintz, ananthsub

Differential Revision: D46853794

fbshipit-source-id: 6d5c5d4b2ab3372c606f60a4c12f2a5e0092e867
miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 21, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored and recorded using a timer that does not synchronise with CUDA and that only records these two values.

- iteration time is recorded in the training loop for all jobs.
- data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, the last LOWER_BOUND values are stored. I hardcoded this in the torchtnt state to avoid bloating the parameters, since intuitively the last 1e4 values per timer action should be enough for monitoring purposes.

Reviewed By: daniellepintz, ananthsub

Differential Revision: D46853794

fbshipit-source-id: 71178021e60a78c993474ae3bdfeb208aa1e4a88
@codecov bot commented Aug 21, 2023

Codecov Report

Merging #439 (c836b76) into master (a690136) will increase coverage by 0.07%.
The diff coverage is 100.00%.

❗ Current head c836b76 differs from pull request most recent head 1fab468. Consider uploading reports for the commit 1fab468 to get more accurate results

@@            Coverage Diff             @@
##           master     #439      +/-   ##
==========================================
+ Coverage   86.95%   87.02%   +0.07%     
==========================================
  Files         106      106              
  Lines        8407     8455      +48     
==========================================
+ Hits         7310     7358      +48     
  Misses       1097     1097              
Files Changed                      Coverage Δ
tests/framework/test_train.py     100.00% <100.00%> (ø)
tests/utils/test_timer.py          92.64% <100.00%> (+0.64%) ⬆️
torchtnt/framework/auto_unit.py    79.74% <100.00%> (+0.06%) ⬆️
torchtnt/framework/state.py       100.00% <100.00%> (ø)
torchtnt/framework/train.py        97.75% <100.00%> (+0.02%) ⬆️
torchtnt/utils/timer.py            94.11% <100.00%> (+0.61%) ⬆️

... and 26 files with indirect coverage changes


@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 23, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored and recorded using a timer that does not synchronise with CUDA and that only records these two values.

- iteration time is recorded in the training loop for all jobs.
- data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, the last LOWER_BOUND values are stored. I hardcoded this in the torchtnt state to avoid bloating the parameters, since intuitively the last 1e4 values per timer action should be enough for monitoring purposes.

Reviewed By: daniellepintz, ananthsub

Differential Revision: D46853794

fbshipit-source-id: e0c74d7147a8da57f51ef8c9bd7bbb6ec38858c9
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794

miqueljubert pushed a commit to miqueljubert/tnt that referenced this pull request Aug 23, 2023
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored and recorded using a timer that does not synchronise with CUDA and that only records these two values.

- iteration time is recorded in the training loop for all jobs.
- data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, the last LOWER_BOUND values are stored. I hardcoded this in the torchtnt state to avoid bloating the parameters, since intuitively the last 1e4 values per timer action should be enough for monitoring purposes.

Reviewed By: daniellepintz, ananthsub

Differential Revision: D46853794

fbshipit-source-id: dafc454fb790d91debfd3ff452c49c0ff6f84135
Summary:
Pull Request resolved: pytorch#439

Add two counters containing the iteration time and the time blocked on data for the last iteration. These are stored and recorded using a timer that does not synchronise with CUDA and that only records these two values.

- iteration time is recorded in the training loop for all jobs.
- data time is recorded if the training loop does the data fetching; otherwise the user needs to instrument the logic that reads the data from the iterable.

I had to rework the approach slightly, since the state does not seem like the right place. With these changes it is more natural to mirror progress and store the info in a Stateful. This has the additional advantage that all values can be saved and restored with checkpoints.

Also, rather than only the last value, the last LOWER_BOUND values are stored. I hardcoded this in the torchtnt state to avoid bloating the parameters, since intuitively the last 1e4 values per timer action should be enough for monitoring purposes.

Reviewed By: daniellepintz, ananthsub

Differential Revision: D46853794

fbshipit-source-id: ac2e9f992f21ac66625f89d1c75d228397740e1f
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D46853794
