[Performance] Fewer syncs during calls to `to` #819
Open: vmoens wants to merge 2 commits into main from less-sync
facebook-github-bot added the CLA Signed label on Jun 18, 2024
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 27.4520μs | 16.9719μs | 58.9210 KOps/s | 60.1777 KOps/s | |
test_plain_set_stack_nested | 47.1480μs | 17.9692μs | 55.6508 KOps/s | 58.2602 KOps/s | |
test_plain_set_nested_inplace | 48.0300μs | 20.0397μs | 49.9008 KOps/s | 52.1245 KOps/s | |
test_plain_set_stack_nested_inplace | 45.1840μs | 20.0824μs | 49.7948 KOps/s | 51.4836 KOps/s | |
test_items | 21.4900μs | 2.6558μs | 376.5401 KOps/s | 370.7209 KOps/s | |
test_items_nested | 1.2875ms | 0.2697ms | 3.7075 KOps/s | 3.7934 KOps/s | |
test_items_nested_locked | 0.4232ms | 0.2713ms | 3.6857 KOps/s | 3.7998 KOps/s | |
test_items_nested_leaf | 0.1498ms | 77.5556μs | 12.8940 KOps/s | 13.0905 KOps/s | |
test_items_stack_nested | 0.3307ms | 0.2733ms | 3.6597 KOps/s | 3.7117 KOps/s | |
test_items_stack_nested_leaf | 0.1516ms | 79.5025μs | 12.5782 KOps/s | 12.6924 KOps/s | |
test_items_stack_nested_locked | 0.4243ms | 0.2708ms | 3.6926 KOps/s | 3.7499 KOps/s | |
test_keys | 49.6840μs | 3.7922μs | 263.6985 KOps/s | 257.4315 KOps/s | |
test_keys_nested | 0.2848ms | 0.1413ms | 7.0784 KOps/s | 7.2194 KOps/s | |
test_keys_nested_locked | 0.7179ms | 0.1459ms | 6.8548 KOps/s | 7.0620 KOps/s | |
test_keys_nested_leaf | 0.2355ms | 0.1205ms | 8.2986 KOps/s | 8.6306 KOps/s | |
test_keys_stack_nested | 0.2034ms | 0.1399ms | 7.1471 KOps/s | 7.2552 KOps/s | |
test_keys_stack_nested_leaf | 0.2410ms | 0.1202ms | 8.3214 KOps/s | 8.6069 KOps/s | |
test_keys_stack_nested_locked | 0.2553ms | 0.1442ms | 6.9342 KOps/s | 7.0825 KOps/s | |
test_values | 8.7224μs | 1.1674μs | 856.5931 KOps/s | 876.7103 KOps/s | |
test_values_nested | 94.7880μs | 51.2814μs | 19.5002 KOps/s | 19.4146 KOps/s | |
test_values_nested_locked | 0.1052ms | 51.7041μs | 19.3408 KOps/s | 19.5007 KOps/s | |
test_values_nested_leaf | 93.4070μs | 46.4710μs | 21.5188 KOps/s | 21.4449 KOps/s | |
test_values_stack_nested | 94.8780μs | 52.4486μs | 19.0663 KOps/s | 19.1127 KOps/s | |
test_values_stack_nested_leaf | 98.1440μs | 46.6711μs | 21.4265 KOps/s | 21.2884 KOps/s | |
test_values_stack_nested_locked | 0.1074ms | 52.1143μs | 19.1886 KOps/s | 19.2780 KOps/s | |
test_membership | 30.3770μs | 1.3611μs | 734.6876 KOps/s | 754.4887 KOps/s | |
test_membership_nested | 32.5510μs | 3.4793μs | 287.4120 KOps/s | 294.1604 KOps/s | |
test_membership_nested_leaf | 23.0430μs | 3.5253μs | 283.6643 KOps/s | 281.6909 KOps/s | |
test_membership_stacked_nested | 31.8200μs | 3.4933μs | 286.2582 KOps/s | 296.6883 KOps/s | |
test_membership_stacked_nested_leaf | 41.0370μs | 3.4425μs | 290.4903 KOps/s | 295.0962 KOps/s | |
test_membership_nested_last | 45.0620μs | 4.2040μs | 237.8698 KOps/s | 241.1007 KOps/s | |
test_membership_nested_leaf_last | 47.5250μs | 4.2421μs | 235.7331 KOps/s | 237.8992 KOps/s | |
test_membership_stacked_nested_last | 40.3450μs | 4.8098μs | 207.9091 KOps/s | 187.4677 KOps/s | |
test_membership_stacked_nested_leaf_last | 44.8740μs | 4.8588μs | 205.8114 KOps/s | 189.6819 KOps/s | |
test_nested_getleaf | 47.2690μs | 11.3476μs | 88.1244 KOps/s | 95.5364 KOps/s | |
test_nested_get | 39.2930μs | 10.7850μs | 92.7211 KOps/s | 99.5649 KOps/s | |
test_stacked_getleaf | 31.6500μs | 11.3625μs | 88.0085 KOps/s | 97.0107 KOps/s | |
test_stacked_get | 50.3750μs | 10.7035μs | 93.4277 KOps/s | 101.9007 KOps/s | |
test_nested_getitemleaf | 43.0710μs | 11.7907μs | 84.8124 KOps/s | 89.7102 KOps/s | |
test_nested_getitem | 34.9760μs | 11.0513μs | 90.4868 KOps/s | 97.1827 KOps/s | |
test_stacked_getitemleaf | 41.9290μs | 11.7690μs | 84.9689 KOps/s | 90.7619 KOps/s | |
test_stacked_getitem | 36.9290μs | 11.0143μs | 90.7911 KOps/s | 98.1992 KOps/s | |
test_lock_nested | 0.7709ms | 0.3437ms | 2.9096 KOps/s | 2.9384 KOps/s | |
test_lock_stack_nested | 0.4281ms | 0.3105ms | 3.2207 KOps/s | 3.2712 KOps/s | |
test_unlock_nested | 0.7538ms | 0.3451ms | 2.8974 KOps/s | 2.9018 KOps/s | |
test_unlock_stack_nested | 0.3772ms | 0.3177ms | 3.1479 KOps/s | 3.2020 KOps/s | |
test_flatten_speed | 0.5498ms | 95.8806μs | 10.4296 KOps/s | 10.3721 KOps/s | |
test_unflatten_speed | 0.6185ms | 0.4246ms | 2.3552 KOps/s | 2.4576 KOps/s | |
test_common_ops | 3.3784ms | 0.7144ms | 1.3999 KOps/s | 1.4228 KOps/s | |
test_creation | 15.5390μs | 1.8912μs | 528.7603 KOps/s | 525.6850 KOps/s | |
test_creation_empty | 34.7660μs | 10.5216μs | 95.0427 KOps/s | 96.8261 KOps/s | |
test_creation_nested_1 | 57.8490μs | 13.3237μs | 75.0544 KOps/s | 76.5377 KOps/s | |
test_creation_nested_2 | 37.5710μs | 16.5537μs | 60.4093 KOps/s | 60.7456 KOps/s | |
test_clone | 97.1220μs | 13.5201μs | 73.9637 KOps/s | 74.5563 KOps/s | |
test_getitem[int] | 35.1160μs | 11.4210μs | 87.5582 KOps/s | 85.2826 KOps/s | |
test_getitem[slice_int] | 64.0800μs | 21.9760μs | 45.5042 KOps/s | 42.4712 KOps/s | |
test_getitem[range] | 78.8080μs | 58.8326μs | 16.9974 KOps/s | 16.7296 KOps/s | |
test_getitem[tuple] | 48.0610μs | 18.5085μs | 54.0292 KOps/s | 51.8040 KOps/s | |
test_getitem[list] | 0.1202ms | 41.0517μs | 24.3595 KOps/s | 23.8675 KOps/s | |
test_setitem_dim[int] | 64.5610μs | 34.4546μs | 29.0237 KOps/s | 27.0361 KOps/s | |
test_setitem_dim[slice_int] | 99.0560μs | 59.8697μs | 16.7029 KOps/s | 15.3170 KOps/s | |
test_setitem_dim[range] | 0.1288ms | 83.0458μs | 12.0415 KOps/s | 11.5278 KOps/s | |
test_setitem_dim[tuple] | 91.8320μs | 49.4098μs | 20.2389 KOps/s | 19.1029 KOps/s | |
test_setitem | 58.9410μs | 20.0971μs | 49.7584 KOps/s | 49.6156 KOps/s | |
test_set | 62.3170μs | 19.7327μs | 50.6774 KOps/s | 50.9618 KOps/s | |
test_set_shared | 1.1408ms | 0.1413ms | 7.0750 KOps/s | 6.9247 KOps/s | |
test_update | 0.1057ms | 21.6336μs | 46.2244 KOps/s | 47.2356 KOps/s | |
test_update_nested | 67.6770μs | 30.3228μs | 32.9785 KOps/s | 33.2471 KOps/s | |
test_update__nested | 55.8050μs | 26.1034μs | 38.3091 KOps/s | 39.7347 KOps/s | |
test_set_nested | 54.1620μs | 22.2873μs | 44.8685 KOps/s | 46.3313 KOps/s | |
test_set_nested_new | 61.1050μs | 26.3505μs | 37.9500 KOps/s | 38.7554 KOps/s | |
test_select | 0.9548ms | 41.9560μs | 23.8345 KOps/s | 23.4570 KOps/s | |
test_select_nested | 0.1221ms | 60.2343μs | 16.6018 KOps/s | 16.3883 KOps/s | |
test_exclude_nested | 0.1954ms | 0.1202ms | 8.3216 KOps/s | 8.1897 KOps/s | |
test_empty[True] | 0.4738ms | 0.3966ms | 2.5212 KOps/s | 2.5124 KOps/s | |
test_empty[False] | 7.0732μs | 1.1717μs | 853.4628 KOps/s | 871.1122 KOps/s | |
test_unbind_speed | 0.4084ms | 0.2539ms | 3.9392 KOps/s | 3.9213 KOps/s | |
test_unbind_speed_stack0 | 0.3485ms | 0.2527ms | 3.9567 KOps/s | 4.0123 KOps/s | |
test_unbind_speed_stack1 | 68.6579ms | 0.7236ms | 1.3819 KOps/s | 1.3934 KOps/s | |
test_split | 63.6140ms | 1.5800ms | 632.9043 Ops/s | 619.8271 Ops/s | |
test_chunk | 63.9552ms | 1.5890ms | 629.3255 Ops/s | 620.1725 Ops/s | |
test_creation[device0] | 0.1530ms | 83.5224μs | 11.9728 KOps/s | 11.7932 KOps/s | |
test_creation_from_tensor | 0.2163ms | 84.0522μs | 11.8974 KOps/s | 11.6311 KOps/s | |
test_add_one[memmap_tensor0] | 65.8630μs | 5.3638μs | 186.4352 KOps/s | 187.9410 KOps/s | |
test_contiguous[memmap_tensor0] | 6.7820μs | 0.6369μs | 1.5701 MOps/s | 1.6052 MOps/s | |
test_stack[memmap_tensor0] | 22.8120μs | 3.6477μs | 274.1436 KOps/s | 281.5431 KOps/s | |
test_memmaptd_index | 0.9910ms | 0.2554ms | 3.9160 KOps/s | 3.9734 KOps/s | |
test_memmaptd_index_astensor | 0.7441ms | 0.3288ms | 3.0415 KOps/s | 3.0633 KOps/s | |
test_memmaptd_index_op | 1.1634ms | 0.6124ms | 1.6330 KOps/s | 1.6728 KOps/s | |
test_serialize_model | 0.1723s | 0.1135s | 8.8127 Ops/s | 8.6590 Ops/s | |
test_serialize_model_pickle | 0.4471s | 0.3796s | 2.6344 Ops/s | 2.6218 Ops/s | |
test_serialize_weights | 0.1722s | 0.1115s | 8.9706 Ops/s | 9.5646 Ops/s | |
test_serialize_weights_returnearly | 0.1833s | 0.1301s | 7.6849 Ops/s | 7.3784 Ops/s | |
test_serialize_weights_pickle | 0.6937s | 0.4857s | 2.0587 Ops/s | 2.3532 Ops/s | |
test_serialize_weights_filesystem | 0.1598s | 0.1011s | 9.8893 Ops/s | 9.9802 Ops/s | |
test_serialize_model_filesystem | 94.3306ms | 92.0260ms | 10.8665 Ops/s | 10.7636 Ops/s | |
test_reshape_pytree | 62.9380μs | 25.2270μs | 39.6401 KOps/s | 39.7371 KOps/s | |
test_reshape_td | 95.4290μs | 34.2551μs | 29.1927 KOps/s | 28.9931 KOps/s | |
test_view_pytree | 63.5390μs | 25.4574μs | 39.2813 KOps/s | 40.0870 KOps/s | |
test_view_td | 82.5960μs | 38.1396μs | 26.2195 KOps/s | 25.7785 KOps/s | |
test_unbind_pytree | 66.2050μs | 28.9829μs | 34.5031 KOps/s | 34.4760 KOps/s | |
test_unbind_td | 0.3982ms | 37.9507μs | 26.3500 KOps/s | 26.3833 KOps/s | |
test_split_pytree | 66.5150μs | 29.5963μs | 33.7880 KOps/s | 34.8065 KOps/s | |
test_split_td | 0.1227ms | 40.7699μs | 24.5279 KOps/s | 24.2802 KOps/s | |
test_add_pytree | 0.1099ms | 37.1185μs | 26.9407 KOps/s | 28.8627 KOps/s | |
test_add_td | 0.1022ms | 56.6162μs | 17.6628 KOps/s | 17.7298 KOps/s | |
test_distributed | 0.2282ms | 0.1002ms | 9.9763 KOps/s | 9.8006 KOps/s | |
test_tdmodule | 29.8860μs | 18.0783μs | 55.3149 KOps/s | 50.2899 KOps/s | |
test_tdmodule_dispatch | 65.8330μs | 35.9529μs | 27.8142 KOps/s | 28.6405 KOps/s | |
test_tdseq | 44.3530μs | 21.4608μs | 46.5965 KOps/s | 47.7909 KOps/s | |
test_tdseq_dispatch | 64.1310μs | 41.3412μs | 24.1890 KOps/s | 21.2054 KOps/s | |
test_instantiation_functorch | 2.7877ms | 1.3253ms | 754.5553 Ops/s | 770.1453 Ops/s | |
test_instantiation_td | 70.0751ms | 1.1044ms | 905.4358 Ops/s | 989.6496 Ops/s | |
test_exec_functorch | 0.2249ms | 0.1634ms | 6.1188 KOps/s | 6.1792 KOps/s | |
test_exec_functional_call | 0.3177ms | 0.1503ms | 6.6524 KOps/s | 6.6214 KOps/s | |
test_exec_td | 0.2556ms | 0.1466ms | 6.8201 KOps/s | 6.8317 KOps/s | |
test_exec_td_decorator | 0.3500ms | 0.2243ms | 4.4581 KOps/s | 4.5178 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.9067ms | 0.4888ms | 2.0460 KOps/s | 2.0489 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.6940ms | 0.4849ms | 2.0624 KOps/s | 2.0542 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.5284ms | 0.3922ms | 2.5498 KOps/s | 2.5195 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6521ms | 0.3938ms | 2.5392 KOps/s | 2.5306 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.9865ms | 0.5539ms | 1.8053 KOps/s | 1.7778 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8642ms | 0.5519ms | 1.8120 KOps/s | 1.6283 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.6293ms | 0.4519ms | 2.2127 KOps/s | 2.1309 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7777ms | 0.4546ms | 2.1999 KOps/s | 2.0591 KOps/s | |
test_to_module_speed[True] | 2.3973ms | 1.7173ms | 582.3160 Ops/s | 596.3287 Ops/s | |
test_to_module_speed[False] | 2.3065ms | 1.6822ms | 594.4620 Ops/s | 602.0977 Ops/s | |
test_tc_init | 60.4840μs | 29.6378μs | 33.7407 KOps/s | 35.6483 KOps/s | |
test_tc_init_nested | 0.1459ms | 61.4368μs | 16.2769 KOps/s | 18.2784 KOps/s | |
test_tc_first_layer_tensor | 4.7289μs | 0.6984μs | 1.4318 MOps/s | 1.4685 MOps/s | |
test_tc_first_layer_nontensor | 1.8289μs | 0.6743μs | 1.4831 MOps/s | 1.5083 MOps/s | |
test_tc_second_layer_tensor | 21.7410μs | 1.8586μs | 538.0396 KOps/s | 541.7695 KOps/s | |
test_tc_second_layer_nontensor | 8.9970μs | 1.5321μs | 652.7104 KOps/s | 657.7871 KOps/s | |
test_unbind | 80.5611ms | 6.3890ms | 156.5196 Ops/s | 140.5268 Ops/s | |
test_full_like | 16.4915ms | 10.8209ms | 92.4136 Ops/s | 97.7183 Ops/s | |
test_zeros_like | 11.8999ms | 5.5794ms | 179.2316 Ops/s | 171.0368 Ops/s | |
test_ones_like | 14.0965ms | 5.9528ms | 167.9894 Ops/s | 164.8356 Ops/s | |
test_clone | 12.2555ms | 7.4496ms | 134.2349 Ops/s | 129.5448 Ops/s | |
test_squeeze | 68.2780μs | 14.9362μs | 66.9514 KOps/s | 70.9312 KOps/s | |
test_unsqueeze | 0.1110ms | 60.5221μs | 16.5229 KOps/s | 16.2168 KOps/s | |
test_split | 0.1696ms | 0.1120ms | 8.9263 KOps/s | 8.9043 KOps/s | |
test_permute | 0.2093ms | 0.1260ms | 7.9338 KOps/s | 7.8436 KOps/s | |
test_stack | 24.7824ms | 20.8681ms | 47.9199 Ops/s | 46.7340 Ops/s | |
test_cat | 24.4549ms | 20.8878ms | 47.8748 Ops/s | 46.3966 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1141ms | 13.6671μs | 73.1682 KOps/s | 78.0721 KOps/s | |
test_plain_set_stack_nested | 28.9510μs | 13.9635μs | 71.6152 KOps/s | 76.6288 KOps/s | |
test_plain_set_nested_inplace | 39.1300μs | 15.0363μs | 66.5056 KOps/s | 70.9863 KOps/s | |
test_plain_set_stack_nested_inplace | 39.7810μs | 15.0733μs | 66.3425 KOps/s | 70.2583 KOps/s | |
test_items | 23.0100μs | 4.6166μs | 216.6110 KOps/s | 217.5550 KOps/s | |
test_items_nested | 0.3831ms | 0.3360ms | 2.9759 KOps/s | 2.9795 KOps/s | |
test_items_nested_locked | 0.3931ms | 0.3386ms | 2.9534 KOps/s | 2.9827 KOps/s | |
test_items_nested_leaf | 0.1028ms | 82.4014μs | 12.1357 KOps/s | 12.1664 KOps/s | |
test_items_stack_nested | 0.4040ms | 0.3428ms | 2.9170 KOps/s | 2.9640 KOps/s | |
test_items_stack_nested_leaf | 0.1104ms | 84.0762μs | 11.8940 KOps/s | 12.1617 KOps/s | |
test_items_stack_nested_locked | 0.4155ms | 0.3434ms | 2.9124 KOps/s | 2.9696 KOps/s | |
test_keys | 31.5310μs | 4.2978μs | 232.6765 KOps/s | 230.2486 KOps/s | |
test_keys_nested | 91.2810μs | 67.7513μs | 14.7599 KOps/s | 14.9879 KOps/s | |
test_keys_nested_locked | 2.0737ms | 72.9738μs | 13.7036 KOps/s | 13.9437 KOps/s | |
test_keys_nested_leaf | 86.9510μs | 57.7867μs | 17.3050 KOps/s | 17.5431 KOps/s | |
test_keys_stack_nested | 96.6420μs | 67.8984μs | 14.7279 KOps/s | 14.9547 KOps/s | |
test_keys_stack_nested_leaf | 95.0210μs | 58.4006μs | 17.1231 KOps/s | 17.4023 KOps/s | |
test_keys_stack_nested_locked | 0.1030ms | 72.3150μs | 13.8284 KOps/s | 13.9753 KOps/s | |
test_values | 12.3037μs | 1.8059μs | 553.7391 KOps/s | 549.4052 KOps/s | |
test_values_nested | 65.2100μs | 35.1922μs | 28.4153 KOps/s | 28.2309 KOps/s | |
test_values_nested_locked | 61.5910μs | 36.8270μs | 27.1540 KOps/s | 27.1737 KOps/s | |
test_values_nested_leaf | 48.3710μs | 31.0585μs | 32.1973 KOps/s | 31.9263 KOps/s | |
test_values_stack_nested | 62.2110μs | 35.8252μs | 27.9133 KOps/s | 27.8208 KOps/s | |
test_values_stack_nested_leaf | 71.9810μs | 31.7500μs | 31.4961 KOps/s | 31.0021 KOps/s | |
test_values_stack_nested_locked | 65.1310μs | 37.7592μs | 26.4836 KOps/s | 26.5730 KOps/s | |
test_membership | 1.6405μs | 0.7002μs | 1.4281 MOps/s | 1.4289 MOps/s | |
test_membership_nested | 13.9910μs | 2.4741μs | 404.1912 KOps/s | 403.0279 KOps/s | |
test_membership_nested_leaf | 15.8400μs | 2.4679μs | 405.2100 KOps/s | 401.8616 KOps/s | |
test_membership_stacked_nested | 26.8310μs | 2.5236μs | 396.2577 KOps/s | 397.7349 KOps/s | |
test_membership_stacked_nested_leaf | 32.8710μs | 2.4729μs | 404.3821 KOps/s | 400.9003 KOps/s | |
test_membership_nested_last | 16.3190μs | 2.9954μs | 333.8439 KOps/s | 333.1456 KOps/s | |
test_membership_nested_leaf_last | 43.1010μs | 3.0109μs | 332.1263 KOps/s | 334.2476 KOps/s | |
test_membership_stacked_nested_last | 15.9800μs | 3.0028μs | 333.0225 KOps/s | 334.4661 KOps/s | |
test_membership_stacked_nested_leaf_last | 26.5500μs | 3.0054μs | 332.7296 KOps/s | 334.7012 KOps/s | |
test_nested_getleaf | 38.9300μs | 8.3285μs | 120.0695 KOps/s | 120.3894 KOps/s | |
test_nested_get | 31.5500μs | 7.7818μs | 128.5045 KOps/s | 127.2977 KOps/s | |
test_stacked_getleaf | 31.4000μs | 8.3107μs | 120.3263 KOps/s | 119.6821 KOps/s | |
test_stacked_get | 36.0310μs | 7.8422μs | 127.5148 KOps/s | 126.8039 KOps/s | |
test_nested_getitemleaf | 25.0410μs | 8.4593μs | 118.2132 KOps/s | 116.8286 KOps/s | |
test_nested_getitem | 31.7800μs | 8.0956μs | 123.5244 KOps/s | 124.2038 KOps/s | |
test_stacked_getitemleaf | 34.8710μs | 8.7148μs | 114.7473 KOps/s | 117.2304 KOps/s | |
test_stacked_getitem | 23.9600μs | 8.0673μs | 123.9571 KOps/s | 124.8569 KOps/s | |
test_lock_nested | 58.8258ms | 0.3960ms | 2.5254 KOps/s | 2.4835 KOps/s | |
test_lock_stack_nested | 0.3650ms | 0.2976ms | 3.3599 KOps/s | 3.3498 KOps/s | |
test_unlock_nested | 60.9245ms | 0.4036ms | 2.4778 KOps/s | 2.4605 KOps/s | |
test_unlock_stack_nested | 0.3616ms | 0.3087ms | 3.2389 KOps/s | 3.2517 KOps/s | |
test_flatten_speed | 0.3420ms | 0.1008ms | 9.9226 KOps/s | 9.9772 KOps/s | |
test_unflatten_speed | 0.3455ms | 0.2915ms | 3.4300 KOps/s | 3.4453 KOps/s | |
test_common_ops | 1.0860ms | 0.6060ms | 1.6501 KOps/s | 1.7184 KOps/s | |
test_creation | 36.5210μs | 1.6044μs | 623.2779 KOps/s | 625.1654 KOps/s | |
test_creation_empty | 24.6610μs | 10.3911μs | 96.2358 KOps/s | 113.1088 KOps/s | |
test_creation_nested_1 | 30.5510μs | 12.0518μs | 82.9755 KOps/s | 93.4381 KOps/s | |
test_creation_nested_2 | 41.8610μs | 14.3056μs | 69.9029 KOps/s | 78.2383 KOps/s | |
test_clone | 71.6810μs | 11.3747μs | 87.9146 KOps/s | 88.3223 KOps/s | |
test_getitem[int] | 35.8600μs | 10.5382μs | 94.8926 KOps/s | 91.9311 KOps/s | |
test_getitem[slice_int] | 36.6000μs | 20.0597μs | 49.8513 KOps/s | 48.7748 KOps/s | |
test_getitem[range] | 64.7200μs | 45.7488μs | 21.8585 KOps/s | 21.7729 KOps/s | |
test_getitem[tuple] | 42.1000μs | 18.2530μs | 54.7856 KOps/s | 54.9595 KOps/s | |
test_getitem[list] | 0.1592ms | 33.8962μs | 29.5019 KOps/s | 30.6659 KOps/s | |
test_setitem_dim[int] | 49.5610μs | 32.4252μs | 30.8402 KOps/s | 35.7706 KOps/s | |
test_setitem_dim[slice_int] | 73.9610μs | 52.9934μs | 18.8703 KOps/s | 20.4952 KOps/s | |
test_setitem_dim[range] | 96.7500μs | 71.6504μs | 13.9566 KOps/s | 15.4394 KOps/s | |
test_setitem_dim[tuple] | 67.0110μs | 45.6037μs | 21.9280 KOps/s | 24.3818 KOps/s | |
test_setitem | 57.4510μs | 16.8511μs | 59.3432 KOps/s | 61.3303 KOps/s | |
test_set | 47.7010μs | 16.3718μs | 61.0808 KOps/s | 64.7284 KOps/s | |
test_set_shared | 1.3730ms | 99.0246μs | 10.0985 KOps/s | 10.2984 KOps/s | |
test_update | 95.5210μs | 19.9502μs | 50.1247 KOps/s | 56.3009 KOps/s | |
test_update_nested | 87.0720μs | 24.5887μs | 40.6690 KOps/s | 43.7156 KOps/s | |
test_update__nested | 52.3510μs | 21.5634μs | 46.3748 KOps/s | 45.9103 KOps/s | |
test_set_nested | 61.5710μs | 17.1380μs | 58.3499 KOps/s | 59.6343 KOps/s | |
test_set_nested_new | 95.5720μs | 20.0822μs | 49.7952 KOps/s | 52.0651 KOps/s | |
test_select | 67.1610μs | 32.4807μs | 30.7875 KOps/s | 31.1908 KOps/s | |
test_select_nested | 89.3320μs | 54.9990μs | 18.1822 KOps/s | 18.2351 KOps/s | |
test_exclude_nested | 0.1563ms | 0.1104ms | 9.0576 KOps/s | 9.0741 KOps/s | |
test_empty[True] | 0.4027ms | 0.3397ms | 2.9436 KOps/s | 2.9382 KOps/s | |
test_empty[False] | 2.8911μs | 0.9237μs | 1.0826 MOps/s | 1.0901 MOps/s | |
test_to | 97.2020μs | 71.1886μs | 14.0472 KOps/s | 12.9805 KOps/s | |
test_to_nonblocking | 0.1143ms | 64.4849μs | 15.5075 KOps/s | 16.5345 KOps/s | |
test_unbind_speed | 0.2930ms | 0.2598ms | 3.8491 KOps/s | 3.8159 KOps/s | |
test_unbind_speed_stack0 | 0.3705ms | 0.2617ms | 3.8210 KOps/s | 3.8346 KOps/s | |
test_unbind_speed_stack1 | 75.8264ms | 0.7970ms | 1.2547 KOps/s | 1.2563 KOps/s | |
test_split | 76.2312ms | 1.6960ms | 589.6111 Ops/s | 590.5487 Ops/s | |
test_chunk | 76.1631ms | 1.7091ms | 585.1034 Ops/s | 591.2135 Ops/s | |
test_creation[device0] | 0.1319ms | 56.3764μs | 17.7379 KOps/s | 17.8680 KOps/s | |
test_creation_from_tensor | 0.1340ms | 52.8389μs | 18.9255 KOps/s | 18.8382 KOps/s | |
test_add_one[memmap_tensor0] | 79.1610μs | 6.8268μs | 146.4806 KOps/s | 149.4470 KOps/s | |
test_contiguous[memmap_tensor0] | 10.2010μs | 0.6282μs | 1.5919 MOps/s | 1.5684 MOps/s | |
test_stack[memmap_tensor0] | 28.8700μs | 4.7401μs | 210.9675 KOps/s | 215.7849 KOps/s | |
test_memmaptd_index | 1.0617ms | 0.2816ms | 3.5512 KOps/s | 3.5211 KOps/s | |
test_memmaptd_index_astensor | 0.6121ms | 0.3506ms | 2.8519 KOps/s | 2.8241 KOps/s | |
test_memmaptd_index_op | 0.9681ms | 0.6836ms | 1.4629 KOps/s | 1.5446 KOps/s | |
test_serialize_model | 0.1830s | 0.1109s | 9.0183 Ops/s | 8.5146 Ops/s | |
test_serialize_model_pickle | 1.3579s | 1.2357s | 0.8093 Ops/s | 0.8085 Ops/s | |
test_serialize_weights | 0.1806s | 0.1080s | 9.2550 Ops/s | 8.7874 Ops/s | |
test_serialize_weights_returnearly | 0.3002s | 0.1043s | 9.5919 Ops/s | 10.3566 Ops/s | |
test_serialize_weights_pickle | 1.3539s | 1.2485s | 0.8009 Ops/s | 0.8087 Ops/s | |
test_reshape_pytree | 48.8610μs | 25.6972μs | 38.9148 KOps/s | 38.3664 KOps/s | |
test_reshape_td | 74.6610μs | 30.1510μs | 33.1664 KOps/s | 32.2733 KOps/s | |
test_view_pytree | 90.6910μs | 25.8604μs | 38.6692 KOps/s | 39.2348 KOps/s | |
test_view_td | 61.2610μs | 35.5705μs | 28.1132 KOps/s | 28.2579 KOps/s | |
test_unbind_pytree | 54.3710μs | 31.4364μs | 31.8102 KOps/s | 31.9740 KOps/s | |
test_unbind_td | 0.4767ms | 40.3232μs | 24.7996 KOps/s | 24.1878 KOps/s | |
test_split_pytree | 59.3710μs | 34.3811μs | 29.0857 KOps/s | 28.1578 KOps/s | |
test_split_td | 0.1061ms | 38.7872μs | 25.7817 KOps/s | 25.5367 KOps/s | |
test_add_pytree | 74.3510μs | 37.9485μs | 26.3515 KOps/s | 26.4985 KOps/s | |
test_add_td | 85.1420μs | 57.3453μs | 17.4382 KOps/s | 19.7347 KOps/s | |
test_distributed | 2.8447ms | 88.4177μs | 11.3100 KOps/s | 14.9865 KOps/s | |
test_tdmodule | 30.9410μs | 15.5608μs | 64.2640 KOps/s | 67.5866 KOps/s | |
test_tdmodule_dispatch | 52.9200μs | 30.8563μs | 32.4083 KOps/s | 34.7692 KOps/s | |
test_tdseq | 33.4100μs | 17.4805μs | 57.2067 KOps/s | 60.3959 KOps/s | |
test_tdseq_dispatch | 51.5920μs | 34.3819μs | 29.0851 KOps/s | 31.1933 KOps/s | |
test_instantiation_functorch | 1.6502ms | 1.5217ms | 657.1749 Ops/s | 657.5885 Ops/s | |
test_instantiation_td | 1.5694ms | 1.0524ms | 950.2055 Ops/s | 961.1444 Ops/s | |
test_exec_functorch | 0.2351ms | 0.1488ms | 6.7196 KOps/s | 6.6723 KOps/s | |
test_exec_functional_call | 0.1832ms | 0.1349ms | 7.4104 KOps/s | 7.3576 KOps/s | |
test_exec_td | 0.1677ms | 0.1348ms | 7.4169 KOps/s | 7.4480 KOps/s | |
test_exec_td_decorator | 0.7006ms | 0.2072ms | 4.8258 KOps/s | 4.8106 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7542ms | 0.5755ms | 1.7376 KOps/s | 1.7722 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7149ms | 0.5770ms | 1.7331 KOps/s | 1.7481 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.5952ms | 0.5217ms | 1.9167 KOps/s | 2.0136 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6070ms | 0.5186ms | 1.9283 KOps/s | 2.0294 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.3773ms | 0.6363ms | 1.5716 KOps/s | 1.5967 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.7556ms | 0.6324ms | 1.5813 KOps/s | 1.5985 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7134ms | 0.5618ms | 1.7801 KOps/s | 1.8157 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7254ms | 0.5674ms | 1.7625 KOps/s | 1.8021 KOps/s | |
test_vmap_transformer_speed[True-True] | 7.6819ms | 7.3985ms | 135.1632 Ops/s | 134.2127 Ops/s | |
test_vmap_transformer_speed[True-False] | 7.6557ms | 7.3623ms | 135.8265 Ops/s | 129.0384 Ops/s | |
test_vmap_transformer_speed[False-True] | 7.8032ms | 7.4004ms | 135.1281 Ops/s | 133.1076 Ops/s | |
test_vmap_transformer_speed[False-False] | 7.7914ms | 7.4244ms | 134.6906 Ops/s | 133.4651 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 18.7091ms | 18.1782ms | 55.0111 Ops/s | 54.3071 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 18.4415ms | 18.0661ms | 55.3522 Ops/s | 53.9662 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 18.5552ms | 18.0265ms | 55.4739 Ops/s | 54.5605 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 18.8151ms | 18.0841ms | 55.2972 Ops/s | 54.8361 Ops/s | |
test_to_module_speed[True] | 1.6920ms | 1.5440ms | 647.6740 Ops/s | 630.1736 Ops/s | |
test_to_module_speed[False] | 2.0284ms | 1.5262ms | 655.2077 Ops/s | 647.1554 Ops/s | |
test_tc_init | 50.9910μs | 29.1908μs | 34.2574 KOps/s | 38.1760 KOps/s | |
test_tc_init_nested | 88.6620μs | 57.4431μs | 17.4085 KOps/s | 18.1597 KOps/s | |
test_tc_first_layer_tensor | 3.2050μs | 0.3965μs | 2.5218 MOps/s | 2.8059 MOps/s | |
test_tc_first_layer_nontensor | 1.4724μs | 0.3897μs | 2.5662 MOps/s | 2.5836 MOps/s | |
test_tc_second_layer_tensor | 3.8680μs | 0.9687μs | 1.0324 MOps/s | 936.8412 KOps/s | |
test_tc_second_layer_nontensor | 2.6539μs | 0.8064μs | 1.2400 MOps/s | 1.2162 MOps/s | |
test_unbind | 0.1102s | 6.6968ms | 149.3243 Ops/s | 204.3132 Ops/s | |
test_full_like | 13.7695ms | 13.1701ms | 75.9298 Ops/s | 75.3013 Ops/s | |
test_zeros_like | 7.9992ms | 7.8265ms | 127.7714 Ops/s | 128.3861 Ops/s | |
test_ones_like | 8.0176ms | 7.8113ms | 128.0200 Ops/s | 126.9401 Ops/s | |
test_clone | 9.5457ms | 9.3962ms | 106.4257 Ops/s | 107.3349 Ops/s | |
test_squeeze | 71.5220μs | 10.5980μs | 94.3570 KOps/s | 94.1865 KOps/s | |
test_unsqueeze | 0.1196ms | 49.8651μs | 20.0541 KOps/s | 20.0138 KOps/s | |
test_split | 0.1478ms | 98.2883μs | 10.1742 KOps/s | 9.9351 KOps/s | |
test_permute | 0.1622ms | 0.1096ms | 9.1279 KOps/s | 8.6173 KOps/s | |
test_stack | 27.6940ms | 27.3483ms | 36.5653 Ops/s | 36.7886 Ops/s | |
test_cat | 27.3653ms | 27.1452ms | 36.8389 Ops/s | 36.8376 Ops/s |
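
The Change column is empty in both tables, so each comparison has to be read off the Ops and Ops on Repo HEAD columns by hand. The sketch below is one way to do that. It assumes "change" means the relative difference of the PR's throughput against Repo HEAD, which this page does not define. The helper names `parse_ops` and `relative_change` are hypothetical and are not part of the benchmark tooling.

```python
# Unit multipliers for the throughput cells that appear in the tables.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '14.0472 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(row: str) -> tuple[str, float]:
    """Return (test name, (ops - head_ops) / head_ops) for one table row.

    Assumed column order, as in the tables above:
    Name | Max | Mean | Ops | Ops on Repo HEAD | Change
    """
    cells = [cell.strip() for cell in row.split("|")]
    name, ops, head_ops = cells[0], parse_ops(cells[3]), parse_ops(cells[4])
    return name, (ops - head_ops) / head_ops


if __name__ == "__main__":
    # Rows copied verbatim from the second table.
    rows = [
        "test_to | 97.2020μs | 71.1886μs | 14.0472 KOps/s | 12.9805 KOps/s | |",
        "test_to_nonblocking | 0.1143ms | 64.4849μs | 15.5075 KOps/s | 16.5345 KOps/s | |",
        "test_tc_second_layer_tensor | 3.8680μs | 0.9687μs | 1.0324 MOps/s | 936.8412 KOps/s | |",
    ]
    for row in rows:
        name, change = relative_change(row)
        print(f"{name}: {change:+.1%}")
```

Positive values mean the PR branch ran more operations per second than Repo HEAD. Each figure comes from a single benchmark run, so small differences in either direction may simply be noise.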
Labels: CLA Signed, Performance
No description provided.