[Feature] Packing #3060

Merged
merged 3 commits into from
Jul 29, 2025
@vmoens (Collaborator) commented Jul 11, 2025

No description provided.

pytorch-bot bot commented Jul 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3060

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label Jul 11, 2025
@vmoens added the "enhancement" label Jul 11, 2025
$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 154. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}20$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 85.2366μs 83.0857μs 12.0358 KOps/s 12.2790 KOps/s $\color{#d91a1a}-1.98\%$
test_tensor_to_bytestream_speed[torch.save] 0.1400ms 0.1390ms 7.1953 KOps/s 7.0755 KOps/s $\color{#35bf28}+1.69\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1095s 0.1091s 9.1659 Ops/s 8.7207 Ops/s $\textbf{\color{#35bf28}+5.10\%}$
test_tensor_to_bytestream_speed[numpy] 2.8131μs 2.7921μs 358.1589 KOps/s 357.3443 KOps/s $\color{#35bf28}+0.23\%$
test_tensor_to_bytestream_speed[safetensors] 44.2459μs 41.6773μs 23.9939 KOps/s 24.3689 KOps/s $\color{#d91a1a}-1.54\%$
test_simple 0.5380s 0.5375s 1.8606 Ops/s 1.7827 Ops/s $\color{#35bf28}+4.37\%$
test_transformed 1.1024s 1.1016s 0.9078 Ops/s 0.8837 Ops/s $\color{#35bf28}+2.72\%$
test_serial 1.6501s 1.6479s 0.6068 Ops/s 0.5878 Ops/s $\color{#35bf28}+3.23\%$
test_parallel 1.1521s 1.0726s 0.9323 Ops/s 0.9340 Ops/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[True-True-True-True-True] 0.2593ms 44.4365μs 22.5040 KOps/s 22.4955 KOps/s $\color{#35bf28}+0.04\%$
test_step_mdp_speed[True-True-True-True-False] 50.9410μs 25.0910μs 39.8549 KOps/s 40.1068 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[True-True-True-False-True] 56.7510μs 25.1343μs 39.7862 KOps/s 39.9053 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[True-True-True-False-False] 37.4810μs 13.9240μs 71.8186 KOps/s 72.5448 KOps/s $\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-True-False-True-True] 79.6710μs 46.9959μs 21.2785 KOps/s 21.0248 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-True-False-True-False] 63.6310μs 27.2315μs 36.7221 KOps/s 36.7234 KOps/s $-0.00\%$
test_step_mdp_speed[True-True-False-False-True] 58.2810μs 27.3689μs 36.5379 KOps/s 36.1451 KOps/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[True-True-False-False-False] 43.8810μs 16.3017μs 61.3434 KOps/s 60.8619 KOps/s $\color{#35bf28}+0.79\%$
test_step_mdp_speed[True-False-True-True-True] 0.1001ms 50.0215μs 19.9914 KOps/s 20.3037 KOps/s $\color{#d91a1a}-1.54\%$
test_step_mdp_speed[True-False-True-True-False] 63.4910μs 30.4144μs 32.8791 KOps/s 33.3975 KOps/s $\color{#d91a1a}-1.55\%$
test_step_mdp_speed[True-False-True-False-True] 56.9310μs 27.5987μs 36.2335 KOps/s 36.1391 KOps/s $\color{#35bf28}+0.26\%$
test_step_mdp_speed[True-False-True-False-False] 46.5100μs 16.4950μs 60.6244 KOps/s 60.4739 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[True-False-False-True-True] 85.1510μs 51.8102μs 19.3012 KOps/s 19.3987 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[True-False-False-True-False] 64.5410μs 32.6589μs 30.6195 KOps/s 31.0102 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[True-False-False-False-True] 74.4310μs 30.5875μs 32.6931 KOps/s 33.1539 KOps/s $\color{#d91a1a}-1.39\%$
test_step_mdp_speed[True-False-False-False-False] 55.5210μs 19.2649μs 51.9078 KOps/s 52.4671 KOps/s $\color{#d91a1a}-1.07\%$
test_step_mdp_speed[False-True-True-True-True] 81.9020μs 50.1096μs 19.9563 KOps/s 20.4545 KOps/s $\color{#d91a1a}-2.44\%$
test_step_mdp_speed[False-True-True-True-False] 66.2610μs 30.1960μs 33.1170 KOps/s 33.6540 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[False-True-True-False-True] 60.7510μs 31.0365μs 32.2201 KOps/s 32.0887 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[False-True-True-False-False] 47.0010μs 18.2350μs 54.8395 KOps/s 54.7106 KOps/s $\color{#35bf28}+0.24\%$
test_step_mdp_speed[False-True-False-True-True] 2.7563ms 52.7758μs 18.9481 KOps/s 18.9917 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-True-False-True-False] 66.7420μs 32.9470μs 30.3517 KOps/s 30.0644 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[False-True-False-False-True] 65.4310μs 34.7440μs 28.7819 KOps/s 29.2723 KOps/s $\color{#d91a1a}-1.68\%$
test_step_mdp_speed[False-True-False-False-False] 50.8610μs 21.4696μs 46.5775 KOps/s 46.7671 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[False-False-True-True-True] 86.5720μs 55.1240μs 18.1409 KOps/s 18.1275 KOps/s $\color{#35bf28}+0.07\%$
test_step_mdp_speed[False-False-True-True-False] 72.8110μs 35.5187μs 28.1541 KOps/s 28.2581 KOps/s $\color{#d91a1a}-0.37\%$
test_step_mdp_speed[False-False-True-False-True] 65.7710μs 33.6958μs 29.6773 KOps/s 29.2649 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-False-True-False-False] 60.0310μs 21.3005μs 46.9473 KOps/s 46.8440 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[False-False-False-True-True] 86.2420μs 57.4419μs 17.4089 KOps/s 17.4185 KOps/s $\color{#d91a1a}-0.06\%$
test_step_mdp_speed[False-False-False-True-False] 85.2710μs 37.4437μs 26.7068 KOps/s 26.2896 KOps/s $\color{#35bf28}+1.59\%$
test_step_mdp_speed[False-False-False-False-True] 64.9210μs 35.7500μs 27.9721 KOps/s 27.6342 KOps/s $\color{#35bf28}+1.22\%$
test_step_mdp_speed[False-False-False-False-False] 60.6610μs 23.3167μs 42.8877 KOps/s 43.1850 KOps/s $\color{#d91a1a}-0.69\%$
test_values[generalized_advantage_estimate-True-True] 10.7984ms 10.4735ms 95.4793 Ops/s 92.4498 Ops/s $\color{#35bf28}+3.28\%$
test_values[vec_generalized_advantage_estimate-True-True] 15.4368ms 10.9578ms 91.2594 Ops/s 90.1687 Ops/s $\color{#35bf28}+1.21\%$
test_values[td0_return_estimate-False-False] 0.2448ms 0.1241ms 8.0612 KOps/s 7.6402 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_values[td1_return_estimate-False-False] 29.5218ms 27.8321ms 35.9297 Ops/s 35.0628 Ops/s $\color{#35bf28}+2.47\%$
test_values[vec_td1_return_estimate-False-False] 11.4251ms 11.0543ms 90.4624 Ops/s 90.1999 Ops/s $\color{#35bf28}+0.29\%$
test_values[td_lambda_return_estimate-True-False] 43.1043ms 41.0405ms 24.3661 Ops/s 23.5572 Ops/s $\color{#35bf28}+3.43\%$
test_values[vec_td_lambda_return_estimate-True-False] 11.1939ms 11.0001ms 90.9083 Ops/s 90.3092 Ops/s $\color{#35bf28}+0.66\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.3387ms 9.2560ms 108.0376 Ops/s 103.9379 Ops/s $\color{#35bf28}+3.94\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7116ms 1.5352ms 651.3856 Ops/s 634.7972 Ops/s $\color{#35bf28}+2.61\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5133ms 0.4156ms 2.4063 KOps/s 2.3907 KOps/s $\color{#35bf28}+0.65\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 34.6329ms 29.7882ms 33.5704 Ops/s 32.0021 Ops/s $\color{#35bf28}+4.90\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8405ms 1.7267ms 579.1352 Ops/s 585.7618 Ops/s $\color{#d91a1a}-1.13\%$
test_dqn_speed[False-None] 6.6000ms 1.4591ms 685.3335 Ops/s 725.3459 Ops/s $\textbf{\color{#d91a1a}-5.52\%}$
test_dqn_speed[False-backward] 2.0066ms 1.9315ms 517.7345 Ops/s 532.3785 Ops/s $\color{#d91a1a}-2.75\%$
test_dqn_speed[True-None] 0.7648ms 0.5244ms 1.9069 KOps/s 1.9023 KOps/s $\color{#35bf28}+0.24\%$
test_dqn_speed[True-backward] 1.0006ms 0.9603ms 1.0414 KOps/s 892.4032 Ops/s $\textbf{\color{#35bf28}+16.69\%}$
test_dqn_speed[reduce-overhead-None] 0.6077ms 0.5192ms 1.9259 KOps/s 1.8783 KOps/s $\color{#35bf28}+2.53\%$
test_dqn_speed[reduce-overhead-backward] 1.0096ms 0.9674ms 1.0337 KOps/s 1.0213 KOps/s $\color{#35bf28}+1.21\%$
test_ddpg_speed[False-None] 3.1690ms 2.9298ms 341.3248 Ops/s 356.2304 Ops/s $\color{#d91a1a}-4.18\%$
test_ddpg_speed[False-backward] 4.2418ms 4.1361ms 241.7743 Ops/s 248.5391 Ops/s $\color{#d91a1a}-2.72\%$
test_ddpg_speed[True-None] 1.5981ms 1.3596ms 735.5160 Ops/s 711.1404 Ops/s $\color{#35bf28}+3.43\%$
test_ddpg_speed[True-backward] 2.4392ms 2.3335ms 428.5394 Ops/s 410.9188 Ops/s $\color{#35bf28}+4.29\%$
test_ddpg_speed[reduce-overhead-None] 1.5596ms 1.3625ms 733.9411 Ops/s 710.9848 Ops/s $\color{#35bf28}+3.23\%$
test_ddpg_speed[reduce-overhead-backward] 2.4500ms 2.3604ms 423.6514 Ops/s 417.9015 Ops/s $\color{#35bf28}+1.38\%$
test_sac_speed[False-None] 8.3189ms 7.8981ms 126.6122 Ops/s 128.2820 Ops/s $\color{#d91a1a}-1.30\%$
test_sac_speed[False-backward] 11.6945ms 11.1611ms 89.5967 Ops/s 91.4597 Ops/s $\color{#d91a1a}-2.04\%$
test_sac_speed[True-None] 2.2262ms 2.0888ms 478.7488 Ops/s 469.0788 Ops/s $\color{#35bf28}+2.06\%$
test_sac_speed[True-backward] 4.1050ms 4.0033ms 249.7957 Ops/s 238.9572 Ops/s $\color{#35bf28}+4.54\%$
test_sac_speed[reduce-overhead-None] 2.2886ms 2.1098ms 473.9703 Ops/s 464.4550 Ops/s $\color{#35bf28}+2.05\%$
test_sac_speed[reduce-overhead-backward] 4.1290ms 4.0078ms 249.5153 Ops/s 246.5224 Ops/s $\color{#35bf28}+1.21\%$
test_redq_speed[False-None] 10.9423ms 10.5024ms 95.2164 Ops/s 95.6485 Ops/s $\color{#d91a1a}-0.45\%$
test_redq_speed[False-backward] 19.1329ms 18.0675ms 55.3481 Ops/s 55.7262 Ops/s $\color{#d91a1a}-0.68\%$
test_redq_speed[True-None] 4.5564ms 4.2815ms 233.5644 Ops/s 234.3201 Ops/s $\color{#d91a1a}-0.32\%$
test_redq_speed[True-backward] 10.1334ms 9.7741ms 102.3108 Ops/s 100.6577 Ops/s $\color{#35bf28}+1.64\%$
test_redq_speed[reduce-overhead-None] 4.4840ms 4.3000ms 232.5579 Ops/s 229.5399 Ops/s $\color{#35bf28}+1.31\%$
test_redq_speed[reduce-overhead-backward] 10.1625ms 9.8268ms 101.7623 Ops/s 101.2239 Ops/s $\color{#35bf28}+0.53\%$
test_redq_deprec_speed[False-None] 12.0007ms 11.3743ms 87.9175 Ops/s 92.0192 Ops/s $\color{#d91a1a}-4.46\%$
test_redq_deprec_speed[False-backward] 16.7670ms 16.3062ms 61.3262 Ops/s 63.1069 Ops/s $\color{#d91a1a}-2.82\%$
test_redq_deprec_speed[True-None] 3.7449ms 3.5758ms 279.6538 Ops/s 269.1968 Ops/s $\color{#35bf28}+3.88\%$
test_redq_deprec_speed[True-backward] 7.6461ms 7.4195ms 134.7797 Ops/s 131.8175 Ops/s $\color{#35bf28}+2.25\%$
test_redq_deprec_speed[reduce-overhead-None] 3.7083ms 3.5347ms 282.9109 Ops/s 280.3787 Ops/s $\color{#35bf28}+0.90\%$
test_redq_deprec_speed[reduce-overhead-backward] 7.6867ms 7.4182ms 134.8036 Ops/s 133.2242 Ops/s $\color{#35bf28}+1.19\%$
test_td3_speed[False-None] 8.0668ms 7.9235ms 126.2070 Ops/s 127.8663 Ops/s $\color{#d91a1a}-1.30\%$
test_td3_speed[False-backward] 11.2966ms 10.7621ms 92.9187 Ops/s 95.1314 Ops/s $\color{#d91a1a}-2.33\%$
test_td3_speed[True-None] 1.8143ms 1.7790ms 562.1186 Ops/s 552.5525 Ops/s $\color{#35bf28}+1.73\%$
test_td3_speed[True-backward] 3.6416ms 3.5124ms 284.7062 Ops/s 249.8826 Ops/s $\textbf{\color{#35bf28}+13.94\%}$
test_td3_speed[reduce-overhead-None] 1.8222ms 1.7955ms 556.9442 Ops/s 560.5691 Ops/s $\color{#d91a1a}-0.65\%$
test_td3_speed[reduce-overhead-backward] 3.6527ms 3.5362ms 282.7904 Ops/s 281.0515 Ops/s $\color{#35bf28}+0.62\%$
test_cql_speed[False-None] 30.3939ms 27.6570ms 36.1572 Ops/s 39.2154 Ops/s $\textbf{\color{#d91a1a}-7.80\%}$
test_cql_speed[False-backward] 37.9995ms 36.7144ms 27.2372 Ops/s 28.1074 Ops/s $\color{#d91a1a}-3.10\%$
test_cql_speed[True-None] 14.4811ms 12.2001ms 81.9664 Ops/s 81.3057 Ops/s $\color{#35bf28}+0.81\%$
test_cql_speed[True-backward] 18.3354ms 17.8774ms 55.9364 Ops/s 57.6815 Ops/s $\color{#d91a1a}-3.03\%$
test_cql_speed[reduce-overhead-None] 12.2185ms 11.9778ms 83.4877 Ops/s 82.9472 Ops/s $\color{#35bf28}+0.65\%$
test_cql_speed[reduce-overhead-backward] 18.2794ms 17.7761ms 56.2554 Ops/s 58.1049 Ops/s $\color{#d91a1a}-3.18\%$
test_a2c_speed[False-None] 6.6022ms 6.4522ms 154.9855 Ops/s 190.6744 Ops/s $\textbf{\color{#d91a1a}-18.72\%}$
test_a2c_speed[False-backward] 12.9575ms 12.5854ms 79.4573 Ops/s 85.5327 Ops/s $\textbf{\color{#d91a1a}-7.10\%}$
test_a2c_speed[True-None] 3.7825ms 3.6559ms 273.5316 Ops/s 273.6516 Ops/s $\color{#d91a1a}-0.04\%$
test_a2c_speed[True-backward] 9.1638ms 8.5050ms 117.5783 Ops/s 108.6395 Ops/s $\textbf{\color{#35bf28}+8.23\%}$
test_a2c_speed[reduce-overhead-None] 3.7557ms 3.6397ms 274.7511 Ops/s 271.5540 Ops/s $\color{#35bf28}+1.18\%$
test_a2c_speed[reduce-overhead-backward] 8.9194ms 8.4805ms 117.9172 Ops/s 117.3814 Ops/s $\color{#35bf28}+0.46\%$
test_ppo_speed[False-None] 7.1621ms 6.7195ms 148.8199 Ops/s 172.6991 Ops/s $\textbf{\color{#d91a1a}-13.83\%}$
test_ppo_speed[False-backward] 13.6198ms 13.1641ms 75.9642 Ops/s 80.5492 Ops/s $\textbf{\color{#d91a1a}-5.69\%}$
test_ppo_speed[True-None] 3.7262ms 3.6046ms 277.4201 Ops/s 267.8134 Ops/s $\color{#35bf28}+3.59\%$
test_ppo_speed[True-backward] 8.5844ms 8.3913ms 119.1714 Ops/s 118.0163 Ops/s $\color{#35bf28}+0.98\%$
test_ppo_speed[reduce-overhead-None] 3.8517ms 3.5900ms 278.5550 Ops/s 274.2287 Ops/s $\color{#35bf28}+1.58\%$
test_ppo_speed[reduce-overhead-backward] 8.7220ms 8.3700ms 119.4748 Ops/s 119.6008 Ops/s $\color{#d91a1a}-0.11\%$
test_reinforce_speed[False-None] 5.8390ms 5.3874ms 185.6187 Ops/s 220.4700 Ops/s $\textbf{\color{#d91a1a}-15.81\%}$
test_reinforce_speed[False-backward] 8.5255ms 8.0950ms 123.5334 Ops/s 134.9039 Ops/s $\textbf{\color{#d91a1a}-8.43\%}$
test_reinforce_speed[True-None] 2.9556ms 2.8291ms 353.4690 Ops/s 350.6766 Ops/s $\color{#35bf28}+0.80\%$
test_reinforce_speed[True-backward] 8.5070ms 7.5855ms 131.8308 Ops/s 130.8994 Ops/s $\color{#35bf28}+0.71\%$
test_reinforce_speed[reduce-overhead-None] 2.9496ms 2.8037ms 356.6735 Ops/s 352.3678 Ops/s $\color{#35bf28}+1.22\%$
test_reinforce_speed[reduce-overhead-backward] 7.6663ms 7.5299ms 132.8046 Ops/s 130.2287 Ops/s $\color{#35bf28}+1.98\%$
test_iql_speed[False-None] 27.2776ms 22.0648ms 45.3210 Ops/s 50.6235 Ops/s $\textbf{\color{#d91a1a}-10.47\%}$
test_iql_speed[False-backward] 32.9131ms 31.9689ms 31.2804 Ops/s 32.9128 Ops/s $\color{#d91a1a}-4.96\%$
test_iql_speed[True-None] 8.7982ms 8.3206ms 120.1839 Ops/s 115.4281 Ops/s $\color{#35bf28}+4.12\%$
test_iql_speed[True-backward] 16.5926ms 16.2900ms 61.3875 Ops/s 59.2583 Ops/s $\color{#35bf28}+3.59\%$
test_iql_speed[reduce-overhead-None] 8.6462ms 8.3396ms 119.9100 Ops/s 119.0631 Ops/s $\color{#35bf28}+0.71\%$
test_iql_speed[reduce-overhead-backward] 16.7576ms 16.3166ms 61.2873 Ops/s 60.6448 Ops/s $\color{#35bf28}+1.06\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.5786ms 6.1501ms 162.5993 Ops/s 162.6605 Ops/s $\color{#d91a1a}-0.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8869ms 0.3711ms 2.6943 KOps/s 3.1009 KOps/s $\textbf{\color{#d91a1a}-13.11\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6821ms 0.3397ms 2.9438 KOps/s 3.7716 KOps/s $\textbf{\color{#d91a1a}-21.95\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2754ms 5.8294ms 171.5436 Ops/s 168.6690 Ops/s $\color{#35bf28}+1.70\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1959ms 0.3115ms 3.2108 KOps/s 3.5733 KOps/s $\textbf{\color{#d91a1a}-10.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7649ms 0.3094ms 3.2324 KOps/s 3.5714 KOps/s $\textbf{\color{#d91a1a}-9.49\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6347ms 1.3445ms 743.7552 Ops/s 815.3767 Ops/s $\textbf{\color{#d91a1a}-8.78\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.7171ms 1.2792ms 781.7535 Ops/s 873.6460 Ops/s $\textbf{\color{#d91a1a}-10.52\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4011ms 6.0078ms 166.4504 Ops/s 164.8987 Ops/s $\color{#35bf28}+0.94\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9698ms 0.4184ms 2.3901 KOps/s 2.0894 KOps/s $\textbf{\color{#35bf28}+14.39\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8355ms 0.4272ms 2.3410 KOps/s 2.4035 KOps/s $\color{#d91a1a}-2.60\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2518ms 5.8814ms 170.0262 Ops/s 167.4772 Ops/s $\color{#35bf28}+1.52\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9734ms 0.3433ms 2.9130 KOps/s 3.6887 KOps/s $\textbf{\color{#d91a1a}-21.03\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5353ms 0.3441ms 2.9065 KOps/s 3.9027 KOps/s $\textbf{\color{#d91a1a}-25.52\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 9.9364ms 5.9255ms 168.7625 Ops/s 167.7529 Ops/s $\color{#35bf28}+0.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7162ms 0.2668ms 3.7485 KOps/s 3.3128 KOps/s $\textbf{\color{#35bf28}+13.15\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5809s 0.9811ms 1.0192 KOps/s 3.1339 KOps/s $\textbf{\color{#d91a1a}-67.48\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4586ms 6.0640ms 164.9084 Ops/s 163.7261 Ops/s $\color{#35bf28}+0.72\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2427ms 0.4877ms 2.0503 KOps/s 2.1680 KOps/s $\textbf{\color{#d91a1a}-5.43\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7849ms 0.4698ms 2.1288 KOps/s 2.1877 KOps/s $\color{#d91a1a}-2.70\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.3735ms 5.6509ms 176.9638 Ops/s 180.2121 Ops/s $\color{#d91a1a}-1.80\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.4228ms 2.0736ms 482.2451 Ops/s 432.1165 Ops/s $\textbf{\color{#35bf28}+11.60\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.3414ms 1.1723ms 853.0478 Ops/s 824.3398 Ops/s $\color{#35bf28}+3.48\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4327s 14.1413ms 70.7150 Ops/s 60.7570 Ops/s $\textbf{\color{#35bf28}+16.39\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 6.1791ms 1.9984ms 500.3943 Ops/s 488.8560 Ops/s $\color{#35bf28}+2.36\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.5401ms 1.2470ms 801.9319 Ops/s 864.8438 Ops/s $\textbf{\color{#d91a1a}-7.27\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.7646ms 5.8702ms 170.3533 Ops/s 172.7100 Ops/s $\color{#d91a1a}-1.36\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.9844ms 2.1920ms 456.1978 Ops/s 438.7708 Ops/s $\color{#35bf28}+3.97\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.0727ms 1.3451ms 743.4482 Ops/s 777.8286 Ops/s $\color{#d91a1a}-4.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 63.6300ms 60.0585ms 16.6504 Ops/s 17.0768 Ops/s $\color{#d91a1a}-2.50\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.2520ms 16.2474ms 61.5485 Ops/s 61.0794 Ops/s $\color{#35bf28}+0.77\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 62.5204ms 57.9428ms 17.2584 Ops/s 16.6862 Ops/s $\color{#35bf28}+3.43\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 17.8857ms 16.4466ms 60.8027 Ops/s 59.2160 Ops/s $\color{#35bf28}+2.68\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 59.6087ms 57.3390ms 17.4401 Ops/s 17.0174 Ops/s $\color{#35bf28}+2.48\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 19.3099ms 17.9453ms 55.7249 Ops/s 54.5168 Ops/s $\color{#35bf28}+2.22\%$
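
As a reading aid for the table above: the Change column appears to be the relative difference between the Ops column (this PR) and Ops on Repo HEAD. The sketch below reproduces three rows under that assumption. The 5% threshold used to count a benchmark as improved or worsened is inferred from which rows are bolded and from the summary counts; it is not stated by the benchmark report itself, and the function names are hypothetical.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops on this PR, Ops on repo HEAD), both in Ops/s, copied from the CPU table.
rows = {
    "test_tensor_to_bytestream_speed[pickle]": (12035.8, 12279.0),
    "test_a2c_speed[False-None]": (154.9855, 190.6744),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]": (70.7150, 60.7570),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Rows whose Ops values are shown in KOps/s must be converted to Ops/s before comparing them with a HEAD value in Ops/s. Because the displayed values are rounded, a recomputed change can differ from the reported one in the last digit.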

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 148. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}21$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.7034μs 81.1357μs 12.3250 KOps/s 12.1754 KOps/s $\color{#35bf28}+1.23\%$
test_tensor_to_bytestream_speed[torch.save] 0.1438ms 0.1422ms 7.0300 KOps/s 7.0437 KOps/s $\color{#d91a1a}-0.19\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1255s 0.1246s 8.0228 Ops/s 8.6102 Ops/s $\textbf{\color{#d91a1a}-6.82\%}$
test_tensor_to_bytestream_speed[numpy] 2.8365μs 2.8285μs 353.5406 KOps/s 348.3811 KOps/s $\color{#35bf28}+1.48\%$
test_tensor_to_bytestream_speed[safetensors] 43.4156μs 41.1050μs 24.3279 KOps/s 24.5975 KOps/s $\color{#d91a1a}-1.10\%$
test_simple 0.7894s 0.7848s 1.2742 Ops/s 1.2788 Ops/s $\color{#d91a1a}-0.35\%$
test_transformed 1.4067s 1.4050s 0.7118 Ops/s 0.7067 Ops/s $\color{#35bf28}+0.72\%$
test_serial 2.3606s 2.3494s 0.4256 Ops/s 0.4333 Ops/s $\color{#d91a1a}-1.76\%$
test_parallel 1.9385s 1.8764s 0.5329 Ops/s 0.5347 Ops/s $\color{#d91a1a}-0.33\%$
test_step_mdp_speed[True-True-True-True-True] 0.2097ms 43.6783μs 22.8947 KOps/s 22.5523 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[True-True-True-True-False] 55.1110μs 25.0967μs 39.8459 KOps/s 39.7829 KOps/s $\color{#35bf28}+0.16\%$
test_step_mdp_speed[True-True-True-False-True] 56.0620μs 25.3052μs 39.5176 KOps/s 40.2519 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[True-True-True-False-False] 57.2910μs 13.9228μs 71.8247 KOps/s 72.6257 KOps/s $\color{#d91a1a}-1.10\%$
test_step_mdp_speed[True-True-False-True-True] 94.5130μs 47.3380μs 21.1247 KOps/s 21.4003 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-True-False-True-False] 58.7020μs 27.7633μs 36.0187 KOps/s 36.3310 KOps/s $\color{#d91a1a}-0.86\%$
test_step_mdp_speed[True-True-False-False-True] 56.4310μs 27.5299μs 36.3242 KOps/s 36.1521 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[True-True-False-False-False] 56.4710μs 16.5351μs 60.4775 KOps/s 60.2401 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[True-False-True-True-True] 86.3030μs 50.2993μs 19.8810 KOps/s 20.1778 KOps/s $\color{#d91a1a}-1.47\%$
test_step_mdp_speed[True-False-True-True-False] 61.6920μs 30.3111μs 32.9912 KOps/s 33.0876 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[True-False-True-False-True] 56.4410μs 27.5823μs 36.2552 KOps/s 36.1461 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-False-True-False-False] 35.4410μs 16.4143μs 60.9225 KOps/s 59.9028 KOps/s $\color{#35bf28}+1.70\%$
test_step_mdp_speed[True-False-False-True-True] 85.6130μs 52.0464μs 19.2136 KOps/s 19.1302 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[True-False-False-True-False] 88.0120μs 32.5331μs 30.7379 KOps/s 30.6504 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-False-False-False-True] 57.6020μs 29.5616μs 33.8276 KOps/s 33.2610 KOps/s $\color{#35bf28}+1.70\%$
test_step_mdp_speed[True-False-False-False-False] 50.7510μs 19.0151μs 52.5897 KOps/s 52.3732 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[False-True-True-True-True] 79.9120μs 49.5580μs 20.1784 KOps/s 20.2785 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[False-True-True-True-False] 71.2410μs 30.6008μs 32.6788 KOps/s 33.2704 KOps/s $\color{#d91a1a}-1.78\%$
test_step_mdp_speed[False-True-True-False-True] 66.6510μs 31.6739μs 31.5717 KOps/s 31.6074 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[False-True-True-False-False] 48.6510μs 18.5039μs 54.0426 KOps/s 53.2939 KOps/s $\color{#35bf28}+1.40\%$
test_step_mdp_speed[False-True-False-True-True] 2.8860ms 52.9944μs 18.8699 KOps/s 18.8718 KOps/s $-0.01\%$
test_step_mdp_speed[False-True-False-True-False] 63.5920μs 32.9415μs 30.3569 KOps/s 30.3774 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-True-False-False-True] 67.4420μs 33.6407μs 29.7259 KOps/s 28.7110 KOps/s $\color{#35bf28}+3.53\%$
test_step_mdp_speed[False-True-False-False-False] 62.7120μs 21.4590μs 46.6004 KOps/s 46.4705 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[False-False-True-True-True] 92.4620μs 55.4692μs 18.0280 KOps/s 18.3260 KOps/s $\color{#d91a1a}-1.63\%$
test_step_mdp_speed[False-False-True-True-False] 73.2320μs 35.5598μs 28.1216 KOps/s 28.1892 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-False-True-False-True] 68.5810μs 33.8042μs 29.5821 KOps/s 28.7344 KOps/s $\color{#35bf28}+2.95\%$
test_step_mdp_speed[False-False-True-False-False] 44.1610μs 20.6898μs 48.3331 KOps/s 46.9044 KOps/s $\color{#35bf28}+3.05\%$
test_step_mdp_speed[False-False-False-True-True] 95.7730μs 55.7023μs 17.9526 KOps/s 17.5551 KOps/s $\color{#35bf28}+2.26\%$
test_step_mdp_speed[False-False-False-True-False] 61.2320μs 37.3412μs 26.7800 KOps/s 26.6124 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[False-False-False-False-True] 73.4320μs 36.2630μs 27.5763 KOps/s 27.8635 KOps/s $\color{#d91a1a}-1.03\%$
test_step_mdp_speed[False-False-False-False-False] 50.2120μs 23.1198μs 43.2530 KOps/s 42.7046 KOps/s $\color{#35bf28}+1.28\%$
test_values[generalized_advantage_estimate-True-True] 23.8466ms 23.4701ms 42.6074 Ops/s 46.1458 Ops/s $\textbf{\color{#d91a1a}-7.67\%}$
test_values[vec_generalized_advantage_estimate-True-True] 0.1476s 3.8660ms 258.6672 Ops/s 256.1574 Ops/s $\color{#35bf28}+0.98\%$
test_values[td0_return_estimate-False-False] 0.1131ms 84.1050μs 11.8899 KOps/s 12.5051 KOps/s $\color{#d91a1a}-4.92\%$
test_values[td1_return_estimate-False-False] 56.7844ms 55.4871ms 18.0222 Ops/s 19.6525 Ops/s $\textbf{\color{#d91a1a}-8.30\%}$
test_values[vec_td1_return_estimate-False-False] 1.4640ms 1.1318ms 883.5491 Ops/s 913.2972 Ops/s $\color{#d91a1a}-3.26\%$
test_values[td_lambda_return_estimate-True-False] 91.3488ms 90.5843ms 11.0394 Ops/s 12.3662 Ops/s $\textbf{\color{#d91a1a}-10.73\%}$
test_values[vec_td_lambda_return_estimate-True-False] 1.4379ms 1.1177ms 894.6781 Ops/s 917.0254 Ops/s $\color{#d91a1a}-2.44\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.0740ms 23.7447ms 42.1147 Ops/s 43.9853 Ops/s $\color{#d91a1a}-4.25\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0600ms 0.8042ms 1.2434 KOps/s 1.3347 KOps/s $\textbf{\color{#d91a1a}-6.84\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8661ms 0.7199ms 1.3891 KOps/s 1.4992 KOps/s $\textbf{\color{#d91a1a}-7.35\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5774ms 1.5329ms 652.3614 Ops/s 674.3347 Ops/s $\color{#d91a1a}-3.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7781ms 0.7371ms 1.3566 KOps/s 1.4651 KOps/s $\textbf{\color{#d91a1a}-7.40\%}$
test_dqn_speed[False-None] 7.2556ms 1.6469ms 607.2196 Ops/s 647.0482 Ops/s $\textbf{\color{#d91a1a}-6.16\%}$
test_dqn_speed[False-backward] 2.4521ms 2.2865ms 437.3564 Ops/s 452.0902 Ops/s $\color{#d91a1a}-3.26\%$
test_dqn_speed[True-None] 0.7385ms 0.6123ms 1.6331 KOps/s 1.7166 KOps/s $\color{#d91a1a}-4.87\%$
test_dqn_speed[True-backward] 1.3459ms 1.2956ms 771.8423 Ops/s 763.6822 Ops/s $\color{#35bf28}+1.07\%$
test_dqn_speed[reduce-overhead-None] 0.6624ms 0.6008ms 1.6644 KOps/s 1.6190 KOps/s $\color{#35bf28}+2.81\%$
test_dqn_speed[reduce-overhead-backward] 1.1745ms 1.1265ms 887.6747 Ops/s 981.9883 Ops/s $\textbf{\color{#d91a1a}-9.60\%}$
test_ddpg_speed[False-None] 3.2638ms 2.9449ms 339.5758 Ops/s 339.3561 Ops/s $\color{#35bf28}+0.06\%$
test_ddpg_speed[False-backward] 4.8625ms 4.4860ms 222.9174 Ops/s 231.4139 Ops/s $\color{#d91a1a}-3.67\%$
test_ddpg_speed[True-None] 1.5065ms 1.3905ms 719.1655 Ops/s 712.6069 Ops/s $\color{#35bf28}+0.92\%$
test_ddpg_speed[True-backward] 2.8344ms 2.7917ms 358.2060 Ops/s 371.3805 Ops/s $\color{#d91a1a}-3.55\%$
test_ddpg_speed[reduce-overhead-None] 1.5248ms 1.4232ms 702.6355 Ops/s 698.6615 Ops/s $\color{#35bf28}+0.57\%$
test_ddpg_speed[reduce-overhead-backward] 0.1943s 0.1896s 5.2732 Ops/s 4.3348 Ops/s $\textbf{\color{#35bf28}+21.65\%}$
test_sac_speed[False-None] 8.7321ms 8.2246ms 121.5859 Ops/s 121.7124 Ops/s $\color{#d91a1a}-0.10\%$
test_sac_speed[False-backward] 12.1557ms 11.6429ms 85.8896 Ops/s 87.9928 Ops/s $\color{#d91a1a}-2.39\%$
test_sac_speed[True-None] 2.0189ms 1.9211ms 520.5289 Ops/s 516.4793 Ops/s $\color{#35bf28}+0.78\%$
test_sac_speed[True-backward] 4.0923ms 3.9670ms 252.0769 Ops/s 260.1672 Ops/s $\color{#d91a1a}-3.11\%$
test_sac_speed[reduce-overhead-None] 20.1734ms 11.5425ms 86.6361 Ops/s 86.2437 Ops/s $\color{#35bf28}+0.46\%$
test_sac_speed[reduce-overhead-backward] 1.8130ms 1.7710ms 564.6561 Ops/s 610.0111 Ops/s $\textbf{\color{#d91a1a}-7.44\%}$
test_redq_deprec_speed[False-None] 9.5777ms 9.1917ms 108.7938 Ops/s 107.9021 Ops/s $\color{#35bf28}+0.83\%$
test_redq_deprec_speed[False-backward] 13.1438ms 12.6808ms 78.8595 Ops/s 79.4626 Ops/s $\color{#d91a1a}-0.76\%$
test_redq_deprec_speed[True-None] 2.9169ms 2.5684ms 389.3457 Ops/s 383.7831 Ops/s $\color{#35bf28}+1.45\%$
test_redq_deprec_speed[True-backward] 4.9928ms 4.6058ms 217.1155 Ops/s 217.4790 Ops/s $\color{#d91a1a}-0.17\%$
test_redq_deprec_speed[reduce-overhead-None] 2.6186ms 2.5618ms 390.3577 Ops/s 383.9889 Ops/s $\color{#35bf28}+1.66\%$
test_redq_deprec_speed[reduce-overhead-backward] 5.0935ms 4.6628ms 214.4655 Ops/s 219.6603 Ops/s $\color{#d91a1a}-2.36\%$
test_td3_speed[False-None] 8.1780ms 8.1189ms 123.1689 Ops/s 122.7157 Ops/s $\color{#35bf28}+0.37\%$
test_td3_speed[False-backward] 11.4604ms 10.8902ms 91.8253 Ops/s 93.1250 Ops/s $\color{#d91a1a}-1.40\%$
test_td3_speed[True-None] 1.8386ms 1.7450ms 573.0725 Ops/s 564.1805 Ops/s $\color{#35bf28}+1.58\%$
test_td3_speed[True-backward] 3.6425ms 3.5700ms 280.1147 Ops/s 289.2994 Ops/s $\color{#d91a1a}-3.17\%$
test_td3_speed[reduce-overhead-None] 50.6448ms 25.8578ms 38.6731 Ops/s 38.2533 Ops/s $\color{#35bf28}+1.10\%$
test_td3_speed[reduce-overhead-backward] 1.5066ms 1.4619ms 684.0562 Ops/s 749.5158 Ops/s $\textbf{\color{#d91a1a}-8.73\%}$
test_cql_speed[False-None] 17.4673ms 17.0587ms 58.6211 Ops/s 58.3680 Ops/s $\color{#35bf28}+0.43\%$
test_cql_speed[False-backward] 23.3770ms 22.8253ms 43.8110 Ops/s 44.0797 Ops/s $\color{#d91a1a}-0.61\%$
test_cql_speed[True-None] 3.5513ms 3.4577ms 289.2131 Ops/s 282.9813 Ops/s $\color{#35bf28}+2.20\%$
test_cql_speed[True-backward] 6.6565ms 6.0432ms 165.4744 Ops/s 162.6444 Ops/s $\color{#35bf28}+1.74\%$
test_cql_speed[reduce-overhead-None] 19.8431ms 12.5560ms 79.6432 Ops/s 79.3645 Ops/s $\color{#35bf28}+0.35\%$
test_cql_speed[reduce-overhead-backward] 2.0032ms 1.9329ms 517.3587 Ops/s 562.3510 Ops/s $\textbf{\color{#d91a1a}-8.00\%}$
test_a2c_speed[False-None] 3.3205ms 3.2494ms 307.7457 Ops/s 305.5892 Ops/s $\color{#35bf28}+0.71\%$
test_a2c_speed[False-backward] 7.2120ms 6.5746ms 152.1009 Ops/s 156.0007 Ops/s $\color{#d91a1a}-2.50\%$
test_a2c_speed[True-None] 1.4474ms 1.3532ms 738.9814 Ops/s 736.0370 Ops/s $\color{#35bf28}+0.40\%$
test_a2c_speed[True-backward] 3.3772ms 3.2955ms 303.4398 Ops/s 317.8270 Ops/s $\color{#d91a1a}-4.53\%$
test_a2c_speed[reduce-overhead-None] 15.7415ms 8.8562ms 112.9156 Ops/s 112.5129 Ops/s $\color{#35bf28}+0.36\%$
test_a2c_speed[reduce-overhead-backward] 1.7095ms 1.6024ms 624.0768 Ops/s 694.7184 Ops/s $\textbf{\color{#d91a1a}-10.17\%}$
test_ppo_speed[False-None] 4.0450ms 3.8638ms 258.8157 Ops/s 256.7526 Ops/s $\color{#35bf28}+0.80\%$
test_ppo_speed[False-backward] 7.8050ms 7.3781ms 135.5367 Ops/s 137.8184 Ops/s $\color{#d91a1a}-1.66\%$
test_ppo_speed[True-None] 1.5396ms 1.4544ms 687.5633 Ops/s 661.4968 Ops/s $\color{#35bf28}+3.94\%$
test_ppo_speed[True-backward] 3.7869ms 3.4675ms 288.3916 Ops/s 299.8416 Ops/s $\color{#d91a1a}-3.82\%$
test_ppo_speed[reduce-overhead-None] 1.6300ms 1.4416ms 693.6974 Ops/s 677.3470 Ops/s $\color{#35bf28}+2.41\%$
test_ppo_speed[reduce-overhead-backward] 3.5369ms 3.4524ms 289.6541 Ops/s 297.8126 Ops/s $\color{#d91a1a}-2.74\%$
test_reinforce_speed[False-None] 2.5399ms 2.3493ms 425.6659 Ops/s 424.3820 Ops/s $\color{#35bf28}+0.30\%$
test_reinforce_speed[False-backward] 3.5805ms 3.5167ms 284.3606 Ops/s 292.1657 Ops/s $\color{#d91a1a}-2.67\%$
test_reinforce_speed[True-None] 1.3971ms 1.3135ms 761.3468 Ops/s 757.6281 Ops/s $\color{#35bf28}+0.49\%$
test_reinforce_speed[True-backward] 3.3344ms 3.2436ms 308.3024 Ops/s 302.6675 Ops/s $\color{#35bf28}+1.86\%$
test_reinforce_speed[reduce-overhead-None] 18.8613ms 10.2684ms 97.3857 Ops/s 96.6353 Ops/s $\color{#35bf28}+0.78\%$
test_reinforce_speed[reduce-overhead-backward] 1.6872ms 1.6374ms 610.7225 Ops/s 660.5616 Ops/s $\textbf{\color{#d91a1a}-7.54\%}$
test_iql_speed[False-None] 10.0161ms 9.5183ms 105.0613 Ops/s 105.2117 Ops/s $\color{#d91a1a}-0.14\%$
test_iql_speed[False-backward] 14.3676ms 13.8088ms 72.4175 Ops/s 74.1204 Ops/s $\color{#d91a1a}-2.30\%$
test_iql_speed[True-None] 2.4571ms 2.3262ms 429.8765 Ops/s 428.3255 Ops/s $\color{#35bf28}+0.36\%$
test_iql_speed[True-backward] 5.6220ms 5.1986ms 192.3584 Ops/s 193.9166 Ops/s $\color{#d91a1a}-0.80\%$
test_iql_speed[reduce-overhead-None] 17.8265ms 10.6854ms 93.5853 Ops/s 91.9132 Ops/s $\color{#35bf28}+1.82\%$
test_iql_speed[reduce-overhead-backward] 2.1524ms 2.0354ms 491.2981 Ops/s 476.7638 Ops/s $\color{#35bf28}+3.05\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8211ms 6.3052ms 158.6000 Ops/s 158.3764 Ops/s $\color{#35bf28}+0.14\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6512ms 0.3395ms 2.9458 KOps/s 3.1785 KOps/s $\textbf{\color{#d91a1a}-7.32\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5807ms 0.3461ms 2.8893 KOps/s 3.6015 KOps/s $\textbf{\color{#d91a1a}-19.77\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4231ms 6.0275ms 165.9065 Ops/s 164.5818 Ops/s $\color{#35bf28}+0.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8929ms 0.3055ms 3.2730 KOps/s 3.1009 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6531ms 0.3270ms 3.0582 KOps/s 3.5039 KOps/s $\textbf{\color{#d91a1a}-12.72\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6739ms 1.4120ms 708.2147 Ops/s 704.0758 Ops/s $\color{#35bf28}+0.59\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6421ms 1.3100ms 763.3553 Ops/s 754.3009 Ops/s $\color{#35bf28}+1.20\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4364ms 6.2324ms 160.4516 Ops/s 161.0771 Ops/s $\color{#d91a1a}-0.39\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.8649ms 0.4538ms 2.2035 KOps/s 2.2112 KOps/s $\color{#d91a1a}-0.35\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6355ms 0.3964ms 2.5227 KOps/s 2.3893 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2525ms 6.0052ms 166.5220 Ops/s 167.4580 Ops/s $\color{#d91a1a}-0.56\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9721ms 0.3336ms 2.9976 KOps/s 3.4323 KOps/s $\textbf{\color{#d91a1a}-12.66\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6320ms 0.3161ms 3.1638 KOps/s 3.4204 KOps/s $\textbf{\color{#d91a1a}-7.50\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2721ms 5.8994ms 169.5084 Ops/s 168.5509 Ops/s $\color{#35bf28}+0.57\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0340ms 0.2719ms 3.6783 KOps/s 2.8632 KOps/s $\textbf{\color{#35bf28}+28.47\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6433ms 0.3320ms 3.0117 KOps/s 3.3299 KOps/s $\textbf{\color{#d91a1a}-9.56\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2853ms 6.0544ms 165.1686 Ops/s 164.3643 Ops/s $\color{#35bf28}+0.49\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0785ms 0.4151ms 2.4093 KOps/s 2.3409 KOps/s $\color{#35bf28}+2.92\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6271ms 0.3972ms 2.5179 KOps/s 2.2219 KOps/s $\textbf{\color{#35bf28}+13.32\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.2384ms 5.5825ms 179.1325 Ops/s 51.2455 Ops/s $\textbf{\color{#35bf28}+249.56\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.5233ms 2.1345ms 468.4970 Ops/s 461.6026 Ops/s $\color{#35bf28}+1.49\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.1076ms 1.2995ms 769.5107 Ops/s 792.9116 Ops/s $\color{#d91a1a}-2.95\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.5291ms 5.7234ms 174.7199 Ops/s 177.2595 Ops/s $\color{#d91a1a}-1.43\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 0.4913s 11.9403ms 83.7497 Ops/s 463.1710 Ops/s $\textbf{\color{#d91a1a}-81.92\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.5512ms 1.2731ms 785.4945 Ops/s 823.3861 Ops/s $\color{#d91a1a}-4.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.6589ms 5.8811ms 170.0354 Ops/s 168.3925 Ops/s $\color{#35bf28}+0.98\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.6468ms 2.3299ms 429.2051 Ops/s 440.5107 Ops/s $\color{#d91a1a}-2.57\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.5411ms 1.3403ms 746.1262 Ops/s 699.0558 Ops/s $\textbf{\color{#35bf28}+6.73\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 61.6782ms 59.0480ms 16.9354 Ops/s 16.8992 Ops/s $\color{#35bf28}+0.21\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.0671ms 17.2733ms 57.8927 Ops/s 59.5224 Ops/s $\color{#d91a1a}-2.74\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 60.7245ms 58.9771ms 16.9557 Ops/s 16.7587 Ops/s $\color{#35bf28}+1.18\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 18.5211ms 17.1262ms 58.3901 Ops/s 57.5571 Ops/s $\color{#35bf28}+1.45\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 63.9061ms 61.1748ms 16.3466 Ops/s 16.8158 Ops/s $\color{#d91a1a}-2.79\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.1415ms 18.7860ms 53.2311 Ops/s 53.3259 Ops/s $\color{#d91a1a}-0.18\%$

@vmoens vmoens merged commit 3f10cb1 into main Jul 29, 2025
51 of 63 checks passed
@vmoens vmoens deleted the packed-seq branch July 29, 2025 08:32