[tests] Speed up AnimateDiff test #7707


Closed

Conversation

a-r-r-o-w (Member) commented Apr 18, 2024

What does this PR do?

Part of #7677

Before:

pytest -s tests/pipelines/animatediff/test_animatediff.py  213.80s user 10.60s system 182% cpu 2:02.86 total

After:

pytest -s tests/pipelines/animatediff/test_animatediff.py  40.48s user 4.53s system 88% cpu 51.084 total
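(For context on where this kind of speedup usually comes from: the comments below mention that the UNet built in the test's dummy components was slimmed down. The snippet is only a hedged sketch of that kind of change; the block types, channel widths, and group counts here are illustrative assumptions, not the PR's actual diff.)

```python
import torch
from diffusers import UNet2DConditionModel

# Illustrative only: a much smaller dummy UNet so the fast tests run quickly.
torch.manual_seed(0)
tiny_unet = UNet2DConditionModel(
    sample_size=8,                      # tiny latent resolution
    in_channels=4,
    out_channels=4,
    block_out_channels=(8, 8),          # small channel widths instead of the SD defaults
    layers_per_block=1,                 # one ResNet layer per block
    down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "CrossAttnUpBlock2D"),
    cross_attention_dim=8,
    norm_num_groups=2,                  # must evenly divide every block's channel count
)
```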

Errors before these changes:

FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_float16_inference - AssertionError: nan not less than 0.05 : The outputs of the fp16 and fp32 pipelines are too different.
FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_from_pipe_consistent_forward_pass_cpu_offload - AssertionError: 0.011593044 not less than 0.001 : The outputs of the pipelines created with `from_pipe` and `__init__` are different.
FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_save_load_float16 - AssertionError: nan not less than 0.01 : The output of the fp16 pipeline changed after saving and loading.
FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_vae_tiling - TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Errors after these changes (need help fixing):

FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_from_pipe_consistent_config - RuntimeError: Error(s) in loading state_dict for ModuleList:
FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_from_pipe_consistent_forward_pass_cpu_offload - AssertionError: 0.0140529275 not less than 0.001 : The outputs of the pipelines created with `from_pipe` and `__init__` are different.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

@@ -1493,7 +1492,7 @@ def __init__(
                 TransformerTemporalModel(
                     num_attention_heads=temporal_num_attention_heads,
                     in_channels=out_channels,
-                    norm_num_groups=temporal_norm_num_groups,
+                    norm_num_groups=resnet_groups,
a-r-r-o-w (Member Author) commented Apr 18, 2024

This is because DownBlockMotion also uses resnet_groups, and temporal_norm_num_groups does not seem to be available there. I think this is correct, but I'm not entirely sure. Without this change, there are multiple errors because the group norm's in_channels is not divisible by num_groups. There is no way to control the temporal_norm_num_groups parameter because it is not exposed in get_up_block(). Everywhere else that passes norm_num_groups into TransformerTemporalModel does what is shown here.
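(For reference, the divisibility constraint mentioned above is enforced by torch.nn.GroupNorm itself; a minimal illustration, not the actual diffusers modules:)

```python
import torch

# num_channels must be divisible by num_groups, otherwise construction fails.
ok = torch.nn.GroupNorm(num_groups=2, num_channels=8)    # fine: 8 % 2 == 0

try:
    torch.nn.GroupNorm(num_groups=32, num_channels=8)    # fails: 8 % 32 != 0
except ValueError as err:
    print(err)  # "num_channels must be divisible by num_groups"
```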

Member commented

Let's please not touch the default values here, especially when we're not too sure. Will defer to @DN6 to verify.

Collaborator commented

Yeah, this change should be safe. I think it was a mistake to pass in temporal_norm_num_groups here.

a-r-r-o-w (Member Author) commented Apr 25, 2024

Awesome! What are your thoughts on the first failing test after these changes? If I make any changes there, multiple other tests will most likely break. We are trying to load the hf-internal-testing/tiny-stable-diffusion-pipe weights into a different model definition, so the changes to the UNet cause a weight mismatch...

DN6 (Collaborator) commented Apr 30, 2024

That's okay. We can make a change to that model repo or change the test specifically for animatediff.

a-r-r-o-w (Member Author) commented

Hi @sayakpaul @DN6, some updates on the failing tests.

FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_from_pipe_consistent_forward_pass_cpu_offload

This fails in both the old and new versions of the tests, and I believe it is somewhat device dependent. I tested on different diffusers checkouts, but it seems to always fail locally for me while passing on CI.

FAILED tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_from_pipe_consistent_config - RuntimeError: Error(s) in loading state_dict for ModuleList:

This test might need some rework: it assumes that the underlying model is the same as hf-internal-testing/tiny-stable-diffusion-pipe, but since we've modified the UNet to make the tests faster, the state dict can no longer be loaded correctly.
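(To make that failure mode concrete, here is a minimal, purely illustrative reproduction of the same kind of RuntimeError using plain torch modules; it is not the actual test code:)

```python
import torch
from torch import nn

# A checkpoint saved from one module definition cannot be strictly loaded into a
# differently-sized definition; this mirrors the
# "Error(s) in loading state_dict for ModuleList" failure above.
original = nn.ModuleList([nn.Linear(32, 32)])
shrunk = nn.ModuleList([nn.Linear(8, 8)])  # analogous to the slimmed-down test UNet

state_dict = original.state_dict()
shrunk.load_state_dict(state_dict)  # RuntimeError: size mismatch for 0.weight ...
```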

yiyixuxu requested a review from sayakpaul on April 22, 2024 21:45
sayakpaul (Member) left a comment

The changes look good to me. But let's maybe not change the default value when we're not too sure.

sayakpaul requested a review from DN6 on April 23, 2024 02:13
HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
