
[WIP] Add img2img #3426


Merged: 10 commits merged into huggingface:kandinsky on May 17, 2023

Conversation

ayushtues (Contributor)

Adding img2img pipeline for Kandinsky, part of #3308.

ayushtues changed the title from "[WIP ]Add img2img" to "[WIP] Add img2img" on May 13, 2023
ayushtues (Contributor, Author)

@yiyixuxu the UnCLIP scheduler doesn't have an add_noise function, which img2img needs in order to noise the image to a particular timestep. I am not familiar with the UnCLIP scheduler; is it the same as DDPM?

yiyixuxu (Collaborator)

> I am not familiar with UnCLIP scheduler, but is it the same as DDPM?

yes yes!! See the comment here: #3308 (comment) - let's just use DDPM for img2img. We will need to swap the UnCLIP scheduler for DDPM in the other two pipelines too, but we can do that later.
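
For context, a minimal sketch of the noising step under discussion: DDPMScheduler.add_noise diffuses the encoded init image to a strength-dependent timestep, so denoising only runs over the tail of the schedule. The latent shape and strength value here are illustrative assumptions, not the pipeline's actual internals.

import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)
num_inference_steps = 100
scheduler.set_timesteps(num_inference_steps)

strength = 0.2  # fraction of the schedule to re-noise
init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
t_start = max(num_inference_steps - init_timestep, 0)
timesteps = scheduler.timesteps[t_start:]  # denoising covers only this tail

init_latents = torch.randn(1, 4, 96, 96)  # stand-in for the encoded init image
noise = torch.randn_like(init_latents)
noisy_latents = scheduler.add_noise(init_latents, noise, timesteps[:1])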

HuggingFaceDocBuilderDev commented May 14, 2023

The documentation is not available anymore as the PR was closed or merged.

ayushtues (Contributor, Author) commented May 14, 2023

Got some initial results working:

from io import BytesIO

import requests
import torch
from PIL import Image

from diffusers import DDPMScheduler, KandinskyImg2ImgPipeline, KandinskyPipeline, KandinskyPriorPipeline

ddpm_config = {
    "clip_sample": True,
    "clip_sample_range": 2.0,
    "sample_max_value": None,
    "num_train_timesteps": 1000,
    "prediction_type": "epsilon",
    "variance_type": "learned_range",
    "thresholding": True,
    "beta_schedule": "linear",
    "beta_start": 0.00085,
    "beta_end": 0.012,
}

# load and resize the init image
url = "https://preview.redd.it/yu4maxz3dxo91.jpg?width=1024&format=pjpg&auto=webp&v=enabled&s=64ebd870f1f0cab6c94b5ee75ca03a53a1070068"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 768))

prompt = "A red cartoon frog, 4k"
device = "cuda"

# create the prior and use it to generate image embeddings for the prompt
pipe_prior = KandinskyPriorPipeline.from_pretrained("YiYiXu/Kandinsky-prior")
pipe_prior.to(device)

generator = torch.Generator(device=device).manual_seed(0)
image_emb = pipe_prior(prompt, generator=generator)
zero_image_emb = pipe_prior("")

# build the img2img pipeline from the text2img components, swapping in DDPM
pipe = KandinskyPipeline.from_pretrained("YiYiXu/Kandinsky")
ddpm = DDPMScheduler(**ddpm_config)
generator = torch.Generator(device=device).manual_seed(0)

pipe_img2img = KandinskyImg2ImgPipeline(
    text_encoder=pipe.text_encoder,
    tokenizer=pipe.tokenizer,
    text_proj=pipe.text_proj,
    unet=pipe.unet,
    scheduler=ddpm,
    movq=pipe.movq,
)
pipe_img2img.to(device)
out = pipe_img2img(
    prompt=prompt,
    image=init_image,
    height=768,
    width=768,
    num_inference_steps=100,
    generator=generator,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    strength=0.2,
)

out[0][0]  # first generated image

Before img2img: [image]

After img2img: [image]

yiyixuxu (Collaborator)

@ayushtues
awesome! Did you compare it with the results from the original repo?

yiyixuxu mentioned this pull request on May 14, 2023
ayushtues (Contributor, Author)

They use the DDIM scheduler; this is what I got from the original repo:

from io import BytesIO

import requests
from PIL import Image
from kandinsky2 import get_kandinsky2

model = get_kandinsky2(
    'cuda',
    task_type='text2img',
    cache_dir='.',
    model_version='2.1',
)

url = "https://preview.redd.it/yu4maxz3dxo91.jpg?width=1024&format=pjpg&auto=webp&v=enabled&s=64ebd870f1f0cab6c94b5ee75ca03a53a1070068"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 768))

out = model.generate_img2img(
    prompt="A red cartoon frog, 4k",
    pil_img=init_image,
    strength=0.8,
    h=768,
    w=768,
)

out[0]

[image]

ayushtues (Contributor, Author) commented May 15, 2023

Would we expect to get exactly the same image if I replaced our scheduler with DDIM?

yiyixuxu (Collaborator)

@ayushtues

I think we will have to change the code slightly to make it match exactly - I did that for text2img. I can help run it to verify once this is merged in.
Maybe you can try running the original repo with the p_sampler? See https://github.com/ai-forever/Kandinsky-2/blob/main/kandinsky2/kandinsky2_1_model.py#L438

Anyway, let me know when this is ready to merge - we can always make changes later from the base PR!

Also, do you have a Twitter handle?

yiyixuxu (Collaborator)

@ayushtues

Ohh, another thing you should try: replace the UnCLIP scheduler in the text2img pipeline with DDPM - that should be easy since you already did it for img2img. We'd need it to produce the same results as the UnCLIP scheduler.
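
A minimal sketch of the suggested swap (untested here): load the text2img pipeline and rebuild its scheduler as a DDPMScheduler via the standard diffusers scheduler-swap idiom, as a stand-in for the hand-written ddpm_config used in the script earlier in the thread.

from diffusers import DDPMScheduler, KandinskyPipeline

pipe = KandinskyPipeline.from_pretrained("YiYiXu/Kandinsky")
# reuse the existing scheduler config; unknown keys fall back to DDPM defaults
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")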

ayushtues (Contributor, Author) commented May 16, 2023

@yiyixuxu I replaced UnCLIP with DDPM in text2img and found that it produced different images. That's because there are slight differences between UnCLIP and DDPM, namely:

  1. The step_ratio calculation differs between DDPM and UnCLIP.
  2. Thresholding and clipping: DDPM applies either thresholding or clipping, while UnCLIP applies both (sketched below).

So getting identical results from them is not possible, since their implementations differ.
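
To make the second point concrete, a rough paraphrase of the two control flows as described above. This is a simplified sketch of the comment's claim, not the actual diffusers source, and dynamic_threshold is a stand-in for Imagen-style dynamic thresholding.

import torch

def dynamic_threshold(x, quantile=0.995):
    # simplified Imagen-style thresholding: clamp each sample to a high
    # quantile of its absolute values, then rescale
    s = x.abs().flatten(1).quantile(quantile, dim=1).clamp(min=1.0)
    s = s.view(-1, *([1] * (x.ndim - 1)))
    return x.clamp(-s, s) / s

def ddpm_postprocess(x, thresholding=True, clip_sample=True, clip_range=2.0):
    # DDPM: thresholding OR clipping, never both
    if thresholding:
        return dynamic_threshold(x)
    if clip_sample:
        return x.clamp(-clip_range, clip_range)
    return x

def unclip_postprocess(x, clip_range=2.0):
    # UnCLIP (per the comment above): threshold, then clip
    return dynamic_threshold(x).clamp(-clip_range, clip_range)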

Also, my Twitter handle is ayush_tues.

ayushtues (Contributor, Author) commented May 16, 2023

Running with the p_sampler gives a different result, probably due to some difference in how they implemented DDPM.

Other than that, if we are fine with not getting exactly the same results as the original repo, I think we can merge this PR and then make changes to the schedulers in the base PR itself.

Or I can dig deeper into their DDPM implementation and figure out where the difference is.

yiyixuxu (Collaborator)

@ayushtues I see - I will update our DDPM scheduler to make sure it works for our model. Don't worry about that for now

Can you add a test for img2img like the one I added for inpainting here? https://github.com/huggingface/diffusers/pull/3308/files#diff-a94251ed6c7af41b0a066c4bc7cc78cadc0e0fe9c766c51ed5d308a383626cc1
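
For reference, a rough skeleton of what such an integration-style test could look like, assembling the pipeline the same way as the script earlier in the thread. The checkpoint ids, placeholder input image, and plain shape assertion are assumptions for illustration, not the final test.

import unittest

import numpy as np
import torch
from PIL import Image

from diffusers import DDPMScheduler, KandinskyImg2ImgPipeline, KandinskyPipeline, KandinskyPriorPipeline

class KandinskyImg2ImgIntegrationTest(unittest.TestCase):
    def test_img2img_output_size(self):
        device = "cuda"
        # prior produces the image embeddings for the prompt
        pipe_prior = KandinskyPriorPipeline.from_pretrained("YiYiXu/Kandinsky-prior").to(device)
        generator = torch.Generator(device=device).manual_seed(0)
        image_emb = pipe_prior("A red cartoon frog, 4k", generator=generator)
        zero_image_emb = pipe_prior("")

        # build img2img from the text2img components, swapping in DDPM
        pipe = KandinskyPipeline.from_pretrained("YiYiXu/Kandinsky")
        pipe_img2img = KandinskyImg2ImgPipeline(
            text_encoder=pipe.text_encoder,
            tokenizer=pipe.tokenizer,
            text_proj=pipe.text_proj,
            unet=pipe.unet,
            scheduler=DDPMScheduler.from_config(pipe.scheduler.config),
            movq=pipe.movq,
        ).to(device)

        init_image = Image.fromarray(np.zeros((768, 768, 3), dtype=np.uint8))  # placeholder input
        out = pipe_img2img(
            prompt="A red cartoon frog, 4k",
            image=init_image,
            height=768,
            width=768,
            num_inference_steps=25,
            generator=generator,
            image_embeds=image_emb,
            negative_image_embeds=zero_image_emb,
            strength=0.2,
        )
        self.assertEqual(out[0][0].size, (768, 768))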

yiyixuxu merged commit 88efed5 into huggingface:kandinsky on May 17, 2023
yiyixuxu pushed a commit that referenced this pull request on May 17, 2023
yiyixuxu (Collaborator)

@ayushtues I merged it in and will make changes from my PR - thanks for the great work!
