[WIP] Add img2img #3426
Conversation
@yiyixuxu the UnCLIP scheduler doesn't have an `add_noise` method
yes yes!! see this comment here #3308 (comment) - let's just use DDPM for img2img. We will need to swap the UnCLIP scheduler for DDPM in the other 2 pipelines too, but we can wait and do that later.
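For context, img2img has to noise the encoded init image to an intermediate timestep before denoising, which is what `add_noise` does. A minimal sketch of that step with `DDPMScheduler` (shapes and the timestep are illustrative, not taken from this PR):

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

latents = torch.randn(1, 4, 96, 96)  # stand-in for movq-encoded init-image latents
noise = torch.randn_like(latents)
t = torch.tensor([600])              # intermediate start timestep, chosen via `strength`

# forward-diffuse the init latents to timestep t; denoising then starts from here
noisy_latents = scheduler.add_noise(latents, noise, t)
```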
The documentation is not available anymore as the PR was closed or merged.
Got some initial results working:

```python
import requests
import torch
from io import BytesIO
from PIL import Image

from diffusers import (
    DDPMScheduler,
    KandinskyImg2ImgPipeline,
    KandinskyPipeline,
    KandinskyPriorPipeline,
)

ddpm_config = {
    "clip_sample": True,
    "clip_sample_range": 2.0,
    "sample_max_value": None,
    "num_train_timesteps": 1000,
    "prediction_type": "epsilon",
    "variance_type": "learned_range",
    "thresholding": True,
    "beta_schedule": "linear",
    "beta_start": 0.00085,
    "beta_end": 0.012,
}

url = "https://preview.redd.it/yu4maxz3dxo91.jpg?width=1024&format=pjpg&auto=webp&v=enabled&s=64ebd870f1f0cab6c94b5ee75ca03a53a1070068"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 768))

prompt = "A red cartoon frog, 4k"
device = "cuda"

# create the prior and use it to generate image embeddings from the prompt
pipe_prior = KandinskyPriorPipeline.from_pretrained("YiYiXu/Kandinsky-prior")
pipe_prior.to(device)

generator = torch.Generator(device=device).manual_seed(0)
image_emb = pipe_prior(prompt, generator=generator)
zero_image_emb = pipe_prior("")

# build the img2img pipeline from the text2img components, swapping in DDPM
pipe = KandinskyPipeline.from_pretrained("YiYiXu/Kandinsky")
ddpm = DDPMScheduler(**ddpm_config)
generator = torch.Generator(device=device).manual_seed(0)
pipe_img2img = KandinskyImg2ImgPipeline(
    text_encoder=pipe.text_encoder,
    tokenizer=pipe.tokenizer,
    text_proj=pipe.text_proj,
    unet=pipe.unet,
    scheduler=ddpm,
    movq=pipe.movq,
)
pipe_img2img.to(device)

out = pipe_img2img(
    prompt=prompt,
    image=init_image,
    height=768,
    width=768,
    num_inference_steps=100,
    generator=generator,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    strength=0.2,
)
out[0][0]
```
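As an aside, `strength=0.2` means denoising starts late in the schedule. A sketch of the usual diffusers convention for mapping `strength` to a starting point, using a hypothetical helper `get_start_timesteps` (the exact logic in this PR may differ):

```python
from diffusers import DDPMScheduler

def get_start_timesteps(scheduler, num_inference_steps, strength):
    # keep only the final `strength` fraction of the schedule:
    # strength=1.0 denoises from pure noise, strength->0 barely changes the image
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return scheduler.timesteps[t_start:]

scheduler = DDPMScheduler()
scheduler.set_timesteps(100)
timesteps = get_start_timesteps(scheduler, 100, strength=0.2)  # ~20 denoising steps
```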
@ayushtues
They use the DDIM scheduler; this is what I got from the original repo:

```python
import requests
from io import BytesIO
from PIL import Image

from kandinsky2 import get_kandinsky2

model = get_kandinsky2(
    "cuda",
    task_type="text2img",
    cache_dir=".",
    model_version="2.1",
)

url = "https://preview.redd.it/yu4maxz3dxo91.jpg?width=1024&format=pjpg&auto=webp&v=enabled&s=64ebd870f1f0cab6c94b5ee75ca03a53a1070068"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 768))

out = model.generate_img2img(
    prompt="A red cartoon frog, 4k",
    pil_img=init_image,
    strength=0.8,
    h=768,
    w=768,
)
out[0]
```
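To quantify how close the two pipelines' outputs are, a quick pixel-wise comparison of the two PIL images could look like this (`pixel_diff` is just an illustrative helper and assumes both images have the same size):

```python
import numpy as np

def pixel_diff(img_a, img_b):
    # compare two same-size PIL images pixel by pixel
    a = np.asarray(img_a, dtype=np.float32)
    b = np.asarray(img_b, dtype=np.float32)
    return np.abs(a - b).max(), np.abs(a - b).mean()

# e.g. pixel_diff(out_diffusers[0][0], out_original[0])
```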
Would we expect to get exactly the same image if I replace our scheduler with DDIM?
I think we will have to change the code slightly to make it match exactly - I did that for text2img, and I can help run it to make sure once this is merged in anyway. Let me know when this is ready to merge - we can always make changes later from the base PR! Also, do you have a Twitter handle?
Ohh, another thing you should try is replacing the UnCLIP scheduler in the text2img pipeline with DDPM - that should be easy since you already did it for img2img! There we will need to see the same results as with the UnCLIP scheduler.
@yiyixuxu I replaced UnCLIP with DDPM in text2img and found that it produced different images. There seem to be slight implementation differences between the UnCLIP and DDPM schedulers, so getting exactly the same results from them is not possible. Also, my Twitter handle is ayush_tues.
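One way to localize such a divergence (a debugging sketch, not code from this PR) is to run a single step of each scheduler on identical inputs and compare the results:

```python
import torch
from diffusers import DDPMScheduler, UnCLIPScheduler

# both configured with learned-range variance, as the Kandinsky UNet predicts it;
# matching beta schedules so only the step logic differs
ddpm = DDPMScheduler(variance_type="learned_range", clip_sample=True, beta_schedule="squaredcos_cap_v2")
unclip = UnCLIPScheduler(variance_type="learned_range", clip_sample=True)
ddpm.set_timesteps(50)
unclip.set_timesteps(50)

sample = torch.randn(1, 4, 64, 64)
model_output = torch.randn(1, 8, 64, 64)  # predicted mean + variance channels

t = ddpm.timesteps[0]
out_ddpm = ddpm.step(model_output, t, sample, generator=torch.Generator().manual_seed(0)).prev_sample
out_unclip = unclip.step(model_output, t, sample, generator=torch.Generator().manual_seed(0)).prev_sample
print("max abs diff after one step:", (out_ddpm - out_unclip).abs().max().item())
```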
Other than that, if we are fine with not getting exactly the same results as the original repo, I think we can merge this PR and then make changes to the schedulers in the base PR itself. Or I can dig deeper into their DDPM implementation and figure out where the difference is.
@ayushtues I see - I will update our DDPM scheduler to make sure it works for our model; don't worry about that for now. Can you add a test for img2img like the one I added for inpainting here? https://github.com/huggingface/diffusers/pull/3308/files#diff-a94251ed6c7af41b0a066c4bc7cc78cadc0e0fe9c766c51ed5d308a383626cc1
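For reference, a rough skeleton of what such a test could look like. Everything here is hypothetical: `build_dummy_pipeline` stands in for the dummy-component setup used by the existing pipeline tests, and the embedding shapes and output handling are placeholders; the real test should mirror the inpainting test linked above.

```python
import numpy as np
import torch
from PIL import Image

def test_kandinsky_img2img_smoke():
    # `build_dummy_pipeline` is an assumed helper that assembles a
    # KandinskyImg2ImgPipeline from small dummy components
    pipe = build_dummy_pipeline()
    init_image = Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8))
    generator = torch.Generator().manual_seed(0)
    out = pipe(
        prompt="a frog",
        image=init_image,
        image_embeds=torch.zeros(1, 32),           # placeholder embedding shape
        negative_image_embeds=torch.zeros(1, 32),  # placeholder embedding shape
        height=64,
        width=64,
        strength=0.5,
        num_inference_steps=2,
        generator=generator,
        output_type="np",
    )
    image = out[0][0]
    assert image.shape == (64, 64, 3)
```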
This reverts commit 88efed5.
@ayushtues I merged this in and will make changes from my PR - thanks for the great work!
Adding an img2img pipeline for Kandinsky, part of #3308.