
- What is forward diffusion and reverse diffusion?

In variational diffusion models, forward diffusion is an encoding process that gradually corrupts an image into a pure noise map (a standard Gaussian), mapping image space to latent space. Forward diffusion involves no learnable parameters; it is a fixed Markov chain defined as a linear Gaussian transition at each timestep.

In variational diffusion models, reverse diffusion is a decoding process that generates images from latent embeddings. During training, the network predicts the noise that was added, whose level depends on the timestep. At inference time, it gradually denoises a standard Gaussian sample into an image.

- What is variance preserving in forward diffusion process?

It is about the diffusion schedule: the mean of x_{t-1} is scaled by sqrt(1 - beta_t) before noise with variance beta_t is added, which keeps the noisy image x_t at unit variance. The purpose is that you don't want the variance of the noisy images to depend on the number of timesteps.

Purpose of scaling mean in the forward diffusion process: https://stats.stackexchange.com/questions/600127/purpose-of-scaling-mean-by-sqrt1-beta-t-in-forward-diffusion-process
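A quick numerical check, as a minimal sketch assuming the DDPM-style linear beta schedule: because each step computes x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise, the variance of x_t stays at 1 no matter how many timesteps you run.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)       # toy linear schedule as in DDPM

x = rng.standard_normal(100_000)            # unit-variance "data"
for beta in betas:
    noise = rng.standard_normal(x.shape)
    # Variance-preserving step: (1 - beta) * Var(x) + beta = 1 when Var(x) = 1.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

print(x.var())  # ~1.0 even after all 1000 steps
```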

- Current diffusion model networks predict the added noise instead of the clean image. Why? Are they mathematically equivalent?

Predicting the added noise is mathematically equivalent to predicting the forward-process posterior mean under a reparameterization (loosely using "predicting" to mean "minimizing the training loss for"). So if "the clean image" means the slightly cleaner image x_{t-1}, I think the answer is yes; in DDPM, predicting the added noise is simply more effective empirically. If "the clean image" means x_0, it is also equivalent up to reparameterization, but according to DDPM it performs worse. DDPM: https://arxiv.org/pdf/2006.11239.pdf
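Concretely, with alpha_t = 1 - beta_t and bar-alpha_t their cumulative product (DDPM's notation), the targets are tied together by

$$x_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon}{\sqrt{\bar\alpha_t}}, \qquad \mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right)$$

so a network that predicts the noise also determines an x_0 estimate and the forward-process posterior mean.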

- Does the added noise indicate noise added from the clean image or from x_{t-1}?

From the clean image. In DDPM, a scaled portion of the predicted noise is subtracted from x_t, the result is rescaled, and fresh random noise is added to produce x_{t-1}.  https://erdem.pl/2023/11/step-by-step-visual-introduction-to-diffusion-models#reverse-diffusion-diagram
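A minimal sketch of one DDPM reverse step, assuming precomputed 1-D schedule tensors `alphas`, `alphas_cumprod`, `betas` and a noise-prediction network `eps_model` (all names are illustrative):

```python
import torch

@torch.no_grad()
def ddpm_step(x_t, t, eps_model, alphas, alphas_cumprod, betas):
    """One reverse step x_t -> x_{t-1}, following DDPM eq. (11)."""
    eps = eps_model(x_t, t)                        # predicted noise (added from x_0)
    # Subtract the scaled noise prediction and rescale: the posterior mean.
    mean = (x_t - betas[t] / torch.sqrt(1 - alphas_cumprod[t]) * eps) \
           / torch.sqrt(alphas[t])
    if t == 0:
        return mean                                # no fresh noise at the final step
    sigma_t = torch.sqrt(betas[t])                 # one common variance choice
    return mean + sigma_t * torch.randn_like(x_t)  # add fresh random noise
```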

- If you use DDPM training, do you always have to use the same timesteps at inference as during training? Why do you need multiple denoising steps in the reverse diffusion process at inference, and can their number differ from training?

No. You can use fewer timesteps at inference after training. You can even stop early, since the network predicts the full noise that was added to the clean image at timestep t. But the quality may be worse, because the intermediate input may not contain enough information to denoise well.  https://erdem.pl/2023/11/step-by-step-visual-introduction-to-diffusion-models#reverse-diffusion-diagram

 


The author of the erdem.pl post linked above seems to think exactly the same way I do. Thank you.

You can also use a different denoising schedule at inference time after training. For example, you could sample with the cosine schedule from the Improved DDPM paper after training with DDPM's linear schedule.
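As a rough sketch with the Hugging Face diffusers library (the checkpoint name matches the Runway link below, but scheduler classes and defaults vary across diffusers versions, so treat this as an assumption):

```python
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Swap in a different sampler at inference; training is untouched.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of a cat", num_inference_steps=50).images[0]  # far fewer than the 1000 training steps
```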

- What are DDPM and DDIM? What is the difference? What does DDIM change from DDPM?

DDPM is the sampler introduced in the Denoising Diffusion Probabilistic Models paper. It proposes to learn the data distribution with forward and reverse diffusion, by corrupting and denoising images. In the forward process, it corrupts a clean image into a standard Gaussian by iteratively adding Gaussian noise; this forward process is a Markov chain and involves no learning. The reverse diffusion is the learned part: the network predicts the noise that was added (from the clean image) at each timestep. DDIM changes the reverse diffusion process. It accelerates sampling by making the reverse updates over a subsequence of timesteps deterministic: it defines a non-Markovian forward process whose posterior variance is set to zero, so the timesteps between elements of the subsequence can be skipped. It shares the same training objective as DDPM, so a pre-trained diffusion model can be reused for DDIM inference. DDIM: https://arxiv.org/pdf/2010.02502.pdf   DDPM, DDIM, and CFG: https://betterprogramming.pub/diffusion-models-ddpms-ddims-and-classifier-free-guidance-e07b297b2869
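A minimal sketch of the deterministic DDIM update (eta = 0) between two timesteps `t` and `t_prev` of the chosen subsequence, with the same illustrative names as above:

```python
import torch

@torch.no_grad()
def ddim_step(x_t, t, t_prev, eps_model, alphas_cumprod):
    """Deterministic DDIM update; t_prev may skip many intermediate timesteps."""
    eps = eps_model(x_t, t)
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    # Estimate the clean image, then re-noise it to the earlier timestep
    # with zero posterior variance (hence deterministic).
    x0_pred = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)
    return torch.sqrt(a_prev) * x0_pred + torch.sqrt(1 - a_prev) * eps
```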

- How does the variational diffusion model relate to score-based generative models? What is score matching?

First, the score is the gradient of the log data probability with respect to the data point; following it moves a sample toward higher likelihood, and score-based generative models predict this score. The score has a negative relationship with the added source noise, which makes sense: moving a data point in the negative direction of the noise is denoising, and denoising increases the data likelihood. The training objective would need access to the ground-truth score function, which is not available, so the generative models approximate the objective with score matching, targeting a denoising score that matches the direction of the ground-truth score. Score function blog: https://yang-song.net/blog/2021/score/ , Understanding Diffusion Models: https://arxiv.org/pdf/2208.11970.pdf , Denoising Score Matching: https://johfischer.com/2022/09/18/denoising-score-matching/
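In diffusion notation, since x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, the conditional score is

$$\nabla_{x_t} \log q(x_t \mid x_0) = -\frac{x_t - \sqrt{\bar\alpha_t}\,x_0}{1-\bar\alpha_t} = -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}}$$

which makes the negative relationship between the score and the added noise explicit.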

From DDPM: a certain parameterization of diffusion models is equivalent to denoising score matching over multiple noise levels during training, and to annealed Langevin dynamics during sampling. (Langevin dynamics is one type of Markov Chain Monte Carlo technique.)
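For reference, a single Langevin dynamics update with step size \delta needs only the score:

$$x_{i+1} = x_i + \frac{\delta}{2}\,\nabla_{x}\log p(x_i) + \sqrt{\delta}\,z_i, \qquad z_i \sim \mathcal{N}(0, I)$$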

- Why are diffusion models better than VAEs and GANs? If not, why?

Estimating a small perturbation (or iteratively denoising toward real images) is more tractable than explicitly learning the real image distribution in a single pass. One could use a hierarchical VAE for the iterative behavior, but even then, diffusion models do not require learning an encoder, which makes the learning space simpler. Also, the diffusion process is known to capture data distributions of arbitrary form as long as the target distribution is smooth. (Maybe this sentence relates to the mode collapse of VAEs and GANs, and to diffusion models trained with multi-level Gaussian noise learning small regions of the data.) Deep Unsupervised Learning using Nonequilibrium Thermodynamics: https://arxiv.org/pdf/1503.03585.pdf  Medium: https://medium.com/@kemalpiro/step-by-step-visual-introduction-to-diffusion-models-235942d2f15c

- What is classifier guidance and what is classifier-free guidance?

Classifier guidance is used in conditional diffusion models. It steers the denoising with the gradient of a pre-trained classifier, reminiscent of a GAN's discriminator. It is known to guide generation effectively and increase image fidelity, with a trade-off against mode coverage. The disadvantage is that noisy intermediate images carry little information about the condition, so classification from such inputs is hard: off-the-shelf pre-trained classifiers cannot handle these noisy images, and training a classifier on noisy images is also difficult. There is also controversy that it amounts to an adversarial attack on classifier-based evaluation metrics.
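In score form, classifier guidance adds the scaled gradient of the (noise-aware) classifier to the unconditional score, with guidance scale \gamma:

$$\nabla_{x_t}\log p_\gamma(x_t \mid y) = \nabla_{x_t}\log p(x_t) + \gamma\,\nabla_{x_t}\log p(y \mid x_t)$$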

Classifier-free guidance reformulates classifier guidance so that no classifier is required, using Bayes' rule. Guidance: https://sander.ai/2022/05/26/guidance.html
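The Bayes-rule step: the implicit classifier gradient equals the difference between the conditional and unconditional scores, both of which a single condition-dropout model can supply:

$$\nabla_{x_t}\log p(y \mid x_t) = \nabla_{x_t}\log p(x_t \mid y) - \nabla_{x_t}\log p(x_t)$$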

- How do you train classifier guidance and classifier-free guidance?

Classifier guidance: pre-train a classifier on noisy images across the diffusion noise levels, pre-train the diffusion model, and combine the two only at sampling time through the classifier's gradient; the diffusion model itself is not trained against the classifier.

Classifier-free guidance: condition dropout; just train one model. During training, the condition is randomly replaced with a null token some fraction of the time, so a single network learns both the conditional and the unconditional noise prediction. Classifier-Free Diffusion Guidance: https://arxiv.org/pdf/2207.12598.pdf
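A minimal sketch of a training step with condition dropout; the 10% drop rate, the `null_token` placeholder embedding, and the tensor shapes (cond as a (batch, dim) embedding) are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def cfg_training_step(model, x0, cond, null_token, alphas_cumprod, p_drop=0.1):
    """One noise-prediction training step with condition dropout."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    eps = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * eps   # forward diffusion
    # Randomly replace the condition with the null token so the same
    # network also learns the unconditional noise prediction.
    drop = torch.rand(b, device=x0.device) < p_drop
    cond = torch.where(drop.unsqueeze(1), null_token, cond)
    return F.mse_loss(model(x_t, t, cond), eps)
```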

- How do you run inference with classifier guidance and classifier-free guidance?

Adjust the guidance scale gamma. With classifier guidance, add the classifier gradient scaled by gamma to the score at every denoising step. With classifier-free guidance, run the network twice per step, once with and once without the condition, and extrapolate between the two noise predictions by the guidance scale.
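A minimal sketch of the classifier-free guidance combination at one sampling step (same illustrative `model` and `null_token` as above; a `guidance_scale` above 1 strengthens the condition):

```python
import torch

@torch.no_grad()
def cfg_noise_prediction(model, x_t, t, cond, null_token, guidance_scale=7.5):
    """Extrapolate from the unconditional toward the conditional prediction."""
    eps_cond = model(x_t, t, cond)
    eps_uncond = model(x_t, t, null_token)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```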

- What happens if you make the weight of the conditional diffusion model very strong?

Sample diversity decreases; fidelity goes up (sharp textures); colors become over-saturated.

Definition of saturation: https://learn.leighcotnoir.com/artspeak/elements-color/hue-value-saturation/

- What happens if you make the weight of the conditional diffusion model very weak?

You get random pictures and blurry images.

- Can you explain score distillation?

Score distillation sampling (SDS) optimizes 3D parameters so that their rendered images align with the output of a 2D text-to-image model. It freezes the pre-trained text-to-image model and optimizes the 3D parameters with a noise-prediction loss: it adds noise to the rendered images, feeds the noisy images to the pre-trained UNet, and compares the noise prediction with the noise actually added. It then drops the UNet's Jacobian when backpropagating, which is empirically shown to be effective. DreamFusion: https://arxiv.org/pdf/2209.14988.pdf
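A minimal sketch of the SDS gradient under simplifying assumptions (the differentiable `render_fn`, the frozen UNet signature, and the omitted timestep weighting w(t) are all illustrative); detaching the residual is what drops the UNet's Jacobian:

```python
import torch

def sds_loss(render_fn, params, frozen_unet, text_emb, alphas_cumprod):
    """Score distillation: push renders toward the 2D model's distribution."""
    img = render_fn(params)                      # differentiable render of the 3D scene
    t = torch.randint(0, len(alphas_cumprod), (1,), device=img.device)
    eps = torch.randn_like(img)
    a_bar = alphas_cumprod[t]
    x_t = torch.sqrt(a_bar) * img + torch.sqrt(1 - a_bar) * eps
    eps_pred = frozen_unet(x_t, t, text_emb)
    # Detach the residual: gradients flow only through the renderer,
    # so d(loss)/d(params) = (eps_pred - eps) * d(img)/d(params).
    residual = (eps_pred - eps).detach()
    return (residual * img).sum()
```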

- What are the differences between SD v1.1, v1.5, v2.0, v2.1, XL, Turbo?

SD v1.1 was trained at 256px resolution. SD models after v1.1, including v1.5, were trained at 512px resolution. The main difference between SD v1 and v2 is the CLIP text encoder: v2 replaces the OpenAI CLIP with a CLIP pre-trained on the public LAION dataset, but v2.0 was known to rarely generate celebrity or artistic-style images. Runway: https://huggingface.co/runwayml/stable-diffusion-v1-5  Stable Diffusion 1 vs 2 - What you need to know: https://www.assemblyai.com/blog/stable-diffusion-1-vs-2-what-you-need-to-know/

SD v2.1 modifies the NSFW filter used in v2.0.

SDXL is a larger model than SD v1, using a bigger UNet with more cross-attention blocks and a second text encoder.

SDXL Turbo is based on Adversarial Diffusion Distillation, which optimizes a GAN loss together with a score distillation loss to achieve few-step generation. https://stability.ai/research/adversarial-diffusion-distillation
