Problems with the Loss Function Used #103

Open
AlonzoLeeeooo opened this issue Aug 9, 2024 · 1 comment

Comments


AlonzoLeeeooo commented Aug 9, 2024

Hi @XavierCHEN34 ,

Thanks for your great work! I have been reading the published paper manuscript as well as the code implementation, and I ran into a question about the loss function that is used. It would be highly appreciated if you could explain how this works.

Here is the problem. In the paper manuscript, specifically in Eq. (2), the overall training objective of AnyDoor is an MSE loss between the U-Net output and the ground-truth image latents, as shown in the following figure:
[screenshot of Eq. (2) from the paper manuscript]
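
For reference, this is how I read Eq. (2) in my own notation (not necessarily the paper's exact symbols), assuming $z_0$ denotes the ground-truth image latents, $z_t$ the noised latents, $c$ the conditions, and $\hat{z}_\theta$ the U-Net output:

$$\mathcal{L} = \mathbb{E}_{z_0,\, c,\, t}\left[\, \lVert \hat{z}_\theta(z_t, c, t) - z_0 \rVert_2^2 \,\right]$$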

In the code implementation, however, the regression target is controlled by self.parameterization, which is set to "eps" by default and is not overridden in the configuration file (configs/anydoor.yaml).

Therefore, in the get_loss() and p_losses() functions of ldm/models/diffusion/ddpm.py (lines 367 to 411), we can see:

    def get_loss(self, pred, target, mean=True):
        if self.loss_type == 'l1':
            loss = (target - pred).abs()
            if mean:
                loss = loss.mean()
        elif self.loss_type == 'l2':
            if mean:
                loss = torch.nn.functional.mse_loss(target, pred)
            else:
                loss = torch.nn.functional.mse_loss(target, pred, reduction='none')
        else:
            raise NotImplementedError("unknown loss type '{loss_type}'")

        return loss

    def p_losses(self, x_start, t, noise=None):
        noise = default(noise, lambda: torch.randn_like(x_start))
        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
        model_out = self.model(x_noisy, t)

        loss_dict = {}
        if self.parameterization == "eps":
            target = noise
        elif self.parameterization == "x0":
            target = x_start
        elif self.parameterization == "v":
            target = self.get_v(x_start, noise, t)
        else:
            raise NotImplementedError(f"Parameterization {self.parameterization} not yet supported")

        loss = self.get_loss(model_out, target, mean=False).mean(dim=[1, 2, 3])

        log_prefix = 'train' if self.training else 'val'

        loss_dict.update({f'{log_prefix}/loss_simple': loss.mean()})
        loss_simple = loss.mean() * self.l_simple_weight

        loss_vlb = (self.lvlb_weights[t] * loss).mean()
        loss_dict.update({f'{log_prefix}/loss_vlb': loss_vlb})

        loss = loss_simple + self.original_elbo_weight * loss_vlb

        loss_dict.update({f'{log_prefix}/loss': loss})

        return loss, loss_dict

If self.parameterization == "eps", target becomes the sampled Gaussian noise, so the loss is the MSE between the U-Net output and random Gaussian noise. This conflicts with the objective shown in the paper manuscript.
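
To make the point concrete, here is a stripped-down toy sketch of how that branch picks the regression target (my own simplification for illustration, not the repo's code; shapes and names are made up):

    import torch

    def toy_target(x_start, noise, parameterization):
        # Mirrors the branch in p_losses(): the target depends only on the
        # parameterization flag.
        if parameterization == "eps":
            return noise     # U-Net is regressed onto the sampled noise
        elif parameterization == "x0":
            return x_start   # U-Net is regressed onto the clean latents
        raise NotImplementedError(parameterization)

    x_start = torch.randn(1, 4, 64, 64)  # stand-in for ground-truth latents
    noise = torch.randn_like(x_start)
    assert toy_target(x_start, noise, "eps") is noise
    assert toy_target(x_start, noise, "x0") is x_start

With the default "eps", the ground-truth latents only enter the loss indirectly through x_noisy, never as the regression target.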

According to Eq. (2) in the paper manuscript, I suppose that self.parameterization should be set to "x0", so that target becomes x_start and the code implementation aligns with the formula. Is my understanding correct? Please enlighten me if I have gotten anything wrong. Looking forward to your reply.
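
If my reading is right, the change would be something like the following (a hypothetical sketch, assuming the model is built from the YAML config via OmegaConf and the usual latent-diffusion instantiate_from_config helper; the actual entry point and config layout in this repo may differ):

    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config

    config = OmegaConf.load("configs/anydoor.yaml")
    # Override the default "eps" so that target = x_start in p_losses()
    config.model.params.parameterization = "x0"
    model = instantiate_from_config(config.model)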

Best regards

@mao-code

Same question here
