
Understanding convolution kernels in dilation layers #392

Open
redwrasse opened this issue Feb 10, 2020 · 4 comments

Comments


redwrasse commented Feb 10, 2020

Hi @ibab,
I'm a bit late to the wavenet paper implementation party, but I'm reading the paper and your code and trying to understand where dilated convolution kernels are present. Your ASCII diagram shows


               |-> [gate]   -|        |-> 1x1 conv -> skip output
               |             |-> (*) -|
        input -|-> [filter] -|        |-> 1x1 conv -|
               |                                    |-> (+) -> dense output
               |------------------------------------|

        Where `[gate]` and `[filter]` are causal convolutions with a
        non-linear activation at the output. Biases and global conditioning
        are omitted due to the limits of ASCII art.

The Wavenet paper diagram shows a single 'Dilated Conv' fed into both tanh and sigmoid functions.
From your ASCII diagram and code (which agree), it seems there is in fact not one dilated convolution but two: one for the tanh (defining the 'filter') and one for the sigmoid (defining the 'gate'). Is this correct, and is this what was actually intended in the WaveNet paper?
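For concreteness, the two-convolution reading of the diagram can be sketched as follows. This is a minimal NumPy illustration, not the ibab code itself; the `causal_dilated_conv` helper and all shapes here are illustrative assumptions (filter width 2, as in the paper):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    # x: (channels_in, T), w: (channels_out, channels_in, 2)
    # Causal with filter width 2: output[t] mixes x[t - dilation] and x[t].
    pad = np.pad(x, ((0, 0), (dilation, 0)))  # left-pad so no future leaks in
    T = x.shape[1]
    return (np.einsum('oi,it->ot', w[:, :, 0], pad[:, :T])
            + np.einsum('oi,it->ot', w[:, :, 1], pad[:, dilation:dilation + T]))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
residual_channels, dilation_channels, T, dilation = 16, 32, 64, 4
x = rng.standard_normal((residual_channels, T))

# Two *separate* weight tensors, i.e. two distinct dilated convolutions:
w_filter = rng.standard_normal((dilation_channels, residual_channels, 2)) * 0.1
w_gate = rng.standard_normal((dilation_channels, residual_channels, 2)) * 0.1

# Gated activation unit: tanh(filter conv) * sigmoid(gate conv)
z = (np.tanh(causal_dilated_conv(x, w_filter, dilation))
     * sigmoid(causal_dilated_conv(x, w_gate, dilation)))
```

Since tanh lies in (-1, 1) and sigmoid in (0, 1), the gated output `z` is bounded in (-1, 1), which is one motivation for this activation.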

Additionally, could you give a justification for the parameter choices not mentioned in the paper?

  '''Implements the WaveNet network for generative audio.

    Usage (with the architecture as in the DeepMind paper):
        dilations = [2**i for i in range(N)] * M
        filter_width = 2  # Convolutions just use 2 samples.
        residual_channels = 16  # Not specified in the paper.
        dilation_channels = 32  # Not specified in the paper.
        skip_channels = 16      # Not specified in the paper.
        net = WaveNetModel(batch_size, dilations, filter_width,
                           residual_channels, dilation_channels,
                           skip_channels)
        loss = net.loss(input_batch)
    '''

Thanks in advance.

redwrasse (Author) commented

Answering this for myself after looking through the literature: yes, there are in fact two distinct dilated convolutions passed to the 'gated activation unit'; the original WaveNet paper diagram appears misleading on this point.


cheind commented Dec 5, 2021

@redwrasse, I agree that the original paper misses some details here and there. Take a look at (Gated) PixelCNN by WaveNet's main author (https://arxiv.org/pdf/1606.05328.pdf) and you will find that the gated activation is carried over from there. Also, it seems they stacked the filter and gate along the output-channel dimension to spare a conv1d.

For the latter, have a look here:
https://github.com/cheind/autoregressive/blob/e1f9b72b0f9764f9b4d6b6f65f028cd50db6940e/autoregressive/wave.py#L63
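The channel-stacking trick mentioned above can be shown in a few lines of NumPy. This is an illustrative sketch only (a 1x1 conv, i.e. a plain matmul, stands in for the dilated conv to keep it short; names and shapes are assumptions): one weight matrix with `2 * C_dil` output channels computes filter and gate in a single pass, and the result is split along the channel dimension.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
C_in, C_dil, T = 16, 32, 64
x = rng.standard_normal((C_in, T))

# Single weight matrix with doubled output channels: one matmul
# instead of two separate convolutions.
w = rng.standard_normal((2 * C_dil, C_in)) * 0.1
both = w @ x                                # (2*C_dil, T)
filt, gate = both[:C_dil], both[C_dil:]     # split along channels
z = np.tanh(filt) * sigmoid(gate)
```

Mathematically this is identical to running two separate convolutions with the two halves of `w`; stacking just fuses them into one kernel launch / op.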

redwrasse (Author) commented

Thanks @cheind, I'll take a look. This is a side project I'd like to get back into.


cheind commented Dec 7, 2021

@redwrasse, same for me :) I just figured out that it works nicely on 2D images as well (without the special architecture of PixelCNN, just plain WaveNet on unrolled images). In addition, once you have the joint distribution the model estimates, you can start to query all kinds of things from the model (e.g., given a WaveNet conditioned on speaker id, what is the probability that this speech was spoken by speaker X).

In case you are interested, I have a fairly elaborate presentation plus code here:
https://github.com/cheind/autoregressive/tree/image-support

The branch will be closed soon and merged to main, so I'll leave a permalink:
https://github.com/cheind/autoregressive/tree/23701bd503843a1de82c6a32ba5bd6e8ad6965a3
