Upsampling

class Upsampling[source]

Create the upsampling module. Its role is to upsample the hierarchical latent variables \(\hat{\mathbf{y}} = \{\hat{\mathbf{y}}_i \in \mathbb{Z}^{C_i \times H_i \times W_i}, i = 0, \ldots, L - 1\}\), where \(L\) is the number of latent resolutions and \(H_i = \frac{H}{2^i}\), \(W_i = \frac{W}{2^i}\), with \(W, H\) the width and height of the image.

The Upsampling transforms this hierarchical latent variable \(\hat{\mathbf{y}}\) into the dense representation \(\hat{\mathbf{z}}\) as follows:

\[\hat{\mathbf{z}} = f_{\upsilon}(\hat{\mathbf{y}}), \text{ with } \hat{\mathbf{z}} \in \mathbb{R}^{C \times H \times W} \text { and } C = \sum_i C_i.\]
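For concreteness, here is a minimal shape-bookkeeping sketch with hypothetical sizes (a 64x64 image and three single-feature latent grids):

    H, W = 64, 64
    C = [1, 1, 1]  # --n_ft_per_res=1,1,1: one feature per resolution

    # Each hierarchical latent y_i has shape (C_i, H / 2^i, W / 2^i).
    latent_shapes = [(C[i], H // 2**i, W // 2**i) for i in range(len(C))]
    print(latent_shapes)   # [(1, 64, 64), (1, 32, 32), (1, 16, 16)]

    # The dense representation z has shape (sum(C_i), H, W).
    print((sum(C), H, W))  # (3, 64, 64)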

For a toy example with 3 latent grids (--n_ft_per_res=1,1,1), the overall diagram of the upsampling is as follows.

      +---------+
y0 -> | TConv2d | -----+
      +---------+      |
                       v
      +--------+    +-----+    +---------+
y1 -> | Conv2d | -> | cat | -> | TConv2d | -----+
      +--------+    +-----+    +---------+      |
                                                v
                               +--------+    +-----+
y2 --------------------------> | Conv2d | -> | cat | -> dense
                               +--------+    +-----+

Here, y0 has the smallest resolution, y1 has twice the resolution of y0, and so on. Note that the last (full-resolution) latent is not upsampled: it is only pre-filtered and concatenated.
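To make the data flow concrete, here is a simplified functional sketch of this cascade in PyTorch. It is not the actual Cool-Chic implementation: the kernel values, the grouped-convolution trick, and the padding conventions are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def upsample_x2(x: torch.Tensor, w1d: torch.Tensor) -> torch.Tensor:
        # x2 upsampling with a separable, symmetric, even-sized kernel (the
        # TConv2d boxes above), applied independently to each channel.
        c, k = x.shape[1], w1d.numel()
        w2d = torch.outer(w1d, w1d).view(1, 1, k, k).repeat(c, 1, 1, 1)
        return F.conv_transpose2d(x, w2d, stride=2, padding=(k - 2) // 2, groups=c)

    def pre_filter(x: torch.Tensor, w1d: torch.Tensor) -> torch.Tensor:
        # Pre-concatenation filtering with a separable, symmetric, odd-sized
        # kernel (the Conv2d boxes above), with a residual connection.
        c, k = x.shape[1], w1d.numel()
        w2d = torch.outer(w1d, w1d).view(1, 1, k, k).repeat(c, 1, 1, 1)
        return x + F.conv2d(x, w2d, padding=k // 2, groups=c)

    B, H, W = 1, 32, 32
    y0 = torch.randn(B, 1, H // 4, W // 4)  # smallest resolution
    y1 = torch.randn(B, 1, H // 2, W // 2)
    y2 = torch.randn(B, 1, H, W)

    w_ups = torch.tensor([0.25, 0.75, 0.75, 0.25])      # bilinear, k = 4
    w_pre = torch.tensor([0., 0., 0., 1., 0., 0., 0.])  # Dirac, k = 7

    z = upsample_x2(y0, w_ups)                                        # (B, 1, H/2, W/2)
    z = upsample_x2(torch.cat([pre_filter(y1, w_pre), z], 1), w_ups)  # (B, 2, H, W)
    dense = torch.cat([pre_filter(y2, w_pre), z], 1)                  # (B, 3, H, W)
    print(dense.shape)  # torch.Size([1, 3, 32, 32])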

There are two different sets of filters:

  • The TConv filters perform the actual x2 upsampling. They are referred to as upsampling filters. Implemented using UpsamplingSeparableSymmetricConvTranspose2d.

  • The Conv filters pre-process the signal prior to concatenation. They are referred to as pre-concatenation filters. Implemented using UpsamplingSeparableSymmetricConv2d.

Kernel sizes for the upsampling and pre-concatenation filters are modified through the --ups_k_size and --ups_preconcat_k_size arguments.

Each upsampling filter and each pre-concatenation filter is distinct. They are all separable and symmetric.

Upsampling convolutions are initialized with a bilinear or bicubic kernel depending on the requested ups_k_size:

  • If ups_k_size >= 4 and ups_k_size < 8, a bilinear kernel (with zero padding if necessary) is used as initialization.

  • If ups_k_size >= 8, a bicubic kernel (with zero padding if necessary) is used as initialization.

Pre-concatenation convolutions are initialized with a Dirac kernel.
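Below is a sketch of what these initializations look like as 1D kernels; the exact padding and normalization conventions used by Cool-Chic are assumptions here.

    import torch
    import torch.nn.functional as F

    def bilinear_kernel(k: int) -> torch.Tensor:
        # Standard 1D bilinear taps for x2 upsampling, zero-padded to length k.
        base = torch.tensor([0.25, 0.75, 0.75, 0.25])
        pad = (k - 4) // 2
        return F.pad(base, (pad, pad))

    def dirac_kernel(k: int) -> torch.Tensor:
        # Odd-length identity (Dirac) kernel for the pre-concatenation filters.
        w = torch.zeros(k)
        w[k // 2] = 1.0
        return w

    print(bilinear_kernel(6))  # tensor([0.0000, 0.2500, 0.7500, 0.7500, 0.2500, 0.0000])
    print(dirac_kernel(7))     # tensor([0., 0., 0., 1., 0., 0., 0.])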

Warning

  • The ups_k_size must be even and at least 4.

  • The ups_preconcat_k_size must be odd.

__init__(
ups_k_size: int,
ups_preconcat_k_size: int,
n_ups_kernel: int,
n_ups_preconcat_kernel: int,
)[source]
Parameters:
  • ups_k_size (int) – Upsampling (TransposedConv) kernel size. Should be even and >= 4.

  • ups_preconcat_k_size (int) – Pre-concatenation kernel size. Should be odd.

  • n_ups_kernel (int) – Number of different upsampling kernels. Usually set to the number of latents minus 1, because the full-resolution latent is not upsampled. It can also be set to 1 to share the same kernel across all upsampling steps.

  • n_ups_preconcat_kernel (int) – Number of different pre-concatenation filters. Usually set to the number of latents minus 1, because the smallest-resolution latent is not filtered prior to concatenation. It can also be set to 1 to share the same filter across all latents.
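A hypothetical construction for three latent resolutions (assuming Upsampling has been imported from the Cool-Chic codebase):

    upsampling = Upsampling(
        ups_k_size=8,              # >= 8: bicubic initialization
        ups_preconcat_k_size=7,    # odd pre-concatenation kernel size
        n_ups_kernel=2,            # 3 latents -> 2 upsampling steps
        n_ups_preconcat_kernel=2,  # 3 latents -> 2 pre-concatenation filters
    )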

forward(decoder_side_latent: List[Tensor]) → Tensor[source]

Upsample a list of \(L\) tensors, where the i-th tensor has a shape \((B, C_i, \frac{H}{2^i}, \frac{W}{2^i})\) to obtain a dense representation \((B, \sum_i C_i, H, W)\). This dense representation is ready to be used as the synthesis input.

Parameters:

decoder_side_latent (List[Tensor]) – list of \(L\) tensors with various shapes \((B, C_i, \frac{H}{2^i}, \frac{W}{2^i})\)

Returns:

Dense representation \((B, \sum_i C_i, H, W)\).

Return type:

Tensor
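Continuing the hypothetical instance built above, a sketch of a forward call:

    import torch

    B, H, W = 1, 64, 64
    # The i-th latent has shape (B, C_i, H / 2^i, W / 2^i); here C_i = 1.
    decoder_side_latent = [torch.zeros(B, 1, H // 2**i, W // 2**i) for i in range(3)]
    dense = upsampling(decoder_side_latent)
    print(dense.shape)  # expected: torch.Size([1, 3, 64, 64])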

get_param() → OrderedDict[str, Tensor][source]

Return a copy of the weights and biases inside the module.

Returns:

A copy of all weights & biases in the layers.

Return type:

OrderedDict[str, Tensor]

set_param(param: OrderedDict[str, Tensor])[source]

Replace the current parameters of the module with param.

Parameters:

param (OrderedDict[str, Tensor]) – Parameters to be set.
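Together, get_param and set_param allow a snapshot-and-restore pattern, sketched below on the hypothetical instance from earlier:

    backup = upsampling.get_param()  # OrderedDict[str, Tensor], a copy
    # ... e.g. some training steps that modify the kernels ...
    upsampling.set_param(backup)     # roll back to the saved weights & biases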

reinitialize_parameters() → None[source]

Re-initialize in place the parameters of the upsampling.

Return type:

None

class UpsamplingSeparableSymmetricConvTranspose2d[source]

A TransposedConv2d with a separable, symmetric, even-sized kernel.

Separable means that the 2D-kernel \(\mathbf{w}_{2D}\) can be expressed as the outer product of a 1D kernel \(\mathbf{w}_{1D}\):

\[\mathbf{w}_{2D} = \mathbf{w}_{1D} \otimes \mathbf{w}_{1D}.\]

The 1D kernel \(\mathbf{w}_{1D}\) is also symmetric. That is, the 1D kernel is something like \(\mathbf{w}_{1D} = \left(a\ b\ c\ c\ b\ a \right).\)

The symmetry constraint is obtained through the module _Parameterization_Symmetric_1d. The separability constraint is obtained by applying the 1D kernel twice, once along each spatial dimension.
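A minimal sketch of this parameterization, assuming the module stores only half of the taps and mirrors them (the actual internals of _Parameterization_Symmetric_1d may differ):

    import torch

    half = torch.tensor([0.25, 0.75])      # free parameters (a, b)
    w1d = torch.cat([half, half.flip(0)])  # (a, b, b, a): symmetric by construction
    w2d = torch.outer(w1d, w1d)            # separable 2D kernel w_2D = w_1D ⊗ w_1D
    print(w1d)  # tensor([0.2500, 0.7500, 0.7500, 0.2500])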

__init__(kernel_size: int)[source]
Parameters:

kernel_size (int) – Upsampling kernel size. Must be even and >= 4.

initialize_parameters() → None[source]

Initialize the parameters of a UpsamplingSeparableSymmetricConvTranspose2d layer.

  • Biases are always set to zero.

  • Weights are initialized as a (possibly zero-padded) bilinear filter when target_k_size is 4 or 6; otherwise, a bicubic filter is used.

Return type:

None

forward(x: Tensor) → Tensor[source]

Perform the spatial upsampling (with scale 2) of an input with a single channel. Note that the upsampling filter is both symmetrical and separable. The actual implementation of the forward depends on self.training.

If we’re training, we use a non-separable implementation. That is, we first compute the 2D kernel through an outer product and then use a single 2D convolution. This is more stable.

If we’re not training, we use two successive 1D convolutions.

Parameters:

x (Tensor) – Single channel input with shape \((B, 1, H, W)\)

Returns:

Upsampled version of the input with shape \((B, 1, 2H, 2W)\)

Return type:

Tensor
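The equivalence between the two paths can be checked numerically with plain PyTorch ops (a sketch with no padding or cropping conventions applied):

    import torch
    import torch.nn.functional as F

    k = 4
    w1d = torch.tensor([0.25, 0.75, 0.75, 0.25])  # symmetric even kernel
    w2d = torch.outer(w1d, w1d)

    x = torch.randn(1, 1, 8, 8)

    # Training path: one 2D transposed convolution with the outer-product kernel.
    out_2d = F.conv_transpose2d(x, w2d.view(1, 1, k, k), stride=2)

    # Inference path: two successive 1D transposed convolutions.
    tmp = F.conv_transpose2d(x, w1d.view(1, 1, k, 1), stride=(2, 1))
    out_1d = F.conv_transpose2d(tmp, w1d.view(1, 1, 1, k), stride=(1, 2))

    print(torch.allclose(out_2d, out_1d, atol=1e-6))  # True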

class UpsamplingSeparableSymmetricConv2d[source]

A Conv2d with a separable, symmetric, odd-sized kernel.

Separable means that the 2D-kernel \(\mathbf{w}_{2D}\) can be expressed as the outer product of a 1D kernel \(\mathbf{w}_{1D}\):

\[\mathbf{w}_{2D} = \mathbf{w}_{1D} \otimes \mathbf{w}_{1D}.\]

The 1D kernel \(\mathbf{w}_{1D}\) is also symmetric. That is, the 1D kernel is something like \(\mathbf{w}_{1D} = \left(a\ b\ c\ b\ a \right).\)

The symmetry constraint is obtained through the module _Parameterization_Symmetric_1d. The separability constraint is obtained by applying the 1D kernel twice, once along each spatial dimension.

__init__(kernel_size: int)[source]
Parameters:

kernel_size (int) – Size of the kernel \(\mathbf{w}_{1D}\), e.g. 7 to obtain a symmetric, separable 7x7 filter. Must be odd.

initialize_parameters() → None[source]

Initialize the weights and the biases of the convolution layer performing the pre-concatenation filtering.

  • Biases are always set to zero.

  • Weights are set to \((0,\ 0,\ 0,\ \ldots,\ 1)\) so that, once the symmetric reparameterization is applied, a Dirac kernel \((0,\ 0,\ 0,\ \ldots,\ 1,\ \ldots,\ 0,\ 0,\ 0)\) is obtained.

Return type:

None
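A small sketch of how this initialization yields a Dirac kernel, assuming the symmetric parameterization stores the first \(\lceil k/2 \rceil\) taps and mirrors all but the center one (illustrative, not the actual internals):

    import torch

    stored = torch.tensor([0., 0., 0., 1.])         # (0, 0, 0, ..., 1)
    w1d = torch.cat([stored, stored[:-1].flip(0)])  # mirror all but the last tap
    print(w1d)  # tensor([0., 0., 0., 1., 0., 0., 0.]) -> Dirac kernel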

forward(x: Tensor) → Tensor[source]

Perform a “normal” 2D convolution, except that the underlying kernel is both separable & symmetrical. The actual implementation of the forward depends on self.training.

If we’re training, we use a non-separable implementation. That is, we first compute the 2D kernel through an outer product and then use a single 2D convolution. This is more stable.

If we’re not training, we use two successive 1D convolutions.

Warning

There is a residual connection in the forward.
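For clarity, a minimal sketch of what this residual forward can look like (the exact arithmetic in Cool-Chic may differ):

    import torch
    import torch.nn.functional as F

    def forward_sketch(x: torch.Tensor, w2d: torch.Tensor) -> torch.Tensor:
        # Residual connection: output = input + filtered input.
        k = w2d.shape[-1]
        return x + F.conv2d(x, w2d.view(1, 1, k, k), padding=k // 2)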

Parameters:

x (Tensor) – Tensor to be filtered, with shape \((B, 1, H, W)\). It must have a single channel.

Returns:

Filtered tensor with shape \((B, 1, H, W)\).

Return type:

Tensor