Upsampling

class Upsampling[source]

Create the upsampling module. Its role is to upsample the hierarchical latent variables \(\hat{\mathbf{y}} = \{\hat{\mathbf{y}}_i \in \mathbb{Z}^{C_i \times H_i \times W_i}, i = 0, \ldots, L - 1\}\), where \(L\) is the number of latent resolutions and \(H_i = \frac{H}{2^i}\), \(W_i = \frac{W}{2^i}\), with \(W, H\) the width and height of the image.

The Upsampling transforms this hierarchical latent variable \(\hat{\mathbf{y}}\) into the dense representation \(\hat{\mathbf{z}}\) as follows:

\[\hat{\mathbf{z}} = f_{\upsilon}(\hat{\mathbf{y}}), \text{ with } \hat{\mathbf{z}} \in \mathbb{R}^{C \times H \times W} \text { and } C = \sum_i C_i.\]
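For concreteness, here is a minimal shape-bookkeeping sketch with hypothetical sizes (a 64x64 image and three single-feature latent grids):

    H, W = 64, 64
    C = [1, 1, 1]  # --n_ft_per_res=1,1,1: one feature per resolution

    # Each hierarchical latent y_i has shape (C_i, H / 2^i, W / 2^i).
    latent_shapes = [(C[i], H // 2**i, W // 2**i) for i in range(len(C))]
    print(latent_shapes)   # [(1, 64, 64), (1, 32, 32), (1, 16, 16)]

    # The dense representation z has shape (sum(C_i), H, W).
    print((sum(C), H, W))  # (3, 64, 64)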

For a toy example with 3 latent grids (--n_ft_per_res=1,1,1), the overall diagram of the upsampling is as follows.

      +---------+
y0 -> | TConv2d | -----+
      +---------+      |
                       v
      +--------+    +-----+    +---------+
y1 -> | Conv2d | -> | cat | -> | TConv2d | -----+
      +--------+    +-----+    +---------+      |
                                                v
                               +--------+    +-----+
y2 --------------------------> | Conv2d | -> | cat | -> dense
                               +--------+    +-----+

Here, y0 has the smallest resolution, y1 has twice the resolution of y0, and so on. Note that the last (full-resolution) latent is not upsampled: it is only pre-filtered and concatenated.
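To make the data flow concrete, here is a simplified functional sketch of this cascade in PyTorch. It is not the actual Cool-Chic implementation: the kernel values, the grouped-convolution trick, and the padding conventions are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def upsample_x2(x: torch.Tensor, w1d: torch.Tensor) -> torch.Tensor:
        # x2 upsampling with a separable, symmetric, even-sized kernel (the
        # TConv2d boxes above), applied independently to each channel.
        c, k = x.shape[1], w1d.numel()
        w2d = torch.outer(w1d, w1d).view(1, 1, k, k).repeat(c, 1, 1, 1)
        return F.conv_transpose2d(x, w2d, stride=2, padding=(k - 2) // 2, groups=c)

    def pre_filter(x: torch.Tensor, w1d: torch.Tensor) -> torch.Tensor:
        # Pre-concatenation filtering with a separable, symmetric, odd-sized
        # kernel (the Conv2d boxes above), with a residual connection.
        c, k = x.shape[1], w1d.numel()
        w2d = torch.outer(w1d, w1d).view(1, 1, k, k).repeat(c, 1, 1, 1)
        return x + F.conv2d(x, w2d, padding=k // 2, groups=c)

    B, H, W = 1, 32, 32
    y0 = torch.randn(B, 1, H // 4, W // 4)  # smallest resolution
    y1 = torch.randn(B, 1, H // 2, W // 2)
    y2 = torch.randn(B, 1, H, W)

    w_ups = torch.tensor([0.25, 0.75, 0.75, 0.25])      # bilinear, k = 4
    w_pre = torch.tensor([0., 0., 0., 1., 0., 0., 0.])  # Dirac, k = 7

    z = upsample_x2(y0, w_ups)                                        # (B, 1, H/2, W/2)
    z = upsample_x2(torch.cat([pre_filter(y1, w_pre), z], 1), w_ups)  # (B, 2, H, W)
    dense = torch.cat([pre_filter(y2, w_pre), z], 1)                  # (B, 3, H, W)
    print(dense.shape)  # torch.Size([1, 3, 32, 32])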

There are two different sets of filters:

  • The TConv filters perform the actual x2 upsampling. They are referred to as upsampling filters. Implemented using UpsamplingSeparableSymmetricConvTranspose2d.

  • The Conv filters pre-process the signal prior to concatenation. They are referred to as pre-concatenation filters. Implemented using UpsamplingSeparableSymmetricConv2d.

Kernel sizes for the upsampling and pre-concatenation filters are modified through the --ups_k_size and --ups_preconcat_k_size arguments.

Each upsampling filter and each pre-concatenation filter is distinct. They are all separable and symmetric.

Upsampling convolutions are initialized with a bilinear or bicubic kernel depending on the requested ups_k_size:

  • If ups_k_size >= 4 and ups_k_size < 8, a bilinear kernel (with zero padding if necessary) is used as initialization.

  • If ups_k_size >= 8, a bicubic kernel (with zero padding if necessary) is used as initialization.

Pre-concatenation convolutions are initialized with a Dirac kernel.
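Below is a sketch of what these initializations look like as 1D kernels; the exact padding and normalization conventions used by Cool-Chic are assumptions here.

    import torch
    import torch.nn.functional as F

    def bilinear_kernel(k: int) -> torch.Tensor:
        # Standard 1D bilinear taps for x2 upsampling, zero-padded to length k.
        base = torch.tensor([0.25, 0.75, 0.75, 0.25])
        pad = (k - 4) // 2
        return F.pad(base, (pad, pad))

    def dirac_kernel(k: int) -> torch.Tensor:
        # Odd-length identity (Dirac) kernel for the pre-concatenation filters.
        w = torch.zeros(k)
        w[k // 2] = 1.0
        return w

    print(bilinear_kernel(6))  # tensor([0.0000, 0.2500, 0.7500, 0.7500, 0.2500, 0.0000])
    print(dirac_kernel(7))     # tensor([0., 0., 0., 1., 0., 0., 0.])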

Warning

  • The ups_k_size must be even and at least 4.

  • The ups_preconcat_k_size must be odd.

__init__(
ups_k_size: int,
ups_preconcat_k_size: int,
n_ups_kernel: int,
n_ups_preconcat_kernel: int,
)[source]
Parameters:
  • ups_k_size (int) – Upsampling (TransposedConv) kernel size. Should be even and >= 4.

  • ups_preconcat_k_size (int) – Pre-concatenation kernel size. Should be odd.

  • n_ups_kernel (int) – Number of different upsampling kernels. Usually set to the number of latents minus 1, because the full-resolution latent is not upsampled. It can also be set to 1 to share the same kernel across all upsampling steps.

  • n_ups_preconcat_kernel (int) – Number of different pre-concatenation filters. Usually set to the number of latents minus 1, because the smallest-resolution latent is not filtered prior to concatenation. It can also be set to 1 to share the same filter across all latents.
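A hypothetical construction for three latent resolutions (assuming Upsampling has been imported from the Cool-Chic codebase):

    upsampling = Upsampling(
        ups_k_size=8,              # >= 8: bicubic initialization
        ups_preconcat_k_size=7,    # odd pre-concatenation kernel size
        n_ups_kernel=2,            # 3 latents -> 2 upsampling steps
        n_ups_preconcat_kernel=2,  # 3 latents -> 2 pre-concatenation filters
    )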

forward(decoder_side_latent: List[Tensor]) → Tensor[source]

Upsample a list of \(L\) tensors, where the i-th tensor has a shape \((B, C_i, \frac{H}{2^i}, \frac{W}{2^i})\) to obtain a dense representation \((B, \sum_i C_i, H, W)\). This dense representation is ready to be used as the synthesis input.

Parameters:

decoder_side_latent (List[Tensor]) – list of \(L\) tensors with various shapes \((B, C_i, \frac{H}{2^i}, \frac{W}{2^i})\)

Returns:

Dense representation \((B, \sum_i C_i, H, W)\).

Return type:

Tensor
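Continuing the hypothetical instance built above, a sketch of a forward call:

    import torch

    B, H, W = 1, 64, 64
    # The i-th latent has shape (B, C_i, H / 2^i, W / 2^i); here C_i = 1.
    decoder_side_latent = [torch.zeros(B, 1, H // 2**i, W // 2**i) for i in range(3)]
    dense = upsampling(decoder_side_latent)
    print(dense.shape)  # expected: torch.Size([1, 3, 64, 64])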

get_param() → OrderedDict[str, Tensor][source]

Return a copy of the weights and biases inside the module.

Returns:

A copy of all weights & biases in the layers.

Return type:

OrderedDict[str, Tensor]

set_param(param: OrderedDict[str, Tensor])[source]

Replace the current parameters of the module with param.

Parameters:

param (OrderedDict[str, Tensor]) – Parameters to be set.
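Together, get_param and set_param allow a snapshot-and-restore pattern, sketched below on the hypothetical instance from earlier:

    backup = upsampling.get_param()  # OrderedDict[str, Tensor], a copy
    # ... e.g. some training steps that modify the kernels ...
    upsampling.set_param(backup)     # roll back to the saved weights & biases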

reinitialize_parameters() → None[source]

Re-initialize in place the parameters of the upsampling.

Return type:

None

class UpsamplingSeparableSymmetricConvTranspose2d[source]

A TransposedConv2d with a separable, symmetric, even-sized kernel.

Separable means that the 2D-kernel \(\mathbf{w}_{2D}\) can be expressed as the outer product of a 1D kernel \(\mathbf{w}_{1D}\):

\[\mathbf{w}_{2D} = \mathbf{w}_{1D} \otimes \mathbf{w}_{1D}.\]

The 1D kernel \(\mathbf{w}_{1D}\) is also symmetric. That is, the 1D kernel is something like \(\mathbf{w}_{1D} = \left(a\ b\ c\ c\ b\ a \right).\)

The symmetry constraint is obtained through the module _Parameterization_Symmetric_1d. The separability constraint is obtained by applying the 1D kernel twice, once along each spatial dimension.
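A minimal sketch of this parameterization, assuming the module stores only half of the taps and mirrors them (the actual internals of _Parameterization_Symmetric_1d may differ):

    import torch

    half = torch.tensor([0.25, 0.75])      # free parameters (a, b)
    w1d = torch.cat([half, half.flip(0)])  # (a, b, b, a): symmetric by construction
    w2d = torch.outer(w1d, w1d)            # separable 2D kernel w_2D = w_1D ⊗ w_1D
    print(w1d)  # tensor([0.2500, 0.7500, 0.7500, 0.2500])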

__init__(kernel_size: int)[source]
Parameters:

kernel_size (int) – Upsampling kernel size. Must be even and >= 4.

initialize_parameters() → None[source]

Initialize the parameters of a UpsamplingSeparableSymmetricConvTranspose2d layer.

  • Biases are always set to zero.

  • Weights are initialized as a (possibly zero-padded) bilinear filter when target_k_size is 4 or 6; otherwise, a bicubic filter is used.

Return type:

None

forward(x: Tensor) → Tensor[source]

Perform the spatial upsampling (with scale 2) of an input with a single channel. Note that the upsampling filter is both symmetrical and separable. The actual implementation of the forward depends on self.training.

If we’re training, we use a non-separable implementation. That is, we first compute the 2D kernel through an outer product and then use a single 2D convolution. This is more stable.

If we’re not training, we use two successive 1D convolutions.

Parameters:

x (Tensor) – Single channel input with shape \((B, 1, H, W)\)

Returns:

Upsampled version of the input with shape \((B, 1, 2H, 2W)\)

Return type:

Tensor
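The equivalence between the two paths can be checked numerically with plain PyTorch ops (a sketch with no padding or cropping conventions applied):

    import torch
    import torch.nn.functional as F

    k = 4
    w1d = torch.tensor([0.25, 0.75, 0.75, 0.25])  # symmetric even kernel
    w2d = torch.outer(w1d, w1d)

    x = torch.randn(1, 1, 8, 8)

    # Training path: one 2D transposed convolution with the outer-product kernel.
    out_2d = F.conv_transpose2d(x, w2d.view(1, 1, k, k), stride=2)

    # Inference path: two successive 1D transposed convolutions.
    tmp = F.conv_transpose2d(x, w1d.view(1, 1, k, 1), stride=(2, 1))
    out_1d = F.conv_transpose2d(tmp, w1d.view(1, 1, 1, k), stride=(1, 2))

    print(torch.allclose(out_2d, out_1d, atol=1e-6))  # True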

class UpsamplingSeparableSymmetricConv2d[source]

A Conv2d with a separable, symmetric, odd-sized kernel.

Separable means that the 2D-kernel \(\mathbf{w}_{2D}\) can be expressed as the outer product of a 1D kernel \(\mathbf{w}_{1D}\):

\[\mathbf{w}_{2D} = \mathbf{w}_{1D} \otimes \mathbf{w}_{1D}.\]

The 1D kernel \(\mathbf{w}_{1D}\) is also symmetric. That is, the 1D kernel is something like \(\mathbf{w}_{1D} = \left(a\ b\ c\ b\ a \right).\)

The symmetry constraint is obtained through the module _Parameterization_Symmetric_1d. The separability constraint is obtained by applying the 1D kernel twice, once along each spatial dimension.

__init__(kernel_size: int)[source]
Parameters:

kernel_size (int) – Size of the kernel \(\mathbf{w}_{1D}\), e.g. 7 to obtain a symmetric, separable 7x7 filter. Must be odd.

initialize_parameters() → None[source]

Initialize the weights and the biases of the convolution layer performing the pre-concatenation filtering.

  • Biases are always set to zero.

  • Weights are set to \((0,\ 0,\ 0,\ \ldots,\ 1)\) so that, once the symmetric reparameterization is applied, a Dirac kernel \((0,\ 0,\ 0,\ \ldots,\ 1,\ \ldots,\ 0,\ 0,\ 0)\) is obtained.

Return type:

None
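A small sketch of how this initialization yields a Dirac kernel, assuming the symmetric parameterization stores the first \(\lceil k/2 \rceil\) taps and mirrors all but the center one (illustrative, not the actual internals):

    import torch

    stored = torch.tensor([0., 0., 0., 1.])         # (0, 0, 0, ..., 1)
    w1d = torch.cat([stored, stored[:-1].flip(0)])  # mirror all but the last tap
    print(w1d)  # tensor([0., 0., 0., 1., 0., 0., 0.]) -> Dirac kernel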

forward(x: Tensor) → Tensor[source]

Perform a “normal” 2D convolution, except that the underlying kernel is both separable & symmetrical. The actual implementation of the forward depends on self.training.

If we’re training, we use a non-separable implementation. That is, we first compute the 2D kernel through an outer product and then use a single 2D convolution. This is more stable.

If we’re not training, we use two successive 1D convolutions.

Warning

There is a residual connection in the forward.
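For clarity, a minimal sketch of what this residual forward can look like (the exact arithmetic in Cool-Chic may differ):

    import torch
    import torch.nn.functional as F

    def forward_sketch(x: torch.Tensor, w2d: torch.Tensor) -> torch.Tensor:
        # Residual connection: output = input + filtered input.
        k = w2d.shape[-1]
        return x + F.conv2d(x, w2d.view(1, 1, k, k), padding=k // 2)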

Parameters:

x (Tensor) – Tensor to be filtered, with shape \((B, 1, H, W)\). It must have a single channel.

Returns:

Filtered tensor with shape \((B, 1, H, W)\).

Return type:

Tensor