Upsampling¶

class Upsampling[source]¶

Create the upsampling module, whose role is to upsample the hierarchical latent variables \(\hat{\mathbf{y}} = \{\hat{\mathbf{y}}_i \in \mathbb{Z}^{C_i \times H_i \times W_i}, i = 0, \ldots, L - 1\}\), where \(L\) is the number of latent resolutions and \(H_i = \frac{H}{2^i}\), \(W_i = \frac{W}{2^i}\), with \(W, H\) the width and height of the image.

The Upsampling module transforms these hierarchical latent variables \(\hat{\mathbf{y}}\) into the dense representation \(\hat{\mathbf{z}}\) as follows:

\[\hat{\mathbf{z}} = f_{\upsilon}(\hat{\mathbf{y}}), \text{ with } \hat{\mathbf{z}} \in \mathbb{R}^{C \times H \times W} \text { and } C = \sum_i C_i.\]
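
The shape bookkeeping implied by the definitions above can be sketched in plain Python. The helper names latent_shapes and dense_shape are hypothetical and not part of the module, and the sketch assumes that \(H\) and \(W\) are divisible by \(2^{L-1}\).

```python
def latent_shapes(height, width, channels_per_level):
    """Return the (C_i, H_i, W_i) shape of each latent resolution.

    Hypothetical helper: assumes height and width are divisible by
    2 ** (L - 1), where L = len(channels_per_level).
    """
    return [
        (c_i, height // 2 ** i, width // 2 ** i)
        for i, c_i in enumerate(channels_per_level)
    ]


def dense_shape(height, width, channels_per_level):
    """Return the (C, H, W) shape of the dense representation, C = sum_i C_i."""
    return (sum(channels_per_level), height, width)


# Example: three resolutions with one channel each, for a 64 x 96 image.
shapes = latent_shapes(64, 96, [1, 1, 1])
dense = dense_shape(64, 96, [1, 1, 1])
```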

The upsampling relies on a single custom transposed convolution, UpsamplingConvTranspose2d, which performs a 2x upsampling of a 1-channel input. This transposed convolution is applied repeatedly to upsample each channel of each resolution until it reaches the required \(H \times W\) dimensions.
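
Since resolution \(i\) has size \(\frac{H}{2^i} \times \frac{W}{2^i}\), it needs \(i\) successive 2x passes to reach \(H \times W\). The loop below is a minimal sketch of that idea on a single channel stored as a nested list. The function upsample_2x_nearest is a hypothetical stand-in that uses nearest-neighbour repetition, not the transposed convolution of the module.

```python
def upsample_2x_nearest(plane):
    """Hypothetical stand-in for one 2x pass: repeat every sample twice
    along both axes (the real module uses a transposed convolution)."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def upsample_to_full_resolution(plane, level):
    """Apply the 2x upsampling `level` times, so that a plane of size
    (H / 2**level, W / 2**level) ends up with size (H, W)."""
    for _ in range(level):
        plane = upsample_2x_nearest(plane)
    return plane


# A 2 x 2 plane at resolution level 2 becomes an 8 x 8 plane.
full = upsample_to_full_resolution([[1, 2], [3, 4]], level=2)
```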

The kernel of UpsamplingConvTranspose2d is either learned or kept fixed, depending on the value of the flag static_upsampling_kernel. In either case, the kernel is initialized from a well-known bilinear or bicubic kernel, depending on the requested upsampling_kernel_size:

  • If upsampling_kernel_size >= 4 and upsampling_kernel_size < 8, a bilinear kernel (with zero padding if necessary) is used as initialization.

  • If upsampling_kernel_size >= 8, a bicubic kernel (with zero padding if necessary) is used as initialization.

Warning

The upsampling_kernel_size must be at least 4 and a multiple of 2.
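
The selection rule and the constraint above can be summarized by a small sketch. The function name initial_kernel_family is hypothetical, and raising a ValueError on invalid sizes is an assumption about how such a check could look, not a description of what the module does.

```python
def initial_kernel_family(upsampling_kernel_size):
    """Return which well-known kernel seeds the initialization.

    Hypothetical helper mirroring the rule stated in the text:
    sizes 4 and 6 use a bilinear kernel, sizes 8 and above a bicubic one.
    """
    if upsampling_kernel_size < 4 or upsampling_kernel_size % 2 != 0:
        # Assumed behaviour: reject sizes the documentation forbids.
        raise ValueError(
            "upsampling_kernel_size must be at least 4 and a multiple of 2, "
            f"got {upsampling_kernel_size}"
        )
    return "bilinear" if upsampling_kernel_size < 8 else "bicubic"
```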

__init__(upsampling_kernel_size: int, static_upsampling_kernel: bool)[source]¶
Parameters:
  • upsampling_kernel_size (int) – Upsampling kernel size. Must be greater than or equal to 4 and a multiple of two.

  • static_upsampling_kernel (bool) – If true, don’t learn the upsampling kernel.

forward(decoder_side_latent: List[Tensor]) → Tensor[source]¶

Upsample a list of \(L\) tensors, where the \(i\)-th tensor has shape \((B, C_i, \frac{H}{2^i}, \frac{W}{2^i})\), to obtain a dense representation of shape \((B, \sum_i C_i, H, W)\). This dense representation is ready to be used as the synthesis input.

Parameters:

decoder_side_latent (List[Tensor]) – list of \(L\) tensors with various shapes \((B, C_i, \frac{H}{2^i}, \frac{W}{2^i})\)

Returns:

Dense representation \((B, \sum_i C_i, H, W)\).

Return type:

Tensor

get_param() → OrderedDict[str, Tensor][source]¶

Return a copy of the weights and biases inside the module.

Returns:

A copy of all weights & biases in the layers.

Return type:

OrderedDict[str, Tensor]

set_param(param: OrderedDict[str, Tensor])[source]¶

Replace the current parameters of the module with param.

Parameters:

param (OrderedDict[str, Tensor]) – Parameters to be set.

reinitialize_parameters() → None[source]¶

Re-initialize the parameters of the upsampling module in place.

Return type:

None

class UpsamplingConvTranspose2d[source]¶

Wrapper around the usual nn.ConvTranspose2d layer. It performs a 2x upsampling of a latent variable with a single input and output channel. The kernel is either learned or kept fixed, depending on the flag static_upsampling_kernel. Its initialization depends on the requested kernel size: if the kernel size is 4 or 6, a bilinear kernel is used, with zero padding if necessary; if the kernel size is 8 or bigger, a bicubic kernel is used, also with zero padding if necessary.

__init__(upsampling_kernel_size: int, static_upsampling_kernel: bool)[source]¶
Parameters:
  • upsampling_kernel_size (int) – Upsampling kernel size. Must be greater than or equal to 4 and a multiple of two.

  • static_upsampling_kernel (bool) – If true, don’t learn the upsampling kernel.

initialize_parameters() → None[source]¶

Initialize, in place, the weights and biases of the transposed convolution layer performing the upsampling.

  • Biases are always set to zero.

  • Weights are set to a (padded) bicubic kernel if the kernel size is at least 8. If the kernel size is 4 or 6, weights are set to a (padded) bilinear kernel.

Return type:

None
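
As an illustration of the bilinear case, the sketch below builds a zero-padded 2D bilinear kernel in plain Python. It assumes the common 4-tap separable kernel for 2x transposed-convolution upsampling, (0.25, 0.75, 0.75, 0.25). The exact values used by the module are not stated here, so both these coefficients and the helper name padded_bilinear_kernel should be read as assumptions. The bicubic case is not covered.

```python
def padded_bilinear_kernel(kernel_size):
    """Build a (kernel_size x kernel_size) bilinear kernel, zero padded.

    Hypothetical helper covering only the bilinear case (sizes 4 and 6).
    The 1D coefficients are the usual 2x bilinear taps, an assumption
    about what the module uses.
    """
    if kernel_size not in (4, 6):
        raise ValueError("this sketch only covers kernel sizes 4 and 6")

    core = [0.25, 0.75, 0.75, 0.25]
    pad = (kernel_size - len(core)) // 2  # zeros added on each side
    taps = [0.0] * pad + core + [0.0] * pad

    # The 2D kernel is the outer product of the 1D taps with themselves.
    return [[a * b for b in taps] for a in taps]


kernel = padded_bilinear_kernel(6)
bias = 0.0  # biases are always set to zero
```

With stride 2, each output sample receives one quarter of the kernel weights, so the 2D weights sum to 4 and a constant input stays constant away from the borders.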

forward(x: Tensor) → Tensor[source]¶

Perform the spatial upsampling (with scale 2) of an input with a single channel.

Parameters:

x (Tensor) – Single channel input with shape \((B, 1, H, W)\)

Returns:

Upsampled version of the input with shape \((B, 1, 2H, 2W)\)

Return type:

Tensor
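
The \((B, 1, H, W) \rightarrow (B, 1, 2H, 2W)\) shape relation can be checked against the output-size formula documented for nn.ConvTranspose2d. The sketch below evaluates that formula in plain Python with stride 2. The padding choice \((k - 2) / 2\) is an assumption that yields exactly twice the input size; the module itself may handle borders differently, for example by padding the input and cropping the output. Both function names are hypothetical.

```python
def conv_transpose_out_size(size_in, kernel_size, stride=2, padding=0,
                            output_padding=0, dilation=1):
    """Output size of a transposed convolution along one spatial axis,
    following the formula given in the nn.ConvTranspose2d documentation."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def padding_for_exact_2x(kernel_size):
    """Assumed padding giving size_out == 2 * size_in when stride is 2
    and the kernel size is even."""
    return (kernel_size - 2) // 2


# For every allowed kernel size, a 16-sample axis is upsampled to 32 samples.
sizes = {
    k: conv_transpose_out_size(16, k, padding=padding_for_exact_2x(k))
    for k in (4, 6, 8)
}
```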