Auto-Regressive Module (ARM)

class Arm

Autoregressive probability module, modelling the conditional distribution \(p_{\psi}(\hat{y}_i \mid \mathbf{s}_i, \mathbf{f}_i)\) of a (quantized) latent pixel \(\hat{y}_i\), conditioned on neighboring, already decoded context pixels. These context pixels are either causal spatial neighbors \(\mathbf{s}_i\), extracted from the same latent grid, or inter-feature context \(\mathbf{f}_i\), extracted by an IFCE module from already decoded latent grids.

The distribution \(p_{\psi}\) is assumed to be a Laplace distribution, parameterized by an expectation \(\mu\) and a scale \(b\). The scale and the variance \(\sigma^2\) are related by \(\sigma^2 = 2b^2\).
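As a quick numerical check of the relation between scale and variance, the sketch below integrates the Laplace density on a fine grid. The helper names `laplace_pdf` and `numerical_variance` are illustrative only and are not part of the library.

```python
import math


def laplace_pdf(y: float, mu: float, b: float) -> float:
    """Density of a Laplace distribution with expectation mu and scale b."""
    return math.exp(-abs(y - mu) / b) / (2.0 * b)


def numerical_variance(mu: float, b: float, half_width: float = 40.0, n: int = 100_000) -> float:
    """Midpoint-rule estimate of the variance on [mu - half_width * b, mu + half_width * b]."""
    lo = mu - half_width * b
    step = 2.0 * half_width * b / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * step
        total += (y - mu) ** 2 * laplace_pdf(y, mu, b) * step
    return total


b = 1.5
variance = numerical_variance(mu=0.3, b=b)
# variance should be close to 2 * b ** 2
```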

The parameters of the Laplace distribution for a given latent pixel \(\hat{y}_i\) are obtained by passing the context pixels through an MLP \(f_{\psi}\):

\[p_{\psi}(\hat{y}_i \mid \mathbf{c}_i) \sim \mathcal{L}(\mu_i, b_i), \text{ where } \mu_i, b_i = f_{\psi}(\mathbf{c}_i) \text{ and } \mathbf{c}_i = \mathtt{concat}(\mathbf{s}_i,\mathbf{f}_i).\]

Attention

The MLP \(f_{\psi}\) has a few constraints on its architecture:

  • The width of all hidden layers (i.e. the output of all layers except the final one) is identical to the number of context pixels;

  • All layers except the last one are residual layers, followed by a ReLU non-linearity.

The MLP \(f_{\psi}\) is made of custom Linear layers instantiated from the ArmLinear class.
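The two constraints above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names are hypothetical, the weights are hand-picked, and the ordering inside a hidden layer (skip connection added before the ReLU) is an assumption, since the text does not specify it.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def residual_relu_layer(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hidden layer: same width in and out, skip connection, then ReLU (assumed ordering)."""
    y = linear(x, weight, bias)
    return [max(0.0, yi + xi) for yi, xi in zip(y, x)]


def arm_mlp(context: Vector, hidden: List[Tuple[Matrix, Vector]],
            out_weight: Matrix, out_bias: Vector) -> Vector:
    """Apply every residual hidden layer, then the final plain linear layer."""
    h = context
    for weight, bias in hidden:
        h = residual_relu_layer(h, weight, bias)
    return linear(h, out_weight, out_bias)


# Three context pixels, hence hidden layers of width 3, and two outputs.
context = [1.0, -2.0, 0.5]
zero_layer = ([[0.0] * 3 for _ in range(3)], [0.0] * 3)  # reduces to ReLU(x)
raw = arm_mlp(
    context,
    hidden=[zero_layer],
    out_weight=[[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]],
    out_bias=[0.0, -1.0],
)
```

Because every hidden layer adds its input to its output, the hidden width has to match the number of context pixels, which is why a single `dim_arm` value sets both.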

__init__(dim_arm, n_hidden_layers_arm, n_out_features=2, flag_linear_stabiliser=True)
Parameters:
  • dim_arm (int) – Number of context pixels and dimension of all hidden layers.

  • n_hidden_layers_arm (int) – Number of hidden layers. Set it to 0 for a linear ARM.

  • n_out_features (int) – Number of output features. Should usually be 2, for the expectation \(\mu\) and scale \(b\).

  • flag_linear_stabiliser (bool) – True to add a linear stabiliser running parallel to the main trunk layers, as presented in the diagram below:

       ┌─────┐   ┌──────┐    ┌─────┐   ┌──────┐ trunk ┌─────┐
x ──►──┤ Lin ├─►─┤ ReLU ├─►──┤ Lin ├─►─┤ ReLU ├───────┤  +  ├─► (mu, logscale)
│      └─────┘   └──────┘    └─────┘   └──────┘       └─────┘
▼                                                        ▲
│                      ┌─────┐               stabiliser  │
└──►───────────────────┤ Lin ├───────────────────────────┘
                       └─────┘
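
The parallel structure of the diagram amounts to summing the trunk output and the output of a single linear layer applied to the same input. A minimal sketch, assuming both branches are plain callables (the helper `with_linear_stabiliser` and the two toy branches are hypothetical):

```python
from typing import Callable, List

Vector = List[float]
Branch = Callable[[Vector], Vector]


def with_linear_stabiliser(trunk: Branch, stabiliser: Branch) -> Branch:
    """Return a forward function computing trunk(x) + stabiliser(x) element-wise."""
    def forward(x: Vector) -> Vector:
        return [t + s for t, s in zip(trunk(x), stabiliser(x))]
    return forward


def toy_trunk(x: Vector) -> Vector:
    """Stand-in for the non-linear trunk: two outputs from two inputs."""
    return [max(0.0, x[0] + x[1]), 0.0]


def toy_stabiliser(x: Vector) -> Vector:
    """Stand-in for the single linear layer running in parallel."""
    return [0.5 * x[0], 1.0 * x[1]]


arm_forward = with_linear_stabiliser(toy_trunk, toy_stabiliser)
out = arm_forward([2.0, -1.0])
```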

forward(x)

Perform the auto-regressive module (ARM) forward pass. The ARM takes as input a tensor of shape \((B, C_{in})\), i.e. \(B\) contexts with \(C_{in}\) values each, and outputs a tensor of shape \((B, C_{out})\).

Usually, \(C_{out} = 2\), i.e. two raw values per pixel from which the expectation and scale of the Laplace distribution are derived. The function reparameterize_output transforms these raw values into a proper expectation and scale.

Warning

The ARM expects its input to be flattened, i.e. the spatial dimensions \(H, W\) are collapsed into a single batch-like dimension \(B = HW\). This gives an input of shape \((B, C_{in})\), gathering the \(C_{in}\) context values for each of the \(B\) pixels to model.

Parameters:

x (Tensor) – Concatenation of all input contexts \(\mathbf{c}_i\). Tensor of shape \((B, C_{in})\).

Returns:

Concatenation of all output quantities derived from the input contexts.

Tensor of shape \((B, C_{out})\).

Return type:

Tuple[Tensor, Tensor, Tensor]
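
The flattening described in the warning above can be illustrated without any tensor library. The sketch assumes the contexts are stored as a nested list indexed `[h][w]` and that pixels are enumerated in row-major order; both are assumptions made for the example, and `flatten_contexts` is a hypothetical helper.

```python
from typing import List

Context = List[float]


def flatten_contexts(contexts: List[List[Context]]) -> List[Context]:
    """Collapse an (H, W, C) nested list into (H * W, C) rows, in row-major order."""
    return [list(ctx) for row in contexts for ctx in row]


H, W = 2, 3
# Toy contexts with C = 2 values per pixel: the pixel's own (h, w) coordinates.
contexts = [[[float(h), float(w)] for w in range(W)] for h in range(H)]
flat = flatten_contexts(contexts)
# Row index of pixel (h, w) in the flattened list is h * W + w.
```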

reparameterize_output(raw_output)

Reparameterize the raw ARM output of shape \((B, 2)\) into the expectation \(\mu\) and scale \(b\) parameters.

The expectation \(\mu\) is left unchanged from the ARM output. The scale goes through an exponential reparameterization: \(b = e^{(x - 4)}\), where \(x\) is the raw scale value output by the ARM.

Parameters:
  • raw_output (Tensor) – Raw ARM output. Shape is \((B, 2)\).

Returns:

Tuple[Tensor, Tensor]. Mu and scale parameters, each of shape \((B)\).

Return type:

Tuple[Tensor, Tensor]
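
The reparameterization can be written out on plain Python lists. This is a sketch under one assumption not stated above: that column 0 of the raw output holds the expectation and column 1 holds the raw scale value. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def reparameterize_output_sketch(raw_output: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Split (B, 2) raw values into mu (unchanged) and scale b = exp(x - 4)."""
    mu = [row[0] for row in raw_output]                      # assumed column order
    scale = [math.exp(row[1] - 4.0) for row in raw_output]   # always strictly positive
    return mu, scale


raw_output = [[0.5, 4.0], [-1.0, 0.0]]
mu, scale = reparameterize_output_sketch(raw_output)
```

With the offset of 4, a raw value of 4 corresponds to a scale of 1, and a raw value of 0 corresponds to a small but non-zero scale.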

get_param(which=None)

Return a copy of the weights and biases inside the module.

Parameters:

which (Optional[Literal["weight", "bias"]]) – Whether to return only the weights or only the biases. If None, return everything. Defaults to None.

Returns:

A copy of all weights & biases in the layers.

Return type:

OrderedDict[str, Tensor]

set_param(param)

Replace the current parameters of the module with param.

Parameters:

param (OrderedDict[str, Tensor]) – Parameters to be set.

Return type:

None
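
The copy semantics of get_param and set_param can be illustrated with a toy stand-in. `ToyModule` below is hypothetical and is not the library's Arm class; in particular, the parameter names ending in `.weight` and `.bias` and the suffix-based filtering are assumptions made for the example.

```python
from collections import OrderedDict
from typing import List, Optional


class ToyModule:
    """Hypothetical stand-in mimicking the get_param / set_param contract."""

    def __init__(self) -> None:
        self._params = OrderedDict(
            [("layer0.weight", [1.0, 2.0]), ("layer0.bias", [0.5])]
        )

    def get_param(self, which: Optional[str] = None) -> "OrderedDict[str, List[float]]":
        """Return copies of the stored values, optionally filtered by suffix."""
        return OrderedDict(
            (name, list(values))
            for name, values in self._params.items()
            if which is None or name.endswith(which)
        )

    def set_param(self, param: "OrderedDict[str, List[float]]") -> None:
        """Replace the stored values with copies of the given ones."""
        self._params = OrderedDict((name, list(v)) for name, v in param.items())


module = ToyModule()
snapshot = module.get_param()
snapshot["layer0.bias"][0] = 9.0        # edits the copy only
unchanged = module.get_param("bias")    # the module still holds 0.5
module.set_param(snapshot)              # now the edit is applied
updated = module.get_param("bias")
```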

reinitialize_parameters()

Re-initialize in place the parameters of all the ArmLinear layers.

Return type:

None

class Ifce

The Inter Feature Context Extractor (IFCE) module contains all the IFCEs \(f_{\chi^{(k)}}\), each of them dedicated to the \(k\)-th latent grid.

The role of each IFCE \(f_{\chi^{(k)}}\) is to compute, for each pixel of the \(k\)-th latent grid, a context vector of \(C_f\) elements based on the already decoded latent grids.

__init__(input_features_ifce, output_features_ifce)
Parameters:
  • input_features_ifce (List[int]) – Number of input features for each IFCE, one per latent grid. For instance, input_features_ifce=[3,2,0,0] indicates that the first feature (highest resolution) uses the 3 already decoded features as context, while the second feature uses the 2 already decoded features as context. A value of 0 indicates that no IFCE is used for the corresponding feature.

  • output_features_ifce (int) – How many elements \(C_f\) are computed from the raw context values.

forward(x, latent_grid_idx)

From raw values \(\mathbf{r}\) extracted from already decoded latent grids, compute a feature context \(\mathbf{f} = f_{\chi^{(k)}}(\mathbf{r})\).

Parameters:
  • x (Tensor) – Raw values \(\mathbf{r}\) extracted from already decoded latent grids. Shape is \((B, C_{in}^{(k)})\), with \(C_{in}^{(k)}\) the \(k\)-th element in the input_features_ifce list from the __init__ function.

  • latent_grid_idx (int) – Index \(k\) of the IFCE (and of the associated latent grid).

Returns:

Transformed context \(\mathbf{f}\). Shape is \((B, C_f)\).

Return type:

Tensor
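
The per-grid dispatch can be sketched as follows. The internal architecture of each IFCE is not described here, so the sketch substitutes a single linear map with placeholder averaging weights, and it handles one pixel at a time rather than a batch. `IfceSketch` is a hypothetical name.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


class IfceSketch:
    """One extractor per latent grid; None where the grid has no IFCE."""

    def __init__(self, input_features_ifce: List[int], output_features_ifce: int) -> None:
        # Placeholder weights: every output element is the mean of the raw inputs.
        self.weights: List[Optional[Matrix]] = [
            [[1.0 / n_in] * n_in for _ in range(output_features_ifce)] if n_in > 0 else None
            for n_in in input_features_ifce
        ]

    def forward(self, x: Vector, latent_grid_idx: int) -> Vector:
        """Map the raw values of one pixel to C_f context elements."""
        weight = self.weights[latent_grid_idx]
        if weight is None:
            raise ValueError(f"No IFCE for latent grid {latent_grid_idx}")
        if len(x) != len(weight[0]):
            raise ValueError("Unexpected number of raw context values")
        return [sum(w * v for w, v in zip(row, x)) for row in weight]


ifce = IfceSketch(input_features_ifce=[3, 2, 0, 0], output_features_ifce=4)
f_grid0 = ifce.forward([3.0, 6.0, 9.0], latent_grid_idx=0)
f_grid1 = ifce.forward([2.0, 4.0], latent_grid_idx=1)
```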

get_param(which=None)

Return a copy of the weights and biases inside the module.

Parameters:

which (Optional[Literal["weight", "bias"]]) – Whether to return only the weights or only the biases. If None, return everything. Defaults to None.

Returns:

A copy of all weights & biases in the layers.

Return type:

OrderedDict[str, Tensor]

set_param(param)

Replace the current parameters of the module with param.

Parameters:

param (OrderedDict[str, Tensor]) – Parameters to be set.

Return type:

None

reinitialize_parameters()

Re-initialize in place the parameters of all the ArmLinear layers.

Return type:

None