Auto-Regressive Module (ARM)
- class Arm[source]
Autoregressive probability module, modelling the conditional distribution \(p_{\psi}(\hat{y}_i \mid \mathbf{s}_i, \mathbf{f}_i)\) of a (quantized) latent pixel \(\hat{y}_i\), conditioned on neighboring already decoded context pixels. These context pixels are either causal spatial neighbors \(\mathbf{s}_i\), extracted from the same latent grid, or inter-feature context \(\mathbf{f}_i\), extracted by an IFCE module from already decoded latent grids.
The distribution \(p_{\psi}\) is assumed to follow a Laplace distribution, parameterized by an expectation \(\mu\) and a scale \(b\), where the scale and the variance \(\sigma^2\) are related as follows \(\sigma^2 = 2 b ^2\).
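As a concrete illustration of this model, the rate (in bits) of a quantized latent pixel can be estimated from a Laplace distribution. The values below are made up, and `torch.distributions.Laplace` stands in for whatever entropy model the codebase actually uses:

```python
import torch

# Hypothetical mu / b produced by an ARM for B = 4 latent pixels.
mu = torch.tensor([0.0, 1.0, -0.5, 2.0])
b = torch.tensor([0.3, 0.5, 0.2, 1.0])
y_hat = torch.tensor([0.0, 1.0, 0.0, 3.0])  # quantized latents

laplace = torch.distributions.Laplace(mu, b)

# Note the scale / variance relation: sigma^2 = 2 b^2.
assert torch.allclose(laplace.variance, 2 * b**2)

# Probability of the quantization bin [y - 0.5, y + 0.5],
# then the coding cost in bits: -log2 p.
p = laplace.cdf(y_hat + 0.5) - laplace.cdf(y_hat - 0.5)
rate_bits = -torch.log2(p)
```

A latent pixel far from its predicted expectation, or one with a large scale, gets a higher coding cost.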
The parameters of the Laplace distribution for a given latent pixel \(\hat{y}_i\) are obtained by passing the context pixels through an MLP \(f_{\psi}\):
\[p_{\psi}(\hat{y}_i \mid \mathbf{c}_i) \sim \mathcal{L}(\mu_i, b_i), \text{ where } \mu_i, b_i = f_{\psi}(\mathtt{concat}(\mathbf{s}_i,\mathbf{f}_i)).\]

Attention

The MLP \(f_{\psi}\) has a few constraints on its architecture:
The width of all hidden layers (i.e. the output of all layers except the final one) is identical to the number of context pixels;
All layers except the last one are residual layers, followed by a ReLU non-linearity;
The MLP \(f_{\psi}\) is made of custom Linear layers instantiated from the ArmLinear class.

- __init__(dim_arm, n_hidden_layers_arm, n_out_features=2, flag_linear_stabiliser=True)[source]
- Parameters:
dim_arm (int) – Number of context pixels and dimension of all hidden layers.

n_hidden_layers_arm (int) – Number of hidden layers. Set it to 0 for a linear ARM.

n_out_features (int) – Number of output features. Should usually be 2 for the expectation \(\mu\) and scale \(b\).

flag_linear_stabiliser (bool) – True to add a linear stabiliser running parallel to the main trunk layers, as presented in the diagram below:
         ┌─────┐   ┌──────┐   ┌─────┐   ┌──────┐  trunk  ┌─────┐
 x ──┬──►│ Lin ├──►│ ReLU ├──►│ Lin ├──►│ ReLU ├────────►│  +  ├──► (mu, logscale)
     │   └─────┘   └──────┘   └─────┘   └──────┘         └──▲──┘
     │                        ┌─────┐      stabiliser       │
     └───────────────────────►│ Lin ├───────────────────────┘
                              └─────┘
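The constraints and the diagram above can be sketched as a plain PyTorch module. `TinyArm` below is a hedged illustration only: it uses standard `nn.Linear` layers instead of the custom ArmLinear class, and assumes a final non-residual layer producing the two output features.

```python
import torch
import torch.nn as nn

class TinyArm(nn.Module):
    """Illustrative ARM-style MLP: residual hidden layers of constant
    width dim_arm, each followed by a ReLU, plus a parallel linear
    stabiliser added to the trunk output."""

    def __init__(self, dim_arm: int, n_hidden_layers: int, n_out: int = 2):
        super().__init__()
        self.hidden = nn.ModuleList(
            nn.Linear(dim_arm, dim_arm) for _ in range(n_hidden_layers)
        )
        self.out = nn.Linear(dim_arm, n_out)         # final (non-residual) layer
        self.stabiliser = nn.Linear(dim_arm, n_out)  # parallel linear path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for layer in self.hidden:
            h = torch.relu(layer(h) + h)  # residual layer + ReLU
        return self.out(h) + self.stabiliser(x)

arm = TinyArm(dim_arm=8, n_hidden_layers=2)
raw = arm(torch.randn(16, 8))  # (B, C_in) -> (B, 2) raw output
```

With `n_hidden_layers=0` the trunk reduces to a single linear layer, matching the "linear ARM" case mentioned for `n_hidden_layers_arm=0`.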
- forward(x)[source]
Perform the auto-regressive module (ARM) forward pass. The ARM takes as input a tensor of shape \((B, C_{in})\), i.e. \(B\) contexts with \(C_{in}\) values each, and outputs a tensor of shape \((B, C_{out})\).
Usually, \(C_{out} = 2\), i.e. two values per pixel describing the expectation and scale of the Laplace distribution. The function reparameterize_output transforms these quantities into a proper expectation and scale.

Warning
Note that the ARM expects input to be flattened i.e. spatial dimensions \(H, W\) are collapsed into a single batch-like dimension \(B = HW\), leading to an input of shape \((B, C)\), gathering the \(C\) contexts for each of the \(B\) pixels to model.
- Parameters:
x (Tensor) – Concatenation of all input contexts \(\mathbf{c}_i\). Tensor of shape \((B, C_{in})\).
- Returns:
Concatenation of all output quantities derived from the input contexts. Tensor of shape \((B, C_{out})\).
- Return type:
Tuple[Tensor, Tensor, Tensor]
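As noted in the warning above, spatial dimensions must be collapsed before calling the ARM. A shape-only sketch, with made-up dimensions:

```python
import torch

B_img, C, H, W = 1, 12, 4, 6        # 12 context values per latent pixel
ctx = torch.randn(B_img, C, H, W)   # hypothetical context tensor

# Collapse the spatial dimensions into a batch-like dimension B = H * W,
# yielding the (B, C) layout the ARM expects.
flat = ctx.permute(0, 2, 3, 1).reshape(-1, C)
```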
- reparameterize_output(raw_output)[source]
Reparameterize the raw \((B, 2)\) ARM output into expectation \(\mu\) and scale \(b\) parameters.
The expectation \(\mu\) is left unchanged from the ARM output. The scale goes through an exponential reparameterization: \(b = e^{(x - 4)}\).
- Parameters:
raw_output (Tensor) – Raw ARM output. Shape is \((B, 2)\).
- Returns:
Expectation \(\mu\) and scale \(b\) parameters, each with an identical shape of \((B)\) elements.
- Return type:
Tuple[Tensor, Tensor]
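A minimal sketch of this reparameterization, assuming the first column of the raw output is \(\mu\) and the second is the pre-activation log-scale (illustrative, not the actual implementation):

```python
import torch

def reparameterize_output(raw_output: torch.Tensor):
    """Split a (B, 2) raw ARM output into expectation and scale.

    The exponential b = exp(x - 4) keeps the scale strictly positive
    and maps a zero raw output to a small scale exp(-4).
    """
    mu = raw_output[:, 0]
    scale = torch.exp(raw_output[:, 1] - 4.0)
    return mu, scale

mu, b = reparameterize_output(torch.zeros(5, 2))
```

The -4 offset biases an untrained ARM towards small scales, i.e. confident (low-rate) predictions.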
- get_param(which=None)[source]
Return a copy of the weights and biases inside the module.
- Parameters:
which (Optional[Literal["weight", "bias"]]) – Whether to return only the weights or the biases. If None, return everything. Defaults to None.

- Returns:
A copy of all weights & biases in the layers.
- Return type:
OrderedDict[str, Tensor]
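The behaviour described above can be emulated on any nn.Module by filtering named parameters. This is a hedged sketch, not the actual get_param implementation:

```python
from collections import OrderedDict

import torch
import torch.nn as nn

def get_param(module: nn.Module, which=None) -> "OrderedDict[str, torch.Tensor]":
    """Return a detached copy of the module's weights and/or biases.

    which: "weight", "bias", or None to return everything.
    """
    return OrderedDict(
        (name, param.detach().clone())
        for name, param in module.named_parameters()
        if which is None or name.endswith(which)
    )

mlp = nn.Linear(4, 2)
everything = get_param(mlp)          # both "weight" and "bias"
only_bias = get_param(mlp, "bias")   # only "bias"
```

Returning detached copies means the caller can inspect or serialize the parameters without touching the autograd graph.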
- class Ifce[source]
Inter Feature Context Extractor (IFCE) contains all the IFCE modules \(f_{\chi^{(k)}}\), each of them dedicated to the \(k\)-th latent grid.

The role of each IFCE \(f_{\chi^{(k)}}\) is to compute, for each pixel of the \(k\)-th latent grid, a context vector of \(C_f\) elements based on the already decoded latent grids.
- __init__(input_features_ifce, output_features_ifce)[source]
- Parameters:
input_features_ifce (List[int]) – Number of input features for each of the IFCE, one per latent grid. For instance input_features_ifce=[3, 2, 0, 0] indicates that the first feature (highest resolution) uses the 3 already decoded features as context, while the second feature uses the 2 already decoded features as context. 0 indicates that no IFCE is used for the current feature.

output_features_ifce (int) – How many elements \(C_f\) are computed from the raw context values.
- forward(x, latent_grid_idx)[source]
From raw values extracted from the already decoded latent grids \(\mathbf{r}\), compute a feature context \(\mathbf{f} = f_{\chi^{(k)}}(\mathbf{r})\).
- Parameters:
x (Tensor) – Raw values extracted from already decoded latent grids \(\mathbf{r}\). Shape is \((B, C_{in}^{(i)})\), with \(C_{in}^{(i)}\) the \(i\)-th element in the input_features_ifce list from the __init__ function.

latent_grid_idx (int) – Index of the IFCE \(k\) (and of the associated latent grids).
- Returns:
Transformed context \(\mathbf{f}\). Shape is \((B, C_f)\).
- Return type:
Tensor
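One simple realisation of such a bank of extractors is a linear map per latent grid. `TinyIfce` below is an illustrative sketch under that assumption, not the actual Ifce code:

```python
import torch
import torch.nn as nn

class TinyIfce(nn.Module):
    """Illustrative IFCE bank: one linear extractor per latent grid.
    A grid with 0 input features gets no extractor, since no already
    decoded context is available for it."""

    def __init__(self, input_features_ifce, output_features_ifce: int):
        super().__init__()
        self.extractors = nn.ModuleList(
            nn.Linear(c_in, output_features_ifce) if c_in > 0 else nn.Identity()
            for c_in in input_features_ifce
        )

    def forward(self, x: torch.Tensor, latent_grid_idx: int) -> torch.Tensor:
        # (B, C_in^{(k)}) -> (B, C_f) feature context for grid k.
        return self.extractors[latent_grid_idx](x)

ifce = TinyIfce(input_features_ifce=[3, 2, 0, 0], output_features_ifce=8)
f = ifce(torch.randn(10, 3), latent_grid_idx=0)
```

Each grid index selects its own extractor, so grids with different numbers of already decoded neighbors still produce a context of the same width \(C_f\).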
- get_param(which=None)[source]
Return a copy of the weights and biases inside the module.
- Parameters:
which (Optional[Literal["weight", "bias"]]) – Whether to return only the weights or the biases. If None, return everything. Defaults to None.

- Returns:
A copy of all weights & biases in the layers.
- Return type:
OrderedDict[str, Tensor]