Auto-Regressive Module (ARM)
- class Arm[source]
Autoregressive probability module, modelling the conditional distribution \(p_{\psi}(\hat{y}_i \mid \mathbf{s}_i, \mathbf{f}_i)\) of a (quantized) latent pixel \(\hat{y}_i\), conditioned on neighboring already decoded context pixels. These context pixels are either causal spatial neighbors \(\mathbf{s}_i\), extracted from the same latent grid, or inter-feature context \(\mathbf{f}_i\), extracted by an IFCE module from already decoded latent grids.
The distribution \(p_{\psi}\) is assumed to follow a Laplace distribution, parameterized by an expectation \(\mu\) and a scale \(b\), where the scale and the variance \(\sigma^2\) are related as follows \(\sigma^2 = 2 b ^2\).
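As a concrete illustration of this model, the rate (in bits) of a quantized latent pixel can be estimated from a Laplace distribution. The values below are made up, and `torch.distributions.Laplace` stands in for whatever entropy model the codebase actually uses:

```python
import torch

# Hypothetical mu / b produced by an ARM for B = 4 latent pixels.
mu = torch.tensor([0.0, 1.0, -0.5, 2.0])
b = torch.tensor([0.3, 0.5, 0.2, 1.0])
y_hat = torch.tensor([0.0, 1.0, 0.0, 3.0])  # quantized latents

laplace = torch.distributions.Laplace(mu, b)

# Note the scale / variance relation: sigma^2 = 2 b^2.
assert torch.allclose(laplace.variance, 2 * b**2)

# Probability of the quantization bin [y - 0.5, y + 0.5],
# then the coding cost in bits: -log2 p.
p = laplace.cdf(y_hat + 0.5) - laplace.cdf(y_hat - 0.5)
rate_bits = -torch.log2(p)
```

A latent pixel far from its predicted expectation, or one with a large scale, gets a higher coding cost.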
The parameters of the Laplace distribution for a given latent pixel \(\hat{y}_i\) are obtained by passing the context pixels through an MLP \(f_{\psi}\):
\[p_{\psi}(\hat{y}_i \mid \mathbf{c}_i) \sim \mathcal{L}(\mu_i, b_i), \text{ where } \mu_i, b_i = f_{\psi}(\mathtt{concat}(\mathbf{s}_i,\mathbf{f}_i)).\]

Attention

The MLP \(f_{\psi}\) has a few constraints on its architecture:
The width of all hidden layers (i.e. the output of all layers except the final one) is identical to the number of context pixels;
All layers except the last one are residual layers, followed by a ReLU non-linearity;
The MLP \(f_{\psi}\) is made of custom Linear layers instantiated from the ArmLinear class.

- __init__(dim_arm, n_hidden_layers_arm, n_out_features=2, flag_linear_stabiliser=True)[source]
- Parameters:
dim_arm (int) – Number of context pixels and dimension of all hidden layers.

n_hidden_layers_arm (int) – Number of hidden layers. Set it to 0 for a linear ARM.

n_out_features (int) – Number of output features. Should usually be 2 for the expectation \(\mu\) and scale \(b\).

flag_linear_stabiliser (bool) – True to add a linear stabiliser running parallel to the main trunk layers, as presented in the diagram below:
         ┌─────┐   ┌──────┐   ┌─────┐   ┌──────┐  trunk  ┌─────┐
 x ──┬──►│ Lin ├──►│ ReLU ├──►│ Lin ├──►│ ReLU ├────────►│  +  ├──► (mu, logscale)
     │   └─────┘   └──────┘   └─────┘   └──────┘         └──▲──┘
     │                        ┌─────┐      stabiliser       │
     └───────────────────────►│ Lin ├───────────────────────┘
                              └─────┘
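The constraints and the diagram above can be sketched as a plain PyTorch module. `TinyArm` below is a hedged illustration only: it uses standard `nn.Linear` layers instead of the custom ArmLinear class, and assumes a final non-residual layer producing the two output features.

```python
import torch
import torch.nn as nn

class TinyArm(nn.Module):
    """Illustrative ARM-style MLP: residual hidden layers of constant
    width dim_arm, each followed by a ReLU, plus a parallel linear
    stabiliser added to the trunk output."""

    def __init__(self, dim_arm: int, n_hidden_layers: int, n_out: int = 2):
        super().__init__()
        self.hidden = nn.ModuleList(
            nn.Linear(dim_arm, dim_arm) for _ in range(n_hidden_layers)
        )
        self.out = nn.Linear(dim_arm, n_out)         # final (non-residual) layer
        self.stabiliser = nn.Linear(dim_arm, n_out)  # parallel linear path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for layer in self.hidden:
            h = torch.relu(layer(h) + h)  # residual layer + ReLU
        return self.out(h) + self.stabiliser(x)

arm = TinyArm(dim_arm=8, n_hidden_layers=2)
raw = arm(torch.randn(16, 8))  # (B, C_in) -> (B, 2) raw output
```

With `n_hidden_layers=0` the trunk reduces to a single linear layer, matching the "linear ARM" case mentioned for `n_hidden_layers_arm=0`.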
- forward(x)[source]
Perform the auto-regressive module (ARM) forward pass. The ARM takes as input a tensor of shape \((B, C_{in})\), i.e. \(B\) contexts with \(C_{in}\) values each, and outputs a tensor of shape \((B, C_{out})\).
Usually, \(C_{out} = 2\), i.e. two values per pixel describing the expectation and scale of the Laplace distribution. The function reparameterize_output transforms these quantities into a proper expectation and scale.

Warning
Note that the ARM expects input to be flattened i.e. spatial dimensions \(H, W\) are collapsed into a single batch-like dimension \(B = HW\), leading to an input of shape \((B, C)\), gathering the \(C\) contexts for each of the \(B\) pixels to model.
- Parameters:
x (Tensor) – Concatenation of all input contexts \(\mathbf{c}_i\). Tensor of shape \((B, C_{in})\).
- Returns:
Concatenation of all output quantities derived from the input contexts. Tensor of shape \((B, C_{out})\).
- Return type:
Tuple[Tensor, Tensor, Tensor]
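As noted in the warning above, spatial dimensions must be collapsed before calling the ARM. A shape-only sketch, with made-up dimensions:

```python
import torch

B_img, C, H, W = 1, 12, 4, 6        # 12 context values per latent pixel
ctx = torch.randn(B_img, C, H, W)   # hypothetical context tensor

# Collapse the spatial dimensions into a batch-like dimension B = H * W,
# yielding the (B, C) layout the ARM expects.
flat = ctx.permute(0, 2, 3, 1).reshape(-1, C)
```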
- reparameterize_output(raw_output)[source]
Reparameterize the raw \((B, 2)\) ARM output into expectation \(\mu\) and scale \(b\) parameters.
The expectation \(\mu\) is left unchanged from the ARM output. The scale goes through an exponential reparameterization: \(b = e^{(x - 4)}\).
- Parameters:
raw_output (Tensor) – Raw ARM output. Shape is \((B, 2)\).
- Returns:
Expectation \(\mu\) and scale \(b\) parameters, each with an identical shape of \((B)\) elements.
- Return type:
Tuple[Tensor, Tensor]
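A minimal sketch of this reparameterization, assuming the first column of the raw output is \(\mu\) and the second is the pre-activation log-scale (illustrative, not the actual implementation):

```python
import torch

def reparameterize_output(raw_output: torch.Tensor):
    """Split a (B, 2) raw ARM output into expectation and scale.

    The exponential b = exp(x - 4) keeps the scale strictly positive
    and maps a zero raw output to a small scale exp(-4).
    """
    mu = raw_output[:, 0]
    scale = torch.exp(raw_output[:, 1] - 4.0)
    return mu, scale

mu, b = reparameterize_output(torch.zeros(5, 2))
```

The -4 offset biases an untrained ARM towards small scales, i.e. confident (low-rate) predictions.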
- get_param(which=None)[source]
Return a copy of the weights and biases inside the module.
- Parameters:
which (Optional[Literal["weight", "bias"]]) – Whether to return only the weights or the biases. If None, return everything. Defaults to None.

- Returns:
A copy of all weights & biases in the layers.
- Return type:
OrderedDict[str, Tensor]
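The behaviour described above can be emulated on any nn.Module by filtering named parameters. This is a hedged sketch, not the actual get_param implementation:

```python
from collections import OrderedDict

import torch
import torch.nn as nn

def get_param(module: nn.Module, which=None) -> "OrderedDict[str, torch.Tensor]":
    """Return a detached copy of the module's weights and/or biases.

    which: "weight", "bias", or None to return everything.
    """
    return OrderedDict(
        (name, param.detach().clone())
        for name, param in module.named_parameters()
        if which is None or name.endswith(which)
    )

mlp = nn.Linear(4, 2)
everything = get_param(mlp)          # both "weight" and "bias"
only_bias = get_param(mlp, "bias")   # only "bias"
```

Returning detached copies means the caller can inspect or serialize the parameters without touching the autograd graph.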
- class Ifce[source]
Inter Feature Context Extractor (IFCE) contains all the IFCE modules \(f_{\chi^{(k)}}\), each of them dedicated to the \(k\)-th latent grid.

The role of each IFCE \(f_{\chi^{(k)}}\) is to compute, for each pixel of the \(k\)-th latent grid, a context vector of \(C_f\) elements based on the already decoded latent grids.
- __init__(input_features_ifce, output_features_ifce)[source]
- Parameters:
input_features_ifce (List[int]) – Number of input features for each of the IFCE, one per latent grid. For instance input_features_ifce=[3, 2, 0, 0] indicates that the first feature (highest resolution) uses the 3 already decoded features as context, while the second feature uses the 2 already decoded features as context. 0 indicates that no IFCE is used for the current feature.

output_features_ifce (int) – How many elements \(C_f\) are computed from the raw context values.
- forward(x, latent_grid_idx)[source]
From raw values extracted from the already decoded latent grids \(\mathbf{r}\), compute a feature context \(\mathbf{f} = f_{\chi^{(k)}}(\mathbf{r})\).
- Parameters:
x (Tensor) – Raw values extracted from already decoded latent grids \(\mathbf{r}\). Shape is \((B, C_{in}^{(i)})\), with \(C_{in}^{(i)}\) the \(i\)-th element in the input_features_ifce list from the __init__ function.

latent_grid_idx (int) – Index of the IFCE \(k\) (and of the associated latent grids).
- Returns:
Transformed context \(\mathbf{f}\). Shape is \((B, C_f)\).
- Return type:
Tensor
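One simple realisation of such a bank of extractors is a linear map per latent grid. `TinyIfce` below is an illustrative sketch under that assumption, not the actual Ifce code:

```python
import torch
import torch.nn as nn

class TinyIfce(nn.Module):
    """Illustrative IFCE bank: one linear extractor per latent grid.
    A grid with 0 input features gets no extractor, since no already
    decoded context is available for it."""

    def __init__(self, input_features_ifce, output_features_ifce: int):
        super().__init__()
        self.extractors = nn.ModuleList(
            nn.Linear(c_in, output_features_ifce) if c_in > 0 else nn.Identity()
            for c_in in input_features_ifce
        )

    def forward(self, x: torch.Tensor, latent_grid_idx: int) -> torch.Tensor:
        # (B, C_in^{(k)}) -> (B, C_f) feature context for grid k.
        return self.extractors[latent_grid_idx](x)

ifce = TinyIfce(input_features_ifce=[3, 2, 0, 0], output_features_ifce=8)
f = ifce(torch.randn(10, 3), latent_grid_idx=0)
```

Each grid index selects its own extractor, so grids with different numbers of already decoded neighbors still produce a context of the same width \(C_f\).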
- get_param(which=None)[source]
Return a copy of the weights and biases inside the module.
- Parameters:
which (Optional[Literal["weight", "bias"]]) – Whether to return only the weights or the biases. If None, return everything. Defaults to None.

- Returns:
A copy of all weights & biases in the layers.
- Return type:
OrderedDict[str, Tensor]