Cool-chic Encoder¶
- class CoolChicEncoder[source]¶
CoolChicEncoder for a single frame.
- __init__(param: CoolChicEncoderParameter)[source]¶
Instantiate a cool-chic encoder for one frame.
- Parameters:
  - param (CoolChicEncoderParameter) – Architecture of the CoolChicEncoder. See the documentation of CoolChicEncoderParameter for more information.
- forward(
- quantizer_noise_type: Literal['kumaraswamy', 'gaussian', 'none'] = 'kumaraswamy',
- quantizer_type: Literal['softround_alone', 'softround', 'hardround', 'ste', 'none'] = 'softround',
- soft_round_temperature: float | None = 0.3,
- noise_parameter: float | None = 1.0,
- AC_MAX_VAL: int = -1,
- flag_additional_outputs: bool = False,
) → CoolChicEncoderOutput[source]¶
Perform the CoolChicEncoder forward pass, to be used during training. The main steps are as follows:
1. Scale & quantize the encoder-side latent \(\mathbf{y}\) to get the decoder-side latent
\[\hat{\mathbf{y}} = \mathrm{Q}(\Gamma_{enc}\ \mathbf{y}),\]
with \(\Gamma_{enc} \in \mathbb{R}\) a scalar encoder gain defined in self.param.encoder_gains and \(\mathrm{Q}\) the quantization operation.
2. Measure the rate of the decoder-side latent with the ARM:
\[\mathrm{R}(\hat{\mathbf{y}}) = -\log_2 p_{\psi}(\hat{\mathbf{y}}),\]
where \(p_{\psi}\) is given by the Auto-Regressive Module (ARM).
3. Upsample and synthesize the latent to get the output
\[\hat{\mathbf{x}} = f_{\theta}(f_{\upsilon}(\hat{\mathbf{y}})),\]
with \(f_{\upsilon}\) the Upsampling and \(f_{\theta}\) the Synthesis.
- Parameters:
  - quantizer_noise_type (Literal['kumaraswamy', 'gaussian', 'none']) – Defaults to "kumaraswamy".
  - quantizer_type (Literal['softround_alone', 'softround', 'hardround', 'ste', 'none']) – Defaults to "softround".
  - soft_round_temperature (float | None) – Soft round temperature. This is used for the softround modes as well as the ste mode to simulate the derivative in the backward pass. Defaults to 0.3.
  - noise_parameter (float | None) – Noise distribution parameter. Defaults to 1.0.
  - AC_MAX_VAL (int) – If different from -1, clamp the value to be in \([-AC\_MAX\_VAL; AC\_MAX\_VAL + 1]\) to write the actual bitstream. Defaults to -1.
  - flag_additional_outputs (bool) – True to fill CoolChicEncoderOutput['additional_data'] with many different quantities which can be used to analyze Cool-chic behavior. Defaults to False.
- Returns:
Output of Cool-chic training forward pass.
- Return type:
CoolChicEncoderOutput
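To make the arguments above concrete, here is a minimal sketch that calls forward() with a training-style configuration and with one that simulates true quantization. Only the argument names and their defaults come from this page; the encoder instance passed in and the hardround/no-noise pairing used for the evaluation-style pass are assumptions.

```python
from typing import Tuple

def run_train_and_eval_passes(encoder) -> Tuple[dict, dict]:
    """Hypothetical sketch: call forward() with a training-style and an
    evaluation-style quantizer configuration. The hardround / no-noise
    pairing for evaluation is an assumption based on the option names."""
    # Training-style pass: noisy, differentiable quantization proxy.
    out_train = encoder.forward(
        quantizer_noise_type="kumaraswamy",
        quantizer_type="softround",
        soft_round_temperature=0.3,
        noise_parameter=1.0,
    )
    # Evaluation-style pass: true rounding, no simulated noise.
    out_eval = encoder.forward(
        quantizer_noise_type="none",
        quantizer_type="hardround",
        flag_additional_outputs=True,  # also fill 'additional_data' for analysis
    )
    return out_train, out_eval
```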
- get_param() → OrderedDict[str, Tensor][source]¶
Return a copy of the weights and biases inside the module.
- Returns:
A copy of all weights & biases in the module.
- Return type:
OrderedDict[str, Tensor]
- set_param(param: OrderedDict[str, Tensor])[source]¶
Replace the current parameters of the module with param.
- Parameters:
  - param (OrderedDict[str, Tensor]) – Parameters to be set.
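Because get_param() returns a copy, the two methods can be paired to snapshot the encoder before an experiment and roll it back afterwards. The helper below is a hypothetical sketch, not part of the library.

```python
from typing import Callable

def run_and_restore(encoder, trial_fn: Callable) -> None:
    """Hypothetical helper (not part of the library): run trial_fn(encoder),
    then put back the parameters that were saved beforehand."""
    snapshot = encoder.get_param()   # OrderedDict[str, Tensor], a *copy* of weights & biases
    try:
        trial_fn(encoder)            # e.g. a few training steps with a candidate setting
    finally:
        encoder.set_param(snapshot)  # roll back to the saved parameters
```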
- initialize_latent_grids() → None[source]¶
Initialize the latent grids. The different tensors composing the latent grids must have already been created, e.g. through torch.empty().
- Return type:
None
- reinitialize_parameters()[source]¶
Reinitialize in place the different parameters of a CoolChicEncoder, namely the latent grids, the ARM, the upsampling and the synthesis weights.
- get_flops() → None[source]¶
Compute the number of MACs & parameters for the model. Update self.total_flops (integer describing the total number of MACs) and self.flops_str, a pretty string for printing the model complexity somewhere.
Attention
fvcore measures MACs (multiplications & accumulations) but calls them FLOPs (floating point operations)… We do the same here and call everything FLOP, even though it would be more accurate to use MAC.
- Return type:
None
- get_network_rate() → DescriptorCoolChic[source]¶
Return the rate (in bits) associated with the parameters (weights and biases) of the different modules.
- Returns:
The rate (in bits) associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
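As a small illustration, the per-module rates could be summed into the total network overhead in bits. The nested module → {"weight", "bias"} layout assumed for DescriptorCoolChic in this sketch is an assumption, not something documented on this page.

```python
def total_network_rate_bits(encoder) -> float:
    """Hypothetical sketch: sum the rate of every module's weights and biases.
    Assumes DescriptorCoolChic behaves like a nested mapping of
    module name -> {"weight": bits, "bias": bits}."""
    rate = encoder.get_network_rate()
    return sum(
        bits
        for per_module in rate.values()
        for bits in per_module.values()
        if bits is not None
    )
```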
- get_network_quantization_step() → DescriptorCoolChic[source]¶
Return the quantization step associated with the parameters (weights and biases) of the different modules. Those quantization steps can be None if the model has not yet been quantized.
- Returns:
The quantization step associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
- get_network_expgol_count() → DescriptorCoolChic[source]¶
Return the Exp-Golomb count parameter associated with the parameters (weights and biases) of the different modules. Those count parameters can be None if the model has not yet been quantized.
- Returns:
The Exp-Golomb count parameter associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
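Following the same assumed descriptor layout, the None entries mentioned above give a hypothetical way to check whether the network has already been quantized:

```python
def network_is_quantized(encoder) -> bool:
    """Hypothetical sketch: the model counts as quantized once no module
    reports a None quantization step (same assumed nested-mapping layout
    for DescriptorCoolChic as above)."""
    q_step = encoder.get_network_quantization_step()
    return all(
        step is not None
        for per_module in q_step.values()
        for step in per_module.values()
    )
```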
- str_complexity() → str[source]¶
Return a string describing the number of MACs (not MACs per pixel) and the number of parameters for the different modules of CoolChic.
- Returns:
A pretty string about CoolChic complexity.
- Return type:
str
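For instance, the complexity helpers could be used as follows; whether str_complexity() requires a prior call to get_flops() is an assumption of this sketch.

```python
# Hypothetical usage sketch; `encoder` is an existing CoolChicEncoder instance.
encoder.get_flops()               # fills encoder.total_flops and encoder.flops_str
print(encoder.total_flops)        # total number of MACs (reported as "FLOPs")
print(encoder.flops_str)          # pretty complexity string
print(encoder.str_complexity())   # per-module MACs and parameter counts
```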
- class CoolChicEncoderParameter[source]¶
Dataclass storing the parameters of a CoolChicEncoder.
- Parameters:
  - img_size (Tuple[int, int]) – Height and width \((H, W)\) of the frame to be coded.
  - layers_synthesis (List[str]) – Describes the architecture of the synthesis transform. See the synthesis documentation for more information.
  - n_ft_per_res (List[int]) – Number of latent features for each latent resolution, i.e. n_ft_per_res[i] gives the number of channels \(C_i\) of the latent with resolution \(\frac{H}{2^i}, \frac{W}{2^i}\).
  - dim_arm (int, Optional) – Number of context pixels for the ARM. Also corresponds to the ARM hidden layer width. See the ARM documentation for more information. Defaults to 24.
  - n_hidden_layers_arm (int, Optional) – Number of hidden layers in the ARM. Set n_hidden_layers_arm = 0 for a linear ARM. Defaults to 2.
  - upsampling_kernel_size (int, Optional) – Kernel size for the upsampler. See the upsampling documentation for more information. Defaults to 8.
  - static_upsampling_kernel (bool, Optional) – Set this flag to True to prevent learning the upsampling kernel. Defaults to False.
  - encoder_gain (int, Optional) – Multiply the latent by this value before quantization. See the documentation of the Cool-chic forward pass. Defaults to 16.
- latent_n_grids: int¶
Automatically computed; number of different latent resolutions.
- img_size: Tuple[int, int] | None = None¶
Height and width \((H, W)\) of the frame to be coded. Must be set using the set_image_size() function.
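Putting the fields together, a parameter object and an encoder might be built roughly as follows. The import path, the synthesis layer strings, the number of latent resolutions and the exact signature of set_image_size() are assumptions of the sketch; only the field names and their defaults come from this page.

```python
# Hypothetical sketch: import path, synthesis strings and set_image_size()
# signature are assumptions; field names and defaults come from this page.
from coolchic_encoder import CoolChicEncoder, CoolChicEncoderParameter  # assumed path

param = CoolChicEncoderParameter(
    layers_synthesis=["40-1-linear-relu", "3-1-linear-none"],  # assumed string format
    n_ft_per_res=[1, 1, 1, 1, 1, 1, 1],   # assumed: 7 latent resolutions, 1 channel each
    dim_arm=24,                            # context pixels / ARM hidden width
    n_hidden_layers_arm=2,
    upsampling_kernel_size=8,
    static_upsampling_kernel=False,
    encoder_gain=16,
)
param.set_image_size((256, 256))           # (H, W) of the frame to be coded
encoder = CoolChicEncoder(param)
```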
- class CoolChicEncoderOutput[source]¶
TypedDict representing the output of the CoolChicEncoder forward pass.
- Parameters:
  - raw_out (Tensor) – Output of the synthesis. Shape is \([B, C, H, W]\).
  - rate (Tensor) – Rate associated with each latent (in bits). Shape is \((N)\), with \(N\) the total number of latent variables.
  - additional_data (Dict[str, Any]) – Any other data required to compute some logs, stored inside a dictionary.