Cool-chic Encoder

class CoolChicEncoderParameter

Dataclass storing the parameters of a CoolChicEncoder.

Parameters:
  • layers_synthesis (List[str]) – Describes the architecture of the synthesis transform. See the synthesis documentation for more information.

  • linear_stabiliser_synth (bool) – Flag indicating the usage of the linear stabiliser for the synthesis.

  • ups_k_size (int) – Upsampling kernel size for the transposed convolutions. See the upsampling documentation for more information.

  • ups_preconcat_k_size (int) – Upsampling kernel size for the pre-concatenation convolutions. See the upsampling documentation for more information.

  • ifce_resolution (Optional[Tuple[int, int]]) – Lowest and highest base-two downsampling of the latent grids that use the IFCEs. E.g., (0, 2) means the latent grids with downsampling 1/2^0 to 1/2^2. Set to None to disable.

  • output_feature_ifce (int) – Number of output features of the IFCEs. Ignored if ifce_resolution is None.

  • spatial_context_arm (int) – Number of spatial contexts for the ARM.

  • linear_stabiliser_arm (bool) – Flag indicating the usage of the linear stabiliser for the ARM.

  • n_hidden_layers_arm (int) – Number of hidden layers in the ARM. Set to zero for a linear ARM.

  • latent_resolution (Tuple[int, int]) – Lowest and highest base two downsampling of the latent grids. E.g., (0, 4) means 5 latent grids from downsampling 1/2^0 to 1/2^4.

  • hyper_latent_resolution (Optional[Tuple[int, int]]) – Identical to latent_resolution but for the hyperlatents, i.e., additional latent grids which are used only for the entropy modeling and not by the synthesis. Set to None to disable.

  • flag_common_randomness (bool) – with resolution identical to the latent_resolution parameters.

  • img_size (Tuple[int, int]) – Height and width \((H, W)\) of the frame to be coded.

  • encoder_gain (int) – Multiply the latent by this value before quantization. Defaults to 16.

  • final_upsampling_type (Literal["nearest", "bilinear", "bicubic"]) – If the biggest latent grid is smaller than the input image, upsample it to the image size using the specified filter.
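To make the relationship between img_size, latent_resolution and final_upsampling_type concrete, the sketch below lists the spatial size of each latent grid. It is a minimal illustration, not Cool-chic code: the helper names latent_grid_shapes and needs_final_upsampling are hypothetical, and rounding each dimension up after dividing by 2^i is an assumption.

```python
import math
from typing import List, Tuple


def latent_grid_shapes(
    img_size: Tuple[int, int], latent_resolution: Tuple[int, int]
) -> List[Tuple[int, int]]:
    """Hypothetical helper: (H, W) of each latent grid.

    Assumes grid i has size ceil(H / 2**i) x ceil(W / 2**i) for every i
    in [lowest, highest]. The rounding mode is an assumption.
    """
    h, w = img_size
    lowest, highest = latent_resolution
    return [
        (math.ceil(h / 2**i), math.ceil(w / 2**i))
        for i in range(lowest, highest + 1)
    ]


def needs_final_upsampling(
    img_size: Tuple[int, int], latent_resolution: Tuple[int, int]
) -> bool:
    """True when the biggest latent grid is smaller than the image,
    i.e., when final_upsampling_type would come into play."""
    biggest = latent_grid_shapes(img_size, latent_resolution)[0]
    return biggest != tuple(img_size)


shapes = latent_grid_shapes((512, 768), (0, 4))
print(len(shapes), shapes)
# 5 [(512, 768), (256, 384), (128, 192), (64, 96), (32, 48)]
print(needs_final_upsampling((512, 768), (1, 4)))  # True
```

With latent_resolution = (1, 4), the biggest grid is half the image size in each dimension, which is the situation where the final upsampling filter is applied.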

pretty_string()

Return a pretty string presenting the CoolChicEncoderParameter.

Return type:

str

class CoolChicEncoder

CoolChicEncoder for a single frame.

__init__(param)

Instantiate a cool-chic encoder for one frame.

Parameters:

param (CoolChicEncoderParameter) – Architecture of the CoolChicEncoder. See the documentation of CoolChicEncoderParameter for more information.

forward(
quantizer_noise_type='gaussian',
quantizer_type='softround',
soft_round_temperature=tensor(0.3500),
noise_parameter=tensor(0.2200),
AC_MAX_VAL=-1,
flag_additional_outputs=False,
no_common_randomness=False,
only_common_randomness=False,
)

Perform the CoolChicEncoder forward pass, to be used during training. The main steps are as follows:

  1. Scale & quantize the encoder-side latent \(\mathbf{y}\) to get the decoder-side latent

    \[\hat{\mathbf{y}} = \mathrm{Q}(\Gamma_{enc}\ \mathbf{y}),\]

    with \(\Gamma_{enc} \in \mathbb{R}\) a scalar encoder gain defined in self.param.encoder_gain and \(\mathrm{Q}\) the quantization operation.

  2. Measure the rate of the decoder-side latent with the ARM and IFCE:

    \[\mathrm{R}(\hat{\mathbf{y}}) = -\log_2 p_{\psi}(\hat{\mathbf{y}}),\]

    where \(p_{\psi}\) is given by the Auto-Regressive Module (ARM).

  3. Upsample and synthesize the latent to get the output

    \[\hat{\mathbf{x}} = f_{\theta}(f_{\upsilon}(\hat{\mathbf{y}})),\]

    with \(f_{\upsilon}\) the Upsampling and \(f_{\theta}\) the Synthesis.
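Steps 1 and 2 can be illustrated on a handful of scalar latents. This is a minimal sketch under stated assumptions, not Cool-chic's implementation: hard rounding stands in for \(\mathrm{Q}\), and a single discretized Laplace distribution with a fixed mean and scale stands in for \(p_{\psi}\), whereas the ARM would predict the distribution of each latent from its spatial context. The Laplace choice and all function names are assumptions.

```python
import math
from typing import List


def laplace_cdf(x: float, mu: float, scale: float) -> float:
    """CDF of a Laplace distribution with mean mu and scale > 0."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / scale)
    return 1.0 - 0.5 * math.exp(-(x - mu) / scale)


def discretized_laplace_prob(k: int, mu: float, scale: float) -> float:
    """Probability of integer k: Laplace mass over [k - 0.5, k + 0.5]."""
    return laplace_cdf(k + 0.5, mu, scale) - laplace_cdf(k - 0.5, mu, scale)


def quantize(y: List[float], encoder_gain: float) -> List[int]:
    """Step 1: y_hat = Q(gain * y), with hard rounding as Q."""
    return [round(encoder_gain * v) for v in y]


def rate_bits(y_hat: List[int], mu: float = 0.0, scale: float = 1.0) -> float:
    """Step 2: R(y_hat) = -sum(log2 p(y_hat)).

    One fixed Laplace stands in for the ARM, which would instead
    predict a distribution for every latent from its spatial context.
    """
    # Floor the probability to avoid log2(0) for extremely unlikely values.
    return -sum(
        math.log2(max(discretized_laplace_prob(k, mu, scale), 1e-12))
        for k in y_hat
    )


y = [0.01, -0.04, 0.12, 0.0]
y_hat = quantize(y, encoder_gain=16)
print(y_hat)  # [0, -1, 2, 0]
print(f"{rate_bits(y_hat):.2f} bits")
```

Values far from the predicted mean receive a lower probability and therefore cost more bits, which is what the rate term penalizes during training.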

Parameters:
  • quantizer_noise_type (Literal['kumaraswamy', 'gaussian', 'none']) – Defaults to "gaussian".

  • quantizer_type (Literal['softround_alone', 'softround', 'hardround', 'ste', 'none']) – Defaults to "softround".

  • soft_round_temperature (Tensor | None) – Soft round temperature. This is used for the softround modes as well as for the ste mode, to simulate the derivative in the backward pass. Defaults to 0.35.

  • noise_parameter (Tensor | None) – Noise distribution parameter. Defaults to 0.22.

  • AC_MAX_VAL (int) – If different from -1, clamp the value to be in \([-AC\_MAX\_VAL; AC\_MAX\_VAL + 1]\) to write the actual bitstream. Defaults to -1.

  • flag_additional_outputs (bool) – If True, fill CoolChicEncoderOutput['additional_data'] with various quantities useful for analyzing Cool-chic behavior. Defaults to False.

  • no_common_randomness (bool)

  • only_common_randomness (bool)
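To give an intuition for soft_round_temperature and AC_MAX_VAL, the sketch below implements the soft-rounding function of Agustsson & Theis (2020) and the clamping described for AC_MAX_VAL. That the softround modes use exactly this function is an assumption, the noise injection controlled by quantizer_noise_type and noise_parameter is not shown, and the names soft_round and clamp_for_bitstream are hypothetical.

```python
import math


def soft_round(x: float, temperature: float) -> float:
    """Differentiable approximation of round(x).

    s_T(x) = floor(x) + 0.5 * tanh(r / T) / tanh(1 / (2T)) + 0.5,
    with r = x - floor(x) - 0.5. It tends to round(x) as T -> 0 and
    to the identity as T grows.
    """
    floor_x = math.floor(x)
    r = x - floor_x - 0.5
    return (
        floor_x
        + 0.5 * math.tanh(r / temperature) / math.tanh(0.5 / temperature)
        + 0.5
    )


def clamp_for_bitstream(k: int, ac_max_val: int) -> int:
    """Clamp k to [-ac_max_val, ac_max_val + 1]; -1 disables clamping."""
    if ac_max_val == -1:
        return k
    return max(-ac_max_val, min(ac_max_val + 1, k))


# Lower temperatures push the outputs closer to hard rounding.
for t in (0.35, 0.05):
    print(t, [round(soft_round(v, t), 3) for v in (1.2, 1.5, 1.8)])
print(clamp_for_bitstream(40, ac_max_val=31))  # 32
```

Following the soft_round_temperature description, an ste mode would output hard-rounded values in the forward pass while borrowing the derivative of such a soft-round function in the backward pass; this sketch only computes forward values.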

Returns:

Output of the Cool-chic training forward pass.

Return type:

CoolChicEncoderOutput

get_param()

Return a copy of the weights and biases inside the module.

Returns:

A copy of all weights & biases in the module.

Return type:

OrderedDict[str, Tensor]

set_param(param)

Replace the current parameters of the module with param.

Parameters:

param (OrderedDict[str, Tensor]) – Parameters to be set.

initialize_latent_grids()

Initialize the latent grids. The different tensors composing the latent grids must have already been created, e.g., through torch.empty().

Return type:

None

reinitialize_parameters()

Reinitialize in place the different parameters of a CoolChicEncoder, namely the latent grids, the ARM, the upsampling and the weights.

get_flops()

Compute the number of MACs and parameters of the model. Update self.total_flops (an integer giving the total number of MACs) and self.flops_str (a pretty string describing the model complexity, ready to be printed).

Attention

fvcore measures MACs (multiply-accumulate operations) but calls them FLOPs (floating-point operations). We do the same here and call everything FLOP, even though MAC would be more accurate.

Return type:

None

get_network_rate()

Return the rate (in bits) associated with the parameters (weights and biases) of the different modules.

Returns:

The rate (in bits) associated with the weights and biases of each module. Also return the total rate in bits.

Return type:

Tuple[DescriptorCoolChic, int]

get_network_quantization_step()

Return the quantization step associated with the parameters (weights and biases) of the different modules. These quantization steps can be None if the model has not been quantized yet.

Returns:

The quantization step associated with the weights and biases of each module.

Return type:

DescriptorCoolChic

get_network_expgol_count()

Return the Exp-Golomb count parameter associated with the parameters (weights and biases) of the different modules. These Exp-Golomb parameters can be None if the model has not been quantized yet.

Returns:

The Exp-Golomb count parameter associated with the weights and biases of each module.

Return type:

DescriptorCoolChic
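One plausible way the quantization step, the Exp-Golomb count and the network rate relate is sketched below: each weight is divided by its quantization step and rounded, the signed integer is mapped to a non-negative one, and its cost is the length of the order-k Exp-Golomb codeword, with k the count parameter. This is an illustration under assumptions, not Cool-chic's implementation: the rounding, the signed-to-unsigned mapping and all function names are hypothetical.

```python
from typing import Iterable


def signed_to_unsigned(v: int) -> int:
    """Map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... (assumed mapping)."""
    return 2 * v - 1 if v > 0 else -2 * v


def exp_golomb_length(n: int, k: int) -> int:
    """Length in bits of the order-k Exp-Golomb codeword of n >= 0.

    floor(log2(n + 2**k)) is computed exactly with int.bit_length().
    """
    return 2 * ((n + (1 << k)).bit_length() - 1) - k + 1


def weights_rate_bits(
    weights: Iterable[float], q_step: float, expgol_count: int
) -> int:
    """Quantize each weight with q_step, then sum the Exp-Golomb lengths."""
    return sum(
        exp_golomb_length(signed_to_unsigned(round(w / q_step)), expgol_count)
        for w in weights
    )


w = [0.0, 0.013, -0.021, 0.25]
print(weights_rate_bits(w, q_step=1 / 64, expgol_count=0))  # 18
```

The count parameter trades off small against large values: the integer 31 costs 11 bits at order 0 but 8 bits at order 3, while 0 goes from 1 bit to 4 bits.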

str_complexity()

Return a string describing the number of MACs (not MACs per pixel) and the number of parameters of the different modules of CoolChic.

Returns:

A pretty string about CoolChic complexity.

Return type:

str

get_total_mac_per_pixel()

Count the number of Multiplication-Accumulation (MAC) per decoded pixel for this model.

Returns:

Number of MACs per decoded pixel.

Return type:

float
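As an intuition for a per-pixel MAC count, the sketch below counts the MACs of a hypothetical stack of convolutions that all run at full image resolution: a k×k convolution costs k² × in_channels × out_channels MACs per output pixel (biases ignored), and the total is then divided by H × W. The layer list is made up for illustration and does not describe Cool-chic's default architecture; modules running on smaller grids (such as the ARM on downsampled latents) would need their own scaling.

```python
from typing import List, Tuple

# Hypothetical layers: (kernel_size, in_channels, out_channels).
Layer = Tuple[int, int, int]


def conv_macs_per_output_pixel(layer: Layer) -> int:
    """MACs of a k x k convolution for one output position (bias ignored)."""
    k, c_in, c_out = layer
    return k * k * c_in * c_out


def total_macs(layers: List[Layer], height: int, width: int) -> int:
    """Total MACs when every layer runs at full height x width resolution."""
    return sum(conv_macs_per_output_pixel(layer) for layer in layers) * height * width


layers = [(1, 7, 40), (1, 40, 3), (3, 3, 3)]
h, w = 512, 768
total = total_macs(layers, h, w)
print(total / (h * w))  # 481.0 MACs per decoded pixel
```

Following the note in get_flops(), such counts are reported under the name FLOP in this codebase even though they are MACs.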

to_device(device)

Push a model to a given device.

Parameters:

device (device)

Return type:

None

pretty_string()

Get a pretty string detailing the complexity of a CoolChicEncoder.

Returns:

A pretty string ready to be printed out.

Return type:

str