Cool-chic Encoder
- class CoolChicEncoderParameter
Dataclass storing the parameters of a CoolChicEncoder.
- Parameters:
layers_synthesis (List[str]) – Describes the architecture of the synthesis transform. See the synthesis documentation for more information.
linear_stabiliser_synth (bool) – Flag indicating the usage of the linear stabiliser for the synthesis.
ups_k_size (int) – Upsampling kernel size for the transposed convolutions. See the upsampling documentation for more information.
ups_preconcat_k_size (int) – Upsampling kernel size for the pre-concatenation convolutions. See the upsampling documentation for more information.
ifce_resolution (Optional[Tuple[int,int]]) – Lowest and highest base-two downsampling of the latent using the IFCEs. E.g., (0, 2) means latents between downsampling 1/2^0 and 1/2^2. Set to None to disable.
output_feature_ifce (int) – Number of output features of the IFCEs. Ignored if ifce_resolution is None.
spatial_context_arm (int) – Number of spatial contexts for the ARM.
linear_stabiliser_arm (bool) – Flag indicating the usage of the linear stabiliser for the ARM.
n_hidden_layers_arm (int) – Number of hidden layers in the ARM. Set to zero for a linear ARM.
latent_resolution (Tuple[int,int]) – Lowest and highest base-two downsampling of the latent grids. E.g., (0, 4) means 5 latent grids from downsampling 1/2^0 to 1/2^4.
hyper_latent_resolution (Optional[Tuple[int,int]]) – Identical to latent_resolution but for the hyperlatents, i.e., additional latent grids which are used only for the entropy modeling and not by the synthesis. Set to None to disable.
flag_common_randomness (bool) – with resolution identical to the latent_resolution parameters.
img_size (Tuple[int,int]) – Height and width \((H, W)\) of the frame to be coded.
encoder_gain (int) – Multiply the latent by this value before quantization. Defaults to 16.
final_upsampling_type (Literal["nearest", "bilinear", "bicubic"]) – If the biggest latent grid is smaller than the input image, upsample it to the image size using the specified filter.
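To illustrate how latent_resolution and img_size interact, the sketch below lists the spatial size of each latent grid. The helper name latent_grid_shapes is hypothetical, and rounding up when the image size is not divisible by 2^i is an assumption made for this example, not something stated in this documentation.

```python
import math
from typing import List, Tuple


def latent_grid_shapes(
    img_size: Tuple[int, int], latent_resolution: Tuple[int, int]
) -> List[Tuple[int, int]]:
    """Return (height, width) of each latent grid, from 1/2^lo to 1/2^hi.

    Hypothetical helper: the ceiling used for non-divisible sizes is an assumption.
    """
    height, width = img_size
    lowest, highest = latent_resolution
    return [
        (math.ceil(height / 2**i), math.ceil(width / 2**i))
        for i in range(lowest, highest + 1)
    ]


# (0, 4) gives 5 grids, from full resolution down to 1/16 in each dimension.
shapes = latent_grid_shapes((720, 1280), (0, 4))
for level, shape in enumerate(shapes):
    print(f"downsampling 1/2^{level}: {shape}")
```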
- class CoolChicEncoder
CoolChicEncoder for a single frame.
- __init__(param)
Instantiate a Cool-chic encoder for one frame.
- Parameters:
param (CoolChicEncoderParameter) – Architecture of the CoolChicEncoder. See the documentation of CoolChicEncoderParameter for more information.
- forward(quantizer_noise_type='gaussian', quantizer_type='softround', soft_round_temperature=tensor(0.3500), noise_parameter=tensor(0.2200), AC_MAX_VAL=-1, flag_additional_outputs=False, no_common_randomness=False, only_common_randomness=False)
Perform the CoolChicEncoder forward pass, to be used during training. The main steps are as follows:
Scale & quantize the encoder-side latent \(\mathbf{y}\) to get the decoder-side latent
\[\hat{\mathbf{y}} = \mathrm{Q}(\Gamma_{enc}\ \mathbf{y}),\]
with \(\Gamma_{enc} \in \mathbb{R}\) a scalar encoder gain defined in self.param.encoder_gain and \(\mathrm{Q}\) the quantization operation.
Measure the rate of the decoder-side latent with the ARM and IFCE:
\[\mathrm{R}(\hat{\mathbf{y}}) = -\log_2 p_{\psi}(\hat{\mathbf{y}}),\]
where \(p_{\psi}\) is given by the Auto-Regressive Module (ARM).
Upsample and synthesize the latent to get the output
\[\hat{\mathbf{x}} = f_{\theta}(f_{\upsilon}(\hat{\mathbf{y}})),\]
with \(f_{\upsilon}\) the Upsampling and \(f_{\theta}\) the Synthesis.
- Parameters:
quantizer_noise_type (Literal['kumaraswamy', 'gaussian', 'none']) – Defaults to "gaussian".
quantizer_type (Literal['softround_alone', 'softround', 'hardround', 'ste', 'none']) – Defaults to "softround".
soft_round_temperature (Tensor | None) – Soft round temperature. This is used for the softround modes as well as the ste mode to simulate the derivative in the backward pass. Defaults to 0.35.
noise_parameter (Tensor | None) – Noise distribution parameter. Defaults to 0.22.
AC_MAX_VAL (int) – If different from -1, clamp the value to be in \([-AC\_MAX\_VAL; AC\_MAX\_VAL + 1]\) to write the actual bitstream. Defaults to -1.
flag_additional_outputs (bool) – True to fill CoolChicEncoderOutput['additional_data'] with many different quantities which can be used to analyze Cool-chic behavior. Defaults to False.
no_common_randomness (bool)
only_common_randomness (bool)
- Returns:
Output of Cool-chic training forward pass.
- Return type:
CoolChicEncoderOutput
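The first step of the forward pass (scale, then quantize) can be sketched on plain Python floats. This is a minimal illustration assuming hard rounding for \(\mathrm{Q}\); the function name scale_and_quantize is hypothetical, and the training-time quantizers listed above (softround, ste, additive noise) are not reproduced here.

```python
import math
from typing import List


def scale_and_quantize(y: List[float], encoder_gain: float = 16.0) -> List[int]:
    """Compute Q(gain * y) with Q taken as round-half-up (assumption for this sketch)."""
    # math.floor(x + 0.5) avoids Python's round-half-to-even behaviour.
    return [math.floor(encoder_gain * value + 0.5) for value in y]


# Encoder-side latent values are small; the gain spreads them over more integer bins.
y = [0.1, -0.26, 0.03]
y_hat = scale_and_quantize(y)
print(y_hat)
```

A larger gain gives a finer effective quantization of the encoder-side latent, at the cost of larger integer values to entropy-code.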
- get_quantize_latent(quantizer_noise_type='kumaraswamy', quantizer_type='softround', soft_round_temperature=tensor(0.3000), noise_parameter=tensor(1.), AC_MAX_VAL=-1)
Compute the quantized (i.e., decoder side) latent grids.
Return a list of tensors \([\hat{y}^0, \hat{y}^1, \ldots, \hat{y}^{L-1}]\) where the shape of \(\hat{y}^i\) is \([1, 1, H^i, W^i]\).
- Parameters:
quantizer_noise_type (Literal['kumaraswamy', 'gaussian', 'none']) – Defaults to "kumaraswamy".
quantizer_type (Literal['softround_alone', 'softround', 'hardround', 'ste', 'none']) – Defaults to "softround".
soft_round_temperature (Tensor | None) – Soft round temperature. This is used for the softround modes as well as the ste mode to simulate the derivative in the backward pass. Defaults to 0.3.
noise_parameter (Tensor | None) – Noise distribution parameter. Defaults to 1.0.
AC_MAX_VAL (int) – If different from -1, clamp the value to be in \([-AC\_MAX\_VAL; AC\_MAX\_VAL + 1]\) to write the actual bitstream. Defaults to -1.
- Returns:
Quantized decoder-side latent
- Return type:
List[Tensor]
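The sketch below shows one common soft-rounding formulation controlled by a temperature, followed by the AC_MAX_VAL clamp described above. Whether Cool-chic implements soft rounding in exactly this form is an assumption; the function names softround and clamp_latent are hypothetical and operate on scalars for readability.

```python
import math


def softround(x: float, temperature: float = 0.3) -> float:
    """Differentiable approximation of rounding (assumed formulation).

    Integers and half-integers are fixed points; a smaller temperature
    pulls other values closer to the nearest integer.
    """
    floor_x = math.floor(x)
    delta = x - floor_x - 0.5  # position inside the unit interval, in [-0.5, 0.5)
    scale = math.tanh(0.5 / temperature)
    return floor_x + 0.5 + 0.5 * math.tanh(delta / temperature) / scale


def clamp_latent(value: float, ac_max_val: int = -1) -> float:
    """Clamp to [-AC_MAX_VAL, AC_MAX_VAL + 1], or leave untouched when AC_MAX_VAL is -1."""
    if ac_max_val == -1:
        return value
    return max(-ac_max_val, min(value, ac_max_val + 1))


for x in (2.0, 2.5, 2.9):
    print(x, softround(x), clamp_latent(softround(x), ac_max_val=1))
```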
- get_rate_latent(decoder_side_latent)
Compute the per-element rate of the latent. Also return the expectation and scale. Rate, expectation and scale are flattened one-dimensional tensors.
- Parameters:
decoder_side_latent (List[Tensor]) – List of quantized latent grids.
- Returns:
Rate in bits, expectation and scale, each a 1-dimensional tensor.
- Return type:
Tuple[Tensor, Tensor, Tensor]
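To make the relation between rate, expectation and scale concrete, here is a minimal sketch of \(-\log_2 p(\hat{y})\) for one integer latent value. It assumes a Laplace distribution integrated over the quantization bin \([\hat{y} - 0.5, \hat{y} + 0.5]\); the choice of distribution, the probability floor and the function names are assumptions of this example, not taken from the documentation above.

```python
import math


def laplace_cdf(x: float, expectation: float, scale: float) -> float:
    """Cumulative distribution function of a Laplace(expectation, scale)."""
    z = (x - expectation) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def rate_bits(y_hat: int, expectation: float, scale: float) -> float:
    """Rate in bits of one quantized latent under the assumed Laplace model."""
    proba = laplace_cdf(y_hat + 0.5, expectation, scale) - laplace_cdf(
        y_hat - 0.5, expectation, scale
    )
    # Floor the probability to keep the rate finite (illustrative constant).
    return -math.log2(max(proba, 2.0**-16))


# A value close to the predicted expectation is cheap; an outlier is expensive.
print(rate_bits(0, expectation=0.0, scale=1.0))
print(rate_bits(3, expectation=0.0, scale=1.0))
```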
- get_latent_context(decoder_side_latent)
Return the context for each latent. Also return the flat latent. Flat latent shape is [B], context shape is [B, C] with C the number of contexts per latent. This C-value context is obtained by concatenating the spatial context and the IFCE context (if any).
- Parameters:
decoder_side_latent (List[Tensor]) – List of quantized latent grids.
- Returns:
The context for each latent and the flat latent.
- Return type:
Tuple[Tensor, Tensor]
- get_ifce_output(decoder_side_latent, intermediate_latent_ups)
Forward the quantized latent into the different IFCEs to obtain the inter-feature context.
- Parameters:
decoder_side_latent (List[Tensor]) – List of quantized latent grids
intermediate_latent_ups (List[Tensor]) – List of the intermediate upsampling. At each index of the list, there is a dense tensor [1, C^i, H^i, W^i] representing the concatenation of all already decoded (and upsampled) latent grids.
- Returns:
Inter feature context. Shape is [B, Number of inter-feature context]
- Return type:
Tensor
- get_spatial_context_flat_latent(decoder_side_latent)
Extract the spatial context (causal neighbors). Also return the (flattened) latent to code. Shape of flat_latent is [B], shape of the spatial context is [B, S] with S the number of spatial contexts per latent. spatial_context[i, :] is the context for flat_latent[i].
- Parameters:
decoder_side_latent (List[Tensor]) – List of quantized latent grids
- Returns:
Spatial context, flat latent.
- Return type:
Tuple[Tensor, Tensor]
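The pairing between flat_latent[i] and spatial_context[i, :] can be sketched on a single small grid stored as nested lists. The neighbor offsets, the zero padding outside the grid and the function name spatial_context are assumptions of this example; the actual neighbor pattern used by Cool-chic is not specified here.

```python
from typing import List, Sequence, Tuple


def spatial_context(
    grid: List[List[int]], offsets: Sequence[Tuple[int, int]]
) -> Tuple[List[List[int]], List[int]]:
    """Return (context, flat_latent) in raster-scan order.

    context[i] holds the already-decoded neighbors of flat_latent[i];
    neighbors falling outside the grid are replaced by 0 (assumption).
    """
    height, width = len(grid), len(grid[0])
    context, flat_latent = [], []
    for row in range(height):
        for col in range(width):
            flat_latent.append(grid[row][col])
            neighbors = []
            for d_row, d_col in offsets:
                r, c = row + d_row, col + d_col
                inside = 0 <= r < height and 0 <= c < width
                neighbors.append(grid[r][c] if inside else 0)
            context.append(neighbors)
    return context, flat_latent


# Four causal neighbors: left, top-left, top, top-right (S = 4).
causal_offsets = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]
context, flat_latent = spatial_context([[1, 2], [3, 4]], causal_offsets)
print(flat_latent)
print(context)
```

All offsets point to positions decoded earlier in raster-scan order, which is what makes the context usable at the decoder.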
- discard_hyperlatent(latents)
Given a list of latents, remove those which are hyperlatents (i.e., they are not used to generate the image) and return the remaining latents.
- Parameters:
latents (List[Tensor]) – List of all transmitted latents
- Returns:
List of latents which are not hyperlatent.
- Return type:
List[Tensor]
- rescale_output(syn_out)
Perform a final upsampling (non-learned) so that the synthesis output is resized to self.param.img_size.
- Parameters:
syn_out (Tensor) – Synthesis output, possibly at lower resolution than self.param.img_size
- Returns:
Upscaled output. Shape is [1, C, *self.param.img_size]
- Return type:
Tensor
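As a rough picture of a non-learned final upsampling, the sketch below resizes a single-channel image stored as nested lists using nearest-neighbor interpolation, one of the listed final_upsampling_type options. The function name nearest_resize and the index mapping floor(i * in / out) are assumptions of this example; the library most likely relies on a tensor interpolation routine instead.

```python
from typing import List, Tuple


def nearest_resize(img: List[List[float]], out_size: Tuple[int, int]) -> List[List[float]]:
    """Nearest-neighbor resize of a 2D image to out_size = (height, width)."""
    in_h, in_w = len(img), len(img[0])
    out_h, out_w = out_size
    return [
        # Each output pixel copies the source pixel at the scaled-down index.
        [img[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


low_res = [[1, 2], [3, 4]]
full_res = nearest_resize(low_res, (4, 4))
for row in full_res:
    print(row)
```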
- get_param()
Return a copy of the weights and biases inside the module.
- Returns:
A copy of all weights & biases in the module.
- Return type:
OrderedDict[str, Tensor]
- set_param(param, strict=False)
Replace the current parameters of the module with param.
- Parameters:
param (OrderedDict[str,Tensor]) – Parameters to be set.
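Because get_param returns a copy, a snapshot taken before further training is unaffected by later updates and can be restored with set_param. The toy class below only mimics that behavior with plain lists; ToyModule and its internals are hypothetical stand-ins, not the CoolChicEncoder implementation, and the strict argument is not modeled.

```python
import copy
from collections import OrderedDict


class ToyModule:
    """Hypothetical stand-in exposing get_param / set_param with copy semantics."""

    def __init__(self) -> None:
        self._params = OrderedDict(weight=[1.0, 2.0], bias=[0.0])

    def get_param(self) -> OrderedDict:
        # Deep copy: the caller's snapshot never aliases the live parameters.
        return copy.deepcopy(self._params)

    def set_param(self, param: OrderedDict) -> None:
        self._params = copy.deepcopy(param)


module = ToyModule()
snapshot = module.get_param()          # keep the current best parameters
module._params["weight"][0] = 99.0     # simulate an update that made things worse
module.set_param(snapshot)             # roll back to the snapshot
print(module.get_param())
```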
- initialize_latent_grids()
Initialize the latent grids. The different tensors composing the latent grids must have already been created, e.g. through torch.empty().
- Return type:
None
- reinitialize_parameters()
Reinitialize in place the different parameters of a CoolChicEncoder, namely the latent grids, the arm, the upsampling and the weights.
- get_flops()
Compute the number of MAC & parameters for the model. Update self.total_flops (integer describing the total number of MAC) and self.flops_str, a pretty string allowing to print the model complexity somewhere.
Attention
fvcore measures MAC (multiplication & accumulation) but calls it FLOP (floating point operation)… We do the same here and call everything FLOP even though it would be more accurate to use MAC.
- Return type:
None
- get_network_rate()
Return the rate (in bits) associated with the parameters (weights and biases) of the different modules.
- Returns:
The rate (in bits) associated with the weights and biases of each module. Also return the total rate in bits.
- Return type:
Tuple[DescriptorCoolChic, int]
- get_network_quantization_step()
Return the quantization step associated with the parameters (weights and biases) of the different modules. Those quantization steps can be None if the model has not yet been quantized.
- Returns:
The quantization step associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
- get_network_expgol_count()
Return the Exp-Golomb count parameter associated with the parameters (weights and biases) of the different modules. Those Exp-Golomb parameters can be None if the model has not yet been quantized.
- Returns:
The Exp-Golomb count parameter associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
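The quantization step and the Exp-Golomb count together determine how many bits the network parameters cost. The sketch below quantizes a few weights with a given step and sums the length of their order-k Exp-Golomb codes. Treating the Exp-Golomb count as the order k, the signed-to-unsigned mapping and all function names are assumptions of this example rather than a description of the Cool-chic bitstream.

```python
import math
from typing import Sequence


def to_unsigned(x: int) -> int:
    """Map signed integers 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * x - 1 if x > 0 else -2 * x


def exp_golomb_length(n: int, k: int = 0) -> int:
    """Length in bits of the order-k Exp-Golomb code of a non-negative integer n."""
    quotient = n >> k
    prefix = (quotient + 1).bit_length() - 1  # floor(log2(quotient + 1))
    return 2 * prefix + 1 + k


def weights_rate_bits(weights: Sequence[float], q_step: float, k: int) -> int:
    """Total rate of the weights once quantized with q_step and Exp-Golomb coded."""
    total = 0
    for weight in weights:
        level = math.floor(weight / q_step + 0.5)  # quantization index
        total += exp_golomb_length(to_unsigned(level), k)
    return total


weights = [0.0, 0.5, -0.25]
for step in (0.25, 0.5):
    print(step, weights_rate_bits(weights, q_step=step, k=0))
```

A coarser quantization step lowers the rate of the parameters but degrades their precision, which is why both quantities are tracked per module.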
- str_complexity()
Return a string describing the number of MAC (not MAC per pixel) and the number of parameters for the different modules of CoolChic.
- Returns:
A pretty string about CoolChic complexity.
- Return type:
str
- get_total_mac_per_pixel()
Count the number of Multiplication-Accumulation (MAC) per decoded pixel for this model.
- Returns:
Number of MAC per decoded pixel.
- Return type:
float
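The difference between a total MAC count and a MAC-per-pixel figure is a normalization by the number of decoded pixels. The sketch below counts the MAC of one convolution layer and normalizes it; both helper names are hypothetical, and dividing the total by H × W is an assumption about how the per-pixel figure is obtained.

```python
from typing import Tuple


def conv_mac(kernel_size: int, in_channels: int, out_channels: int,
             img_size: Tuple[int, int]) -> int:
    """MAC count of a stride-1 convolution applied at every pixel of img_size."""
    height, width = img_size
    return kernel_size * kernel_size * in_channels * out_channels * height * width


def mac_per_pixel(total_mac: int, img_size: Tuple[int, int]) -> float:
    """Normalize a total MAC count by the number of decoded pixels (assumption)."""
    height, width = img_size
    return total_mac / (height * width)


img_size = (100, 200)
total = conv_mac(kernel_size=3, in_channels=8, out_channels=8, img_size=img_size)
print(total, mac_per_pixel(total, img_size))
```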