Cool-chic Encoder¶
- class CoolChicEncoder[source]¶
CoolChicEncoder for a single frame.
- __init__(param: CoolChicEncoderParameter)[source]¶
Instantiate a cool-chic encoder for one frame.
- Parameters:
  - param (CoolChicEncoderParameter) – Architecture of the CoolChicEncoder. See the documentation of CoolChicEncoderParameter for more information.
- forward(
- quantizer_noise_type: Literal['kumaraswamy', 'gaussian', 'none'] = 'kumaraswamy',
- quantizer_type: Literal['softround_alone', 'softround', 'hardround', 'ste', 'none'] = 'softround',
- soft_round_temperature: float | None = 0.3,
- noise_parameter: float | None = 1.0,
- AC_MAX_VAL: int = -1,
- flag_additional_outputs: bool = False,
) → CoolChicEncoderOutput[source]¶
Perform the CoolChicEncoder forward pass, to be used during training. The main steps are as follows:
1. Scale & quantize the encoder-side latent \(\mathbf{y}\) to get the decoder-side latent
\[\hat{\mathbf{y}} = \mathrm{Q}(\Gamma_{enc}\ \mathbf{y}),\]
with \(\Gamma_{enc} \in \mathbb{R}\) a scalar encoder gain defined in self.param.encoder_gains and \(\mathrm{Q}\) the quantization operation.
2. Measure the rate of the decoder-side latent with the ARM:
\[\mathrm{R}(\hat{\mathbf{y}}) = -\log_2 p_{\psi}(\hat{\mathbf{y}}),\]
where \(p_{\psi}\) is given by the Auto-Regressive Module (ARM).
3. Upsample and synthesize the latent to get the output
\[\hat{\mathbf{x}} = f_{\theta}(f_{\upsilon}(\hat{\mathbf{y}})),\]
with \(f_{\upsilon}\) the Upsampling and \(f_{\theta}\) the Synthesis.
- Parameters:
  - quantizer_noise_type (Literal['kumaraswamy', 'gaussian', 'none']) – Defaults to "kumaraswamy".
  - quantizer_type (Literal['softround_alone', 'softround', 'hardround', 'ste', 'none']) – Defaults to "softround".
  - soft_round_temperature (float | None) – Soft round temperature. This is used for the softround modes as well as the ste mode to simulate the derivative in the backward pass. Defaults to 0.3.
  - noise_parameter (float | None) – Noise distribution parameter. Defaults to 1.0.
  - AC_MAX_VAL (int) – If different from -1, clamp the value to be in \([-AC\_MAX\_VAL; AC\_MAX\_VAL + 1]\) to write the actual bitstream. Defaults to -1.
  - flag_additional_outputs (bool) – True to fill CoolChicEncoderOutput['additional_data'] with many different quantities which can be used to analyze Cool-chic behavior. Defaults to False.
- Returns:
Output of Cool-chic training forward pass.
- Return type:
CoolChicEncoderOutput
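To make the arguments above concrete, here is a minimal sketch that calls forward() with a training-style configuration and with one that simulates true quantization. Only the argument names and their defaults come from this page; the encoder instance passed in and the hardround/no-noise pairing used for the evaluation-style pass are assumptions.

```python
from typing import Tuple

def run_train_and_eval_passes(encoder) -> Tuple[dict, dict]:
    """Hypothetical sketch: call forward() with a training-style and an
    evaluation-style quantizer configuration. The hardround / no-noise
    pairing for evaluation is an assumption based on the option names."""
    # Training-style pass: noisy, differentiable quantization proxy.
    out_train = encoder.forward(
        quantizer_noise_type="kumaraswamy",
        quantizer_type="softround",
        soft_round_temperature=0.3,
        noise_parameter=1.0,
    )
    # Evaluation-style pass: true rounding, no simulated noise.
    out_eval = encoder.forward(
        quantizer_noise_type="none",
        quantizer_type="hardround",
        flag_additional_outputs=True,  # also fill 'additional_data' for analysis
    )
    return out_train, out_eval
```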
- get_param() → OrderedDict[str, Tensor][source]¶
Return a copy of the weights and biases inside the module.
- Returns:
A copy of all weights & biases in the module.
- Return type:
OrderedDict[str, Tensor]
- set_param(param: OrderedDict[str, Tensor])[source]¶
Replace the current parameters of the module with param.
- Parameters:
  - param (OrderedDict[str, Tensor]) – Parameters to be set.
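Because get_param() returns a copy, the two methods can be paired to snapshot the encoder before an experiment and roll it back afterwards. The helper below is a hypothetical sketch, not part of the library.

```python
from typing import Callable

def run_and_restore(encoder, trial_fn: Callable) -> None:
    """Hypothetical helper (not part of the library): run trial_fn(encoder),
    then put back the parameters that were saved beforehand."""
    snapshot = encoder.get_param()   # OrderedDict[str, Tensor], a *copy* of weights & biases
    try:
        trial_fn(encoder)            # e.g. a few training steps with a candidate setting
    finally:
        encoder.set_param(snapshot)  # roll back to the saved parameters
```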
- initialize_latent_grids() → None[source]¶
Initialize the latent grids. The different tensors composing the latent grids must have already been created, e.g. through torch.empty().
- Return type:
None
- reinitialize_parameters()[source]¶
Reinitialize in place the different parameters of a CoolChicEncoder, namely the latent grids, the ARM, the upsampling and the synthesis weights.
- get_flops() → None[source]¶
Compute the number of MACs & parameters for the model. Update self.total_flops (integer describing the total number of MACs) and self.flops_str, a pretty string for printing the model complexity somewhere.
Attention
fvcore measures MACs (multiplications & accumulations) but calls them FLOPs (floating point operations)… We do the same here and call everything FLOP, even though it would be more accurate to use MAC.
- Return type:
None
- get_network_rate() → DescriptorCoolChic[source]¶
Return the rate (in bits) associated with the parameters (weights and biases) of the different modules.
- Returns:
The rate (in bits) associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
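As a small illustration, the per-module rates could be summed into the total network overhead in bits. The nested module → {"weight", "bias"} layout assumed for DescriptorCoolChic in this sketch is an assumption, not something documented on this page.

```python
def total_network_rate_bits(encoder) -> float:
    """Hypothetical sketch: sum the rate of every module's weights and biases.
    Assumes DescriptorCoolChic behaves like a nested mapping of
    module name -> {"weight": bits, "bias": bits}."""
    rate = encoder.get_network_rate()
    return sum(
        bits
        for per_module in rate.values()
        for bits in per_module.values()
        if bits is not None
    )
```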
- get_network_quantization_step() → DescriptorCoolChic[source]¶
Return the quantization step associated with the parameters (weights and biases) of the different modules. Those quantization steps can be None if the model has not yet been quantized.
- Returns:
The quantization step associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
- get_network_expgol_count() → DescriptorCoolChic[source]¶
Return the Exp-Golomb count parameter associated with the parameters (weights and biases) of the different modules. Those count parameters can be None if the model has not yet been quantized.
- Returns:
The Exp-Golomb count parameter associated with the weights and biases of each module.
- Return type:
DescriptorCoolChic
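Following the same assumed descriptor layout, the None entries mentioned above give a hypothetical way to check whether the network has already been quantized:

```python
def network_is_quantized(encoder) -> bool:
    """Hypothetical sketch: the model counts as quantized once no module
    reports a None quantization step (same assumed nested-mapping layout
    for DescriptorCoolChic as above)."""
    q_step = encoder.get_network_quantization_step()
    return all(
        step is not None
        for per_module in q_step.values()
        for step in per_module.values()
    )
```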
- str_complexity() → str[source]¶
Return a string describing the number of MACs (not MACs per pixel) and the number of parameters for the different modules of CoolChic.
- Returns:
A pretty string about CoolChic complexity.
- Return type:
str
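For instance, the complexity helpers could be used as follows; whether str_complexity() requires a prior call to get_flops() is an assumption of this sketch.

```python
# Hypothetical usage sketch; `encoder` is an existing CoolChicEncoder instance.
encoder.get_flops()               # fills encoder.total_flops and encoder.flops_str
print(encoder.total_flops)        # total number of MACs (reported as "FLOPs")
print(encoder.flops_str)          # pretty complexity string
print(encoder.str_complexity())   # per-module MACs and parameter counts
```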
- class CoolChicEncoderParameter[source]¶
Dataclass storing the parameters of a CoolChicEncoder.
- Parameters:
  - img_size (Tuple[int, int]) – Height and width \((H, W)\) of the frame to be coded.
  - layers_synthesis (List[str]) – Describes the architecture of the synthesis transform. See the synthesis documentation for more information.
  - n_ft_per_res (List[int]) – Number of latent features for each latent resolution, i.e. n_ft_per_res[i] gives the number of channels \(C_i\) of the latent with resolution \(\frac{H}{2^i}, \frac{W}{2^i}\).
  - dim_arm (int, Optional) – Number of context pixels for the ARM. Also corresponds to the ARM hidden layer width. See the ARM documentation for more information. Defaults to 24.
  - n_hidden_layers_arm (int, Optional) – Number of hidden layers in the ARM. Set n_hidden_layers_arm = 0 for a linear ARM. Defaults to 2.
  - upsampling_kernel_size (int, Optional) – Kernel size for the upsampler. See the upsampling documentation for more information. Defaults to 8.
  - static_upsampling_kernel (bool, Optional) – Set this flag to True to prevent learning the upsampling kernel. Defaults to False.
  - encoder_gain (int, Optional) – Multiply the latent by this value before quantization. See the documentation of the Cool-chic forward pass. Defaults to 16.
- latent_n_grids: int¶
Automatically computed; number of different latent resolutions.
- img_size: Tuple[int, int] | None = None¶
Height and width \((H, W)\) of the frame to be coded. Must be set using the set_image_size() function.
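Putting the fields together, a parameter object and an encoder might be built roughly as follows. The import path, the synthesis layer strings, the number of latent resolutions and the exact signature of set_image_size() are assumptions of the sketch; only the field names and their defaults come from this page.

```python
# Hypothetical sketch: import path, synthesis strings and set_image_size()
# signature are assumptions; field names and defaults come from this page.
from coolchic_encoder import CoolChicEncoder, CoolChicEncoderParameter  # assumed path

param = CoolChicEncoderParameter(
    layers_synthesis=["40-1-linear-relu", "3-1-linear-none"],  # assumed string format
    n_ft_per_res=[1, 1, 1, 1, 1, 1, 1],   # assumed: 7 latent resolutions, 1 channel each
    dim_arm=24,                            # context pixels / ARM hidden width
    n_hidden_layers_arm=2,
    upsampling_kernel_size=8,
    static_upsampling_kernel=False,
    encoder_gain=16,
)
param.set_image_size((256, 256))           # (H, W) of the frame to be coded
encoder = CoolChicEncoder(param)
```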
- class CoolChicEncoderOutput[source]¶
TypedDict representing the output of the CoolChicEncoder forward pass.
- Parameters:
  - raw_out (Tensor) – Output of the synthesis. Shape is \([B, C, H, W]\).
  - rate (Tensor) – Rate associated with each latent (in bits). Shape is \((N)\), with \(N\) the total number of latent variables.
  - additional_data (Dict[str, Any]) – Any other data required to compute some logs, stored inside a dictionary.