Frame Encoder

class FrameEncoder

A FrameEncoder is the object containing everything required to encode a video frame or an image. It is composed of one or more CoolChicEncoder instances.

__init__(coolchic_enc_param: Dict[Literal['residue', 'motion'], CoolChicEncoderParameter], warp_parameter: WarpParameter | None = None, frame_type: Literal['I', 'P', 'B'] = 'I', frame_data_type: Literal['rgb', 'yuv420', 'yuv444', 'flow'] = 'rgb', bitdepth: Literal[8, 9, 10, 11, 12, 13, 14, 15, 16] = 8, index_references: List[int] = [], frame_display_index: int = 0)

Parameters:

coolchic_enc_param (Dict[Literal['residue', 'motion'], enc.component.coolchic.CoolChicEncoderParameter]) – Parameters for the underlying CoolChicEncoders.
warp_parameter (WarpParameter | None) – Parameters for the Warper. Can be None for an intra frame.
frame_type (Literal['I', 'P', 'B']) – More info in coding_structure.py. Defaults to "I".
frame_data_type (Literal['rgb', 'yuv420', 'yuv444', 'flow']) – More info in coding_structure.py. Defaults to "rgb".
bitdepth (Literal[8, 9, 10, 11, 12, 13, 14, 15, 16]) – More info in coding_structure.py. Defaults to 8.
index_references (List[int]) – List of the display indices of the references. Defaults to [].
frame_display_index (int) – Display index of the frame being encoded. Defaults to 0.

forward(reference_frames: List[Tensor] | None = None, quantizer_noise_type: Literal['kumaraswamy', 'gaussian', 'none'] = 'kumaraswamy', quantizer_type: Literal['softround_alone', 'softround', 'hardround', 'ste', 'none'] = 'softround', soft_round_temperature: Tensor | None = tensor(0.3000), noise_parameter: Tensor | None = tensor(1.), AC_MAX_VAL: int = -1, flag_additional_outputs: bool = False) → FrameEncoderOutput

Perform the entire forward pass of a video frame / image.

1. Simulate Cool-chic decoding to obtain both the decoded image \(\hat{\mathbf{x}}\) as a \((B, 3, H, W)\) tensor and its associated rate \(\mathrm{R}(\hat{\mathbf{x}})\) as an \((N)\) tensor, where \(N\) is the number of latent pixels. The rate is given in bits.
2. Simulate the saving of the image to a file (Optional). Only if the model has been set in test mode, e.g. with self.set_to_eval(). This takes into account that \(\hat{\mathbf{x}}\) is a float Tensor, which will be saved as integer values in a file.

\[\hat{\mathbf{x}}_{saved} = \mathtt{round}(\Delta_q \ \hat{\mathbf{x}}) / \Delta_q, \text{ with } \Delta_q = 2^{bitdepth} - 1\]
3. Downscale to YUV 420 (Optional). Only if the required output format is YUV420. The current output is a dense Tensor. Downscale the last two channels to obtain a YUV420-like representation. This is done with a nearest neighbor downsampling.
4. Clamp the output to be in \([0, 1]\).
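
The optional post-processing steps above (saving at a given bitdepth, YUV420 downscaling, clamping) can be illustrated on plain Python lists. This is only a sketch, not the library's implementation: the helper names are hypothetical, and keeping the top-left sample of each 2x2 block is an assumption about how the nearest neighbor downsampling picks its sample.

```python
from typing import List


def quantize_to_bitdepth(x: float, bitdepth: int = 8) -> float:
    """Emulate writing a float sample as an integer, then reading it back."""
    delta_q = 2 ** bitdepth - 1
    return round(delta_q * x) / delta_q


def downsample_nearest_2x(plane: List[List[float]]) -> List[List[float]]:
    """Keep one sample per 2x2 block (assumed here: the top-left one)."""
    return [row[::2] for row in plane[::2]]


def clamp01(x: float) -> float:
    """Clamp a sample to the [0, 1] range."""
    return min(max(x, 0.0), 1.0)


# Hypothetical 2x4 chroma plane with a few values outside [0, 1].
u_plane = [
    [0.25, 0.20, 1.03, 0.40],
    [0.50, 0.60, 0.70, -0.02],
]

u_saved = [[quantize_to_bitdepth(v, bitdepth=8) for v in row] for row in u_plane]
u_420 = downsample_nearest_2x(u_saved)
u_out = [[clamp01(v) for v in row] for row in u_420]
print(u_out)
```

With an 8-bit depth, each value snaps to the nearest multiple of 1/255 before the plane is reduced to a quarter of its samples and clamped.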

Parameters:

reference_frames (List[Tensor] | None) – List of tensors representing the reference frames. Can be set to None if no reference frame is available. Defaults to None.
quantizer_noise_type (Literal['kumaraswamy', 'gaussian', 'none']) – Defaults to "kumaraswamy".
quantizer_type (Literal['softround_alone', 'softround', 'hardround', 'ste', 'none']) – Defaults to "softround".
soft_round_temperature (Tensor | None) – Soft round temperature. This is used for softround modes as well as the ste mode to simulate the derivative in the backward. Defaults to 0.3.
noise_parameter (Tensor | None) – Noise distribution parameter. Defaults to 1.0.
AC_MAX_VAL (int) – If different from -1, clamp the value to be in \([-AC\_MAX\_VAL; AC\_MAX\_VAL + 1]\) to write the actual bitstream. Defaults to -1.
flag_additional_outputs (bool) – True to fill CoolChicEncoderOutput['additional_data'] with many different quantities which can be used to analyze Cool-chic behavior. Defaults to False.

Returns:

Output of the FrameEncoder for the forward pass.

Return type:

FrameEncoderOutput
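
To give an intuition for soft_round_temperature, here is a scalar sketch of a soft rounding function. It uses the usual tanh-based formulation from the learned compression literature; that the softround and ste quantizer types use exactly this expression is an assumption, and the function name is hypothetical.

```python
import math


def softround(x: float, temperature: float) -> float:
    """Differentiable approximation of rounding (assumed tanh-based form).

    A small temperature approaches hard rounding, a large temperature
    approaches the identity function.
    """
    floor_x = math.floor(x)
    delta = x - floor_x - 0.5  # signed distance to the middle of the unit interval
    scale = math.tanh(0.5 / temperature)
    return floor_x + 0.5 * math.tanh(delta / temperature) / scale + 0.5


for t in (0.01, 0.3, 100.0):
    print(t, softround(1.2, t))
```

With the default temperature of 0.3, an input of 1.2 lands between its hard-rounded value 1.0 and the unchanged value 1.2, which keeps a non-zero gradient during training.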

get_param() → OrderedDict[Literal['residue', 'motion'], Tensor]

Return a copy of the weights and biases inside the module.

Returns: A copy of all weights & biases in the module.
Return type: OrderedDict[NAME_COOLCHIC_ENC, Tensor]

set_param(param: OrderedDict[Literal['residue', 'motion'], Tensor])

Replace the current parameters of the module with param.

Parameters: param (OrderedDict[NAME_COOLCHIC_ENC, Tensor]) – Parameters to be set.

reinitialize_parameters() → None

Reinitialize in place the different parameters of a FrameEncoder.

Return type: None

set_to_train() → None

Set the current model to training mode, in place.

Return type: None

set_to_eval() → None

Set the current model to test mode, in place. This affects latent quantization, forcing it to mode="hardround" in eval mode. For video coding, it also affects the optical flow values, quantizing them at a given subpixel accuracy, defined in self.warp_parameter.fractional_accuracy.

Return type: None
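
As a rough illustration of quantizing a flow value at a given subpixel accuracy, the sketch below snaps a displacement to a grid of 1/steps_per_pixel pixel. Both the helper and the steps_per_pixel argument are hypothetical; how fractional_accuracy is actually encoded (number of steps per pixel, or step size) is not stated here and is an assumption.

```python
def quantize_flow(displacement: float, steps_per_pixel: int) -> float:
    """Snap a displacement (in pixels) to the nearest multiple of 1 / steps_per_pixel."""
    return round(displacement * steps_per_pixel) / steps_per_pixel


# Quarter-pixel accuracy: 1.30 pixels becomes 1.25 pixels.
print(quantize_flow(1.30, steps_per_pixel=4))
```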

set_global_flow(global_flow_1: Tensor, global_flow_2: Tensor) → None

Set the value of the global flows.

The global flows are 2-element tensors. The first one is the horizontal displacement and the second one the vertical displacement.

Parameters:

global_flow_1 (Tensor) – Value of global flow for reference 1. Must have 2 elements.
global_flow_2 (Tensor) – Value of global flow for reference 2. Must have 2 elements.

Return type:

None

get_network_rate() → Tuple[Dict[Literal['residue', 'motion'], DescriptorCoolChic], int]

Return the rate (in bits) associated with the parameters (weights and biases) of the different modules.

Returns: The rate (in bits) associated with the weights and biases of each module of each cool-chic decoder. Also returns the overall rate in bits.
Return type: Tuple[Dict[NAME_COOLCHIC_ENC, DescriptorCoolChic], int]
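
The relation between the two returned values can be sketched with plain dictionaries. The nested layout below (one weight and one bias entry per module) and all the numbers are assumptions made for illustration; they are not produced by the library.

```python
from typing import Dict

# Hypothetical per-module rates in bits, mimicking a
# Dict[NAME_COOLCHIC_ENC, DescriptorCoolChic]-like structure.
rate_per_module: Dict[str, Dict[str, Dict[str, int]]] = {
    "residue": {
        "arm": {"weight": 1200, "bias": 96},
        "upsampling": {"weight": 300, "bias": 0},
        "synthesis": {"weight": 800, "bias": 64},
    },
    "motion": {
        "arm": {"weight": 400, "bias": 32},
        "upsampling": {"weight": 100, "bias": 0},
        "synthesis": {"weight": 200, "bias": 16},
    },
}


def total_rate_bit(rate: Dict[str, Dict[str, Dict[str, int]]]) -> int:
    """Sum the rate of every weight and bias of every module of every decoder."""
    return sum(
        bits
        for coolchic in rate.values()
        for module in coolchic.values()
        for bits in module.values()
    )


print(total_rate_bit(rate_per_module))
```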

get_network_quantization_step() → Dict[Literal['residue', 'motion'], DescriptorCoolChic]

Return the quantization step associated with the parameters (weights and biases) of the different modules of each cool-chic decoder. These quantization steps can be None if the model has not yet been quantized.

E.g. {"residue": {"arm": 4, "upsampling": 12, "synthesis": 1}}

Returns: The quantization step associated with the weights and biases of each module of each cool-chic decoder.
Return type: Dict[NAME_COOLCHIC_ENC, DescriptorCoolChic]

get_network_expgol_count() → Dict[Literal['residue', 'motion'], DescriptorCoolChic]

Return the Exp-Golomb count parameter associated with the parameters (weights and biases) of the different modules of each cool-chic decoder. These Exp-Golomb parameters can be None if the model has not yet been quantized.

E.g. {"residue": {"arm": 4, "upsampling": 12, "synthesis": 1}}

Returns: The Exp-Golomb count parameter associated with the weights and biases of each module of each cool-chic decoder.
Return type: Dict[NAME_COOLCHIC_ENC, DescriptorCoolChic]
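
To show what an Exp-Golomb count parameter controls, the sketch below computes the length in bits of a textbook order-k Exp-Golomb code for a non-negative integer. It is a generic illustration: how the encoder maps signed quantized weights to non-negative integers, and its exact bitstream syntax, are not covered here, and the function name is hypothetical.

```python
def expgolomb_length(n: int, k: int) -> int:
    """Length in bits of the order-k Exp-Golomb code of a non-negative integer n."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k  # part coded with an order-0 Exp-Golomb code
    prefix_len = (q + 1).bit_length() - 1  # floor(log2(q + 1))
    # Order-0 code of q takes 2 * prefix_len + 1 bits, followed by k suffix bits.
    return 2 * prefix_len + 1 + k


for k in (0, 2):
    print(k, [expgolomb_length(n, k) for n in (0, 1, 2, 9)])
```

A larger count spends more bits on small values and fewer on large ones, so the best value depends on the spread of the quantized parameters.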

get_total_mac_per_pixel() → float

Count the number of Multiplication-Accumulation (MAC) operations per decoded pixel for this model.

Returns: Number of MAC per decoded pixel.
Return type: float
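
A MAC-per-pixel figure can be estimated by hand for a stack of convolution layers. The sketch below assumes stride-1 layers that all run at the output resolution and ignores biases; the layer sizes and the helper name are hypothetical and do not describe an actual Cool-chic architecture.

```python
from typing import List, Tuple


def conv_mac_per_pixel(in_channels: int, out_channels: int, kernel_size: int = 1) -> int:
    """MAC needed to produce one output pixel of a stride-1 convolution layer."""
    return in_channels * out_channels * kernel_size ** 2


# Hypothetical layers as (in_channels, out_channels, kernel_size).
layers: List[Tuple[int, int, int]] = [(7, 16, 1), (16, 3, 1), (3, 3, 3)]

total_mac = sum(conv_mac_per_pixel(i, o, k) for i, o, k in layers)
print(total_mac)
```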

get_warp_mac_per_pixel() → float

Compute the MAC per pixel of a warping. The formula is derived from the paper "Efficient Sub-pixel Motion Compensation in Learned Video Codecs" by Ladune et al.

Coefficients are assumed to be pre-computed. Warping is applied on blocks.

Returns: MAC per pixel of the warping.
Return type: float

to_device(device: Literal['cpu', 'cuda:0']) → None

Push a model to a given device.

Parameters: device (Literal['cpu', 'cuda:0']) – The device on which the model should run.
Return type: None

save(path_file: str, frame_encoder_manager: FrameEncoderManager | None = None) → None

Save the FrameEncoder to the file path_file. Optionally save a frame_encoder_manager alongside the current frame encoder to keep track of the training time, the recorded losses, etc.

Parameters:

path_file (str) – Where to save the FrameEncoder.
frame_encoder_manager (FrameEncoderManager | None) – Contains (among other things) the rate constraint \(\lambda\) and the description of the warm-up preset. It is also used to track the total encoding time and encoding iterations.

Return type:

None

pretty_string(print_detailed_archi: bool = False) → str

Get a pretty string representing the architectures of the different CoolChicEncoder composing the current FrameEncoder.

Parameters: print_detailed_archi (bool) – True to print the detailed decoder architecture.
Returns: A pretty string ready to be printed out.
Return type: str

pretty_string_param() → str

Get a pretty string representing the parameters of the different CoolChicEncoderParameters parameterising the current FrameEncoder.

Return type: str

class FrameEncoderOutput

Dataclass representing the output of FrameEncoder forward.

load_frame_encoder(path_file: str) → Tuple[FrameEncoder, FrameEncoderManager | None]

Load and return a FrameEncoder from the file at path_file, together with its FrameEncoderManager if one was saved alongside it.

Parameters: path_file (str) – Path of the FrameEncoder to be loaded.
Returns: Tuple with a FrameEncoder loaded by the function and an optional FrameEncoderManager.
Return type: Tuple[FrameEncoder, FrameEncoderManager | None]