Coding Structure

class CodingStructure

Dataclass representing the organization of the video i.e. which frames are coded using which references.

A few examples:

# A low-delay P configuration
# I0 ---> P1 ---> P2 ---> P3 ---> P4 ---> P5 ---> P6 ---> P7 ---> P8
intra_period=8 p_period=1

# A hierarchical Random Access configuration
# I0 -----------------------------------------------------> P8
# \-------------------------> B4 <-------------------------/
#  \----------> B2 <---------/ \----------> B6 <----------/
#   \--> B1 <--/ \--> B3 <--/   \--> B5 <--/  \--> B7 <--/
intra_period=8 p_period=8

# There is no longer a prediction from I0 to P8. Instead, the GOP is split in
# half so that no inter frame has a reference further away than --p_period

# I0 -----------------------> P4 ------------------------> P8
#  \----------> B2 <---------/ \----------> B6 <----------/
#   \--> B1 <--/ \--> B3 <--/   \--> B5 <--/  \--> B7 <--/
intra_period=8 p_period=4

A coding structure is composed of a few hyper-parameters and, most importantly, a list of Frame objects describing the frames to code.

Parameters:
  • intra_period (int) – Number of inter frames in the GOP. The first (intra) frames of two successive GOPs are therefore separated by intra_period inter frames. Set this to 0 for all-intra coding.

  • p_period (int) – Distance to the furthest P prediction in the GOP. Set this to 1 for low-delay P or to intra_period for the usual random access configuration.

  • seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".

frames: List[Frame]

All the frames to code, deduced from the GOP type, intra period and P period. Frames are indexed in display order (i.e. temporal order): frames[0] is the first frame and frames[-1] is the last one.

compute_gop(intra_period: int, p_period: int) -> List[Frame]

Return a list of frames with one intra followed by intra_period inter frames. The relation between the inter frames is implied by p_period. See examples in the class description.

Parameters:
  • intra_period (int) – Number of inter frames in the GOP.

  • p_period (int) – Distance between I0 and the first P frame or between subsequent P-frames.

Returns:

List describing the frames to code.

Return type:

List[Frame]
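The three configurations shown in the class description can be reproduced with a short standalone sketch. This is not the library's implementation: SketchFrame and sketch_gop are hypothetical names, and the sketch assumes that p_period is a power of two dividing intra_period and that each B-frame sits halfway between its two references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame, keeping only what the sketch needs."""
    display_order: int
    depth: int
    index_references: List[int] = field(default_factory=list)


def sketch_gop(intra_period: int, p_period: int) -> List[SketchFrame]:
    """Build one GOP: an intra frame followed by intra_period inter frames."""
    # Assumption: p_period is a power of two that divides intra_period.
    frames = [SketchFrame(display_order=0, depth=0)]
    if intra_period == 0:
        return frames  # all-intra coding: the GOP is a single I-frame

    def add_b_frames(left: int, right: int, depth: int) -> None:
        # Recursively place a B-frame halfway between two coded frames.
        if right - left < 2:
            return
        mid = (left + right) // 2
        frames.append(SketchFrame(mid, depth, [left, right]))
        add_b_frames(left, mid, depth + 1)
        add_b_frames(mid, right, depth + 1)

    for p_idx in range(p_period, intra_period + 1, p_period):
        prev = p_idx - p_period
        # Each P-frame is predicted from the I- or P-frame p_period earlier.
        frames.append(SketchFrame(p_idx, 1, [prev]))
        add_b_frames(prev, p_idx, depth=2)

    # Return the frames in display order, as the frames attribute does.
    return sorted(frames, key=lambda f: f.display_order)


for frame in sketch_gop(intra_period=8, p_period=4):
    print(frame.display_order, frame.depth, frame.index_references)
```

With p_period=1 the recursion never adds a B-frame, which gives the low-delay P chain; with p_period equal to intra_period it gives the full hierarchical random access structure.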

pretty_string() -> str

Return a pretty string formatting the data within the class.

Return type:

str

get_number_of_frames() -> int

Return the number of frames in the coding structure.

Returns:

Number of frames in the coding structure.

Return type:

int

get_max_depth() -> int

Return the maximum depth of a coding configuration.

Returns:

Maximum depth of the coding configuration

Return type:

int

get_all_frames_of_depth(depth: int) -> List[Frame]

Return a list with all the frames for a given depth.

Parameters:

depth (int) – Depth for which we want the frames.

Returns:

List of frames with the given depth

Return type:

List[Frame]

get_max_coding_order() -> int

Return the maximum coding order of a coding configuration.

Returns:

Maximum coding order of the coding configuration

Return type:

int

get_frame_from_coding_order(coding_order: int) -> Frame | None

Return the frame whose coding order is equal to coding_order. Return None if no frame has been found.

Parameters:

coding_order (int) – Coding order for which we want the frame.

Returns:

Frame whose coding order is equal to coding_order.

Return type:

Frame | None

get_max_display_order() -> int

Return the maximum display order of a coding configuration.

Returns:

Maximum display order of the coding configuration

Return type:

int

get_frame_from_display_order(display_order: int) -> Frame | None

Return the frame whose display order is equal to display_order. Return None if no frame has been found.

Parameters:

display_order (int) – Display order for which we want the frame.

Returns:

Frame whose display order is equal to display_order.

Return type:

Frame | None
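The two lookups above differ only in the field they match on. The sketch below illustrates that on a three-frame GOP; SketchFrame and both helper functions are hypothetical names, not the library's code. The coding order used here is one valid choice, relying only on the fact that a B-frame must be coded after both of its references.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame with only the two order fields."""
    coding_order: int
    display_order: int


def sketch_from_coding_order(
    frames: List[SketchFrame], coding_order: int
) -> Optional[SketchFrame]:
    # Return the first match, or None when no frame has this coding order.
    return next((f for f in frames if f.coding_order == coding_order), None)


def sketch_from_display_order(
    frames: List[SketchFrame], display_order: int
) -> Optional[SketchFrame]:
    # Same search, but matching on the display (temporal) position.
    return next((f for f in frames if f.display_order == display_order), None)


# I0, B1, P2 listed in display order. B1 needs I0 and P2 as references,
# so it is coded last: the coding order is I0, P2, B1.
frames = [
    SketchFrame(coding_order=0, display_order=0),  # I0
    SketchFrame(coding_order=2, display_order=1),  # B1
    SketchFrame(coding_order=1, display_order=2),  # P2
]

print(sketch_from_coding_order(frames, 1))   # the P-frame, displayed third
print(sketch_from_display_order(frames, 1))  # the B-frame, coded last
print(sketch_from_coding_order(frames, 5))   # no such frame
```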

set_encoded_flag(coding_order: int, flag_value: bool) -> None

Set the already_encoded flag of the frame whose coding order is coding_order to the value flag_value.

Parameters:
  • coding_order (int) – Coding order of the frame for which we’ll change the flag

  • flag_value (bool) – Value to be set

Return type:

None

unload_all_decoded_data() -> None

Remove the decoded data from memory. This is used before saving the coding structure. The decoded data can be recovered by running inference with the trained model again.

Return type:

None

unload_all_original_frames() -> None

Remove the original frames from memory. This is used before saving the coding structure. The original frames can be recovered by reloading the sequence.

Return type:

None

unload_all_references_data() -> None

Remove the data of all the references from memory. This is used before saving the coding structure. The reference data can be recovered by running inference with the trained model again.

Return type:

None

get_frame_depth_in_gop(idx_frame: int) -> int

Return the depth of the frame with index idx_frame within a hierarchical GOP.

Some notes:
  • idx_frame == 0 always corresponds to an intra frame i.e. depth = 0

  • idx_frame == p_period is the P-frame i.e. depth = 1

  • This should be used separately for the successive chained GOPs.

Parameters:
  • idx_frame (int) – Display order of the frame in the GOP.

  • p_period – P-period. Should be a power of two.

Returns:

Depth of the frame in the GOP.

Return type:

int
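The notes above, together with the hierarchical examples in the class description, suggest the following standalone sketch of a depth computation. It is an illustration under stated assumptions, not the library's code: sketch_frame_depth is a hypothetical name that takes p_period as an explicit argument, and each B-frame is assumed to sit at the midpoint of an interval that is halved at every level.

```python
def sketch_frame_depth(idx_frame: int, p_period: int) -> int:
    """Depth of a frame inside one hierarchical GOP of length p_period."""
    # Assumption: p_period is a power of two and 0 <= idx_frame <= p_period.
    if p_period < 1 or (p_period & (p_period - 1)) != 0:
        raise ValueError("p_period must be a power of two")
    if not 0 <= idx_frame <= p_period:
        raise ValueError("idx_frame must lie within the GOP")

    if idx_frame == 0:
        return 0  # intra frame
    if idx_frame == p_period:
        return 1  # P-frame closing the GOP

    # B-frames: halve the interval until idx_frame is its midpoint.
    left, right, depth = 0, p_period, 2
    while (left + right) // 2 != idx_frame:
        mid = (left + right) // 2
        if idx_frame < mid:
            right = mid
        else:
            left = mid
        depth += 1
    return depth


print([sketch_frame_depth(i, 8) for i in range(9)])
```

For a sequence made of several chained GOPs, the function would be called with the frame index relative to the start of its own GOP.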

class Frame

Dataclass representing a frame to be encoded. It contains useful info like the display & coding indices, the indices of its references as well as the data of the decoded references and the original (i.e. uncompressed) frame.

Parameters:
  • coding_order (int) – Frame with coding_order=0 is coded first.

  • display_order (int) – Frame with display_order=0 is displayed first.

  • depth (int) – Depth of the frame in the GOP. 0 for Intra, 1 for P-frame, 2 or more for B-frames. Roughly corresponds to the notion of temporal layers in conventional codecs. Defaults to 0.

  • seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".

  • data (Optional[FrameData]) – Data of the uncompressed image to be coded. Defaults to None.

  • already_encoded (bool) – True if the frame has already been coded by the VideoEncoder. Defaults to False.

  • index_references (List[int]) – Index of the frame(s) used as references, in display_order. Leave empty when no references are available, i.e. for an I-frame. Defaults to [].

  • ref_data (List[FrameData]) – The actual data describing the decoded references. Leave empty when no references are available, i.e. for an I-frame. Defaults to [].

frame_type: Literal['I', 'P', 'B']

Automatically set from the number of entries in self.index_references.
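A plausible mapping from the number of references to the frame type is sketched below. The function name sketch_frame_type is hypothetical, and the mapping (no reference for I, one for P, two for B) is an assumption based on the usual meaning of these frame types rather than on the library's source.

```python
from typing import List, Literal


def sketch_frame_type(index_references: List[int]) -> Literal["I", "P", "B"]:
    """Infer the frame type from how many references the frame uses."""
    n_refs = len(index_references)
    if n_refs == 0:
        return "I"  # intra frame: coded without any reference
    if n_refs == 1:
        return "P"  # predicted from a single reference
    if n_refs == 2:
        return "B"  # predicted from two references
    raise ValueError(f"Unsupported number of references: {n_refs}")


print(sketch_frame_type([]), sketch_frame_type([0]), sketch_frame_type([0, 8]))
```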

set_frame_data(data: Tensor | DictTensorYUV, frame_data_type: Literal['rgb', 'yuv420', 'yuv444'], bitdepth: Literal[8, 10]) -> None

Set the data representing the frame i.e. create the FrameData object describing the actual frame.

Parameters:
  • data (Tensor | DictTensorYUV) – RGB or YUV value of the frame.

  • frame_data_type (Literal['rgb', 'yuv420', 'yuv444']) – Data type.

  • bitdepth (Literal[8, 10]) – Bitdepth.

Return type:

None

set_decoded_data(decoded_data: FrameData) -> None

Set the data representing the decoded frame.

Parameters:

decoded_data (FrameData) – Data of the decoded frame.

Return type:

None

set_refs_data(refs_data: List[FrameData]) -> None

Set the data representing the reference(s).

Parameters:

refs_data (List[FrameData]) – Data of the reference(s)

Return type:

None

upsample_reference_to_444() -> None

Upsample the references from 420 to 444 in place. Do nothing if they are already in 444.

Return type:

None

to_device(device: Literal['cpu', 'cuda:0']) -> None

Push the data attribute to the relevant device in place.

Parameters:

device (Literal['cpu', 'cuda:0']) – The device on which the model should run.

Return type:

None

class FrameData

FrameData is a dataclass storing the actual pixel values of a frame along with some additional information about its size, bitdepth and color space.

Parameters:
  • bitdepth (POSSIBLE_BITDEPTH) – Bitdepth, either "8" or "10".

  • frame_data_type (FRAME_DATA_TYPE) – Data type, either "rgb", "yuv420" or "yuv444".

  • data (Union[Tensor, DictTensorYUV]) – The actual RGB or YUV data.

img_size: Tuple[int, int]

Height & width of the video \((H, W)\)

n_pixels: int

Number of pixels \(H \times W\)

to_device(device: Literal['cpu', 'cuda:0']) -> None

Push the data attribute to the relevant device in place.

Parameters:

device (Literal['cpu', 'cuda:0']) – The device on which the model should run.

Return type:

None

class DictTensorYUV

TypedDict representing a YUV420 frame.

Hint

torch.jit requires I/O of modules to be either Tensor, List or Dict. So we don’t use a python dataclass here and rely on TypedDict instead.

Parameters:
  • y (Tensor) – \(([B, 1, H, W])\).

  • u (Tensor) – \(([B, 1, \frac{H}{2}, \frac{W}{2}])\).

  • v (Tensor) – \(([B, 1, \frac{H}{2}, \frac{W}{2}])\).

yuv_dict_to_device(yuv: DictTensorYUV, device: Literal['cpu', 'cuda:0']) -> DictTensorYUV

Send a DictTensorYUV to a device.

Parameters:
  • yuv (DictTensorYUV) – Data to be sent to a device.

  • device (Literal['cpu', 'cuda:0']) – The requested device

Returns:

Data on the appropriate device.

Return type:

DictTensorYUV

convert_444_to_420(yuv444: Tensor) -> DictTensorYUV

From a 4D YUV 444 tensor \((B, 3, H, W)\), return a DictTensorYUV. The U and V tensors are downsampled using nearest neighbor downsampling.

Parameters:

yuv444 (Tensor) – YUV444 data \((B, 3, H, W)\)

Returns:

YUV420 dictionary of 4D tensors

Return type:

DictTensorYUV
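The chroma downsampling described above can be illustrated without any tensor library. The sketch below works on plain nested lists for a single image and uses the hypothetical names Plane and sketch_444_to_420. It assumes that nearest neighbor downsampling keeps the top-left sample of each 2x2 block; the library itself operates on 4D tensors.

```python
from typing import Dict, List

Plane = List[List[int]]  # one channel stored as rows of samples


def sketch_444_to_420(y: Plane, u: Plane, v: Plane) -> Dict[str, Plane]:
    """Keep luma as is and halve both chroma planes in each dimension."""

    def downsample(plane: Plane) -> Plane:
        # Assumption: keep every other row and every other column,
        # i.e. the top-left sample of each 2x2 block.
        return [row[::2] for row in plane[::2]]

    return {"y": y, "u": downsample(u), "v": downsample(v)}


u_plane = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
yuv420 = sketch_444_to_420(y=u_plane, u=u_plane, v=u_plane)
print(yuv420["u"])
```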

convert_420_to_444(yuv420: DictTensorYUV) -> Tensor

Convert a DictTensorYUV to a 4D tensor \((B, 3, H, W)\). The U and V tensors are upsampled using nearest neighbor upsampling.

Parameters:

yuv420 (DictTensorYUV) – YUV420 dictionary of 4D tensors

Returns:

YUV444 Tensor \((B, 3, H, W)\)

Return type:

Tensor
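Nearest neighbor upsampling by a factor of two simply repeats each chroma sample over a 2x2 block. The sketch below shows this on a plain nested list for a single chroma plane; Plane and sketch_upsample_chroma are hypothetical names, and the library itself operates on 4D tensors.

```python
from typing import List

Plane = List[List[int]]  # one channel stored as rows of samples


def sketch_upsample_chroma(plane: Plane) -> Plane:
    """Double the height and width of a plane by repeating each sample."""
    upsampled: Plane = []
    for row in plane:
        # Repeat every sample twice along the width...
        wide_row = [sample for sample in row for _ in range(2)]
        # ...then repeat the widened row twice along the height.
        upsampled.append(wide_row)
        upsampled.append(list(wide_row))
    return upsampled


print(sketch_upsample_chroma([[1, 2], [3, 4]]))
```

After this step the U and V planes have the same size as the Y plane, so the three planes can be stacked along the channel dimension.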