Coding Structure

class CodingStructure

Dataclass representing the organization of the video, i.e. which frames are coded using which references.

A few examples:

# A low-delay P configuration
# I0
# \------> P1
#             \-------> P2
#                         \------> P3
#                                     \-------> P4
--n_frames=5 --intra_pos=0 --p_pos=1-4

# A hierarchical Random Access configuration, with a closed GOP
# I0
# \-------------------------------------------------------------------------------------> P8
# \----------------------------------------> B4 <----------------------------------------/
# \-----------------> B2 <------------------/  \------------------> B6 <-----------------/
# \------> B1 <------/  \-------> B3 <------/  \------> B5 <-------/  \------> B7 <------/
--n_frames=9 --intra_pos=0 --p_pos=-1

# A hierarchical Random Access configuration, with an open GOP
# I0                                                                                      I8
# \----------------------------------------> B4 <----------------------------------------/
# \-----------------> B2 <------------------/  \------------------> B6 <-----------------/
# \------> B1 <------/  \-------> B3 <------/  \------> B5 <-------/  \------> B7 <------/
--n_frames=9 --intra_pos=0,-1

# Or some very peculiar structures...
# I0
#   \---------------------------------------------------------------> P6
#   \-----------------------------> B3 <-----------------------------/  \-----------------> P8
#   \------> B1 <------------------/  \------> B4 <------------------/  \------> B7 <------/
#              \------> B2 <-------/             \------> B5 <-------/
--n_frames=9 --intra_pos=0 --p_pos=6,8
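Judging from the examples, the position flags seem to accept comma-separated indices, inclusive ranges such as 1-4, and negative indices counted from the end (-1 being the last frame). The sketch below is a hypothetical parser, parse_positions, written under exactly those assumptions; it is not the tool's actual argument parser.

```python
import re
from typing import List


def parse_positions(spec: str, n_frames: int) -> List[int]:
    """Hypothetical parser for position flags such as "1-4", "6,8" or "0,-1"."""
    positions: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if re.fullmatch(r"-?\d+", token):
            # Single index. Assumption: a negative value counts from the end (-1 = last frame).
            idx = int(token)
            positions.append(idx + n_frames if idx < 0 else idx)
        elif (m := re.fullmatch(r"(\d+)-(\d+)", token)) is not None:
            # Assumption: inclusive range, e.g. "1-4" -> 1, 2, 3, 4.
            positions.extend(range(int(m.group(1)), int(m.group(2)) + 1))
        else:
            raise ValueError(f"Cannot parse position {token!r}")
    out_of_range = [p for p in positions if not 0 <= p < n_frames]
    if out_of_range:
        raise ValueError(f"Positions {out_of_range} are outside [0, {n_frames - 1}]")
    return sorted(set(positions))
```

For instance, parse_positions("0,-1", 9) returns [0, 8], the two intra positions of the open-GOP example. Sorting and de-duplicating the result keeps downstream code free to assume strictly increasing display indices.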

A coding structure is composed of a few hyper-parameters and, most importantly, a list of Frame objects describing the different frames to code.

Parameters:
  • n_frames (int) – Number of frames in the coding structure

  • frame_offset (int) – Shift the position of the 0-th frame of the video. If frame_offset=15, the first 15 frames of the video are skipped, i.e. display index 0 corresponds to the 16th frame.

  • intra_pos (List[int]) – Position of all the intra frames in display order

  • p_pos (List[int]) – Position of all the P frames in display order

  • seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".
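Assuming both display indices and video frame indices are 0-based, frame_offset simply shifts one onto the other: with frame_offset=15, display index 0 is video index 15, i.e. the 16th frame. A minimal sketch, where the helper name covered_video_frames is hypothetical:

```python
def covered_video_frames(n_frames: int, frame_offset: int = 0) -> range:
    """Hypothetical helper: 0-based indices, in the source video, of the frames
    covered by a coding structure whose display orders are 0 .. n_frames - 1."""
    return range(frame_offset, frame_offset + n_frames)
```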

frames: List[Frame]

All the frames to code, deduced from n_frames, intra_pos and p_pos. Frames are indexed in display order (i.e. temporal order): frames[0] is the first frame, while frames[-1] is the last one.

compute_coding_struct(n_frames: int, intra_pos: List[int], p_pos: List[int]) → List[Frame]

Construct a coding structure with n_frames frames. The algorithm works as follows.

Step 1:

Position all the intra frames at the display indices listed in intra_pos.

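Step 2:

Position all the P-frames at the display indices listed in p_pos. A P-frame uses the closest frame in the past as its reference. Since B-frames are only added in Step 3, this reference is always an I- or P-frame.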

Step 3:

Automatically fill the remaining frames with hierarchical B-frames. This is achieved by repeatedly iterating over the list of frames and, each time there is a gap between two already-added frames, inserting a B-frame in the middle of that gap. The new B-frame uses the two frames bounding the gap as references. For instance:

frames = [I0, P4]

==> [I0, B2, P4]          # Fill the middle frame
==> [I0, B1, B2, P4]      # Fill the middle frame
==> [I0, B1, B2, B3, P4]  # Fill the middle frame
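The three steps can be sketched in plain Python as follows. This is a simplified illustration, not the library's compute_coding_struct: SketchFrame and sketch_coding_struct are hypothetical names, and three details are assumptions not stated above: negative positions count from the end, the coding order is simply the order in which frames are positioned, and a B-frame's depth is one more than the deepest of its two references, with a minimum of 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchFrame:
    """Hypothetical, simplified stand-in for Frame (no pixel data)."""
    display_order: int
    coding_order: int
    depth: int
    index_references: List[int] = field(default_factory=list)

    @property
    def frame_type(self) -> str:
        # 0 reference -> I, 1 reference -> P, 2 references -> B.
        return "IPB"[len(self.index_references)]


def sketch_coding_struct(
    n_frames: int, intra_pos: List[int], p_pos: List[int]
) -> List[SketchFrame]:
    def resolve(positions: List[int]) -> List[int]:
        # Assumption: negative positions count from the end (-1 is the last frame).
        resolved = sorted(p + n_frames if p < 0 else p for p in positions)
        if any(not 0 <= p < n_frames for p in resolved):
            raise ValueError(f"Positions must lie in [0, {n_frames - 1}]")
        return resolved

    frames: Dict[int, SketchFrame] = {}  # display_order -> frame

    def add(display_order: int, depth: int, refs: List[int]) -> None:
        # Assumption: coding order = order in which frames are positioned.
        frames[display_order] = SketchFrame(display_order, len(frames), depth, refs)

    # Step 1: intra frames, without any reference.
    for pos in resolve(intra_pos):
        add(pos, depth=0, refs=[])

    # Step 2: each P-frame references the closest frame already positioned in the past.
    for pos in resolve(p_pos):
        past = [d for d in frames if d < pos]
        if not past:
            raise ValueError(f"P-frame {pos} has no frame in the past to reference")
        add(pos, depth=1, refs=[max(past)])

    # Step 3: insert a B-frame in the middle of every gap until no gap remains.
    while len(frames) < n_frames:
        placed = sorted(frames)
        added = False
        for left, right in zip(placed, placed[1:]):
            if right - left > 1:
                # Assumption: depth = deepest reference + 1, never below 2.
                depth = max(frames[left].depth, frames[right].depth, 1) + 1
                add((left + right) // 2, depth=depth, refs=[left, right])
                added = True
        if not added:
            raise ValueError("Frames outside the first/last I- or P-frame cannot be filled")

    return [frames[d] for d in sorted(frames)]
```

With n_frames=9, intra_pos=[0] and p_pos=[-1], the frame at display index 4 references frames 0 and 8, matching B4 in the closed-GOP diagram. Unlike the step-by-step example above, this sketch fills every gap of one pass before starting the next, which yields the same final structure.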

Parameters:
  • n_frames (int) – Number of frames in the coding structure

  • intra_pos (List[int]) – Position of all the intra frames in display order

  • p_pos (List[int]) – Position of all the P frames in display order

Returns:

List of all the frames within the coding structure.

Return type:

List[Frame]

pretty_structure_diagram() → str

Return a diagram presenting the coding structure, for instance:

I0 -----------------------------------------------------> P8
\-------------------------> B4 <-------------------------/
 \----------> B2 <---------/ \----------> B6 <----------/
  \--> B1 <--/ \--> B3 <--/   \--> B5 <--/  \--> B7 <--/

Returns:

A string describing the coding structure, ready to be printed.

Return type:

str

pretty_string(print_detailed_struct: bool = False) → str

Return a pretty string formatting the data within the class.

Parameters:

print_detailed_struct (bool) – True to print the detailed coding structure

Returns:

A pretty string, ready to be printed.

Return type:

str

get_number_of_frames() → int

Return the number of frames in the coding structure.

Returns:

Number of frames in the coding structure.

Return type:

int

get_max_depth() → int

Return the maximum depth of the coding configuration.

Returns:

Maximum depth of the coding configuration

Return type:

int

get_all_frames_of_depth(depth: int) → List[Frame]

Return a list with all the frames of a given depth.

Parameters:

depth (int) – Depth for which we want the frames.

Returns:

List of frames with the given depth

Return type:

List[Frame]

get_max_coding_order() → int

Return the maximum coding order of the coding configuration.

Returns:

Maximum coding order of the coding configuration

Return type:

int

get_frame_from_coding_order(coding_order: int) → Frame | None

Return the frame whose coding order is equal to coding_order. Return None if no frame has been found.

Parameters:

coding_order (int) – Coding order for which we want the frame.

Returns:

Frame whose coding order is equal to coding_order.

Return type:

Frame | None

get_max_display_order() → int

Return the maximum display order of the coding configuration.

Returns:

Maximum display order of the coding configuration

Return type:

int

get_frame_from_display_order(display_order: int) → Frame | None

Return the frame whose display order is equal to display_order. Return None if no frame has been found.

Parameters:

display_order (int) – Display order for which we want the frame.

Returns:

Frame whose display order is equal to display_order.

Return type:

Frame | None

get_all_frames_using_one_ref(display_order_ref: int) → List[Frame]

Return a list of the frames using the frame with display order display_order_ref as a reference.

Parameters:

display_order_ref (int) – Display order of the frame that is used as reference

Returns:

List of frames using one given frame as a reference.

Return type:

List[Frame]
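The query methods above boil down to simple filters over the list of frames. The sketch below re-expresses them as free functions over a hypothetical MiniFrame dataclass that keeps only the fields they need; it illustrates the documented behaviour and is not the library's code. The coding orders in the example GOP are hand-picked so that every reference is coded before the frames using it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniFrame:
    """Hypothetical stand-in for Frame, keeping only what the queries need."""
    display_order: int
    coding_order: int
    depth: int
    index_references: List[int] = field(default_factory=list)


def get_max_depth(frames: List[MiniFrame]) -> int:
    return max(f.depth for f in frames)


def get_all_frames_of_depth(frames: List[MiniFrame], depth: int) -> List[MiniFrame]:
    return [f for f in frames if f.depth == depth]


def get_max_coding_order(frames: List[MiniFrame]) -> int:
    return max(f.coding_order for f in frames)


def get_frame_from_coding_order(frames: List[MiniFrame], coding_order: int) -> Optional[MiniFrame]:
    # First match, or None when no frame has this coding order.
    return next((f for f in frames if f.coding_order == coding_order), None)


def get_frame_from_display_order(frames: List[MiniFrame], display_order: int) -> Optional[MiniFrame]:
    return next((f for f in frames if f.display_order == display_order), None)


def get_all_frames_using_one_ref(frames: List[MiniFrame], display_order_ref: int) -> List[MiniFrame]:
    # index_references holds display orders, so compare against display_order_ref.
    return [f for f in frames if display_order_ref in f.index_references]


# Hand-written closed GOP of 5 frames, stored in display order: I0 B1 B2 B3 P4.
frames = [
    MiniFrame(display_order=0, coding_order=0, depth=0),
    MiniFrame(display_order=1, coding_order=3, depth=3, index_references=[0, 2]),
    MiniFrame(display_order=2, coding_order=2, depth=2, index_references=[0, 4]),
    MiniFrame(display_order=3, coding_order=4, depth=3, index_references=[2, 4]),
    MiniFrame(display_order=4, coding_order=1, depth=1, index_references=[0]),
]
```

Linear scans are enough here: a coding structure holds only a handful of frames, so building index dictionaries would add bookkeeping without a measurable gain.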

class Frame

Dataclass representing a frame to be encoded. It contains useful information such as the display and coding indices, the indices of its references, the data of the decoded references and the original (i.e. uncompressed) frame.

Parameters:
  • coding_order (int) – Frame with coding_order=0 is coded first.

  • display_order (int) – Frame with display_order=0 is displayed first.

  • frame_offset (int) – Shift the position of the 0-th frame of the video. If frame_offset=15, the first 15 frames of the video are skipped, i.e. display index 0 corresponds to the 16th frame. This is only used to load the data and for logging purposes. Defaults to 0.

  • depth (int) – Depth of the frame in the GOP. 0 for Intra, 1 for P-frame, 2 or more for B-frames. Roughly corresponds to the notion of temporal layers in conventional codecs. Defaults to 0.

  • seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".

  • data (Optional[FrameData]) – Data of the uncompressed image to be coded. Defaults to None.

  • index_references (List[int]) – Indices, in display order, of the frame(s) used as references. Leave empty when no reference is available, i.e. for I-frames. Defaults to [].

  • ref_data (List[FrameData]) – The actual data describing the decoded references. Leave empty when no reference is available, i.e. for I-frames. Defaults to [].

frame_type: Literal['I', 'P', 'B']

Automatically set from the number of entries in self.index_references: 0 for an I-frame, 1 for a P-frame and 2 for a B-frame.
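One way to derive frame_type automatically is a dataclass __post_init__ hook that maps 0, 1 or 2 references to I, P or B. The class FrameSketch below is hypothetical and keeps only the fields needed for the illustration; the actual Frame may implement this differently.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class FrameSketch:
    """Hypothetical, trimmed-down Frame used only to illustrate frame_type."""
    coding_order: int
    display_order: int
    index_references: List[int] = field(default_factory=list)
    # Not an __init__ argument: derived in __post_init__.
    frame_type: Literal["I", "P", "B"] = field(init=False)

    def __post_init__(self) -> None:
        n_refs = len(self.index_references)
        if n_refs > 2:
            raise ValueError(f"At most 2 references expected, got {n_refs}")
        # 0 reference -> I, 1 reference -> P, 2 references -> B.
        self.frame_type = ("I", "P", "B")[n_refs]
```

Declaring frame_type with field(init=False) keeps it out of the constructor, so it cannot be set inconsistently with index_references.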

set_frame_data(data: FrameData) → None

Set the data representing the frame, i.e. the FrameData object describing the actual frame.

Parameters:

data (FrameData) – FrameData object representing the frame.

Return type:

None

set_refs_data(refs_data: List[FrameData]) → None

Set the data representing the reference(s).

Parameters:

refs_data (List[FrameData]) – Data of the reference(s)

Return type:

None

pretty_string(show_header: bool = False, show_bottom_line: bool = False) → str

Return a string describing the frame.

Parameters:
  • show_header (bool) – Also print the column names. Defaults to False.

  • show_bottom_line (bool) – Print a line below the frame description to close the array. Defaults to False.

Returns:

Pretty string describing the frame

Return type:

str

class FrameData

FrameData is a dataclass storing the actual pixel values of a frame, along with some additional information about its size, bitdepth and color space.

Parameters:
  • bitdepth – Bitdepth, should be in [8, 9, 10, 11, 12, 13, 14, 15, 16].

  • frame_data_type – Data type, either "rgb", "yuv420" or "yuv444".

  • data – The actual RGB or YUV data

data: Any

Union[Tensor, DictTensorYUV]

img_size: Tuple[int, int]

Height & width of the video \((H, W)\)

n_pixels: int

Number of pixels \(H \times W\)
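For a given frame_data_type, the pixel planes implied by img_size follow the usual conventions: RGB and YUV 4:4:4 carry three full-resolution planes, while YUV 4:2:0 subsamples both chroma planes by two horizontally and vertically. The sketch below computes these plane shapes and the largest sample value allowed by a given bitdepth. The helper names are hypothetical, even H and W are assumed for 4:2:0, and nothing here describes how FrameData actually lays out its tensors.

```python
from typing import List, Tuple

VALID_BITDEPTHS = range(8, 17)  # 8 to 16 inclusive, as listed for bitdepth


def plane_shapes(img_size: Tuple[int, int], frame_data_type: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: (height, width) of each of the three color planes."""
    h, w = img_size
    if frame_data_type in ("rgb", "yuv444"):
        return [(h, w)] * 3
    if frame_data_type == "yuv420":
        # Assumption: even H and W, chroma subsampled by 2 in both directions.
        if h % 2 or w % 2:
            raise ValueError("This yuv420 sketch assumes an even height and width")
        return [(h, w), (h // 2, w // 2), (h // 2, w // 2)]
    raise ValueError(f"Unknown frame_data_type {frame_data_type!r}")


def max_sample_value(bitdepth: int) -> int:
    """Largest integer sample value representable with the given bitdepth."""
    if bitdepth not in VALID_BITDEPTHS:
        raise ValueError(f"bitdepth must be in [8, 16], got {bitdepth}")
    return 2 ** bitdepth - 1


img_size = (720, 1280)
n_pixels = img_size[0] * img_size[1]  # H x W, as described for n_pixels
```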

to_string() → str

Pretty string describing the frame data.

Return type:

str