Coding Structure

class CodingStructure

Dataclass representing the organization of the video, i.e. which frames are coded using which references.

A few examples:

# A low-delay P configuration
# I0
# \------> P1
#             \-------> P2
#                         \------> P3
#                                     \-------> P4
--n_frames=5 --intra_pos=0 --p_pos=1-4

# A hierarchical Random Access configuration, with a closed GOP
# I0
# \-------------------------------------------------------------------------------------> P8
# \----------------------------------------> B4 <----------------------------------------/
# \-----------------> B2 <------------------/  \------------------> B6 <-----------------/
# \------> B1 <------/  \-------> B3 <------/  \------> B5 <-------/  \------> B7 <------/
--n_frames=9 --intra_pos=0 --p_pos=-1

# A hierarchical Random Access configuration, with an open GOP
# I0                                                                                      I8
# \----------------------------------------> B4 <----------------------------------------/
# \-----------------> B2 <------------------/  \------------------> B6 <-----------------/
# \------> B1 <------/  \-------> B3 <------/  \------> B5 <-------/  \------> B7 <------/
--n_frames=9 --intra_pos=0,-1

# Or some very peculiar structures...
# I0
#   \---------------------------------------------------------------> P6
#   \-----------------------------> B3 <-----------------------------/  \-----------------> P8
#   \------> B1 <------------------/  \------> B4 <------------------/  \------> B7 <------/
#              \------> B2 <-------/             \------> B5 <-------/
--n_frames=9 --intra_pos=0 --p_pos=6,8
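The position lists in these examples mix single indices (0), comma-separated lists (6,8), inclusive ranges (1-4) and negative indices counted from the end (-1 for the last frame). The hypothetical helper parse_positions below is a minimal sketch of how such a specification could be expanded into display indices. It is not the library's own argument parser, and the range and negative-index conventions are assumptions read off the examples above.

```python
from typing import List


def parse_positions(spec: str, n_frames: int) -> List[int]:
    """Hypothetical parser for position lists such as "1-4", "0,-1" or "6,8".

    Assumptions (not taken from the library): tokens are comma-separated,
    "a-b" is an inclusive range and a negative value counts from the end,
    so -1 designates the last frame (n_frames - 1).
    """
    positions: List[int] = []
    # filter(None, ...) drops empty tokens, e.g. when spec is ""
    for token in filter(None, (t.strip() for t in spec.split(","))):
        if token.startswith("-"):
            # Negative index, Python-style: -1 -> n_frames - 1
            positions.append(n_frames + int(token))
        elif "-" in token:
            # Inclusive range "a-b"
            start, end = (int(x) for x in token.split("-"))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(token))
    return sorted(set(positions))
```

With this convention, parse_positions("0,-1", n_frames) returns the first and last display indices, which is what the open GOP example relies on.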

A coding structure is composed of a few hyper-parameters and, most importantly, a list of Frame objects describing the different frames to code.

Parameters:

n_frames (int) – Number of frames in the coding structure
frame_offset (int) – Shift the position of the 0-th frame of the video. If frame_offset=15, the first 15 frames of the video are skipped, i.e. display index 0 corresponds to the 16th frame.
intra_pos (List[int]) – Position of all the intra frames in display order
p_pos (List[int]) – Position of all the P frames in display order
seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".

frames: List[Frame]: All the frames to code, deduced from n_frames, intra_pos and p_pos (see compute_coding_struct). Frames are indexed in display order (i.e. temporal order): frames[0] is the first frame, while frames[-1] is the last one.

compute_coding_struct(n_frames: int, intra_pos: List[int], p_pos: List[int]) → List[Frame]

Construct a coding structure of n_frames frames. The algorithm works as follows.

Step 1:

Position all the intra frames following intra_pos.

Step 2:

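Position all the P frames following p_pos. A P-frame uses the closest frame in the past as its reference. Since B-frames are only added in Step 3, this reference is the closest preceding I or P frame in display order.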
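Step 2 can be sketched as follows. p_frame_references is a hypothetical helper, not part of the library; it assumes that intra_pos and p_pos already hold non-negative display indices.

```python
from typing import Dict, List


def p_frame_references(intra_pos: List[int], p_pos: List[int]) -> Dict[int, int]:
    """Map each P-frame display index to the display index of its reference.

    Only I and P frames exist at this step, so the reference is the
    closest I or P frame located before the P-frame in display order.
    """
    anchors = sorted(set(intra_pos) | set(p_pos))
    refs: Dict[int, int] = {}
    for p in sorted(p_pos):
        past = [a for a in anchors if a < p]
        if not past:
            raise ValueError(f"P-frame {p} has no preceding I or P frame")
        refs[p] = max(past)  # closest frame in the past
    return refs
```

On the low-delay P example, each P-frame references its immediate predecessor. On the last example, P6 references I0 and P8 references P6.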

Step 3:

Automatically fill the remaining frames with hierarchical B-frames. This is achieved by iterating over the list of frames and, each time there is a gap between two already placed frames, inserting a B-frame in the middle of this gap. For instance:

frames = [I0, P4]
==> [I0, B2, P4]          # Fill the middle of the gap between I0 and P4
==> [I0, B1, B2, P4]      # Fill the middle of the gap between I0 and B2
==> [I0, B1, B2, B3, P4]  # Fill the middle of the gap between B2 and P4
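Step 3 can be sketched as below. fill_b_frames is hypothetical and assumes that each inserted B-frame takes the two frames bounding its gap as references, as in the diagrams of the class description, and that the middle of a gap is rounded down. Frames located after the last I or P frame are not handled by this sketch.

```python
from typing import Dict, List, Tuple


def fill_b_frames(anchors: List[int]) -> Dict[int, Tuple[int, int]]:
    """Insert hierarchical B-frames between already positioned frames.

    anchors: display indices of the I and P frames placed in Steps 1 and 2.
    Returns {B-frame display index: (past reference, future reference)},
    with keys in insertion order.
    """
    placed = sorted(anchors)
    b_refs: Dict[int, Tuple[int, int]] = {}
    while True:
        # Look for the first gap between two consecutive placed frames
        gap = next(((a, b) for a, b in zip(placed, placed[1:]) if b - a > 1), None)
        if gap is None:
            return b_refs
        left, right = gap
        middle = (left + right) // 2  # fill the middle frame, rounding down
        b_refs[middle] = (left, right)
        placed = sorted(placed + [middle])
```

On anchors [0, 4], it inserts B2, B1 then B3, reproducing the example above. On anchors [0, 6, 8], it yields the B-frames of the last diagram in the class description.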

Parameters:

n_frames (int) – Number of frames in the coding structure
intra_pos (List[int]) – Position of all the intra frames in display order
p_pos (List[int]) – Position of all the P frames in display order

Returns:

List of all the frames within the coding structure.

Return type:

List[Frame]

pretty_structure_diagram() → str

Return a diagram presenting the coding structure, for instance:

I0 -----------------------------------------------------> P8
\-------------------------> B4 <-------------------------/
 \----------> B2 <---------/ \----------> B6 <----------/
  \--> B1 <--/ \--> B3 <--/   \--> B5 <--/  \--> B7 <--/

Returns: A string describing the coding structure, ready to be printed.
Return type: str

pretty_string(print_detailed_struct: bool = False) → str

Return a pretty string formatting the data within the class.

Parameters: print_detailed_struct (bool) – True to print the detailed coding structure.
Returns: A pretty string ready to be printed out.
Return type: str

get_number_of_frames() → int

Return the number of frames in the coding structure.

Returns: Number of frames in the coding structure.
Return type: int

get_max_depth() → int

Return the maximum depth of a coding configuration.

Returns: Maximum depth of the coding configuration.
Return type: int

get_all_frames_of_depth(depth: int) → List[Frame]

Return a list with all the frames of a given depth.

Parameters: depth (int) – Depth for which we want the frames.
Returns: List of frames with the given depth.
Return type: List[Frame]
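Depth queries reduce to simple filters over the list of frames. The sketch below uses MiniFrame, a hypothetical stand-in for Frame holding only the fields needed here; it is not the library's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MiniFrame:
    """Hypothetical stand-in for Frame, keeping only the fields used here."""
    display_order: int
    depth: int


def get_max_depth(frames: List[MiniFrame]) -> int:
    # Largest depth among all the frames of the coding structure
    return max(f.depth for f in frames)


def get_all_frames_of_depth(frames: List[MiniFrame], depth: int) -> List[MiniFrame]:
    # Frames are stored in display order, so the result is in display order too
    return [f for f in frames if f.depth == depth]


# Low-delay P example: I0 has depth 0, P1 to P4 have depth 1
low_delay = [MiniFrame(0, 0)] + [MiniFrame(i, 1) for i in range(1, 5)]
print(get_max_depth(low_delay))  # 1
print([f.display_order for f in get_all_frames_of_depth(low_delay, 1)])  # [1, 2, 3, 4]
```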

get_max_coding_order() → int

Return the maximum coding order of a coding configuration.

Returns: Maximum coding order of the coding configuration.
Return type: int

get_frame_from_coding_order(coding_order: int) → Frame | None

Return the frame whose coding order is equal to coding_order. Return None if no frame has been found.

Parameters: coding_order (int) – Coding order for which we want the frame.
Returns: Frame whose coding order is equal to coding_order, or None.
Return type: Frame | None

get_max_display_order() → int

Return the maximum display order of a coding configuration.

Returns: Maximum display order of the coding configuration.
Return type: int

get_frame_from_display_order(display_order: int) → Frame | None

Return the frame whose display order is equal to display_order. Return None if no frame has been found.

Parameters: display_order (int) – Display order for which we want the frame.
Returns: Frame whose display order is equal to display_order, or None.
Return type: Frame | None

get_all_frames_using_one_ref(display_order_ref: int) → List[Frame]

Return a list of the frames using the frame whose display order is display_order_ref as a reference.

Parameters: display_order_ref (int) – Display order of the frame that is used as reference.
Returns: List of frames using this frame as a reference.
Return type: List[Frame]
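The lookup methods of CodingStructure can be sketched as linear scans over the frames. MiniFrame is a hypothetical stand-in for Frame, and the coding orders in the usage lines are illustrative values, not the output of compute_coding_struct.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniFrame:
    """Hypothetical stand-in for Frame, keeping only the fields used here."""
    display_order: int
    coding_order: int
    index_references: List[int] = field(default_factory=list)


def get_frame_from_display_order(frames: List[MiniFrame], display_order: int) -> Optional[MiniFrame]:
    # next() with a default returns None when no frame matches
    return next((f for f in frames if f.display_order == display_order), None)


def get_frame_from_coding_order(frames: List[MiniFrame], coding_order: int) -> Optional[MiniFrame]:
    return next((f for f in frames if f.coding_order == coding_order), None)


def get_all_frames_using_one_ref(frames: List[MiniFrame], display_order_ref: int) -> List[MiniFrame]:
    # A frame uses the reference if its display order appears in index_references
    return [f for f in frames if display_order_ref in f.index_references]


# Illustrative 5-frame random access structure, stored in display order
frames = [
    MiniFrame(0, 0),          # I0
    MiniFrame(1, 3, [0, 2]),  # B1
    MiniFrame(2, 2, [0, 4]),  # B2
    MiniFrame(3, 4, [2, 4]),  # B3
    MiniFrame(4, 1, [0]),     # P4
]
print(get_frame_from_coding_order(frames, 1).display_order)  # 4
print(get_frame_from_display_order(frames, 7))  # None
print([f.display_order for f in get_all_frames_using_one_ref(frames, 2)])  # [1, 3]
```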

class Frame

Dataclass representing a frame to be encoded. It contains useful information such as the display and coding indices, the indices of its references, as well as the data of the decoded references and of the original (i.e. uncompressed) frame.

Parameters:

coding_order (int) – Frame with coding_order=0 is coded first.
display_order (int) – Frame with display_order=0 is displayed first.
frame_offset (int) – Shift the position of the 0-th frame of the video. If frame_offset=15, the first 15 frames of the video are skipped, i.e. display index 0 corresponds to the 16th frame. This is only used to load the data and for logging purposes. Defaults to 0.
depth (int) – Depth of the frame in the GOP. 0 for Intra, 1 for P-frame, 2 or more for B-frames. Roughly corresponds to the notion of temporal layers in conventional codecs. Defaults to 0.
seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".
data (Optional[FrameData]) – Data of the uncompressed image to be coded. Defaults to None.
index_references (List[int]) – Index of the frame(s) used as references, in display order. Leave empty when no reference is available, i.e. for I-frames. Defaults to [].
ref_data (List[FrameData]) – The actual data describing the decoded references. Leave empty when no reference is available, i.e. for I-frames. Defaults to [].

frame_type: Literal['I', 'P', 'B']: Automatically set from the number of entries in self.index_references (none for an I-frame, one for a P-frame, two for a B-frame).
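The rule behind frame_type can be sketched with a trimmed-down dataclass. FrameSketch is hypothetical and only illustrates deriving the type in __post_init__ from len(index_references); the actual Frame class may implement it differently.

```python
from dataclasses import dataclass, field
from typing import List, Literal

FrameType = Literal["I", "P", "B"]


@dataclass
class FrameSketch:
    """Hypothetical, trimmed-down Frame illustrating how frame_type can be derived."""
    coding_order: int
    display_order: int
    index_references: List[int] = field(default_factory=list)
    # Not an __init__ argument: always recomputed in __post_init__
    frame_type: FrameType = field(init=False, default="I")

    def __post_init__(self) -> None:
        # Number of references -> frame type: 0 -> I, 1 -> P, 2 -> B
        types = {0: "I", 1: "P", 2: "B"}
        n_refs = len(self.index_references)
        if n_refs not in types:
            raise ValueError(f"Unsupported number of references: {n_refs}")
        self.frame_type = types[n_refs]


print(FrameSketch(coding_order=2, display_order=2, index_references=[0, 4]).frame_type)  # B
```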

set_frame_data(data: FrameData) → None

Set the data representing the frame, i.e. the FrameData object describing the actual frame.

Parameters: data (FrameData) – FrameData object representing the frame.
Return type: None

set_refs_data(refs_data: List[FrameData]) → None

Set the data representing the reference(s).

Parameters: refs_data (List[FrameData]) – Data of the reference(s).
Return type: None

pretty_string(show_header: bool = False, show_bottom_line: bool = False) → str

Return a string describing the frame.

Parameters:

show_header (bool) – Also print the column names. Defaults to False.
show_bottom_line (bool) – Print a line below the frame description to close the array. Defaults to False.

Returns:

Pretty string describing the frame

Return type:

str

class FrameData

FrameData is a dataclass storing the actual pixel values of a frame and some additional information about its size, bitdepth and color space.

Parameters:

bitdepth – Bitdepth, should be in [8, 9, 10, 11, 12, 13, 14, 15, 16].
frame_data_type – Data type, either "rgb", "yuv420" or "yuv444".
data – The actual RGB or YUV data

data: Any: Union[Tensor, DictTensorYUV]

img_size: Tuple[int, int]: Height & width of the video (H, W)

n_pixels: int: Number of pixels H × W

to_string() → str

Pretty string describing the frame data.

Return type: str
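The relation between the stored data, img_size and n_pixels can be sketched without any tensor library. FrameDataSketch and its luma_shape field are hypothetical: luma_shape stands for the shape of the RGB tensor or of the Y plane, assumed to end with (H, W). The chroma_size helper relies on the usual 4:2:0 convention of halving both dimensions of the chroma planes.

```python
from dataclasses import dataclass, field
from typing import Literal, Tuple

FrameDataType = Literal["rgb", "yuv420", "yuv444"]


@dataclass
class FrameDataSketch:
    """Hypothetical stand-in for FrameData that keeps only shape information."""
    bitdepth: int
    frame_data_type: FrameDataType
    luma_shape: Tuple[int, ...]  # assumed: last two dimensions are (H, W)
    img_size: Tuple[int, int] = field(init=False, default=(0, 0))
    n_pixels: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        if not 8 <= self.bitdepth <= 16:
            raise ValueError(f"bitdepth must be in [8, 16], got {self.bitdepth}")
        if self.frame_data_type not in ("rgb", "yuv420", "yuv444"):
            raise ValueError(f"Unknown frame_data_type: {self.frame_data_type}")
        self.img_size = (self.luma_shape[-2], self.luma_shape[-1])
        self.n_pixels = self.img_size[0] * self.img_size[1]  # H x W

    def chroma_size(self) -> Tuple[int, int]:
        """Size of each chroma plane: halved in both dimensions for 4:2:0."""
        h, w = self.img_size
        return (h // 2, w // 2) if self.frame_data_type == "yuv420" else (h, w)


fd = FrameDataSketch(bitdepth=10, frame_data_type="yuv420", luma_shape=(1, 1, 1080, 1920))
print(fd.img_size, fd.n_pixels, fd.chroma_size())  # (1080, 1920) 2073600 (540, 960)
```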