Coding Structure¶
- class CodingStructure[source]¶
Dataclass representing the organization of the video i.e. which frames are coded using which references.
A few examples:
# A low-delay P configuration
# I0
#   \------> P1
#             \-------> P2
#                        \------> P3
#                                  \-------> P4
--n_frames=5 --intra_pos=0 --p_pos=1-4

# A hierarchical Random Access configuration, with a closed GOP
# I0
# \-------------------------------------------------------------------------------------> P8
# \----------------------------------------> B4 <----------------------------------------/
# \-----------------> B2 <------------------/ \------------------> B6 <-----------------/
# \------> B1 <------/ \-------> B3 <------/ \------> B5 <-------/ \------> B7 <------/
--n_frames=8 --intra_pos=0 --p_pos=-1

# A hierarchical Random Access configuration, with an open GOP
# I0 I8
# \----------------------------------------> B4 <----------------------------------------/
# \-----------------> B2 <------------------/ \------------------> B6 <-----------------/
# \------> B1 <------/ \-------> B3 <------/ \------> B5 <-------/ \------> B7 <------/
--n_frames=8 --intra_pos=0,-1

# Or some very peculiar structures...
# I0
# \---------------------------------------------------------------> P6
# \-----------------------------> B3 <-----------------------------/ \-----------------> P8
# \------> B1 <------------------/ \------> B4 <------------------/ \------> B7 <------/
# \------> B2 <-------/ \------> B5 <-------/
--n_frames=8 --intra_pos=0 --p_pos=6,8
A coding structure is composed of a few hyper-parameters and, most importantly, a list of Frame objects describing the different frames to code.
- Parameters:
n_frames (int) – Number of frames in the coding structure.
frame_offset (int) – Shift the position of the 0-th frame of the video. If frame_offset=15, the first 15 frames of the video are skipped, i.e. display index 0 corresponds to the 16th frame.
intra_pos (List[int]) – Position of all the intra frames in display order.
p_pos (List[int]) – Position of all the P frames in display order.
seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".
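The command-line examples above write frame positions as comma-separated values (0,-1), as a range (1-4) or as a single negative index (-1). As a rough illustration, the sketch below shows one way such a string could be turned into the List[int] expected by intra_pos and p_pos. parse_positions is a hypothetical helper, not part of the documented API, and the inclusive-range reading of "1-4" is an assumption inferred from the low-delay P example only.

```python
import re
from typing import List


def parse_positions(arg: str) -> List[int]:
    """Hypothetical helper: turn '0,-1', '1-4' or '6,8' into a list of ints."""
    positions: List[int] = []
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue
        # Assumption: 'a-b' with two non-negative integers is an inclusive range.
        match = re.fullmatch(r"(\d+)-(\d+)", token)
        if match:
            start, stop = int(match.group(1)), int(match.group(2))
            positions.extend(range(start, stop + 1))
        else:
            # Plain integer, possibly negative (e.g. '-1'); kept as-is.
            positions.append(int(token))
    return positions
```

Negative values are returned unchanged here, since how they are resolved against n_frames is not specified in this section.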
- frames: List[Frame]¶
All the frames to code, deduced from the GOP type, intra period and P period. Frames are indexed in display order (i.e. temporal order): frames[0] is the first frame, while frames[-1] is the last one.
- compute_coding_struct(n_frames: int, intra_pos: List[int], p_pos: List[int]) List[Frame]
Construct a coding structure of n_frames. The algorithm works as follows.
Step 1:¶
Position all the intra frames following intra_pos.
Step 2:¶
Position all the P frames following p_pos. A P-frame uses the closest frame in the past as a reference.
Step 3:¶
Automatically fill the remaining frames with hierarchical B-frames. This is achieved by iterating on the list of frames and inserting B-frames in between already added frames each time there is a gap. For instance:
frames = [I0, P4]
==> [I0, B2, P4]             # Fill the middle frame
==> [I0, B1, B2, P4]         # Fill the middle frame
==> [I0, B1, B2, B3, P4]     # Fill the middle frame
- Parameters:
n_frames (int) – Number of frames in the coding structure.
intra_pos (List[int]) – Position of all the intra frames in display order.
p_pos (List[int]) – Position of all the P frames in display order.
- Returns:
List of all the frames within the coding structure.
- Return type:
List[Frame]
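The three steps described for compute_coding_struct can be sketched in a few lines of self-contained Python. This is only an illustration under stated assumptions, not the library's implementation: SketchFrame and sketch_coding_struct are hypothetical names, positions are assumed to be non-negative, the middle frame of a gap is taken as (left + right) // 2, and a B-frame is given a depth one greater than the deeper of its two references.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame, reduced to what the algorithm needs."""
    display_order: int
    frame_type: str
    depth: int
    index_references: List[int] = field(default_factory=list)


def sketch_coding_struct(
    n_frames: int, intra_pos: List[int], p_pos: List[int]
) -> List[SketchFrame]:
    placed: Dict[int, SketchFrame] = {}

    # Step 1: position the intra frames (no reference, depth 0).
    for pos in intra_pos:
        placed[pos] = SketchFrame(pos, "I", depth=0)

    # Step 2: position the P frames; each refers to the closest past frame.
    for pos in sorted(p_pos):
        past = max(p for p in placed if p < pos)
        placed[pos] = SketchFrame(pos, "P", depth=1, index_references=[past])

    # Step 3: recursively insert a B-frame in the middle of every gap.
    def fill(left: int, right: int) -> None:
        if right - left < 2:
            return  # No room between the two frames.
        mid = (left + right) // 2
        # Assumption: depth grows by one relative to the deeper reference.
        depth = max(placed[left].depth, placed[right].depth) + 1
        placed[mid] = SketchFrame(mid, "B", depth, [left, right])
        fill(left, mid)
        fill(mid, right)

    anchors = sorted(placed)
    for left, right in zip(anchors, anchors[1:]):
        fill(left, right)

    # Assumption: frames with no I/P frame on both sides are rejected.
    missing = [i for i in range(n_frames) if i not in placed]
    if missing:
        raise ValueError(f"No references available for frames {missing}")
    return [placed[i] for i in range(n_frames)]


frames = sketch_coding_struct(n_frames=5, intra_pos=[0], p_pos=[4])
print([f"{f.frame_type}{f.display_order}" for f in frames])
```

With frames = [I0, P4] as a starting point, the sketch fills B2 first, then B1 and B3, which mirrors the example given in Step 3.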
- pretty_structure_diagram() str [source]¶
Return a diagram presenting the coding structure, for instance:
I0 -----------------------------------------------------> P8
\-------------------------> B4 <-------------------------/
\----------> B2 <---------/ \----------> B6 <----------/
\--> B1 <--/ \--> B3 <--/ \--> B5 <--/ \--> B7 <--/
- Returns:
A string describing the coding structure. Ready to be printed.
- Return type:
str
- pretty_string(print_detailed_struct: bool = False) str [source]¶
Return a pretty string formatting the data within the class
- Parameters:
print_detailed_struct (bool) – True to print the detailed coding structure
- Returns:
a pretty string ready to be printed out
- Return type:
str
- get_number_of_frames() int [source]¶
Return the number of frames in the coding structure.
- Returns:
Number of frames in the coding structure.
- Return type:
int
- get_max_depth() int [source]¶
Return the maximum depth of a coding configuration
- Returns:
Maximum depth of the coding configuration
- Return type:
int
- get_all_frames_of_depth(depth: int) List[Frame]
Return a list with all the frames for a given depth
- Parameters:
depth (int) – Depth for which we want the frames.
- Returns:
List of frames with the given depth
- Return type:
List[Frame]
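The two depth-related queries amount to a maximum and a filter over the list of frames. The sketch below illustrates this on a hand-written [I0, B1, B2, B3, P4] structure. SketchFrame, max_depth and frames_of_depth are hypothetical names used for illustration, and the B-frame depths (2 for B2, 3 for B1 and B3) are assumed values consistent with "2 or more for B-frames".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame with only the fields used here."""
    display_order: int
    depth: int


def max_depth(frames: List[SketchFrame]) -> int:
    # Largest depth found in the coding structure.
    return max(f.depth for f in frames)


def frames_of_depth(frames: List[SketchFrame], depth: int) -> List[SketchFrame]:
    # All frames sharing the requested depth, kept in display order.
    return [f for f in frames if f.depth == depth]


# I0 (depth 0), B1 (3), B2 (2), B3 (3), P4 (1)
frames = [
    SketchFrame(0, 0),
    SketchFrame(1, 3),
    SketchFrame(2, 2),
    SketchFrame(3, 3),
    SketchFrame(4, 1),
]
deepest = frames_of_depth(frames, max_depth(frames))
```

Asking for a depth that no frame has simply yields an empty list in this sketch.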
- get_max_coding_order() int [source]¶
Return the maximum coding order of a coding configuration
- Returns:
Maximum coding order of the coding configuration
- Return type:
int
- get_frame_from_coding_order(coding_order: int) Frame | None
Return the frame whose coding order is equal to coding_order. Return None if no frame has been found.
- Parameters:
coding_order (int) – Coding order for which we want the frame.
- Returns:
Frame whose coding order is equal to coding_order.
- Return type:
Frame | None
- get_max_display_order() int [source]¶
Return the maximum display order of a coding configuration
- Returns:
Maximum display order of the coding configuration
- Return type:
int
- get_frame_from_display_order(display_order: int) Frame | None
Return the frame whose display order is equal to display_order. Return None if no frame has been found.
- Parameters:
display_order (int) – Display order for which we want the frame.
- Returns:
Frame whose display order is equal to display_order.
- Return type:
Frame | None
- get_all_frames_using_one_ref(display_order_ref: int) List[Frame]
Return a list of frames using the frame <display_order_ref> as a reference.
- Parameters:
display_order_ref (int) – Display order of the frame that is used as reference
- Returns:
List of frames using one given frame as a reference.
- Return type:
List[Frame]
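The lookups by coding order, by display order, and by reference can be pictured as simple searches over the list of frames. The following is a minimal sketch, assuming a hypothetical SketchFrame stand-in and hypothetical function names. The coding orders chosen for the [I0, B1, B2, B3, P4] example (I0, P4, B2, B1, B3) are an assumption made for illustration, not something this section specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame with only the fields used here."""
    coding_order: int
    display_order: int
    index_references: List[int] = field(default_factory=list)


def frame_from_coding_order(
    frames: List[SketchFrame], coding_order: int
) -> Optional[SketchFrame]:
    # First frame with a matching coding order, or None if there is none.
    return next((f for f in frames if f.coding_order == coding_order), None)


def frame_from_display_order(
    frames: List[SketchFrame], display_order: int
) -> Optional[SketchFrame]:
    # Same search, keyed on the display order instead.
    return next((f for f in frames if f.display_order == display_order), None)


def frames_using_ref(
    frames: List[SketchFrame], display_order_ref: int
) -> List[SketchFrame]:
    # Every frame listing display_order_ref among its references.
    return [f for f in frames if display_order_ref in f.index_references]


# [I0, B1, B2, B3, P4] in display order; coding orders are assumed.
frames = [
    SketchFrame(coding_order=0, display_order=0),
    SketchFrame(coding_order=3, display_order=1, index_references=[0, 2]),
    SketchFrame(coding_order=2, display_order=2, index_references=[0, 4]),
    SketchFrame(coding_order=4, display_order=3, index_references=[2, 4]),
    SketchFrame(coding_order=1, display_order=4, index_references=[0]),
]
second_coded = frame_from_coding_order(frames, 1)
users_of_i0 = frames_using_ref(frames, 0)
```

Returning None rather than raising lets a caller test for the end of the sequence with a plain `is None` check.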
- class Frame[source]¶
Dataclass representing a frame to be encoded. It contains useful info like the display & coding indices, the indices of its references as well as the data of the decoded references and the original (i.e. uncompressed) frame.
- Parameters:
coding_order (int) – Frame with coding_order=0 is coded first.
display_order (int) – Frame with display_order=0 is displayed first.
frame_offset (int) – Shift the position of the 0-th frame of the video. If frame_offset=15, the first 15 frames of the video are skipped, i.e. display index 0 corresponds to the 16th frame. This is only used to load the data and for logging purposes. Defaults to 0.
depth (int) – Depth of the frame in the GOP. 0 for Intra, 1 for P-frame, 2 or more for B-frames. Roughly corresponds to the notion of temporal layers in conventional codecs. Defaults to 0.
seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".
data (Optional[FrameData]) – Data of the uncompressed image to be coded. Defaults to None.
index_references (List[int]) – Index of the frame(s) used as references, in display order. Leave empty when no reference is available, i.e. for I-frames. Defaults to [].
ref_data (List[FrameData]) – The actual data describing the decoded references. Leave empty when no reference is available, i.e. for I-frames. Defaults to [].
- frame_type: Literal['I', 'P', 'B']¶
Automatically set from the number of entries in self.index_references.
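One plausible way to derive frame_type from the reference list is shown below: no reference gives an I-frame, one reference a P-frame, and two references a B-frame. This mapping is an assumption inferred from the descriptions in this section, and SketchFrame is a hypothetical stand-in rather than the actual dataclass.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame, showing only the frame_type logic."""
    display_order: int
    index_references: List[int] = field(default_factory=list)
    # Not an __init__ argument: it is filled in by __post_init__.
    frame_type: Literal["I", "P", "B"] = field(init=False)

    def __post_init__(self) -> None:
        n_refs = len(self.index_references)
        if n_refs > 2:
            raise ValueError("At most two references are supported in this sketch")
        # Assumption: 0 reference -> I, 1 reference -> P, 2 references -> B.
        self.frame_type = ("I", "P", "B")[n_refs]


intra = SketchFrame(display_order=0)
p_frame = SketchFrame(display_order=4, index_references=[0])
b_frame = SketchFrame(display_order=2, index_references=[0, 4])
```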
- set_frame_data(data: FrameData) None [source]¶
Set the data representing the frame, i.e. create the FrameData object describing the actual frame.
- Parameters:
data (FrameData) – FrameData object representing the frame.
- Return type:
None
- set_refs_data(refs_data: List[FrameData]) None [source]¶
Set the data representing the reference(s).
- Parameters:
refs_data (List[FrameData]) – Data of the reference(s)
- Return type:
None
- pretty_string(show_header: bool = False, show_bottom_line: bool = False) str [source]¶
Return a string describing the frame.
- Parameters:
show_header (bool) – Also print the column names. Defaults to False.
show_bottom_line (bool) – Print a line below the frame description to close the array. Defaults to False.
- Returns:
Pretty string describing the frame
- Return type:
str
- class FrameData[source]¶
FrameData is a dataclass storing the actual pixel values of a frame, along with some additional information about its size, bitdepth and color space.
- Parameters:
bitdepth – Bitdepth, should be in [8, 9, 10, 11, 12, 13, 14, 15, 16].
frame_data_type – Data type, either "rgb", "yuv420" or "yuv444".
data – The actual RGB or YUV data.
- data: Any¶
Union[Tensor, DictTensorYUV]
- img_size: Tuple[int, int]¶
Height & width of the video \((H, W)\)
- n_pixels: int¶
Number of pixels \(H \times W\)
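The constraints on bitdepth and frame_data_type, and the relation between img_size and n_pixels, can be summarized by the sketch below. SketchFrameData is a hypothetical stand-in: it takes the (H, W) size directly instead of reading it from tensor data, which is a simplification made to keep the example free of external dependencies.

```python
from dataclasses import dataclass, field
from typing import Tuple

VALID_BITDEPTHS = tuple(range(8, 17))  # 8, 9, ..., 16
VALID_TYPES = ("rgb", "yuv420", "yuv444")


@dataclass
class SketchFrameData:
    """Hypothetical stand-in for FrameData without the pixel data itself."""
    bitdepth: int
    frame_data_type: str
    img_size: Tuple[int, int]  # (H, W), passed explicitly in this sketch
    n_pixels: int = field(init=False)

    def __post_init__(self) -> None:
        if self.bitdepth not in VALID_BITDEPTHS:
            raise ValueError(f"Unsupported bitdepth: {self.bitdepth}")
        if self.frame_data_type not in VALID_TYPES:
            raise ValueError(f"Unsupported data type: {self.frame_data_type}")
        height, width = self.img_size
        # Number of pixels is H x W.
        self.n_pixels = height * width


hd_frame = SketchFrameData(bitdepth=8, frame_data_type="rgb", img_size=(720, 1280))
```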