Coding Structure
- class CodingStructure
Dataclass representing the organization of the video i.e. which frames are coded using which references.
A few examples:
    # A low-delay P configuration
    # I0 ---> P1 ---> P2 ---> P3 ---> P4 ---> P5 ---> P6 ---> P7 ---> P8
    intra_period=8 p_period=1

    # A hierarchical Random Access configuration
    # I0 -----------------------------------------------------> P8
    # \-------------------------> B4 <-------------------------/
    # \----------> B2 <---------/ \----------> B6 <----------/
    # \--> B1 <--/ \--> B3 <--/ \--> B5 <--/ \--> B7 <--/
    intra_period=8 p_period=8

    # There is no more prediction from I0 to P8. Instead the GOP is split in
    # half so that there is no inter frame with reference further than --p_period
    # I0 -----------------------> P4 ------------------------> P8
    # \----------> B2 <---------/ \----------> B6 <----------/
    # \--> B1 <--/ \--> B3 <--/ \--> B5 <--/ \--> B7 <--/
    intra_period=8 p_period=4
A coding structure is composed of a few hyperparameters and, most importantly, a list of Frame objects describing the different frames to code.
- Parameters:
intra_period (int) – Number of inter frames in the GOP. As such, the first (intra) frames of two successive GOPs are separated by intra_period inter frames. Set this to 0 for all-intra coding.
p_period (int) – Distance to the furthest P prediction in the GOP. Set this to 1 for low-delay P or to intra_period for the usual random access configuration.
seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".
- frames: List[Frame]
All the frames to code, deduced from the GOP type, intra period and P period. Frames are indexed in display order (i.e. temporal order): frames[0] is the first frame, while frames[-1] is the last one.
- compute_gop(intra_period: int, p_period: int) -> List[Frame]
Return a list of frames with one intra frame followed by intra_period inter frames. The relation between the inter frames is implied by p_period. See the examples in the class description.
- Parameters:
intra_period (int) – Number of inter frames in the GOP.
p_period (int) – Distance between I0 and the first P frame or between subsequent P-frames.
- Returns:
List describing the frames to code.
- Return type:
List[Frame]
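The reference pattern drawn in the class examples can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: SketchFrame and sketch_gop are hypothetical names, only the display order and reference indices are modelled, and the sketch assumes that p_period is a power of two dividing intra_period.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame, keeping only what this sketch needs."""
    display_order: int
    index_references: List[int] = field(default_factory=list)


def sketch_gop(intra_period: int, p_period: int) -> List[SketchFrame]:
    """Build one intra frame followed by intra_period inter frames (assumed layout)."""
    if intra_period == 0:
        # All-intra coding: the GOP is a single I-frame.
        return [SketchFrame(0)]
    if p_period < 1 or p_period & (p_period - 1):
        raise ValueError("this sketch assumes p_period is a power of two")
    if intra_period % p_period != 0:
        raise ValueError("this sketch assumes p_period divides intra_period")

    frames: Dict[int, SketchFrame] = {0: SketchFrame(0)}

    def add_b_frames(lo: int, hi: int) -> None:
        # Recursively place a B-frame halfway between two already-known frames.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        frames[mid] = SketchFrame(mid, [lo, hi])
        add_b_frames(lo, mid)
        add_b_frames(mid, hi)

    for start in range(0, intra_period, p_period):
        end = start + p_period
        # Each P-frame references the previous I- or P-frame.
        frames[end] = SketchFrame(end, [start])
        add_b_frames(start, end)

    # Return the frames indexed in display order.
    return [frames[i] for i in range(intra_period + 1)]


for frame in sketch_gop(intra_period=8, p_period=4):
    print(frame.display_order, frame.index_references)
```

With intra_period=8 and p_period=4, frame 4 references frame 0, frame 8 references frame 4, and each B-frame references the two frames that surround it, as in the third diagram of the class description.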
- pretty_string() -> str
Return a pretty string formatting the data within the class.
- Return type:
str
- get_number_of_frames() -> int
Return the number of frames in the coding structure.
- Returns:
Number of frames in the coding structure.
- Return type:
int
- get_max_depth() -> int
Return the maximum depth of a coding configuration.
- Returns:
Maximum depth of the coding configuration.
- Return type:
int
- get_all_frames_of_depth(depth: int) -> List[Frame]
Return a list with all the frames for a given depth.
- Parameters:
depth (int) – Depth for which we want the frames.
- Returns:
List of frames with the given depth
- Return type:
List[Frame]
- get_max_coding_order() -> int
Return the maximum coding order of a coding configuration.
- Returns:
Maximum coding order of the coding configuration.
- Return type:
int
- get_frame_from_coding_order(coding_order: int) -> Frame | None
Return the frame whose coding order is equal to coding_order. Return None if no frame has been found.
- Parameters:
coding_order (int) – Coding order for which we want the frame.
- Returns:
Frame whose coding order is equal to coding_order.
- Return type:
Frame | None
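Because the lookup may return None, callers need to handle the missing case explicitly. The sketch below illustrates that contract with hypothetical stand-ins (SketchFrame and frame_from_coding_order are not part of the documented API); the three-frame example assumes that the P-frame is coded before the B-frame that references it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchFrame:
    """Hypothetical stand-in for Frame with only the two order indices."""
    coding_order: int
    display_order: int


def frame_from_coding_order(
    frames: List[SketchFrame], coding_order: int
) -> Optional[SketchFrame]:
    """Return the first frame with the requested coding order, else None."""
    return next((f for f in frames if f.coding_order == coding_order), None)


# Frames are stored in display order: I0, B1, P2.
# Assumed coding order: I0 first, then P2, then B1.
frames = [
    SketchFrame(coding_order=0, display_order=0),
    SketchFrame(coding_order=2, display_order=1),
    SketchFrame(coding_order=1, display_order=2),
]

second_coded = frame_from_coding_order(frames, 1)
if second_coded is not None:
    print("Second coded frame is displayed at index", second_coded.display_order)

missing = frame_from_coding_order(frames, 5)
print("Lookup past the last frame:", missing)
```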
- get_max_display_order() -> int
Return the maximum display order of a coding configuration.
- Returns:
Maximum display order of the coding configuration.
- Return type:
int
- get_frame_from_display_order(display_order: int) -> Frame | None
Return the frame whose display order is equal to display_order. Return None if no frame has been found.
- Parameters:
display_order (int) – Display order for which we want the frame.
- Returns:
Frame whose display order is equal to display_order.
- Return type:
Frame | None
- set_encoded_flag(coding_order: int, flag_value: bool) -> None
Set the already_encoded flag of the frame whose coding order is coding_order to the value flag_value.
- Parameters:
coding_order (int) – Coding order of the frame for which we’ll change the flag
flag_value (bool) – Value to be set
- Return type:
None
- unload_all_decoded_data() -> None
Remove the decoded data from memory. This is used before saving the coding structure. The decoded data can be retrieved by re-inferring the trained model.
- Return type:
None
- unload_all_original_frames() -> None
Remove the original frames from memory. This is used before saving the coding structure. The original frames can be retrieved by reloading the sequence.
- Return type:
None
- unload_all_references_data() -> None
Remove the data describing all the references from memory. This is used before saving the coding structure. The reference data can be retrieved by re-inferring the trained model.
- Return type:
None
- get_frame_depth_in_gop(idx_frame: int) -> int
Return the depth of the frame with index idx_frame within a hierarchical GOP.
- Some notes:
idx_frame == 0 always corresponds to an intra frame, i.e. depth = 0.
idx_frame == p_period is the P-frame, i.e. depth = 1.
This should be used separately for the successive chained GOPs.
- Parameters:
idx_frame (int) – Display order of the frame in the GOP.
p_period – P-period. Should be a power of two.
- Returns:
Depth of the frame in the GOP.
- Return type:
int
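One way to obtain such depths is to bisect the interval between the intra frame and the P-frame, adding one level each time the interval is halved. The function below is a hypothetical sketch of that idea, not the library's code; it takes p_period as an explicit argument and assumes it is a power of two, as stated in the parameter list.

```python
def sketch_frame_depth(idx_frame: int, p_period: int) -> int:
    """Depth of a frame inside one hierarchical GOP (illustrative only)."""
    if p_period < 1 or p_period & (p_period - 1):
        raise ValueError("p_period should be a power of two")
    if not 0 <= idx_frame <= p_period:
        raise ValueError("idx_frame must lie within [0, p_period]")

    if idx_frame == 0:
        return 0  # Intra frame
    if idx_frame == p_period:
        return 1  # P-frame

    # B-frames: start at depth 2 for the middle frame and go one level
    # deeper each time the search interval is halved.
    lo, hi, depth = 0, p_period, 2
    while True:
        mid = (lo + hi) // 2
        if idx_frame == mid:
            return depth
        if idx_frame < mid:
            hi = mid
        else:
            lo = mid
        depth += 1


print([sketch_frame_depth(i, 8) for i in range(9)])
```

For p_period=8 this gives depth 2 for B4, depth 3 for B2 and B6, and depth 4 for the odd-indexed B-frames, which matches the hierarchical Random Access diagram in the class description.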
- class Frame
Dataclass representing a frame to be encoded. It contains useful information such as the display and coding indices, the indices of its references, the data of the decoded references and the original (i.e. uncompressed) frame.
- Parameters:
coding_order (int) – Frame with coding_order=0 is coded first.
display_order (int) – Frame with display_order=0 is displayed first.
depth (int) – Depth of the frame in the GOP. 0 for Intra, 1 for P-frame, 2 or more for B-frames. Roughly corresponds to the notion of temporal layers in conventional codecs. Defaults to 0.
seq_name (str) – Name of the video. Mainly used for logging purposes. Defaults to "".
data (Optional[FrameData]) – Data of the uncompressed image to be coded. Defaults to None.
already_encoded (bool) – True if the frame has already been coded by the VideoEncoder. Defaults to False.
index_references (List[int]) – Index of the frame(s) used as references, in display_order. Leave empty when no references are available, i.e. for an I-frame. Defaults to [].
ref_data (List[FrameData]) – The actual data describing the decoded references. Leave empty when no references are available, i.e. for an I-frame. Defaults to [].
- frame_type: Literal['I', 'P', 'B']
Automatically set from the number of entries in self.index_references.
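A plausible mapping from the number of references to the frame type is sketched below. An empty reference list for an I-frame follows from the parameter description above; treating one reference as a P-frame and two references as a B-frame is an assumption of this sketch, and sketch_frame_type is a hypothetical helper rather than a documented function.

```python
from typing import List, Literal


def sketch_frame_type(index_references: List[int]) -> Literal["I", "P", "B"]:
    """Derive the frame type from the number of references (assumed mapping)."""
    n_refs = len(index_references)
    if n_refs == 0:
        return "I"  # No reference: intra frame
    if n_refs == 1:
        return "P"  # Assumption: a single reference means a P-frame
    if n_refs == 2:
        return "B"  # Assumption: two references mean a B-frame
    raise ValueError(f"unexpected number of references: {n_refs}")


print(sketch_frame_type([]), sketch_frame_type([0]), sketch_frame_type([0, 8]))
```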
- set_frame_data(data: Tensor | DictTensorYUV, frame_data_type: Literal['rgb', 'yuv420', 'yuv444'], bitdepth: Literal[8, 10]) -> None
Set the data representing the frame, i.e. create the FrameData object describing the actual frame.
- Parameters:
data (Tensor | DictTensorYUV) – RGB or YUV value of the frame.
frame_data_type (Literal['rgb', 'yuv420', 'yuv444']) – Data type.
bitdepth (Literal[8, 10]) – Bitdepth.
- Return type:
None
- set_decoded_data(decoded_data: FrameData) -> None
Set the data representing the decoded frame.
- Parameters:
decoded_data (FrameData) – Data of the decoded frame.
- Return type:
None
- set_refs_data(refs_data: List[FrameData]) -> None
Set the data representing the reference(s).
- Parameters:
refs_data (List[FrameData]) – Data of the reference(s)
- Return type:
None
- class FrameData
FrameData is a dataclass storing the actual pixel values of a frame and some additional information about its size, bitdepth or color space.
- Parameters:
bitdepth (POSSIBLE_BITDEPTH) – Bitdepth, either "8" or "10".
frame_data_type (FRAME_DATA_TYPE) – Data type, either "rgb", "yuv420" or "yuv444".
data (Union[Tensor, DictTensorYUV]) – The actual RGB or YUV data.
- img_size: Tuple[int, int]
Height & width of the video \((H, W)\).
- n_pixels: int
Number of pixels \(H \times W\).
- class DictTensorYUV
TypedDict representing a YUV420 frame.
Hint: torch.jit requires I/O of modules to be either Tensor, List or Dict. So we don’t use a Python dataclass here and rely on TypedDict instead.
- Parameters:
y (Tensor) – \(([B, 1, H, W])\).
u (Tensor) – \(([B, 1, \frac{H}{2}, \frac{W}{2}])\).
v (Tensor) – \(([B, 1, \frac{H}{2}, \frac{W}{2}])\).
- yuv_dict_to_device(yuv: DictTensorYUV, device: Literal['cpu', 'cuda:0']) -> DictTensorYUV
Send a DictTensorYUV to a device.
- Parameters:
yuv (DictTensorYUV) – Data to be sent to a device.
device (Literal['cpu', 'cuda:0']) – The requested device
- Returns:
Data on the appropriate device.
- Return type:
DictTensorYUV
- convert_444_to_420(yuv444: Tensor) -> DictTensorYUV
From a 4D YUV 444 tensor \((B, 3, H, W)\), return a DictTensorYUV. The U and V tensors are downsampled using nearest-neighbor downsampling.
- Parameters:
yuv444 (Tensor) – YUV444 data \((B, 3, H, W)\)
- Returns:
YUV420 dictionary of 4D tensors
- Return type:
DictTensorYUV
- convert_420_to_444(yuv420: DictTensorYUV) -> Tensor
Convert a DictTensorYUV to a 4D tensor \((B, 3, H, W)\). The U and V tensors are upsampled using nearest-neighbor upsampling.
- Parameters:
yuv420 (DictTensorYUV) – YUV420 dictionary of 4D tensors.
- Returns:
YUV444 Tensor \((B, 3, H, W)\)
- Return type:
Tensor
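The chroma resampling used by both conversions can be illustrated without any dependency on a single 2D plane stored as nested lists. This is only a sketch of nearest-neighbor resampling by a factor of two: the documented functions operate on 4D tensors, the helper names are hypothetical, and keeping the top-left sample of each 2x2 block is an assumed convention.

```python
from typing import List

Plane = List[List[int]]


def downsample_nearest_2x(plane: Plane) -> Plane:
    """Keep the top-left sample of every 2x2 block (assumed convention)."""
    return [row[::2] for row in plane[::2]]


def upsample_nearest_2x(plane: Plane) -> Plane:
    """Repeat every sample twice horizontally and twice vertically."""
    out: Plane = []
    for row in plane:
        wide = [sample for sample in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A 4x4 chroma plane, as it would appear in a YUV444 frame.
u_444 = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

u_420 = downsample_nearest_2x(u_444)   # 2x2 plane
u_back = upsample_nearest_2x(u_420)    # 4x4 plane again
print(u_420)
print(u_back)
```

Under this convention, going from 420 to 444 and back returns the original chroma plane exactly, whereas going from 444 to 420 and back generally discards three of every four chroma samples.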