This page presents videos illustrating how AIVC compresses a video sequence. All examples use the sequence Sports_1080P-6710 from the CLIC 2021 dataset.
The videos embedded on this page work best in Firefox or Safari. Refresh the page if the videos get out of sync.
Overall diagram of the AIVC codec
Videos from the paper
The videos presented here are taken from Fig. 2 of the paper "AIVC: Artificial Intelligence for Video Coding" (Ladune et al.).
- Original video $\mathbf{x}_t$
- Optical flow $\mathbf{v}_p$
- Optical flow $\mathbf{v}_f$
- Coding mode selection $\boldsymbol{\alpha}$
- Skip mode contribution $(1 - \boldsymbol{\alpha}) \odot \tilde{\mathbf{x}}_t$
- Decoded video $\hat{\mathbf{x}}_t$
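The skip mode contribution is an element-wise (Hadamard) product: where $\boldsymbol{\alpha} = 0$ it equals the temporal prediction $\tilde{\mathbf{x}}_t$, and where $\boldsymbol{\alpha} = 1$ it vanishes. The NumPy sketch below computes $(1 - \boldsymbol{\alpha}) \odot \tilde{\mathbf{x}}_t$ on a toy frame. It assumes $\boldsymbol{\alpha}$ is a single-channel map with values in $[0, 1]$, broadcast over the color channels; the function name `skip_mode_contribution` and the array values are hypothetical, not taken from the AIVC implementation.

```python
import numpy as np


def skip_mode_contribution(alpha: np.ndarray, x_tilde: np.ndarray) -> np.ndarray:
    """Return (1 - alpha) * x_tilde, element-wise.

    alpha:   per-pixel coding mode selection in [0, 1], shape (H, W) (assumed)
    x_tilde: temporal prediction, shape (C, H, W)
    """
    # Add a leading axis so the (H, W) map broadcasts over the C channels.
    return (1.0 - alpha)[None, :, :] * x_tilde


# Hypothetical toy frame: 3 channels, 2x2 pixels, constant prediction.
x_tilde = np.full((3, 2, 2), 0.5)
alpha = np.array([[0.0, 1.0],
                  [0.25, 0.75]])

skip = skip_mode_contribution(alpha, x_tilde)
```

In every channel, the top-left pixel ($\alpha = 0$) keeps the full prediction value 0.5, while the top-right pixel ($\alpha = 1$) receives no skip contribution at all.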
We also provide supplementary examples that display other quantities involved in coding a video sequence.
- Bi-directional prediction weighting $\boldsymbol{\beta}$
- Temporal prediction $\tilde{\mathbf{x}}_t$
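The temporal prediction $\tilde{\mathbf{x}}_t$ combines two motion-compensated references, and $\boldsymbol{\beta}$ weights them pixel by pixel. A minimal sketch, assuming the formulation $\tilde{\mathbf{x}}_t = \boldsymbol{\beta} \odot w(\hat{\mathbf{x}}_p; \mathbf{v}_p) + (1 - \boldsymbol{\beta}) \odot w(\hat{\mathbf{x}}_f; \mathbf{v}_f)$, where $w$ backward-warps a decoded reference frame ($\hat{\mathbf{x}}_p$ past, $\hat{\mathbf{x}}_f$ future) with an optical flow. The nearest-neighbour warping, the single-channel frames, and the names `warp` and `bidir_prediction` are illustrative assumptions; a learned codec would typically use differentiable bilinear warping instead.

```python
import numpy as np


def warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a single-channel frame of shape (H, W).

    flow has shape (2, H, W): flow[0] is the horizontal displacement and
    flow[1] the vertical one. Each output pixel is read at its displaced
    position (nearest neighbour, clipped to the frame border).
    """
    height, width = frame.shape
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, width - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, height - 1)
    return frame[src_y, src_x]


def bidir_prediction(x_p, v_p, x_f, v_f, beta):
    """Blend the two warped references pixel-wise, weighted by beta in [0, 1]."""
    return beta * warp(x_p, v_p) + (1.0 - beta) * warp(x_f, v_f)


# Hypothetical 1x4 reference frames and constant flows.
x_p = np.array([[10.0, 20.0, 30.0, 40.0]])
x_f = np.array([[20.0, 30.0, 40.0, 50.0]])
zeros = np.zeros((1, 4))
v_p = np.stack([np.full((1, 4), -1.0), zeros])  # read one pixel to the left
v_f = np.stack([np.full((1, 4), 1.0), zeros])   # read one pixel to the right
beta = np.array([[1.0, 0.5, 0.5, 0.0]])          # 1: past only, 0: future only

x_tilde = bidir_prediction(x_p, v_p, x_f, v_f, beta)
```

Here `x_tilde` equals `[[10, 25, 35, 50]]`: the first pixel comes entirely from the warped past reference, the last one entirely from the warped future reference, and the middle two are averages of both.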
Conditional coding behavior
Conditional coding plays a key role in AIVC's compression performance. To better understand its behavior, we show videos obtained by synthesizing the analysis and conditioning latent variables of MNet separately. We look at the optical flow $\mathbf{v}_p$ when it is synthesized from:
- The analysis latent variable only, i.e. without any decoder-side information
- The conditioning latent variable only, i.e. without a single transmitted bit
- Both latent variables
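The three configurations above amount to running the MNet synthesis with one of its two inputs disabled. This page does not say how a latent variable is disabled; the sketch below assumes it is replaced by zeros, and uses a toy random linear map (hypothetical `synthesis`, `W`, and dimensions) in place of the real learned synthesis transform. Because this toy map is linear, the two partial outputs add up exactly to the full one; a real non-linear synthesis transform offers no such guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 channels per latent variable, 6 output values.
LATENT_DIM, OUT_DIM = 4, 6
W = rng.standard_normal((OUT_DIM, 2 * LATENT_DIM))  # toy synthesis weights


def synthesis(y_analysis: np.ndarray, y_cond: np.ndarray) -> np.ndarray:
    """Toy synthesis: a linear map applied to the concatenated latents."""
    return W @ np.concatenate([y_analysis, y_cond])


y_a = rng.standard_normal(LATENT_DIM)  # analysis latent: transmitted, costs bits
y_c = rng.standard_normal(LATENT_DIM)  # conditioning latent: computed at the decoder
no_latent = np.zeros(LATENT_DIM)       # assumed way to disable one input

v_p_cond_only = synthesis(no_latent, y_c)      # decoder-side information only
v_p_analysis_only = synthesis(y_a, no_latent)  # transmitted information only
v_p_all = synthesis(y_a, y_c)                  # both latent variables
```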
- Optical flow $\mathbf{v}_p$ synthesized only from the conditioning latent variable (decoder-side only!)
- Optical flow $\mathbf{v}_p$ synthesized only from the analysis latent variable
- Optical flow $\mathbf{v}_p$ synthesized from all latent variables
Recall that the conditioning transform runs at the decoder side only: the first video therefore shows the motion information inferred by the decoder without receiving a single bit. Most of the small motions in the background are inferred at the decoder thanks to the conditioning transform. The motion of the girl in the foreground, however, is too complex to be anticipated at the decoder, so the analysis transform transmits motion information solely for the girl.