Some insightful videos about how AIVC compresses a video sequence. The examples use the video sequence Sports_1080P-6710, extracted from the CLIC 2021 dataset.


Overall diagram of the AIVC codec


Videos from the paper

The videos presented here are taken from Fig. 2 of the paper "AIVC: Artificial Intelligence for Video Coding" by Ladune et al.


Original video $\mathbf{x}_t$
Optical flow $\mathbf{v}_p$
Optical flow $\mathbf{v}_f$
Coding mode selection $\boldsymbol{\alpha}$
Skip mode contribution $(1 - \boldsymbol{\alpha}) \odot \tilde{\mathbf{x}}_t$
Decoded video $\hat{\mathbf{x}}_t$
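The skip mode contribution panel shows the part of the decoded frame that is copied directly from the temporal prediction $\tilde{\mathbf{x}}_t$, weighted pixel-wise by $1 - \boldsymbol{\alpha}$. The NumPy sketch below computes this product. It assumes, as a simplification, that the decoded frame is the sum of this skip contribution and a part reconstructed by the learned coder; the function names and the `coded_part` argument are hypothetical stand-ins, not the AIVC implementation.

```python
import numpy as np

def skip_mode_contribution(alpha: np.ndarray, x_tilde: np.ndarray) -> np.ndarray:
    """Pixel-wise (1 - alpha) ⊙ x̃_t.

    alpha has shape (H, W, 1) with values in [0, 1]; it broadcasts over the
    C channels of x_tilde, which has shape (H, W, C).
    """
    return (1.0 - alpha) * x_tilde

def decode_frame(alpha, x_tilde, coded_part):
    # Hypothetical: coded_part stands for whatever the learned coder
    # reconstructs for the regions selected by alpha; here it is passed in.
    return skip_mode_contribution(alpha, x_tilde) + coded_part

# Toy 2x2 RGB example: alpha = 0 copies the prediction ("skip"),
# alpha = 1 takes the pixel entirely from the coded part.
alpha = np.array([[[0.0], [1.0]],
                  [[0.5], [0.0]]])
x_tilde = np.full((2, 2, 3), 0.8)
coded_part = alpha * 0.2  # stand-in for the transmitted signal
x_hat = decode_frame(alpha, x_tilde, coded_part)
```

Pixels with $\boldsymbol{\alpha} = 0$ are reproduced from the prediction alone, which is why large static regions can cost very few bits.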


We also provide supplementary examples that display other quantities involved in the coding of a video sequence.


Bi-directional prediction weighting $\boldsymbol{\beta}$
Temporal prediction $\tilde{\mathbf{x}}_t$
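The weighting $\boldsymbol{\beta}$ combines two motion-compensated references into the temporal prediction $\tilde{\mathbf{x}}_t$. The sketch below assumes that $\tilde{\mathbf{x}}_t$ is a pixel-wise blend $\boldsymbol{\beta} \odot w(\text{ref}_p, \mathbf{v}_p) + (1 - \boldsymbol{\beta}) \odot w(\text{ref}_f, \mathbf{v}_f)$, where $w$ is a backward warping of a past and a future reference frame by their optical flows. The reference names `ref_p` and `ref_f`, the bilinear warping and the flow convention are illustrative assumptions, not taken from the AIVC code.

```python
import numpy as np

def backward_warp(ref: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Bilinear backward warping of ref (H, W, C) by flow (H, W, 2).

    flow[..., 0] is the horizontal and flow[..., 1] the vertical displacement:
    output pixel (y, x) samples ref at (y + dy, x + dx), with sample positions
    clamped to the image border.
    """
    h, w, _ = ref.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]  # horizontal interpolation weights, (H, W, 1)
    wy = (sy - y0)[..., None]  # vertical interpolation weights, (H, W, 1)
    top = (1 - wx) * ref[y0, x0] + wx * ref[y0, x1]
    bottom = (1 - wx) * ref[y1, x0] + wx * ref[y1, x1]
    return (1 - wy) * top + wy * bottom

def bidirectional_prediction(beta, ref_p, v_p, ref_f, v_f):
    """x̃_t = beta ⊙ warp(ref_p, v_p) + (1 - beta) ⊙ warp(ref_f, v_f)."""
    return beta * backward_warp(ref_p, v_p) + (1.0 - beta) * backward_warp(ref_f, v_f)

# Toy usage: a dark past reference, a bright future reference, zero motion.
ref_p = np.zeros((4, 4, 1))
ref_f = np.ones((4, 4, 1))
no_motion = np.zeros((4, 4, 2))
beta = np.full((4, 4, 1), 0.25)
x_tilde = bidirectional_prediction(beta, ref_p, no_motion, ref_f, no_motion)
```

With $\boldsymbol{\beta} = 0.25$ everywhere, the prediction leans towards the future reference; a spatially varying $\boldsymbol{\beta}$ lets each region pick whichever reference predicts it best.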


Conditional coding behavior

Conditional coding plays a key role in the compression performance of AIVC. To better understand its behavior, we present videos obtained by synthesizing the analysis and conditioning latent variables of MNet separately. We look at the optical flow $\mathbf{v}_p$ when it is synthesized from:
Only the conditioning latent variable (decoder-side only!)
Only the analysis latent variable
All latent variables


Recall that the conditioning transform is decoder-side only, so the first video shows the motion information inferred at the decoder without receiving a single bit. Most of the small background motions are inferred at the decoder thanks to the conditioning transform. However, the motion of the girl in the foreground is too complex to be anticipated at the decoder, so the analysis transform transmits motion information only for the girl.
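The separate synthesis can be pictured with the toy sketch below. It assumes, for illustration only, that synthesizing "from only one latent variable" means setting the other latent to zero before a shared synthesis transform, and it replaces MNet's learned synthesis with a hypothetical random linear map `S`. Neither assumption is taken from the AIVC paper; the sketch only shows the bookkeeping of which latent is transmitted and which is free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy synthesis: one linear map (no bias) from the concatenated
# latents [y_analysis, y_conditioning] to a 2-channel flow field of size H x W.
LATENT_DIM, H, W = 8, 4, 4
S = rng.standard_normal((H * W * 2, 2 * LATENT_DIM))

def synthesize(y_analysis, y_conditioning):
    z = np.concatenate([y_analysis, y_conditioning])
    return (S @ z).reshape(H, W, 2)

y_a = rng.standard_normal(LATENT_DIM)  # analysis latent: sent in the bitstream
y_c = rng.standard_normal(LATENT_DIM)  # conditioning latent: computed at the decoder, no bits
zeros = np.zeros(LATENT_DIM)

flow_conditioning_only = synthesize(zeros, y_c)  # what the decoder infers for free
flow_analysis_only = synthesize(y_a, zeros)      # what the transmitted bits add
flow_all = synthesize(y_a, y_c)                  # the flow actually used
```

Because this toy synthesis is linear without bias, the full flow is exactly the sum of the two partial flows. With a learned, nonlinear synthesis this decomposition holds only approximately, if at all, so the three videos are best read as qualitative views of what each latent variable carries.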