Orange AIVC

Some insightful videos about how AIVC compresses a video sequence. The examples use the video sequence Sports_1080P-6710, extracted from the CLIC 2021 dataset.
The videos embedded in this page work best in Firefox or Safari. Refresh the page if the videos get out of sync.

Overall diagram of the AIVC codec

Videos from the paper

The videos presented here are from Fig. 2 of the paper AIVC: Artificial Intelligence for Video Coding, Ladune et al.

Original video $\color{black}{\mathbf{x}_t}$
Optical flow $\color{black}{\mathbf{v}_p}$
Optical flow $\color{black}{\mathbf{v}_f}$
Coding mode selection $\color{black}{\boldsymbol{\alpha}}$
Skip mode contribution $\color{black}{(1 - \boldsymbol{\alpha}) \odot \tilde{\mathbf{x}}_t}$
Decoded video $\color{black}{\hat{\mathbf{x}}_t}$

We also provide supplementary examples that display other quantities involved in the coding of a video sequence.

Bi-directional prediction weighting $\color{black}{\boldsymbol{\beta}}$
Temporal prediction $\color{black}{\tilde{\mathbf{x}}_t}$
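
The maps $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ shown in these videos can be read as pixel-wise blending masks. The sketch below is a minimal NumPy illustration, not the codec's implementation. It assumes (this is not stated on this page) that the temporal prediction is a $\boldsymbol{\beta}$-weighted blend of two motion-compensated references, with $\boldsymbol{\beta}$ weighting the past one, and it takes the skip-mode contribution $(1 - \boldsymbol{\alpha}) \odot \tilde{\mathbf{x}}_t$ from the caption above. The function names and toy arrays are hypothetical, and the warping by $\mathbf{v}_p$ and $\mathbf{v}_f$ is assumed to have been applied already.

```python
import numpy as np

def temporal_prediction(warped_past, warped_future, beta):
    """Blend two motion-compensated references with a pixel-wise weight in [0, 1].

    Assumption: beta weights the past reference, (1 - beta) the future one.
    """
    return beta * warped_past + (1.0 - beta) * warped_future

def skip_contribution(prediction, alpha):
    """Part of the prediction copied to the output: (1 - alpha) * prediction."""
    return (1.0 - alpha) * prediction

# Toy 2x2 single-channel frames (hypothetical values, already warped).
warped_past = np.array([[0.2, 0.4], [0.6, 0.8]])
warped_future = np.array([[0.4, 0.4], [0.2, 1.0]])
beta = np.array([[1.0, 0.5], [0.0, 0.5]])    # 1 = past only, 0 = future only
alpha = np.array([[0.0, 0.0], [1.0, 0.5]])   # 0 = pure skip mode

x_tilde = temporal_prediction(warped_past, warped_future, beta)
skip = skip_contribution(x_tilde, alpha)
```

Under these assumptions, pixels where `alpha` is 0 are copied straight from the prediction, while pixels where `alpha` is 1 receive nothing from skip mode and must be reconstructed from transmitted information.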

Conditional coding behavior

Conditional coding plays a key role in AIVC compression performance. To better understand its behavior, we present videos based on the separate synthesis of the analysis and conditioning MNet latent variables. We look at one optical flow $\color{black}{\mathbf{v}_p}$ when it is synthesized from:

Analysis latent variable only, i.e. no decoder-side information is used
Conditioning latent variable only, i.e. not a single bit is conveyed
Both latent variables

Optical flow $\color{black}{\mathbf{v}_p}$, only from the conditioning latent variable (decoder-side only!)
Optical flow $\color{black}{\mathbf{v}_p}$, only from the analysis latent variable
Optical flow $\color{black}{\mathbf{v}_p}$, from all latent variables

Recall that the conditioning is a decoder-side-only transform, so the first video shows the motion information inferred at the decoder without a single bit received. Most of the small motions in the background are inferred at the decoder thanks to the conditioning transform. However, the motion of the girl in the foreground is too complex to be anticipated at the decoder, so the analysis transform transmits motion information solely for the girl.
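
One way to picture the three syntheses compared above is a synthesis transform fed with two latent variables, either of which can be zeroed out. The following is a toy sketch and not AIVC's MNet: the linear `synthesis` function, its random weights, the dimensions, and the zeroing strategy are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 4-dimensional latents, 6-dimensional "flow" output.
W_analysis = rng.normal(size=(6, 4))
W_conditioning = rng.normal(size=(6, 4))

def synthesis(y_analysis, y_conditioning):
    """Toy linear synthesis combining a transmitted and a decoder-side latent."""
    return W_analysis @ y_analysis + W_conditioning @ y_conditioning

y_a = rng.normal(size=4)  # analysis latent: conveyed in the bitstream
y_c = rng.normal(size=4)  # conditioning latent: computed at the decoder, no bits
zero = np.zeros(4)

flow_analysis_only = synthesis(y_a, zero)       # no decoder-side information
flow_conditioning_only = synthesis(zero, y_c)   # not a single bit conveyed
flow_both = synthesis(y_a, y_c)
```

Because this toy synthesis is linear, the two partial outputs sum exactly to the full one. A learned, nonlinear synthesis offers no such guarantee, so the separate videos are best read as qualitative indications of what each latent variable carries.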