Encoder configuration¶
As with conventional codecs, there is a trade-off between Cool-chic encoding
time and compression performance. The encoding settings of Cool-chic are set in
a configuration file. Examples of such configuration files are located in cfg/enc/intra/.
They include the following parameters:
| Parameter | Role | Example value |
|---|---|---|
| `preset` | Training preset | `c3x_intra` |
| `start_lr` | Initial learning rate | `1e-2` |
| `n_itr` | Number of training iterations | `1e4` |
| `n_train_loops` | Number of independent encodings | `1` |
| `tune` | Optimize the MSE (`mse`) or a combination of MSE & Wasserstein Distance (`wasserstein`) | `mse` |
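A configuration file is a plain list of key=value pairs. The following is an illustrative sketch only, using the parameter names from the table above; the exact contents of the files shipped in cfg/enc/intra/ may differ:

preset=c3x_intra      # Training preset
start_lr=1e-2         # Initial learning rate
n_itr=1e4             # Iterations in the first (and longest) training phase
n_train_loops=1       # Independent encodings; only the best one is kept
tune=mse              # Distortion metric(s) to optimize: mse or wasserstein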
Tip
Each parameter listed in the configuration file can be overridden through a command-line argument:
(venv) ~/Cool-Chic python coolchic/encode.py \
    --enc_cfg=cfg/enc/intra/fast_10k.cfg \
    --start_lr=1e-3     # Overrides the start_lr=1e-2 set in fast_10k.cfg
Some existing configuration files¶
Some configuration files are proposed in cfg/enc/intra/. Longer encoding gives
slightly better compression results. We provide comprehensive results for all
encoding configurations on the encoding complexity page.
| Name | Description |
|---|---|
| `fast_10k.cfg` | Reasonable compression performance & fast training |
| `medium_30k.cfg` | Balance compression performance & training duration |
| `slow_100k.cfg` | Best performance at the cost of a longer training |
| `slow_100k_wasserstein.cfg` | Optimize MSE and Wasserstein Distance for better subjective quality |
Tuning¶
The --tune parameter selects the distortion metric(s) to be optimized. When
--tune=mse is selected, the Mean Squared Error is optimized. When
--tune=wasserstein is selected, the distortion becomes a combination of
MSE and Wasserstein Distance, as proposed in Good, Cheap, and Fast: Overfitted
Image Compression with Wasserstein Distortion, Ballé et al.
Attention
Using --tune=wasserstein also introduces 7 features of common
randomness, as described in the aforementioned paper by Ballé et al.
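For instance, a perceptually-oriented encoding can be requested directly from the command line. This is a sketch assuming the fast_10k.cfg configuration shown above; any configuration can be combined with --tune:
(venv) ~/Cool-Chic python coolchic/encode.py \
    --enc_cfg=cfg/enc/intra/fast_10k.cfg \
    --tune=wasserstein      # Optimize MSE + Wasserstein Distance instead of MSE alone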
Presets¶
Cool-chic encoding works with tweakable presets, i.e. different sets of training parameters. Currently available presets are:
- c3x_intra: Inspired by C3: High-performance and low-complexity neural compression from a single image or video, Kim et al. It is composed of two main phases:
  1. An additive noise model and softround for the latent quantization;
  2. Actual quantization, with softround in the backward pass.
- debug: Extremely fast preset with very bad performance, only for debugging purposes.
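Assuming presets are selected through the preset parameter from the table above (an assumption; check the configuration files for the exact name), a quick sanity-check run could look like:
(venv) ~/Cool-Chic python coolchic/encode.py \
    --enc_cfg=cfg/enc/intra/fast_10k.cfg \
    --preset=debug      # Debugging only: very fast, very poor compression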
All presets feature a decreasing learning rate starting from start_lr.
The number of iterations in the first (and longest) phase of the c3x preset is
set using n_itr.
In order to circumvent some training irregularities, it is possible to perform
several independent encodings, keeping only the best one. We call each of these a
training loop. The number of training loops is set by n_train_loops.
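As a final sketch, again assuming the fast_10k.cfg configuration from above, three training loops could be requested as follows:
(venv) ~/Cool-Chic python coolchic/encode.py \
    --enc_cfg=cfg/enc/intra/fast_10k.cfg \
    --n_train_loops=3       # Perform 3 independent encodings, keep the best one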