Data Release Note

v1.0.0: merge_divide_and_maxwell_splits

The DIVIDE-3k dataset was originally proposed in DOVER as the first dataset with aesthetic and technical labels in addition to overall labels. For this dataset, we invited trained subjects to conduct an in-lab subjective study.

In the ExplainableVQA project, we further invited the same group of trained subjects to provide 13 dimensions of explanation-level labels for DIVIDE-3k (forming the MaxWell database). These subjects also labeled around 1K new videos to expand the database.

As the videos in MaxWell are a superset of those in DIVIDE-3k, after internal discussion our team decided to unify the train and validation splits of the two datasets and to re-run DOVER and DOVER++ on the merged DIVIDE-MaxWell database (a superset of DIVIDE-3k, with each video labeled with aesthetic, technical, and overall perspective scores), using the official split provided in MaxWell: 3634 training videos (80%) and 909 validation videos. The results of both variants are shown below, and the DOVER++ model checkpoint (trained on the official train/test split) will be uploaded soon to facilitate further research.

Download Videos

Download the videos from Hugging Face Datasets.
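
If you prefer a scripted download, the sketch below uses the huggingface_hub library to fetch the dataset snapshot. The repo_id shown is a placeholder (not the official dataset id); replace it with the dataset id linked above.

# Hedged sketch: scripted download via huggingface_hub (pip install huggingface_hub).
# The repo_id below is a placeholder; substitute the actual dataset id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-org/DIVIDE-MaxWell",  # placeholder, not the official id
    repo_type="dataset",
)
print("Videos downloaded to", local_dir)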

Labels

The labels are provided here for the official training set and the official validation set.
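
As a rough sketch of how the label files might be consumed, assuming they are CSV files with per-video aesthetic, technical, and overall scores (the actual file names and column names may differ from those assumed here):

# Assumed file and column names, for illustration only; adjust to the released label files.
import pandas as pd

train_labels = pd.read_csv("train-dividemaxwell.csv")  # assumed file name
val_labels = pd.read_csv("val-dividemaxwell.csv")      # assumed file name

print(len(train_labels), "training videos")   # expected: 3634
print(len(val_labels), "validation videos")   # expected: 909
print(train_labels.columns.tolist())          # e.g. ["filename", "aesthetic", "technical", "overall"]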

Training for DOVER++

To run DOVER++ (enhanced end-to-end training with aesthetic, technical, and overall scores), use the following script:

python training_with_divide.py --train train-dividemaxwell --val val-dividemaxwell
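
For intuition, the sketch below shows one simple way the three perspective scores could jointly supervise the aesthetic branch, the technical branch, and their fused overall prediction. It is an illustrative simplification under assumed branch outputs, fusion weights, and a plain MSE objective, not the actual loss implemented in training_with_divide.py.

# Illustrative only: a simple multi-perspective objective in PyTorch.
# The real DOVER++ training objective and branch fusion may differ.
import torch
import torch.nn.functional as F

def multi_perspective_loss(pred_aes, pred_tech, gt_aes, gt_tech, gt_overall,
                           w_aes=1.0, w_tech=1.0, w_overall=1.0):
    # Assumed equal-weight fusion of the two branch predictions into an overall prediction.
    pred_overall = 0.5 * (pred_aes + pred_tech)
    return (w_aes * F.mse_loss(pred_aes, gt_aes)
            + w_tech * F.mse_loss(pred_tech, gt_tech)
            + w_overall * F.mse_loss(pred_overall, gt_overall))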

Results on the Updated Train-Test Splits

As we have changed the train-test split, the results for FAST-VQA (the technical branch of DOVER), DOVER, and DOVER++ have also changed. See the results below.
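
The SROCC, PLCC, and KROCC values below are standard correlation metrics between predicted and ground-truth quality scores. They can be computed with SciPy as in the sketch below; note that some works fit a nonlinear logistic mapping before computing PLCC, whereas plain Pearson correlation is shown here.

# Correlation metrics between predicted and ground-truth quality scores.
import numpy as np
from scipy import stats

def vqa_metrics(pred, gt):
    pred, gt = np.asarray(pred), np.asarray(gt)
    srocc, _ = stats.spearmanr(pred, gt)   # Spearman rank-order correlation
    plcc, _ = stats.pearsonr(pred, gt)     # Pearson linear correlation (no logistic fitting here)
    krocc, _ = stats.kendalltau(pred, gt)  # Kendall rank correlation
    return {"SROCC": srocc, "PLCC": plcc, "KROCC": krocc}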

Zero-Shot

VQA Approaches

  • DOVER (pre-trained on LSVQ)
SROCC: 0.7477 PLCC: 0.7546 KROCC: 0.5510
  • FAST-VQA (technical branch of DOVER, pre-trained on LSVQ)
SROCC: 0.7204 PLCC: 0.7282 KROCC: 0.5286
  • Aesthetic Branch in DOVER (pre-trained on LSVQ)
SROCC: 0.7184 PLCC: 0.7293 KROCC: 0.5249

IQA Approaches

  • SAQI (CLIP-ResNet-50)
SROCC: 0.5518 PLCC: 0.5549 KROCC: 0.3814
  • NIQE
SROCC: 0.2847 PLCC: 0.3014 KROCC: 0.2150
SROCC: 0.6821 PLCC: 0.6923 KROCC: 0.4949

Fine-tuned

Baseline Methods

  • VSFA (Li et al., 2019, trained on the training set of DIVIDE-MaxWell, only the overall score used)
SROCC: 0.6671 PLCC: 0.6784 KROCC: 0.4875
  • BVQA (Zhang et al., 2022, trained on the training set of DIVIDE-MaxWell, only the overall score used)
SROCC: 0.7418 PLCC: 0.7394 KROCC: 0.5341
  • FAST-VQA (Wu et al., 2022, trained on the training set of DIVIDE-MaxWell, only the overall score used)
SROCC: 0.7798 PLCC: 0.7819 KROCC: 0.5868

Methods Proposed with DIVIDE or MaxWell

  • MaxVQA (trained on the training set of DIVIDE-MaxWell, 16 dimensions used)
SROCC: 0.8044 PLCC: 0.8131 KROCC: 0.6098
  • DOVER++ (trained on the training set of DIVIDE-MaxWell, 3 dimensions used)
SROCC: 0.8071 PLCC: 0.8126 KROCC: 0.6136

All methods involve training, so the results may contain some randomness.