
Model submission and evaluation

Submission and evaluation process

  • The training dataset and evaluation tools are available in the Datasets section, so you can train and tune your algorithms as needed and compute the benchmark scores yourself as a reference.
  • When you decide to submit an algorithm, you can request the benchmark dataset (different from the training dataset and containing only the stimuli, with the corresponding ground truth kept secret) by sending an email to: salient360@univ-nantes.fr.
  • Then, you can run your models on the benchmark stimuli and send us the outputs (saliency maps and/or scan-paths).
  • Finally, we evaluate the performance of the models on the benchmark ground-truth (computing saliency/scan-path metrics on the provided outputs) and publish the results.

Types of models

The following categories can be considered for videos and/or images:

  • Model type 1 – Head-motion-based saliency model: these models are expected to predict the Ground Truth Heat Map (GTHM) derived from the “movement of the head” only.
  • Model type 2 – (Head+Eye)-motion-based saliency model: these models are expected to predict the GTHM derived from the “movement of the head” as well as from the “movement of the eye within the viewport”.
  • Model type 3 – Scan-paths of eye-gaze observers in the entire 360 panorama: these models are expected to predict the Ground Truth Scan-Paths (GTSP) of eye gaze, obtained from the head- and eye-movement data of several observers.
  • Model type 4 – Scan-paths of head-gaze observers (the center of the viewport as a succession of head fixations) in the entire 360 panorama: these models are expected to predict the GTSP of head gaze, obtained from the head data of several observers only.

Submission format

We request participants to submit the outputs of their models according to the following descriptions:

  • Model types 1 and 2:
    • Images: Binary files with a resolution of 2048×1024 containing the saliency values of the 360 image in equi-rectangular format, organized row-wise. No normalization of the saliency maps is required, since it is carried out when computing the saliency metrics. Please use the following naming convention: “ImageIndex_WidthOfSaliencyMapxHeightOfSaliencyMap_32b.bin”.
    • Videos: Binary files representing the saliency-map sequence. Each sequence contains one saliency map per frame, with a resolution of 2048×1024. In the binary file, the saliency values are organized row-wise, one frame after the other. No frame pooling or normalization of the saliency maps is required, since it is carried out when computing the saliency metrics. Please use the following naming convention: “VideoIndex_WidthOfSaliencyMapxHeightOfSaliencyMapxNumberOfFrames_32b.bin”. A sketch of how these binary files can be written is given after this list.
  • Model type 3:
    • Images: The output scan-paths must be provided in a CSV text file in the same format as used for the training dataset (although only longitude, latitude and starting timestamp will be needed for evaluation). These scan-paths will be compared to the ground-truth scan-paths of the left and right eyes, and only the best result will be considered.
    • Videos: The output scan-paths must be provided in a CSV text file in the same format as used for the training dataset (although only longitude, latitude, starting frame and ending frame will be needed for evaluation). These scan-paths will be compared to the ground-truth scan-paths of the left and right eyes, and only the best result will be considered. Please note that, as described in the dataset documentation, the models may consider that two starting positions are possible: the center of the equi-rectangular projection (0°) and/or the opposite longitude (180°).
  • Model type 4:
    • Images: The output scan-paths containing head samples must be provided in a CSV text file in the same format as used for the training dataset (although only longitude, latitude and starting timestamp will be needed for evaluation).
    • Videos: The output scan-paths containing head samples must be provided in a CSV text file in the same format as used for the training dataset (although only longitude, latitude, starting frame and ending frame will be needed for evaluation). Please note that, as described in the dataset documentation, the models may consider that two starting positions are possible: the center of the equi-rectangular projection (0°) and/or the opposite longitude (180°). A sketch of writing these CSV files is given at the end of this section.
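
As an illustration of the binary outputs for model types 1 and 2, the sketch below shows one possible way to write the files with Python and NumPy. It is a minimal sketch, not a reference implementation: the float32 element type is an assumption based on the “_32b” suffix, and the index values and output directory are placeholders.

    import numpy as np

    def save_image_saliency(saliency_map, image_index, out_dir="."):
        # Write a single 360-image saliency map as a row-wise binary file.
        # Assumed element type: float32 (suggested by the "_32b" suffix).
        saliency_map = np.asarray(saliency_map, dtype=np.float32)
        height, width = saliency_map.shape          # e.g. (1024, 2048)
        # Naming convention: ImageIndex_WidthxHeight_32b.bin
        filename = f"{out_dir}/{image_index}_{width}x{height}_32b.bin"
        saliency_map.tofile(filename)               # tofile() writes row by row (C order)
        return filename

    def save_video_saliency(saliency_maps, video_index, out_dir="."):
        # Write a saliency-map sequence (one map per frame) as a single binary file.
        # Expected shape: (num_frames, height, width).
        saliency_maps = np.asarray(saliency_maps, dtype=np.float32)
        num_frames, height, width = saliency_maps.shape
        # Naming convention: VideoIndex_WidthxHeightxNumberOfFrames_32b.bin
        filename = f"{out_dir}/{video_index}_{width}x{height}x{num_frames}_32b.bin"
        saliency_maps.tofile(filename)              # frames written one after the other, each row-wise
        return filename

    # Placeholder example: image index 25 and a 10-frame sequence for video index 3.
    save_image_saliency(np.random.rand(1024, 2048), image_index=25)
    save_video_saliency(np.random.rand(10, 1024, 2048), video_index=3)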

To ease evaluation of the models, their outputs should follow the same file-naming convention that we used for the training dataset, according to the different types of models.
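
For the CSV outputs of model types 3 and 4, the following minimal Python sketch shows one way to write a scan-path file. The header row, column order and file names used here are placeholders, not the official layout: they must be replaced by the exact format and naming convention of the training dataset.

    import csv

    def save_scanpath_image(fixations, out_path):
        # One scan-path for a 360 image: rows of (longitude, latitude, start_timestamp).
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["longitude", "latitude", "start_timestamp"])  # placeholder header
            writer.writerows(fixations)

    def save_scanpath_video(fixations, out_path):
        # Same idea for videos: rows of (longitude, latitude, start_frame, end_frame).
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["longitude", "latitude", "start_frame", "end_frame"])  # placeholder header
            writer.writerows(fixations)

    # Placeholder example: a three-fixation scan-path starting at longitude 0.
    save_scanpath_image([(0.0, 0.0, 0.0), (45.0, 10.0, 0.8), (90.0, -5.0, 1.6)],
                        "1_scanpath.csv")  # placeholder file name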