Visual Saliency and Eyetracking

Eyetracking is a well-known method used in high-cost marketing campaigns to assess how visuals drive observers' attention. However, eyetracking is time-consuming and costly, requiring the involvement of many people in viewing experiments. Predictive models of how humans perceive and observe visuals make it possible to analyse designs and compositions instantly.

Computational saliency models have demonstrated high accuracy in predicting the first seconds of eyetracking experiments using only the visual information in the image. They perform a purely graphical analysis to predict eye-catching elements, without taking any observer intention into account.

Saliency prediction of a Reebok advertisement.

Eyetracking heatmap from the advertising assessment.

GazeHits runs on one of the most advanced saliency models available: AWS, Adaptive Whitening Saliency, by Antón García-Díaz [1], with an accuracy above 90% of that achieved in an eyetracking audit experiment. AWS has demonstrated top performance in predicting visual fixations on large-scale datasets with a great variety of scenes (MIT Saliency Benchmark), and recent third-party reviews by leading researchers in the field support this statement [2][3].


Adaptive Whitening Saliency model diagram.

Prediction of Visual Attention

Visual attention is a preprocessing step that enables biological systems to select the most relevant regions of a scene, so that higher-level cognitive areas can perform complex processes such as scene understanding, action selection and decision making.

Visual attention is commonly categorized into two distinct functions:

  • Bottom-up attention refers to externally driven factors (the scene itself) that highlight salient image regions, those that differ from their surroundings.
  • Top-down attention refers to task-driven factors, based on prior knowledge and intentions of the observer.

GazeHits focuses on raw saliency, bottom-up attention free of biases, since it is clearly the most useful attention driver for design purposes: you cannot modify the biases of the human visual system, and you usually cannot be sure of the intentions and experience of your observer, but you can modify a color or a texture to catch attention by boosting saliency at the desired point.

Based on state-of-the-art computational models for saliency prediction, it reproduces adaptation, aggregation and pop-out mechanisms observed in psychophysical experiments (i.e., decomposing images in a manner similar to the neural responses observed in the visual cortex and applying operations that take place all along the visual pathway).
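
To give an intuition of the whitening idea behind this family of models, here is a minimal, hypothetical sketch (assuming NumPy and SciPy). The feature bank used here, raw color channels plus Gaussian-blurred luminance at several scales, and the helper name whitened_saliency are illustrative stand-ins, not the actual AWS decomposition: per-pixel feature vectors are decorrelated and variance-normalized using the statistics of the image itself, and the saliency of a pixel is its distance from the mean in the whitened space, so statistically rare (pop-out) regions score high.

```python
import numpy as np
from scipy import ndimage

def whitened_saliency(img, scales=(1, 2, 4, 8)):
    """Toy whitening-based saliency for a float image of shape (H, W, 3) in [0, 1]."""
    h, w, _ = img.shape

    # Illustrative feature bank: color channels plus multi-scale blurred luminance.
    lum = img.mean(axis=2)
    feats = [img[:, :, c] for c in range(3)]
    feats += [ndimage.gaussian_filter(lum, s) for s in scales]
    X = np.stack([f.ravel() for f in feats], axis=1)          # (H*W, F)

    # Adaptive whitening: decorrelate and variance-normalize the features
    # using the statistics of this image only.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs / np.sqrt(evals + 1e-8)                         # whitening matrix
    Z = Xc @ W                                                # whitened features

    # Saliency: squared distance from the mean in the whitened space,
    # so statistically rare feature combinations (pop-out) get high values.
    sal = (Z ** 2).sum(axis=1).reshape(h, w)
    sal = ndimage.gaussian_filter(sal, 3)                     # mild smoothing
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```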

This makes GazeHits a suitable tool for working with general-purpose visuals and compositions, since it makes no assumptions about scene category or observer intention.

Comparison to Black Box Approaches

Models that resort to extensive machine learning usually omit any reference to psychophysical validation beyond prediction of fixations. They are black boxes that learn where fixations go on a specific dataset of images. As a result, they learn mostly biases of the human visual system (e.g. the tendency to fixate at the center) or biases of the data used for training.

This is important because biases associated with the "gist of the scene" do not generalize to different scenes, degrading the performance of machine-learning approaches as scene variety increases. These factors limit the true accuracy and applicability of such approaches for general design optimization purposes.

These models usually report their accuracy through raw AUC (or NSS) metrics, which do not take these biases into account. As an example of the issue, a generic map representing center bias (with maximum "saliency" in the center and decreasing values towards the borders) achieves very good AUC values without knowing anything about the image content. However, it is widely accepted that sAUC, a modified metric that discounts such biases, is the best approach to benchmark saliency models; assessed with sAUC, the same center-bias map performs like a random map.
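
A toy simulation, not taken from any benchmark, illustrates the point; the rank-based AUC helper, the Gaussian center-bias map and the sample sizes below are illustrative assumptions. Raw AUC scores the map against uniformly sampled negative pixels, while sAUC samples the negatives from fixations that share the same center bias, which cancels out the advantage of the bias.

```python
import numpy as np

def auc_saliency(sal_map, positives, negatives):
    """Rank-based (Mann-Whitney) AUC of saliency values at fixated pixels
    versus values at the given negative pixels; ties are ignored for simplicity."""
    pos = sal_map[positives[:, 0], positives[:, 1]]
    neg = sal_map[negatives[:, 0], negatives[:, 1]]
    scores = np.r_[pos, neg]
    ranks = np.empty(len(scores))
    ranks[scores.argsort()] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(pos), len(neg)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
h = w = 256

# A pure center-bias "saliency" map: a Gaussian bump at the image center.
yy, xx = np.mgrid[0:h, 0:w]
center_map = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * 60.0 ** 2))

# Simulated human fixations, which are themselves center-biased.
fix = rng.normal([h / 2, w / 2], 50, size=(200, 2)).clip(0, h - 1).astype(int)

# Raw AUC: negatives sampled uniformly over the image.
uniform_neg = rng.integers(0, h, size=(200, 2))
# sAUC: negatives sampled from (center-biased) fixations on other images,
# so the center bias no longer gives the map any advantage.
shuffled_neg = rng.normal([h / 2, w / 2], 50, size=(200, 2)).clip(0, h - 1).astype(int)

print("raw AUC:", auc_saliency(center_map, fix, uniform_neg))   # well above chance
print("sAUC   :", auc_saliency(center_map, fix, shuffled_neg))  # close to 0.5 (chance)
```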

AWS performance with sAUC in the MIT Saliency Benchmark.

In contrast, we provide you with a measure of saliency that is free of bias. Moreover, since our model is not a black box, we can trace back which features are driving (or failing to drive) attention and give you visual advice on how to modify your design to meet your goals.


[1] Garcia-Diaz, A., Leboran, V., Fdez-Vidal, X. R., & Pardo, X. M. (2012). On the relationship between optical variability, visual saliency, and eye fixations: A computational approach. [Journal of Vision, 12(6)]

[2] Borji, A., Sihite, D. N., & Itti, L. (2013). Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study. [IEEE Trans. on Image Processing, 22(1)]

[3] Rahman, S., & Bruce, N. (2015). Visual Saliency Prediction and Evaluation across Different Perceptual Tasks. [PloS one, 10(9)]