In Section 3.7, the SUN model, a saliency model built on more comprehensive statistics, was considered only for its bottom-up part. However, the framework of the model includes both bottom-up and top-down parts. For completeness, here we briefly present its top-down part as described in [19].
The SUN model assumes that salient locations are closely related to the probability of a target's presence: a location with a higher probability of the target's appearance has a higher saliency value. Let z be a point in the visual field (or a pixel in the input image) and let the binary random variable C denote whether or not the point belongs to a target class, C ∈ {0, 1}. The random variable l and the random vector f denote the location (pixel coordinates) and the visual features of a point, respectively. The saliency of a point z in the visual field is directly proportional to the probability p(C = 1 | f = fz, l = lz), where fz represents the features observed at location z (here fz is written as a vector denoted by a bold letter) and lz is the coordinate of z. Assuming, as in [19], that features are independent of location given the target class, the saliency at point z can be calculated using Bayes' rule as

sz = p(C = 1 | f = fz, l = lz) = p(f = fz | C = 1) p(C = 1 | l = lz) / p(f = fz)    (5.34)
Since the logarithm is a monotonically increasing function that does not affect the ranking of saliency across locations in an image – as in the deduction mentioned in Section 3.7 – Equation 5.34 can be replaced with the following form:

log sz = −log p(f = fz) + log p(f = fz | C = 1) + log p(C = 1 | l = lz)    (5.35)
In the above equation, the first term is the self-information at point z, as shown in Chapter 3; it represents the bottom-up saliency when the feature vector f takes the value fz, the features at point z. When the probability of the appearance of one or more features decreases, the saliency at point z increases, because the joint probability of all the features is the product of the individual feature probabilities in the probability density estimation [19, 71]. The second term is a log-likelihood term that favours feature values consistent with knowledge of the target's presence at point z. If one or several features of the target are already known (e.g., we know that the target is red), then the log-likelihood term is much larger for points with that feature (red points) than for points of other colours. The third term reflects prior knowledge of where the target is likely to appear, and it is independent of features. For instance, a tiger often appears in places where it frequently finds prey.
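The three-term decomposition above can be made concrete with a toy calculation. The sketch below uses invented probabilities for a single point z (not values from [19]) simply to show how the bottom-up, log-likelihood and location-prior terms combine:

```python
import math

# Toy illustration of the three-term log-saliency decomposition
# (Equation 5.35). All probabilities are made up for illustration.
p_f = 0.02                # p(f = fz): a rare feature -> high self-information
p_f_given_target = 0.30   # p(f = fz | C = 1): feature likely on the target
p_target_at_l = 0.10      # p(C = 1 | l = lz): location prior

self_information = -math.log(p_f)             # bottom-up term
log_likelihood = math.log(p_f_given_target)   # top-down feature term
log_location_prior = math.log(p_target_at_l)  # location term

log_salience = self_information + log_likelihood + log_location_prior
print(log_salience)  # = ln(0.30 * 0.10 / 0.02) = ln(1.5) ≈ 0.405
```

Note how a rare feature (small p(f = fz)) raises the saliency, while a feature unlikely on the target (small p(f = fz | C = 1)) lowers it, exactly as the text describes.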
If the location prior is not considered for simplicity (the third term is set to zero, or the conditional probability is set to uniform), the first two terms form the pointwise mutual information between the features and the presence of a target, and Equation 5.35 can be written as

log sz = log [ p(f = fz, C = 1) / ( p(f = fz) p(C = 1) ) ]    (5.36)
The SUN model with both bottom-up and top-down parts thus looks for the salient regions of an image that are most likely to contain the target by maximizing the pointwise mutual information in Equation 5.36. When searching for a single target class, p(C = 1) can be treated as a constant, so the right-hand side of Equation 5.36 can be rewritten as

sz ∝ p(C = 1 | f = fz)    (5.37)
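Pointwise mutual information can be illustrated with a small invented joint distribution over one binary feature ("red" vs. "other") and target presence; the numbers below are stand-ins, not data from [19]:

```python
import math

# Invented joint distribution p(f, C) over a binary colour feature and
# target presence, to illustrate Equation 5.36.
p_joint = {("red", 1): 0.08, ("red", 0): 0.02,
           ("other", 1): 0.02, ("other", 0): 0.88}

def pmi(f, c):
    """Pointwise mutual information log[p(f, c) / (p(f) p(c))]."""
    p_f = sum(p for (ff, _), p in p_joint.items() if ff == f)
    p_c = sum(p for (_, cc), p in p_joint.items() if cc == c)
    return math.log(p_joint[(f, c)] / (p_f * p_c))

# A red point is far more informative about the target's presence
# than a non-red point.
print(pmi("red", 1), pmi("other", 1))
```

Here pmi("red", 1) = ln(0.08 / (0.10 × 0.10)) = ln 8, while pmi("other", 1) is negative: observing "other" actually argues against the target's presence.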
From Equation 5.37, computing the saliency of the SUN model with the top-down part reduces to estimating the conditional probability p(C = 1 | f = fz).
Since this conditional probability involves both the class (target/background) and the features at each point, a probabilistic classifier, the support vector machine (SVM), is adopted in [19] to estimate it. The SVM is an effective classifier in pattern recognition that can perform pattern classification, data fitting (regression) and probability density estimation through the library for SVM (LIBSVM) [72]. For the features at each point z, the feature filters mentioned in Section 3.7 are considered. Section 3.7 adopts two kinds of feature filters: difference-of-Gaussians filters at multiple scales and ICA filters. Since the ICA filters (a bank of basis functions) generate independent responses, which make the estimation of the joint probability density in the self-information term more accurate, they are preferred in the SUN computation with top-down information [19].
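A minimal sketch of the probability estimation step follows. It uses scikit-learn's SVC, which wraps LIBSVM; setting probability=True enables LIBSVM's Platt-scaling probability outputs, giving p(C = 1 | f) directly. The feature vectors are random stand-ins for filter responses, not real data from [19]:

```python
import numpy as np
from sklearn.svm import SVC

# Sketch: estimate p(C = 1 | f = fz) with an SVM, as in Equation 5.37.
# scikit-learn's SVC wraps LIBSVM; probability=True turns on Platt
# scaling, LIBSVM's mechanism for probability outputs. The 8-dimensional
# features below are random stand-ins for real filter responses.
rng = np.random.default_rng(0)
target_feats = rng.normal(loc=2.0, scale=1.0, size=(100, 8))  # C = 1
backgr_feats = rng.normal(loc=0.0, scale=1.0, size=(100, 8))  # C = 0
X = np.vstack([target_feats, backgr_feats])
y = np.array([1] * 100 + [0] * 100)

clf = SVC(kernel="rbf", probability=True).fit(X, y)

# p(C = 1 | f) for one target-like and one background-like point;
# column 1 of predict_proba corresponds to class C = 1.
probs = clf.predict_proba([[2.0] * 8, [0.0] * 8])[:, 1]
print(probs)  # higher probability for the target-like point
```

Evaluating this probability at every pixel's feature vector and normalizing yields the top-down saliency map.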
The bank of ICA basis functions is first estimated from a large database of image patches normalized to zero mean, each patch being an 11 × 11 × 3-dimensional vector, where 3 is the number of colour channels and 11 × 11 pixels is the patch size. One of the ICA learning algorithms [73–75] is applied to this database, and the bank of ICA basis functions (filters) is obtained after learning. The details of ICA filter computation were discussed in Sections 3.5 and 3.7. Once the ICA filters are available, the implementation of SUN proceeds in the following three steps.
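The filter-learning recipe above can be sketched as follows. FastICA (one of the ICA learning algorithms in the family of [73–75]) is used here, and random patches stand in for a real image database, so the learned filters are illustrative only:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Sketch of learning an ICA filter bank from image patches:
# 11 x 11 x 3 colour patches, flattened to 363-dimensional vectors
# and normalized to zero mean. Random patches replace a real database.
rng = np.random.default_rng(0)
patches = rng.random((2000, 11 * 11 * 3))       # 2000 fake colour patches
patches -= patches.mean(axis=1, keepdims=True)  # zero mean per patch

ica = FastICA(n_components=64, random_state=0)  # FastICA as the learner
ica.fit(patches)

filters = ica.components_   # each row is one ICA filter
print(filters.shape)        # (64, 363): 64 filters over 363-dim patches

# Filter responses for one patch: the feature vector fz of the SUN model.
responses = filters @ patches[0]
```

Applying each learned filter to the patch around every pixel produces the (approximately independent) feature responses whose joint density drives the self-information term.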