Submitted 4 June 2016. Accepted 28 July 2016. Published 22 August 2016.
Corresponding author: Alexander Toet, lextoet@gmail.com
Academic editor: Klara Kedem
DOI 10.7717/peerj-cs.80
Copyright 2016 Toet. Distributed under Creative Commons CC-BY 4.0. OPEN ACCESS

Iterative guided image fusion
Alexander Toet, TNO, Soesterberg, Netherlands

ABSTRACT
We propose a multi-scale image fusion scheme based on guided filtering. Guided filtering can effectively reduce noise while preserving detail boundaries. When applied in an iterative mode, guided filtering selectively eliminates small scale details while restoring larger scale edges. The proposed multi-scale image fusion scheme achieves spatial consistency by using guided filtering both at the decomposition and at the recombination stage of the multi-scale fusion process. First, size-selective iterative guided filtering is applied to decompose the source images into approximation and residual layers at multiple spatial scales. Then, frequency-tuned filtering is used to compute saliency maps at successive spatial scales. Next, at each spatial scale binary weighting maps are obtained as the pixelwise maximum of the corresponding source saliency maps. Guided filtering of the binary weighting maps with their corresponding source images as guidance images serves to reduce noise and to restore spatial consistency. The final fused image is obtained as the weighted recombination of the individual residual layers and the mean of the approximation layers at the coarsest spatial scale. Application to multiband visual (intensified) and thermal infrared imagery demonstrates that the proposed method obtains state-of-the-art performance for the fusion of multispectral nightvision images. The method has a simple implementation and is computationally efficient.

Subjects: Computer Vision
Keywords: Image fusion, Guided filter, Saliency, Infrared, Nightvision, Thermal imagery, Intensified imagery

INTRODUCTION
The increasing deployment and availability of co-registered multimodal imagery from different types of sensors has spurred the development of image fusion techniques. The information provided by different sensors registering the same scene can either be (partially) redundant or complementary and may be corrupted with noise. Effective combinations of complementary and partially redundant multispectral imagery can therefore visualize information that is not directly evident from the individual input images. For instance, in nighttime (low-light) outdoor surveillance applications, intensified visual (II) or near-infrared (NIR) imagery often provides a detailed but noisy representation of a scene. While different types of noise may result from several processes associated with the underlying sensor physics, additive noise is typically the predominant noise component encountered in II and NIR imagery (Petrovic & Xydeas, 2003). Additive noise can be modelled as a random signal that is simply added to the original signal. As a result, additive noise may obscure or distort relevant image details.
In addition, targets of interest like persons or cars are sometimes hard to distinguish in II or NIR imagery because of their low luminance contrast. While thermal infrared (IR) imagery typically represents these targets with high contrast, their background (context) is often washed out due to low thermal contrast. In this case, a fused image that clearly represents both the targets and their background enables a user to assess the location of targets relative to landmarks in their surroundings, thus providing more information than either of the input images alone.

Some potential benefits of image fusion are: wider spatial and temporal coverage, decreased uncertainty, improved reliability, and increased system robustness. Image fusion has important applications in defense and security for situational awareness (Toet et al., 1997), surveillance (Shah et al., 2013; Zhu & Huang, 2007), target tracking (Motamed, Lherbier & Hamad, 2005; Zou & Bhanu, 2005), intelligence gathering (O'Brien & Irvine, 2004), concealed weapon detection (Bhatnagar & Wu, 2011; Liu et al., 2006; Toet, 2003; Xue & Blum, 2003; Xue, Blum & Li, 2002; Yajie & Mowu, 2009), detection of abandoned packages (Beyan, Yigit & Temizel, 2011) and buried explosives (Lepley & Averill, 2011), and face recognition (Kong et al., 2007; Singh, Vatsa & Noore, 2008). Other important image fusion applications are found in industry (Tian et al., 2009), art analysis (Zitová, Beneš & Blažek, 2011), agriculture (Bulanona, Burks & Alchanatis, 2009), remote sensing (Ghassemian, 2001; Jacobson & Gupta, 2005; Jacobson, Gupta & Cole, 2007; Jiang et al., 2011) and medicine (Agarwal & Bedi, 2015; Biswas, Chakrabarti & Dey, 2015; Daneshvar & Ghassemian, 2010; Singh & Khare, 2014; Wang, Li & Tian, 2014; Yang & Liu, 2013) (for a survey of different applications of image fusion techniques, see Blum & Liu (2006)).

In general, image fusion aims to represent the visual information from any number of input images in a single composite (fused) image that is more informative than each of the input images alone, eliminating noise in the process while preventing both the loss of essential information and the introduction of artefacts. This requires the availability of filters that combine the extraction of relevant image details with noise reduction. To date, a variety of image fusion algorithms have been proposed. A popular class of algorithms comprises the multi-scale image fusion schemes, which decompose the source images into spatial primitives at multiple spatial scales, then integrate these primitives to form a new ('fused') multi-scale representation, and finally apply an inverse multi-scale transform to reconstruct the fused image.
Examples of this approach are the Laplacian pyramid (Burt & Adelson, 1983), the Ratio of Low-Pass pyramid (Toet, 1989b), the contrast pyramid (Toet, Van Ruyven & Valeton, 1989), the filter-subtract-decimate Laplacian pyramid (Burt, 1988; Burt & Kolczynski, 1993), the gradient pyramid (Burt, 1992; Burt & Kolczynski, 1993), the morphological pyramid (Toet, 1989a), the discrete wavelet transform (Lemeshewsky, 1999; Li, Manjunath & Mitra, 1995; Li, Kwok & Wang, 2002; Scheunders & De Backer, 2001), the shift-invariant discrete wavelet transform (Lemeshewsky, 1999; Rockinger, 1997; Rockinger, 1999; Rockinger & Fechner, 1998), the contourlet transform (Yang et al., 2010), the shift-invariant shearlet transform (Wang, Li & Tian, 2014), the non-subsampled shearlet transform (Kong, Wang & Lei, 2015; Liu et al., 2016; Zhang et al., 2015), and the ridgelet transform (Tao, Junping & Ye, 2005). The filters applied in several of the earlier techniques typically produce halo artefacts near edges. More recent methods like shearlets, contourlets and ridgelets are better able to preserve local image features but are often complex or time-consuming.

Non-linear edge-preserving smoothing filters such as anisotropic diffusion (Perona & Malik, 1990), robust smoothing (Black et al., 1998) and the bilateral filter (Tomasi & Manduchi, 1998) may appear to be effective tools to prevent artefacts that arise from spatial inconsistencies in multi-scale image fusion schemes. However, anisotropic diffusion tends to over-sharpen edges and is computationally expensive, which makes it less suitable for application in multi-scale fusion schemes (Farbman et al., 2008). The non-linear bilateral filter (BLF) assigns each pixel a weighted mean of its neighbors, with the weights decreasing both with spatial distance and with difference in value (Tomasi & Manduchi, 1998). While the BLF is quite effective at smoothing small intensity changes while preserving strong edges and has efficient implementations, it also tends to blur across edges at larger spatial scales, thereby limiting its value for application in multi-scale image decomposition schemes (Farbman et al., 2008). In addition, the BLF has the undesirable property that it can reverse the intensity gradient near sharp edges (the weighted average becomes unstable when a pixel has only few similar pixels in its neighborhood: He, Sun & Tang, 2013). In the joint (or cross) bilateral filter (JBLF) a second or guidance image serves to steer the edge-stopping range filter, thus preventing over- or under-blur near edges (Petschnigg et al., 2004). Zhang et al. (2014) showed that the application of the JBLF in an iterative framework results in size-selective filtering of small scale details combined with the recovery of larger scale edges. The recently introduced guided filter (GF: He, Sun & Tang, 2013) is a computationally efficient, edge-preserving translation-variant operator based on a local linear model which avoids the drawbacks of bilateral filtering and other previous approaches. When the input image also serves as the guidance image, the GF behaves like the edge-preserving BLF. Hence, the GF can gracefully eliminate small details while recovering larger scale edges when applied in an iterative framework.
In this paper we propose a multi-scale image fusion scheme in which iterative guided filtering is used to decompose the input images into approximation and residual layers at successive spatial scales, and guided filtering is used to construct the weight maps used in the recombination process.

The rest of this paper is organized as follows. 'Edge Preserving Filtering' briefly discusses the principles of edge-preserving filtering and introduces (iterative) guided filtering. In 'Related Work' we discuss related work. 'Proposed Method' presents the proposed guided-filtering-based image fusion scheme. 'Methods and Material' presents the imagery and computational methods that were used to assess the performance of the new image fusion scheme. The results of the evaluation study are presented in 'Results.' Finally, in 'Discussion and Conclusions' the results are discussed and some conclusions are presented.

EDGE PRESERVING FILTERING
In this section we briefly introduce the edge-preserving bilateral and joint bilateral filters, show how they are related to the guided filter, and how the application of a guided filter in an iterative framework results in size-selective filtering of small scale image details combined with the recovery of larger scale edges.

Bilateral filter
Spatial filtering is a common operation in image processing that is typically used to reduce noise or eliminate small spurious details (e.g., texture). In spatial filtering the value of the filtered image at a given location is a function (e.g., a weighted average) of the original pixel values in a small neighborhood of the same location. Although low-pass filtering or blurring (e.g., averaging with a Gaussian kernel) can effectively reduce image noise, it also seriously degrades the articulation of (blurs) significant image edges. Therefore, edge-preserving filters have been developed that reduce small image variations (noise or texture) while preserving large discontinuities (edges).

The bilateral filter is a non-linear filter that computes the output at each pixel as a weighted average of its neighbors, with Gaussian weights that depend on both the spatial and the intensity distance. It prevents blurring across edges by assigning larger weights to pixels that are spatially close and have similar intensity values (Tomasi & Manduchi, 1998). It uses a combination of (typically Gaussian) spatial and range (intensity) filter kernels that perform a blurring in the spatial domain weighted by the local variation in the intensity domain. It combines a classic low-pass filter with an edge-stopping function that attenuates the filter kernel weights at locations where the intensity difference between pixels is large. Bilateral filtering was developed as a fast alternative to the computationally expensive technique of anisotropic diffusion, which uses gradients of the filtered image itself to guide a diffusion process, avoiding edge blurring (Perona & Malik, 1990). More formally, at a given image location (pixel) i, the filtered output O_i is given by

$$O_i = \frac{1}{K_i}\sum_{j\in\Omega} I_j \, f(\|i-j\|)\, g(\|I_i-I_j\|) \tag{1}$$

where f is the spatial filter kernel (e.g., a Gaussian centered at i), g is the range or intensity (edge-stopping) filter kernel (centered at the image value at i), Ω is the spatial support of the kernel, and K_i is a normalizing factor (the sum of the f · g filter weights). Intensity edges are preserved since the bilateral filter decreases not only with the spatial distance but also with the intensity distance. Though the filter is efficient and effectively reduces noise while preserving edges in many situations, it has the undesirable property that it can reverse the intensity gradient near sharp edges (the weighted average becomes unstable when a pixel has only few similar pixels in its neighborhood: He, Sun & Tang, 2013).
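To make Eq. (1) concrete, the following is a minimal, brute-force NumPy sketch of the bilateral filter for a single-band float image. It is written for clarity rather than speed, and the Gaussian kernel shapes and parameter names (radius, sigma_s, sigma_r) are illustrative choices rather than values taken from the paper.

```python
import numpy as np

def bilateral_filter(I, radius=5, sigma_s=3.0, sigma_r=0.1):
    """Brute-force bilateral filter of Eq. (1) for a 2-D float image I.

    f is a spatial Gaussian with scale sigma_s, g is a range Gaussian with
    scale sigma_r; radius defines the window Omega around each pixel."""
    H, W = I.shape
    pad = np.pad(I, radius, mode='reflect')
    # Precompute the spatial kernel f(||i - j||) over the window.
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    f = np.exp(-(x**2 + y**2) / (2 * sigma_s**2))
    O = np.zeros_like(I)
    for r in range(H):
        for c in range(W):
            window = pad[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
            # Range kernel g(||I_i - I_j||), centred on the current pixel value.
            g = np.exp(-(window - I[r, c])**2 / (2 * sigma_r**2))
            w = f * g
            O[r, c] = np.sum(w * window) / np.sum(w)   # K_i normalisation
    return O
```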
In the joint (or cross) bilateral filter (JBLF) the range filter is applied to a second or guidance image G (Petschnigg et al., 2004):

$$O_i = \frac{1}{K_i}\sum_{j\in\Omega} I_j \, f(\|i-j\|)\, g(\|G_i-G_j\|). \tag{2}$$

The JBLF can prevent over- or under-blur near edges by using a related image G to guide the edge-stopping behavior of the range filter. That is, the JBLF smooths the image I while preserving edges that are also represented in the image G. The JBLF is particularly favored when the edges in the image that is to be filtered are unreliable (e.g., due to noise or distortions) and when a companion image with well-defined edges is available (e.g., in the case of flash/no-flash image pairs). Thus, in the case of filtering an II image for which a companion (registered) IR image is available, the guidance image may either be the II image itself or its IR counterpart.

Guided filtering
A guided image filter (He, Sun & Tang, 2013) is a translation-variant filter based on a local linear model. Guided image filtering involves an input image I, a guidance image G and an output image O. The two filtering conditions are (i) that the local filter output is a linear transform of the guidance image G and (ii) that it is as similar as possible to the input image I. The first condition implies that

$$O_i = a_k G_i + b_k \quad \forall i \in \omega_k \tag{3}$$

where ω_k is a square window of size (2r+1) × (2r+1). The local linear model ensures that the output image O has an edge only at locations where the guidance image G has one, because ∇O = a∇G. The linear coefficients a_k and b_k are constant in ω_k. They can be estimated by minimizing the squared difference between the output image O and the input image I (the second filtering condition) in the window ω_k, i.e., by minimizing the cost function E:

$$E(a_k, b_k) = \sum_{i \in \omega_k}\left((a_k G_i + b_k - I_i)^2 + \varepsilon a_k^2\right) \tag{4}$$

where ε is a regularization parameter penalizing large a_k. The coefficients a_k and b_k can directly be solved by linear regression (He, Sun & Tang, 2013):

$$a_k = \frac{\frac{1}{|\omega|}\sum_{i\in\omega_k} G_i I_i - \bar{G}_k \bar{I}_k}{\sigma_k^2 + \varepsilon} \tag{5}$$

$$b_k = \bar{I}_k - a_k \bar{G}_k \tag{6}$$

where |ω| is the number of pixels in ω_k, Ī_k and Ḡ_k represent the means of respectively I and G over ω_k, and σ_k² is the variance of G over ω_k. Since pixel i is contained in several different (overlapping) windows ω_k, the value of O_i in Eq. (3) depends on the window over which it is calculated. This can be accounted for by averaging over all possible values of O_i:

$$O_i = \frac{1}{|\omega|}\sum_{k \,|\, i \in \omega_k} (a_k G_i + b_k). \tag{7}$$

Since $\sum_{k|i\in\omega_k} a_k = \sum_{k\in\omega_i} a_k$ due to the symmetry of the box window, Eq. (7) can be written as

$$O_i = \bar{a}_i G_i + \bar{b}_i \tag{8}$$

where $\bar{a}_i = \frac{1}{|\omega|}\sum_{k\in\omega_i} a_k$ and $\bar{b}_i = \frac{1}{|\omega|}\sum_{k\in\omega_i} b_k$ are the average coefficients of all windows overlapping i. Although the linear coefficients $(\bar{a}_i, \bar{b}_i)$ vary spatially, their gradients will be smaller than those of G near strong edges (since they are the output of a mean filter). As a result we have ∇O ≈ ā∇G, meaning that abrupt intensity changes in the guiding image G are still largely preserved in the output image O. Equations (5), (6) and (8) define the guided filter.

When the input image also serves as the guidance image, the guided filter behaves like the edge-preserving bilateral filter, with the parameter ε and the window size r having the same effects as respectively the range and the spatial variances of the bilateral filter. Equation (8) can be rewritten as

$$O_i = \sum_j W_{ij}(G)\, I_j \tag{9}$$

with the weighting kernel W_ij depending only on the guidance image G:

$$W_{ij} = \frac{1}{|\omega|^2}\sum_{k:(i,j)\in\omega_k}\left(1 + \frac{(G_i - \bar{G}_k)(G_j - \bar{G}_k)}{\sigma_k^2 + \varepsilon}\right). \tag{10}$$

Since $\sum_j W_{ij}(G) = 1$ this kernel is already normalized.

The guided filter is a computationally efficient, edge-preserving operator which avoids the gradient reversal artefacts of the bilateral filter. The local linear condition formulated by Eq. (3) implies that its output is locally approximately a scaled version of the guidance image plus an offset. This makes it possible to use the guided filter to transfer structure from the guidance image G to the output image O, even in areas where the input image I is smooth (or flat). This structure-transferring filtering is a useful property of the guided filter, and can for instance be applied for feathering/matting and dehazing (He, Sun & Tang, 2013).
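The guided filter of Eqs. (5)-(8) reduces to a handful of box (mean) filters. Below is a compact sketch using SciPy's uniform_filter as the window mean; images are assumed to be 2-D float arrays scaled to [0, 1], and the default r and eps values are placeholders only.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, G, r=5, eps=1e-2):
    """Guided filter of Eqs. (5)-(8): input image I, guidance image G,
    window radius r (window size 2r+1) and regularisation eps."""
    size = 2 * r + 1
    mean = lambda x: uniform_filter(x, size=size, mode='reflect')

    mean_G = mean(G)
    mean_I = mean(I)
    corr_GI = mean(G * I)
    var_G = mean(G * G) - mean_G**2                   # variance of G per window

    a = (corr_GI - mean_G * mean_I) / (var_G + eps)   # Eq. (5)
    b = mean_I - a * mean_G                           # Eq. (6)

    # Average the coefficients of all windows covering each pixel (Eq. (8)).
    return mean(a) * G + mean(b)
```

With G = I this behaves like the edge-preserving smoother described above; with a different guidance image it transfers the guidance structure to the output.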
Iterative guided filtering
Zhang et al. (2014) showed that the application of the joint bilateral filter (Eq. (2)) in an iterative framework results in size-selective filtering of small scale details combined with the recovery of larger scale edges. In this scheme the result G^{t+1} of the t-th iteration is obtained from joint bilateral filtering of the input image I, using the result G^t of the previous iteration step as the guidance image:

$$G_i^{t+1} = \frac{1}{K_i}\sum_{j\in\Omega} I_j\, f(\|i-j\|)\, g\left(\|G_i^t - G_j^t\|\right). \tag{11}$$

In this scheme, details smaller than the Gaussian kernel of the bilateral filter are removed while the edges of the remaining details are iteratively restored. Hence, this scheme allows the selective elimination of small scale details while preserving the remaining image structure. Note that the initial guidance image G^1 can simply be a constant (e.g., zero) valued image, since it updates to the Gaussian-filtered input image in the first iteration step. Here we propose to replace the bilateral filter in this scheme by a guided filter to avoid any gradient reversal artefacts.
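Replacing the joint bilateral filter in Eq. (11) by the guided filter, as proposed above, gives the following sketch. It reuses the guided_filter function from the previous snippet, and the default parameter values are again illustrative.

```python
import numpy as np

def iterative_guided_filter(I, r=5, eps=1e-2, n_iter=4):
    """Iterative guided filtering: each pass filters the original input I,
    using the result of the previous pass as the guidance image
    (`guided_filter` is the sketch from the 'Guided filtering' section)."""
    G = np.zeros_like(I)                 # constant initial guidance image
    for _ in range(n_iter):
        # With a constant guidance the first pass reduces to plain mean
        # filtering of I; later passes restore the larger scale edges.
        G = guided_filter(I, G, r=r, eps=eps)
    return G
```

Details smaller than the filter window are removed in the first pass, while subsequent passes iteratively recover the edges of the structures that remain.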
RELATED WORK
As mentioned before, most multi-scale transform-based image fusion methods introduce some artefacts because spatial consistency is not well preserved (Li, Kang & Hu, 2013). This has led to the use of edge-preserving filters to decompose source images into approximation and residual layers while preserving the edge information in the fusion process. Techniques that have been applied include the weighted least squares filter (Yong & Minghui, 2014), L1 fidelity with L0 gradient (Cui et al., 2015), L0 gradient minimization (Zhao et al., 2013), the cross bilateral filter (Kumar, 2013) and anisotropic diffusion (Bavirisetti & Dhuli, 2016a).

Li, Kang & Hu (2013) proposed to restore spatial consistency by using guided filtering in the weighted recombination stage of the fusion process. In their scheme, the input images are first decomposed into approximation and residual layers using a simple averaging filter. Next, each input image is filtered with a Laplacian kernel followed by blurring with a Gaussian kernel, and the absolute value of the result is adopted as a saliency map that characterizes the local distinctness of the input image details. Then, binary weight maps are obtained by comparing the saliency maps of all input images, and assigning a pixel in an individual weight map the value 1 if it is the pixelwise maximum of all saliency maps, and 0 otherwise. The resulting binary weight maps are typically noisy and not aligned with object boundaries, and may introduce artefacts into the fused image. Li, Kang & Hu (2013) therefore performed guided filtering on each weight map with its corresponding source layer as the guidance image, to reduce noise and to restore spatial consistency. The GF guarantees that pixels with similar intensity values have similar weights and that weighting is not performed across edges. Typically a large filter size and a large blur degree are used to fuse the approximation layers, while a small filter size and a small blur degree are used to combine the residual layers. Finally, the fused image is obtained by weighted recombination of the individual source residual layers. Although this method is efficient and can achieve state-of-the-art performance in most cases, it does not use edge-preserving filtering in the decomposition stage and applies a saliency map that does not relate well to human visual saliency (Gan et al., 2015).

In their multi-scale image fusion framework, Gan et al. (2015) apply edge-preserving filtering in the decomposition stage to extract well-defined image details (i.e., to preserve their edges) and use guided filtering in the weighted recombination stage to reduce spatial inconsistencies introduced by the weighting maps used in the reconstruction stage (i.e., to prevent edge artefacts like halos). First, a nonlinear weighted least squares edge-preserving filter (Farbman et al., 2008) is used to decompose the source images into approximation and residual layers. Next, phase congruency is used to calculate saliency maps that characterize the local distinctness of the source image details. The rest of their scheme is similar to that of Li, Kang & Hu (2013): binary weight maps are obtained from pixelwise comparison of the saliency maps corresponding to the individual source images; guided filtering is applied to these binary weight maps to reduce noise and restore spatial consistency; and the fused image is obtained by weighted recombination of the individual source residual layers. A minimal sketch of this shared weight-map construction is given below.
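The following sketch illustrates the weight-map construction that both schemes share. The Laplacian-plus-Gaussian saliency is one plausible reading of the Li, Kang & Hu (2013) description above, the guided_filter function is the sketch from 'Guided filtering', and all parameter values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def laplacian_saliency(img, sigma=5.0):
    """Saliency in the style described above: absolute Laplacian response,
    smoothed with a Gaussian kernel."""
    return gaussian_filter(np.abs(laplace(img.astype(float))), sigma=sigma)

def refined_weight_maps(saliencies, guides, r=20, eps=1e-3):
    """Binary weight maps (1 where an image attains the pixelwise maximum
    saliency) refined by guided filtering with the corresponding source
    image as guidance, which denoises the maps and aligns them with edges."""
    winner = np.argmax(np.stack(saliencies), axis=0)
    return [guided_filter((winner == k).astype(float), guide, r=r, eps=eps)
            for k, guide in enumerate(guides)]
```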
Figure 1 Flow chart of the proposed image fusion scheme. The processing scheme is illustrated for two source images X and Y and 4 resolution levels (0-3). X0 and Y0 are the original input images, while Xi and Yi represent successively lower resolution versions obtained by iterative guided filtering. 'Saliency' represents the frequency-tuned saliency transformation, 'Max' and 'Mean' respectively denote the pointwise maximum and mean operators, '(I)GF' means (Iterative) Guided Filtering, 'dX', 'dY' and 'dF' are respectively the original and fused detail layers, 'BW' the binary weight maps, and 'W' the smooth weight maps.

PROPOSED METHOD
A flow chart of the proposed multi-scale decomposition fusion scheme is shown in Fig. 1. The algorithm consists of the following steps:
1. Iterative guided filtering is applied to decompose the source images into approximation layers (representing large scale variations) and residual layers (containing small scale variations).
2. Frequency-tuned filtering (Achanta et al., 2009) is used to generate saliency maps for the source images.
3. Binary weighting maps are computed as the pixelwise maximum of the individual source saliency maps.
4. Guided filtering is applied to each binary weighting map with its corresponding source as the guidance image to reduce noise and to restore spatial consistency.
5. The fused image is computed as a weighted recombination of the individual source residual layers.
In a hierarchical framework, steps 1-4 are performed at multiple spatial scales. In this paper we used a 4-level decomposition obtained by filtering at three different spatial scales (see Fig. 1). Figure 2 shows the intensified visual (II) and thermal infrared (IR) or near-infrared (NIR) images together with the results of the proposed image fusion scheme, for the 12 different scenes that were used in the present study.

Figure 2 Original input and fused images for all 12 scenes. The intensified visual (II), thermal infrared (IR) or near infrared (NIR: scene 12) source images together with the result of the proposed fusion scheme (F) for each of the 12 scenes used in this study.

We will now discuss the proposed fusion scheme in more detail. Consider two co-registered source images X0(x,y) and Y0(x,y). The proposed scheme applies iterative guided filtering (IGF) to the input images X_i and Y_i to obtain progressively coarser image representations X_{i+1} and Y_{i+1} (i ≥ 0):

$$\mathrm{IGF}(X_i, r_i, \varepsilon_i) = X_{i+1}, \quad i \in \{0,1,2\} \tag{12}$$

where the parameters ε_i and r_i represent respectively the range and the spatial variances of the guided filter at scale level i. In this study the number of iteration steps is set to 4. By letting each coarser scale image serve as the approximation layer of the preceding finer scale image, the successive size-selective residual layers dX_i are simply obtained by subtraction:

$$dX_i = X_i - X_{i+1}, \quad i \in \{0,1,2\}. \tag{13}$$

Figure 3 shows the approximation and residual layers that are obtained this way for the tank scene (nr 10 in Fig. 2). The edge-preserving properties of the iterative guided filter guarantee a graceful decomposition of the source images into details at different spatial scales. The filter size and regularization parameters used in this study are respectively set to r_i = {5, 10, 30} and ε_i = {0.0001, 0.01, 0.1} for i = {0, 1, 2}.

Visual saliency refers to the physical, bottom-up distinctness of image details (Fecteau & Munoz, 2006). It is a relative property that depends on the degree to which a detail is visually distinct from its background (Wertheim, 2010). Since saliency quantifies the relative visual importance of image details, saliency maps are frequently used in the weighted recombination phase of multi-scale image fusion schemes (Bavirisetti & Dhuli, 2016b; Cui et al., 2015; Gan et al., 2015). Frequency-tuned filtering computes bottom-up saliency as local multi-scale luminance contrast (Achanta et al., 2009). The saliency map S for an image I is computed as

$$S(x,y) = \left\| I_\mu - I_f(x,y) \right\| \tag{14}$$

where I_µ is the arithmetic mean image feature vector, I_f represents a Gaussian-blurred version of the original image (using a 5 × 5 separable binomial kernel), ‖·‖ is the L2 norm (Euclidean distance), and x, y are the pixel coordinates. A recent and extensive evaluation study comparing 13 state-of-the-art saliency models found that the output of this simple saliency model correlates more strongly with human visual perception than the output produced by any of the other available models (Toet, 2011).
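A minimal single-band version of the frequency-tuned saliency of Eq. (14) is sketched below; a small Gaussian stands in for the 5 × 5 separable binomial kernel, and for multi-channel (e.g., Lab) inputs the L2 norm would be taken over the feature channels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_tuned_saliency(img):
    """Frequency-tuned saliency (Eq. (14)) for a single-band float image:
    distance between the global image mean and a slightly blurred copy."""
    img = img.astype(float)
    blurred = gaussian_filter(img, sigma=1.0)   # stand-in for the 5x5 binomial kernel
    return np.abs(img.mean() - blurred)
```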
Figure 3 Base and detail layers for the tank scene. Original intensified visual (A) and thermal infrared (H) images for scene nr. 10, with their respective base (B-D and I-K) and detail (E-G and L-N) layers at successively lower levels of resolution.

In the proposed fusion scheme we first compute saliency maps S_{X_i} and S_{Y_i} for the individual source layers X_i and Y_i, i ∈ {0,1,2}. Binary weight maps BW_{X_i} and BW_{Y_i} are then computed by taking the pixelwise maximum of the corresponding saliency maps S_{X_i} and S_{Y_i}:

$$BW_{X_i}(x,y) = \begin{cases} 1 & \text{if } S_{X_i}(x,y) > S_{Y_i}(x,y) \\ 0 & \text{otherwise} \end{cases} \qquad BW_{Y_i}(x,y) = \begin{cases} 1 & \text{if } S_{Y_i}(x,y) > S_{X_i}(x,y) \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

The resulting binary weight maps are noisy and typically not well aligned with object boundaries, which may give rise to artefacts in the final fused image. Spatial consistency is therefore restored through guided filtering (GF) of these binary weight maps with the corresponding source layers as guidance images:

$$W_{X_i} = \mathrm{GF}(BW_{X_i}, X_i) \qquad W_{Y_i} = \mathrm{GF}(BW_{Y_i}, Y_i). \tag{16}$$

As noted before, guided filtering combines noise reduction with edge preservation, while the output is locally approximately a scaled version of the guidance image. In the present scheme these properties are used to transform the binary weight maps into smooth continuous weight maps through guided filtering with the corresponding source images as guidance images. Figure 4 illustrates the process of computing smoothed weight maps by guided filtering of the binary weight maps resulting from the pointwise maximum of the corresponding source layer saliency maps for the tank scene.

Figure 4 Computing smoothed weight maps by guided filtering of binary weight maps. Saliency maps at levels 0, 1 and 2 for respectively the intensified visual (A-C) and thermal infrared (D-F) images from Fig. 3. Complementary binary weight maps for both image modalities (G-I and J-L) are obtained with a pointwise maximum operator at corresponding levels. Smooth continuous weight maps (M-O and P-R) are produced by guided filtering of the binary weight maps with their corresponding base layers as guidance images.

Fused residual layers are then computed as the normalized weighted mean of the corresponding source residual layers:

$$dF_i = \frac{W_{X_i}\cdot dX_i + W_{Y_i}\cdot dY_i}{W_{X_i} + W_{Y_i}}. \tag{17}$$

The fused image F is finally obtained by adding the fused residual layers to the average value of the coarsest source layers:

$$F = \frac{X_3 + Y_3}{2} + \sum_{i=0}^{2} dF_i. \tag{18}$$

By using guided filtering both in the decomposition stage and in the recombination stage, the proposed fusion scheme optimally benefits from both the multi-scale edge-preserving characteristics (in the iterative framework) and the structure-restoring capabilities (through guidance by the original source images) of the guided filter. The method is easy to implement and computationally efficient. A compact sketch of the complete pipeline is given below.
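Putting the pieces together, the sketch below outlines the complete scheme of Eqs. (12)-(18) for two co-registered single-band float images. It reuses the iterative_guided_filter, guided_filter and frequency_tuned_saliency sketches given earlier; the per-level weight-map filter settings and the small constant that guards the division of Eq. (17) are illustrative choices, not values taken from the paper.

```python
import numpy as np

def igf_fuse(X0, Y0, radii=(5, 10, 30), epsilons=(1e-4, 1e-2, 1e-1)):
    """Sketch of the proposed fusion scheme (Eqs. (12)-(18)) for two
    co-registered single-band float images X0 and Y0."""
    X, Y = [X0], [Y0]
    for r, eps in zip(radii, epsilons):                      # Eq. (12): coarser levels
        X.append(iterative_guided_filter(X[-1], r=r, eps=eps))
        Y.append(iterative_guided_filter(Y[-1], r=r, eps=eps))

    fused = (X[-1] + Y[-1]) / 2.0                            # mean of the coarsest layers
    for i, (r, eps) in enumerate(zip(radii, epsilons)):
        dX, dY = X[i] - X[i + 1], Y[i] - Y[i + 1]            # Eq. (13): residual layers
        SX = frequency_tuned_saliency(X[i])                  # per-layer saliency maps
        SY = frequency_tuned_saliency(Y[i])
        BWX = (SX > SY).astype(float)                        # Eq. (15): binary weight maps
        BWY = (SY > SX).astype(float)
        WX = guided_filter(BWX, X[i], r=r, eps=eps)          # Eq. (16): smooth weight maps
        WY = guided_filter(BWY, Y[i], r=r, eps=eps)
        fused += (WX * dX + WY * dY) / (WX + WY + 1e-12)     # Eq. (17), guarded division
    return fused                                             # Eq. (18)
```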
METHODS AND MATERIAL
This section presents the test imagery and computational metrics used to assess the performance of the proposed image fusion scheme in comparison to existing multi-scale fusion schemes.

Figure 5 Comparison with existing multiresolution fusion schemes. Original intensified visual (A) and thermal infrared (B) images for scene nr. 10, and the fused results obtained with respectively a Contrast Pyramid (C), Gradient Pyramid (D), Laplace Pyramid (E), Morphological Pyramid (F), Ratio Pyramid (G), DWT (H), SIDWT (I), and the proposed method (J).

Test imagery
Figure 2 shows the intensified visual (II), thermal infrared (IR) or near infrared (NIR: scene 12) source images together with the result of the proposed fusion scheme (F) for each of the 12 scenes used in this study. The 12 scenes are part of the TNO Image Fusion Dataset (Toet, 2014) with the following identifiers: airplane_in_trees, Barbed_wire_2, Jeep, Kaptein_1123, Marne_07, Marne_11, Marne_15, Reek, tank, Nato_camp_sequence, soldier_behind_smoke, Vlasakkers.

Multi-scale fusion schemes used for comparison
In this study we compare the performance of our image fusion scheme with seven other popular image fusion methods based on multi-scale decomposition, including the Laplacian pyramid (Burt & Adelson, 1983), the Ratio of Low-Pass pyramid (Toet, 1989b), the contrast pyramid (Toet, Van Ruyven & Valeton, 1989), the filter-subtract-decimate Laplacian pyramid (Burt, 1988; Burt & Kolczynski, 1993), the gradient pyramid (Burt, 1992; Burt & Kolczynski, 1993), the morphological pyramid (Toet, 1989a), the discrete wavelet transform (Lemeshewsky, 1999; Li, Manjunath & Mitra, 1995; Li, Kwok & Wang, 2002; Scheunders & De Backer, 2001), and a shift-invariant extension of the discrete wavelet transform (Lemeshewsky, 1999; Rockinger, 1997; Rockinger, 1999; Rockinger & Fechner, 1998). We used Rockinger's freely available Matlab image fusion toolbox (www.metapix.de/toolbox.htm) to compute these fusion schemes. To allow a straightforward comparison, the number of scale levels is set to 4 in all methods, and simple averaging is used to compute the approximation of the fused image representation at the coarsest spatial scale. Figures 5-9 show the results of the proposed method together with the results of the seven other fusion schemes for some of the scenes used in this study (scenes 2-5 and 10).

Figure 6 As Fig. 5, for scene nr. 2. Figure 7 As Fig. 5, for scene nr. 3. Figure 8 As Fig. 5, for scene nr. 4. Figure 9 As Fig. 5, for scene nr. 5.

Objective evaluation metrics
Image fusion results can be evaluated using either subjective or objective measures. Subjective methods are based on psycho-visual testing and are typically expensive in terms of time, effort, and equipment required. Also, in most cases there is only little difference among fusion results, which makes it difficult to evaluate fusion results subjectively. Therefore, many objective evaluation methods have been developed (for an overview see e.g., Li, Li & Gong, 2010; Liu et al., 2012).
However, so far there is no universally accepted metric to objectively evaluate image fusion results. In this paper, we use four frequently applied computational metrics to objectively evaluate and compare the performance of different image fusion methods: Entropy, the Mean Structural Similarity Index (MSSIM), Normalized Mutual Information (NMI), and Normalized Feature Mutual Information (NFMI). These metrics are briefly discussed in the following sections.

Entropy
Entropy (E) is a measure of the information content of a fused image F. It is defined as

$$E_F = -\sum_{i=0}^{L-1} P_F(i)\log P_F(i) \tag{19}$$

where P_F(i) indicates the probability that a pixel in the fused image F has gray value i, and the gray values range from 0 to L−1. The larger the entropy, the more informative the fused image is. A fused image is more informative than either of its source images when its entropy is higher than the entropy of its source images.

Mean Structural Similarity Index
The Structural Similarity (SSIM: Wang et al., 2004) index is a stabilized version of the Universal Image Quality Index (UIQ: Wang & Bovik, 2002) which can be used to quantify the structural similarity between a source image A and a fused image F:

$$\mathrm{SSIM}_{x,y} = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\cdot\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}\cdot\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{20}$$

where x and y represent local windows of size M × N in respectively A and F, and

$$\mu_x = \frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j), \qquad \mu_y = \frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N} y(i,j) \tag{21}$$

$$\sigma_x^2 = \frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x(i,j)-\mu_x\right)^2, \qquad \sigma_y^2 = \frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(y(i,j)-\mu_y\right)^2 \tag{22}$$

$$\sigma_{xy} = \frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x(i,j)-\mu_x\right)\left(y(i,j)-\mu_y\right). \tag{23}$$

By default, the stabilizing constants are set to C1 = (0.01·L)², C2 = (0.03·L)² and C3 = C2/2, where L is the maximal gray value. The value of SSIM is bounded and ranges between −1 and 1 (it is 1 only when both images are identical). The SSIM is typically computed over a sliding window to compare local patterns of pixel intensities that have been normalized for luminance and contrast. The Mean Structural Similarity (MSSIM) index quantifies the overall similarity between a source image A and a fused image F:

$$\mathrm{MSSIM}_{A,F} = \frac{1}{N_w}\sum_{i=1}^{N_w}\mathrm{SSIM}_{x_i,y_i} \tag{24}$$

where N_w represents the number of local windows in the image. An overall image fusion quality index can then be defined as the mean of the MSSIM values between each of the source images and the fused result:

$$\mathrm{MSSIM}^{A,B}_{F} = \frac{\mathrm{MSSIM}_{A,F} + \mathrm{MSSIM}_{B,F}}{2}. \tag{25}$$

MSSIM^{A,B}_F ranges between −1 and 1 (it is 1 only when the fused image is identical to both source images).

Normalized Mutual Information
Mutual Information (MI) measures the amount of information that two images have in common. It can be used to quantify the amount of information from a source image that is transferred to a fused image (Qu, Zhang & Yan, 2002). The mutual information MI_{A,F} between a source image A and a fused image F is defined as

$$MI_{A,F} = \sum_{i,j} P_{A,F}(i,j)\log\frac{P_{A,F}(i,j)}{P_A(i)\,P_F(j)} \tag{26}$$

where P_A(i) and P_F(j) are the probability density functions of the individual images, and P_{A,F}(i,j) is their joint probability density function. The traditional mutual information metric is unstable and may bias the measure towards the source image with the highest entropy. This problem can be resolved by computing the normalized mutual information (NMI) as follows (Hossny, Nahavandi & Creighton, 2008):

$$NMI^{A,B}_{F} = \frac{MI_{A,F}}{H_A + H_F} + \frac{MI_{B,F}}{H_B + H_F} \tag{27}$$

where H_A, H_B and H_F are the marginal entropies of A, B and F, and MI_{A,F} and MI_{B,F} represent the mutual information between respectively the source image A and the fused image F and between the source image B and the fused image F. A higher value of NMI indicates that more information from the source images is transferred to the fused image. The NMI metric varies between 0 and 1.
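The entropy and NMI metrics of Eqs. (19), (26) and (27) follow directly from (joint) gray-level histograms. The sketch below assumes 8-bit images and uses base-2 logarithms throughout; since NMI is a ratio of quantities computed with the same base, the choice of base does not affect the comparison.

```python
import numpy as np

def entropy(img, levels=256):
    """Shannon entropy of an image with gray values in [0, levels-1] (Eq. (19))."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(a, b, levels=256):
    """Mutual information between two images from their joint histogram (Eq. (26))."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=levels, range=((0, levels), (0, levels)))
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)          # marginal of a
    p_b = p_ab.sum(axis=0, keepdims=True)          # marginal of b
    nz = p_ab > 0
    return np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz]))

def nmi(a, b, fused, levels=256):
    """Normalized mutual information of Eq. (27) for sources a, b and the fused image."""
    return (mutual_information(a, fused, levels) / (entropy(a, levels) + entropy(fused, levels)) +
            mutual_information(b, fused, levels) / (entropy(b, levels) + entropy(fused, levels)))
```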
Normalized Feature Mutual Information
The Feature Mutual Information (FMI) metric calculates the amount of image features that two images have in common (Haghighat & Razian, 2014; Haghighat, Aghagolzadeh & Seyedarabi, 2011). This method outperforms other metrics (e.g., E, NMI) in consistency with subjective quality measures. Previously proposed MI-based image fusion quality metrics use the image histograms to compute the amount of information a source and fused image have in common (Cvejic, Canagarajah & Bull, 2006; Qu, Zhang & Yan, 2002). However, image histograms contain no information about local image structure (spatial features or local image quality) and only provide statistical measures of the number of pixels with a specific gray level. Since meaningful image information is contained in visual features, image fusion quality measures should measure the extent to which these visual features are transferred into the fused image from each of the source images. The FMI metric therefore calculates the mutual information between image feature maps (Haghighat & Razian, 2014; Haghighat, Aghagolzadeh & Seyedarabi, 2011). A typical image feature map is for instance the gradient map, which contains information about the pixel neighborhoods, edge strength and directions, texture and contrast.

Given two source images A and B and their fused image F, the FMI metric first extracts feature maps of the source and fused images using a feature extraction method (e.g., the gradient). After feature extraction, the feature images A′, B′ and F′ are normalized to create their marginal probability density functions P_{A′}, P_{B′} and P_{F′}. The joint probability density functions P_{A′,F′} and P_{B′,F′} are then estimated from the marginal distributions using Nelsen's method (Nelsen, 1987). The algorithm is described in more detail elsewhere (Haghighat, Aghagolzadeh & Seyedarabi, 2011). The FMI metric between a source image A and a fused image F is then given by

$$FMI_{A,F} = MI_{A',F'} = \sum_{i,j} P_{A',F'}(i,j)\log\frac{P_{A',F'}(i,j)}{P_{A'}(i)\,P_{F'}(j)} \tag{28}$$

and the normalized feature mutual information (NFMI) can be computed as

$$NFMI^{A,B}_{F} = \frac{MI_{A',F'}}{H_{A'} + H_{F'}} + \frac{MI_{B',F'}}{H_{B'} + H_{F'}}. \tag{29}$$

In practice the FMI is computed locally over small corresponding windows between the source and the fused images and averaged over all windows covering the image plane (Haghighat & Razian, 2014).
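The published FMI implementation estimates local joint densities with Nelsen's method and averages over windows; as a rough stand-in, the sketch below simply applies the histogram-based entropy and mutual_information helpers from the previous snippet to global gradient-magnitude feature maps. It illustrates the idea of Eqs. (28) and (29) but is not the metric of Haghighat & Razian (2014); the quantization level is an arbitrary choice.

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_feature(img, levels=64):
    """Gradient-magnitude feature map, quantized to `levels` gray levels."""
    img = img.astype(float)
    grad = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
    grad = (grad - grad.min()) / (np.ptp(grad) + 1e-12)
    return np.clip((grad * (levels - 1)).astype(int), 0, levels - 1)

def simple_nfmi(a, b, fused, levels=64):
    """Simplified, global stand-in for the normalized FMI of Eq. (29)."""
    fa, fb, ff = (gradient_feature(x, levels) for x in (a, b, fused))
    return (mutual_information(fa, ff, levels) / (entropy(fa, levels) + entropy(ff, levels)) +
            mutual_information(fb, ff, levels) / (entropy(fb, levels) + entropy(ff, levels)))
```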
Table 1 Entropy values for each of the methods tested and for all 12 scenes.
Scene nr.  Contrast  DWT     Gradient  Laplace  Morph   Ratio   SIDWT   IGF
 1         6.4818    6.4617  6.1931    6.5935   6.6943  6.5233  6.4406  6.5126
 2         6.7744    6.6731  6.5873    6.7268   6.9835  6.7268  6.7075  7.4233
 3         6.4340    6.5704  6.4965    6.6401   6.7032  6.6946  6.5878  6.8589
 4         6.8367    6.8284  6.6756    7.0041   7.0906  6.7313  6.8547  7.2491
 5         6.7549    6.6642  6.5582    6.7624   6.8618  6.5129  6.6813  7.1177
 6         6.3753    6.3705  6.2430    6.5049   6.7608  6.2281  6.4116  6.9044
 7         6.7470    6.3709  6.1890    6.5106   6.7445  6.3458  6.3817  6.7869
 8         6.3229    7.3503  7.2935    7.3794   7.3501  7.4873  7.3406  7.4891
 9         6.4903    6.4677  6.3513    6.5816   6.7295  6.3306  6.4753  6.7796
10         6.9627    7.0131  6.8390    7.1073   7.0530  7.0118  7.0224  7.2782
11         6.5442    6.4554  6.2110    6.5555   6.8051  6.4053  6.4572  6.2907
12         7.3335    7.3744  7.3379    7.3907   7.4251  7.3486  7.3746  7.3568

RESULTS
Fusion evaluation
Here we assess the performance of the proposed image fusion scheme on the intensified visual and thermal infrared images for each of the 12 selected scenes, using Entropy, the Mean Structural Similarity Index (MSSIM), Normalized Mutual Information (NMI), and Normalized Feature Mutual Information (NFMI) as the objective performance measures. We also compare the results of the proposed method with those of seven other popular multi-scale fusion schemes.

Table 1 lists the entropy of the fused result for the proposed method (IGF) and all seven multi-scale comparison methods (Contrast Pyramid, DWT, Gradient Pyramid, Laplace Pyramid, Morphological Pyramid, Ratio Pyramid, SIDWT). It appears that IGF produces a fused image with the highest entropy for 9 of the 12 test scenes. Note that a larger entropy implies more information, but it does not mean that the additional detail is indeed meaningful (it may result from over-enhancement or noise). Therefore, we also need to consider structural information metrics.

Table 2 shows that IGF outperforms all other multi-scale methods tested here in terms of MSSIM. This means that the mean overall structural similarity between both source images and the fused image F is largest for the proposed method. Table 3 shows that IGF also outperforms all other multi-scale methods tested here in terms of NMI. This indicates that the proposed IGF fusion scheme transfers more information from the source images to the fused image than any of the other methods. Table 4 shows that IGF also outperforms the other multi-scale methods tested here for 10 of the 12 scenes in terms of NFMI; IGF is only outperformed by SIDWT for scene 1 and by the Contrast Pyramid for scene 7. This implies that fused images produced by the proposed IGF scheme typically have a larger amount of image features in common with their source images than the results of most other fusion schemes.
Table 2 MSSIM values for each of the methods tested and for all 12 scenes.
Scene nr.  Contrast  DWT     Gradient  Laplace  Morph   Ratio   SIDWT   IGF
 1         0.7851    0.7975  0.8326    0.8050   0.7321  0.8054  0.8114  0.8381
 2         0.6018    0.6798  0.7130    0.6406   0.6203  0.6406  0.6935  0.7213
 3         0.7206    0.7493  0.7849    0.7555   0.6882  0.7468  0.7629  0.7932
 4         0.6401    0.6790  0.7162    0.6875   0.6155  0.6668  0.6949  0.7184
 5         0.5856    0.6649  0.6938    0.6695   0.6250  0.6270  0.6769  0.7038
 6         0.5689    0.6448  0.6755    0.6516   0.5961  0.6099  0.6598  0.6921
 7         0.3939    0.5742  0.5994    0.5809   0.5320  0.4490  0.5889  0.6344
 8         0.6474    0.6272  0.6630    0.6392   0.5791  0.6291  0.6463  0.6940
 9         0.6224    0.6883  0.7224    0.6955   0.6445  0.6718  0.7089  0.7405
10         0.3913    0.5410  0.5715    0.5430   0.4899  0.4331  0.5513  0.5961
11         0.7174    0.7307  0.7754    0.7439   0.6559  0.7419  0.7539  0.7908
12         0.7945    0.8116  0.8466    0.8227   0.7815  0.8106  0.8365  0.8646

Table 3 NMI values for each of the methods tested and for all 12 scenes.
Scene nr.  Contrast  DWT     Gradient  Laplace  Morph   Ratio   SIDWT   IGF
 1         0.1534    0.1692  0.2052    0.1647   0.1699  0.1791  0.1796  0.2818
 2         0.0989    0.0948  0.1158    0.0897   0.1028  0.0897  0.1028  0.2994
 3         0.0898    0.1222  0.1493    0.1252   0.1171  0.1320  0.1280  0.2231
 4         0.1102    0.1097  0.1322    0.1189   0.1169  0.1046  0.1177  0.2294
 5         0.1236    0.1170  0.1379    0.1252   0.1318  0.1186  0.1251  0.2166
 6         0.0857    0.0943  0.1162    0.0969   0.1068  0.0902  0.0980  0.2229
 7         0.0697    0.0711  0.0839    0.0809   0.0888  0.0616  0.0781  0.2147
 8         0.2192    0.1825  0.2198    0.1832   0.1884  0.2130  0.2021  0.3090
 9         0.0692    0.0679  0.0781    0.0747   0.0790  0.0690  0.0731  0.2013
10         0.1375    0.1643  0.2043    0.1780   0.1761  0.1662  0.1760  0.2962
11         0.1055    0.1043  0.1177    0.1100   0.1047  0.1179  0.1115  0.1646
12         0.2572    0.2511  0.2746    0.2602   0.2438  0.2660  0.2649  0.2987

Table 4 NFMI values for each of the methods tested and for all 12 scenes.
Scene nr.  Contrast  DWT     Gradient  Laplace  Morph   Ratio   SIDWT   IGF
 1         0.4064    0.3812  0.3933    0.3888   0.3252  0.3498  0.4084  0.4008
 2         0.4354    0.3876  0.4001    0.3493   0.3432  0.3493  0.4075  0.4383
 3         0.4076    0.4081  0.4175    0.4138   0.3758  0.3552  0.4330  0.4454
 4         0.4017    0.3913  0.4066    0.4051   0.3655  0.3497  0.4205  0.4490
 5         0.4304    0.3971  0.4101    0.4081   0.3758  0.3497  0.4229  0.4580
 6         0.4299    0.4074  0.4203    0.4164   0.3832  0.3570  0.4295  0.4609
 7         0.5050    0.4383  0.4439    0.4357   0.3942  0.3779  0.4469  0.4286
 8         0.4305    0.4074  0.4097    0.4113   0.3806  0.3553  0.4273  0.4325
 9         0.4351    0.3959  0.4105    0.3995   0.3658  0.3539  0.4130  0.4370
10         0.4439    0.4251  0.4263    0.4268   0.3863  0.3465  0.4513  0.5045
11         0.3882    0.3798  0.3987    0.3804   0.3131  0.3453  0.4068  0.4206
12         0.4051    0.3725  0.3973    0.3820   0.3449  0.3635  0.4111  0.4257

Summarizing, the proposed IGF fusion scheme appears to outperform the other multi-scale fusion methods investigated here in most of the conditions tested.

Runtime
In this study we used a Matlab implementation of the GF and IGF written by Zhang et al. (2014) that is freely available from the authors (at http://www.cs.cuhk.edu.hk/~leojia/projects/rollguidance). We made no effort to optimize the code of the algorithms. We conducted a runtime test on a Dell Latitude laptop with an Intel i5 2 GHz CPU and 8 GB memory. The algorithms were implemented in Matlab 2016a. Only a single thread was used, without involving any SIMD instructions. For this test we used the set of 12 test images described in 'Test imagery.' As noted before, the filter size and regularization parameters used in this study are respectively set to r_i = {5, 10, 30} and ε_i = {0.0001, 0.01, 0.1} for spatial scale levels i = {0, 1, 2}.
The mean runtime of the proposed fusion method was 0.61 ± 0.05 s.

DISCUSSION AND CONCLUSIONS
We propose a multi-scale image fusion scheme based on guided filtering. Iterative guided filtering is used to decompose the source images into approximation and residual layers. Initial binary weighting maps are computed as the pixelwise maximum of the individual source saliency maps, obtained from frequency-tuned filtering. Spatially consistent and smooth weighting maps are then obtained through guided filtering of the binary weighting maps with their corresponding source layers as guidance images. Saliency-weighted recombination of the individual source residual layers and the mean of the coarsest scale source layers finally yields the fused image. The proposed multi-scale image fusion scheme achieves spatial consistency by using guided filtering both at the decomposition and at the recombination stage of the multi-scale fusion process. Application to multiband visual (intensified) and thermal infrared imagery demonstrates that the proposed method obtains state-of-the-art performance for the fusion of multispectral nightvision images. The method has a simple implementation and is computationally efficient.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The effort was sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant number FA9550-15-1-0433. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the author:
Air Force Office of Scientific Research, Air Force Material Command, USAF: FA9550-15-1-0433.

Competing Interests
The author declares there are no competing interests.

Author Contributions
• Alexander Toet conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.

Data Availability
The following information was supplied regarding data availability:
Figshare: TNO Image Fusion Dataset http://dx.doi.org/10.6084/m9.figshare.1008029.

REFERENCES
Achanta R, Hemami S, Estrada F, Süsstrunk S. 2009. Frequency-tuned salient region detection. In: IEEE international conference on computer vision and pattern recognition (CVPR 2009). Piscataway: IEEE, 1597–1604.
Agarwal J, Bedi SS. 2015. Implementation of hybrid image fusion technique for feature enhancement in medical diagnosis. Human-centric Computing and Information Sciences 5(1):1–17 DOI 10.1186/s13673-014-0018-6.
Bavirisetti DP, Dhuli R. 2016a. Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen–Loeve transform. IEEE Sensors Journal 16(1):203–209 DOI 10.1109/JSEN.2015.2478655.
Bavirisetti DP, Dhuli R. 2016b. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Physics and Technology 76:52–64 DOI 10.1016/j.infrared.2016.01.009.
Beyan C, Yigit A, Temizel A. 2011. Fusion of thermal- and visible-band video for abandoned object detection. Journal of Electronic Imaging 20(033001):1–12 DOI 10.1117/1.3602204.
Bhatnagar G, Wu QMJ. 2011. Human visual system based framework for concealed weapon detection.
In: The 2011 Canadian conference on computer and robot vision (CRV). Piscataway: IEEE, 250–256.
Biswas B, Chakrabarti A, Dey KN. 2015. Spine medical image fusion using wiener filter in shearlet domain. In: IEEE 2nd international conference on recent trends in information systems (ReTIS 2015). Piscataway: IEEE, 387–392.
Black MJ, Sapiro G, Marimont DH, Heeger D. 1998. Robust anisotropic diffusion. IEEE Transactions on Image Processing 7(3):421–432 DOI 10.1109/83.661192.
Blum RS, Liu Z. 2006. Multi-sensor image fusion and its applications. Boca Raton: CRC Press, Taylor & Francis Group.
Bulanona DM, Burks TF, Alchanatis V. 2009. Image fusion of visible and thermal images for fruit detection. Biosystems Engineering 103(1):12–22 DOI 10.1016/j.biosystemseng.2009.02.009.
Burt PJ. 1988. Smart sensing with a pyramid vision machine. Proceedings IEEE 76(8):1006–1015 DOI 10.1109/5.5971.
Burt PJ. 1992. A gradient pyramid basis for pattern-selective image fusion. In: SID international symposium 1992. Playa del Rey: Society for Information Display, 467–470.
Burt PJ, Adelson EH. 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications 31(4):532–540 DOI 10.1109/TCOM.1983.1095851.
Burt PJ, Kolczynski RJ. 1993. Enhanced image capture through fusion. In: Fourth international conference on computer vision. Piscataway: IEEE Computer Society Press, 173–182.
Cui G, Feng H, Xu Z, Li Q, Chen Y. 2015. Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Optics Communications 341:199–209 DOI 10.1016/j.optcom.2014.12.032.
Cvejic N, Canagarajah CN, Bull DR. 2006. Image fusion metric based on mutual information and Tsallis entropy. Electronics Letters 42(11):626–627 DOI 10.1049/el:20060693.
Daneshvar S, Ghassemian H. 2010. MRI and PET image fusion by combining IHS and retina-inspired models. Information Fusion 11(2):114–123 DOI 10.1016/j.inffus.2009.05.003.
Farbman Z, Fattal R, Lischinski D, Szeliski R. 2008. Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM Transactions on Graphics 27(3 - Article No. 67):1–10 DOI 10.1145/1360612.1360666.
Fecteau JH, Munoz DP. 2006. Salience, relevance, and firing: a priority map for target selection. Trends in Cognitive Sciences 10(8):382–390 DOI 10.1016/j.tics.2006.06.011.
Gan W, Wu X, Wu W, Yang X, Ren C, He X, Liu K. 2015. Infrared and visible image fusion with the use of multi-scale edge-preserving decomposition and guided image filter. Infrared Physics & Technology 72:37–51 DOI 10.1016/j.infrared.2015.07.003.
Ghassemian H. 2001. A retina based multi-resolution image-fusion. In: IEEE international geoscience and remote sensing symposium (IGRSS2001). Piscataway: IEEE, 709–711.
Haghighat MBA, Aghagolzadeh A, Seyedarabi H. 2011. A non-reference image fusion metric based on mutual information of image features. Computers & Electrical Engineering 37(5):744–756 DOI 10.1016/j.compeleceng.2011.07.012.
Haghighat M, Razian MA. 2014. Fast-FMI: non-reference image fusion metric. Piscataway: IEEE, 1–3.
He K, Sun J, Tang X. 2013. Guided image filtering.
IEEE Transactions on Pattern Analysis and Machine Intelligence 35(6):1397–1409 DOI 10.1109/TPAMI.2012.213.
Hossny M, Nahavandi S, Creighton D. 2008. Comments on "Information measure for performance of image fusion". Electronics Letters 44(18):1066–1067 DOI 10.1049/el:20081754.
Jacobson NP, Gupta MR. 2005. Design goals and solutions for display of hyperspectral images. IEEE Transactions on Geoscience and Remote Sensing 43(11):2684–2692 DOI 10.1109/TGRS.2005.857623.
Jacobson NP, Gupta MR, Cole JB. 2007. Linear fusion of image sets for display. IEEE Transactions on Geoscience and Remote Sensing 45(10):3277–3288 DOI 10.1109/TGRS.2007.903598.
Jiang D, Zhuang D, Huan Y, Fu J. 2011. Survey of multispectral image fusion techniques in remote sensing applications. In: Zheng Y, ed. Image fusion and its applications. Rijeka, Croatia: InTech Open, 1–22.
Kong SG, Heo J, Boughorbel F, Zheng Y, Abidi BR, Koschan A, Yi M, Abidi MA. 2007. Multiscale fusion of visible and thermal IR images for illumination-invariant face recognition. International Journal of Computer Vision 71(2):215–233 DOI 10.1007/s11263-006-6655-0.
Kong W, Wang B, Lei Y. 2015. Technique for infrared and visible image fusion based on non-subsampled shearlet transform & spiking cortical model. Infrared Physics & Technology 71:87–98 DOI 10.1016/j.infrared.2015.02.008.
Kumar BKS. 2013. Image fusion based on pixel significance using cross bilateral filter. Signal, Image and Video Processing 9(5):1193–1204 DOI 10.1007/s11760-013-0556-9.
Lemeshewsky GP. 1999. Multispectral multisensor image fusion using wavelet transforms. In: Park SJ, Juday RD, eds. Bellingham: The International Society for Optical Engineering, 214–222.
Lepley JJ, Averill MT. 2011. Detection of buried mines and explosive objects using dual-band thermal imagery. In: Harmon RS, Holloway JH, Broach JT, eds. Detection and sensing of mines, explosive objects, and obscured targets XVI, Vol. SPIE-8017. Bellingham: The International Society for Optical Engineering, 80171V.
Li S, Kang X, Hu J. 2013. Image fusion with guided filtering. IEEE Transactions on Image Processing 22(7):2864–2875 DOI 10.1109/TIP.2013.2244222.
Li S, Kwok JT, Wang Y. 2002. Using the discrete wavelet frame transform to merge Landsat TM and SPOT panchromatic images. Information Fusion 3(1):17–23 DOI 10.1016/S1566-2535(01)00037-9.
Li S, Li Z, Gong J. 2010. Multivariate statistical analysis of measures for assessing the quality of image fusion. International Journal of Image and Data Fusion 1(1):47–66 DOI 10.1080/19479830903562009.
Li H, Manjunath BS, Mitra SK. 1995. Multisensor image fusion using the wavelet transform. Computer Vision, Graphics and Image Processing: Graphical Models and Image Processing 57(3):235–245.
Liu Z, Blasch EP, Xue Z, Zhao J, Laganière R, Wu W. 2012. Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: a comparative study.
Liu X, Mei W, Du H, Bei J. 2016. A novel image fusion algorithm based on nonsubsampled shearlet transform and morphological component analysis. Signal, Image and Video Processing 10(5):959–966 DOI 10.1007/s11760-015-0846-5.
Liu Z, Xue Z, Blum RS, Laganière R. 2006. Concealed weapon detection and visualization in a synthesized image. Pattern Analysis & Applications 8(4):375–389 DOI 10.1007/s10044-005-0020-8.
Motamed C, Lherbier R, Hamad D. 2005. A multi-sensor validation approach for human activity monitoring. In: 7th international conference on information fusion (Information Fusion 2005). Piscataway: IEEE.
Nelsen RB. 1987. Discrete bivariate distributions with given marginals and correlation. Communications in Statistics–Simulation and Computation 16(1):199–208 DOI 10.1080/03610918708812585.
O'Brien MA, Irvine JM. 2004. Information fusion for feature extraction and the development of geospatial information. In: 7th international conference on information fusion. ISIF, 976–982.
Perona P, Malik J. 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7):629–639 DOI 10.1109/34.56205.
Petrovic VS, Xydeas CS. 2003. Sensor noise effects on signal-level image fusion performance. Information Fusion 4(3):167–183 DOI 10.1016/S1566-2535(03)00035-6.
Petschnigg G, Agrawala M, Hoppe H, Szeliski R, Cohen M, Toyama K. 2004. Digital photography with flash and no-flash image pairs. New York: ACM Press, 664–672.
Qu GH, Zhang DL, Yan PF. 2002. Information measure for performance of image fusion. Electronics Letters 38(7):313–315 DOI 10.1049/el:20020212.
Rockinger O. 1997. Image sequence fusion using a shift-invariant wavelet transform. In: IEEE international conference on image processing, Vol. III. Piscataway: IEEE, 288–291.
Rockinger O. 1999. Multiresolution-Verfahren zur Fusion dynamischer Bildfolge [Multiresolution methods for the fusion of dynamic image sequences]. PhD Thesis, Technische Universität Berlin.
Rockinger O, Fechner T. 1998. Pixel-level image fusion: the case of image sequences. In: Kadar I, ed. Signal processing, sensor fusion, and target recognition VII, Vol. SPIE-3374. Bellingham: The International Society for Optical Engineering, 378–388.
Scheunders P, De Backer S. 2001. Fusion and merging of multispectral images using multiscale fundamental forms. Journal of the Optical Society of America A 18(10):2468–2477 DOI 10.1364/JOSAA.18.002468.
Shah P, Reddy BCS, Merchant S, Desai U. 2013. Context enhancement to reveal a camouflaged target and to assist target localization by fusion of multispectral surveillance videos. Signal, Image and Video Processing 7(3):537–552 DOI 10.1007/s11760-011-0257-1.
Singh R, Khare A. 2014. Fusion of multimodal medical images using Daubechies complex wavelet transform – a multiresolution approach. Information Fusion 19:49–60 DOI 10.1016/j.inffus.2012.09.005.
Singh R, Vatsa M, Noore A. 2008. Integrated multilevel image fusion and match score fusion of visible and infrared face images for robust face recognition. Pattern Recognition 41(3):880–893 DOI 10.1016/j.patcog.2007.06.022.
Tao C, Junping Z, Ye Z. 2005. Remote sensing image fusion based on ridgelet transform. In: 2005 IEEE international geoscience and remote sensing symposium (IGARSS'05), Vol. 2. Piscataway: IEEE, 1150–1153.
Tian YP, Zhou KY, Feng X, Yu SL, Liang H, Liang B. 2009. Image fusion for infrared thermography and inspection of pressure vessel. Journal of Pressure Vessel Technology 131(2 - Article No. 021502):1–5 DOI 10.1115/1.3066801.
Toet A. 1989a. A morphological pyramidal image decomposition. Pattern Recognition Letters 9(4):255–261 DOI 10.1016/0167-8655(89)90004-4.
Toet A. 1989b. Image fusion by a ratio of low-pass pyramid. Pattern Recognition Letters 9(4):245–253 DOI 10.1016/0167-8655(89)90003-2.
Toet A. 2003. Color image fusion for concealed weapon detection. In: Carapezza EM, ed. Sensors, and command, control, communications, and intelligence (C3I) technologies for homeland defense and law enforcement II, Vol. SPIE-5071. Bellingham: The International Society for Optical Engineering, 372–379.
Toet A. 2011. Computational versus psychophysical image saliency: a comparative evaluation study. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(11):2131–2146 DOI 10.1109/TPAMI.2011.53.
Toet A. 2014. TNO Image fusion dataset. Figshare DOI 10.6084/m9.figshare.1008029.
Toet A, IJspeert I, Waxman AM, Aguilar M. 1997. Fusion of visible and thermal imagery improves situational awareness. Displays 18(2):85–95 DOI 10.1016/S0141-9382(97)00014-0.
Toet A, Van Ruyven LJ, Valeton JM. 1989. Merging thermal and visual images by a contrast pyramid. Optical Engineering 28(7):789–792 DOI 10.1117/12.7977034.
Tomasi C, Manduchi R. 1998. Bilateral filtering for gray and color images. In: IEEE sixth international conference on computer vision. Piscataway: IEEE, 839–846.
Wang Z, Bovik AC. 2002. A universal image quality index. IEEE Signal Processing Letters 9(3):81–84 DOI 10.1109/97.995823.
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4):600–612 DOI 10.1109/TIP.2003.819861.
Wang L, Li B, Tian LF. 2014. Multi-modal medical image fusion using the inter-scale and intra-scale dependencies between image shift-invariant shearlet coefficients. Information Fusion 19:20–28 DOI 10.1016/j.inffus.2012.03.002.
Wertheim AH. 2010. Visual conspicuity: a new simple standard, its reliability, validity and applicability. Ergonomics 53(3):421–442 DOI 10.1080/00140130903483705.
Xue Z, Blum RS. 2003. Concealed weapon detection using color image fusion. In: Sixth international conference on information fusion (FUSION2003). Piscataway: IEEE, 622–627.
Xue Z, Blum RS, Li Y. 2002. Fusion of visual and IR images for concealed weapon detection. In: Fifth international conference on information fusion, Vol. 2. Piscataway: IEEE, 1198–1205.
Yajie W, Mowu L. 2009. Image fusion based concealed weapon detection. In: International conference on computational intelligence and software engineering 2009 (CiSE2009). Piscataway: IEEE, 1–4.
Yang W, Liu J-R. 2013. Research and development of medical image fusion. In: 2013 IEEE international conference on medical imaging physics and engineering (ICMIPE). Piscataway: IEEE, 307–309.
Yang S, Wang M, Jiao L, Wu R, Wang Z. 2010. Image fusion based on a new contourlet packet. Information Fusion 11(2):78–84 DOI 10.1016/j.inffus.2009.05.001.
Yong J, Minghui W. 2014. Image fusion using multiscale edge-preserving decomposition based on weighted least squares filter. IET Image Processing 8(3):183–190 DOI 10.1049/iet-ipr.2013.0429.
Zhang B, Lu X, Pei H, Zhao Y. 2015. A fusion algorithm for infrared and visible images based on saliency analysis and non-subsampled Shearlet transform. Infrared Physics & Technology 73:286–297 DOI 10.1016/j.infrared.2015.10.004.
Zhang Q, Shen X, Xu L, Jia J. 2014. Rolling guidance filter. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, eds. 13th European conference on computer vision (ECCV2014), Vol. III. Berlin Heidelberg: Springer International Publishing, 815–830.
Zhao J, Feng H, Xu Z, Li Q, Liu T. 2013. Detail enhanced multi-source fusion using visual weight map extraction based on multi scale edge preserving decomposition. Optics Communications 287:45–52 DOI 10.1016/j.optcom.2012.08.070.
Zhu Z, Huang TS. 2007. Multimodal surveillance: sensors, algorithms and systems. Norwood: Artech House Publishers.
Zitová B, Beneš M, Blažek J. 2011. Image fusion for art analysis. In: Computer vision and image analysis of art II, Vol. SPIE-7869. Bellingham: The International Society for Optical Engineering, 7869081–7869089.
Zou X, Bhanu B. 2005. Tracking humans using multi-modal fusion. In: 2nd joint IEEE international workshop on object tracking and classification in and beyond the visible spectrum (OTCBVS'05). Piscataway: IEEE, W01-30-01-08.