Abstract Painting with Interactive Control of Perceptual Entropy

MINGTIAN ZHAO and SONG-CHUN ZHU, University of California, Los Angeles and Lotus Hill Institute

This article presents a framework for generating abstract art from photographs. The aesthetics of abstract art is largely attributed to its greater perceptual ambiguity compared with photographs. According to psychological theories [Berlyne 1971], this ambiguity tends to invoke moderate mental efforts in the viewer for interpreting the underlying contents, and this process is usually accompanied by subtle aesthetic pleasures. We study this phenomenon through human experiments comparing subjects' interpretations of abstract art and photographs, and quantitatively verify the increased perceptual ambiguities in terms of recognition accuracy and response time. Based on these studies, we measure the level of perceptual ambiguity using entropy, which measures uncertainty levels in information theory, and propose a painterly rendering method with interactive control of the ambiguity levels. Given an input photograph, we first segment it into regions corresponding to different objects and parts in an interactive manner, and organize them into a hierarchical parse tree representation. Then we execute a painterly rendering process with image obscuring operators to transfer the photograph into an abstract painting style with increased perceptual ambiguities of both the scene and individual objects. Finally, using kernel density estimation and message passing algorithms, we numerically compute and adjust the ambiguity levels to the desired values, during which we may predict and control the viewer's perceptual path among the image contents by assigning different ambiguity levels to different objects. We have evaluated the rendering results using a second set of human experiments, and verified that they achieve abstract effects similar to those of original abstract paintings by artists.

Categories and Subject Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Perceptual Reasoning; I.3.4 [Computer Graphics]: Graphics Utilities—Paint Systems; I.4.10 [Image Processing and Computer Vision]: Image Representation—Hierarchical; J.5 [Computer Applications]: Arts and Humanities—Fine Arts

General Terms: Algorithms, Experimentation, Human Factors

Additional Key Words and Phrases: Abstract art, entropy, image parsing, painterly rendering, perceptual ambiguity, semantics

1. INTRODUCTION

Abstract artworks like Claude Monet's famous Wheatstack in Fig. 1a have a characteristic charm beyond photographic and representational arts. In particular, observing and interpreting an abstract artwork is, in some sense, like playing a guessing game with the artist; the game typically involves ambiguities and causes confusion, but the experience is usually also interesting and rewarding.

The work at UCLA was partially supported by NSF grant IIS-1018751 and ONR MURI grant N000141010933, and the work at LHI was supported by NSFC grant 60970156. Authors' addresses: M. Zhao and S.-C. Zhu, UCLA Department of Statistics, 8125 Mathematical Sciences Building, Box 951554, Los Angeles, CA 90095-1554; emails: mtzhao@ucla.edu, sczhu@stat.ucla.edu.
Fig. 1. Some abstract artworks. (a) Wheatstack (Thaw, Sunset), 1890–1891 by Claude Monet. (b) Le Mont Sainte-Victoire, 1902–1904 by Paul Cézanne. (c) View of Collioure (The Bell Tower), 1905 by Henri Matisse. (d) Kairouan (III), 1914 by August Macke. (e) A photograph with a similar scene to (b). (f) A photograph with a similar scene to (c).

This subtle beauty of abstract art has long been noticed by both artists and psychologists. Wassily Kandinsky (1866–1944), a Russian abstract painting master, attributed the "fairy-tale power and splendor" of Monet's haystacks to the surprise and confusion caused by their indistinct painting style missing recognizable objects [Lindsay and Vergo 1994, p.363]. Daniel Berlyne (1924–1976), a pioneer in theoretical and experimental psychology, further explained this phenomenon with his theory of the motivational aspects of perception [Berlyne 1971, pp.61–114; Konečni 1978; Funch 1997, pp.26–33]. According to Berlyne, the process of observing and interpreting aesthetic patterns such as abstract art involves certain levels of perceptual ambiguity. To resolve the ambiguity, the observer subconsciously puts in mental efforts (e.g., continuous guesses until reaching the correct answer [Kersten 1987]) that can lead to moderate changes of the arousal level in his/her nervous system, which in turn reward him/her with emotional pleasures.

The confusion and ambiguity of abstract art may exist in various forms, styles, and levels, as shown by the examples in Fig. 1. In many abstract artworks, such ambiguities are often achieved by

— Preserving visual features in certain semantic dimensions (e.g., scene configuration, identity of object/part, color/shape/texture characteristics), and

— Freeing (e.g., spatially disarranging, obscuring, randomizing) the other dimensions.

While the former preserves the contents and leaves clues, the latter usually challenges our visual perception, for example:

— In Monet's wheatstack, Cézanne's mount, and Matisse's bell tower shown in Figs. 1a through 1c, the global structures of the scenes are mostly preserved in the sense that they are recognizable, while the appearances and shapes of individual objects are obscured. In particular, the objects in Fig. 1b are obscured to different degrees, so the viewer usually recognizes the mount first, which further helps recognize the trees and huts in the context.
We call this sequential recognition effect (i.e., the viewer recognizes less obscured objects first, then understands the scene and other objects with the help of contextual information) the perceptual path.

— In Macke's Kairouan (III) shown in Fig. 1d, as well as in Pablo Picasso's famous Guernica and Violin and Guitar, the identifiability of individual objects/parts is well preserved, while the spatial configurations of the scenes are disarranged.

— In some modern paintings, such as Jackson Pollock's drip paintings, only some low-level color and shape statistics are preserved, while high-level semantic and geometric structures are randomized.

In this article, we focus on the style of Monet's wheatstack and Cézanne's mount, which preserves the scene structures while obscuring individual objects. We conduct human experiments to study the different ambiguity levels between such abstract paintings and photographs. Based on the experiments, we define and measure the level of perceptual ambiguity, and propose an abstract painting rendering method using image obscuring operators such as color shift and shape deformation, with which we can compute and interactively control the ambiguity levels and perceptual paths using kernel density estimation and message passing algorithms. Through a second set of human experiments, we verify that our rendering results achieve abstract effects similar to those of original abstract paintings by artists.

The rest of this article is organized as follows. Section 1.1 summarizes related work on abstract art in computer graphics, image analysis, and perception, and Section 1.2 lists our contributions and improvements over our previous work [Zhao and Zhu 2010]. We carry out our human experiments and analyze the experimental results in Section 2. In Section 3, we introduce a numerical measure for perceptual ambiguity named perceptual entropy, which is defined on a hierarchical parse tree representation of image contents. We also explain how a parse tree is constructed using interactive image segmentation and labeling methods. Then in Section 4, we present the image obscuration and painterly rendering techniques used to manipulate the perceptual entropy. To complete the system pipeline, in Section 5, we show how the perceptual entropy is computed and adjusted, and how the perceptual path is predicted. Section 6 illustrates our rendering results. In Sections 7 and 8 we present the second set of human experiments, which verify the rendering results. Finally, we conclude our studies in Section 9 with discussions.

1.1 Related Work

Recently, in the computer graphics and image analysis communities, especially in the non-photorealistic rendering (NPR) area [Gooch and Gooch 2001; Strothotte and Schlechtweg 2002], there have been continuing efforts toward understanding and rendering abstract artworks of different styles.

In computer graphics, Haeberli [1990] first proposed abstract image representations using brush strokes. Image representation with brush strokes essentially abstracts images by omitting many high-frequency details and preserving only relatively low-frequency surfaces and gentle gradients. Later, the study of stroke-based rendering was further extended by many painterly rendering methods [Meier 1996; Litwinowicz 1997; Hertzmann 1998; Zeng et al. 2009] for better visual effects.
To achieve the non-uniform abstraction across an image that artists naturally perform, DeCarlo and Santella [2002] developed an approach for the stylization and abstraction of photographs, which identifies visually attended elements using eye-tracking data, and preserves more details in such areas during rendering. Recently, a few automatic methods for image and video simplification or abstraction have been developed [Orzan et al. 2007; Kyprianidis 2011; Olsen and Gooch 2011]. The main idea of these methods is to filter images to remove textures in relatively flat areas, to which human vision is not very sensitive. For vector graphics, Mi et al. [2009] proposed a method for 2D shape abstraction using part-based representations, by identifying and preserving important parts. In addition, many specific styles of abstract art have been widely studied and simulated, including image mosaics [Finkelstein and Range 1998; Orchard and Kaplan 2008], drip-painting [Lee et al. 2006], cubism [Collomosse and Hall 2003], abstract texture synthesis by sampling [Morel et al. 2006], etc.

On the image analysis side, Pollock's famous drip paintings have been analyzed using fractal mathematics [Mureika et al. 2005; Jones-Smith and Mathur 2006; Taylor et al. 2007]. Statistical and computer vision methods have also been applied to analyzing and classifying paintings of various styles [Wallraven et al. 2009; Hughes et al. 2010]. Recently, Rigau et al. [2008] proposed informational aesthetic measures to evaluate artistic images based on information-theoretic principles. There is also growing interest in the subtle effects of perceptual ambiguity in abstract art [Arnheim 1971; Yevin 2006; Hertzmann 2010].

In the literature of perception and psychophysics, there are also studies on abstract images generated using artistic rendering techniques. Gooch et al. [2004] presented a study on human facial illustrations and showed that rendered facial illustrations and caricatures are as effective in communicating complex facial information as photographs. Wallraven et al. [2007] also studied the effects of artistic stylization on stylized facial expressions, using both real-world and computer-generated images. Redmond and Dingliana [2009] compared different NPR styles in the perception of abstracted scenes, and observed that salient target objects can be effectively emphasized using NPR, given appropriate scene context and level of stylization.

1.2 Our Contributions

Most of the above studies on rendering focused on relatively low-level image features (e.g., color, gradient). A few methods also work in the perceptual space by dealing with visual salience and attention [DeCarlo and Santella 2002]. In contrast, the creation and appreciation of abstract art entail the manipulation of categorical recognition of scenes, objects, and parts, where ambiguity and confusion may occur. Our method for rendering abstract paintings is based on the hypothesis that they usually have greater ambiguities for understanding than photographs, which is fundamentally different from previous image abstraction methods. This article makes the following contributions:

— We introduce the image parsing method [Tu et al. 2005] to provide a hierarchical descriptor of image contents for studying the mechanism of abstract art at the semantic level.
— We compare abstract art and photographs containing different categories of objects using human experiments, and quantitatively measure the differences in recognition accuracy and response time between them, which reflect their differences in perceptual ambiguities.

— Under the frameworks of Bayesian statistics and information theory, we define a numerical measure of the level of perceptual ambiguity named perceptual entropy, and develop algorithms to compute the entropy for images and predict their most likely perceptual paths.

— We propose a painterly rendering method for generating abstract painting images from photographs, in which we have interactive control of the ambiguity levels and perceptual paths.

This article extends our previous work on abstract painting [Zhao and Zhu 2010]. Compared with the previous study, this article presents additional or improved methods and results in two main aspects:

— Improved models and algorithms to compute the entropies over hierarchical image structures, for better simulating human visual perception. These include a logistic-regression-based distance metric between image regions using color, shape, and texture features (in Section 5.1), a more accurate approximation of the joint perceptual entropy based on the most probable parse tree configurations (in Section 5.2), a sequential algorithm for predicting the most likely perceptual path among image regions (in Section 5.4), etc.

— More comprehensive human experiments. These include more extensive experiments and analyses on the effect of perceptual ambiguity as reflected by recognition accuracy and response time (in Sections 2 and 7), and an additional experiment on the effect of perceptual paths (in Section 8).

2. HUMAN EXPERIMENTS ON THE LEVELS OF PERCEPTUAL AMBIGUITY: PART ONE

We use human experiments to compare the mental efforts required for interpreting abstract art images and photographs, so as to verify our hypothesis that abstract art images generally have higher ambiguity levels than photographs, reflected by lower recognition accuracy and longer response time.

Table I. List of scene and object categories we use in this article, which distribute widely over common categories usually appearing in paintings.

7 Scene Categories: close-up, indoor, landscape, portrait, seascape, skyline, streetscape.

42 Object Categories: abstract background, big mammal, bike, bird, bridge, building, bus/car/train, butterfly/insect, chimney, clothing/fabrics, door/window, face/skin, fish, flag/sign, flower, fruit, furniture/bench, glass/porcelain, grass/straw/reed, ground/earth/pavement, hair, house/pavilion, human, kite/balloon, lamp/light, leaf, mountain, pillar/pole, road/street/alley, rock/stone/reef, sand/shore, ship/boat, sky/cloud/glow, small mammal, snow/frost, statue, sun/moon/star, tower/lighthouse, tree/trunk/twig, umbrella, wall/roof, water/spindrift.

Fig. 2. Example image patch pairs of abstract paintings of different object categories (bird, door, dog, flowers, buildings, tower) and their corresponding photographs used in our human experiments. Dog belongs to the small mammal category.

We collected 123 abstract art images from well-known artists' works, and divided them into different scene and object categories. Table I shows the list of categories we use in this article, some of which do not exist
in the 123 images but frequently appear in other paintings. We manually pair these images up with 123 photographs collected using web search engines, which match the abstract art images well in both categories and contents. Fig. 2 includes some example image patches from the matched pairs.

These images are then presented to 20 human subjects (voluntary college and graduate students of art, science, and engineering majors) within a limited time span (one minute per image) on a 17-inch color monitor. During the experiment, the images are depaired and presented in random order, and each image is seen by a subject only once. Following our pre-experiment instructions, as soon as the subject feels that he/she recognizes the foreground object (highlighted with a bounding box) in the center of the image, he/she hits the keyboard. Then the image disappears, and the response time is recorded. The subject is immediately asked to choose one of the categories in Table I provided on the screen, or report "none of these categories."

Fig. 3. Object category confusion matrices obtained in our experiments for abstract paintings (left) and photographs (right). The horizontal axis stands for reported categories and the vertical axis stands for true categories. The darkness of each grid is proportional to its corresponding frequency of subjects' reports. The rightmost column of each matrix stands for either the "none of these categories" report, or failed attempts of recognition within the limited time span (one minute per image).

Recognition Accuracy. The recognition accuracy can be visually inspected through the confusion matrices summarizing reported vs. true interpretations, as shown in Fig. 3, in which the horizontal axes of the matrices stand for reported categories and the vertical axes stand for true categories, so the diagonal elements correspond to correct recognition results. The rightmost column of each matrix stands for either the "none of these categories" report, or failed attempts of recognition within the limited time span (one minute per image). The matrix for abstract art is more scattered with weaker diagonal elements, and has a darker rightmost column. This means the subjects generally have lower recognition accuracy for abstract art than for photographs. Meanwhile, even for abstract art, the diagonal elements are still darker than the other grids in each row, which means that the images are usually still correctly recognizable through effort; otherwise they could become meaningless and unaesthetic, like flat or pure noise images.
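For readers who wish to reproduce this bookkeeping, the confusion matrices follow directly from the recorded (true, reported) pairs. The sketch below is ours, not the authors' analysis code; the category list is truncated for illustration, and "none" is the extra rightmost column for "none of these categories" or timed-out trials.

```python
import numpy as np

# Truncated category list, for illustration only.
categories = ["bird", "door", "small mammal", "flower", "building", "tower"]
columns = categories + ["none"]

def confusion_matrix(reports):
    """`reports` is a list of (true_category, reported_category) pairs,
    with reported_category == "none" for failed or declined trials."""
    m = np.zeros((len(categories), len(columns)))
    for true, reported in reports:
        m[categories.index(true), columns.index(reported)] += 1
    # Normalize each row to report frequencies rather than raw counts
    # (guarding against empty rows in this toy setting).
    return m / np.maximum(m.sum(axis=1, keepdims=True), 1)
```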
Response Time. For abstract art images, we expect greater mental efforts from the human subjects, reflected by their longer response time for recognition. Fig. 4 displays a few box plots of the recorded response time for object recognition in abstract paintings and photographs, in which six categories are included as examples (corresponding to the image pairs displayed in Fig. 2). These plots show greater average response time for abstract paintings than for photographs. But we also notice that not all significance levels are high, as confirmed by the paired one-sided t-tests shown in Table II. The negative t-score for bird is due to the extreme outlier of 28 seconds in the photograph sample (not shown in Fig. 4); if we remove that pair, we get t-score = 1.4648 and p-value = 0.08012. Overall, this confirms the significance of the difference in mental efforts.

Fig. 4. Box plots of response time for object recognition in abstract paintings and photographs. Three outliers greater than 11 seconds are not shown (photograph of bird: 28s, abstract painting of door: 14.8s, and abstract painting of dog: 21.1s).

Table II. Paired one-sided t-tests comparing the response time for object recognition in abstract paintings and photographs, corresponding to Fig. 4.

            bird      door     dog      flowers    buildings  tower
t-score    -0.3775    2.0045   2.524    4.2201     2.4147     1.995
p-value     0.645     0.02974  0.01033  0.0002318  0.013      0.03029
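The paired tests in Table II can be reproduced with SciPy (version 1.6 or later for the `alternative` argument). A minimal sketch follows; the response-time arrays are hypothetical placeholders, not the experimental data.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject response times (seconds) for one category;
# index k is subject k viewing the matched painting/photograph pair.
painting_rt = np.array([3.1, 4.2, 2.8, 5.0, 3.6, 2.9])
photo_rt = np.array([1.9, 2.5, 2.2, 3.1, 2.4, 2.0])

# Paired one-sided t-test: H1 says paintings take longer to recognize.
t_score, p_value = stats.ttest_rel(painting_rt, photo_rt,
                                   alternative="greater")
print(f"t-score = {t_score:.4f}, p-value = {p_value:.5f}")
```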
3. IMAGE UNDERSTANDING AND PERCEPTUAL AMBIGUITY IN A COMPUTATIONAL PERSPECTIVE

From a computational perspective, vision is an ill-posed problem. It is widely acknowledged in the human and computer vision communities that the imaging process loses much information about the 3D world, and thus one cannot restore the contents uniquely from an image. Instead, visual perception is achieved by computing the most probable interpretations of the observed images in our eyes. When there does not exist a dominant interpretation with significantly larger probability than all the other interpretations, the image causes perceptual ambiguity for our visual perception. Thanks to artists' exquisite skills, good abstract artworks usually have carefully (though implicitly in terms of numerical computing) tuned probabilities of competing interpretations, in order for the viewers to enjoy the guessing game with the artists.

To quantitatively measure the level of ambiguity, we can compute the information (Shannon) entropy of the probabilities of all interpretations [Cover and Thomas 2006], and define

ambiguity level = entropy(probabilities of interpretations).

Therefore, to proceed, we need a representation for the interpretations of image contents.

3.1 Parse Tree and Entropy

We adopt the parse tree, introduced to computer vision from computational linguistics by Tu et al. [2005]. Similar to parse trees for English sentences, a parse tree for image representation is a hierarchical decomposition. It has a root node corresponding to the entire scene of the image, which has a few children/descendant nodes corresponding to the constituent objects and parts. As shown in Fig. 5, the photograph is a seascape scene (i.e., the label of the root node of the parse tree), which is then decomposed into five objects/regions: sailboat, sea, buildings, trees, and sky. The sailboat node is further decomposed into three children: sail, hull, and human on board.

Fig. 5. A seascape image (left, courtesy of pdphoto.org) and its example parse tree (right).

In general, we view a parse tree as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, whose vertices $V$ represent the nodes, and whose directed edges $E$ represent the parent→child links in the parse tree. Each node $i \in V$ is associated with its category label $\ell_i$ (e.g., a category in Table I) and visual features $A_i$ (e.g., shape, color, and texture). To model perceptual ambiguity with the parse tree representation, we make two assumptions.

Assumption 1. The main cause of perceptual ambiguity is the obscured objects, rather than unclear parse tree structures (i.e., we do not study the abstract style shown in Fig. 1d).

Assumption 2. For understanding abstract art in the sense of recognizing the contents, we only care about computing the category labels, ignoring visual features specific to object instances (e.g., we do not have to describe whether a human is tall or short).

Therefore, we simplify the parse tree to a vector representation of its nodes' category labels $L = (\ell_1, \ell_2, \cdots, \ell_K)$, where the labels for a correct interpretation should (i) correspond well to the image, and (ii) be compatible with each other; for example, a boat rather than a bus is compatible with the sea surface.

Under the Bayesian framework, in computer vision and pattern recognition, it is a standard practice to compute the maximum a posteriori (MAP) estimate

$$\hat{L}_{\mathrm{MAP}} = \arg\max_{L} p(L \mid I)$$

as the best interpretation of image $I$. But the MAP estimate only captures the major mode or peak of the posterior probability $p(L \mid I)$, and cannot tell how much better this best interpretation is than the other interpretations, which influences the ambiguity and thus our mental efforts in visual perception. Compared with MAP, the perceptual entropy defined by

$$H(L)|_I = -\sum_{L} p(L \mid I) \log p(L \mid I)$$

describes the uncertainty/ambiguity associated with the posterior probabilities. It is worth mentioning that $H(L)|_I$ differs from the conditional entropy [Cover and Thomas 2006]

$$H(L \mid I) = \sum_{I} p(I)\, H(L)|_I = -\sum_{L, I} p(L, I) \log p(L \mid I)$$

in that $H(L)|_I$ only deals with a specific image and thus does not sum over $I$. For the abstract paintings we study here, with $p(L \mid I)$ often having more than one local maximum (i.e., multimodality) corresponding to multiple competing interpretations [Yevin 2006], we expect $H(L)|_I$ to be significantly greater than the close-to-zero ambiguities of photographs, but still much lower than the upper bound $\log |\Omega_L|$, where $|\Omega_L|$ is the volume of the space of $L$ (i.e., the number of all possible category label combinations).
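As a concrete illustration of the definition above, here is a minimal sketch of computing $H(L)|_I$ when the posterior over label vectors has been enumerated explicitly; the toy two-interpretation posterior is ours, not data from the paper.

```python
import math

def perceptual_entropy(posterior):
    """Shannon entropy H(L)|I of a posterior over interpretations.

    `posterior` maps each label vector L (one category per parse tree
    node) to p(L|I); the probabilities are assumed to sum to one.
    """
    return -sum(p * math.log(p) for p in posterior.values() if p > 0.0)

# Toy posterior with two competing interpretations of the same image.
posterior = {
    ("seascape", "sailboat", "sea"): 0.6,
    ("seascape", "rock/stone/reef", "sea"): 0.4,
}
print(perceptual_entropy(posterior))  # about 0.673 nats; 0 means no ambiguity
```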
3.2 Constructing the Parse Tree

Given an input photograph, as automatic image parsing is not a solved problem in general, we use an interactive program to construct the parse tree in three steps.

Step 1: Image Segmentation. We first segment the given image into regions corresponding to different objects. To achieve this, we adopt a scribble-based interactive segmentation algorithm [Lombaert et al. 2005]. Using this method, each time we draw foreground and background scribbles, we can segment the image into two parts. We continue with this procedure to further segment each part, until every object is separated from its neighboring regions, or a resolution limit is reached (i.e., we are not interested in even smaller objects or parts). With the number of nodes K < 15 for most images, the segmentation is usually completed within several minutes.

Step 2: Hierarchical Organization. Using the above recursive foreground-background segmentation scheme, we obtain a binary tree, in which each non-leaf node corresponds to a region we have already segmented into two parts. However, some nodes might not correspond to individual semantic objects (e.g., a node containing parts of two different objects), and sometimes an object is mistakenly divided into multiple branches of the binary tree. In order to obtain a meaningful hierarchy conforming to the image semantics, we delete and merge nodes to form a multiway tree in an interactive manner on the software interface.

Step 3: Category Labeling. We manually label the categories of all nodes in the parse tree (the scene category for the root node and object categories for the other nodes). This ground-truth parse tree with all category labels is helpful for computing the ambiguity level later. But the category labeling step is optional: even without the manually selected category labels, our method can still compute the ambiguity level, possibly with slightly lower accuracy. The usage of ground-truth labels will be explained in Section 5.1.

4. OBSCURATION AND RENDERING

During rendering, our method allows interactive control of the perceptual entropy by sliding a bar on the software interface, and the system obscures and abstracts the image accordingly. Different objects are allowed to have different entropy levels, which makes some areas of the image easier to understand than the others, leading to the perceptual path effect mentioned in Section 1. We will discuss more about this effect in Section 5.4.

In the rendering process, the parse tree, including the segmentation map, is the central representation. It preserves the configuration of the scene, and allows us to propagate the contextual information between nodes in order to estimate the ambiguity levels. The main task of the rendering engine is to transfer the visual appearance of an input photograph into the abstract painting style. According to vision research [Marr 1982], color, shape, and texture are the key features of an image for visual perception. Therefore, we transfer the visual appearance using two groups of image processing operators: (i) image obscuration, which processes the color and shape of the input image, and (ii) painterly rendering, which processes the texture. Fig. 6 illustrates these operators.

Fig. 6. An illustration of the image obscuration (hue shift, chroma shift, shape deformation) and painterly rendering operators, individually and combined.

Image Obscuration. We first transfer the input image into the CIELCH color space, whose three channels are lightness, chroma, and hue, respectively; it is a cylindrical form of the perceptually uniform CIELAB color space. To obscure the color information of the image, random noise is added to hue, the color tone. The noise follows a truncated Gaussian distribution whose standard deviation is positively related to the desired ambiguity level. Since paintings are usually more saturated than photographs, a positive shift also related to the ambiguity level (e.g., following a Gamma distribution with its location parameter proportional to the ambiguity level) is added to the chroma channel to increase the saturation. To obscure the shape information, an image region is warped using a thin plate spline (TPS) transformation [Barrodale et al. 1993], which is computed using the coordinates of its boundary pixels as control points. The offsets of these boundary points are randomly sampled from a truncated 2D Gaussian whose average displacement is related to the specified ambiguity level. To ensure smoothness of the warped image, a diagonal regularization term is added to the kernel matrix of the TPS transformation.
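To make the color obscuration step concrete, here is a minimal sketch of our reading of it, assuming scikit-image for the CIELAB conversion (the chroma and hue channels of LCH are derived from the a/b channels of LAB). The noise parameter mappings are illustrative guesses, not the paper's exact settings.

```python
import numpy as np
from skimage import color  # assumes scikit-image is available

def obscure_color(rgb, ambiguity, rng=None):
    """Truncated-Gaussian hue noise plus a positive chroma shift.

    `rgb` is a float image in [0, 1]; `ambiguity` in (0, 1] scales the
    noise levels (illustrative mapping only).
    """
    rng = rng or np.random.default_rng()
    lab = color.rgb2lab(rgb)
    a, b = lab[..., 1], lab[..., 2]
    chroma = np.hypot(a, b)
    hue = np.arctan2(b, a)

    # Truncated Gaussian hue noise: sd grows with the ambiguity level.
    noise = rng.normal(0.0, ambiguity * np.pi / 4, size=hue.shape)
    hue = hue + np.clip(noise, -np.pi / 2, np.pi / 2)

    # Positive chroma shift mimicking the higher saturation of paintings.
    chroma = chroma + rng.gamma(2.0, 10.0 * ambiguity, size=chroma.shape)

    lab[..., 1] = chroma * np.cos(hue)
    lab[..., 2] = chroma * np.sin(hue)
    return color.lab2rgb(lab)  # out-of-gamut colors end up clipped
```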
Painterly Rendering. For the texture appearance of paintings, we adopt our earlier work on stroke-based painterly rendering [Zeng et al. 2009; Zhao and Zhu 2010]. The layout and attributes of the brush strokes are controlled by stochastic stroke processes [Zhao and Zhu 2011], whose parameters are related to the desired ambiguity level.

The entire rendering scheme can be viewed as a top-down hierarchical data generating process. In a stochastic way, the rendering parameters are generated according to the desired ambiguity levels, and they further generate the painting image. Using the above stochastic operations on color, shape, and texture, we expect that the final ambiguity level of the rendered abstract painting will be significantly larger than that of the original photograph, and we shall verify this through computation and human experiments.

5. COMPUTATION AND INTERACTIVE CONTROL OF PERCEPTUAL ENTROPY

We compute the actual ambiguity level of the rendered image and compare it with the desired value, in order to ensure that we have achieved the expected effects. Otherwise, the image should be re-rendered with (automatically) adjusted parameters.

Since visual perception involves both direct object recognition using visual features and indirect recognition using contextual information [Oliva and Torralba 2007], we compute $p(L \mid I)$ using a method that accounts for both aspects. The probability of the category labels in the parse tree can be factorized according to

$$p(L \mid I) = \frac{1}{Z} \prod_{i \in V} \phi_i(\ell_i) \prod_{\langle i,j \rangle \in E} \psi_{ij}(\ell_i, \ell_j) = \frac{1}{Z} \prod_{i \in V} p(\ell_i \mid I_i) \prod_{\langle i,j \rangle \in E} \tilde{f}(\ell_i, \ell_j)$$

in which we assume each node is only correlated with its parent and children (i.e., the Markov property).

— The unary term $\phi_i(\ell_i) = p(\ell_i \mid I_i)$ is the posterior probability of object recognition for image region $I_i$ without the context (we call it the local evidence).

— The binary term $\psi_{ij}(\ell_i, \ell_j) = \tilde{f}(\ell_i, \ell_j)$ models the contextual relations in terms of prior/empirical pairwise frequencies between parent-child nodes.

In our implementation, we use non-parametric models for $p(\ell_i \mid I_i)$ and $\tilde{f}(\ell_i, \ell_j)$, and compute them using a large human-annotated image dataset from LHI [Yao et al. 2007]. Details are explained in the following sections.
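This factorization can be evaluated directly, up to the constant $\log Z$, for any candidate labeling; the message passing algorithms of Section 5.2 do this implicitly. Below is a minimal sketch of ours, with hypothetical probability tables standing in for the LHI-derived models.

```python
import math

def log_score(labels, parents, unary, pairwise):
    """Unnormalized log p(L|I) of one labeling of the parse tree.

    `labels[i]` is the category of node i and `parents[i]` its parent
    index (None for the root); `unary[i][l]` stores the local evidence
    p(l|I_i), and `pairwise[(l_parent, l_child)]` the empirical
    parent-child frequency f~. All tables here are hypothetical.
    """
    s = sum(math.log(unary[i][l]) for i, l in enumerate(labels))
    s += sum(math.log(pairwise[(labels[p], labels[i])])
             for i, p in enumerate(parents) if p is not None)
    return s  # equals log p(L|I) + log Z

# Toy two-node tree: a scene node with one child region.
unary = [{"seascape": 0.7, "landscape": 0.3},
         {"sailboat": 0.6, "bus/car/train": 0.4}]
pairwise = {("seascape", "sailboat"): 0.2, ("seascape", "bus/car/train"): 0.001,
            ("landscape", "sailboat"): 0.01, ("landscape", "bus/car/train"): 0.05}
print(log_score(["seascape", "sailboat"], [None, 0], unary, pairwise))
```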
5.1 Local Evidence

The computation of $p(\ell_i \mid I_i)$ is achieved using kernel density estimation [Duda et al. 2000], with a sample of approximately N = 25,000 kernels (probabilistic voters) from the LHI image dataset. Each voter $\langle J_n, \ell_n \rangle$ has its image region $J_n$ and category label $\ell_n$. Fig. 7 displays a few example voters.

Fig. 7. Example probabilistic voters from the LHI image dataset (including bike, bird, buildings, bus, deer, dog, fish, human, lamp, lighthouse, mountain, and tree) for computing the local evidence for object recognition.

With the voters, the local evidence is computed with

$$p(\ell_i \mid I_i) \propto \sum_{n} \exp\{-\lambda D(I_i, J_n)\}\, \mathbf{1}(\ell_i = \ell_n)$$

in which $\mathbf{1}(\cdot)$ is the indicator function, and $\lambda$ is a rate parameter controlling the overall entropy level. The logistic distance function

$$D(I_i, J_n) = \frac{1}{1 + \exp\{-\beta_0 - \sum_j \beta_j \|h_j(I_i) - h_j(J_n)\|^2\}}$$

measures the dissimilarity between two image regions, in which each $h_j$ extracts a feature statistic in the color, shape, or texture channels of the image regions:

— For color, $h_j$ is a normalized 2D hue-chroma histogram with 32 blocks (8 sectors for hue and 4 levels for chroma).

— For shape, $h_j$ is a normalized 2D spatial histogram of image region boundary pixels with $8^2 = 64$ blocks, assuming rough alignment according to the bounding box of each image region.

— For texture, $h_j$ are normalized histograms of sine and cosine Gabor filter responses over the image region (2 types, 4 directions, 3 scales, and 16 bins each).

The $\beta$'s are pre-computed using logistic regression by setting $D(J_{n_1}, J_{n_2}) = \mathbf{1}(\ell_{n_1} \neq \ell_{n_2})$ for pairs of voters. Since the number of pairs $\binom{N}{2}$ is huge, we randomly take a small sample of 50,000 pairs for regression. The motivation for using a logistic distance is that it is difficult to define a reasonable metric distance function between categories; for example, it is unclear whether "flower" is closer to "building" or to "furniture." Instead, we usually only care about the two states of the distance function: (i) zero for the same category and (ii) non-zero otherwise.

As we mentioned above, if the ground-truth categories are manually labeled for the parse tree, we can compute $p(\ell_i \mid I_i)$ more accurately by including the original image region $I_i^{\mathrm{ori}}$ from the input photograph in the group of voters. Usually, when the desired ambiguity level is not too high, the rendered image region is still quite similar to the original one in terms of $D(I_i, I_i^{\mathrm{ori}})$, so the original image region will have a heavy voting weight and bring significant information gain [Cover and Thomas 2006] to $p(\ell_i \mid I_i)$. Fig. 8 illustrates the idea of this voting scheme for computing the local evidence, performed in the space spanned by the color, shape, and texture features.

Fig. 8. An illustration of our voting scheme for computing the local evidence in the space spanned by color, shape, and texture features.
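A minimal sketch of this voting scheme follows, assuming the feature histograms have already been extracted and stacked per channel; the array layout and the pre-fit `betas` are hypothetical stand-ins for the LHI-derived quantities described above.

```python
import numpy as np

def local_evidence(region_feats, voter_feats, voter_labels, betas, lam=1.0):
    """Kernel-density local evidence p(l|I_i) from probabilistic voters.

    `region_feats[c]` and `voter_feats[n][c]` are the per-channel
    (color/shape/texture) histograms h_j; `betas[0]` is beta_0 and
    `betas[1:]` are the per-channel logistic regression weights.
    """
    # Logistic distance D(I_i, J_n) from per-channel squared differences.
    sq = np.array([[np.sum((region_feats[c] - v[c]) ** 2)
                    for c in range(len(region_feats))] for v in voter_feats])
    D = 1.0 / (1.0 + np.exp(-(betas[0] + sq @ betas[1:])))

    # Each voter casts a weight exp(-lambda * D) for its own category.
    weights = np.exp(-lam * D)
    evidence = {}
    for w, label in zip(weights, voter_labels):
        evidence[label] = evidence.get(label, 0.0) + w
    total = sum(evidence.values())
    return {label: w / total for label, w in evidence.items()}
```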
5.2 Contextual Information Propagation and Entropy Approximation

Even with all the unary terms $p(\ell_i \mid I_i)$ and binary terms $\tilde{f}(\ell_i, \ell_j)$ available, it is still infeasible to compute $p(L \mid I)$ for $H(L)|_I$ directly, since the space of $L$ is usually too huge to explore. For example, if there are K = 10 nodes of 40 possible categories, the space volume of $L$ is $|\Omega_L| = 40^K \approx 10^{16}$. Fortunately, the parse tree is a singly connected network, and we can compute the most probable joint configurations of its category labels efficiently using max-product message passing [Sy 1993]. Using Sy's algorithm, a user-specified number of the most probable configurations can be derived in descending order of their probabilities. During this process, both the local evidence and compatibility terms are considered, and messages are propagated between connected nodes in an iterative way to update local beliefs.

With the probabilities $p(L \mid I)$ of the top-M ($M \ll |\Omega_L|$) most probable joint configurations, we can approximate the entropy using

$$\hat{H}(L)|_I = -\sum_{\text{top-}M} q(L \mid I) \log q(L \mid I)$$

in which the $q(L \mid I)$ are renormalized from $p(L \mid I)$ over the top-M configurations according to

$$q(L \mid I) = \frac{p(L \mid I)}{\sum_{\text{top-}M} p(L \mid I)}.$$

This approximation essentially drops all the remaining configurations after the top-M, which is reasonable since usually only very few configurations are possible due to the compatibility terms (e.g., a bus cannot be part of a tree); when we try to understand an abstract artwork, we can quickly eliminate configurations with too low probabilities in early stages, and the ambiguity is only caused by those with relatively high probabilities among the top-M. In practice, the choice of M is a balance between computational precision and cost. According to our experiments, for K = 10 nodes, M = 100 to 1000 works fairly well.

5.3 Normalized Perceptual Entropy

For different images, the number of nodes and the space volume $|\Omega_L|$ of $L$ may vary. In order to have a common measure of the ambiguity level, we use a normalized version of the perceptual entropy defined as

$$\tilde{H}(L)|_I = \frac{\hat{H}(L)|_I}{\log M} \in [0, 1].$$

This number is then compared with the desired ambiguity level the user specified before rendering. According to the comparison, we determine whether the rendered painting image has the desired ambiguity level (e.g., whether the difference between the computed and desired ambiguities is within $\epsilon$ = 10%). If it does not, the painting is re-rendered with parameters adjusted according to a negative feedback mechanism. Suppose the desired ambiguity level is $\tilde{H}^*_0$, and after the first rendering, the computed ambiguity (normalized perceptual entropy) is $\tilde{H}_1$; we then adjust the rendering parameters according to a virtual desired ambiguity level $\tilde{H}^*_1$ and re-do the rendering, in which

$$\tilde{H}^*_1 = \begin{cases} (\tilde{H}^*_0)^2 / \tilde{H}_1, & \text{if } \tilde{H}_1 > \tilde{H}^*_0 + \epsilon, \\ 1 - (1 - \tilde{H}^*_0)^2 / (1 - \tilde{H}_1), & \text{if } \tilde{H}_1 < \tilde{H}^*_0 - \epsilon. \end{cases}$$

If necessary, we continue to compute $\tilde{H}_2, \tilde{H}^*_2, \tilde{H}_3, \tilde{H}^*_3, \cdots, \tilde{H}_t, \tilde{H}^*_t, \cdots$ and repeat the rendering until $\tilde{H}_t$ is close to $\tilde{H}^*_0$. Due to the randomness involved in the process, the convergence of $\tilde{H}$ is not guaranteed. But in practice, with a relatively generous difference threshold $\epsilon$ (e.g., 10% to 20%), we can usually get close to the desired level within a few iterations.
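The entropy approximation and the feedback update are both only a few lines of code. Below is a minimal sketch under our reading of Sections 5.2 and 5.3, taking the (possibly unnormalized) probabilities of the top-M configurations as input and writing the update against the original target $\tilde{H}^*_0$, as in the displayed formula.

```python
import numpy as np

def normalized_entropy(top_probs, M):
    """H~(L)|I from the (possibly unnormalized) probabilities of the
    top-M configurations returned by max-product message passing."""
    q = np.asarray(top_probs, dtype=float)
    q = q / q.sum()          # renormalize over the top-M configurations
    q = q[q > 0]             # zero-probability terms contribute nothing
    return -np.sum(q * np.log(q)) / np.log(M)  # scaled into [0, 1]

def virtual_target(h_star0, h_t, eps=0.1):
    """One negative-feedback step: returns the virtual desired level
    for the next rendering, or None when h_t is already within eps."""
    if h_t > h_star0 + eps:
        return h_star0 ** 2 / h_t
    if h_t < h_star0 - eps:
        return 1.0 - (1.0 - h_star0) ** 2 / (1.0 - h_t)
    return None  # close enough; stop re-rendering
```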
5.4 Perceptual Paths

Image understanding can be achieved through various top-down and bottom-up computing processes [Han and Zhu 2009], during which the viewer recognizes the image contents in an order driven by the propagation of contextual information. For example, in Fig. 1b, the mount is usually recognized first, which further helps the recognition of the highly abstracted trees and huts in the front. In this article, we call this the perceptual path effect, and control the path/order by assigning different ambiguity levels to different objects, letting the viewer recognize less obscured objects before more heavily obscured ones, as illustrated in Fig. 9. In Fig. 9b we set a lower ambiguity level for the street than for the buildings, while for Fig. 9c we set the levels in the opposite way.

Fig. 9. A streetscape photograph (courtesy of public-domain-image.com) and its two abstract paintings rendered by setting different ambiguity levels for different objects, to simulate the perceptual path effect: (a) photograph, (b) painting A, (c) painting B. Zoom to 400% to view details. Their different predicted perceptual paths, computed using the algorithm described in Section 5.4, are displayed in (d) and (e), respectively, as numbered sequences over the parse tree nodes (streetscape, sky, buildings, street, pavement, poles, cars, humans, wall, windows, lamps, flags). The numbers indicate the sequences of the nodes in the paths. The arrows indicate the propagation of contextual information (red: bottom-up, blue: top-down).

We can predict the perceptual paths shown in Figs. 9d and 9e by simulating a greedy information propagation process. We first identify the node with the lowest ambiguity level, and take its most probable label as our interpretation. Then we propagate this information to the other nodes, and identify the one with the lowest ambiguity among them. This process continues until we have reached all the nodes. Here is the detailed algorithm:

(1) We represent a perceptual path as a sequence of the nodes in the parse tree, denoted by $S$.

(2) According to our previous work [Zhao and Zhu 2010], we are able to compute the marginal probabilities $p(\ell_i \mid I)$ and entropies $H(\ell_i)|_I$ of the nodes given the whole image (with contextual information), using sum-product belief propagation [Yedidia et al. 2001]. The first recognized node is the one with the lowest entropy,

$$s_1 = \arg\min_{i \in V} H(\ell_i)|_I,$$

and we push $s_1$ into $S$.

(3) We fix the category label of node $s_1$ to $\ell^*_{s_1} = \arg\max_{\ell_{s_1}} p(\ell_{s_1} \mid I)$, which essentially sets the entropy of $\ell_{s_1}$ to zero. Then, with this new information, we redo the belief propagation for the remaining nodes to compute their probabilities $p_1(\ell_i \mid I)$ and entropies $H_1(\ell_i)|_I$.

(4) We compute the second recognized node $s_2$ and its label $\ell^*_{s_2}$ in ways similar to steps (2) and (3),

$$s_2 = \arg\min_{i \in V \setminus \{s_1\}} H_1(\ell_i)|_I, \qquad \ell^*_{s_2} = \arg\max_{\ell_{s_2}} p_1(\ell_{s_2} \mid I),$$

and push $s_2$ into $S$.

(5) We continue the above steps to sequentially figure out the labels of the other nodes, $s_2, \ell^*_{s_2} \to p_2, H_2 \to s_3, \ell^*_{s_3} \to p_3, H_3 \to \cdots \to s_K, \ell^*_{s_K}$, until we have reached all the nodes. Now we have obtained the whole sequence $S$.

During this process of fixing the nodes' category labels $\ell_i$ in a sequence, the number of unknown labels decreases and thus the perceptual entropy of the parse tree decreases. The entropy reaches zero when all labels are fixed. In fact, the above algorithm starting from the lowest marginal entropies is a greedy method to minimize the overall ambiguity eliminated by fixing labels,

$$E(S) = \sum_{i=1}^{K} H_{i-1}(\ell_{s_i})|_I, \quad \text{with } H_0(\ell_i)|_I = H(\ell_i)|_I.$$

In contrast, if the sequence $S$ starts from nodes with high entropies, $E(S)$ tends to be higher. Considering that the elimination of ambiguity is associated with the mental efforts of making decisions, the greedy process described above minimizes the effort in interpreting the image; a code sketch follows.
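The sketch below is our minimal rendering of this greedy prediction, assuming a caller-supplied belief propagation routine `run_bp` (hypothetical here) that returns per-node marginal entropies with any already-fixed labels clamped to their most probable values.

```python
def perceptual_path(nodes, run_bp):
    """Greedy perceptual-path prediction sketched from Section 5.4.

    `run_bp(fixed)` is assumed to re-run sum-product belief propagation
    with the labels of the nodes in `fixed` clamped, and to return a
    dict {node: marginal entropy H_t(l_i)|I}.
    """
    path, fixed = [], set()
    entropies = run_bp(fixed)  # H(l_i)|I with no labels fixed yet
    while len(path) < len(nodes):
        # Recognize the currently least ambiguous unfixed node next.
        s = min((i for i in nodes if i not in fixed), key=entropies.get)
        path.append(s)
        fixed.add(s)
        if len(path) < len(nodes):
            # Clamping s's label injects context for the remaining nodes.
            entropies = run_bp(fixed)
    return path
```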
6. RENDERING RESULTS

Using the pipeline introduced above, we have rendered many abstract painting images from photographs collected from web search engines. Fig. 10 displays three abstract paintings corresponding to the example in Fig. 5, with $\tilde{H}$ at approximately 0.25, 0.5, and 0.75, respectively. We can see that as the ambiguity level increases, color and shape in the images become more heavily obscured, making both the sailboat and the background objects more difficult to recognize.

Fig. 10. Three abstract paintings of different ambiguity levels ($\tilde{H} \approx$ 0.25, 0.5, and 0.75) rendered using our method, corresponding to the example in Fig. 5. Zoom to 600% to view details.

Note that your perception of the three paintings should have already been affected by seeing the source photograph beforehand. In fact, knowing the image contents may also make it difficult for an artist to assess the ambiguity level during the creation of abstract art. Being able to numerically compute the perceptual entropy in object recognition, our program is helpful in this situation.

Fig. 11 displays a photograph of UCLA Royce Hall, and its two corresponding abstract paintings rendered using our method, with $\tilde{H}$ at approximately 0.25 and 0.75, respectively. For this example, we have segmented the image into five objects: sky, building, trees, grass, and road. Note that in both paintings, the road is almost impossible to recognize without the context.

Fig. 11. A photograph of UCLA Royce Hall ($\tilde{H} \approx 0$), and two corresponding abstract paintings of different ambiguity levels ($\tilde{H} \approx$ 0.25 and 0.75) rendered using our method. Zoom to 600% to view details.

The example displayed in Fig. 12 includes many object categories: sky, mountain, sand, water surface, human, bench, etc. Most of these objects are heavily obscured, but the rendered paintings for this landscape are still clear enough for appreciation.

Fig. 12. Promenade Morecambe: photograph ($\tilde{H} \approx 0$) and two paintings ($\tilde{H} \approx$ 0.25 and 0.75); photograph courtesy of Tom Curtis / FreeDigitalPhotos.net. Zoom to 600% to view details.

Fig. 13 shows a few more abstract paintings rendered using our method, whose ambiguity levels are between 0.25 and 0.75 for all examples according to our computation.

Fig. 13. More abstract paintings rendered using our method. Zoom to 600% to view details.

7. HUMAN EXPERIMENTS ON THE LEVELS OF PERCEPTUAL AMBIGUITY: PART TWO

In addition to the rendering pipeline and entropy computing method introduced above, we would like to further verify that the rendered abstract paintings do have our expected ambiguity effects, similar to those of original paintings by artists. We do this with another set of human experiments comparing our rendered paintings with their source photographs. The computed ambiguity levels of these rendered abstract paintings are between 0.25 and 0.75. Most experimental settings remain the same as in Section 2.

Fig. 14. Object category confusion matrices for our rendered abstract paintings (left) and their original photographs (right). The horizontal axis stands for reported categories and the vertical axis stands for true categories. The darkness of each grid is proportional to its corresponding frequency of subjects' reports. The rightmost column of each matrix stands for either the "none of these categories" report, or failed attempts of recognition within the limited time span (one minute per image).

Fig. 15. Box plots of response time for object recognition in our rendered abstract paintings and their corresponding source photographs (rock/stone/reef, ship/boat, dog, flowers, buildings). Two outliers greater than 11 seconds are not shown (rendered painting of rock/stone/reef: 11.8s, and photograph of flowers: 15.3s).
We selected approximately 100 photographs, and had them segmented and their parse trees constructed manually. Then we rendered them into abstract paintings of different desired ambiguity levels, and asked 15 human subjects (from the 20 in Section 2) to recognize the objects (highlighted using bounding boxes) in these images. The recognition accuracy and speed of the subjects were recorded.

Recognition Accuracy. Fig. 14 displays the confusion matrices for object recognition in our rendered abstract paintings and the corresponding source photographs. Again, the horizontal and vertical axes stand for reported and true categories, respectively. The two matrices show that subjects generally have lower recognition accuracy for our rendered paintings than for photographs. Comparing these matrices to those in Fig. 3, we can see that, relative to photographs, our rendered images have ambiguity effects similar to those of original paintings by artists, with the diagonal elements still having the highest frequencies in most cases.

Table III. Paired one-sided t-tests comparing the response time for object recognition in rendered paintings and their source photographs, corresponding to Fig. 15.

            rock/stone/reef  ship/boat  dog        flowers  buildings
t-score     3.91             2.532      4.2238     2.0427   2.5585
p-value     0.0007852        0.01197    0.0004252  0.03019  0.01137

Response Time. Fig. 15 displays a few box plots of the recorded response time for object recognition in our rendered abstract paintings and their corresponding photographs, in which five categories are included as examples. The corresponding t-test results are shown in Table III. We can see response-time statistics similar to those in Fig. 4 and Table II.

8. HUMAN EXPERIMENTS ON THE EFFECT OF PERCEPTUAL PATHS

Besides the global ambiguities of images, we also use human experiments to verify the effect of perceptual paths introduced in Section 5.4. The perceptual paths have higher dimensions than perceptual entropies and are more difficult to observe. In the perception literature, researchers have used eye-tracking techniques to study the paths of viewers' attention across images [DeCarlo and Santella 2002], but this is not suitable for our case, since attention and semantic understanding are very different phases in vision [Marr 1982]. As a simplified investigation, we set up a verbal experiment to extract the rough order in which the objects in an image are recognized.

We select 12 human subjects, and randomly divide them into two groups of 6 people. Figs. 9b and 9c are presented to the two groups, respectively, on a 23-inch color monitor. Each subject views the presented image 10 times in limited time spans of increasing lengths (100ms, 200ms, 500ms, 1s, 2s, 5s, 10s, 20s, 30s, and 60s). After each time span, the image disappears and the subject is asked to describe the scene and the objects he/she recognizes in free language. During the process, the subject is instructed to try his/her best to revise previous reports with additional or corrected information. The subject is also allowed to describe his/her recognition before the image disappears if a time span is long enough (e.g., 30s or 60s). We focus on the six main objects in the picture: buildings, street, windows, poles, cars, and humans.
Raw results of this experiment are visualized in Figs. 16a and 16b, in which each dot represents an instance of a subject correctly reporting an object for the first time during the corresponding time span. In the reports, words with meanings similar to the ground truth are considered valid (e.g., pedestrians and humans). Due to perceptual ambiguities, some objects are not correctly recognized and reported, so the number of dots in each row may be less than the number of subjects.

Fig. 16. Human experiments on the effect of perceptual paths of our rendered abstract paintings. (a) and (c) correspond to Fig. 9b, and (b) and (d) correspond to Fig. 9c. In (a) and (b), each dot represents an instance of a subject correctly reporting a recognized object (buildings, street, windows, poles, cars, or humans) for the first time during the corresponding time span (0.1s through 60s). In (c) and (d), each percentage represents the rate of reports consistent with the pairwise order between the corresponding row and column objects in the predicted perceptual path.

In the two plots, we can see clearly different patterns. In general, "buildings" and "windows" are recognized significantly later in Fig. 16a than in Fig. 16b, which matches Figs. 9d and 9e, and the object "cars" is more difficult to recognize correctly in Fig. 16b, expectedly due to weaker contextual information from the "street" node.

To look at the perceptual paths of individual subjects, in Figs. 16c and 16d, we summarize the consistency between their reports and our predictions. In these two plots, each percentage represents the rate of reports consistent with the pairwise order between the corresponding row and column objects in the predicted perceptual path. Due to unidentifiability, pairs recognized and reported during the same time span are considered half consistent and half inconsistent. Taking 50% as a baseline, 12/15 of the results in Fig. 16c, and 11/15 in Fig. 16d, are positive (greater than or equal to the baseline). There are two strongly inconsistent (0%) pairs in Fig. 16d: (i) poles vs. cars and (ii) windows vs. cars. The former is understandable, since the two objects are adjacent in the recognition sequence and their pairwise order is weak. The latter two objects are in different branches of the parse tree; it is possible that after recognizing "buildings," the bottom-up process prevails over the top-down processes, exploring other regions of the image instead of looking into details within the region. The pursuit criterion in Section 5.4 needs improvement to address this issue.

Although this experiment cannot directly capture the propagation/flow of contextual information shown in Fig. 9 from inside the minds of the subjects, the orders of object recognition reflected in Fig. 16 mostly agree with the predicted paths, which partially supports our explanation of the perceptual path effect.
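The consistency percentages in Figs. 16c and 16d follow directly from the half-credit scoring rule just described; a minimal sketch of our reading of it, with a hypothetical report-time data structure, is shown below.

```python
def pairwise_consistency(report_spans, first, second):
    """Rate of reports consistent with `first` preceding `second` in the
    predicted perceptual path, counting same-span ties as half consistent.

    `report_spans[s][obj]` is the index of the time span in which subject
    s first correctly reported obj (a hypothetical layout); objects never
    reported correctly are absent and contribute no pairwise evidence.
    """
    score, n = 0.0, 0
    for spans in report_spans:
        if first not in spans or second not in spans:
            continue
        n += 1
        if spans[first] < spans[second]:
            score += 1.0
        elif spans[first] == spans[second]:
            score += 0.5  # same time span: half consistent
    return score / n if n else float("nan")
```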
9. CONCLUSION

In this article, we have presented both human and computerized studies on a type of abstract art which obscures the shapes and/or appearances of objects in images while preserving global scene structures. Our studies are based on the hypothesis that abstract art usually has higher ambiguity levels than photographs and representational arts, which fundamentally differs from most previous work on abstract art in NPR, image analysis, and perception. After verifying this hypothesis using human experiments, we defined the perceptual entropy as a numerical measure of the level of perceptual ambiguity, and proposed a method for the rendering of abstract paintings that is capable of controlling the entropy to the user-desired levels. By assigning different ambiguity levels to different image regions, we may predict, and thus roughly control, the perceptual paths, i.e., the orders in which viewers are most likely to understand the image contents. We have also examined the ambiguity levels and perceptual paths of our rendered abstract paintings using human experiments, and showed that they achieve our expected effects. This article extends our previous work on abstract painting [Zhao and Zhu 2010] with improved algorithms and more comprehensive human experiments.

For future research, there are a few directions in which we look forward to further exploring the proposed framework.

— To make our method more general, it is necessary to conduct further human and computerized studies on more abstract art styles, including those freeing image information not only at the object level, but also at the scene level toward the high end (e.g., Pablo Picasso's paintings), or at the local statistics level toward the low end (e.g., Jackson Pollock's paintings).

— On the rendering aspect, our current method can be improved by integrating better semantics-based painterly rendering algorithms for object categories frequently depicted in paintings, and more artistic ways of image obscuration (e.g., feature exaggeration [Gooch et al. 2004]).

— We look forward to better algorithms for predicting the perceptual paths, for example, by treating bottom-up and top-down processes differently, or by assigning different weights to image regions of different sizes.

— We may use the rendered abstract art as testing images to study human perception and attention mechanisms [Gooch et al. 2004; Wallraven et al. 2007; Redmond and Dingliana 2009], for example, by extending the human experiments presented in this article, or by other techniques such as recording eye saccades and fixations.

ACKNOWLEDGMENTS

We would like to thank our colleagues at UCLA and LHI for their participation in the experiments, and the anonymous reviewers for their suggestions on improving the presentation of this article.

REFERENCES

Arnheim, R. 1971. Entropy and Art: An Essay on Disorder and Order. University of California Press, Ltd.
Barrodale, I., Skea, D., Berkley, M., Kuwahara, R., and Poeckert, R. 1993. Warping digital images using thin plate splines. Pattern Recogn. 26, 2, 375–376.
Berlyne, D. E. 1971. Aesthetics and Psychobiology. Appleton-Century-Crofts, Inc.
Collomosse, J. P. and Hall, P. M. 2003. Cubist style rendering of photographs. IEEE Trans. Vis. Comput. Graph. 9, 4, 443–453.
Cover, T. M. and Thomas, J. A. 2006. Elements of Information Theory 2nd Ed. Wiley-Interscience.
DeCarlo, D. and Santella, A. 2002. Stylization and abstraction of photographs. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’02). 769–776.
Duda, R. O., Hart, P. E., and Stork, D. G. 2000. Pattern Classification 2nd Ed. Wiley-Interscience.
Finkelstein, A. and Range, M. 1998. Image mosaics. In Proceedings of the 7th International Conference on Electronic Publishing (EP/RIDT ’98). 11–22.
Funch, B. S. 1997. The Psychology of Art Appreciation. Museum Tusculanum Press.
Gooch, B. and Gooch, A. A. 2001. Non-Photorealistic Rendering. A K Peters.
Gooch, B., Reinhard, E., and Gooch, A. 2004. Human facial illustrations: Creation and psychophysical evaluation. ACM Trans. Graph. 23, 1, 27–44.
Haeberli, P. 1990. Paint by numbers: Abstract image representations. In Computer Graphics (Proceedings of SIGGRAPH ’90). 207–214.
Han, F. and Zhu, S.-C. 2009. Bottom-up/top-down image parsing with attribute grammar. IEEE Trans. Pattern Anal. Mach. Intell. 31, 1, 59–73.
Hertzmann, A. 1998. Painterly rendering with curved brush strokes of multiple sizes. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’98). 453–460.
Hertzmann, A. 2010. Non-photorealistic rendering and the science of art. In Proceedings of the 8th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’10). 147–157.
Hughes, J. M., Graham, D. J., and Rockmore, D. N. 2010. Quantification of artistic style through sparse coding analysis in the drawings of Pieter Bruegel the Elder. PNAS 107, 4, 1279–1283.
Jones-Smith, K. and Mathur, H. 2006. Fractal analysis: Revisiting Pollock’s drip paintings. Nature 444, E9–E10.
Kersten, D. 1987. Predictability and redundancy of natural images. J. Opt. Soc. Am. A 4, 12, 2395–2400.
Konečni, V. J. 1978. Daniel E. Berlyne: 1924–1976. Am. J. Psychol. 91, 1, 133–137.
Kyprianidis, J. E. 2011. Image and video abstraction by multi-scale anisotropic Kuwahara filtering. In Proceedings of the 9th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’11). 55–64.
Lee, S., Olsen, S. C., and Gooch, B. 2006. Interactive 3D fluid jet painting. In Proceedings of the 4th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’06). 97–104.
Lindsay, K. C. and Vergo, P. 1994. Kandinsky: The Complete Writings on Art. Da Capo Press.
Litwinowicz, P. 1997. Processing images and video for an impressionist effect. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’97). 407–414.
Lombaert, H., Sun, Y., Grady, L., and Xu, C. 2005. A multilevel banded graph cuts method for fast image segmentation. In Proceedings of the 2005 International Conference on Computer Vision (ICCV ’05), Volume 1. 259–265.
Marr, D. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman.
Meier, B. J. 1996. Painterly rendering for animation. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’96). 477–484.
Mi, X., DeCarlo, D., and Stone, M. 2009. Abstraction of 2D shapes in terms of parts. In Proceedings of the 7th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’09). 15–24.
Morel, J.-M., Alvarez, L., Galerne, B., and Gousseau, Y. 2006. Texture synthesis by abstract painting technique. http://www.cmla.ens-cachan.fr/membres/morel.html.
Mureika, J. R., Dyer, C. C., and Cupchik, G. C. 2005. On multifractal structure in non-representational art. Phys. Rev. E 72, 046101.
Oliva, A. and Torralba, A. 2007. The role of context in object recognition. Trends Cogn. Sci. 11, 12, 520–527.
Olsen, S. and Gooch, B. 2011. Image simplification and vectorization. In Proceedings of the 9th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’11). 65–74.
Orchard, J. and Kaplan, C. S. 2008. Cut-out image mosaics. In Proceedings of the 6th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’08). 79–87.
Orzan, A., Bousseau, A., Barla, P., and Thollot, J. 2007. Structure-preserving manipulation of photographs. In Proceedings of the 5th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’07). 103–110.
Redmond, N. and Dingliana, J. 2009. Investigating the effect of real-time stylisation techniques on user task performance. In Proceedings of the 6th Symposium on Applied Perception in Graphics and Visualization (APGV ’09). 121–124.
Rigau, J., Feixas, M., and Sbert, M. 2008. Informational aesthetics measures. IEEE Comput. Graph. Appl. 28, 2, 24–34.
Strothotte, T. and Schlechtweg, S. 2002. Non-Photorealistic Computer Graphics: Modeling, Rendering and Animation. Morgan Kaufmann.
Sy, B. K. 1993. A recurrence local computation approach towards ordering composite beliefs in Bayesian belief networks. Int. J. Approx. Reason. 8, 17–50.
Taylor, R. P., Guzman, R., Martin, T. P., Hall, G. D. R., Micolich, A. P., Jonas, D., Scannell, B. C., Fairbanks, M. S., and Marlow, C. A. 2007. Authenticating Pollock paintings using fractal geometry. Pattern Recogn. Lett. 28, 6, 695–702.
Tu, Z., Chen, X., Yuille, A. L., and Zhu, S.-C. 2005. Image parsing: Unifying segmentation, detection, and recognition. Int. J. Comput. Vis. 63, 2, 113–140.
Wallraven, C., Bülthoff, H. H., Cunningham, D. W., Fischer, J., and Bartz, D. 2007. Evaluation of real-world and computer-generated stylized facial expressions. ACM Trans. Appl. Percept. 4, 3, 16:1–16:24.
Wallraven, C., Fleming, R., Cunningham, D., Rigau, J., Feixas, M., and Sbert, M. 2009. Categorizing art: Comparing humans and computers. Comput. Graph. 33, 4, 484–495.
Yao, B., Yang, X., and Zhu, S.-C. 2007. Introduction to a large-scale general purpose ground truth database: Methodology, annotation tool and benchmarks. In Proceedings of the International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR ’07). 169–183.
Yedidia, J. S., Freeman, W. T., and Weiss, Y. 2001. Understanding belief propagation and its generalizations. IJCAI 2001 Distinguished Lecture Track.
Yevin, I. 2006. Ambiguity in art. Complexus 3, 74–83.
Zeng, K., Zhao, M., Xiong, C., and Zhu, S.-C. 2009. From image parsing to painterly rendering. ACM Trans. Graph. 29, 1, 2:1–2:11.
Zhao, M. and Zhu, S.-C. 2010. Sisley the abstract painter. In Proceedings of the 8th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’10). 99–107.
Zhao, M. and Zhu, S.-C. 2011. Customizing painterly rendering styles using stroke processes. In Proceedings of the 9th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’11). 137–146.

Received January 2012; revised July 2012; accepted Month YYYY