Perception granular computing in visual haze-free task

Hong Hu, Liang Pang ⇑, Dongping Tian, Zhongzhi Shi

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Science, Beijing 100080, China

Expert Systems with Applications 41 (2014) 2729–2741. Journal homepage: www.elsevier.com/locate/eswa

0957-4174/$ - see front matter. Crown Copyright © 2013 Published by Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.11.006

⇑ Corresponding author. Tel.: +86 01062600505. E-mail addresses: huhong@ict.ac.cn (H. Hu), pangl@ics.ict.ac.cn (L. Pang), tiandp@ics.ict.ac.cn (D. Tian), shizz@ics.ict.ac.cn (Z. Shi).

Keywords: Granular computing; Leveled granular system; Fuzzy logic; Machine learning; Haze free; Brain-like computer

Abstract

In the past decade, granular computing (GrC) has been an active topic of research in machine learning and computer vision. However, the division into granules is itself an open and complex problem. Deep learning, proposed by Geoffrey Hinton, simulates the hierarchical structure of the human brain, processes data from lower levels to higher levels, and gradually composes more and more semantic concepts. Information similarity, proximity and functionality are the key points in the original insight of granular computing proposed by Zadeh. Much GrC research is based on the equivalence relation or the more general tolerance relation, either of which can be described by a distance function. Information similarity and proximity, which depend on the distribution of the samples, can be easily described by fuzzy logic.
From this point of view, GrC can be considered as a set of fuzzy logical formulas, which is geometrically defined as a layered framework in a multi-scale granular system. The necessity of such a multi-scale layered granular system is supported by the columnar organization of the neocortex. The granular system proposed in this paper can therefore be viewed as a new explanation of deep learning that simulates the hierarchical structure of the human brain. In view of this, a novel learning approach, which combines fuzzy logical design with machine learning, is proposed in this paper to construct a GrC system and to explore a novel direction for deep learning. Unlike previous work on the theoretical framework of GrC, our granular system is abstracted from brain science and information science, so it can be used to guide research in image processing and pattern recognition. Finally, we take the haze-free task as an example to demonstrate that our multi-scale GrC is able to increase the texture information entropy and improve the effect of haze removal.

1. Introduction

Lin (2012) pointed out that granulation seems to be a natural methodology deeply rooted in human thinking: many everyday things are routinely granulated into sub-things (Lin, 2012). In the information for the IEEE-GrC2006 conference, granular computing (GrC) is outlined as a general computation theory for effectively using granules such as classes, clusters, subsets, groups and intervals to build an efficient computational model for complex applications with huge amounts of data, information and knowledge (Zadeh, 1997).
As summarized at the IEEE-GrC2006 conference, although the label is relatively recent, the basic notions and principles of GrC have appeared, under different names, in many related fields, such as information hiding in programming, granularity in artificial intelligence, divide and conquer in theoretical computer science, interval computing, cluster analysis, fuzzy and rough set theories, neutrosophic computing, quotient space theory, belief functions, machine learning, databases and many others (Bargiela & Pedrycz, 2006). The above definition of GrC is too broad, and classes, clusters, subsets, groups and intervals have already been studied in artificial intelligence and mathematics for a long time. What, then, is really new in GrC? We think that the new, or main, point of GrC lies in the original insight proposed by Zadeh, in which three basic concepts underlie human cognition: granulation, organization and causation. Informally, granulation involves decomposition of the whole into parts; organization involves integration of parts into the whole; and causation involves association of causes with effects. Granulation of an object A leads to a collection of granules of A, with a granule being a clump of points (objects) drawn together by indistinguishability, similarity, proximity or functionality (Zadeh, 1997). In this original insight, Zadeh pointed out three important aspects of GrC: (1) GrC is a main characteristic of human cognition; (2) GrC is based on indistinguishability, similarity, proximity or functionality; (3) there is a close relationship among granulation, organization and causation. Based on these points, we think that it is necessary to identify the key points of GrC in human cognition. There are two kinds of GrC research: perception-level and knowledge-level.
A perception-level GrC performs a series of feature transformations and tries to find the meta-knowledge implied in the samples; a knowledge-level GrC tries to process knowledge or structure information based on meta-knowledge. In this paper we focus on the perception level.

Indistinguishability, similarity and proximity can be described by an equivalence relation or a tolerance relation, and these relations can in turn be described by distance functions. From the relevant literature, it is easy to see that much GrC research focuses on classification and clustering (Yao, 2000, 2001; Zhang & Zhang, 2003, 2004a). Zhang and Zhang (2003, 2004a, 2004b, 2005) use quotient space theory to study indistinguishability and similarity. Yao (2001) extends the equivalence class to the rough approximation set. The quotient space structure described by an equivalence relation is used to probe the structure of granules such as classes, clusters, subsets and groups. In a more general way, Lin (1998, 2007) and Yao (1998, 1999) use binary relations and neighborhood systems, respectively, to study indistinguishability and similarity; the geometric concepts of partition, covering, topology and neighborhood can be described algebraically by binary relations. Pedrycz, Hirota, Pedrycz, and Dong (2012) define granules on fuzzy sets and discuss several operations and their granular consistency (Pedrycz et al., 2012).
Lin (2012) summarizes the history of granular computing, discussing all formal descriptions of granular computing and some further directions, e.g. GrC, databases and data mining, and GrC and cloud computing. In fact, GrC should be discussed in the framework of human cognition, from perception to pattern recognition and knowledge processing. Although the concept of granular computing was proposed more than ten years ago, only a few researchers pay attention to this subject. Granular computing pays particular attention to the leveled computing of intelligence. As Yao (2006) pointed out: "Granules in the family are called focal elements of discussion at the level. Each level is represented by a plane. While granules at the same level are of similar nature, granules at different levels may be very different. Consequently, we may use different vocabularies and languages for descriptions at different levels." The leveled computation revealed by granular computing is very important for machine learning, e.g. for the well-known deep learning approach. The term deep learning gained much traction in the mid-2000s after a publication by Bargiela and Pedrycz (2006), Castro (1995). Nowadays it has become a major technology trend for big data and artificial intelligence. Deep learning simulates the hierarchical structure of the human brain, processes data from lower levels to higher levels, and gradually composes more and more semantic concepts. These facts mean that deep learning has a close relationship with granular computing.

The granular computing research mentioned above generally neglects information transformation and feature abstraction, which are very important for deep learning. In this paper, we propose a novel framework for a granular system that is able to perform information transformation and object-background separation. We take the haze-free task as an example to validate the ability of our granular system.
These facts mean that deep learning has a close relationship with the granular computing proposed here. The main contributions of this paper include:

(1) The basic notions and principles of GrC have appeared, under different names, in many related fields, such as information hiding in programming, granularity in artificial intelligence, divide and conquer in theoretical computer science, interval computing, cluster analysis, fuzzy and rough set theories, neutrosophic computing, quotient space theory, belief functions, machine learning, databases, and many others; earlier formulations of granular computing are therefore just an abstraction of older methods. In this paper we give a novel concrete model for granular computing, which has a multi-scale layered structure from feature abstraction to classification.

(2) Our novel granular system has a close relation with deep learning, so it opens a new focus for deep learning. This is the first time that fuzzy logic is introduced for leveled feature abstraction in deep learning.

(3) Fuzzy logic is often mentioned in granular computing; for example, fuzzy logic and rough set techniques are used by Lin (1999) for word computing, and Liu, Xiong, and Wu (2012) use fuzzy lattices in classification based on hyperspherical granular computing (Liu et al., 2012). In this paper, fuzzy logic is used not only to describe granules similar to the hyperspherical granules of Liu et al. (2012), but also for feature abstraction and classification. For this purpose, we propose a novel and effective approach that combines fuzzy logical design, PSVM and back propagation.

(4) The granular computing proposed here gives a novel approach to the haze-free task; the experimental results show that this approach is sound.
The rest of this paper is organized as follows. In Section 2, we give a formal definition of a granular system and of granular computing based on a tolerance relation, which is described by fuzzy logical formulas. In Section 3, we discuss the algorithm for designing a granular system. In Section 4, we give a concrete example of designing a granular system for the haze-free task. Finally, Section 5 contains the discussion and the outlook for future work.

2. Granular system based on tolerance relation

The difference between a granular system based on an equivalence relation and one based on a tolerance relation is that an equivalence relation divides a space into a non-overlapping covering, while a tolerance relation creates an overlapping covering of this space.

Yiyu Yao proposed a granular computing paradigm for concept learning in which two learning strategies are investigated. A global attribute-oriented strategy searches for a good partition of a universe of objects, and a local attribute-value-oriented strategy searches for a good covering (Yao & Deng, 2013). In this paper, granular computing starts from feature vectors, e.g. images, not from attributes. In order to simulate the perceptual processing of human cognition, we define a set of multi-scale nested convex regions together with a corresponding computation on this set.

There are two main purposes in building such a granular system based on a tolerance relation:

(1) Granular systems are designed to describe the similarity and proximity of information, which can be described by a tolerance relation. Granular systems based on a tolerance relation can be viewed as a topological structure built from topological bases on the topological space $(X, \tau)$ induced from a metric space $(X, \mathrm{dis})$ by the metric $\mathrm{dis}$.
Granular systems based on a tolerance relation can be used to describe domain structure, which represents the indistinguishability, similarity and proximity of examples. Classification is determined by the indistinguishability, similarity and proximity of information. There are two kinds of similarity among examples: static similarity and dynamical similarity.

(a) If the elements of classes are distributed in standard convex regions, we can use a distance function to describe the distribution domains of the classes. In this case, the similarity between two objects can be intuitively described by distance functions. If $\mathrm{dis}(x, y)$ is a distance function in the $n$-dimensional space $R^n$ and $c$ is a point in $R^n$, the formula $\mathrm{dis}(c, y) < r$ describes a convex region $D$ in $R^n$ which takes $c$ as its center. Every point $y$ in this region can be written as $y = c + e$, where $e$ is some kind of noise; if $D$ is a ball, $e$ can be viewed as white noise with an amplitude less than $r$. We denote this kind of similarity as static similarity.

(b) Dynamical similarity is different from static similarity. If one object $O_1$ continuously changes into another object $O_2$, e.g. a tadpole continuously grows into a frog, then $O_1$ and $O_2$ are dynamically similar; i.e. if all elements in a class $A$ are dynamically similar, the distribution domain of $A$ is a connected domain. Dynamical similarity makes the distribution domain very complicated, with a nonlinear borderline. Although dynamical similarity may cause the difference within a class to be larger than the difference among classes, it can also be described by an equivalence relation or a tolerance relation. The difference between a granular system based on an equivalence relation and one based on a tolerance relation is that an equivalence relation divides a space into a non-overlapping covering, while a tolerance relation creates an overlapping covering of this space.
The relation described by the formula $\mathrm{dis}(x, y) < r$ is a special case of a tolerance relation.

(2) The main purpose of information transformation in pattern recognition is to recognize or classify different objects from their mixture, so the information transformation used in pattern recognition should take place in a granular system which describes both static similarity and dynamical similarity. This is the main point we discuss in this paper.

We now use fuzzy logical formulas based on distance functions to define granular systems. There are three distance axioms: (1) $\mathrm{dis}(a, a) = 0$; (2) $\mathrm{dis}(a, b) = \mathrm{dis}(b, a)$; (3) $\mathrm{dis}(a, b) + \mathrm{dis}(b, c) \geq \mathrm{dis}(a, c)$.

$\mathrm{dis}(x, y)$ $r$ (strong $r$-cut set) or $sp_r(a, c \mid \mathrm{dis}, d, x) = sp(a, c \mid \mathrm{dis}, d, x) \geq r$ ($r$-cut set). The strong $r$-cut set $sp_r(x, c \mid \mathrm{dis}, d, x)$ defines an open convex region and is denoted as a granule, and the $r$-cut set $sp_r(x, c \mid \mathrm{dis}, d, x)$ defines a closed convex region.

Definition 2 (leveled perception granular system based on tolerance relation, Gsys). A granular system based on tolerance relations of a distance function (granular system for short) is a set of granules. Every granule is a 2-tuple $\{g, SF\}$, where $g$ is a convex region described by tolerance relations of a distance function, and $SF$ is a set of fuzzy logical functions (denoted as "adjoint functions") which are computed from the convex region $g$. The outputs of all fuzzy logical functions in $SF$ are denoted as the "adjoint vector" of this granule. The granules of Gsys have the following attributes.

(1) Multi-scale leveled structure: The metric space $X$ (e.g. a finite connected volume region in the $n$-dimensional real space $R^n$) is the only level 0 granule. The level 0 granule is denoted as $G(coeG^0) = \{X, SF\}$, where $coeG^0$ is a coefficient set, usually empty, and $SF$ is a set of fuzzy logical functions (denoted as adjoint functions).
The convex region of a level 1 granule $G(coeG^1) = \{g^1, SF^1\}$ is defined by the conjunction of a finite number of $r$-cut sets $sp_r(a, c^1_1 \mid \mathrm{dis}^1, d^1_1, x),\ sp_r(a, c^1_2 \mid \mathrm{dis}^1, d^1_2, x),\ \ldots,\ sp_r(a, c^1_{k_1} \mid \mathrm{dis}^1, d^1_{k_1}, x)$, where $coeG^1$ is the coefficient set that defines the convex region $g^1$, so $g^1$ can also be written as $g^1(coeG^1)$, and

$coeG^1 = \{c^1_1, c^1_2, \ldots, c^1_{k_1} \in X;\ d^1_1, d^1_2, \ldots, d^1_{k_1};\ \mathrm{dis}^1\}$.

The first level of our granular system is denoted as $C^1(X) = \{G(coeG^1)\}$. If the level $l$ granules $G(coeG^l) = \{g^l, SF^l\}$ have been defined, the level $l+1$ granules $G(coeG^{l+1})$ are defined by the convex region created by the intersection of $g^l$ and a finite number of strong $r$-cut sets:

$sp_r(a, c^{l+1}_1 \mid \mathrm{dis}^{l+1}, d^{l+1}_1, x),\ sp_r(a, c^{l+1}_2 \mid \mathrm{dis}^{l+1}, d^{l+1}_2, x),\ \ldots,\ sp_r(a, c^{l+1}_{k_{l+1}} \mid \mathrm{dis}^{l+1}, d^{l+1}_{k_{l+1}}, x)$.

The coefficient set of a level $l+1$ granule is

$coeG^{l+1} = \{c^{l+1}_1, c^{l+1}_2, \ldots, c^{l+1}_{k_{l+1}} \in X;\ d^{l+1}_1, d^{l+1}_2, \ldots, d^{l+1}_{k_{l+1}};\ \mathrm{dis}^{l+1}\}$,

where $k_{l+1}$ is the number of simple logical formulas in the same level. The center of $G(coeG^{l+1})$ is $c^{l+1}_i \in G(coeG^l)$, and its radius satisfies $d^{l+1}_i \leq d^l_i$ with $\lim_{l \to \infty} d^l_i = 0$. $G(coeG^l)$ is called "the father granule of $G(coeG^{l+1})$".

(2) Granular computing (GrC): A granular computing is described by a set of fuzzy logical formulas on the above multi-scale leveled structure of convex regions. The purpose of a granular computing (GrC) is to transform feature information and classify points in an input space $X$, so at least one level of a GrC outputs a fuzzy label for points in the input space $X$. If there are $m$ classes in total, a fuzzy label $L$ is an $m$-dimensional fuzzy vector $L = \{l_1, l_2, \ldots, l_m\}$ with $\sum_{i=1,\ldots,m} l_i = 1$. Castro (1995) proved that fuzzy logic controllers using fuzzy rules are universal approximators; later, Li and Philip Chen (2000) gave a proof of the equality between a forward neural circuit and a fuzzy logical inference.
So it is not difficult to prove that any continuous function $F: R^n \to [0, 1]^n$ can be simulated by this kind of nested layered granular computing with arbitrarily small error. A level $l+1$ adjoint function $F^{l+1}$ receives its input from the outputs of level $l+p$, $p > 1$, adjoint functions; i.e. if a leveled granular system Gsys has $k$ levels, GrC proceeds from level $k$ to level 1, so the levels of a GrC run in the reverse order of the levels of the Gsys. The 1st level GrC takes place in the convex regions of the smallest, $k$th level, granules of the Gsys. Two kinds of layered computing can take place over a granular system. In the first kind, the adjoint feature vectors of the larger-scale level $n$ granules are computed from the adjoint feature vectors of the smaller-scale level $n+1$ granules; such layered computing has a strictly nested structure (Fig. 1(a)) and is denoted as "nested layered computing". In the second kind, the adjoint feature vectors of level $n$ granules can be computed from the adjoint feature vectors of all smaller granules, i.e. those with level greater than $n$ (Fig. 1(b)); such layered computing is denoted as "unnested layered computing". Nested layered computing is a special case of unnested layered computing.

(3) Radii of convex regions: A granular system can have countably infinitely many or finitely many levels. The radii of the granules' convex regions decrease and tend to zero as the level goes to infinity.

(4) Centers' grid: The centers of granules are distributed in a so-called center grid. We call the set of all centers of level $l+1$ granules on the granule $G(coeG^l)$ the center grid of the level $l$ granule $G(coeG^l)$, denoted as $Gc^{l+1}(G(coeG^l))$. We denote the set of all centers of level $l+1$ granules over $X$ as $Gc^{l+1}(X)$, and all centers of level $l+1$ granules over a level $k < l$ granule $G(coeG^k)$ as $Gc^{l+1}(G(coeG^k))$.
The center grid is usually discrete, but it can also be a continuous set, e.g. the whole metric space $X$.

(5) Shape of granules: The shapes of granules are defined by their distance functions $\mathrm{dis}()$. If $\mathrm{dis}()$ can be an abstract distance function, then a granule's convex region can be an arbitrary convex region. Every level uses the same distance function, so the granules in the same level have the same shape, but the regions of granules in different levels may have different shapes.

(6) The cover over a granule: In order to create a cover over $G(coeG^l)$, the elements of the centers' grid $Gc^{l+1}(G(coeG^l))$ should be dense enough. Such a cover can be formally defined as:

$C^{l+1}(G(coeG^l)) = \Big\{ \bigwedge_{1 \leq i \leq k_{l+1}} sp_r(x, c^{l+1}_i \mid \mathrm{dis}^{l+1}, d^{l+1}_i, x) \wedge g(coeG^l) \ \Big|\ c^{l+1}_i \in Gc^{l+1}(G(coeG^l)),\ sp_r(x, c^{l+1}_i \mid \mathrm{dis}^{l+1}, d^{l+1}_i, x) \wedge g(coeG^l) \neq \emptyset \Big\}$

All level $l+1$ granules create a cover of the whole space $X$, denoted as the level $l+1$ cover or the level $l+1$ layer of $X$:

$C^{l+1}(X) = \bigcup_{G(coeG^l) \in C^l(X)} \{C^{l+1}(G(coeG^l))\}$.

A radial basis neural network (Haykin, 2008), which can be used to simulate continuous functions, is an example of a two-layer granular system.

Definition 3 (Hyper-granules and mini-granules). All level $n+1$ granules $G(coeG^{n+1})$ which are contained in a level $n$ granule $G(coeG^n)$ are denoted as "mini-granules", and the level $n$ granule $G(coeG^n)$ is denoted as a "hyper-granule".

Fig. 1. Two kinds of layered computing over a granular system.

After the theory of fuzzy logic was conceived by Zadeh (1965), many fuzzy logical systems were presented, for example the Zadeh system, the probability system, the algebraic system, and the bounded operator system. According to the universal approximation theorem (Haykin, 1994), the extended bounded operator is selected in this paper, and is denoted as the "q-value weighted bounded operator". It is not difficult to prove that q-value weighted fuzzy logical formulas can simulate any continuous function $F: R^n \to [0, 1]^n$ with arbitrarily small error, or vice versa, i.e. every GrC can be completed by a set of fuzzy logical functions of the q-value weighted bounded operator with arbitrarily small error.

Definition 4 (Bounded operator $F(\otimes_f, \oplus_f)$). Bounded product: $x \otimes_f y = \max(0, x + y - 1)$, and bounded sum: $x \oplus_f y = \min(1, x + y)$, where $0 \leq x, y \leq 1$.

In order to simulate GrC, it is necessary to extend the bounded operator to the weighted bounded operator. The fuzzy formulas defined by q-value weighted bounded operators are denoted as q-value weighted fuzzy logical functions.

Definition 5 (q-value weighted bounded operator $F(\otimes_f, \oplus_f)$). q-value weighted bounded product:

$p_1 \otimes_f p_2 = F_{\otimes_f}(p_1, p_2, w_1, w_2) = \max(0,\ w_1 p_1 + w_2 p_2 - (w_1 + w_2 - 1) q)$   (3)

q-value weighted bounded sum:

$p_1 \oplus_f p_2 = F_{\oplus_f}(p_1, p_2, w_1, w_2) = \min(q,\ w_1 p_1 + w_2 p_2)$   (4)

where $0 \leq p_1, p_2 \leq q$. For the association and distribution rules, we define $(p_1\, D_f\, p_2)\, H_f\, p_3 = F_{H_f}(F_{D_f}(p_1, p_2, w_1, w_2), p_3, 1, w_3)$ and $p_1\, D_f\, (p_2\, H_f\, p_3) = F_{D_f}(p_1, F_{H_f}(p_2, p_3, w_2, w_3), w_1, 1)$, where $D_f, H_f = \otimes_f$ or $\oplus_f$.

We can prove that $\otimes_f$ and $\oplus_f$ satisfy the associative condition (see Appendix C) and

$x_1 \oplus_f x_2 \oplus_f x_3 \cdots \oplus_f x_n = \min\Big(q,\ \sum_{1 \leq i \leq n} w_i x_i\Big)$   (5)

$x_1 \otimes_f x_2 \otimes_f x_3 \cdots \otimes_f x_n = \max\Big(0,\ \sum_{1 \leq i \leq n} w_i x_i - \Big(\sum_{1 \leq i \leq n} w_i - 1\Big) q\Big)$   (6)

Moreover, the q-value weighted bounded operator $F(\otimes_f, \oplus_f)$ follows De Morgan's law, i.e.

$N(x_1 \oplus_f x_2 \oplus_f x_3 \cdots \oplus_f x_n) = q - \min\Big(q,\ \sum_{1 \leq i \leq n} w_i x_i\Big) = \max\Big(0,\ q - \sum_{1 \leq i \leq n} w_i x_i\Big) = \max\Big(0,\ \sum_{1 \leq i \leq n} w_i (q - x_i) - \Big(\sum_{1 \leq i \leq n} w_i - 1\Big) q\Big) = N(x_1) \otimes_f N(x_2) \otimes_f N(x_3) \cdots \otimes_f N(x_n).$   (7)

For the q-value weighted bounded operator $F(\otimes_f, \oplus_f)$, however, the distribution condition usually does not hold, and the boundary condition holds only when all weights are equal to 1, because $p_1 \otimes_f q = F_{\otimes_f}(p_1, q, w_1, w_2) = \max(0,\ w_1 p_1 + (1 - w_1) q)$ and $p_1 \oplus_f q = F_{\oplus_f}(p_1, q, w_1, w_2) = \min(q,\ w_1 p_1 + w_2 q)$.

In this paper, we show that the haze-free task can be completed by a common GrC based on fuzzy logical formulas of the bounded fuzzy operator.

3. Hybrid designing of leveled perception granular system based on fuzzy logic and PSVM

Owing to limitations of scope, only nested layered GrC is discussed in this paper. A nested layered GrC is defined by the input and output relation of a granular computing on a granular system. There are three kinds of relations between neighboring layers (layers $k$ and $k+1$) of a nested GrC: (1) binary logic; (2) fuzzy logic; (3) alogical relation.

Because fuzzy logic and binary logic are both realized by the sigmoid function, the back propagation method can be used to modify the weights of all layers. In order to speed up the learning process of a layered GrC, we combine logical design with PSVM (Fung & Mangasarian, 2001); this novel approach is called the "Logical support vector machine (LPSVM)". For nested layered GrC, parameters in the binary logical layers can be directly designated according to the binary relation. For the fuzzy logical layers, parameters can also be set according to the functions of these layers, but a suitable small adjustment by back propagation is necessary. This is similar to the deep learning proposed by Geoffrey Hinton, in which a many-layered neural network can be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised restricted Boltzmann machine, and then fine-tuned using supervised back propagation. For the non-logical (alogical) layers, parameters should be learned from samples according to the input and output relation function $f_i(x_1, x_2, x_3, \ldots, x_n)$; we can use the back propagation method or PSVM to learn the weights for $f_i(x_1, x_2, x_3, \ldots, x_n)$.
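As a rough illustration of the PSVM step used for alogical layers, the sketch below solves the closed-form linear system of Fung & Mangasarian (2001), which this paper states as Eqs. (8) and (9). It assumes that samples are stored as rows of X and that targets are +1 or -1. It also returns the bias term gamma = -E'DU from the original PSVM formulation, which is not stated explicitly in this paper. The function names, the toy data and the value nu = 10 are illustrative choices only, and a small Gaussian elimination stands in for a linear algebra library.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def psvm_fit(X, y, nu=10.0):
    """Return (W, gamma) with U = (I/nu + D(XX' + EE')D)^-1 E and W = X'DU.

    Assumption: X is a list of sample rows, y a list of +1/-1 targets.
    """
    n = len(X)
    # Entry (i, j) of D(XX' + EE')D is y_i * (X_i . X_j + 1) * y_j.
    K = [[y[i] * (sum(a * b for a, b in zip(X[i], X[j])) + 1.0) * y[j]
          for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += 1.0 / nu  # add I / nu
    U = solve(K, [1.0] * n)  # Eq. (9), E is the all-ones vector
    dim = len(X[0])
    W = [sum(X[i][k] * y[i] * U[i] for i in range(n)) for k in range(dim)]  # Eq. (8)
    gamma = -sum(y[i] * U[i] for i in range(n))  # bias, taken from the PSVM paper
    return W, gamma


def psvm_predict(W, gamma, x):
    """Classify x by the sign of x.W - gamma."""
    return 1 if sum(w * v for w, v in zip(W, x)) - gamma >= 0 else -1


# Toy data (hypothetical): two well-separated clusters in the plane.
X_train = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y_train = [-1, -1, -1, 1, 1, 1]
W, gamma = psvm_fit(X_train, y_train, nu=10.0)
print([psvm_predict(W, gamma, x) for x in X_train])
```

Because the system has one unknown per training sample, this dual form suits layers trained on a modest number of samples; larger values of nu weaken the regularization of the weights.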
The designing strategy of LPSVM:

• Step 1: Except for the alogical layers' weights, design the layers' weights according to the logical (binary or fuzzy) relations; for fuzzy logical relations, a suitable modification of the weights may be necessary according to the task of the layer;
• Step 2: The alogical layers' weights are computed from the input layer to the last output layer. For an alogical layer $i$, if $X$ is the input training set, compute the inner layers' outputs from the 1st layer to the $(i-1)$th layer based on $X$;
• Step 3: Use PSVM to compute the $i$th layer's weights $W_i$ according to (8);
• Step 4: Repeat Step 2 and Step 3 until the output error is small enough;
• Step 5: Use the back propagation approach to modify the weights of all layers.

$W_i = X' D U$   (8)

where $W_i$ is the weight vector of the node, $U$ is computed by (9), and $X$ and $D$ are the problem data, i.e. $X = [X_1, \ldots, X_n]$ and the diagonal matrix

$D = \begin{bmatrix} y_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & y_n \end{bmatrix}$;

$(X_i, y_i)$ is a training sample with feature vector $X_i$ and target $y_i$.

$U = \Big(\frac{I}{\nu} + D(XX' + EE')D\Big)^{-1} E$   (9)

where $\nu$ is a positive parameter selected to guarantee a small magnitude $\|W_i\|$, $I$ is the identity matrix, and $E$ is a vector whose elements are all 1.

4. Granular system for visual task

The columnar organization of the brain's primary visual cortex strongly supports the granular system defined above. Many functions of the primary visual cortex are still unknown, but the columnar organization is well understood. The lateral geniculate nucleus (LGN) transfers information from the eyes to the brain stem and the primary visual cortex (V1) (Mountcastle, 1997). The columnar organization of V1 plays an important role in the processing of visual information.

Local similarity of information processing gives the columnar organization a granular structure. V1 is composed of a grid of hypercolumns (hc), each a neural area of $1 \times 1\ \mathrm{mm}^2$, in the brain's primary visual cortex.
Every hypercolumn contains a set of minicolumns (mc), which have the same focus. Each hypercolumn analyzes information from one small region (described by a distance function) of the retina. Adjacent hypercolumns analyze information from adjacent areas of the retina, so the structure of a columnar organization can be described by a set of fuzzy logical formulas, similar to a granular system. Hypercolumns (or supercolumns) and minicolumns (mc) can be viewed as granules. As in the primary visual cortex, there are two kinds of granules in some levels of our granular system: hyper-granules and mini-granules. A hyper-granule contains a bundle of mini-granules.

Definition 6 (Perception Granular system of Columnar Organization (COGsys)). A perception columnar organization is a special perception granular system in which there is at least one hyper-granule $G(coeG^{n+1})$ such that all mini-granules included in it have the same convex region but different adjoint functions.

In this paper, in order to simulate the visual cortex, a granular system of columnar organization (COGsys) is designed for the haze-free task. In our hybrid designing approach (LPSVM), we first design leveled granular systems with the help of fuzzy logic, and then use PSVM to accomplish the learning for concrete visual tasks.

4.1. The theory of image matting

According to Levin, Lischinski, and Weiss (2008), image matting refers to the problem of softly extracting the foreground object from an input image and a trimap image. "Trimap" means three kinds of regions: white denotes the definite foreground region, black denotes the definite background region, and gray denotes the undefined region. Formally, image matting methods take as input an image $I$, which is assumed to be a composite of a foreground image $F_o$ and a background $B$ in the linear form $I = \alpha F_o + (1 - \alpha) B$. For the haze-free task, the fuzzy label of haze or non-haze is described by the parameter $\alpha$.
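The linear compositing model above can be made concrete with a small per-pixel sketch. It assumes RGB values normalized to [0, 1]; the function names, the sample colors and the guard `eps` are hypothetical, and the inversion presumes that both the label alpha and the background B are already known, which is exactly what a matting method has to estimate in practice.

```python
def composite(alpha, foreground, background):
    """Blend one pixel: I = alpha * F_o + (1 - alpha) * B, channel by channel."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


def recover_foreground(alpha, observed, background, eps=1e-6):
    """Invert the blend for F_o, assuming alpha and B are known.

    eps is an illustrative guard against division by a vanishing alpha.
    """
    a = max(alpha, eps)
    return tuple((i - (1.0 - alpha) * b) / a
                 for i, b in zip(observed, background))


# Hypothetical pixel: a red foreground seen through 75% of a gray layer.
scene = (1.0, 0.0, 0.0)
layer = (0.8, 0.8, 0.8)
mixed = composite(0.25, scene, layer)
print(mixed)
print(recover_foreground(0.25, mixed, layer))
```

The inversion becomes ill-conditioned as alpha approaches 0, since almost none of the foreground survives in the observed pixel.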
The task of image matting is to find a function $F_o = f_{F_o}(I)$. The closed-form solution assumes that $\alpha$ is a linear function of the input image $I$ in a small window $w$: $\alpha_i \approx a I_i + b,\ \forall i \in w$. A sparse linear system is then solved to obtain the alpha matte. Our GrC approach drops the linear assumption between $\alpha$ and $I$ and instead introduces a nonlinear relation between them:

$$\alpha_w = F(W_{I_w}) \tag{10}$$

Here $W_{I_w}$ is the image block included in the small window $w$, and $\alpha_w$ is the fuzzy label of its center pixel. We take color or texture in the local window as the input feature, and the trimap image as the target. After training, the neural fuzzy logical network generates the alpha matte. In the application of alpha matting, our method can remove haze by using the dark channel prior as the trimap.

4.2. Leveled perception granular system for haze-free task

In this section, we design a Perception Granular system of Columnar Organization (COGsys) for the haze-free task; only nested layered GrC is needed here. The recognition performed by our Leveled Granular System (see Fig. 2) starts with the recognition of the orientation or simple structure of local patterns; the trimap image is then computed from these local patterns. Eq. (11), which has a high ability to simulate fuzzy logic operators (see the details in the appendix), is used to design the GrC. The weights $w_{l+1,i,k}$ in Eq. (11) can be viewed as connections among granules. A nested layered GrC is defined by the input and output relation of a granular computing on a granular system. As mentioned above, there are three ways to design the weights of a layered GrC: according to the binary logical relation or the fuzzy logical relation of this layered GrC, or according to the input and output relation function $f_i(x_1, x_2, x_3, \ldots, x_n)$ obtained from training samples.

Fig. 2. A 4-layer structure of a granular system for haze-background separation.

$$U_{l+1,i} = \sum_{k} w_{l+1,i,k}\, I_{l+1,k,i}, \qquad O_{l+1,i} = \mathrm{sigm}(U_{l+1,i}, T_{l+1,i}, k) \tag{11}$$

where $\mathrm{sigm}(\cdot)$ is the sigmoid function of Eq. (12), and $O_{l+1,i}$ is the output of a level $l+1$ granule. The third argument $k$ of $\mathrm{sigm}$ is the steepness coefficient; it is unrelated to the summation index.

$$\mathrm{sigm}(x, t, k) = 1 / \{1 + \exp\{-k (x - t)\}\} \tag{12}$$

Theorem 1, discussed in Appendix A, guarantees that the granule computing defined above can simulate a Boolean function with arbitrarily small error. The design of a Gsys contains two parts: (1) convex regions, and (2) adjoint fuzzy logical functions for GrC. The following Gsys for the haze-free task is a very simple one whose convex regions can be described by a distance function. The input space is just an image, which is a 5-dimensional space $X = \{(x, y, r, g, b)\}$; every example $(x, y, r, g, b)$ represents a pixel of this image, where $(x, y)$ is the pixel's location and $(r, g, b)$ is the pixel's color value. The nested granular system is built on the image, with fuzzy logical formula $sp(a, c \mid \mathrm{dis}, d, x)$, where $x = (1, 1, 0, 0, 0)$, $d = 0$ for level 1 GrC, and $d > 1$ for higher level GrCs. The centers of all levels are located on the whole image plane, so every center grid is just the image plane and granules overlap. In the following pages, we focus on the design of the adjoint fuzzy logical functions for GrC.

If there are $k$ levels in our Gsys, the $k$th level receives the input image $I$, and the first level granule outputs the resulting haze-free image. The relation between the input and output of a level-$l$ granule is described by Eq. (11). The weights among granules can be designed by LPSVM: the weights of the 1st and 2nd layers are designed by fuzzy logic, and the weights of the 3rd layer are designed by PSVM to learn the trimap image.

For the sake of simplicity, in the following Gsys the order of GrC levels is upside down with respect to the granular system levels, and one layer may contain two GrC levels.
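The granule computation of Eqs. (11) and (12) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the particular weights, thresholds and steepness values are assumptions chosen to show how a large $k$ pushes the output towards a Boolean value.

```python
import math


def sigm(x, t, k):
    """Sigmoid of Eq. (12): 1 / (1 + exp(-k * (x - t)))."""
    z = k * (x - t)
    # Branch on the sign of z so that exp() never overflows for large k.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def granule_output(weights, inputs, threshold, k):
    """Output of one granule as in Eq. (11): sigm(sum of w * I, T, k)."""
    u = sum(w * i for w, i in zip(weights, inputs))
    return sigm(u, threshold, k)


# Illustrative settings (assumptions, not values from the paper):
# unit weights with T = 1.5 behave like AND, with T = 0.5 like OR.
for a in (0, 1):
    for b in (0, 1):
        and_out = granule_output([1.0, 1.0], [a, b], threshold=1.5, k=50.0)
        or_out = granule_output([1.0, 1.0], [a, b], threshold=0.5, k=50.0)
        print(a, b, round(and_out, 3), round(or_out, 3))
```

In the notation of Theorem 1 ($w = bT$), the AND setting corresponds to $b = 1/1.5$ for each input, so only the full index set has $\sum b > 1$; the OR setting corresponds to $b = 2$, so every single input already exceeds 1.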
The GrC of COGsys is formally defined as below:

(1) The 1st layer – fuzzy logical layer. Every hyper-granule (Fig. 3) in the 1st layer tries to change a 3 × 3 pixel image block $\{I_b\}_{3\times 3}$ into a binary 3 × 3 pixel texture pattern. The input image is normalized. A hyper-granule $HG = (g, SF)$ in the 1st layer contains 3 × 3 mini-granules that focus on a 3 × 3 small window; every mini-granule focuses on only one pixel, so the convex region is described by $\mathrm{dis}(x, c) = 0$. A hyper-granule completes the task of image processing.

Fig. 3. Every 1st layer granule tries to change a local image into a binary texture pattern. A hyper-granule is defined by a distance function $\mathrm{dis}(x, c) < 3$, which is just a 3 × 3 small window; a hyper-granule in the 1st layer contains 9 granules, and every granule focuses on only one pixel.

There are three kinds of fuzzy logical functions in a hyper-granule's $SF$:

(1) In the local image pattern recognition way (LIPW), the 1st processing directly transforms every pixel's value to a fuzzy logical one by a sigmoid function:

$$F_1(\{I_b\}_{3\times 3}) = \{\mathrm{sigm}(p_{i,j})\}_{3\times 3} \tag{13}$$

Here $p_{i,j}$, $i, j = 1, 2, 3$, is the RGB pixel value in a small 3 × 3 window.

(2) In the local binary pattern operator simulating way (LBPW), the 2nd processing is also completed by a sigmoid function; the difference is that every boundary pixel's value is combined with the center pixel's value by a fuzzy exclusive OR before it is sent to the sigmoid function:

$$F_2(\{I_b\}_{3\times 3}) = \{f(p_{i,j})\}_{3\times 3} \tag{14}$$

Here $f(p_{i,j}) = \mathrm{sigm}(p_{i,j} - p_{2,2})$ when $(i, j) \neq (2, 2)$, and $f(p_{2,2}) = 0$. $F_2$ is similar to the Local Binary Pattern (LBP) operator mentioned in Ojala, Pietikäinen, and Harwood (1996) as a means of summarizing local gray-level structure. The operator takes a local neighborhood around each pixel, thresholds the pixels of the neighborhood at the value of the central pixel and uses the resulting binary-valued image patch as a local image descriptor.
It was originally defined for 3 × 3 neighborhoods, giving 8-bit codes based on the 8 pixels around the central one. Such processing emphasizes the contrast of texture, and our experiments support this fact.

(3) Hybrid LIPW and LBPW (LBIPW). The adjoint function $F_3(\cdot)$ in LBIPW is the same as $F_2(\cdot)$ in LBPW, except that $f(p_{2,2}) = p_{2,2}$ in $F_3(\cdot)$, while $f(p_{2,2}) = 0$ in $F_2(\cdot)$.

Every mini-granule in a 1st layer hyper-granule has only one input weight $w_{ij}$ in Fig. 3, which equals 1; when $k \to +\infty$, the coefficient $k$ in Eq. (11) changes the outputs from fuzzy values to binary numbers.

(2) The 2nd layer – binary logical layer. Every 2nd layer mini-granule tries to recognize a definite shape (see Fig. 4), so these mini-granules share the same convex region with a 1st layer hyper-granule, which focuses on the same small 3 × 3 window in an image and can be described by $\mathrm{dis}(x, c) < 2$. If there are $q$ local small patterns in total, a hyper-granule in the 2nd layer contains $q$ (in our system $q = 256$ or $512$) mini-granules of the 2nd layer, which have the same receptive field but different adjoint fuzzy logical functions, each of which tries to recognize a definite shape from the output of a 1st layer hyper-granule. For example, the "\" shape in Fig. 4 can be described by an adjoint fuzzy logical formula (Eq. (15)). The "and" operator for 9 inputs in Eq. (15) can be created by a granule mc (see Fig. 4). In Eq. (15), every pixel $P_{ij}$ has two states, $m_{ij}$ and $\bar{m}_{ij}$. Suppose the unified gray value (or RGB value) of $P_{ij}$ is $g_{ij}$; an image module needs a high value $g_{ij}$ at the place of $m_{ij}$ and a low value at $\bar{m}_{ij}$. So the input for the granule mc at $m_{ij}$ is $I_{ij} = g_{ij}$, and at $\bar{m}_{ij}$ it is $I_{ij} = -(1.0 - g_{ij})$. A not gate mc′ is needed for $I_{ij} = -(1.0 - g_{ij})$; here $g_{ij}$, $i, j = 1, 2, 3$, is the output of a 1st layer hyper-granule.

$$P = m_{11} \wedge m_{12} \wedge m_{13} \wedge m_{21} \wedge m_{23} \wedge m_{31} \wedge m_{33} \wedge m_{22} \wedge m_{32} \tag{15}$$

$$w_{ij} = \begin{cases} 1, & \text{if the } j\text{th bit of a binary pattern} = 1 \\ -1, & \text{if the } j\text{th bit of a binary pattern} = 0 \end{cases} \tag{16}$$

where for LIPW and LBIPW, $j = 1, 2, 3, \ldots, 9$; for LBPW, the center 1st-layer granule is not used, so $j = 1, 2, 3, \ldots, 8$. There are three kinds of hyper-granules in the 2nd layer, which receive three different outputs of a 1st-layer hyper-granule, so a hyper-granule in the 2nd layer may work in one of the following three ways:

1. In the local image pattern recognition way (LIPW): every 2nd-layer hyper-granule contains 512 2nd-layer mini-granules, and the inputs of these mini-granules come from a 1st-layer hyper-granule that works in the LIPW way. Every 2nd-layer hyper-granule tries to classify the image block in this window into 512 binary texture patterns (BTP); e.g., eight important BTPs are shown in Fig. 5. The pixel value is "1" for white and "0" for black. In this mode, the 3 × 3 granules of the 1st layer output a 3 × 3 vector, i.e., a 3 × 3 fuzzy logical pattern of a BTP, which is computed by a sigmoid function.
2. In the local binary pattern operator simulating way (LBPW): a 2nd-layer hyper-granule contains 256 2nd-layer mini-granules, which receive input from the output of a 1st-layer hyper-granule that works in the LBPW way.
3. In the hybrid LIPW and LBPW (LBIPW) way: a 2nd-layer hyper-granule contains 512 2nd-layer mini-granules, which receive input from the output of a 1st-layer hyper-granule, which has 9 dimensions.

In our system, a Gsys is built for every color channel R, G or B, so a hyper-granule in the 2nd layer has a 512 × 3 dimensional output or a 256 × 3 dimensional output.

Fig. 4. A hyper-granule in the 2nd layer contains $q$ granules, which have the same receptive field and try to recognize $q$ definite small shapes. An 'and' granule is needed for every 2nd layer granule.

Fig. 5. Every 2nd layer hyper-granule contains 256 or 512 granules, which correspond to the 256 or 512 modules in the picture.

As a binary logical layer, in order to recognize a binary pattern, an 'and' granule with index $i$ is needed (see Fig.
4) for every 2nd-layer granule, and the weights of this 'and' granule to the 1st-layer granules are set as in Eq. (16); the corresponding parameters in Eq. (11) are set as the threshold $T_i = 5.1$ and $k = 0.9$.

4.2.1. The 3rd layer – alogical layer

The convex region of this layer can also be described by $\mathrm{dis}(x, c) < 2$. The output of a hyper-granule in the 2nd layer, which has 3 × 256 or 3 × 512 dimensions, is transmitted to the 3rd-layer granules to compute the similarity parameter or fuzzy value $\alpha_i$ in Eq. (10). The weights of this layer are computed by PSVM; the target is provided by the so-called dark channel prior, which is computed by the approach of He, Sun, and Tang (2011). As all $\alpha_i$ are optimised on the whole image, in this layer the whole image is the only convex region. As the small windows focused on by the hyper-granules in the 2nd layer overlap, the focuses of the 3rd-layer granules also overlap.

4.2.2. The 4th layer – fuzzy logical layer

In this layer, a granule tries to remove the haze from the original image. A granule in the 4th layer computes a pixel of the haze-free image according to the fuzzy logical equation Eq. (17):

$$I_i(x) = \min\{q,\ \alpha_i(x) J_i(x) + (1 - \alpha_i(x)) A_i\} = J_i(x) \oplus_f A_i \tag{17}$$

where $J_i$ is the haze-free image, $I_i$ is the original image, $A_i$ is the global atmospheric light, which can be estimated from the dark channel prior, $\alpha_i$ is the alpha matte generated by the 3rd layer, and $\oplus_f$ is the q-value weighted bounded sum with weights $w_1 = \alpha_i(x)$ and $w_2 = 1 - \alpha_i(x)$; here $q$ is the maximum gray or RGB value of a pixel. Although we could use a back propagation approach to compute the pixel values $J_i(x)$ from the hazy pixel values $I_i(x)$ based on Eq. (17), for the sake of simplicity we directly use Eq. (18), mentioned by He et al. (2011), to compute the haze-free image. As every $\alpha_i(x)$ is computed upon the whole image, each pixel of the haze-free image is also computed upon the whole image, so the whole image is also the convex region of this layer.
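The forward model of Eq. (17) and its direct inversion (the form used in Eq. (18)) can be sketched for a single channel value as follows. The function names, the sample values, and the default lower bound of 0.1 on $\alpha$ are assumptions made for this illustration; the sketch ignores the clamp at $q$ when inverting.

```python
def hazy_pixel(j, a, alpha, q=1.0):
    """Forward model of Eq. (17): I = min(q, alpha * J + (1 - alpha) * A)."""
    return min(q, alpha * j + (1.0 - alpha) * a)


def recover_pixel(i, a, alpha, alpha_floor=0.1):
    """Direct inversion in the style of Eq. (18).

    J = (I - A) / max(alpha, alpha_floor) + A, where `alpha_floor` plays the
    role of the threshold alpha_0 (0.1 is an assumed default).
    """
    return (i - a) / max(alpha, alpha_floor) + a


# Made-up single-channel values for illustration.
scene, airlight, alpha = 0.2, 0.9, 0.5
observed = hazy_pixel(scene, airlight, alpha)
restored = recover_pixel(observed, airlight, alpha)
print(observed, restored)
```

The lower bound on $\alpha$ keeps the division stable where the matte is close to zero, at the price of leaving a little haze in those pixels.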
$$J_i(x) = \frac{I_i(x) - A_i}{\max(\alpha_i(x), \alpha_0)} + A_i \tag{18}$$

where $J_i$ is the haze-free image, $I_i$ is the original image, $A_i$ is the global atmospheric light, which can be estimated from the dark channel prior, $\alpha_i$ is the alpha matte generated by the 3rd layer, and $\alpha_0$ is a threshold; a typical value is 0.1.

4.3. Experiments result

The haze-free experiment results are as follows.

(1) The haze-free and texture information entropy. Texture information can give a rough measure of the effect of haze removal; we use the entropy of the texture histogram to measure the effect of removing haze from images. The entropy of the histogram is described in Eq. (19). Haze makes the texture of an image unclear, so theoretically speaking, haze removal will increase the entropy of the texture histogram.

$$H = -\sum_{i=0}^{G-1} p(i) \log_2 [p(i)] \tag{19}$$

Here $p(i)$ denotes the rate of each pattern in the histogram, as defined in Eq. (20). The patterns we use are the LBPs in Eq. (14), where $\mathrm{Sigm}(i_n - i_C) = 1$ if $i_n > i_C + 10$, else $\mathrm{Sigm}(i_n - i_C) = 0$.

$$p(i) = h(i) / (NM), \quad i = 0, 1, \ldots, G - 1 \tag{20}$$

Fig. 6. The processing result of granular system for visual haze-free task.

In Fig. 6, (a), (b), (c), (d) and (e) are the results of LBPW, LBIPW, LIPW, the linear model (LMKH) by He et al. (2011), and the original image, respectively. From Fig. 6, we can see that the texture structure in the waist of the mountain becomes vaguer from LBPW, LBIPW, LIPW to LMKH. Because the 2nd kind of processing in the 1st-layer granules pays much more attention to contrast, LBPW has the highest ability to remove the haze. LBPW and LIPW are complementary approaches; LBIPW, which is the cooperation of them, has an ability similar to that of the linear approach proposed by He et al. (2011).
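The entropy measure of Eqs. (19) and (20) can be sketched as follows for a gray image stored as a list of rows. The thresholding rule $i_n > i_C + 10$ is the one quoted in the text; the neighbour ordering, the function names, and the normalisation by the number of interior pixels (rather than the full $NM$) are assumptions made for this illustration.

```python
import math
from collections import Counter

# Neighbour order (clockwise from top-left) is an arbitrary choice here.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c, margin=10):
    """8-bit pattern of the 3 x 3 window centred at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour sets its bit only if it exceeds the centre by the margin.
        if img[r + dr][c + dc] > center + margin:
            code |= 1 << bit
    return code


def histogram_entropy(hist):
    """Eq. (19) with p(i) = h(i) / total; empty bins contribute nothing."""
    total = sum(hist.values())
    return sum(-(h / total) * math.log2(h / total)
               for h in hist.values() if h > 0)


def texture_entropy(img, margin=10):
    """Entropy of the pattern histogram over all interior pixels."""
    rows, cols = len(img), len(img[0])
    hist = Counter(lbp_code(img, r, c, margin)
                   for r in range(1, rows - 1)
                   for c in range(1, cols - 1))
    return histogram_entropy(hist)


flat = [[7] * 5 for _ in range(5)]            # no texture at all
stripes = [[0, 50, 0, 50, 0] for _ in range(5)]  # vertical stripes
print(texture_entropy(flat), texture_entropy(stripes))
```

A flat image produces a single pattern and therefore zero entropy; any image whose interior pixels fall into several patterns produces a positive value.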
According to the results shown in Table 1, which lists the texture information entropy of the image, the texture information entropy is increased after haze-free processing, so our approaches have a higher ability to increase the texture information entropy than the linear approach proposed by He et al. (2011).

Table 1
The texture information entropy of the image blocks in Fig. 6 (Area1: the waist of a mountain; Area2: right bottom corner).

Area    LBPW      LBIPW     LIPW      LMKH      Original
1       5.4852    5.2906    5.1593    4.8323    1.0893
2       6.1091    10.3280   10.2999   9.1759    8.3718

Theoretically speaking, LBPW is pure texture processing, so LBPW has the highest value; LIPW is much weaker than LBPW; LBIPW is the hybrid of LBPW and LIPW, so it has an average ability. The texture information entropy of Area1 correctly reflects this fact. Area2, however, already has the clearest texture structure in the original image, so removing haze may overdo the processing. The texture information is over-emphasized by LBPW in Area2, so it has the lowest texture information entropy and almost becomes a dark area. This means that overtreatment appears more easily in nonlinear processing than in linear processing in the haze-free task.

(2) The effect of the degree of fuzziness. As stated by Theorem 1 mentioned above, the parameter $k$ in Eq. (11) controls the fuzziness of a granule: when $k$ tends to infinity, a granule changes from a fuzzy logical formula to a binary logical formula. This experiment is about the relation between the precision (rmse) of PSVM learning and the $k$ parameters in the first and second layers. LBPW is pure texture processing and pays much more attention to the contrast of an image's nearby pixels; a set of large $k$ is necessary for a low rmse, which corresponds to binary logic. LBIPW and LIPW, however, appear to prefer fuzzy logic, with a set of small $k$ when the rmse is small. A possible explanation for this fact is that the LBP proposed by Ojala et al. (1996) is binary, not fuzzy, and has a sound classification ability for image understanding under binary patterns, whereas LBIPW and LIPW are not binary: they have fuzzy information at least for the center pixel of a 3 × 3 small window (Fig. 7).

Fig. 7. The relation between the precision (rmse) of PSVM learning and the $k$ parameters in the first and second layers; the parameter $k$ in Eq. (11) can control the fuzziness of a granule.

Fig. 8. More results of the granular system for the visual haze-free task.

Fig. 9. Simulating fuzzy logical and-or by changing thresholds of Eq. (11). The X-axis is the threshold value divided by 0.02, the Y-axis is errG. The solid line is errAnd between $I_1 \otimes_f I_2$ and $V_i$, and the dotted line is errOr between $I_1 \oplus_f I_2$ and $V_i$.

(3) The comparison between our approach and LMKH. To illustrate the effect of our approach in the haze-free task, we apply it to other images and compare it with LMKH (Fig. 8). By manual evaluation, half of the results are better than LMKH and the rest are as good as LMKH.

5. Discussion

In this paper, we give a concrete example to show that the theory of GrC can help us to design brain-like computers. The experimental results show that LPSVM is a promising approach for designing a granular system similar to a columnar organization for the image haze-removing task. The concept of granular computing is proposed by Bargiela and Pedrycz (2006). As they put it, a granule is a clump of objects (points) drawn together by indistinguishability, similarity, proximity or functionality.
The necessity of granular computing for studying information transformation in pattern recognition lies in the indistinguishability, similarity, proximity or functionality of sensed information. Due to the local similarity in the information processing of pattern recognition, multi-scale information processing is a common phenomenon in pattern recognition. In fact, the GrC based on the leveled granular system described above can simulate all multi-scale information processing with arbitrarily small error. This fact is very important for the currently popular approach of deep learning. In this paper, we use a novel designing approach (LPSVM) to design a granular system similar to the columnar organization of the visual cortex. We demonstrate that fuzzy logic and machine learning can be hybridized and made to cooperate easily in designing a granular system. This approach not only gives a novel concrete realization of the abstract models for granular computing mentioned by Lin (2012), but also gives a new focus for deep learning. Furthermore, the corresponding GrC can simulate multi-scale information processing for the image haze-free task, and our experiments show that our approach offers some improvement on the haze-free task compared with the linear approach proposed by He et al. (2011).

For further directions, although our LPSVM gives a concrete example of designing a granular system for the haze-free task, many details of LPSVM should be studied for pattern recognition under the framework of deep learning, especially for layered feature abstraction. Furthermore, we will extend the investigation by looking at other nested layered computing for more complex tasks.
However, since layered computing has no feedback, which is important for many visual tasks in dynamical situations, we also plan to extend our layered granular computing to a more general one that allows for both feedback and dynamical regulation for computer vision tasks.

Acknowledgments

This work is partially supported by the National Program on Key Basic Research Project (973 Program) (No. 2013CB329502), the National Natural Science Foundation of China (Nos. 61072085, 61035003, 61202212, 60933004), the National High-tech R&D Program of China (863 Program) (No. 2012AA011003), the National Science and Technology Support Program (2012BA107B02) and the China Information Technology Security Evaluation Center (CNITSEC-KY-2012-006/1).

Appendix A. Sigmoid function and Binary Logic

Theorem 1. Suppose that in Eq. (11) every $w_{l,i,k} = b_k T_i$, with $b_k > 0$, $T_i > 0$, $1 \le k \le K$; furthermore, $C = \{S_i \mid i = 1, \ldots, L\}$ is a class of index sets, and every index set $S_i$ is a subset of $\{1, 2, 3, \ldots, K\}$. Then we have:

(1) If $f(x_1, x_2, \ldots, x_K) = \bigvee_{l = 1, \ldots, L} \big( \bigwedge_{j \in S_l} x_j \big)$ is a disjunctive normal form (DNF) formula, and the class $C = \{S_i \mid i = 1, \ldots, L\}$ has the following two properties: (i) for every $S_i, S_j \in C$ with $i \neq j$, $S_i \cap S_j \neq S_k$ for all $S_k \in C$ (this condition ensures that $f(x_1, x_2, \ldots, x_K)$ has a simplest form); (ii) every $S_i \in C$ satisfies $\sum_{j \in S_i} b_j > 1$, and any index set $S' \notin C$ satisfies $\sum_{j \in S'} b_j < 1$, or, if $\sum_{j \in S'} b_j > 1$, there must be an index set $S_i \in C$ such that $S' \cap S_i = S_i$ (this condition ensures that $C$ is the largest), then the output described by Eq. (11) can simulate the DNF formula $f(x_1, x_2, \ldots, x_K) = \bigvee_{l = 1, \ldots, L} \big( \bigwedge_{i \in S_l} x_i \big)$ with arbitrarily small error, where $x_i = z_i$ if the corresponding input $I_i = z_i$, or $x_i = \bar{z}_i$ if $I_i = 1 - z_i$.

(2) If a neural cell described by Eq. (11) can simulate the Boolean formula $f(x_1, x_2, \ldots, x_K)$ with arbitrarily small error, and $\bigwedge_{i \in S_l} x_i$ is an item in the disjunctive normal form of $f(x_1, x_2, \ldots, x_K)$, i.e. $f(x_1, x_2, \ldots, x_K) = 1$ at $x_j = 1$ for all $j \in S_l$ and $x_j = 0$ for all $j \notin S_l$, then $\sum_{i \in S_l} b_i > 1$.

(3) If a couple of index sets $S_{l_1}$ and $S_{l_2}$ can be found in the formula $f(x_1, x_2, \ldots, x_K) = \bigvee_{l = 1, \ldots, L} \big( \bigwedge_{t \in S_l} x_t \big)$ such that $\big( \bigwedge_{t_1 \in S_{l_1}} x_{t_1} \big) \wedge \big( \bigwedge_{t_2 \in S_{l_2}} x_{t_2} \big) = z_i \wedge \bar{z}_i = \text{false}$, then the output described by Eq. (11) cannot simulate the formula $f(x_1, x_2, \ldots, x_K)$.

Proof. In the sums below the summation index is written $j$, to keep it apart from the steepness coefficient $k$.

(1) If $I_t = 1$ for all $t \in S_l$ and $I_t = 0$ for all $t \notin S_l$, then, because $\sum_{j \in S_l} b_j > 1$ and the index set $S_l$ is a subset of $\{1, 2, 3, \ldots, K\}$, we have

$$V_i = \frac{1}{\exp(-k (U_i - T_i)) + 1} = \frac{1}{\exp\big(-k \big(\sum_{1 \le j \le K} w_{ij} I_j - T_i\big)\big) + 1} = \frac{1}{\exp\big(-k \big(\sum_{j \in S_l} b_j - 1\big) T_i\big) + 1},$$

so $\lim_{k \to +\infty} V_i = 1 = f(x_1, x_2, \ldots, x_K)$. If $I_t = 1\ \forall t \in S'$, $I_t = 0\ \forall t \notin S'$ and $S' \notin C$, then according to the condition of this theorem: if $\sum_{j \in S'} b_j < 1$, then $\lim_{k \to +\infty} V_i = 0 = f(x_1, x_2, \ldots, x_K)$; if $\sum_{j \in S'} b_j > 1$, then there is an index set $S_i \in C$ such that $S' \cap S_i = S_i$, and $\lim_{k \to +\infty} V_i = 1 = f(x_1, x_2, \ldots, x_K)$. So when $k \to \infty$, the error between the output described by Eq. (11) and $f(x_1, x_2, \ldots, x_K)$ tends to 0.

(2) Suppose the output described by Eq. (11) can simulate, with arbitrarily small error, a Boolean formula $f(x_1, x_2, \ldots, x_K)$ that is not a constant. For a definite binary input $x_1, x_2, \ldots, x_K$, the arbitrarily small error is achieved when $k$ tends to infinity and $U_i - T_i = \sum_{j \in S_l} w_{ij} I_j - T_i \neq 0$, where $S_l$ is the set of labels with $I_j = 1$ for all $j \in S_l$ and $I_j = 0$ for all $j \notin S_l$. The theorem's condition supposes that every $w_{ik} = b_k T_i$, $b_k > 0$, $T_i > 0$, $1 \le k \le K$, and that $x_1, x_2, \ldots, x_K$ are binary numbers 0 or 1. So if $f(x_1, x_2, \ldots, x_K)$ is not a constant, $\lim_{k \to +\infty} V_i = 0$ must hold when $f(x_1, x_2, \ldots, x_K) = 0$, and $\lim_{k \to +\infty} V_i = 1$ must hold when $f(x_1, x_2, \ldots, x_K) = 1$. $\lim_{k \to +\infty} V_i = 0$ requires that $k \big(\sum_{j \in S_l} b_j T_i - T_i\big)$ tends to minus infinity, and $\lim_{k \to +\infty} V_i = 1$ requires that $k \big(\sum_{j \in S_l} b_j T_i - T_i\big)$ tends to plus infinity. So if $f(x_1, x_2, \ldots, x_K) = 1$ at $x_j = 1$ for all $j \in S_l$ and $x_j = 0$ for all $j \notin S_l$, then in order to guarantee $\lim_{k \to +\infty} \mathrm{err}_{f(x_1, x_2, \ldots, x_K)}(w_{i,1}, w_{i,2}, \ldots, w_{i,K}, T_i) = 0$, the condition $\sum_{j \in S_l} b_j > 1$ must hold; here $\mathrm{err}_{f(x_1, x_2, \ldots, x_K)}(w_{i,1}, w_{i,2}, \ldots, w_{i,K}, T_i)$ is the error between the output described by Eq. (11) and $f(x_1, x_2, \ldots, x_K)$.

(3) The third part of the theorem is based on the simple fact that, for a single neuron, $V_i$ is monotone in every input $I_i$, which can be $z_i$ or $1 - z_i$. □

Appendix B. Sigmoid function and fuzzy logic

Furthermore, the granular computing above can approximately simulate the Bounded operator. The Bounded operator $F(\otimes_f, \oplus_f)$ consists of the Bounded product $p \otimes_f q = \max(0, p + q - 1)$ and the Bounded sum $p \oplus_f q = \min(1, p + q)$. Based on Eq. (11), the membrane potential's fixed point under input $I_k$ is $U_i = \sum_k w_{ik} I_k$, and the output at the fixed point is $V_i = 1 / (\exp(-U_i + T_i) + 1)$. If there are only two inputs $I_1, I_2$ ($I_1, I_2 \in [0, 1]$) in Eq. (11), we set $w_1 = 1.0$ and $w_2 = 1.0$; then $U_i = I_1 + I_2$.

Now we try to prove that the Bounded operator $F(\otimes_f, \oplus_f)$ is the best fuzzy operator to simulate neural cells described by Eq. (3), and that the threshold $T_i$ can change the neural cell from the bounded operator $\oplus_f$ to $\otimes_f$, by analyzing the output at the fixed point $V_i = 1 / (\exp(-U_i + T_i) + 1)$. If $C > 0$ is a constant and $U_i = I_1 + I_2 \ge C$, then $1 / (\exp(-C + T_i) + 1) \le V_i < 1$. When $U_i = I_1 + I_2 \to +\infty$, $V_i \to 1$, so in this case, if $C$ is large enough, $V_i \approx 1$. If $-C \le U_i = I_1 + I_2 \le C$, then $1 / (\exp(C + T_i) + 1) \le V_i \le 1 / (\exp(-C + T_i) + 1)$, and $V_i$ can be expanded as

$$\begin{aligned} V_i &= 1 / (\exp(-U_i + T_i) + 1) \\ &= 1 - \exp(-U_i + T_i) + \sum_{k=2}^{\infty} (-1)^k \exp(-k (U_i - T_i)) \\ &= U_i - T_i - \sum_{j=2}^{\infty} (-U_i + T_i)^j / j! + \sum_{k=2}^{\infty} (-1)^k \exp(-k (U_i - T_i)). \end{aligned} \tag{a}$$

According to equation (a), we can select a $T_i$ that makes $\big| T_i + \sum_{j=2}^{\infty} (-U_i + T_i)^j / j! - \sum_{k=2}^{\infty} (-1)^k \exp(-k (U_i - T_i)) \big|$ small enough, and then $V_i \approx I_1 + I_2$. So in this case, $V_i \approx I_1 \oplus_f I_2 = \min(1, I_1 + I_2)$.
Similarly, if $U_i = I_1 + I_2 \to -\infty$, then $V_i \to 0$. So when $C$ is large enough and $U_i = I_1 + I_2 \le -C < 0$, then $V_i \approx 0$. When $-C \le U_i = I_1 + I_2 \le C$, if we select a suitable $T_i$ which makes $T_i + \sum_{j=2}^{\infty} (-U_i + T_i)^j / j! - \sum_{k=2}^{\infty} (-1)^k \exp(-k (U_i - T_i)) \approx 1$, then $V_i \approx I_1 \otimes_f I_2 = \max(0, I_1 + I_2 - 1)$.

Based on the above analysis, the Bounded operator fuzzy system is suitable for the GrC described by Eq. (11) when $a_i = 1.0$, $w_1 = 1.0$ and $w_2 = 1.0$. For arbitrary positive $a_i$, $w_1$ and $w_2$, we can use the corresponding q-value weighted universal fuzzy logical function based on the Bounded operator to simulate such neural cells. If a weight $w$ is negative, an N-norm operator $N(x) = 1 - x$ should be used.

Experiments that scan the whole region of $(I_1, I_2)$ in $[0, 1]^2$ to find suitable coefficients for $\otimes_f$ and $\oplus_f$ show that the above analysis is sound. We denote the input in Eq. (11) as $\vec{x} = (I_1, I_2)$. The "errOr" for $\oplus_f$ and the "errAnd" for $\otimes_f$ are shown in Fig. 9 as the solid line and the dotted line respectively. In Fig. 9, the threshold $T_i$ is scanned from 0 to 4.1 with step size 0.01. The best $T_i$ in Eq. (4) for $\otimes_f$ is 2.54 and the best $T_i$ in Eq. (4) for $\oplus_f$ is 0, when $a = 1.0$, $w_1 = 1.0$ and $w_2 = 1.0$. In this case the "errOr" and "errAnd" are less than 0.01. Our experiments show that a suitable $T_i$ can be found. So in most cases, the bounded operator $F(\otimes_f, \oplus_f)$ mentioned above is a suitable fuzzy logical framework for the neuron defined by Eq. (3). If the weights satisfy $w_1 > 0$ and $w_2 > 0$, we should use a q-value weighted bounded operator $F(\otimes_f, \oplus_f)$ to represent the above neuron.

Appendix C. Associative condition and De Morgan law of q-weighted bounded operator

It is easy to see that $\oplus_f$ follows the associative condition and that $x_1 \oplus_f x_2 \oplus_f x_3 \cdots \oplus_f x_n = \min(q, \sum_{1 \le i \le n} w_i x_i)$. For $\otimes_f$, we can prove that the associative condition also holds.
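The closed form stated above for the weighted bounded sum, and the corresponding one for the weighted bounded product that the proof below arrives at, can be spot-checked numerically. This is only a sketch under stated assumptions: the function names are hypothetical, the two-argument product form is the one used in that proof, an intermediate result is fed back with weight 1 (as in the proof), and weights are non-negative with inputs in $[0, q]$.

```python
def weighted_bounded_sum(p1, p2, w1, w2, q=1.0):
    """q-value weighted bounded sum: min(q, w1*p1 + w2*p2)."""
    return min(q, w1 * p1 + w2 * p2)


def weighted_bounded_product(p1, p2, w1, w2, q=1.0):
    """q-value weighted bounded product: max(0, w1*p1 + w2*p2 - (w1 + w2 - 1)*q)."""
    return max(0.0, w1 * p1 + w2 * p2 - (w1 + w2 - 1.0) * q)


def fold(op, xs, ws, q=1.0):
    """Apply `op` left to right; an intermediate result re-enters with weight 1."""
    acc = op(xs[0], xs[1], ws[0], ws[1], q)
    for x, w in zip(xs[2:], ws[2:]):
        acc = op(acc, x, 1.0, w, q)
    return acc


def closed_form_sum(xs, ws, q=1.0):
    """min(q, sum of w_i * x_i)."""
    return min(q, sum(w * x for w, x in zip(ws, xs)))


def closed_form_product(xs, ws, q=1.0):
    """max(0, sum of w_i * x_i - (sum of w_i - 1) * q)."""
    total = sum(w * x for w, x in zip(ws, xs))
    return max(0.0, total - (sum(ws) - 1.0) * q)


xs, ws = [0.9, 0.8, 0.95], [1.0, 0.5, 2.0]  # made-up inputs and weights
print(fold(weighted_bounded_product, xs, ws), closed_form_product(xs, ws))
print(fold(weighted_bounded_sum, xs, ws), closed_form_sum(xs, ws))
```

Each printed pair should agree up to floating-point rounding; a numerical check of this kind complements the proof but does not replace it.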
The proof is listed below. If $w_1 p_1 + w_2 p_2 - (w_1 + w_2 - 1) q \ge 0$, we have:

$$\begin{aligned} (p_1 \otimes_f p_2) \otimes_f p_3 &= F_{\otimes_f}(F_{\otimes_f}(p_1, p_2, w_1, w_2), p_3, 1, w_3) \\ &= F_{\otimes_f}(w_1 p_1 + w_2 p_2 - (w_1 + w_2 - 1) q,\ p_3, 1, w_3) \\ &= \max(0,\ w_1 p_1 + w_2 p_2 - (w_1 + w_2 - 1) q + w_3 p_3 - (1 + w_3 - 1) q) \\ &= \max(0,\ w_1 p_1 + w_2 p_2 + w_3 p_3 - (w_1 + w_2 + w_3 - 1) q); \end{aligned}$$

if $w_1 p_1 + w_2 p_2 - (w_1 + w_2 - 1) q < 0$, we have

$$\begin{aligned} (p_1 \otimes_f p_2) \otimes_f p_3 &= F_{\otimes_f}(F_{\otimes_f}(p_1, p_2, w_1, w_2), p_3, 1, w_3) \\ &= F_{\otimes_f}(0, p_3, 1, w_3) = \max(0,\ 0 + w_3 p_3 - (1 + w_3 - 1) q) \\ &= \max(0,\ w_3 p_3 - w_3 q) = 0 \quad (\text{for } 0 \le p_3 \le q) \\ &= \max(0,\ w_1 p_1 + w_2 p_2 + w_3 p_3 - (w_1 + w_2 + w_3 - 1) q); \end{aligned}$$

so $(p_1 \otimes_f p_2) \otimes_f p_3 = p_1 \otimes_f (p_2 \otimes_f p_3) = \max(0,\ w_1 p_1 + w_2 p_2 + w_3 p_3 - (w_1 + w_2 + w_3 - 1) q)$. By induction, we can prove that $\otimes_f$ also follows the associative condition and that $x_1 \otimes_f x_2 \otimes_f x_3 \cdots \otimes_f x_n = \max\big(0,\ \sum_{1 \le i \le n} w_i x_i - (\sum_{1 \le i \le n} w_i - 1) q\big)$.

Furthermore, if we define $N(p) = q - p$ (usually, a negative weight $w_i$ corresponds to an N-norm), the weighted bounded operator $F(\otimes_f, \oplus_f)$ above follows the De Morgan law, i.e.

$$\begin{aligned} N(x_1 \oplus_f x_2 \oplus_f x_3 \cdots \oplus_f x_n) &= q - \min\Big(q,\ \sum_{1 \le i \le n} w_i x_i\Big) = \max\Big(0,\ q - \sum_{1 \le i \le n} w_i x_i\Big) \\ &= \max\Big(0,\ \sum_{1 \le i \le n} w_i (q - x_i) - \Big(\sum_{1 \le i \le n} w_i - 1\Big) q\Big) \\ &= N(x_1) \otimes_f N(x_2) \otimes_f N(x_3) \cdots \otimes_f N(x_n). \end{aligned}$$

References

Bargiela, A., & Pedrycz, W. (2006). The roots of granular computing. In GrC (pp. 806–809).
Castro, J. L. (1995). Fuzzy logic controllers are universal approximators. IEEE Transactions on Systems, Man and Cybernetics, 25(4), 629–635.
Fung, G., & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (pp. 77–86). ACM.
Haykin, S. (1994). Neural networks: A comprehensive foundation. Prentice Hall PTR.
Haykin, S. (2008). Neural networks: A comprehensive foundation. Englewood Cliffs: Prentice-Hall.
He, K., Sun, J., & Tang, X. (2011). Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2341–2353.
Levin, A., Lischinski, D., & Weiss, Y. (2008). A closed-form solution to natural image matting.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 228–242.
Lin, T. Y. (1998). Granular computing on binary relations I: Data mining and neighborhood systems. Rough Sets in Knowledge Discovery, 1, 107–121.
Lin, T. Y. (1999). Granular computing: Fuzzy logic and rough sets. Computing with words in information/intelligent systems (Vol. 1, pp. 183–200). Springer.
Lin, T. Y. (2007). Neighborhood systems: A qualitative theory for fuzzy and rough sets. Berkeley: University of California. 94720.
Lin, T. Y. (2012). Granular computing: Practices, theories, and future directions. In Computational complexity (pp. 1404–1420). Springer.
Li, H.-X., & Philip Chen, C. L. (2000). The equivalence between fuzzy logic systems and feedforward neural networks. IEEE Transactions on Neural Networks, 11(2), 356–365.
Liu, H., Xiong, S., & Wu, C.-a. (2012). Hyperspherical granular computing classification algorithm based on fuzzy lattices. Mathematical and Computer Modelling.
Mountcastle, V. B. (1997). The columnar organization of the neocortex. Brain, 120(4), 701–722.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59.
Pedrycz, A., Hirota, K., Pedrycz, W., & Dong, F. (2012). Granular representation and granular computing with fuzzy sets. Fuzzy Sets and Systems, 203, 17–32.
Yao, Y. Y. (1998). Relational interpretations of neighborhood operators and rough set approximation operators. Information Sciences, 111(1), 239–259.
Yao, Y. Y. (1999). Granular computing using neighborhood systems. In Advances in soft computing (pp. 539–553). Springer.
Yao, Y. Y. (2000). Granular computing: Basic issues and possible solutions. In Proceedings of the 5th joint conference on information sciences (Vol. 1, pp. 186–189). Citeseer.
Yao, Y. Y. (2001). On modeling data mining with granular computing.
In 25th Annual international computer software and applications conference, 2001. COMPSAC 2001 (pp. 638–643). IEEE.
Yao, Y. Y. (2001). Information granulation and rough set approximation. International Journal of Intelligent Systems, 16(1), 87–104.
Yao, Y. (2006). Granular computing for data mining. In Defense and security symposium (pp. 624105). International Society for Optics and Photonics.
Yao, Y., & Deng, X. (2013). A granular computing paradigm for concept learning. In Emerging paradigms in machine learning (pp. 307–326). Springer.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zadeh, L. A. (1997). Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, 90(2), 111–127.
Zhang, L., & Zhang, B. (2003). Theory of fuzzy quotient space (methods of fuzzy granular computing). Journal of Software, 14(4), 770–776.
Zhang, L., & Zhang, B. (2004a). The quotient space theory of problem solving. Fundamenta Informaticae, 59(2), 287–298.
Zhang, L., & Zhang, B. (2004b). The quotient space theory of problem solving. Fundamenta Informaticae, 59(2), 287–298.
Zhang, L., & Zhang, B. (2005). Quotient space model based hierarchical machine learning. International conference on neural networks and brain, 2005. ICNN&B'05 (Vol. 1). IEEE, pp. xiv–xiv.