Convolutional Neural Network approaches to granite tiles classification

Anselmo Ferreira (a,*), Gilson Giraldi (b)

a Shenzhen Key Laboratory of Media Security, College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen, Guangdong 518060, PR China
b National Laboratory for Scientific Computing (LNCC), Av. Getúlio Vargas 333, Quitandinha, Petrópolis, Rio de Janeiro 22230000, Brazil
* Corresponding author. E-mail addresses: anselmo@szu.edu.br, anselmo.ferreira@gmail.com (A. Ferreira), gilson@lncc.br (G. Giraldi).

Expert Systems With Applications 84 (2017) 1–11. Received 1 February 2017; revised 26 April 2017; accepted 27 April 2017; available online 4 May 2017.

Keywords: Granite classification; Convolutional Neural Networks; Deep learning

Abstract

The quality control process in the stone industry is a challenging problem nowadays. Because different rocks with the same mineralogical content can have very similar visual appearance, the industry can suffer economic losses if clients cannot properly recognize the rocks delivered as the ones initially purchased. In this paper, we move toward the automation of rock-quality assessment at different image resolutions by proposing the first data-driven technique applied to granite tiles classification. Our approach learns intrinsic patterns in small image patches through Convolutional Neural Networks tailored for this problem. Experiments comparing the proposed approach to texture descriptors on a well-known dataset show the effectiveness of the proposed method and its suitability for applications under some uncontrolled conditions, such as classifying granite tiles at different image resolutions.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Natural stone is the first building material used by humanity and it continues to be used in new structures. Among the natural stones, granite has received particular interest due to its beauty and strength. Granite is an intrusive igneous rock widely distributed throughout Earth's crust at depths of up to 31 miles (50 km), according to University of Tennessee (2008).

Since there are different colors of granite, their denomination varies depending on the country. Even with some standards in place, such as the one from the European Committee for Standardization (CEN) (2009) and, more recently, from the Stone Industry SA Standards Board (2015), there are cases in which the visual appearance of different granites with the same mineralogical content does not differ significantly, making it time-consuming and economically expensive for the stone industry to check, slab by slab, whether the product colors to be delivered to clients are the same as the colors purchased. Most of the time, the quality assessment is done by skilled professionals in a subjective procedure that can fail. The international operations of stone industries require this procedure to be faster and more accurate.
Automated systems can play an important role in this scenario. Granite companies are interested in computer vision allied with machine learning procedures able to identify patterns in granite images, using these patterns to sort and grade granite products based on their visual appearance, thereby helping product traceability and warehouse management. Also, an on-site tool that can identify the real type of a product can help solve possible misunderstandings between customers and manufacturers at the time of a delivery.

This challenge has received some attention in the scientific literature in the past years, and studies applying well-known color and texture descriptors have been performed, such as the works of Kurmyshev, Sánchez-Yáñez, and Fernández (2013), Bianconi and Fernández (2006), Lepisto, Kunttu, and Visa (2005), Araújo, Martínez, Ordóñez, and Vilán (2010) and Bianconi, González, Fernández, and Saetta (2012). However, most of these are general-purpose descriptors created for controlled scenarios, where the slabs used for experiments have the same size and the images acquired have the same resolution. The descriptors applied to granite slab classification to date generate what are called hand-crafted features, built by feature engineering based on a behavior that is expected to appear in all granite images. Finally, there is no study regarding the application of these descriptors to slab pieces.

In this paper, we go beyond the use of hand-crafted features and propose what is, as far as we know, the first data-driven approach to identify granite patterns. The proposed technique uses deep Convolutional Neural Networks (CNNs) with different architectures, trained to classify the intrinsic patterns of granite tiles based on texture and color. Instead of designing a very deep neural network to be applied to high-resolution images, a process that requires too many layers in the architecture and thousands of images to train the network, our approach uses lightweight neural networks on small patches of granite images, taking the majority vote over the patch classifications as the image classification. This makes the proposed approach able to classify slabs of different sizes, different image resolutions and even slab pieces. Experiments comparing our approach against hand-crafted descriptors and pre-trained networks show the effectiveness of the proposed technique.

In summary, the main contributions of this paper are:

1. Design and development of ad-hoc CNNs for granite tiles classification.
2. Application of CNNs on multiple image data, represented by small patches located in regions of interest of granite slab images, to learn features that lead to high recognition accuracy through majority voting over patch classifications.
3. Validation of the proposed methodology against descriptors not used before for granite classification.
The remainder of this paper is organized as follows: Section 2 discusses related work on color and texture descriptors in the literature, including studies applying them to granite classification. Section 3 presents the basic concepts of CNNs, which are necessary to understand the proposed approach. Section 4 presents our approach for granite color classification and Section 5 reports the details of the experimental methodology used to validate the proposed method against existing counterparts in the literature. Section 6 shows the experiments and results, and Section 7 reports our final considerations and proposals for future work.

2. Related work

The problem of classifying materials by their type can be regarded as a problem of classifying textures, colors and geometrical features. To create solutions in this regard, the scientific literature has focused on applying computer-vision approaches allied with machine learning to grade different materials.

In the specific case of granite rocks, Ershad (2011) presented a method based on Primitive Pattern Units, a morphological operator applied to each color channel separately. Statistical features are then used to discriminate different classes of natural stone such as granite, marble, travertine and hatchet. Kurmyshev et al. (2013) used the coordinated clusters representation (CCR) for classifying granite tiles of the type "Rosa Porriño". Bianconi and Fernández (2006) employed different Gabor filter banks for granite classification. Lepisto et al. (2005) used a similar approach, but applied it to each color channel separately. Araújo et al. (2010) employed a spectrophotometer to capture spectral data at different regions of interest in granite tiles and used Support Vector Machines to grade them. Fernández, Ghita, González, Bianconi, and Whelan (2011) studied how common image descriptors such as Local Binary Patterns, Coordinated Clusters Representation and Improved Local Binary Patterns behaved in granite image classification under rotation.

Most approaches described in the literature are based on textural features alone. According to the study of Bianconi et al. (2012), this is somewhat surprising, since the visual appearance of granite tiles strongly depends on both texture and color. To this end, they performed a study using several texture and color descriptors with five different classifiers, showing that combining color and texture features and classifying them with Support Vector Machines outperforms previous methods based on textural features alone.

Finally, in a recent work, Bianconi, Bello, Fernández, and González (2015a) investigated the problem of choosing an adequate color representation for granite grading. They discussed pros and cons of different color spaces for granite classification by performing experiments with very simple color descriptors: mean, mean + standard deviation, mean + moments from 2nd to 5th, and quartiles and quintiles of each color channel. They showed that, depending on the classifier used, some color spaces are better than others for classification, such as the Lab and Luv spaces for the linear classifier.

As can be noticed, the solutions presented for granite color classification are based only on feature engineering. In other words, these approaches are driven by patterns that are supposed to hold over all investigated images and also require expert knowledge. Veering away from these methods, we propose a deep learning based approach, which extracts meaningful discriminative patterns straight from the data by using small image patches instead of ordinary feature engineering on whole images.
To do this, our approach exploits the back-propagation of error used in Convolutional Neural Networks to automatically learn the discriminant features present in the image textures. Before we discuss our proposed method to perform granite grading based on deep learning, it is worth discussing some basic concepts about deep neural networks in the next section.

3. Basic concepts

Convolutional Neural Networks have attracted considerable attention from the computer vision research community recently, mainly because of their effective results in several image classification tasks, outperforming even humans in certain situations according to He, Zhang, Ren, and Sun (2015), and winning several image classification challenges, such as the ImageNet image recognition contest described by Krizhevsky, Sutskever, and Hinton (2012). Pioneered by the work of Lecun, Bottou, Bengio, and Haffner (1998), CNNs are regarded as a deep learning application to images and, as such, they simulate the activity in layers of neurons in the neocortex, the place where most thinking happens, according to Hof (2013). The network learns to recognize patterns by highlighting in its layers the edges and pixel behaviors that are commonly found across different images.

The main benefit of using CNNs with respect to traditional fully-connected neural networks is the reduced number of parameters to be learned. Convolutional layers made of small kernels provide an effective way of extracting high-level features that are fed to fully-connected layers. The training of a CNN is performed through back-propagation and stochastic gradient descent, as Rumelhart, Hinton, and Williams (1986) describe. The misclassification error drives the weight updates of both convolutional and fully-connected layers. The basic layers of a CNN are listed below:

1. Input layer: where data is fed to the network. Input data can be either raw image pixels or their transformations, whichever better emphasizes specific aspects of the image.
2. Convolutional layers: contain a series of fixed-size filters used to perform convolution on the image data, generating what is called a feature map. These filters can highlight patterns helpful for image characterization, such as edges, textures, etc.
3. Pooling layers: these layers ensure that the network focuses only on the most important patterns. They summarize the data by sliding a window across the feature maps and applying a linear or non-linear operation to the data within the window, such as the local mean or max, reducing the dimensionality of the feature maps used by the following layers.
4. Rectified Linear Unit (ReLU): ReLU layers apply a non-linear function to the output x of the previous layer, such as f(x) = max(0, x). According to Krizhevsky et al. (2012), they speed up training by mitigating the vanishing gradient problem, keeping the gradient more or less constant across all network layers.
5. Fully-connected layers: used for interpreting the patterns generated by the previous layers. Neurons in this layer have full connections to all activations in the previous layer. They are also known as inner product layers. Once trained, transfer learning approaches can extract features from these layers to train another classifier.
6. Loss layers: specify how the network training penalizes the deviation between the predicted and true labels; this is normally the last layer in the network. Various loss functions appropriate for different tasks can be used: Softmax, Sigmoid Cross-Entropy, Euclidean loss, among others.

Fig. 1 depicts one possible CNN architecture: the input image is transformed into feature maps by the first convolutional layer C1, a pooling stage S1 reduces the dimensions across the feature maps, the same process is repeated for layers C2 and S2, and finally a classifier is trained on the data generated by layer S2. The type and arrangement of layers vary depending on the target application.
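To make these layer operations concrete, the following is a minimal NumPy sketch of the convolution -> ReLU -> pooling chain (this is our illustration, not the authors' MatConvNet code; the 3 x 3 kernel is a hypothetical hand-set edge filter, not a learned one):

import numpy as np

def conv2d(image, kernel):
    # Valid 2-D convolution of a grayscale image with one kernel.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)            # f(x) = max(0, x)

def max_pool(x, size=2):
    # Non-overlapping max pooling with a size x size window.
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

patch = np.random.rand(28, 28)               # one grayscale input patch
edge_kernel = np.array([[1., 0., -1.]] * 3)  # hypothetical 3 x 3 edge filter
feature_map = max_pool(relu(conv2d(patch, edge_kernel)))
print(feature_map.shape)                     # (13, 13): 26 x 26 map pooled by 2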
In this step the NN is investigating several regions of interest (which we call mul- iple data) to classify the entire image. We use 28 × 28 grayscale mages and 32 × 32 color images in our CNNs, described in the ext subsection. ( Figs. 5 and 6 ) .3. Recognition by Convolutional Neural Networks In the second step, labeled as step B in Fig. 2 , we use differ- nt Convolutional Neural Networks to recognize patterns of image atches. For this, in a training step, we apply tiles from training mages through the networks to estimate the filter weights for a etter classification. Then, we do what is called transfer learning s described by Johnson and Karpathy (2016) and Yosinski, Clune, engio, and Lipson (2014) , using the already trained network as a eature extractor in the train and test stages. To do that we extract he last layer (the softmax layer) of the already trained network nd then the new output will be feature vectors generated by the ast convolutional layer. Finally, these image characterizations will e used by another classifier in the training and testing steps. The umber of networks used are four and their architectures are as ollows. • MNIST1 network: based in a design used in the MNIST dataset digit recognition challenge ( VLFEAT, 2016 ), it has an input layer which requires 28 × 28 grayscale input images and is com- posed by eight other layers: four convolutional layers, two pool- ing layers, one RELU layer and one fully-connected layer. • MNIST2 network: we extended MNIST1 network, creating a new one composed by eleven layers: one input layer, five convolu- tional layers, three pooling layers, one RELU layer and a final fully connected layer. • MNIST3 network: we extended even more our initial MNIST1 network, creating a new one composed by thirteen layers: one input layer, six convolutional layers, four pooling layers, one RELU layer, and a final fully connected layer. • CIFAR network: based in a design used in the CIFAR image recognition challenge ( VLFEAT, 2016 ), it has an input layer which requires 32 × 32 RGB input images and is composed by twelve other layers: five convolutional layers, three pooling lay- ers, four RELU layers and one fully-connected layer. These networks will generate, respectively, feature vectors with 00, 256, 256 and 64 dimensions respectively, which we aim to se in another classifier. Fig. 3 shows the learned filters in the first ayer of each network. Figs. 4 –7 show the output of these filters n networks MNIST1, MNIST2, MNIST3 and CIFAR respectively and heir power to discriminate by convolutions some granite blocks resent on the dataset built in the work of Bianconi et al. (2015a) . Additionally to the use of these networks individually, we pro- ose the fusion of them using for that two approaches: 1. Early fusion: The feature vector of a block will be the fusion of descriptions (feature vectors) from the three networks. Con- catenating such feature vectors will yield a final feature vector 4 A. Ferreira, G. Giraldi / Expert Systems With Applications 84 (2017) 1–11 Fig. 2. Pipeline of the proposed approach based on Convolutional Neural Networks for granite classification. Fig. 3. Filter weights of the first layer from (a) MNIST1 network (b) MNIST2 network (c) MNIST3 network and (d) CIFAR network. a M n of 500 + 256 + 256 = 1012 dimensions, which will be used in a given classifier. 2. 
4.2. Patching

In the first step of our proposed approach (labeled as step A in Fig. 2), the image is divided into tiles of interest. Only tiles lying entirely inside the image borders are kept (no artificial pixel filling is done in the input images), and these tiles are used as input to the CNNs. This procedure is useful for the following reasons: (i) it generates more data for CNN training; (ii) it allows majority voting over tile classifications when testing a given input image; and (iii) it makes it possible to train and test the CNN on different image resolutions, as only the tiles of interest enter the network. This is important because CNNs are normally designed for input images of a given resolution; in our approach this is not a restriction, because only small granite blocks are used as input. In this step, the CNN investigates several regions of interest (which we call multiple data) to classify the entire image. We use 28 x 28 grayscale images and 32 x 32 color images in our CNNs, described in the next subsection.
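The tile counts reported in Section 5.1 (2809 blocks of 28 x 28 and 2116 blocks of 32 x 32 per 1500 x 1500 image) are consistent with a non-overlapping tiling that discards partial border tiles; a sketch under that assumption:

import numpy as np

def tile_image(image, block=28):
    # Split an image into non-overlapping block x block tiles, discarding
    # partial tiles at the right/bottom borders (no pixel filling).
    nh, nw = image.shape[0] // block, image.shape[1] // block
    tiles = [image[i * block:(i + 1) * block, j * block:(j + 1) * block]
             for i in range(nh) for j in range(nw)]
    return np.stack(tiles)

gray = np.random.rand(1500, 1500)
print(tile_image(gray, 28).shape)       # (2809, 28, 28): 53 x 53 whole tiles
print(tile_image(gray, 32).shape[0])    # 2116 = 46 x 46 whole tiles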
4.3. Recognition by Convolutional Neural Networks

In the second step, labeled as step B in Fig. 2, we use different Convolutional Neural Networks to recognize patterns in the image patches. For this, in a training step, we feed tiles from training images through the networks to estimate the filter weights. Then we perform what is called transfer learning, as described by Johnson and Karpathy (2016) and Yosinski, Clune, Bengio, and Lipson (2014), using the already trained network as a feature extractor in the training and testing stages. To do that, we remove the last layer (the softmax layer) of the trained network, so that the new output is the feature vector generated by the last convolutional layer. Finally, these image characterizations are used by another classifier in the training and testing steps. Four networks are used, with the following architectures:

- MNIST1 network: based on a design used for the MNIST digit recognition challenge (VLFEAT, 2016), it has an input layer requiring 28 x 28 grayscale images and is composed of eight further layers: four convolutional layers, two pooling layers, one ReLU layer and one fully-connected layer.
- MNIST2 network: an extension of MNIST1, composed of eleven layers: one input layer, five convolutional layers, three pooling layers, one ReLU layer and a final fully-connected layer.
- MNIST3 network: a further extension of MNIST1, composed of thirteen layers: one input layer, six convolutional layers, four pooling layers, one ReLU layer and a final fully-connected layer.
- CIFAR network: based on a design used for the CIFAR image recognition challenge (VLFEAT, 2016), it has an input layer requiring 32 x 32 RGB images and is composed of twelve further layers: five convolutional layers, three pooling layers, four ReLU layers and one fully-connected layer.

These networks generate feature vectors with 500, 256, 256 and 64 dimensions, respectively, which we use in another classifier. Fig. 3 shows the learned filters in the first layer of each network, and Figs. 4-7 show the outputs of these filters for networks MNIST1, MNIST2, MNIST3 and CIFAR, respectively, and their power to discriminate by convolution some granite blocks present in the dataset built in the work of Bianconi et al. (2015a).

Fig. 3. Filter weights of the first layer of (a) MNIST1, (b) MNIST2, (c) MNIST3 and (d) CIFAR.
Figs. 4-7. First-layer convolutions of MNIST1, MNIST2, MNIST3 and CIFAR, respectively, used to identify (a) "Giallo Veneziano", (b) "Sky Brown" and (c) "Verde Oliva" granite blocks. CIFAR convolutions are represented in gray for better visualization.

In addition to using these networks individually, we propose fusing them with two approaches:

1. Early fusion: the feature vector of a block is the concatenation of the descriptions (feature vectors) from the three MNIST-based networks, yielding a final feature vector of 500 + 256 + 256 = 1012 dimensions to be used in a given classifier.
2. Late fusion: the classification of a tile is given by the majority vote of three classifiers, fed with the feature vectors generated by the three networks applied to that tile.

Figs. 8 and 9 show, respectively, the early and late fusion pipelines for granite image classification using the MNIST-based CNNs considered in this paper.

Fig. 8. Pipeline of the early fusion of Convolutional Neural Networks for granite classification: three pre-trained networks with different architectures (MNIST1, MNIST2 and MNIST3) are applied to the same tiny input blocks, generating feature vectors that are concatenated and normalized to classify a given 28 x 28 block; the final classification of an image uses the majority vote of its blocks.
Fig. 9. Pipeline of the late fusion of Convolutional Neural Networks for granite classification: the three networks' feature vectors are classified individually, the class of a block is the majority vote of the three labels, and the same voting over all classified blocks classifies the whole image.

If the first approach (early fusion) is chosen, the next step is normalizing (or scaling) the concatenated feature vectors. There are several ways to do this, but we choose the simplest, which is dividing each vector by its norm. Given a feature vector V, its p-norm is calculated as

||V||_p = ( \sum_{i=1}^{n} |V(i)|^p )^{1/p},   (2)

where V(i) is the i-th vector element and n is the number of vector elements. For our application we choose p = 2, so the norm generated is the Euclidean norm. The final feature vector V_f is

V_f = V / ||V||_2.   (3)

With this approach the feature vector components are scaled so that each vector's magnitude is always one. This is important because, as the feature vectors to be concatenated come from different sources, the ranges of all features should be normalized so that each feature contributes approximately proportionately to the final feature vector.
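A minimal sketch of this early-fusion step, Eqs. (2)-(3) with p = 2, on hypothetical per-block descriptors of the dimensions stated above:

import numpy as np

def l2_normalize(v):
    # Eq. (3): scale a feature vector to unit Euclidean norm.
    return v / np.linalg.norm(v, ord=2)

# Hypothetical per-block descriptors from MNIST1, MNIST2 and MNIST3.
f1, f2, f3 = np.random.rand(500), np.random.rand(256), np.random.rand(256)
fused = l2_normalize(np.concatenate([f1, f2, f3]))   # early fusion, 1012-D
print(fused.shape, np.linalg.norm(fused))            # (1012,) 1.0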
4.4. Image patches classification

In the training and testing steps, we apply the feature vectors generated by the previously trained networks to another classifier. In the image patches classification step (labeled as step C in Fig. 2), we choose the 1st Nearest Neighbor (1NN) classifier to classify the tiny image blocks from the input images. Basically, this classifier works by assigning a testing sample to the class of its nearest neighbor defined in a training step. At the end of this process, each valid block of the image is classified.

4.5. Image classification

After the individual patch classifications, in step D in Fig. 2 we classify the image by finding the most predicted class among its block classifications. Given a vector x = {output_1, ..., output_b} containing the classifications of the b blocks in the image, the predicted class of a granite image G is

class(G) = mode(x),   (4)

where mode(x) is the most frequent class in vector x.
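Eq. (4) is a simple mode over block predictions; a sketch with hypothetical block labels:

from collections import Counter

def classify_image(block_labels):
    # Eq. (4): the image class is the mode of its block predictions.
    return Counter(block_labels).most_common(1)[0][0]

# Hypothetical predictions for five blocks of one slab image.
print(classify_image(["Rosa Porrino", "Verde Oliva", "Rosa Porrino",
                      "Rosa Porrino", "Verde Oliva"]))   # Rosa Porrino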
5. Experimental setup

Before we discuss our experimental results, we present in this section the materials and methods used to compare the proposed method against its counterparts in the literature. In the following subsections we present the granite tile dataset used, the methodology and metrics used to assess classification performance, the parameters used in our proposed approach and the state-of-the-art implementations used.

5.1. Image dataset

We used the same granite tiles dataset applied in the work of Bianconi et al. (2015a). It contains 1000 RGB images of 1500 x 1500 pixels, subdivided into 25 classes of 40 images each. The first 100 images were acquired naturally by a specific scanner; the other 900 were created by rotating the initial 100 images by 10, 20, 30, 40, 50, 60, 70, 80 and 90 degrees.

To suit our proposed approach, we subdivide these images into 28 x 28 valid blocks (i.e., blocks that do not cross the image borders) to be applied to the MNIST1, MNIST2 and MNIST3 networks, creating 2809 valid blocks per image. So, the dataset used in the experiments contains 2809 x 1000 = 2,809,000 tiny grayscale images. To use the proposed CIFAR network on RGB images, we subdivide the dataset images into 32 x 32 valid blocks, creating 2116 valid RGB blocks per image and a total of 2116 x 1000 = 2,116,000 tiny color images.

With these procedures we artificially create a large number of small images to be used in small networks, eliminating the requirement of big networks applied to a large number of high-resolution images. By subdividing the test image into blocks, we can increase classification accuracy through majority voting over the blocks, classifying a big image by its small parts.

5.2. Methodology and metrics

We validate the proposed method in two experimental scenarios: assessing the performance of classifying whole images (by majority voting over blocks) and of classifying small blocks. For the first scenario, we consider a 5 x 2 cross-validation protocol, in which we replicate the traditional 1 x 2 cross-validation protocol five times (thus 5 x 2). In each replication, we divide the set of images D randomly into equal subsets D1 and D2, with D1 U D2 = D and D1 n D2 = empty. The classifier is trained on D1 and tested on D2, and then the inverse is done. Repeating the process five times, metrics can be reported over 10 rounds of experiments. The experimental results of our approaches are reported as high-resolution image classification accuracy after majority voting over low-resolution image blocks.

Using the 5 x 2 cross-validation in our scenario, with our proposed approach acting on the 2809 valid (inside borders) 28 x 28 grayscale blocks and the 2116 valid 32 x 32 RGB blocks from each of the 1000 images in our dataset, we end up with 500 x 2809 = 1,404,500 grayscale blocks (for the MNIST-based networks) and 500 x 2116 = 1,058,000 RGB blocks (for the CIFAR-based network) to train the classifier, and the same numbers of blocks to test it. This holds in all 10 rounds of experiments, so our training ratio is always 50% of the data and the remaining 50% is the testing ratio. According to a study conducted by Dietterich (1998), the 5 x 2 cross-validation is an optimal experimental protocol for comparing learning algorithms.
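A sketch of the 5 x 2 protocol just described (splits are over whole images, so all blocks of an image inherit its image's split; the random seed is arbitrary):

import numpy as np

def five_by_two_splits(n_images, seed=0):
    # 5 x 2 cross-validation: five random halvings of the image set; each
    # half serves once as training set and once as test set (10 rounds).
    rng = np.random.default_rng(seed)
    for _ in range(5):
        perm = rng.permutation(n_images)
        d1, d2 = perm[:n_images // 2], perm[n_images // 2:]
        yield d1, d2   # round k:     train on D1, test on D2
        yield d2, d1   # round k + 1: train on D2, test on D1

rounds = list(five_by_two_splits(1000))
print(len(rounds), len(rounds[0][0]))   # 10 rounds, 500 training images each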
For the second experimental scenario, we show the classification performance on image blocks. In this case, we use one combination of training and test images. We intend to show with this configuration how the proposed approaches and some state-of-the-art methods behave when classifying only small, low-resolution images containing granite patterns. Here results are reported on a block basis (without majority voting).

We use accuracy to evaluate the performance of the algorithms tested. In a multi-class problem with c classes, the classification results may be represented in a c x c confusion matrix M, whose main diagonal contains the true positives while the other entries contain false positives or false negatives. The accuracy of an experiment round is calculated as

accuracy = \sum_{i=1}^{c} M(i, i) / \sum_{i=1}^{c} \sum_{j=1}^{c} M(i, j),   (5)

where i and j are the row and column indexes of M, respectively. In the 5 x 2 cross-validation protocol, one confusion matrix is produced per experiment round; we therefore present results by averaging the accuracies obtained from these matrices.

The other metrics shown in the experiments are commonly applied in CNN image recognition tasks and are used here to choose the number of epochs to train our proposed networks. These are the top-1 and top-5 errors, widely used in the ImageNet Large Scale Visual Recognition Challenge and other tasks, as reported by Russakovsky et al. (2015). Commonly, a CNN returns for an image i the j most probable classes {c_{i1}, ..., c_{ij}} in the output of its last layer as the prediction for that image. If we define c_{ij} as a prediction for image i and C_i as its ground truth, the prediction is considered correct if c_{ij} = C_i for some j. Defining the error of a prediction as d(c_{ij}, C_i) = 0 if c_{ij} = C_i and 1 otherwise, the error of an algorithm is the fraction of test images on which it makes a mistake:

err = (1/N) \sum_{i=1}^{N} \min_j d(c_{ij}, C_i),   (6)

where N is the total number of images used in the validation of a CNN. The difference between the top-1 and top-5 errors is the value used for j: in top-1, j = 1, and in top-5, j = 5. In other words, the top-1 error checks whether the top class (the one with the highest probability, c_{i1}) equals the target label C_i, while the top-5 error scores whether the target label is among the top 5 predictions (the 5 with the highest probabilities, {c_{i1}, ..., c_{i5}}). The top-5 error is normally used only for images with more than one object and is not considered in our scenario (we use only the top-1 error).
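Both metrics are straightforward to compute; a sketch with a hypothetical confusion matrix:

import numpy as np

def accuracy(confusion):
    # Eq. (5): trace of the confusion matrix over the total count.
    return np.trace(confusion) / confusion.sum()

def top1_error(pred_labels, true_labels):
    # Eq. (6) with j = 1: fraction of images whose top prediction is wrong.
    return float(np.mean(np.asarray(pred_labels) != np.asarray(true_labels)))

M = np.array([[48, 2], [5, 45]])   # hypothetical 2-class confusion matrix
print(accuracy(M))                  # 0.93
print(top1_error([0, 1, 1, 0], [0, 1, 0, 0]))   # 0.25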
5.3. Implementation aspects of the proposed approaches

To implement our proposed approaches we used the CNN library for MATLAB available at VLFEAT (2016). We used the CIFAR and MNIST1 architectures, training them from scratch, and extended the MNIST1 network to create the new networks MNIST2 and MNIST3. To create feature extractors, we remove the last layer from the trained networks and feed them again with training images, generating training feature vectors to be the input of the 1NN classifier. The same process is repeated for the testing images.

For the early fusion approach, applied only to the grayscale-input networks (MNIST-related CNNs), we use the three networks trained from scratch to extract training and test feature vectors, concatenating the vectors from the three networks and normalizing them by Eq. (3). These vectors are then the input of the 1NN classifier and, to classify an image, we perform majority voting over its blocks. We denominate this approach {MNIST1, MNIST2, MNIST3}_concat.

Finally, we test the complementarity of the three grayscale networks (MNIST-related networks) using majority voting per block. To do this, in the test phase we apply each network individually to the same block. The three feature vectors are then fed into three independent 1NN classifiers, previously trained with feature vectors from their corresponding CNNs. The result for a block is the majority vote of the three 1NN classifications, and a final majority vote over blocks defines the class of the test image. We label this approach {MNIST1, MNIST2, MNIST3}_vote.

The MNIST-based networks were trained using batches of 100 images, with the learning rate fixed at 0.001. We used stochastic gradient descent with momentum equal to 0.9 and weight decay equal to 0.0005, without dropout. The CIFAR network was trained using batches of 100 images, with stochastic gradient descent and momentum equal to 0.9; the weight decay is 0.0001, without dropout. Upon acceptance of this paper, all the source code of the proposed approaches will be made available on GitHub (www.github.com/anselmoferreira/granite-cnn-classification).
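A sketch of the {MNIST1, MNIST2, MNIST3}_vote test phase, with 1NN implemented directly in NumPy and synthetic stand-ins for the three networks' feature vectors:

import numpy as np
from collections import Counter

def predict_1nn(train_feats, train_labels, test_feat):
    # 1NN: assign the label of the closest training feature vector.
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

def late_fusion_vote(per_network_train, per_network_test_feat):
    # One independent 1NN per network; the block label is the majority
    # vote of the three predictions.
    votes = [predict_1nn(tf, tl, x)
             for (tf, tl), x in zip(per_network_train, per_network_test_feat)]
    return Counter(votes).most_common(1)[0][0]

# Synthetic stand-ins: 10 training blocks per network, 2 classes.
rng = np.random.default_rng(1)
train = [(rng.random((10, d)), np.arange(10) % 2) for d in (500, 256, 256)]
block = [rng.random(d) for d in (500, 256, 256)]
print(late_fusion_vote(train, block))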
5.4. Baselines

To compare our approach against the state of the art, we initially chose ten texture descriptors as baselines; some were already used for granite image classification and others were applied in other texture recognition applications.

The first five baseline approaches can be regarded as general-purpose texture descriptors used in several applications. The first three are the statistics of Gray-Level Co-occurrence Matrices from Haralick, Shanmugam, and Dinstein (1973) (labeled GLCM in the experiments), the Local Binary Patterns from Ojala, Pietikäinen, and Harwood (1996) (labeled LBP) and the Histogram of Oriented Gradients from Dalal and Triggs (2005) (labeled HOG). The GLCM approach builds statistics calculated over matrices of neighborhood relations for given directions and offsets (distances between pixels); LBP can be regarded as a histogram of neighborhood relations between a pixel and its eight neighbors; and HOG is a histogram of gradient orientations over regions of interest of the image. The other two descriptors are the Dominant Local Binary Patterns of Bianconi, González, and Fernández (2015b) (labeled DLBP) and the Rotation Invariant Co-occurrences of patterns from González, Fernández, and Bianconi (2014) without feature selection (labeled Cri1). These last two approaches were validated on the same granite image dataset we are using, first presented in the paper of Bianconi et al. (2015a).

The other five descriptors used as baselines were originally proposed for specific texture description applications and were never used for texture characterization of granite tiles. The first three are based on the Convolutional Texture Gradient Filter (CTGF), proposed by Ferreira, Navarro, Pinheiro, dos Santos, and Rocha (2015) to identify texture patterns of printed letters and attribute the source laser printer of a document. This descriptor measures the texture of an image as histograms of convolved textures in low-gradient areas, using convolution matrices of different sizes. We label the CTGF-based approaches CTGF_3x3, CTGF_5x5 and CTGF_7x7. The other two approaches come from the same work of Ferreira et al. (2015): multidirectional extensions of GLCMs (called GLCM-MD in the experiments) and multidirectional, multi-scale extensions of GLCMs (called GLCM-MD-MS).

Finally, we also used existing pre-trained CNNs to compare against our networks trained from scratch. These are the original networks used in the CIFAR general-purpose image recognition challenge and the MNIST digit recognition challenge, available for download in the MatConvNet library from VLFEAT (2016). As with our approach, we removed the last layer and applied the feature vectors to the nearest neighbor classifier. We call these approaches CIFAR_original and MNIST_original, respectively.
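For reference, a simplified sketch of the basic 3 x 3 LBP idea (after Ojala et al., 1996); this is our illustration, and the exact LBP variant and parameters used in the experiments may differ:

import numpy as np

def lbp_histogram(gray):
    # Threshold each interior pixel's eight neighbours at the centre value,
    # read the resulting bits as a byte and histogram the 256 codes.
    c = gray[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx]
        code += (nb >= c).astype(np.int32) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()

print(lbp_histogram(np.random.rand(32, 32)).shape)   # (256,)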
6. Experiments

We now focus on the experimental results. For this, we apply the methodology and compute the metrics discussed in Section 5.2 using the dataset presented in Section 5.1. All experiments were performed on a cluster node with 11 processors, 131 GB of RAM and an NVIDIA GeForce 210 graphics card. This section is organized as follows: first, we show how we chose the number of epochs used to train our networks; then we compare our proposed approaches against the state-of-the-art techniques described in Section 5.4 in two scenarios, using big images and small blocks. These two last experiments were chosen to show the effectiveness of the proposed approaches in classifying granite rock images of different resolutions (very high and very low).

6.1. Defining epochs to train CNNs

Before training our proposed CNNs from scratch on granite images, the number of epochs to train each network, i.e., the number of forward and backward passes of all training examples through the network, must be defined. For this experimental scenario we report the classification results of the first combination of training and validation sets, using the classifier attached to the end of the network (i.e., a softmax classifier). In this case, we further subdivide each image into small blocks and use them to train and validate the classifier. One natural solution would be to use the whole image containing the granite tile as the input to the CNNs, but this would require deeper networks, which in turn require more data, more computational time and more memory resources to train. Using smaller areas as input does not require as many layers as using the whole image and can also lead to faster learning of the network parameters and weights.

Using our proposed procedure of classifying small blocks, we choose the number of epochs for each network (MNIST1, MNIST2, MNIST3 and CIFAR) as the one with the lowest top-1 validation error (valtop1e). After 18 epochs, the best values for the MNIST-based networks are, respectively, 10, 15 and 15 epochs, as Fig. 10 shows. For the CIFAR network, we choose the epoch with the lowest top-1 validation error over 150 epochs, which is found at the 142nd epoch. As we used many more epochs in the CIFAR search for the best validation epoch, its lowest top-1 error would not be clearly visible in the graph, so we exclude it from Fig. 10 for the sake of clarity.

Fig. 10. Validation error results over 18 training epochs of the proposed (a) MNIST1, (b) MNIST2 and (c) MNIST3 CNNs for granite image classification. The errors were measured for 28 x 28 granite block classification; the smallest validation error (valtop1e) is found at the 10th, 15th and 15th epoch, respectively.

After finding the number of epochs, we obtain the model used to extract feature vectors by training the networks for the epochs found in this validation experiment. We then use these networks to extract feature vectors from images, applying training and testing images to the networks and using the output of the last-but-one layer to feed a multiclass classifier based on nearest neighbor classification, as described in Section 4.4. The following experiments consider this scenario.
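The epoch choice reduces to an argmin over the per-epoch validation errors; a sketch with a synthetic error curve shaped roughly like Fig. 10(a), not the measured values:

import numpy as np

# Hypothetical per-epoch validation top-1 errors for one network.
val_top1e = np.array([0.61, 0.34, 0.21, 0.15, 0.12, 0.10, 0.09, 0.08,
                      0.08, 0.07, 0.08, 0.09, 0.08, 0.09, 0.08, 0.09,
                      0.10, 0.09])
best_epoch = int(np.argmin(val_top1e)) + 1   # epochs are 1-indexed
print(best_epoch)                            # 10 for this synthetic curve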
6.2. Comparison against baselines in high resolution images

We now compare our proposed approaches against the state of the art on the classification of the high-resolution images of the dataset. Table 1 shows the results. For this task, our approaches perform the classification of test images by majority voting over the 2809 valid 28 x 28 grayscale blocks for the MNIST-based networks and the 2116 valid 32 x 32 RGB blocks for the proposed CIFAR-based network.

Table 1
Results comparing the best configurations of the proposed method to the existing methods in the literature after 5 x 2 validation. The proposed CNN methods and CNN fusion approaches are CIFAR, MNIST1, MNIST2, MNIST3 and the two {MNIST1, MNIST2, MNIST3} fusions.

Method                               | Mean ± Std. dev. | Min     | Max
CIFAR                                | 100% ± 0.00      | 100.00% | 100.00%
CTGF_3x3 (Ferreira et al., 2015)     | 100% ± 0.00      | 100.00% | 100.00%
Cri1 (González et al., 2014)         | 99.96% ± 0.12    | 99.60%  | 100.00%
CTGF_5x5 (Ferreira et al., 2015)     | 99.92% ± 0.10    | 99.80%  | 100.00%
CTGF_7x7 (Ferreira et al., 2015)     | 99.70% ± 0.14    | 99.60%  | 100.00%
DLBP (Bianconi et al., 2015b)        | 99.88% ± 0.16    | 99.60%  | 100.00%
MNIST2                               | 99.32% ± 0.70    | 97.80%  | 100.00%
LBP (Ojala et al., 1996)             | 98.96% ± 0.64    | 97.80%  | 99.80%
MNIST3                               | 98.16% ± 0.56    | 97.00%  | 99.00%
{MNIST1, MNIST2, MNIST3}_vote        | 96.50% ± 0.99    | 94.80%  | 97.60%
CIFAR_original                       | 94.84% ± 0.52    | 94.00%  | 96.00%
GLCM-MD-MS (Ferreira et al., 2015)   | 92.04% ± 1.09    | 90.00%  | 93.80%
GLCM-MD (Ferreira et al., 2015)      | 90.18% ± 1.43    | 88.40%  | 93.20%
MNIST1                               | 86.62% ± 1.61    | 84.20%  | 88.80%
HOG (Dalal & Triggs, 2005)           | 78.04% ± 1.50    | 76.20%  | 80.80%
GLCM (Haralick et al., 1973)         | 72.74% ± 1.97    | 68.80%  | 74.60%
MNIST_original                       | 62.98% ± 2.12    | 60.80%  | 66.40%
{MNIST1, MNIST2, MNIST3}_concat      | 26.44% ± 25.89   | 0.00%   | 67.00%

As can be seen in Table 1, our deeper network applied to color tiles, CIFAR, is the best network for classifying granite rocks, showing perfect detection in all 10 rounds of experiments. This happens because the layers in this network better decompose the color information of these images, highlighting important patterns to be used in the classification.

One interesting point is the good performance of texture descriptors that were proposed for other applications, such as the ones from Ferreira et al. (2015). The best state-of-the-art approach, CTGF_3x3, also showed perfect detection in all ten rounds of experiments. This happens because CTGF builds histograms of low-pass textures in flat areas of the images, considering only the texture of these flat areas instead of edge information and abnormal imperfections that can differentiate images of the same granite class.

Table 1 also shows that the proposed MNIST1 and MNIST3 networks do not yield good results compared with most of the state of the art. This severely impacts the proposed fusion approaches, early fusion ({MNIST1, MNIST2, MNIST3}_concat) and late fusion ({MNIST1, MNIST2, MNIST3}_vote), because the feature vectors and outputs of the MNIST1 and MNIST3 networks do not describe the granite images as effectively as MNIST2 does. One step toward a solution could be training new and deeper networks to be used in the fusion.

Finally, it is worth discussing the results of the pre-trained models in classifying granite slabs. As seen in Table 1, the pre-trained models CIFAR_original and MNIST_original show poor classification results. This happens because these models were taught to identify patterns different from the ones present in granite tiles, so their parameters (or filter weights) were not fitted to discriminate granite slab patterns, strongly affecting their results.

6.3. Comparison against baselines in low resolution images

For the final round of experiments comparing with the state of the art, we focus on the classification of small 32 x 32 blocks. For this, we consider the classification results without majority voting, using the first split of training and test data. This results in a total of 1000 (images) x 2116 (blocks) = 2,116,000 blocks of size 32 x 32, half of them used to train and the other half to test the classifier. Fig. 11 shows the accuracy of our best proposed approach against the best state-of-the-art methods from the experiments of Section 6.2.

Fig. 11. Results considering the classification of 32 x 32 granite image blocks.

The results in Fig. 11 highlight the difficulty of classifying very tiny images. This can be explained by the fact that they contain very little color information, which can be confused with that of other granite classes, decreasing the classification accuracy. Even in this difficult scenario, our proposed CIFAR CNN, trained to classify these small blocks with back-propagation of error and more layers, showed a classification accuracy higher than 85% using the same nearest neighbor classifier as in the experiments of Section 6.2, correctly classifying a total of 923,231 blocks out of 1,058,000. The high classification accuracy on low-resolution blocks explains why the majority voting of individual blocks helped our proposed approach reach the 100% accuracy shown in Table 1.

The poor results of the feature engineering approaches of Ferreira et al. (2015) and González et al. (2014) are due to the inherent characteristic of these approaches as global descriptors, not designed to classify low-resolution images. For example, the CTGF_3x3 approach of Ferreira et al. (2015), which classifies correctly only 351,760 blocks, creates histograms of textures over convolved areas with a 3 x 3 window. As the areas used to describe the granite tiles contain only 32 x 32 = 1024 pixels, the final histogram used as feature vector contains many unused bins, affecting the accurate description of these small granite tiles. The low accuracy of the literature solutions in Fig. 11 seriously decreases the possibility of using these approaches for image classification by majority voting of image patches, as our proposed approach does. Another important aspect is the time to extract the feature vectors: while our network trained from scratch took approximately one hour to extract 1,058,000 feature vectors, the approaches of Ferreira et al. (2015) and González et al. (2014) took approximately six days each to extract the same number. This highlights the efficiency of the proposed method in this multiple-data scenario.

With the results presented in this section, we show that the proposed approach achieves perfect detection on high-resolution images, with results comparable to the state of the art, and that on low-resolution granite rock blocks it outperforms the state of the art comfortably. This indicates, firstly, the potential of the proposed method as a complementary approach to others in the literature for industrial applications in controlled environments. Moreover, it can be a good starting point for classifying rocks in uncontrolled environments, for example using smartphones with different resolutions as acquisition devices for granite recognition, acting as a possible expert to solve misunderstandings between customers and manufacturers when a wrong delivery happens.
7. Conclusion

The creation of quality-control procedures for sorting natural stones such as granite is a promising task, as they avoid economic losses due to the delivery of wrong granite packages to clients. The automation of such a process is also important to reduce errors and eliminate a subjective process involving expensive experts. However, most of the applications proposed in the literature for this task rely on feature engineering approaches that investigate texture and color behaviors which are supposed not to change. Additionally, they do not validate the approaches on the difficult task of classifying natural stones at different image resolutions.

In this paper, we addressed these issues by proposing a deep learning based approach to granite rock classification. Our approaches are based on Convolutional Neural Networks of different architectures applied to small image tiles, analyzing the texture of each tile and using majority voting over tiles to classify high-resolution images. We also investigated the performance of some of the networks when applied together. Experimental results showed good performance of the proposed approaches for classifying high- and low-resolution images compared to other texture descriptors proposed in the literature.

Although there is a long path toward the classification of natural stones in real-world situations, in which different resolutions and lighting conditions of granite rocks can occur, we believe the way forward involves deep learning approaches through Convolutional Neural Networks. As our proposed approaches deal with tiny patches, they have the potential to be invariant to most acquisition resolutions, as long as the acquired images are bigger than the input patches that fit our networks. This is an important step toward making granite slab recognition available on other acquisition devices, such as smartphones. Also, the proposed approaches can be complementary to other approaches in industrial slab recognition applications, as they showed results comparable to the state of the art in classifying high-resolution images.

The work started in this paper opens a set of future directions, which involve: (i) the proposal of new and deeper network architectures for this task; (ii) the use of Convolutional Neural Networks applied to other image representations; (iii) testing new ways of using different network architectures; (iv) studying the complementarity of these data-driven approaches with feature engineering techniques; and (v) experiments considering the open-set scenario of Scheirer, Rocha, Sapkota, and Boult (2013), in which the classifier is designed to also consider unknown samples in the training process.

Acknowledgments

This work was supported partly by NSFC (61332012, U1636202) and the Shenzhen R&D Program (JCYJ20160328144421330). We also thank the support of the Brazilian National Council for Scientific and Technological Development (Grant #312602/2016-2) and professors Nuria Fernández, Bruno Montandon Noronha Barros and Millena Basílio da Silva for the discussions that originated this research.
References

Araújo, M., Martínez, J., Ordóñez, C., & Vilán, J. A. (2010). Identification of granite varieties from colour spectrum data. Sensors, 10(9), 8572–8584.
Bianconi, F., Bello, R., Fernández, A., & González, E. (2015a). On comparing colour spaces from a performance perspective: Application to automated classification of polished natural stones. In New trends in image analysis and processing. Lecture notes in computer science: Vol. 9281 (pp. 71–78). Genoa, Italy: Springer.
Bianconi, F., & Fernández, A. (2006). Granite texture classification with Gabor filters. In Proceedings of the international congress on graphical engineering (INGEGRAF), Sitges, Spain.
Bianconi, F., González, E., & Fernández, A. (2015b). Dominant local binary patterns for texture classification: Labelled or unlabelled? Pattern Recognition Letters, 65, 8–14.
Bianconi, F., González, E., Fernández, A., & Saetta, S. A. (2012). Automatic classification of granite tiles through colour and texture features. Expert Systems with Applications, 39(12), 11212–11218.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the international conference on computer vision & pattern recognition, California, USA: Vol. 2 (pp. 886–893).
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10, 1895–1923.
Ershad, S. F. (2011). Color texture classification approach based on combination of primitive pattern units and statistical features. Multimedia and its Applications, 3(3), 1–13.
European Committee for Standardization (CEN) (2009). EN 12440:2008 Natural stone: Denomination criteria. Report. European Committee for Standardization (CEN).
Fernández, A., Ghita, O., González, E., Bianconi, F., & Whelan, P. F. (2011). Evaluation of robustness against rotation of LBP, CCR and ILBP features in granite texture classification. Machine Vision and Applications, 22(6), 913–926.
Ferreira, A., Navarro, L. C., Pinheiro, G., dos Santos, J. A., & Rocha, A. (2015). Laser printer attribution: Exploring new features and beyond. Forensic Science International, 247, 105–125.
González, E., Fernández, A., & Bianconi, F. (2014). General framework for rotation invariant texture classification through co-occurrence of patterns. Journal of Mathematical Imaging and Vision, 50(3), 300–313.
Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6), 610–621.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision (ICCV), Santiago, Chile (pp. 1026–1034).
Hof, R. (2013). Deep learning. With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Artificial intelligence is finally getting smart. https://www.technologyreview.com/s/513696/deep-learning/.
Johnson, J., & Karpathy, A. (2016). CS231n convolutional neural networks for visual recognition. http://cs231n.github.io/transfer-learning/.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of neural information processing systems (NIPS), Nevada, USA (pp. 1106–1114).
Kurmyshev, E. V., Sánchez-Yáñez, R. E., & Fernández, A. (2013). Colour texture classification for quality control of polished granite tiles. In Proceedings of the third IASTED international conference on visualization, imaging and image processing: Vol. 2 (pp. 603–608). ACTA Press.
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lepisto, L., Kunttu, I., & Visa, A. (2005). Rock image classification using color features in Gabor space. Journal of Electronic Imaging, 14(4), 040503-1–040503-3.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29, 51–59.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Scheirer, W. J., Rocha, A., Sapkota, A., & Boult, T. E. (2013). Towards open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 1757–1772.
Stone Industry SA Standards Board (2015). Stone standards. Report. Stone Industry SA Standards Board.
University of Tennessee (2008). Granite dimensional stone quarrying and processing. Report. University of Tennessee – Center for Clean Products.
VLFEAT (2016). MatConvNet: CNNs for MATLAB. http://www.vlfeat.org/matconvnet/.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320–3328). Montreal, Canada: Curran Associates, Inc.