Convolutional Neural Network approaches to granite tiles classification

Anselmo Ferreira (a,*), Gilson Giraldi (b)

a Shenzhen Key Laboratory of Media Security, College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen, Guangdong 518060, PR China
b National Laboratory for Scientific Computing (LNCC), Av. Getúlio Vargas 333, Quitandinha, Petrópolis, Rio de Janeiro 22230000, Brazil
* Corresponding author. E-mail addresses: anselmo@szu.edu.br, anselmo.ferreira@gmail.com (A. Ferreira), gilson@lncc.br (G. Giraldi).

Expert Systems With Applications 84 (2017) 1–11. Received 1 February 2017; revised 26 April 2017; accepted 27 April 2017; available online 4 May 2017.

Keywords: Granite classification; Convolutional Neural Networks; Deep learning

Abstract

The quality control process in the stone industry is a challenging problem nowadays. Because different rocks with the same mineralogical content can have very similar visual appearance, the industry can suffer economic losses if clients cannot properly recognize the rocks delivered as the ones initially purchased. In this paper, we move toward the automation of rock-quality assessment at different image resolutions by proposing the first data-driven technique applied to granite tiles classification. Our approach learns intrinsic patterns in small image patches through Convolutional Neural Networks tailored for this problem. Experiments comparing the proposed approach to texture descriptors on a well-known dataset show the effectiveness of the proposed method and its suitability for applications under some uncontrolled conditions, such as classifying granite tiles at different image resolutions.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Natural stone is the first building material used by humanity and it continues to be used in new structures. Among the natural stones, granite has received particular interest due to its beauty and strength. Granite is an intrusive igneous rock widely distributed throughout Earth's crust at depths of up to 31 miles (50 km), according to University of Tennessee (2008).

Since there are different colors of granite, their denomination varies depending on the country. Even with some standards in place, such as the one from the European Committee for Standardization (CEN) (2009) and, more recently, from the Stone Industry SA Standards Board (2015), there are cases in which the visual appearance of different granites with the same mineralogical content does not differ significantly, making it time-consuming and economically expensive for the stone industry to check, slab by slab, whether the product colors to be delivered to clients are the same as the colors purchased. Most of the time, the quality assessment is done by skilled professionals in a subjective procedure that can fail. The international operations of stone industries require this procedure to be faster and more accurate.
Automated systems can play an important role in this scenario. Granite companies are interested in computer vision allied with machine learning procedures able to identify patterns in granite images, using these patterns to sort and grade granite products based on their visual appearance, thereby helping product traceability and warehouse management. Also, an on-site tool that can identify the real type of a product can help solve possible misunderstandings between customers and manufacturers at the time of a delivery.

This challenge has received some attention in the scientific literature in the past years, and studies applying well-known color and texture descriptors have been performed, such as the works of Kurmyshev, Sánchez-Yáñez, and Fernández (2013), Bianconi and Fernández (2006), Lepisto, Kunttu, and Visa (2005), Araújo, Martínez, Ordóñez, and Vilán (2010) and Bianconi, González, Fernández, and Saetta (2012). However, most of these are general-purpose descriptors created for controlled scenarios, where the slabs used for experiments have the same size and the images acquired have the same resolution. The descriptors applied to granite slab classification to date generate what are called hand-crafted features, built by feature engineering based on a behavior that is expected to appear in all granite images. Finally, there is no study regarding the application of these descriptors to slab pieces.

In this paper, we go beyond the use of hand-crafted features and propose what is, as far as we know, the first data-driven approach to identify granite patterns. The proposed technique uses deep Convolutional Neural Networks (CNNs) with different architectures, trained to classify the intrinsic patterns of granite tiles based on texture and color. Instead of designing a very deep neural network to be applied to high-resolution images, a process that requires too many layers in the architecture and thousands of images to train the network, our approach uses lightweight neural networks on small patches of granite images, taking the majority vote over the patch classifications as the image classification. This makes the proposed approach able to classify slabs of different sizes, different image resolutions and even slab pieces. Experiments comparing our approach against hand-crafted descriptors and pre-trained networks show the effectiveness of the proposed technique.

In summary, the main contributions of this paper are:

1. Design and development of ad-hoc CNNs for granite tiles classification.
2. Application of CNNs on multiple image data, represented by small patches located in regions of interest of granite slab images, to learn features that lead to high recognition accuracy through majority voting over patch classifications.
3. Validation of the proposed methodology against descriptors not used before for granite classification.
The remainder of this paper is organized as follows: Section 2 discusses related work on color and texture descriptors in the literature, including studies applying them to granite classification. Section 3 presents the basic concepts of CNNs, which are necessary to understand the proposed approach. Section 4 presents our approach for granite color classification and Section 5 reports the details of the experimental methodology used to validate the proposed method against existing counterparts in the literature. Section 6 shows the experiments and results, and Section 7 reports our final considerations and proposals for future work.

2. Related work

The problem of classifying materials by their type can be regarded as a problem of classifying textures, colors and geometrical features. To create solutions in this regard, the scientific literature has focused on applying computer-vision approaches allied with machine learning to grade different materials.

In the specific case of granite rocks, Ershad (2011) presented a method based on Primitive Pattern Units, a morphological operator applied to each color channel separately. Statistical features are then used to discriminate different classes of natural stone such as granite, marble, travertine and hatchet. Kurmyshev et al. (2013) used the coordinated clusters representation (CCR) for classifying granite tiles of the type "Rosa Porriño". Bianconi and Fernández (2006) employed different Gabor filter banks for granite classification. Lepisto et al. (2005) used a similar approach, but applied it to each color channel separately. Araújo et al. (2010) employed a spectrophotometer to capture spectral data at different regions of interest in granite tiles and used Support Vector Machines to grade them. Fernández, Ghita, González, Bianconi, and Whelan (2011) studied how common image descriptors such as Local Binary Patterns, Coordinated Clusters Representation and Improved Local Binary Patterns behaved in granite image classification under rotation.

Most approaches described in the literature are based on textural features alone. According to the study of Bianconi et al. (2012), this is somewhat surprising, since the visual appearance of granite tiles strongly depends on both texture and color. To this end, they performed a study using several texture and color descriptors with five different classifiers, showing that combining color and texture features and classifying them with Support Vector Machines outperforms previous methods based on textural features alone.

Finally, in a recent work, Bianconi, Bello, Fernández, and González (2015a) investigated the problem of choosing an adequate color representation for granite grading. They discussed pros and cons of different color spaces for granite classification by performing experiments with very simple color descriptors: mean, mean + standard deviation, mean + moments from 2nd to 5th, and quartiles and quintiles of each color channel. They showed that, depending on the classifier used, some color spaces are better than others for classification, such as the Lab and Luv spaces for the linear classifier.

As can be noticed, the solutions presented for granite color classification are based only on feature engineering. In other words, these approaches are driven by patterns that are supposed to hold over all investigated images and also require expert knowledge. Veering away from these methods, we propose a deep learning based approach, which extracts meaningful discriminative patterns straight from the data by using small image patches instead of ordinary feature engineering on whole images.
To do this, our approach exploits the back-propagation of error used in Convolutional Neural Networks to automatically learn the discriminant features present in the image textures. Before we discuss our proposed method to perform granite grading based on deep learning, it is worth discussing some basic concepts about deep neural networks in the next section.

3. Basic concepts

Convolutional Neural Networks have attracted considerable attention from the computer vision research community recently, mainly because of their effective results in several image classification tasks, outperforming even humans in certain situations according to He, Zhang, Ren, and Sun (2015), and winning several image classification challenges, such as the ImageNet image recognition contest described by Krizhevsky, Sutskever, and Hinton (2012). Pioneered by the work of Lecun, Bottou, Bengio, and Haffner (1998), CNNs are regarded as a deep learning application to images and, as such, they simulate the activity in layers of neurons in the neocortex, the place where most thinking happens, according to Hof (2013). The network learns to recognize patterns by highlighting in its layers the edges and pixel behaviors that are commonly found across different images.

The main benefit of using CNNs with respect to traditional fully-connected neural networks is the reduced number of parameters to be learned. Convolutional layers made of small kernels provide an effective way of extracting high-level features that are fed to fully-connected layers. The training of a CNN is performed through back-propagation and stochastic gradient descent, as Rumelhart, Hinton, and Williams (1986) describe. The misclassification error drives the weight updates of both convolutional and fully-connected layers. The basic layers of a CNN are listed below:

1. Input layer: where data is fed to the network. Input data can be either raw image pixels or their transformations, whichever better emphasizes specific aspects of the image.
2. Convolutional layers: contain a series of fixed-size filters used to perform convolution on the image data, generating what is called a feature map. These filters can highlight patterns helpful for image characterization, such as edges, textures, etc.
3. Pooling layers: these layers ensure that the network focuses only on the most important patterns. They summarize the data by sliding a window across the feature maps and applying a linear or non-linear operation to the data within the window, such as the local mean or max, reducing the dimensionality of the feature maps used by the following layers.
4. Rectified Linear Unit (ReLU): ReLU layers apply a non-linear function to the output x of the previous layer, such as f(x) = max(0, x). According to Krizhevsky et al. (2012), they speed up training by mitigating the vanishing gradient problem, keeping the gradient more or less constant across all network layers.
5. Fully-connected layers: used for interpreting the patterns generated by the previous layers. Neurons in this layer have full connections to all activations in the previous layer. They are also known as inner product layers. Once trained, transfer learning approaches can extract features from these layers to train another classifier.
6. Loss layers: specify how the network training penalizes the deviation between the predicted and true labels; this is normally the last layer in the network. Various loss functions appropriate for different tasks can be used: Softmax, Sigmoid Cross-Entropy, Euclidean loss, among others.

Fig. 1 depicts one possible CNN architecture: the input image is transformed into feature maps by the first convolutional layer C1, a pooling stage S1 reduces the dimensions across the feature maps, the same process is repeated for layers C2 and S2, and finally a classifier is trained on the data generated by layer S2. The type and arrangement of layers vary depending on the target application.
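To make these layer operations concrete, the following is a minimal NumPy sketch of the convolution -> ReLU -> pooling chain (this is our illustration, not the authors' MatConvNet code; the 3 x 3 kernel is a hypothetical hand-set edge filter, not a learned one):

import numpy as np

def conv2d(image, kernel):
    # Valid 2-D convolution of a grayscale image with one kernel.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)            # f(x) = max(0, x)

def max_pool(x, size=2):
    # Non-overlapping max pooling with a size x size window.
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

patch = np.random.rand(28, 28)               # one grayscale input patch
edge_kernel = np.array([[1., 0., -1.]] * 3)  # hypothetical 3 x 3 edge filter
feature_map = max_pool(relu(conv2d(patch, edge_kernel)))
print(feature_map.shape)                     # (13, 13): 26 x 26 map pooled by 2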
In this step the NN is investigating several regions of interest (which we call mul- iple data) to classify the entire image. We use 28 × 28 grayscale mages and 32 × 32 color images in our CNNs, described in the ext subsection. ( Figs. 5 and 6 ) .3. Recognition by Convolutional Neural Networks In the second step, labeled as step B in Fig. 2 , we use differ- nt Convolutional Neural Networks to recognize patterns of image atches. For this, in a training step, we apply tiles from training mages through the networks to estimate the filter weights for a etter classification. Then, we do what is called transfer learning s described by Johnson and Karpathy (2016) and Yosinski, Clune, engio, and Lipson (2014) , using the already trained network as a eature extractor in the train and test stages. To do that we extract he last layer (the softmax layer) of the already trained network nd then the new output will be feature vectors generated by the ast convolutional layer. Finally, these image characterizations will e used by another classifier in the training and testing steps. The umber of networks used are four and their architectures are as ollows. • MNIST1 network: based in a design used in the MNIST dataset digit recognition challenge ( VLFEAT, 2016 ), it has an input layer which requires 28 × 28 grayscale input images and is com- posed by eight other layers: four convolutional layers, two pool- ing layers, one RELU layer and one fully-connected layer. • MNIST2 network: we extended MNIST1 network, creating a new one composed by eleven layers: one input layer, five convolu- tional layers, three pooling layers, one RELU layer and a final fully connected layer. • MNIST3 network: we extended even more our initial MNIST1 network, creating a new one composed by thirteen layers: one input layer, six convolutional layers, four pooling layers, one RELU layer, and a final fully connected layer. • CIFAR network: based in a design used in the CIFAR image recognition challenge ( VLFEAT, 2016 ), it has an input layer which requires 32 × 32 RGB input images and is composed by twelve other layers: five convolutional layers, three pooling lay- ers, four RELU layers and one fully-connected layer. These networks will generate, respectively, feature vectors with 00, 256, 256 and 64 dimensions respectively, which we aim to se in another classifier. Fig. 3 shows the learned filters in the first ayer of each network. Figs. 4 –7 show the output of these filters n networks MNIST1, MNIST2, MNIST3 and CIFAR respectively and heir power to discriminate by convolutions some granite blocks resent on the dataset built in the work of Bianconi et al. (2015a) . Additionally to the use of these networks individually, we pro- ose the fusion of them using for that two approaches: 1. Early fusion: The feature vector of a block will be the fusion of descriptions (feature vectors) from the three networks. Con- catenating such feature vectors will yield a final feature vector 4 A. Ferreira, G. Giraldi / Expert Systems With Applications 84 (2017) 1–11 Fig. 2. Pipeline of the proposed approach based on Convolutional Neural Networks for granite classification. Fig. 3. Filter weights of the first layer from (a) MNIST1 network (b) MNIST2 network (c) MNIST3 network and (d) CIFAR network. a M n of 500 + 256 + 256 = 1012 dimensions, which will be used in a given classifier. 2. 
4.2. Patching

In the first step of our proposed approach (labeled as step A in Fig. 2), the image is divided into tiles of interest. Only tiles lying entirely inside the image borders are kept (no artificial pixel filling is done in the input images), and these tiles are used as input to the CNNs. This procedure is useful for the following reasons: (i) it generates more data for CNN training; (ii) it allows majority voting over tile classifications when testing a given input image; and (iii) it makes it possible to train and test the CNN on different image resolutions, as only the tiles of interest enter the network. This is important because CNNs are normally designed for input images of a given resolution; in our approach this is not a restriction, because only small granite blocks are used as input. In this step, the CNN investigates several regions of interest (which we call multiple data) to classify the entire image. We use 28 x 28 grayscale images and 32 x 32 color images in our CNNs, described in the next subsection.
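The tile counts reported in Section 5.1 (2809 blocks of 28 x 28 and 2116 blocks of 32 x 32 per 1500 x 1500 image) are consistent with a non-overlapping tiling that discards partial border tiles; a sketch under that assumption:

import numpy as np

def tile_image(image, block=28):
    # Split an image into non-overlapping block x block tiles, discarding
    # partial tiles at the right/bottom borders (no pixel filling).
    nh, nw = image.shape[0] // block, image.shape[1] // block
    tiles = [image[i * block:(i + 1) * block, j * block:(j + 1) * block]
             for i in range(nh) for j in range(nw)]
    return np.stack(tiles)

gray = np.random.rand(1500, 1500)
print(tile_image(gray, 28).shape)       # (2809, 28, 28): 53 x 53 whole tiles
print(tile_image(gray, 32).shape[0])    # 2116 = 46 x 46 whole tiles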
4.3. Recognition by Convolutional Neural Networks

In the second step, labeled as step B in Fig. 2, we use different Convolutional Neural Networks to recognize patterns in the image patches. For this, in a training step, we feed tiles from training images through the networks to estimate the filter weights. Then we perform what is called transfer learning, as described by Johnson and Karpathy (2016) and Yosinski, Clune, Bengio, and Lipson (2014), using the already trained network as a feature extractor in the training and testing stages. To do that, we remove the last layer (the softmax layer) of the trained network, so that the new output is the feature vector generated by the last convolutional layer. Finally, these image characterizations are used by another classifier in the training and testing steps. Four networks are used, with the following architectures:

- MNIST1 network: based on a design used for the MNIST digit recognition challenge (VLFEAT, 2016), it has an input layer requiring 28 x 28 grayscale images and is composed of eight further layers: four convolutional layers, two pooling layers, one ReLU layer and one fully-connected layer.
- MNIST2 network: an extension of MNIST1, composed of eleven layers: one input layer, five convolutional layers, three pooling layers, one ReLU layer and a final fully-connected layer.
- MNIST3 network: a further extension of MNIST1, composed of thirteen layers: one input layer, six convolutional layers, four pooling layers, one ReLU layer and a final fully-connected layer.
- CIFAR network: based on a design used for the CIFAR image recognition challenge (VLFEAT, 2016), it has an input layer requiring 32 x 32 RGB images and is composed of twelve further layers: five convolutional layers, three pooling layers, four ReLU layers and one fully-connected layer.

These networks generate feature vectors with 500, 256, 256 and 64 dimensions, respectively, which we use in another classifier. Fig. 3 shows the learned filters in the first layer of each network, and Figs. 4-7 show the outputs of these filters for networks MNIST1, MNIST2, MNIST3 and CIFAR, respectively, and their power to discriminate by convolution some granite blocks present in the dataset built in the work of Bianconi et al. (2015a).

Fig. 3. Filter weights of the first layer of (a) MNIST1, (b) MNIST2, (c) MNIST3 and (d) CIFAR.
Figs. 4-7. First-layer convolutions of MNIST1, MNIST2, MNIST3 and CIFAR, respectively, used to identify (a) "Giallo Veneziano", (b) "Sky Brown" and (c) "Verde Oliva" granite blocks. CIFAR convolutions are represented in gray for better visualization.

In addition to using these networks individually, we propose fusing them with two approaches:

1. Early fusion: the feature vector of a block is the concatenation of the descriptions (feature vectors) from the three MNIST-based networks, yielding a final feature vector of 500 + 256 + 256 = 1012 dimensions to be used in a given classifier.
2. Late fusion: the classification of a tile is given by the majority vote of three classifiers, fed with the feature vectors generated by the three networks applied to that tile.

Figs. 8 and 9 show, respectively, the early and late fusion pipelines for granite image classification using the MNIST-based CNNs considered in this paper.

Fig. 8. Pipeline of the early fusion of Convolutional Neural Networks for granite classification: three pre-trained networks with different architectures (MNIST1, MNIST2 and MNIST3) are applied to the same tiny input blocks, generating feature vectors that are concatenated and normalized to classify a given 28 x 28 block; the final classification of an image uses the majority vote of its blocks.
Fig. 9. Pipeline of the late fusion of Convolutional Neural Networks for granite classification: the three networks' feature vectors are classified individually, the class of a block is the majority vote of the three labels, and the same voting over all classified blocks classifies the whole image.

If the first approach (early fusion) is chosen, the next step is normalizing (or scaling) the concatenated feature vectors. There are several ways to do this, but we choose the simplest, which is dividing each vector by its norm. Given a feature vector V, its p-norm is calculated as

||V||_p = ( \sum_{i=1}^{n} |V(i)|^p )^{1/p},   (2)

where V(i) is the i-th vector element and n is the number of vector elements. For our application we choose p = 2, so the norm generated is the Euclidean norm. The final feature vector V_f is

V_f = V / ||V||_2.   (3)

With this approach the feature vector components are scaled so that each vector's magnitude is always one. This is important because, as the feature vectors to be concatenated come from different sources, the ranges of all features should be normalized so that each feature contributes approximately proportionately to the final feature vector.
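A minimal sketch of this early-fusion step, Eqs. (2)-(3) with p = 2, on hypothetical per-block descriptors of the dimensions stated above:

import numpy as np

def l2_normalize(v):
    # Eq. (3): scale a feature vector to unit Euclidean norm.
    return v / np.linalg.norm(v, ord=2)

# Hypothetical per-block descriptors from MNIST1, MNIST2 and MNIST3.
f1, f2, f3 = np.random.rand(500), np.random.rand(256), np.random.rand(256)
fused = l2_normalize(np.concatenate([f1, f2, f3]))   # early fusion, 1012-D
print(fused.shape, np.linalg.norm(fused))            # (1012,) 1.0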
4.4. Image patches classification

In the training and testing steps, we apply the feature vectors generated by the previously trained networks to another classifier. In the image patches classification step (labeled as step C in Fig. 2), we choose the 1st Nearest Neighbor (1NN) classifier to classify the tiny image blocks from the input images. Basically, this classifier works by assigning a testing sample to the class of its nearest neighbor defined in a training step. At the end of this process, each valid block of the image is classified.

4.5. Image classification

After the individual patch classifications, in step D in Fig. 2 we classify the image by finding the most predicted class among its block classifications. Given a vector x = {output_1, ..., output_b} containing the classifications of the b blocks in the image, the predicted class of a granite image G is

class(G) = mode(x),   (4)

where mode(x) is the most frequent class in vector x.
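Eq. (4) is a simple mode over block predictions; a sketch with hypothetical block labels:

from collections import Counter

def classify_image(block_labels):
    # Eq. (4): the image class is the mode of its block predictions.
    return Counter(block_labels).most_common(1)[0][0]

# Hypothetical predictions for five blocks of one slab image.
print(classify_image(["Rosa Porrino", "Verde Oliva", "Rosa Porrino",
                      "Rosa Porrino", "Verde Oliva"]))   # Rosa Porrino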
5. Experimental setup

Before we discuss our experimental results, we present in this section the materials and methods used to compare the proposed method against its counterparts in the literature. In the following subsections we present the granite tile dataset used, the methodology and metrics used to assess classification performance, the parameters used in our proposed approach and the state-of-the-art implementations used.

5.1. Image dataset

We used the same granite tiles dataset applied in the work of Bianconi et al. (2015a). It contains 1000 RGB images of 1500 x 1500 pixels, subdivided into 25 classes of 40 images each. The first 100 images were acquired naturally by a specific scanner; the other 900 were created by rotating the initial 100 images by 10, 20, 30, 40, 50, 60, 70, 80 and 90 degrees.

To suit our proposed approach, we subdivide these images into 28 x 28 valid blocks (i.e., blocks that do not cross the image borders) to be applied to the MNIST1, MNIST2 and MNIST3 networks, creating 2809 valid blocks per image. So, the dataset used in the experiments contains 2809 x 1000 = 2,809,000 tiny grayscale images. To use the proposed CIFAR network on RGB images, we subdivide the dataset images into 32 x 32 valid blocks, creating 2116 valid RGB blocks per image and a total of 2116 x 1000 = 2,116,000 tiny color images.

With these procedures we artificially create a large number of small images to be used in small networks, eliminating the requirement of big networks applied to a large number of high-resolution images. By subdividing the test image into blocks, we can increase classification accuracy through majority voting over the blocks, classifying a big image by its small parts.

5.2. Methodology and metrics

We validate the proposed method in two experimental scenarios: assessing the performance of classifying whole images (by majority voting over blocks) and of classifying small blocks. For the first scenario, we consider a 5 x 2 cross-validation protocol, in which we replicate the traditional 1 x 2 cross-validation protocol five times (thus 5 x 2). In each replication, we divide the set of images D randomly into equal subsets D1 and D2, with D1 U D2 = D and D1 n D2 = empty. The classifier is trained on D1 and tested on D2, and then the inverse is done. Repeating the process five times, metrics can be reported over 10 rounds of experiments. The experimental results of our approaches are reported as high-resolution image classification accuracy after majority voting over low-resolution image blocks.

Using the 5 x 2 cross-validation in our scenario, with our proposed approach acting on the 2809 valid (inside borders) 28 x 28 grayscale blocks and the 2116 valid 32 x 32 RGB blocks from each of the 1000 images in our dataset, we end up with 500 x 2809 = 1,404,500 grayscale blocks (for the MNIST-based networks) and 500 x 2116 = 1,058,000 RGB blocks (for the CIFAR-based network) to train the classifier, and the same numbers of blocks to test it. This holds in all 10 rounds of experiments, so our training ratio is always 50% of the data and the remaining 50% is the testing ratio. According to a study conducted by Dietterich (1998), the 5 x 2 cross-validation is an optimal experimental protocol for comparing learning algorithms.
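A sketch of the 5 x 2 protocol just described (splits are over whole images, so all blocks of an image inherit its image's split; the random seed is arbitrary):

import numpy as np

def five_by_two_splits(n_images, seed=0):
    # 5 x 2 cross-validation: five random halvings of the image set; each
    # half serves once as training set and once as test set (10 rounds).
    rng = np.random.default_rng(seed)
    for _ in range(5):
        perm = rng.permutation(n_images)
        d1, d2 = perm[:n_images // 2], perm[n_images // 2:]
        yield d1, d2   # round k:     train on D1, test on D2
        yield d2, d1   # round k + 1: train on D2, test on D1

rounds = list(five_by_two_splits(1000))
print(len(rounds), len(rounds[0][0]))   # 10 rounds, 500 training images each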
For the second experimental scenario, we show the classification performance on image blocks. In this case, we use one combination of training and test images. We intend to show with this configuration how the proposed approaches and some state-of-the-art methods behave when classifying only small, low-resolution images containing granite patterns. Here results are reported on a block basis (without majority voting).

We use accuracy to evaluate the performance of the algorithms tested. In a multi-class problem with c classes, the classification results may be represented in a c x c confusion matrix M, whose main diagonal contains the true positives while the other entries contain false positives or false negatives. The accuracy of an experiment round is calculated as

accuracy = \sum_{i=1}^{c} M(i, i) / \sum_{i=1}^{c} \sum_{j=1}^{c} M(i, j),   (5)

where i and j are the row and column indexes of M, respectively. In the 5 x 2 cross-validation protocol, one confusion matrix is produced per experiment round; we therefore present results by averaging the accuracies obtained from these matrices.

The other metrics shown in the experiments are commonly applied in CNN image recognition tasks and are used here to choose the number of epochs to train our proposed networks. These are the top-1 and top-5 errors, widely used in the ImageNet Large Scale Visual Recognition Challenge and other tasks, as reported by Russakovsky et al. (2015). Commonly, a CNN returns for an image i the j most probable classes {c_{i1}, ..., c_{ij}} in the output of its last layer as the prediction for that image. If we define c_{ij} as a prediction for image i and C_i as its ground truth, the prediction is considered correct if c_{ij} = C_i for some j. Defining the error of a prediction as d(c_{ij}, C_i) = 0 if c_{ij} = C_i and 1 otherwise, the error of an algorithm is the fraction of test images on which it makes a mistake:

err = (1/N) \sum_{i=1}^{N} \min_j d(c_{ij}, C_i),   (6)

where N is the total number of images used in the validation of a CNN. The difference between the top-1 and top-5 errors is the value used for j: in top-1, j = 1, and in top-5, j = 5. In other words, the top-1 error checks whether the top class (the one with the highest probability, c_{i1}) equals the target label C_i, while the top-5 error scores whether the target label is among the top 5 predictions (the 5 with the highest probabilities, {c_{i1}, ..., c_{i5}}). The top-5 error is normally used only for images with more than one object and is not considered in our scenario (we use only the top-1 error).
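Both metrics are straightforward to compute; a sketch with a hypothetical confusion matrix:

import numpy as np

def accuracy(confusion):
    # Eq. (5): trace of the confusion matrix over the total count.
    return np.trace(confusion) / confusion.sum()

def top1_error(pred_labels, true_labels):
    # Eq. (6) with j = 1: fraction of images whose top prediction is wrong.
    return float(np.mean(np.asarray(pred_labels) != np.asarray(true_labels)))

M = np.array([[48, 2], [5, 45]])   # hypothetical 2-class confusion matrix
print(accuracy(M))                  # 0.93
print(top1_error([0, 1, 1, 0], [0, 1, 0, 0]))   # 0.25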
5.3. Implementation aspects of the proposed approaches

To implement our proposed approaches we used the CNN library for MATLAB available at VLFEAT (2016). We used the CIFAR and MNIST1 architectures, training them from scratch, and extended the MNIST1 network to create the new networks MNIST2 and MNIST3. To create feature extractors, we remove the last layer from the trained networks and feed them again with training images, generating training feature vectors to be the input of the 1NN classifier. The same process is repeated for the testing images.

For the early fusion approach, applied only to the grayscale-input networks (MNIST-related CNNs), we use the three networks trained from scratch to extract training and test feature vectors, concatenating the vectors from the three networks and normalizing them by Eq. (3). These vectors are then the input of the 1NN classifier and, to classify an image, we perform majority voting over its blocks. We denominate this approach {MNIST1, MNIST2, MNIST3}_concat.

Finally, we test the complementarity of the three grayscale networks (MNIST-related networks) using majority voting per block. To do this, in the test phase we apply each network individually to the same block. The three feature vectors are then fed into three independent 1NN classifiers, previously trained with feature vectors from their corresponding CNNs. The result for a block is the majority vote of the three 1NN classifications, and a final majority vote over blocks defines the class of the test image. We label this approach {MNIST1, MNIST2, MNIST3}_vote.

The MNIST-based networks were trained using batches of 100 images, with the learning rate fixed at 0.001. We used stochastic gradient descent with momentum equal to 0.9 and weight decay equal to 0.0005, without dropout. The CIFAR network was trained using batches of 100 images, with stochastic gradient descent and momentum equal to 0.9; the weight decay is 0.0001, without dropout. Upon acceptance of this paper, all the source code of the proposed approaches will be made available on GitHub (www.github.com/anselmoferreira/granite-cnn-classification).
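A sketch of the {MNIST1, MNIST2, MNIST3}_vote test phase, with 1NN implemented directly in NumPy and synthetic stand-ins for the three networks' feature vectors:

import numpy as np
from collections import Counter

def predict_1nn(train_feats, train_labels, test_feat):
    # 1NN: assign the label of the closest training feature vector.
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

def late_fusion_vote(per_network_train, per_network_test_feat):
    # One independent 1NN per network; the block label is the majority
    # vote of the three predictions.
    votes = [predict_1nn(tf, tl, x)
             for (tf, tl), x in zip(per_network_train, per_network_test_feat)]
    return Counter(votes).most_common(1)[0][0]

# Synthetic stand-ins: 10 training blocks per network, 2 classes.
rng = np.random.default_rng(1)
train = [(rng.random((10, d)), np.arange(10) % 2) for d in (500, 256, 256)]
block = [rng.random(d) for d in (500, 256, 256)]
print(late_fusion_vote(train, block))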
5.4. Baselines

To compare our approach against the state of the art, we initially chose ten texture descriptors as baselines; some were already used for granite image classification and others were applied in other texture recognition applications.

The first five baseline approaches can be regarded as general-purpose texture descriptors used in several applications. The first three are the statistics of Gray-Level Co-occurrence Matrices from Haralick, Shanmugam, and Dinstein (1973) (labeled GLCM in the experiments), the Local Binary Patterns from Ojala, Pietikäinen, and Harwood (1996) (labeled LBP) and the Histogram of Oriented Gradients from Dalal and Triggs (2005) (labeled HOG). The GLCM approach builds statistics calculated over matrices of neighborhood relations for given directions and offsets (distances between pixels); LBP can be regarded as a histogram of neighborhood relations between a pixel and its eight neighbors; and HOG is a histogram of gradient orientations over regions of interest of the image. The other two descriptors are the Dominant Local Binary Patterns of Bianconi, González, and Fernández (2015b) (labeled DLBP) and the Rotation Invariant Co-occurrences of patterns from González, Fernández, and Bianconi (2014) without feature selection (labeled Cri1). These last two approaches were validated on the same granite image dataset we are using, first presented in the paper of Bianconi et al. (2015a).

The other five descriptors used as baselines were originally proposed for specific texture description applications and were never used for texture characterization of granite tiles. The first three are based on the Convolutional Texture Gradient Filter (CTGF), proposed by Ferreira, Navarro, Pinheiro, dos Santos, and Rocha (2015) to identify texture patterns of printed letters and attribute the source laser printer of a document. This descriptor measures the texture of an image as histograms of convolved textures in low-gradient areas, using convolution matrices of different sizes. We label the CTGF-based approaches CTGF_3x3, CTGF_5x5 and CTGF_7x7. The other two approaches come from the same work of Ferreira et al. (2015): multidirectional extensions of GLCMs (called GLCM-MD in the experiments) and multidirectional, multi-scale extensions of GLCMs (called GLCM-MD-MS).

Finally, we also used existing pre-trained CNNs to compare against our networks trained from scratch. These are the original networks used in the CIFAR general-purpose image recognition challenge and the MNIST digit recognition challenge, available for download in the MatConvNet library from VLFEAT (2016). As with our approach, we removed the last layer and applied the feature vectors to the nearest neighbor classifier. We call these approaches CIFAR_original and MNIST_original, respectively.
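For reference, a simplified sketch of the basic 3 x 3 LBP idea (after Ojala et al., 1996); this is our illustration, and the exact LBP variant and parameters used in the experiments may differ:

import numpy as np

def lbp_histogram(gray):
    # Threshold each interior pixel's eight neighbours at the centre value,
    # read the resulting bits as a byte and histogram the 256 codes.
    c = gray[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx]
        code += (nb >= c).astype(np.int32) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()

print(lbp_histogram(np.random.rand(32, 32)).shape)   # (256,)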
6. Experiments

We now focus on the experimental results. For this, we apply the methodology and compute the metrics discussed in Section 5.2 using the dataset presented in Section 5.1. All experiments were performed on a cluster node with 11 processors, 131 GB of RAM and an NVIDIA GeForce 210 graphics card. This section is organized as follows: first, we show how we chose the number of epochs used to train our networks; then we compare our proposed approaches against the state-of-the-art techniques described in Section 5.4 in two scenarios, using big images and small blocks. These two last experiments were chosen to show the effectiveness of the proposed approaches in classifying granite rock images of different resolutions (very high and very low).

6.1. Defining epochs to train CNNs

Before training our proposed CNNs from scratch on granite images, the number of epochs to train each network, i.e., the number of forward and backward passes of all training examples through the network, must be defined. For this experimental scenario we report the classification results of the first combination of training and validation sets, using the classifier attached to the end of the network (i.e., a softmax classifier). In this case, we further subdivide each image into small blocks and use them to train and validate the classifier. One natural solution would be to use the whole image containing the granite tile as the input to the CNNs, but this would require deeper networks, which in turn require more data, more computational time and more memory resources to train. Using smaller areas as input does not require as many layers as using the whole image and can also lead to faster learning of the network parameters and weights.

Using our proposed procedure of classifying small blocks, we choose the number of epochs for each network (MNIST1, MNIST2, MNIST3 and CIFAR) as the one with the lowest top-1 validation error (valtop1e). After 18 epochs, the best values for the MNIST-based networks are, respectively, 10, 15 and 15 epochs, as Fig. 10 shows. For the CIFAR network, we choose the epoch with the lowest top-1 validation error over 150 epochs, which is found at the 142nd epoch. As we used many more epochs in the CIFAR search for the best validation epoch, its lowest top-1 error would not be clearly visible in the graph, so we exclude it from Fig. 10 for the sake of clarity.

Fig. 10. Validation error results over 18 training epochs of the proposed (a) MNIST1, (b) MNIST2 and (c) MNIST3 CNNs for granite image classification. The errors were measured for 28 x 28 granite block classification; the smallest validation error (valtop1e) is found at the 10th, 15th and 15th epoch, respectively.

After finding the number of epochs, we obtain the model used to extract feature vectors by training the networks for the epochs found in this validation experiment. We then use these networks to extract feature vectors from images, applying training and testing images to the networks and using the output of the last-but-one layer to feed a multiclass classifier based on nearest neighbor classification, as described in Section 4.4. The following experiments consider this scenario.
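The epoch choice reduces to an argmin over the per-epoch validation errors; a sketch with a synthetic error curve shaped roughly like Fig. 10(a), not the measured values:

import numpy as np

# Hypothetical per-epoch validation top-1 errors for one network.
val_top1e = np.array([0.61, 0.34, 0.21, 0.15, 0.12, 0.10, 0.09, 0.08,
                      0.08, 0.07, 0.08, 0.09, 0.08, 0.09, 0.08, 0.09,
                      0.10, 0.09])
best_epoch = int(np.argmin(val_top1e)) + 1   # epochs are 1-indexed
print(best_epoch)                            # 10 for this synthetic curve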
6.2. Comparison against baselines in high resolution images

We now compare our proposed approaches against the state of the art on the classification of the high-resolution images of the dataset. Table 1 shows the results. For this task, our approaches perform the classification of test images by majority voting over the 2809 valid 28 x 28 grayscale blocks for the MNIST-based networks and the 2116 valid 32 x 32 RGB blocks for the proposed CIFAR-based network.

Table 1
Results comparing the best configurations of the proposed method to the existing methods in the literature after 5 x 2 validation. The proposed CNN methods and CNN fusion approaches are CIFAR, MNIST1, MNIST2, MNIST3 and the two {MNIST1, MNIST2, MNIST3} fusions.

Method                               | Mean ± Std. dev. | Min     | Max
CIFAR                                | 100% ± 0.00      | 100.00% | 100.00%
CTGF_3x3 (Ferreira et al., 2015)     | 100% ± 0.00      | 100.00% | 100.00%
Cri1 (González et al., 2014)         | 99.96% ± 0.12    | 99.60%  | 100.00%
CTGF_5x5 (Ferreira et al., 2015)     | 99.92% ± 0.10    | 99.80%  | 100.00%
CTGF_7x7 (Ferreira et al., 2015)     | 99.70% ± 0.14    | 99.60%  | 100.00%
DLBP (Bianconi et al., 2015b)        | 99.88% ± 0.16    | 99.60%  | 100.00%
MNIST2                               | 99.32% ± 0.70    | 97.80%  | 100.00%
LBP (Ojala et al., 1996)             | 98.96% ± 0.64    | 97.80%  | 99.80%
MNIST3                               | 98.16% ± 0.56    | 97.00%  | 99.00%
{MNIST1, MNIST2, MNIST3}_vote        | 96.50% ± 0.99    | 94.80%  | 97.60%
CIFAR_original                       | 94.84% ± 0.52    | 94.00%  | 96.00%
GLCM-MD-MS (Ferreira et al., 2015)   | 92.04% ± 1.09    | 90.00%  | 93.80%
GLCM-MD (Ferreira et al., 2015)      | 90.18% ± 1.43    | 88.40%  | 93.20%
MNIST1                               | 86.62% ± 1.61    | 84.20%  | 88.80%
HOG (Dalal & Triggs, 2005)           | 78.04% ± 1.50    | 76.20%  | 80.80%
GLCM (Haralick et al., 1973)         | 72.74% ± 1.97    | 68.80%  | 74.60%
MNIST_original                       | 62.98% ± 2.12    | 60.80%  | 66.40%
{MNIST1, MNIST2, MNIST3}_concat      | 26.44% ± 25.89   | 0.00%   | 67.00%

As can be seen in Table 1, our deeper network applied to color tiles, CIFAR, is the best network for classifying granite rocks, showing perfect detection in all 10 rounds of experiments. This happens because the layers in this network better decompose the color information of these images, highlighting important patterns to be used in the classification.

One interesting point is the good performance of texture descriptors that were proposed for other applications, such as the ones from Ferreira et al. (2015). The best state-of-the-art approach, CTGF_3x3, also showed perfect detection in all ten rounds of experiments. This happens because CTGF builds histograms of low-pass textures in flat areas of the images, considering only the texture of these flat areas instead of edge information and abnormal imperfections that can differentiate images of the same granite class.

Table 1 also shows that the proposed MNIST1 and MNIST3 networks do not yield good results compared with most of the state of the art. This severely impacts the proposed fusion approaches, early fusion ({MNIST1, MNIST2, MNIST3}_concat) and late fusion ({MNIST1, MNIST2, MNIST3}_vote), because the feature vectors and outputs of the MNIST1 and MNIST3 networks do not describe the granite images as effectively as MNIST2 does. One step toward a solution could be training new and deeper networks to be used in the fusion.

Finally, it is worth discussing the results of the pre-trained models in classifying granite slabs. As seen in Table 1, the pre-trained models CIFAR_original and MNIST_original show poor classification results. This happens because these models were taught to identify patterns different from the ones present in granite tiles, so their parameters (or filter weights) were not fitted to discriminate granite slab patterns, strongly affecting their results.

6.3. Comparison against baselines in low resolution images

For the final round of experiments comparing with the state of the art, we focus on the classification of small 32 x 32 blocks. For this, we consider the classification results without majority voting, using the first split of training and test data. This results in a total of 1000 (images) x 2116 (blocks) = 2,116,000 blocks of size 32 x 32, half of them used to train and the other half to test the classifier. Fig. 11 shows the accuracy of our best proposed approach against the best state-of-the-art methods from the experiments of Section 6.2.

Fig. 11. Results considering the classification of 32 x 32 granite image blocks.

The results in Fig. 11 highlight the difficulty of classifying very tiny images. This can be explained by the fact that they contain very little color information, which can be confused with that of other granite classes, decreasing the classification accuracy. Even in this difficult scenario, our proposed CIFAR CNN, trained to classify these small blocks with back-propagation of error and more layers, showed a classification accuracy higher than 85% using the same nearest neighbor classifier as in the experiments of Section 6.2, correctly classifying a total of 923,231 blocks out of 1,058,000. The high classification accuracy on low-resolution blocks explains why the majority voting of individual blocks helped our proposed approach reach the 100% accuracy shown in Table 1.

The poor results of the feature engineering approaches of Ferreira et al. (2015) and González et al. (2014) are due to the inherent characteristic of these approaches as global descriptors, not designed to classify low-resolution images. For example, the CTGF_3x3 approach of Ferreira et al. (2015), which classifies correctly only 351,760 blocks, creates histograms of textures over convolved areas with a 3 x 3 window. As the areas used to describe the granite tiles contain only 32 x 32 = 1024 pixels, the final histogram used as feature vector contains many unused bins, affecting the accurate description of these small granite tiles. The low accuracy of the literature solutions in Fig. 11 seriously decreases the possibility of using these approaches for image classification by majority voting of image patches, as our proposed approach does. Another important aspect is the time to extract the feature vectors: while our network trained from scratch took approximately one hour to extract 1,058,000 feature vectors, the approaches of Ferreira et al. (2015) and González et al. (2014) took approximately six days each to extract the same number. This highlights the efficiency of the proposed method in this multiple-data scenario.

With the results presented in this section, we show that the proposed approach achieves perfect detection on high-resolution images, with results comparable to the state of the art, and that on low-resolution granite rock blocks it outperforms the state of the art comfortably. This indicates, firstly, the potential of the proposed method as a complementary approach to others in the literature for industrial applications in controlled environments. Moreover, it can be a good starting point for classifying rocks in uncontrolled environments, for example using smartphones with different resolutions as acquisition devices for granite recognition, acting as a possible expert to solve misunderstandings between customers and manufacturers when a wrong delivery happens.
7. Conclusion

The creation of quality-control procedures for sorting natural stones such as granite is a promising task, as they avoid economic losses due to the delivery of wrong granite packages to clients. The automation of such a process is also important to reduce errors and eliminate a subjective process involving expensive experts. However, most of the applications proposed in the literature for this task rely on feature engineering approaches that investigate texture and color behaviors which are supposed not to change. Additionally, they do not validate the approaches on the difficult task of classifying natural stones at different image resolutions.

In this paper, we addressed these issues by proposing a deep learning based approach to granite rock classification. Our approaches are based on Convolutional Neural Networks of different architectures applied to small image tiles, analyzing the texture of each tile and using majority voting over tiles to classify high-resolution images. We also investigated the performance of some of the networks when applied together. Experimental results showed good performance of the proposed approaches for classifying high- and low-resolution images compared to other texture descriptors proposed in the literature.

Although there is a long path toward the classification of natural stones in real-world situations, in which different resolutions and lighting conditions of granite rocks can occur, we believe the way forward involves deep learning approaches through Convolutional Neural Networks. As our proposed approaches deal with tiny patches, they have the potential to be invariant to most acquisition resolutions, as long as the acquired images are bigger than the input patches that fit our networks. This is an important step toward making granite slab recognition available on other acquisition devices, such as smartphones. Also, the proposed approaches can be complementary to other approaches in industrial slab recognition applications, as they showed results comparable to the state of the art in classifying high-resolution images.

The work started in this paper opens a set of future directions, which involve: (i) the proposal of new and deeper network architectures for this task; (ii) the use of Convolutional Neural Networks applied to other image representations; (iii) testing new ways of using different network architectures; (iv) studying the complementarity of these data-driven approaches with feature engineering techniques; and (v) experiments considering the open-set scenario of Scheirer, Rocha, Sapkota, and Boult (2013), in which the classifier is designed to also consider unknown samples in the training process.

Acknowledgments

This work was supported partly by NSFC (61332012, U1636202) and the Shenzhen R&D Program (JCYJ20160328144421330). We also thank the support of the Brazilian National Council for Scientific and Technological Development (Grant #312602/2016-2) and professors Nuria Fernández, Bruno Montandon Noronha Barros and Millena Basílio da Silva for the discussions that originated this research.
References

Araújo, M., Martínez, J., Ordóñez, C., & Vilán, J. A. (2010). Identification of granite varieties from colour spectrum data. Sensors, 10(9), 8572–8584.
Bianconi, F., Bello, R., Fernández, A., & González, E. (2015a). On comparing colour spaces from a performance perspective: Application to automated classification of polished natural stones. In New trends in image analysis and processing. Lecture notes in computer science: Vol. 9281 (pp. 71–78). Genoa, Italy: Springer.
Bianconi, F., & Fernández, A. (2006). Granite texture classification with Gabor filters. In Proceedings of the international congress on graphical engineering (INGEGRAF), Sitges, Spain.
Bianconi, F., González, E., & Fernández, A. (2015b). Dominant local binary patterns for texture classification: Labelled or unlabelled? Pattern Recognition Letters, 65, 8–14.
Bianconi, F., González, E., Fernández, A., & Saetta, S. A. (2012). Automatic classification of granite tiles through colour and texture features. Expert Systems with Applications, 39(12), 11212–11218.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the international conference on computer vision & pattern recognition, California, USA: Vol. 2 (pp. 886–893).
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10, 1895–1923.
Ershad, S. F. (2011). Color texture classification approach based on combination of primitive pattern units and statistical features. Multimedia and its Applications, 3(3), 1–13.
European Committee for Standardization (CEN) (2009). EN 12440:2008 Natural stone: Denomination criteria. Report. European Committee for Standardization (CEN).
Fernández, A., Ghita, O., González, E., Bianconi, F., & Whelan, P. F. (2011). Evaluation of robustness against rotation of LBP, CCR and ILBP features in granite texture classification. Machine Vision and Applications, 22(6), 913–926.
Ferreira, A., Navarro, L. C., Pinheiro, G., dos Santos, J. A., & Rocha, A. (2015). Laser printer attribution: Exploring new features and beyond. Forensic Science International, 247, 105–125.
González, E., Fernández, A., & Bianconi, F. (2014). General framework for rotation invariant texture classification through co-occurrence of patterns. Journal of Mathematical Imaging and Vision, 50(3), 300–313.
Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6), 610–621.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision (ICCV), Santiago, Chile (pp. 1026–1034).
Hof, R. (2013). Deep learning. With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Artificial intelligence is finally getting smart. https://www.technologyreview.com/s/513696/deep-learning/.
Johnson, J., & Karpathy, A. (2016). CS231n convolutional neural networks for visual recognition. http://cs231n.github.io/transfer-learning/.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of neural information processing systems (NIPS), Nevada, USA (pp. 1106–1114).
Kurmyshev, E. V., Sánchez-Yáñez, R. E., & Fernández, A. (2013). Colour texture classification for quality control of polished granite tiles. In Proceedings of the third IASTED international conference on visualization, imaging and image processing: Vol. 2 (pp. 603–608). ACTA Press.
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lepisto, L., Kunttu, I., & Visa, A. (2005). Rock image classification using color features in Gabor space. Journal of Electronic Imaging, 14(4), 040503-1–040503-3.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29, 51–59.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Scheirer, W. J., Rocha, A., Sapkota, A., & Boult, T. E. (2013). Towards open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 1757–1772.
Stone Industry SA Standards Board (2015). Stone standards. Report. Stone Industry SA Standards Board.
University of Tennessee (2008). Granite dimensional stone quarrying and processing. Report. University of Tennessee – Center for Clean Products.
VLFEAT (2016). MatConvNet: CNNs for MATLAB. http://www.vlfeat.org/matconvnet/.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320–3328). Montreal, Canada: Curran Associates, Inc.