CLASSIFICATION OF OIL PAINTING USING MACHINE LEARNING WITH VISUALIZED DEPTH INFORMATION

Jihoon Kim 1, Ji Young Jun 1, Minki Hong 2, Hyeseung Shim 1, Jaehong Ahn 1,*

1 Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea - (kjih0314, hs.shim, jiyoungjun, ahnjh)@kaist.ac.kr
2 Culture Technology Research Institute, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea - minki.hong@kaist.ac.kr
* Corresponding author

Commission II, WG II/8

KEY WORDS: Machine Learning, Visualized Depth Information, RTI, Painting Analysis, Artist Classification

ABSTRACT:

In the past few decades, a number of scholars have studied painting classification based on image processing or computer vision technologies. As machine learning developed rapidly, painting classification using machine learning has also been carried out. However, because ordinary photographs carry little information about brushstrokes, typical models cannot exploit the more precise characteristics of a painter's style. We hypothesized that visualized depth information of brushstrokes is effective for improving the accuracy of machine learning models for painting classification. This study proposes a new data utilization approach for machine learning based on Reflectance Transformation Imaging (RTI) images, which maximize the visualization of the three-dimensional shape of brushstrokes. An artist's unique brushstrokes can be revealed in RTI images in a way that is difficult to achieve with regular photographs. If these images are used as training data for a machine learning model, classification can draw not only on color and shape but also on depth information. We used the Convolutional Neural Network (CNN), a model optimized for image classification, with the VGG-16, ResNet-50, and DenseNet-121 architectures. We conducted a two-stage experiment using the works of two Korean artists. In the first experiment, we captured key parts of the paintings as RTI data and photographic data. In the second experiment, on the second artist's work, a larger quantity of data was acquired and the whole surface of each artwork was captured. The results show that the RTI-trained models achieved higher accuracy than the non-RTI-trained models. In this paper, we propose a method that uses machine learning and RTI technology to analyze and classify paintings more precisely, and we verify our hypothesis with it.

1. INTRODUCTION

In recent years, as art databases expand rapidly, automatic painting classification based on color and morphological features has been gaining much attention (Berns, 2001) (Barni et al., 2005). Various computational methods allow experts to analyze and evaluate, in a quantitative way, characteristics of paintings that are difficult to judge with the naked eye (Berezhnoy et al., 2005) (Berezhnoy et al., 2007). To classify paintings, image processing techniques have been studied to extract characteristics such as the shape, direction, and pattern of brushstrokes (Li et al., 2011). In addition to image processing, machine learning techniques have been actively studied. Painting classification studies have progressed from the classical Support Vector Machine (SVM) method to the Convolutional Neural Network (CNN), which is optimized for image learning (Cortes, Vapnik, 1995) (Krizhevsky et al., 2012).
Artists' brushstrokes are among the characteristics that reflect their unique painting styles (Li et al., 2011). A brushstroke contains a combination of color, pattern, and texture, and together these elements convey much information in oil paintings with pigments (Berezhnoy et al., 2009) (Johnson et al., 2008). Nevertheless, the shape and thickness of brushstrokes are not represented clearly in typical photographs. This is a significant drawback for a CNN model, which learns its discriminative properties from the training image set (Krizhevsky et al., 2012) (Zeiler, Fergus, 2014). To overcome this limitation, the size of brushstrokes can be measured and analyzed with technologies that capture three-dimensional (3D) geometric data, such as 3D scanning or photogrammetry (Elkhuizen et al., 2014). However, previous studies show that it is difficult to extract the depth information of individual brushstrokes with these techniques (Breuckmann, 2011) (Abate et al., 2014). We also confirmed through experiments that acquiring 3D data of brushstrokes with 3D scanning or photogrammetry is time-consuming and difficult. Our hypothesis is that if the depth information of brushstrokes is visualized and used as a training image set, it is effective for improving the accuracy of the machine learning model.

To capture and visualize the depth of brushstrokes, we use Reflectance Transformation Imaging (RTI). RTI is a computational technique that captures the surface shape and color of a subject and allows it to be re-illuminated from any direction (Cultural Heritage Imaging, 2019). It captures the painting's surface at very high resolution and extracts depth information from the captured images (Cultural Heritage Imaging, 2019). The enhancement functions of RTI can reveal brushstrokes that are not disclosed under direct examination of the physical object (Cultural Heritage Imaging, 2019). This study proposes a new approach that uses such visualized depth information of paintings as additional input data for machine learning algorithms, while remaining compatible with existing machine learning architectures that take two-dimensional images as input.

We investigate three machine learning architectures for the painting classification task, using RTI images as the training image set. We also present the resulting accuracy and visualize the learning process. The goal of this paper is to verify our hypothesis by comparing the results when RTI and non-RTI images are used for painting classification with three different machine learning architectures.

2. RELATED WORKS

As computer vision technology advanced, high-resolution images of paintings came into use for analysis. Several studies extract features of fine art using digital image processing and apply them to classification (Barni et al., 2005).
In the early stages of painting classification using image processing, color and texture processing techniques such as complementary-color analysis and Gabor filtering based on the RGB values of a picture were studied (Berezhnoy et al., 2007) (Berezhnoy et al., 2005). Such RGB-value-based research has the limitation that the result depends on the quality of the input data for the image processing. Berezhnoy et al. extract the artist's brushstrokes from paintings automatically (Berezhnoy et al., 2009). Johnson et al. classify pictures using wavelets and brushstrokes to extract the features of the artist (Johnson et al., 2008). These studies use only the planar shape information of the brushstrokes, not their three-dimensional information.

With the rapid development of computing power, machine learning has been applied to the classification of artwork. The Support Vector Machine (SVM), which has been used extensively in machine learning, was applied first (Cortes, Vapnik, 1995). Arora et al. classified paintings into seven genres - Renaissance, Baroque, Impressionism, Cubism, Abstract, Expressionism, and Pop art - using SVM (Arora, Elgammal, 2012). Khan et al. classified artists and styles of works in digitized databases, focusing on both artist and style categorization problems (Khan et al., 2014). These painting classification studies using SVM attempted to improve classification accuracy over various characteristics of paintings. Machine learning requires large amounts of data; in response to this demand, Khan et al. released the 'Painting-91' dataset in 2014, consisting of 4,266 pictures by 91 different painters (Khan et al., 2014). Using the 'Painting-91' dataset, the Convolutional Neural Network (CNN), a deep-learning model specialized in image classification, was applied to artwork classification (Khan et al., 2014) (Krizhevsky et al., 2012). Folego et al. found that using the patch with the highest confidence score performs better than the traditional voting method (Folego et al., 2016). Nanni et al. studied artistic style, artist, and architectural style classification using features from several layers of a CNN instead of features extracted only from the top layer (Nanni et al., 2017). Peng et al. performed figure analysis using multiple CNNs in order to extract multi-scale features (Peng, Chen, 2015). Numerous studies have adapted the CNN model to artwork data. However, there has been no research on enhancing classification accuracy by acquiring and using richer data that carries more information about the painter's style.

RTI has found many useful applications across the cultural heritage domain, such as condition monitoring, treatment documentation, and surface analysis. With its mathematical enhancement functions, it is possible to observe features interactively that are difficult to see with the naked eye (Manrique Tamayo et al., 2013) (Giachetti et al., 2017) (Clarricoates, Kotoula, 2019). In recent research, Pamart et al. developed an integrated tool to cross-reference the qualitative depth information of RTI with the quantitative depth information of photogrammetry (Pamart et al., 2019). They suggest that the qualitative depth information of RTI should be supplemented when the purpose is precise analysis.
Ponchio et al. propose that qualitative depth information can be used for classification problems based on machine learning (Ponchio et al., 2018). They studied the automatic classification of cuneiform using images captured with RTI in a machine learning algorithm. They mainly use partial information, especially the surface normal vectors at rapidly changing regions, for the analysis of cuneiform. Unlike cuneiform, oil paintings have depth variation distributed over the whole surface. This research therefore considers the whole region of the RTI image in order to improve classification accuracy.

3. METHODS

[Figure 1. Pipeline of this research for oil painting classification using depth information.]

As seen in Figure 1, it is first necessary to build RTI images that depict the brushstrokes of the oil paintings clearly with a proper rendering mode. The image set is then used as input data to the Residual Network 50 (ResNet-50) architecture of the Convolutional Neural Network (CNN) (He et al., 2016) (Krizhevsky et al., 2012). The optimizer is Adam (Adaptive Moment Estimation), and the ResNet architecture is customized for a 224x224 pixel input, which is the size of our data (Kingma, Ba, 2014).

3.1 Dataset Acquisition by RTI from Paintings

In the first stage, to acquire the image dataset, RTI images are built for each part of the painting. Photographs are taken as the white-light LEDs of the dome illuminate the painting one by one. A single Polynomial Texture Mapping (PTM) file is then built by combining the 80 photographs using RTI Builder (Cultural Heritage Imaging Inc., USA). The PTM can depict clear brushstrokes with a proper rendering mode. Here the PTM was rendered with static multi-light rendering, which applies an optimal light direction to each tile of the image to maximize the contrast of shading. Unlike other rendering methods that require manual control of the light direction, static multi-light rendering enhances detail without manipulating a controller: it combines raking-light images by controlling only the sharpness of the images.

3.2 Applying Machine Learning

In the second stage, the two different image datasets (RTI and non-RTI) are applied to CNN architectures. We implemented the models in Keras, an open-source deep-learning package based on TensorFlow (Chollet et al., 2015). All the CNN architectures we used receive 224x224 pixel images as input; therefore, we crop the RTI data and non-RTI data obtained from the first stage to this input size. Of all the data except the test set, 20% is randomly extracted and used as a validation set, and 80% is used as a training set. The number of CNN layers, the fully connected layer, the output layer, the optimizer, and the loss function are set identically for both datasets. Parameters for fine-tuning the CNN model, such as batch size, learning rate, and number of epochs, are optimized for each dataset because there is a difference between the RTI data, which contains depth information, and the non-RTI data. Finally, CNN layer visualization is conducted to investigate the learning process depending on the characteristics of the dataset.
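As a concrete illustration of this second stage, the following is a minimal Keras sketch of the configuration described above: an ImageNet-pretrained ResNet-50 backbone with a customized fully connected head, the Adam optimizer, and a binary output for Experiment 1. The hidden layer size, learning rate, and file handling are illustrative assumptions, not the exact values used in this study.

```python
# Minimal sketch of the CNN setup described in Section 3.2 (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_binary_classifier() -> tf.keras.Model:
    # ImageNet-pretrained ResNet-50 backbone without its original top layers.
    backbone = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

    # Customized fully connected head; the hidden size (256) is an assumption.
    x = layers.GlobalAveragePooling2D()(backbone.output)
    x = layers.Dense(256, activation="relu")(x)

    # Binary classification output (Experiment 1): sigmoid + binary cross-entropy.
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs=backbone.input, outputs=outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_binary_classifier()
```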
4. EXPERIMENTS OF PAINTING ANALYSIS USING DEPTH INFORMATION

[Figure 2. RTI dome setting with the railed installation.]

We used a dome RTI system for the experiments, as shown in Figure 2. The dome is a plastic hemisphere 1 m in diameter, mounted on an external frame with rails and wheels for easy transportation of the system over a target object. A camera is fixed at the top of the dome, facing its center; with a 50 mm lens, the camera captures a 30x50 cm area of a painting at once. The dome contains three different types of LED lights - white light, UV, and IR - with 80 lights of each type. In this research, we used only the white-light LEDs. The positions of the LEDs were calibrated to set up the light position file before capturing the paintings.

Two experiments were designed to verify our hypothesis. Experiment 1 is a pilot study to test whether depth-visualized images improve the result of painting classification with machine learning. In Experiment 2, the amount of training data was increased and the number of classification classes was enlarged. The results were also compared across three different CNN architectures to see how classification accuracy depends on the depth of the CNN.

4.1 Experiment 1

As shown in Figure 3, we used eight oil paintings by Jiho Lee as our dataset. The paintings were grouped into two groups according to her painting period. Group A shows landscape views created in 2013, while Group B shows more abstract environmental views using dark and thin brushstrokes, from her recent works in 2017.

[Figure 3. (a) Group A, (b) Group B: the different groups of paintings by Jiho Lee.]

4.1.1 Dataset of Experiment 1

PTM files were built using the dome RTI system with a Nikon D850 DSLR camera. Due to the limitation of the dome's size, we captured several different parts of each painting at 30x50 cm, choosing the parts where the brushstrokes were well revealed. The RTI image set was exported in JPEG format at 6192x4128 resolution from the PTM files under static multi-light rendering. The non-RTI images were captured at the same spots with the same resolution under natural light.

4.1.2 Categorization of the dataset

Binary classification was chosen because the style of the paintings was divided into two groups. We focused on the areas where the brushstrokes were prominent, because the goal was to investigate the difference in classification accuracy between RTI data and non-RTI data and the learning process in the CNN layers. About 6.8 partial artwork images per work, on average, were acquired for both datasets [Figure 4].

[Figure 4. A painting by Jiho Lee. In Experiment 1, we captured key parts of each painting, not the whole painting.]

One image from each group was set aside for use as the test set. The CNN architecture used in Experiment 1 was ResNet-50 (He et al., 2016), which takes a 224x224 pixel image as input. When the full-size artwork is resized directly to this input size, many of the features that a CNN can learn, such as strokes, overall shape, and RGB values, can be distorted or disappear. For this reason, both the RTI data and the non-RTI data were cropped into 224x224 pixel patches, yielding 750 to 850 cropped images per captured image. Of the cropped data, 80% was used as the training set to learn the CNN model and 20% as the validation set, with images assigned randomly to the training and validation sets. For the RTI data, 18,434 images were used for the training set, 4,608 for the validation set, and 1,673 for the test set. For the non-RTI data, 17,667 images were used for the training set, 4,416 for the validation set, and 1,512 for the test set.
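The patch extraction and split just described can be sketched as follows. This is an illustrative reconstruction under assumed names (directory layout, non-overlapping stride equal to the patch size, and the 80/20 ratio from the text); it is not the authors' exact preprocessing code.

```python
# Illustrative sketch: cropping large captures into 224x224 patches and splitting
# them into training and validation sets (assumed directory names and stride).
import random
from pathlib import Path
from PIL import Image

PATCH = 224

def crop_to_patches(image_path: Path, out_dir: Path) -> list[Path]:
    """Slide a non-overlapping 224x224 window over one capture and save each patch."""
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(image_path)
    width, height = img.size
    saved = []
    for top in range(0, height - PATCH + 1, PATCH):
        for left in range(0, width - PATCH + 1, PATCH):
            patch = img.crop((left, top, left + PATCH, top + PATCH))
            patch_path = out_dir / f"{image_path.stem}_{top}_{left}.jpg"
            patch.save(patch_path)
            saved.append(patch_path)
    return saved

def split_train_val(patches: list[Path], val_ratio: float = 0.2, seed: int = 0):
    """Randomly assign 20% of the patches to validation and the rest to training."""
    rng = random.Random(seed)
    shuffled = patches[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]   # train, validation

# Example usage with hypothetical paths:
# patches = crop_to_patches(Path("rti/group_a/part_01.jpg"), Path("patches/group_a"))
# train_set, val_set = split_train_val(patches)
```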
The pre-trained model uses ImageNet, which divides 1.2 million images into 1,000 categories (Deng et al., 2009). Since the machine learning model used in Experiment 1 performs binary classification, we used sigmoid as the activation function and binary cross-entropy as the loss function of the output layer. The optimizer was Adam, and the fully connected layer was customized for ResNet-50 (Kingma, Ba, 2014).

Layer visualization was conducted by activating the first, second, and third convolution layers. As the activations move to upper layers, the information about the visual content of the image gradually decreases and the information about the image class gradually increases (Zeiler, Fergus, 2014) (Yosinski et al., 2015). Moreover, the activations become increasingly abstract and more difficult to interpret from a human perspective in the upper layers (Zeiler, Fergus, 2014) (Yosinski et al., 2015). Therefore, in this paper, only the two layers that can be understood by human perception are shown as a result.

4.1.3 Results & Discussion

The model trained with the RTI data (RTI-data model) showed an accuracy of 87.43%, and the model trained with the non-RTI data (non-RTI-data model) showed 82.95%. The result shows that the accuracy of the model that learned the depth-visualized data is 4.48 percentage points higher.

[Figure 5. (a) Cropped data of RTI, (b) cropped data of non-RTI.]

[Figure 6. The results of visualization from convolution layers: (a) first convolution layer using (a) in Figure 5, (b) first convolution layer using (b) in Figure 5, (c) second convolution layer using (a) in Figure 5, (d) second convolution layer using (b) in Figure 5.]

When the convolution layers are activated, there is a clear difference between the datasets, as shown in Figure 6 using the data in Figure 5. The non-RTI-data model learned shape features such as lines and colors, whereas the RTI-data model learned depth information in addition to shape. Moreover, the RTI-data model had fewer deactivated filters than the non-RTI-data model. Empty (deactivated) filters are represented in Figure 6 as black; they appear for both the RTI data and the non-RTI data, and the deeper the layer, the more deactivated filters appear. A deactivated filter indicates that the pattern encoded in the filter does not appear in the input image. However, when comparing the same layers, fewer filters are deactivated for the RTI data than for the non-RTI data. In other words, the RTI data is richer in information, and the CNN learns its properties appropriately. When the learning process is visualized layer by layer, it is clearly observed that the model learns more varied features from the RTI data than from the non-RTI data.
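The layer visualization reported above can, in principle, be reproduced with a small intermediate-activation model in Keras. The snippet below is a minimal sketch under assumed layer selection, channel counts, and input paths; it simply builds a model that outputs the feature maps of the first convolution layers and plots a few channels, in the spirit of Figure 6.

```python
# Minimal sketch of convolution-layer activation visualization (assumed layer
# selection and input path; not the exact code used for Figure 6).
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

def visualize_activations(model: tf.keras.Model, patch: np.ndarray,
                          n_layers: int = 2, n_channels: int = 8):
    """Show the first n_channels feature maps of the first n_layers conv layers."""
    conv_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)][:n_layers]
    activation_model = tf.keras.Model(inputs=model.input,
                                      outputs=[l.output for l in conv_layers])
    activations = activation_model.predict(patch[np.newaxis, ...])  # add batch dimension

    for layer, act in zip(conv_layers, activations):
        fig, axes = plt.subplots(1, n_channels, figsize=(2 * n_channels, 2))
        for ch, ax in enumerate(axes):
            ax.imshow(act[0, :, :, ch], cmap="viridis")  # one feature map per channel
            ax.axis("off")
        fig.suptitle(f"Activations of layer '{layer.name}'")
        plt.show()

# Example usage with a hypothetical 224x224 RGB patch scaled to [0, 1]:
# patch = plt.imread("patches/group_a/part_01_0_0.jpg") / 255.0
# visualize_activations(model, patch.astype("float32"))
```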
4.2 Experiment 2

In Experiment 2, we tested 14 paintings by Wi Hyeok Son. The paintings show three different styles, all painted in the same period (2019). We categorized them into three groups by brushstroke style for the analysis, as seen in Figure 7. Group A' includes paintings with wet and clear brushstrokes, which are more vivid and bold compared to the paintings in the other groups. Group C' shows dry brushstrokes made with dried pigments, in which the outline of each brushstroke is difficult to recognize. Group B' includes paintings in which different styles of brushstrokes are mixed. Figure 8 shows the details of the style difference of each group.

[Figure 7. RTI data examples of the three different groups: (a) Group A', (b) Group B', (c) Group C'.]

[Figure 8. Cropped parts of the RTI data from Figure 7 for each group: (a) Group A', (b) Group B', (c) Group C'.]

4.2.1 Dataset of Experiment 2

The RTI and non-RTI data were acquired at 7952x5304 resolution with a Sony a7R III camera. The other processes were the same as in Experiment 1.

4.2.2 Categorization of the dataset

We tested three different CNN architectures - VGG-16 (16 layers), ResNet-50 (50 layers), and DenseNet-121 (121 layers) - to see whether a simple or a complex model is more efficient as the number of layers increases from a relatively shallow architecture (Simonyan, Zisserman, 2014) (He et al., 2016) (Huang et al., 2017). Since the paintings in Experiment 2 were grouped into three classes, categorical cross-entropy was used as the loss function of the output layer and softmax as its activation function. The fully connected layer was also modified for the three classes.

Six artworks in Group A', three in Group B', and two in Group C' were chosen for the training and validation sets; one artwork from each group was set aside as the test set. Of the dataset excluding the test set, 90% was used as the training set to learn the CNN model and 10% as the validation set, with images assigned randomly to the training and validation sets. As a result, for the non-RTI data, 65,930 images were chosen as the training set, 7,325 as the validation set, and 13,685 as the test set. For the RTI data, 65,006 images were chosen as the training set, 7,222 as the validation set, and 13,457 as the test set. The VGG-16, ResNet-50, and DenseNet-121 models were pre-trained on ImageNet (Deng et al., 2009).
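A minimal sketch of how the three architectures could be set up for the three-class task is shown below. The backbone loop, head size, and training-call parameters are assumptions for illustration; only the choice of softmax, categorical cross-entropy, and the three ImageNet-pretrained backbones follows the text.

```python
# Illustrative sketch: three-class heads on the three ImageNet-pretrained
# backbones used in Experiment 2 (assumed hyperparameters and data pipelines).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, ResNet50, DenseNet121

BACKBONES = {"VGG16": VGG16, "ResNet50": ResNet50, "DenseNet121": DenseNet121}

def build_three_class_model(backbone_name: str) -> tf.keras.Model:
    backbone = BACKBONES[backbone_name](weights="imagenet", include_top=False,
                                        input_shape=(224, 224, 3))
    x = layers.GlobalAveragePooling2D()(backbone.output)
    x = layers.Dense(256, activation="relu")(x)           # assumed head size
    outputs = layers.Dense(3, activation="softmax")(x)    # Groups A', B', C'
    model = models.Model(inputs=backbone.input, outputs=outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Hypothetical training and evaluation loop over the three architectures:
# for name in BACKBONES:
#     model = build_three_class_model(name)
#     model.fit(train_ds, validation_data=val_ds, epochs=20)
#     print(name, model.evaluate(test_ds))
```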
4.2.3 Results & Discussion

Table 1. Classification accuracy (%) between the non-RTI and RTI datasets for Groups A', B', and C'.

              VGG-16             ResNet-50           DenseNet-121
              Non-RTI    RTI     Non-RTI    RTI      Non-RTI    RTI
  Group A'    65.79      53.80   55.53      46.76    59.55      40.22
  Group B'    10.99      30.35   15.90      25.57     9.67      30.62
  Group C'    42.92      52.82   45.40      53.16    50.97      39.61

The results in Table 1 show that the RTI-data models are more accurate than the non-RTI-data models in most cases, with Group A' as the main exception. Group A', with wet and clear brushstrokes, does not show thick brushstrokes compared to Groups B' and C' (Figure 7); that is, the difference between the RTI data and the non-RTI data in Group A' is not very apparent. The other groups explicitly show the difference in brushstrokes between the RTI and non-RTI data.

Group B', with mixed brushstrokes, has the lowest accuracy among the three groups for the RTI-trained models. The total number of cropped RTI images of Group B' used as the test set in the VGG-16 model was 4,830. The predictions were 1,862 (38.5%) for Group A', 1,466 (30.4%) for Group B', and 1,502 (31.1%) for Group C' (rounded to one decimal place). We consider that the accuracy was low because both types of brushstrokes are observed at the same time within one picture.

The results also show that accuracy does not always increase as the architecture becomes deeper. VGG-16 with the non-RTI dataset performed best for Group A', DenseNet-121 with the RTI dataset for Group B', and ResNet-50 with the RTI dataset for Group C'. This suggests a need for future research to develop machine learning architectures and optimizers more suitable for RTI data. In Experiment 2, more data was acquired than in Experiment 1, but most of it belonged to Group A'; this imbalance should be compensated to obtain better results.

A disadvantage of machine learning is that it requires a large amount of data, and in most cases the data must be labeled. Ian Goodfellow, the developer of the Generative Adversarial Network (GAN), noted that roughly 5,000 labeled examples per category are required for acceptable performance and at least ten million examples to match or exceed human performance (Goodfellow et al., 2016). Data issues have always existed in machine learning. This study suggests the possibility of using RTI data that visualizes depth information to increase accuracy in painting classification tasks where dataset acquisition is limited.
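The per-group prediction breakdown quoted above for the Group B' test set (38.5%, 30.4%, 31.1%) can be computed directly from the model outputs. The following is an illustrative sketch with assumed variable names for the test patches; it is not the authors' evaluation code.

```python
# Illustrative sketch: per-class prediction counts for one group's test patches
# (assumed variable names; shows the kind of breakdown reported for Group B').
import numpy as np

GROUPS = ["Group A'", "Group B'", "Group C'"]

def prediction_breakdown(model, test_patches: np.ndarray) -> dict:
    """Return how many test patches are assigned to each of the three groups."""
    probs = model.predict(test_patches)    # shape: (n_patches, 3) softmax outputs
    predicted = np.argmax(probs, axis=1)   # index of the most probable group
    counts = np.bincount(predicted, minlength=3)
    total = counts.sum()
    return {name: (int(c), round(100.0 * c / total, 1))
            for name, c in zip(GROUPS, counts)}

# Example usage with a hypothetical array of Group B' test patches:
# print(prediction_breakdown(model, group_b_test_patches))
```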
5. CONCLUSION

In this paper, a new type of dataset for automatic oil painting style classification using machine learning was presented and evaluated. We hypothesized that visualized depth information of brushstrokes would be useful for improving the accuracy of machine learning models for painting classification. RTI technology was applied to extract and visualize the depth information of brushstrokes in paintings. In Experiment 1, the performance of binary classification learned from RTI data was compared with that learned from non-RTI data. In Experiment 2, three classes were classified using more artworks. Since this study only performed data validation, we used basic CNN models and a basic optimizer. According to the results, machine learning with the RTI dataset yields improved classification accuracy compared to a general photography dataset. The results of the experiments validated our hypothesis.

However, we did not attempt to categorize the various shapes of brushstrokes, including their sizes, angles, or areas, as they appear on each canvas. Indeed, the irregular accuracy results in Experiment 2 prompt us to rebuild the dataset design for the ongoing project in future work, and they show that classification performance depends strongly on which data, and how much of it, is available for analyzing images in the next step. To close this gap between different styles of paintings, it is necessary to test a greater variety of paintings, comparing and classifying them not only across more paintings and styles but also across various artists. In addition, an optimized machine learning model is needed, especially an architecture developed specifically for RTI data, in future work. This approach can widen the scope of art classification with machine learning and, further, provide a sophisticated tool to aid painting classification and appraisal.

In following research, we will expand the dataset with different styles of artists and paintings and subdivide it by the brushstrokes of the different objects included in the artworks. It is also necessary to develop a model and optimizers suitable for the RTI data of artworks to improve the performance of painting classification.

ACKNOWLEDGEMENT

This research is supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) under the Culture Technology (CT) Research & Development Program 2019, by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1C1B1012808), and by the Korea Arts Management Service.

REFERENCES

Abate, D., Menna, F., Remondino, F., Gattari, M., 2014. 3D painting documentation: Evaluation of conservation conditions with 3D imaging and ranging techniques. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, 45.

Arora, R. S., Elgammal, A., 2012. Towards automated classification of fine-art painting style: A comparative study. Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), IEEE, pp. 3541-3544.

Barni, M., Pelagotti, A., Piva, A., 2005. Image processing for the analysis and conservation of paintings: Opportunities and challenges. IEEE Signal Processing Magazine, 22(5), pp. 141-144.

Berezhnoy, I. E., Postma, E. O., van den Herik, H. J., 2009. Automatic extraction of brushstroke orientation from paintings. Machine Vision and Applications, 20(1), pp. 1-9.

Berezhnoy, I. E., Postma, E. O., van den Herik, J., 2005. Computerized visual analysis of paintings. Int. Conf. Association for History and Computing, pp. 28-32.

Berezhnoy, I., Postma, E., van den Herik, J., 2007. Computer analysis of Van Gogh's complementary colours. Pattern Recognition Letters, 28(6), pp. 703-709.

Berns, R. S., 2001. The science of digitizing paintings for color-accurate image archives: A review. Journal of Imaging Science and Technology, 45(4), pp. 305-325.

Breuckmann, B., 2011. 3-Dimensional digital fingerprint of paintings. 2011 19th European Signal Processing Conference, IEEE, pp. 1249-1253.

Chollet, F. et al., 2015. Keras. https://github.com/fchollet/keras.

Clarricoates, R., Kotoula, E., 2019. The potential of reflectance transformation imaging in architectural paint research and the study of historic interiors: A case study from Stowe House, England. Journal of the Institute of Conservation.

Cortes, C., Vapnik, V., 1995. Support-vector networks. Machine Learning, 20(3), pp. 273-297.

Cultural Heritage Imaging, 2019. Reflectance transformation imaging (RTI).

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 248-255.
Elkhuizen, W. S., Zaman, T., Verhofstad, W., Jonker, P. P., Dik, J., Geraedts, J. M., 2014. Topographical scanning and reproduction of near-planar surfaces of paintings. Measuring, Modeling, and Reproducing Material Appearance, 9018, International Society for Optics and Photonics, pp. 901809.

Folego, G., Gomes, O., Rocha, A., 2016. From impressionism to expressionism: Automatically identifying Van Gogh's paintings. 2016 IEEE International Conference on Image Processing (ICIP), IEEE, pp. 141-145.

Giachetti, A., Ciortan, I., Daffara, C., Pintus, R., Gobbetti, E. et al., 2017. Multispectral RTI analysis of heterogeneous artworks.

Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.

Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K. Q., 2017. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708.

Johnson, C. R., Hendriks, E., Berezhnoy, I. J., Brevdo, E., Hughes, S. M., Daubechies, I., Li, J., Postma, E., Wang, J. Z., 2008. Image processing for artist identification. IEEE Signal Processing Magazine, 25(4), pp. 37-48.

Khan, F. S., Beigpour, S., Van de Weijer, J., Felsberg, M., 2014. Painting-91: A large scale database for computational painting categorization. Machine Vision and Applications, 25(6), pp. 1385-1397.

Kingma, D. P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, pp. 1097-1105.

Li, J., Yao, L., Hendriks, E., Wang, J. Z., 2011. Rhythmic brushstrokes distinguish Van Gogh from his contemporaries: Findings via automated brushstroke extraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), pp. 1159-1176.

Manrique Tamayo, S. N., Andrés, V., Cayetano, J., Osca Pons, M., 2013. Applications of reflectance transformation imaging for documentation and surface analysis in conservation. International Journal of Conservation Science, 4, pp. 535-548.

Nanni, L., Ghidoni, S., Brahnam, S., 2017. Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recognition, 71, pp. 158-172.

Pamart, A., Ponchio, F., Abergel, V., Alaoui M'Darhri, A., Corsini, M., Dellepiane, M., Morlet, F., Scopigno, R., De Luca, L., 2019. A complete framework operating spatially-oriented RTI in a 3D/2D cultural heritage documentation and analysis tool. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 422, pp. 573-580.

Peng, K.-C., Chen, T., 2015. A framework of extracting multi-scale features using multiple convolutional neural networks. 2015 IEEE International Conference on Multimedia and Expo (ICME), IEEE, pp. 1-6.

Ponchio, F., Lamé, M., Scopigno, R., Robertson, B., 2018. Visualizing and transcribing complex writings through RTI. 2018 IEEE 5th International Congress on Information Science and Technology (CiSt), IEEE, pp. 227-231.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H., 2015. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579.

Zeiler, M. D., Fergus, R., 2014. Visualizing and understanding convolutional networks. European Conference on Computer Vision, Springer, pp. 818-833.

Revised May 2019