1 Introduction

In recent years, machine learning has influenced how we solve a variety of real-world problems. Indeed, artificial neural networks (NN) have outperformed many state-of-the-art approaches in several applications, driven by the development of deep neural network (DNN) and convolutional neural network (CNN) architectures.

Most neural network architectures are real-valued neural networks (RVNN). In such architectures, the input data is arranged into real-valued vectors, matrices, or tensors to be processed by the neural network. In some sense, this approach assumes that all the input data components have equal importance and, thus, they are all evaluated in the same way. However, some data sets contain multidimensional information that is best treated as single entities. For example, in image processing, a pixel's color is obtained by combining its red, green, and blue components. The position given by these three coordinates in the color space represents a plethora of colors, such as pink or brown, and the color information is lost if the components are treated separately [38]. In some practical image recognition tasks, neural networks need to capture the complexity of the color space to generalize well and represent the multidimensional nature of colors [20, 27]. Indeed, Parcollet et al. showed that RVNNs might fail to capture color information [39]. Also, Matsui et al. remarked that RVNNs are not able to preserve the 3D shape of an input object when it is transformed in 3D space [30]. Motivated by these remarks, neural networks based on hypercomplex numbers, such as complex numbers and quaternions, have been proposed and extensively investigated in recent years.

1.1 Complex and Quaternion-Valued Neural Networks

A complex-valued neural network (CVNN) is based on the algebra of complex numbers, which allows the relationship between magnitude and phase to be preserved during learning [38]. Furthermore, the algebraic structure of complex numbers gives CVNNs better generalization capability [17] and makes them easier to train [35]. When the processed data are correlated two-dimensional quantities, CVNNs have mostly outperformed or at least matched real-valued networks [3, 4, 16, 29, 48].

The encouraging performance of CVNNs inspired the development of quaternion-valued neural networks (QVNNs). QVNNs use quaternion algebra and can represent colors efficiently, with the advantage of representing each color fully as a single entity [6].

To the best of our knowledge, the first QVNN was introduced by Arena et al. [6], who developed a specific backpropagation algorithm able to learn the local relations that exist between quaternions. Furthermore, like real-valued neural networks, single hidden layer QVNNs are universal approximators [6]. An extensive list of applications and investigations with different QVNN architectures can be found in [7, 11, 36,37,38, 40, 46]. A detailed, up-to-date review of quaternion-valued neural networks, including some of their successful applications, can be found in [38].

In contrast to RVNNs, which represent color channels as independent variables, QVNNs can benefit from representing colors as single quaternions. For example, Greenblatt et al. applied a QVNN model to prostate cancer [13]. Gaudet and Maida investigated the use of quaternion-valued convolutional neural networks (QVCNN) for image processing [11]. Pavllo et al. modeled human motion using QVNNs [41]. Zhu et al. proposed a QVCNN for color image classification and denoising tasks [51]. Chen et al. explored the localization of color image splicing with a fully quaternion-valued convolutional network [9]. Jin et al. proposed a deformable quaternion Gabor convolutional neural network for color facial expression recognition [22]. Takahashi et al. combined histograms of oriented gradients (HOG), a feature descriptor used for human detection, with a QVNN to determine human facial expressions [47]. Quaternion multilayer perceptrons have been successfully applied to polarimetric synthetic aperture radar (PolSAR) land classification [24, 43].

1.2 Contributions and the Organization of the Paper

Contributing to the development of hypercomplex-valued neural networks, we present a quaternion-valued convolutional neural network (QVCNN) that classifies isolated white cells as lymphoblasts. Precisely, the QVCNN receives a white cell image like the one shown in Fig. 1 and classifies it as a lymphoblast or not. The classification of lymphoblasts is essential for diagnosing acute lymphoblastic leukemia, a kind of blood cancer. The performance of the QVCNN is compared with that of a real-valued convolutional neural network with a similar architecture.

Fig. 1. A candidate lymphoblast from the ALL-IDB dataset [28].

The paper is structured as follows: Sect. 2 presents the medical problem of acute lymphoblastic leukemia and reviews the literature on computer-aided diagnosis of leukemia. Section 3 addresses real-valued and quaternion-valued convolutional neural networks. The experimental results are detailed in Sect. 4. Section 5 presents the concluding remarks and future work.

2 Acute Lymphoblastic Leukemia (ALL)

According to the National Cancer Institute of the United States, acute lymphoblastic leukemia (ALL) is a type of leukemia, a cancer of the blood, that appears and multiplies rapidly [32]. ALL is characterized by the presence of many lymphoblasts in the blood and in the bone marrow. In this context, a lymphoblast is an immature cell that can develop into a mature lymphocyte [33].

Several methods for diagnosing ALL can be found in the literature [34], including the peripheral blood smear technique [23]. This technique allows a blood sample taken from the patient to be examined under a microscope. A specialist (hematologist) counts the number of lymphoblasts observed under the microscope and, based on that count, makes a diagnosis [45]. Figure 2 shows a blood smear as a hematologist sees it for analysis. It is worth mentioning that the white cells appear stained with a bluish-purple coloration, which serves as a guide for finding lymphoblasts.

Fig. 2. Blood smear image from the ALL-IDB dataset [28].

Manually counting lymphoblasts under the microscope is a tedious task that takes much time from a professional who could be more productive in other matters. In effect, the time spent analyzing the microscope image has an economic cost because a specialist's time is highly valued in the labor market. In addition, the analysis can be affected by human factors such as tiredness and stress. The operator's experience also plays an important role; therefore, a subjective component affects the results of the lymphoblast count. For these reasons, computational models that perform automatic lymphoblast counting in blood smear images have been proposed in the literature [42].

Many methods divide the problem of automatic lymphoblast counting into two stages. The first stage, usually called the identification phase, aims to find white cells that are candidates to be lymphoblasts. Labeling a candidate cell as a lymphoblast or a healthy cell is performed in the second stage, referred to as the classification phase. In this paper, we use real-valued and quaternion-valued convolutional neural networks to classify white cells, that is, in the second stage of the blood smear image analysis. In the following sections, we review real-valued and quaternion-valued neural networks. Before that, however, we provide a literature review on automatic leukemia diagnosis methods.

2.1 Computer-Aided Diagnosis of Leukemia: Literature Review

The literature contains a large number of studies on computer-aided leukemia diagnosis with different approaches, including support vector machines (SVM), k-nearest neighbors (k-NN), principal component analysis (PCA), naive Bayes classifiers, and random forests [8].

In [26], the authors used 60 sample images to develop a model that detects ALL using k-NN and a naive Bayes classifier, with 92.8% accuracy. A method to extract features of microscopic images using the discrete orthogonal Stockwell transform (DOST) and linear discriminant analysis (LDA) was proposed in [31]. The paper [50] applies three pre-trained CNN architectures to extract features for image classification. In [2], a CNN reached 88.25% accuracy in classifying ALL versus healthy cells; in distinguishing between the four subtypes of leukemia, the same CNN achieves 81.74% accuracy. Using the ALL-IDB dataset, [1] presents a k-medoids algorithm with 98.60% accuracy in classifying white blood cells. Furthermore, a method based on generative adversarial optimization (GAO) [49], a neural network with statistical features [5], and a deep CNN with the chronological sine-cosine algorithm (SCA) [21] have been proposed for ALL detection, with 93.84%, 97.07%, and 98.70% accuracy, respectively.

A table summarizing the results from 16 papers on automated detection of leukemia and its subtypes can be found in [8]. This reference also presents a framework for automated leukemia diagnosis based on the ResNet-34 [15] and the DenseNet-121 [19]. The accuracy reported was 99.56% for the ResNet-34 and 99.91% for the DenseNet-121 [8].

3 Convolutional Neural Networks

In many machine learning applications, identifying appropriate representations of large amounts of data is challenging. A successful model must efficiently encode local relations within the input features as well as their structural relations. Moreover, an adequate data representation has a positive side effect: it reduces the number of neural parameters needed to learn the input features well, which naturally mitigates overfitting [38].

Convolutional neural networks (CNNs) are feed-forward neural networks that learn robust feature representations and are widely applied in machine learning. For example, the ResNet set a milestone in 2015 by outperforming humans in the ImageNet competition [10, 15]. The successful AlexNet [25] also inspired the development of many novel CNNs, including the VGG [44] and the DenseNet [19]. In addition, deep neural networks have been successfully used for segmentation tasks as well as for the automatic classification of objects in images [14, 18].

One crucial component of deep networks is the convolution layer, which extracts features from high-dimensional data through a set of convolution kernels [51]. Although convolutions perform well in many practical situations, they have some drawbacks in color image processing tasks. Firstly, a convolution layer sums up the outputs corresponding to different channels and ignores their complicated interrelationships. As a consequence, it may lose important color information. Secondly, simply summing up the outputs gives too many degrees of freedom, and thus the network has a high risk of overfitting even when heavy regularization terms are imposed [51]. Accordingly, García-Retuerta et al. argue that quaternion-valued neural networks may have a significant advantage in color image processing tasks because of the four-dimensional algebraic structure of quaternions [10]. The following section reviews the basic concepts of quaternion-valued convolutional neural networks.
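The first drawback can be made concrete with a small NumPy sketch (the helper names are our own and this is not code from [51]): one filter of a standard real-valued convolution layer, applied to a three-channel image, yields exactly the sum of three independent single-channel responses, so any interaction between R, G, and B enters only through that sum. Biases are omitted for brevity.

```python
import numpy as np

def correlate2d_valid(x, w):
    """Single-channel 'valid' 2D cross-correlation (the operation
    deep-learning libraries call convolution)."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * w)
    return out

def real_conv_one_filter(image, kernel):
    """One filter of a real-valued conv layer: an (H, W, C) image and a
    (kh, kw, C) kernel give a single (H-kh+1, W-kw+1) feature map."""
    kh, kw, _ = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw, :] * kernel)
    return out

rng = np.random.default_rng(0)
rgb = rng.random((6, 6, 3))              # toy RGB image
kernel = rng.standard_normal((3, 3, 3))  # one 3x3 filter over 3 channels

fused = real_conv_one_filter(rgb, kernel)
# The same feature map rebuilt as a plain sum of three independent
# single-channel responses: R, G and B never interact before the sum.
per_channel = sum(correlate2d_valid(rgb[:, :, ch], kernel[:, :, ch]) for ch in range(3))
print(np.allclose(fused, per_channel))  # True
```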

3.1 Quaternion-Valued Convolutional Neural Networks

Quaternions are a four-dimensional extension of complex numbers introduced by Hamilton in 1843. The set of all quaternions is defined by

$$\begin{aligned} \mathbb {H} = \{q = {q}_0 + {q}_1 \boldsymbol{i}+ {q}_2 \boldsymbol{j}+ {q}_3 \boldsymbol{k}: q_0,q_1,q_2,q_3 \in \mathbb {R} \} \end{aligned}$$
(1)

where \(q_0\) is the real part of a quaternion, \(q_1\), \(q_2\), and \(q_3\) denote the imaginary components, and \(\boldsymbol{i}\), \(\boldsymbol{j}\), \(\boldsymbol{k}\) are the hypercomplex units. The product of the hypercomplex units is governed by the following identities, known as Hamilton's rules:

$$\begin{aligned} \begin{array}{c} \boldsymbol{i}^2 = \boldsymbol{j}^2 = \boldsymbol{k}^2 = \boldsymbol{i}\boldsymbol{j}\boldsymbol{k}= -1. \end{array} \end{aligned}$$
(2)

Alternatively, a quaternion can be written as

$$\begin{aligned} q = (q_0+q_1\boldsymbol{i}) + (q_2+q_3\boldsymbol{i})\boldsymbol{j}= z_0 + z_1 \boldsymbol{j}, \end{aligned}$$
(3)

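where \(z_0 = q_0 + q_1\boldsymbol{i}\) and \(z_1 = q_2+q_3\boldsymbol{i}\) are complex numbers. This decomposition holds because \(\boldsymbol{i}\boldsymbol{j} = \boldsymbol{k}\), which follows from (2).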
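The decomposition (3) also gives a convenient way to multiply quaternions with ordinary complex arithmetic. The sketch below is our illustration (the function names are hypothetical); it relies on the identity \(\boldsymbol{j} w = \bar{w}\boldsymbol{j}\) for complex \(w\), which follows from (2), so that \((z_0 + z_1\boldsymbol{j})(w_0 + w_1\boldsymbol{j}) = (z_0 w_0 - z_1 \bar{w}_1) + (z_0 w_1 + z_1 \bar{w}_0)\boldsymbol{j}\). It gives the same result as the Hamilton product defined next.

```python
def to_complex_pair(q):
    """Split q = (q0, q1, q2, q3) into z0 = q0 + q1 i and z1 = q2 + q3 i,
    so that q = z0 + z1 j."""
    q0, q1, q2, q3 = q
    return complex(q0, q1), complex(q2, q3)

def from_complex_pair(z0, z1):
    """Rebuild the four real components from the complex pair (z0, z1)."""
    return (z0.real, z0.imag, z1.real, z1.imag)

def product_via_pairs(p, q):
    """Quaternion product using complex arithmetic and j w = conj(w) j."""
    z0, z1 = to_complex_pair(p)
    w0, w1 = to_complex_pair(q)
    return from_complex_pair(z0 * w0 - z1 * w1.conjugate(),
                             z0 * w1 + z1 * w0.conjugate())

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(product_via_pairs(i, j) == k)             # True: ij = k
print(product_via_pairs(j, i) == (0, 0, 0, -1)) # True: ji = -k
```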

Quaternions are added component by component. Precisely, given \(p = {p}_0 + {p}_1 \boldsymbol{i}+ {p}_2 \boldsymbol{j}+ {p}_3 \boldsymbol{k}\) and \(q ={q}_0 + {q}_1 \boldsymbol{i}+ {q}_2 \boldsymbol{j}+ {q}_3 \boldsymbol{k}\), their sum is

$$\begin{aligned} p+q = (p_0+q_0) + (p_1+q_1)\boldsymbol{i}+ (p_2+q_2)\boldsymbol{j}+ (p_3+q_3)\boldsymbol{k}. \end{aligned}$$
(4)

The key operation in quaternion algebra is the Hamilton product between two quaternions \(p = {p}_0 + {p}_1 \boldsymbol{i}+ {p}_2 \boldsymbol{j}+ {p}_3 \boldsymbol{k}\) and \(q = {q}_0 + {q}_1 \boldsymbol{i}+ {q}_2 \boldsymbol{j}+ {q}_3 \boldsymbol{k}\), denoted by \(p \otimes q\) and defined by

$$\begin{aligned} \begin{array}{rl} p \otimes q &{}= (p_0 q_0 - p_1 q_1 - p_2 q_2 - p_3 q_3) + \, (p_0 \, q_1 + p_1 \, q_0 + p_2 \, q_3 - p_3 \, q_2) \, \boldsymbol{i}\\ &{} \quad + \, (p_0 \, q_2 - p_1 \, q_3 + p_2 \, q_0 + p_3 \, q_1) \, \boldsymbol{j}+ \, (p_0 \, q_3 + p_1 \, q_2 - p_2 \, q_1 + p_3 \, q_0) \, \boldsymbol{k}\end{array} \end{aligned}$$
(5)
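Equation (5) translates directly into code. The following Python sketch (plain tuples, no quaternion library assumed) implements the Hamilton product term by term and checks it against Hamilton's rules (2); it also shows that the product is not commutative.

```python
from typing import Tuple

Quaternion = Tuple[float, float, float, float]

def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p ⊗ q, following Eq. (5) term by term."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,  # real part
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,  # i component
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,  # j component
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,  # k component
    )

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(hamilton_product(i, j))                       # (0, 0, 0, 1): ij = k
print(hamilton_product(j, i))                       # (0, 0, 0, -1): ji = -k
print(hamilton_product(hamilton_product(i, j), k))  # (-1, 0, 0, 0): ijk = -1
```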

Quaternion algebra thus makes it possible to build processing entities composed of four elements that share information via the Hamilton product.

According to Gaudet and Maida [11], a quaternion-valued convolutional layer is obtained by convolving a quaternion-valued filter matrix \(\boldsymbol{W} = {\boldsymbol{W}}_0 + {\boldsymbol{W}}_1 \boldsymbol{i}+ {\boldsymbol{W}}_2 \boldsymbol{j}+ {\boldsymbol{W}}_3 \boldsymbol{k}\) with a quaternion-valued vector \(\boldsymbol{h} = {\boldsymbol{h}}_0 + {\boldsymbol{h}}_1 \boldsymbol{i}+ {\boldsymbol{h}}_2 \boldsymbol{j}+ {\boldsymbol{h}}_3 \boldsymbol{k}\). Here, \(\boldsymbol{W}_0\), \(\boldsymbol{W}_1\), \(\boldsymbol{W}_2\), and \(\boldsymbol{W}_3\) are real-valued matrices, while \(\boldsymbol{h}_0\), \(\boldsymbol{h}_1\), \(\boldsymbol{h}_2\), and \(\boldsymbol{h}_3\) are real-valued vectors. Expanding this convolution with the Hamilton product (5) yields four real-valued outputs, each combining the four filter components with the four input components. Details on the implementation of quaternion-valued convolutional layers can be found in [11].
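A minimal one-dimensional NumPy sketch of this construction, written for illustration only (it is not the implementation in [11]; the function name and the use of `np.convolve` are our assumptions): each output component combines the four filter components with the four input components following the sign pattern of the Hamilton product (5), with real convolutions in place of real products. Only four real kernels are learned per quaternion filter, yet all four are reused in every output component; a real-valued layer mapping four input channels to four output channels would need sixteen kernels instead.

```python
import numpy as np

def quaternion_conv1d(W, h):
    """Quaternion convolution W * h. W = (W0, W1, W2, W3) holds the real
    kernels of one quaternion filter and h = (h0, h1, h2, h3) the real
    components of a quaternion signal, all as 1D NumPy arrays."""
    W0, W1, W2, W3 = W
    h0, h1, h2, h3 = h

    def c(a, b):
        # Real 1D convolution replacing each real product in Eq. (5).
        return np.convolve(a, b, mode="valid")

    return (
        c(W0, h0) - c(W1, h1) - c(W2, h2) - c(W3, h3),  # real part
        c(W0, h1) + c(W1, h0) + c(W2, h3) - c(W3, h2),  # i component
        c(W0, h2) - c(W1, h3) + c(W2, h0) + c(W3, h1),  # j component
        c(W0, h3) + c(W1, h2) - c(W2, h1) + c(W3, h0),  # k component
    )

# Usage: a length-1 filter equal to i applied to a constant signal equal to j.
W = tuple(np.array([v], dtype=float) for v in (0, 1, 0, 0))
h = tuple(np.full(5, v, dtype=float) for v in (0, 0, 1, 0))
out = quaternion_conv1d(W, h)
# The k component is all ones: i ⊗ j = k, sample by sample.
print([component.tolist() for component in out])
```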

Fig. 3. RVCNN and QVCNN architectures.

4 Computational Experiments

Let us compare the performance of real-valued and quaternion-valued convolutional neural networks in classifying a white cell image as a lymphoblast. Both networks have been implemented in Python using the Keras and TensorFlow libraries.

The real-valued model is a sequential feed-forward network composed of three convolutional layers, three max-pooling layers, and a dense layer. Precisely, the first convolutional layer has 32 filters with a (3, 3) kernel and the ReLU activation function. It is followed by a max-pooling layer with a (2, 2) kernel. The second and third two-dimensional convolutional layers have 64 and 128 filters, respectively, and ReLU activation functions; each of them is also followed by a max-pooling layer with a (2, 2) kernel. Figure 3 shows the architecture of the real-valued convolutional neural network. The real-valued convolutional neural network has 106,049 trainable parameters in total.
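The reported total can be reproduced with simple arithmetic. The sketch below is our reconstruction under stated assumptions, not the authors' code: 'valid' padding, (3, 3) kernels in all three convolutional layers (only the first is specified above), \(100 \times 100 \times 3\) inputs (the resized images described later), non-overlapping (2, 2) pooling, and a Flatten layer followed by the single-neuron dense layer.

```python
def conv2d_params(kernel, in_channels, filters):
    """Real-valued Conv2D: kh*kw*in_channels weights per filter plus one bias."""
    kh, kw = kernel
    return kh * kw * in_channels * filters + filters

def rvcnn_param_count(input_shape=(100, 100, 3), filters=(32, 64, 128),
                      kernel=(3, 3), pool=2):
    """Trainable parameters of the RVCNN under the assumptions stated above."""
    h, w, c = input_shape
    total = 0
    for f in filters:
        total += conv2d_params(kernel, c, f)
        h, w = h - kernel[0] + 1, w - kernel[1] + 1  # 'valid' convolution shrinks the map
        h, w = h // pool, w // pool                  # (2, 2) pooling halves it (floor)
        c = f                                        # next layer sees f channels
    total += h * w * c + 1                           # dense weights plus one bias
    return total

print(rvcnn_param_count())  # 106049
```

Under these assumptions the per-layer counts are 896, 18,496, 73,856, and 12,801, which add up to the 106,049 parameters reported above.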

The quaternion-valued convolutional neural network has been designed similarly. Precisely, the number of filters per layer of the real-valued network was divided by four to build the quaternion-valued convolutions, because each quaternion filter comprises four real-valued components. Thus, the quaternion-valued convolutional neural network has the same structure as the real-valued network depicted in Fig. 3, but with a quarter of the number of filters per layer. The quaternion-valued CNN model has 36,353 trainable parameters. Table 1 summarizes the number of trainable parameters of both neural networks per layer.
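A similar back-of-the-envelope count for the quaternion model, again our reconstruction rather than the authors' configuration: we assume one quaternion input channel per pixel, 8, 16, and 32 quaternion filters (a quarter of 32, 64, and 128), 'valid' padding with (3, 3) kernels, four real kernels and a four-component bias per quaternion filter connection, and a real-valued dense output neuron acting on the flattened \(4 \times 32\) real channels.

```python
def quaternion_conv2d_params(kernel, in_q, out_q):
    """Each (input, output) pair of quaternion channels carries four real
    kernels W0..W3; each output quaternion bias has four real components."""
    kh, kw = kernel
    return 4 * kh * kw * in_q * out_q + 4 * out_q

def qvcnn_param_count(input_size=(100, 100), in_q=1, filters_q=(8, 16, 32),
                      kernel=(3, 3), pool=2):
    """Trainable parameters of the QVCNN under the assumptions stated above."""
    h, w = input_size
    total = 0
    for f in filters_q:
        total += quaternion_conv2d_params(kernel, in_q, f)
        # 'valid' convolution followed by (2, 2) pooling
        h, w = (h - kernel[0] + 1) // pool, (w - kernel[1] + 1) // pool
        in_q = f
    total += h * w * 4 * in_q + 1  # real dense neuron on 4 real values per quaternion
    return total

print(qvcnn_param_count())  # 36353
```

Under these assumptions the per-layer counts are 320, 4,672, 18,560, and 12,801, matching the 36,353 parameters reported above, or roughly 34% of the real-valued model.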

Table 1. Parameters of real and quaternion-valued neural networks

The dense layer of both the real-valued and quaternion-valued networks has a single output neuron without an activation function, which is used to classify the input image as a lymphoblast or not. Moreover, the parameters of all layers have been initialized according to Glorot and Bengio [12]. The networks were trained with the Adam optimizer, a stochastic gradient descent method with adaptive estimates of the first-order and second-order moments.
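Because the output neuron has no activation, it produces a logit. The sketch below shows, under assumptions not stated in the text (binary cross-entropy computed from logits as the loss, a decision threshold at zero, and Kingma and Ba's default Adam hyperparameters), how such a neuron yields a lymphoblast decision and what a single Adam update computes. All function names are hypothetical.

```python
import math

def predict_lymphoblast(logit: float) -> bool:
    """A linear output neuron emits a logit z; sigmoid(z) > 0.5 exactly
    when z > 0, so thresholding at zero gives the class decision."""
    return logit > 0.0

def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy on a logit in the numerically stable form
    max(z, 0) - z*y + log(1 + exp(-|z|)). Assumed loss, not named in the text."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter at step t >= 1."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Usage: the first Adam step moves a parameter by about lr regardless of the
# gradient's magnitude, because m_hat / sqrt(v_hat) = grad / |grad| at t = 1.
theta, m, v = adam_step(0.0, grad=2.0, m=0.0, v=0.0, t=1)
print(predict_lymphoblast(1.3), round(theta, 6))  # True -0.001
```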

To evaluate the performance of the RVCNN and QVCNN classifiers, we used ALL-IDB, the Acute Lymphoblastic Leukemia Image Database for Image Processing, provided by the Università Degli Studi di Milano [28]. This image database contains 260 images of white blood cells with \(257 \times 257\) pixels, labeled by experts and evenly distributed between lymphoblasts and healthy cells. Figure 1 shows an example of a color image used in the computational experiments.

We resized the \(257 \times 257\) white blood cell images to \(100 \times 100\) pixels. The set of 260 color images was then randomly divided into training and test sets using different ratios. Data augmentation was applied to the training set to improve the accuracy of the convolutional neural networks. Precisely, all training images were submitted to a data-generation pre-processing step that creates new images through horizontal and vertical flips.
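A NumPy sketch of the split-then-augment step. It assumes the images have already been resized (toy \(8 \times 8\) arrays keep the example light), a non-stratified random split, and a deterministic expansion with one horizontally and one vertically flipped copy per training image; a generator that flips images at random during training is an equally plausible reading of the text. The helper names are hypothetical.

```python
import numpy as np

def random_split(images, labels, test_fraction, rng):
    """Random (non-stratified) split into training and test sets."""
    idx = rng.permutation(len(images))
    n_test = int(round(test_fraction * len(images)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (images[train_idx], labels[train_idx]), (images[test_idx], labels[test_idx])

def flip_augment(images, labels):
    """Append horizontally and vertically flipped copies of (N, H, W, C)
    images; flipping does not change a cell's label."""
    h_flip = images[:, :, ::-1, :]  # mirror left-right (width axis)
    v_flip = images[:, ::-1, :, :]  # mirror top-bottom (height axis)
    return (np.concatenate([images, h_flip, v_flip]),
            np.concatenate([labels, labels, labels]))

# Usage with toy arrays standing in for the 260 resized images.
rng = np.random.default_rng(0)
images = rng.random((260, 8, 8, 3)).astype(np.float32)
labels = np.repeat([0, 1], 130)  # evenly distributed classes
(train_x, train_y), (test_x, test_y) = random_split(images, labels, 0.1, rng)
train_x, train_y = flip_augment(train_x, train_y)  # augment the training set only
print(train_x.shape, test_x.shape)  # (702, 8, 8, 3) (26, 8, 8, 3)
```

Splitting before augmenting keeps flipped copies of test cells out of the training set.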

In our experiments, images were represented in the RGB (red, green, and blue) and HSV (hue, saturation, and value) color spaces and used as input to the neural networks. Accordingly, we performed the four experiments detailed in Table 2. The first experiment considers a real-valued CNN whose input is obtained by concatenating the three RGB channels into a single tensor with values in the unit interval [0, 1]. The second experiment also considers a real-valued CNN, but the input is obtained by concatenating the three HSV channels. Here, hue is an angle \(H \in [0,2\pi )\), while saturation and value belong to the unit interval, i.e., \(S, \ V \in [0,1]\).
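Python's standard `colorsys` module returns hue as a fraction of a full turn in [0, 1), so obtaining \(H \in [0, 2\pi)\) only needs a rescaling. The per-pixel sketch below is our illustration (the function name is hypothetical); whole images would normally be converted with a vectorized routine.

```python
import colorsys
import math

def rgb_to_hsv_radians(r, g, b):
    """Convert an RGB triple in [0, 1] to (H, S, V) with H in [0, 2*pi)
    and S, V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h is a fraction of a turn
    return 2 * math.pi * h, s, v

print(rgb_to_hsv_radians(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0) for pure red
```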

Table 2. Experiments with real and quaternion-valued neural networks

The last two experiments were performed using the quaternion-valued CNN. Specifically, in the third experiment, the RGB image is encoded as a quaternion with a null real part and each channel assigned to one imaginary part, as follows:

$$\begin{aligned} q = 0 + R \ \boldsymbol{i}+ G \ \boldsymbol{j}+ B \ \boldsymbol{k}. \end{aligned}$$
(6)
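For a whole image, Eq. (6) amounts to prepending a zero channel. A NumPy sketch, assuming a channel-last layout with the four quaternion components stored in the order \((q_0, q_1, q_2, q_3)\); the component layout expected by a particular quaternion layer implementation may differ.

```python
import numpy as np

def encode_rgb_quaternion(rgb):
    """Map an (H, W, 3) RGB image with values in [0, 1] to an (H, W, 4)
    array holding (q0, q1, q2, q3) = (0, R, G, B) as in Eq. (6)."""
    zeros = np.zeros(rgb.shape[:2] + (1,), dtype=rgb.dtype)  # null real part
    return np.concatenate([zeros, rgb], axis=-1)

# Usage: one pixel; the result holds the real part 0 followed by R, G, B.
pixel = np.array([[[0.2, 0.5, 0.9]]])
print(encode_rgb_quaternion(pixel)[0, 0])
```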

Finally, in the fourth experiment, a color is encoded in a quaternion through the following expression using the HSV representation:

$$\begin{aligned} q = S \cos (H) + S \sin (H) \ \boldsymbol{i}+ V \cos (H) \ \boldsymbol{j}+ V \sin (H) \ \boldsymbol{k}. \end{aligned}$$
(7)
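Eq. (7) can be applied to a whole HSV image in a vectorized way. The NumPy sketch below assumes hue is stored in radians in the first channel; the function name is hypothetical.

```python
import numpy as np

def encode_hsv_quaternion(hsv):
    """Map an (H, W, 3) array of (hue, saturation, value), hue in radians,
    to the (H, W, 4) quaternion components of Eq. (7)."""
    hue, sat, val = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return np.stack([sat * np.cos(hue),   # real part
                     sat * np.sin(hue),   # i component
                     val * np.cos(hue),   # j component
                     val * np.sin(hue)],  # k component
                    axis=-1)

# Usage: pure red (H = 0, S = 1, V = 1) maps to 1 + 0i + 1j + 0k.
red = np.array([[[0.0, 1.0, 1.0]]])
print(encode_hsv_quaternion(red)[0, 0])  # [1. 0. 1. 0.]
```

Written in the form (3), this encoding reads \(q = S(\cos H + \boldsymbol{i}\sin H) + V(\cos H + \boldsymbol{i}\sin H)\boldsymbol{j}\): hue sets a common phase of both complex parts, while \(|q|^2 = S^2 + V^2\) does not depend on hue.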

The dataset has been divided into training and test sets using 5 different training/test ratios, and the networks were trained for 100 epochs. One hundred simulations were performed for each training/test ratio, and the average and standard deviation of the accuracy were calculated. Figure 4 presents the average accuracy of the real-valued and quaternion-valued convolutional neural networks in the four experiments as a function of the percentage of the dataset used for testing. This figure also shows the interval between the \(25\%\) and \(75\%\) quantiles of the accuracy as a shaded area.
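The evaluation protocol can be sketched as follows, assuming a hypothetical `train_and_evaluate(test_fraction, rng)` callable that performs one random split, trains a freshly initialized network for 100 epochs, and returns its test accuracy. The five test fractions are left as a parameter, and the standard deviation uses NumPy's default population form (`ddof=0`), which the text does not specify.

```python
import numpy as np

def summarize_runs(accuracies):
    """Mean, standard deviation, and 25%/75% quantiles of test accuracy
    over repeated random training/test splits."""
    acc = np.asarray(accuracies, dtype=float)
    q25, q75 = np.quantile(acc, [0.25, 0.75])  # bounds of the shaded band
    return {"mean": acc.mean(), "std": acc.std(), "q25": q25, "q75": q75}

def repeated_evaluation(train_and_evaluate, test_fractions, n_runs=100, seed=0):
    """Run the hypothetical train_and_evaluate(test_fraction, rng) n_runs
    times for each test fraction and summarize the accuracies."""
    rng = np.random.default_rng(seed)
    return {f: summarize_runs([train_and_evaluate(f, rng) for _ in range(n_runs)])
            for f in test_fractions}
```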

Fig. 4. Accuracy versus the percentage of the dataset used for testing.

Note from Fig. 4 that the quaternion-valued convolutional neural network with images in the HSV color space (QVCNN-HSV) obtained the best performance, reaching 98.2% accuracy in the test phase with 10% of the dataset used for testing.

The real- and quaternion-valued networks with RGB-encoded images exhibited similar performance, with accuracies in the ranges \([93.6\%, \ 97.1\%]\) and \([94.4\%, \ 97.3\%]\), respectively, depending on the training/test ratio; their performances were statistically equivalent. The real-valued neural network with HSV-encoded images yielded the worst performance, reaching an average accuracy of 95.3% in the best case.

In conclusion, the QVCNN-HSV exhibited better generalization capability than the QVCNN-RGB, RVCNN-RGB, and RVCNN-HSV models. Moreover, the performance of the quaternion-valued convolutional neural network with images encoded using the HSV color space and (7) compares well with the results reported in the literature (see Sect. 2.1), even though the quaternion-valued convolutional neural network is much simpler than many of the architectures considered there.

5 Concluding Remarks and Future Works

Acute lymphoblastic leukemia is characterized by many lymphoblasts in the blood and the bone marrow. This disease can be diagnosed by counting the number of lymphoblasts in a microscope image of a blood smear. This paper investigated the application of convolutional neural networks to classifying a white cell as a lymphoblast or not. Precisely, we compared the performance of real-valued and quaternion-valued models. The QVCNN with input images encoded using the HSV color space showed the best result in our experiments. Also, the performance of the QVCNN is comparable with that of deeper neural networks from the literature, including the ResNet and the DenseNet [8]. These computational experiments suggest that quaternion-valued neural networks exhibit better generalization capability than the real-valued convolutional neural network, possibly because they treat colors as single quaternion entities. Furthermore, the quaternion-valued convolutional neural network has only about 34% of the parameters of the corresponding real-valued model.

As future work, we plan to develop neural networks that segment and classify white blood cells in blood smear microscope images. Further research can also address the application of QVCNNs to the classification of other types of leukemia.