key: cord-0298064-nsbq1xys
authors: Cooper, Clayton; Zhang, Jianjing; Gao, Robert X.; Wang, Peng; Ragai, Ihab
title: Anomaly detection in milling tools using acoustic signals and generative adversarial networks
date: 2020-12-31
journal: Procedia Manufacturing
DOI: 10.1016/j.promfg.2020.05.059
sha: b281aec7972a4726b2709a61b51cacd899425825
doc_id: 298064
cord_uid: nsbq1xys

Abstract Acoustic monitoring presents itself as a flexible but under-reported method of tool condition monitoring in milling operations. This paper demonstrates the power of the monitoring paradigm by presenting a method of characterizing milling tool conditions by detecting anomalies in the time-frequency domain of the tools’ acoustic spectrum during cutting operations. This is done by training a generative adversarial neural network on only a single, readily obtained class of acoustic data and then inverting the generator to perform anomaly detection. Anomalous and non-anomalous data are shown to be nearly linearly separable using the proposed method, resulting in 90.56% tool condition classification accuracy and a 24.49% improvement over classification without the method.

While there have recently been great strides in manufacturing-inclined machine learning methods and models [1] [2], there is still a significant reliance on multi-class data availability when working with classifiers in machine learning. In the realm of machine condition monitoring, these various classes are often behaviors under different failure modes such as bearing faults [3] [4] or different states such as varying tool condition [5] [6]. While having a wide breadth of these different classes available is helpful for developing machine learning models, the difficulty in gathering data from realistic operating conditions poses a major barrier to multi-class data analytics. Operating machines under such conditions can be economically infeasible in real-world applications [7], whereas in-lab simulations may not adequately replicate the intricacies of large-scale machine operation on the factory floor. Therefore, the ability to classify data using a classifier trained on only a single, normal, well-labeled class is a sought-after goal in manufacturing, as doing so would alleviate the need for extensive data gathering under various operating conditions during realistic manufacturing scenarios.

An example to illustrate this concept is the milling operation. A common failure mode in milling is the wear of the cutting tool: without a timely tool change, a tool dulled beyond a set threshold can result in low product surface quality, leading to part rejection and waste of time, money, and energy [8]. Detecting when a tool has passed this wear threshold from compliant to noncompliant is thus critical and defines a binary classification problem. This problem has been approached from different sensing modalities, such as force [9], vibration [10], motor current [11], data fusion [12], and acoustics [13]. Acoustic sensing has the advantage of being flexible and responsive in its implementation; however, noise contamination poses a challenge. It has been previously shown that an acoustic signal processing approach involving 2D deep convolution in the time-frequency domain is capable of 99.5% classification accuracy when separating compliant milling tool audio signals from noncompliant milling tool audio signals [14]. However, this approach relies on the acquisition of audio signals from both classes, and hence is subject to the data availability limitation described above. Demonstrating comparable classification performance while having only compliant tool audio available would be a significant step forward for acoustic tool condition monitoring (TCM), which is the focus of this paper.

48th SME North American Manufacturing Research Conference, NAMRC 48 (Cancelled)

Machine learning techniques have shown promise when extracting hidden patterns from audio signals. The work in [15] utilized a generative adversarial network (GAN) and time-frequency representations of audio signals to conduct audio style transfer, both from narrator to narrator and from musical genre to musical genre. A transfer learning-based approach to speech recognition and processing has been shown to be successful [16]. Recurrent neural networks have also been successfully utilized for audio processing and synthesis [17] [18]. These efforts have dealt with highly structured or isolated sound signals, such as music and human speech. What has not been reported is machine learning success in analyzing and synthesizing unstructured sounds like those associated with milling operations. The presented study addresses this gap by investigating a method of acoustic-based TCM using GANs and highly unstructured data, thereby presenting a new scenario of sound signal analysis through machine learning.

Recently, GANs have been shown to be capable of anomaly detection after being trained on only a single class of data [19]. GAN-based methods stand in contrast to other single-class methods such as one-class support vector machines and support vector data description [20] as well as probabilistic k-means clustering and Gaussian mixture models [21]. The first two approaches have intentionally simple decision boundaries (hyperplanar and hyperspherical, respectively) but do not perform well when trained using broadly distributed data that may substantially overlap in some dimension with evaluation data. The last two approaches, in comparison, require making assumptions about underlying data distributions as well as arbitrary quantization and thresholding. GAN-based methods are more computationally demanding than the above methods but result in well-separated data in a low-dimensional data space for classification as well as more explainable interpretations of classification performance and behavior. This latter point is due to the fact that GANs are capable of outputting data of any dimension, which allows real images to be compared to GAN-generated images, real time series to be compared to GAN-generated time series, etc.

Motivated by these findings, this paper investigates TCM using highly unstructured audio and GANs in order to advance the feasibility of acoustic TCM. The remainder of the paper is organized as follows: Section 2 provides background on GANs and the proposed anomaly detection method using them; Section 3 presents a case study using audio signals of both compliant and noncompliant milling tools; Section 4 presents case study results; Section 5 presents conclusions and pathways for future work.

This section provides background on generative adversarial networks and describes how such networks may be used for anomaly detection.

Gaining widespread attention following a 2014 paper [22], GANs have emerged as a state-of-the-art method of data generation in a variety of dimensions including images [23] and audio [24]. The basic structure of a GAN is depicted in Fig. 1. GANs operate on the premise that a generator (G) can be trained such that it can transform random noise vectors ($\vec{z}$) into data that closely resembles the ground truth data ($\vec{x}$). The performance of such a generator is measured by the discriminator (D), whose sole purpose is to correctly classify inputs as either "real" (ground truth) or generated by the generator. It does this by outputting a single scalar representing the probability that an input is from the ground truth data set. Conversely, the objective of the generator is to deceive the discriminator and generate data that is not separable from the ground truth data, thus making G and D adversaries. G and D may take on the form most suitable for the real data's format (fully-connected, convolutional, etc.) as long as the generator outputs the proper data format for the task and the discriminator returns the aforementioned probability. Regardless of model type, the discriminator's loss is directly proportional to the number of discriminator misclassifications in a batch of mixed real and generated samples, whereas the generator's loss is inversely proportional to this quantity [22]. Both the generator and discriminator use machine learning techniques like stochastic gradient descent or Adam optimization [25] in order to minimize their loss over time. This adversarial relationship is expressed by the value function V of [22]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{\vec{x} \sim p_{data}(\vec{x})}\left[\log D(\vec{x})\right] + \mathbb{E}_{\vec{z} \sim p_{z}(\vec{z})}\left[\log\left(1 - D(G(\vec{z}))\right)\right] \quad (1)$$

Here V is the expected value of discriminator accuracy, $\vec{x}$ is drawn from the ground truth data distribution, and $\vec{z}$ is drawn from the noise distribution. Note that Eq. (1) is a basic GAN value function and that numerous other functions have been proposed that offer better numerical stability because they do not tend to negative infinity [26].
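As a rough illustration of how the basic GAN value function of [22] behaves, the following sketch estimates V from a handful of discriminator outputs. The function name `gan_value` and the probability values are hypothetical and chosen only for illustration; they are not taken from the presented study.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of V = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) for ground truth samples
    d_fake: discriminator outputs D(G(z)) for generated samples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator that separates real from generated data well.
confident = gan_value(d_real=[0.99, 0.98], d_fake=[0.01, 0.02])

# Hypothetical discriminator that is fully deceived and outputs 0.5 everywhere.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(confident, fooled)
```

A confident, correct discriminator pushes V toward zero from below, while a fully deceived one yields V = -2 log 2. If any output reaches exactly 0 or 1 on the wrong side, the logarithm diverges, which is the negative-infinity behavior mentioned above.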

For the presented methodology, an anomaly is defined as a pattern in data which does not conform to expected behavior [27]. This may lead to the assumption that the aptly named discriminator of a GAN is well suited for anomaly detection, which, however, is not the case. GAN discriminators are trained specifically to detect if an input is from a ground truth data set. As such, the discriminators' learned data patterns may or may not be sufficient to detect if an input is of a different class than the one on which the discriminator was trained. Performing multi-class classification with a discriminator results only in measures of similarity between the examples used for training and the examples not used for training. Discriminators do not provide a unique and robust measure for class difference between these two groups.

Classification abilities are instead best provided by the GAN generator. The procedure used herein for creating an anomaly detector using a GAN generator is as follows [28] :

1. Train the generator such that it produces data of arbitrarily high similarity to the ground truth data $\vec{x}$.
2. Freeze the generator so that none of its parameters change during subsequent steps.
3. Using gradient descent or another multi-dimensional optimization algorithm, initialize and search the generator input space $\vec{z}$ for a reconstruction of a given example $\vec{y}$ (update $\vec{z}$ based on gradients of the reconstruction loss with respect to $\vec{z}$ at each of n iterations).
4. If $\vec{y}$ can be reconstructed according to a set threshold of accuracy or loss, it is not anomalous; if $\vec{y}$ cannot be reconstructed according to said threshold, it is anomalous.
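The four steps above can be sketched with a toy example. This is a minimal illustration under strong assumptions, not the authors' implementation: the "frozen generator" is a hypothetical fixed linear map `W` from a 2-D latent space onto a plane in 3-D data space, the gradient is written analytically, and the threshold, learning rate, and iteration count are arbitrary.

```python
import numpy as np

# Hypothetical frozen generator: maps a 2-D latent vector onto the plane
# spanned by the first two axes of a 3-D data space (steps 1 and 2).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

def generator(z):
    return W @ z

def reconstruction_loss(y, z):
    """Sum of squared element-wise differences between y and G(z)."""
    residual = y - generator(z)
    return float(residual @ residual)

def latent_search(y, n_iterations=200, learning_rate=0.1):
    """Step 3: gradient descent on z only; the generator stays fixed."""
    z = np.zeros(W.shape[1])
    for _ in range(n_iterations):
        grad = -2.0 * W.T @ (y - generator(z))  # d(loss)/dz for a linear G
        z = z - learning_rate * grad
    return z, reconstruction_loss(y, z)

def is_anomalous(y, threshold=0.1):
    """Step 4: flag examples whose best reconstruction loss is too high."""
    _, loss = latent_search(y)
    return loss > threshold

y_normal = np.array([0.5, -0.3, 0.0])     # lies in the generator's range
y_anomalous = np.array([0.5, -0.3, 1.0])  # has a component G cannot produce

print(latent_search(y_normal)[1], latent_search(y_anomalous)[1])
```

In this toy setting the in-range example is reconstructed almost exactly, while the out-of-range example retains a loss close to 1 no matter how long the search runs. A trained convolutional generator would require automatic differentiation in place of the analytic gradient.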

As discussed in detail in [28], a successful search in the latent space of noise vector $\vec{z}$ should yield a generator output $G(\vec{z})$ that recreates a given example $\vec{y}$ if and only if G has learned the true distribution of the data $\vec{x}$ and the given example is indeed from $\vec{x}$. (For simplicity, the generator is assumed to have learned the true distribution of $\vec{x}$ through a rigorous GAN training procedure.) The examples which can be successfully recreated from the latent space are denoted as $\vec{y}_+$. Alternatively, if a given $\vec{y}$ does not come from $\vec{x}$, then G has not been trained to reconstruct such an example; a search of latent space will thus be unsuccessful, and this $\vec{y}$ cannot be reconstructed by G. The examples which cannot be successfully recreated from the latent space are denoted as $\vec{y}_-$. The Euclidean loss as presented in Eq. (2) is used as the measure of reconstruction accuracy, denoting the sum of the squared "distance" between each element of the reference datum $\vec{y}$ and its attempted reconstruction, $G(\vec{z})$:

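$$\ell(\vec{y}) = \sum_i \left( y_i - G(\vec{z})_i \right)^2 \quad (2)$$

This loss function allows for a gradient-based search through the latent space $\vec{z}$ for a minimum. As $\vec{y}_+$ is better reconstructed by G than is $\vec{y}_-$, Eq. (3) is established:

$$\ell(\vec{y}_+) < \ell(\vec{y}_-) \quad (3)$$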

Notably, Eq. (2) condenses the high-dimensional spaces of $\vec{z}$, $\vec{y}$, and G to the single-dimensional space of $\ell$. This would ideally mean that $\vec{y}_+$ and $\vec{y}_-$ are linearly separable, or at least nearly linearly separable, given a G capable of inducing such separability. Also note that by freezing G following its optimization versus D, G does not need to be optimized further. Additionally, $\vec{y}$ cannot be optimized since it is ground truth. Therefore, the only variable needing to be optimized is $\vec{z}$, which greatly reduces the computational and memory complexities of the GAN-based anomaly detection model.
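Because the reconstruction loss is a single scalar per example, classification reduces to placing one threshold on a line. The sketch below shows one simple way such a threshold might be chosen when some labeled losses are available for validation. The loss values are made up for illustration and overlap slightly, mimicking data that is nearly but not perfectly linearly separable; the helper name `best_threshold` is hypothetical.

```python
def best_threshold(normal_losses, anomalous_losses):
    """Scan midpoints between sorted losses and keep the most accurate cut."""
    values = sorted(normal_losses + anomalous_losses)
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    total = len(normal_losses) + len(anomalous_losses)

    def accuracy(threshold):
        correct = sum(loss <= threshold for loss in normal_losses)
        correct += sum(loss > threshold for loss in anomalous_losses)
        return correct / total

    threshold = max(midpoints, key=accuracy)  # first best cut on ties
    return threshold, accuracy(threshold)

# Hypothetical final reconstruction losses after a latent space search.
normal_losses = [0.8, 1.1, 1.3, 2.0]
anomalous_losses = [1.9, 2.6, 3.1, 3.4]

threshold, acc = best_threshold(normal_losses, anomalous_losses)
print(threshold, acc)
```

With these made-up values, one normal example has a higher loss than one anomalous example, so no threshold classifies all eight correctly and the best cut reaches 7 of 8.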

The experimental setup investigated for the presented study is depicted in Fig. 2. ISCAR 328 carbide indexable inserts are secured into a 2.54 cm end mill and cuts are made into a block of 1018 steel at 121 HB hardness. For all experiments, the cutting depth is held constant at 0.254 mm. A shallow depth of cut is chosen so that cutting fluid is not required; cutting fluid would otherwise introduce additional variables to the audio signal collection by changing the acoustic characteristics of the tool and workpiece via temperature variation or by adding acoustic noise via coolant nozzles.

A 32-microphone spherical beamformer is used to gather in-process audio signals at a 48 kHz sampling rate. Each cut is recorded for 10 seconds, and recording commences upon full diametric engagement of the tool. Two kinds of audio signals are collected: those of brand-new compliant inserts and those of consistently worn noncompliant inserts. Prior to being recorded, the latter inserts were worn to 2 mm of flank wear by removing 105 mm³ of 1018 steel (121 HB hardness) at 1215 RPM and 264 mm/min feed. A comparison of compliant and noncompliant tool wear is shown in Fig. 3.

Fig. 2. Experimental setup [29]

A motorized ATLAS knee mill is used for all the experimental runs in order to maintain consistency in the feed throughout. Experimentation follows a 3² factorial Taguchi design, using mill speeds of 804, 1215, and 1406 RPM and feeds of 176 mm/min, 264 mm/min, and 308 mm/min. A total of 40 cuts are made, 13 of which are with noncompliant tools. Each insert is used for data collection only once in order to maintain consistency in tool conditions, so a total of 40 different inserts are used. Full experiment details are listed in Table 1.

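Recording with 32 microphones results in a total of 1280 audio files, 384 of which are of noncompliant tool audio.

Table 1.

| Spindle speed (RPM) | Feed (mm/min) | Number of cuts |
| 804 | 176 | 4 |
| 804 | 264 | 3 |
| 804 | 308 | 3 |
| 1215 | 176 | 3 |
| 1215 | 264 | 3 |
| 1215 | 308 | 3 |
| 1406 | 176 | 3 |
| 1406 | 264 | 3 |
| 1406 | 308 | 3 |
| 804 | 176 | 3 |
| 804 | 264 | 3 |
| 804 | 308 | 3 |
| 1215 | 176 | 3 |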

Microphone data from a single cut arrives as a one-dimensional vector of air pressure levels relative to the microphone diaphragm. Each vector is 440,000 samples long, which results in a large quantity of data when 1280 recordings are present. Accordingly, data size reduction is needed prior to performing data analysis using machine learning methods. While the GAN model described in Section 2 can be used on data of any dimensionality, audio data may be analyzed in one dimension (time or frequency) or two dimensions (time-frequency). Based on a literature study involving experimental results in other fields [30], two-dimensional convolution is chosen as the network type for the generator and discriminator. Accordingly, grayscale time-frequency images are made from each of the 1280 audio files and stored as .npy numerical arrays in order to increase data handling speed.

The time-frequency images are created using the short-time Fourier transform (STFT) with a Hanning window whose length is set such that the resultant image has square representations of the frequencies' presence. Square representations present a balance between when a frequency occurs and what the frequency is. Each spectrogram is limited in the frequency domain to the region between 0 and 2000 Hz, as this region captures over 90% of the frequency content of every audio file. Each spectrogram is rescaled to 128 rows by 128 columns in order to ease spectrogram generation during the GAN training phase. The 128 × 128 size was chosen as it presented a good experimentally observed tradeoff between memory demands and information representation capability. Examples of ground truth spectrograms for the compliant and noncompliant tools are shown in Fig. 4.
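A spectrogram pipeline of this kind might look like the following sketch. It is an illustration under stated assumptions rather than the study's preprocessing code: the window length (3072 samples) and hop size are chosen here so that the 0 to 2000 Hz band yields exactly 128 rows and a 1-second signal yields exactly 128 columns, which sidesteps the rescaling step described above; the log compression and the synthetic 500 Hz test tone are also assumptions.

```python
import numpy as np

def spectrogram_0_2khz(signal, fs=48_000, nperseg=3072, n_frames=128, f_max=2000.0):
    """Hann-windowed STFT magnitude, cropped below f_max, scaled to [0, 1]."""
    hop = (len(signal) - nperseg) // (n_frames - 1)   # spread frames over the signal
    window = np.hanning(nperseg)
    frames = np.stack([signal[i * hop: i * hop + nperseg] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # rows: frequency, cols: time
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)       # 15.625 Hz per bin here
    magnitude = magnitude[freqs < f_max]               # keep the 0-2000 Hz band
    magnitude = np.log1p(magnitude)                    # compress dynamic range
    return (magnitude - magnitude.min()) / (magnitude.max() - magnitude.min())

# Synthetic stand-in for one microphone channel: a 1-second, 500 Hz tone.
fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500.0 * t)

spec = spectrogram_0_2khz(tone, fs=fs)
print(spec.shape)

# Arrays like this could then be stored with np.save("cut_001.npy", spec),
# where the file name is purely illustrative.
```

For real recordings of other lengths, the frame count or a final image resize would need to be adapted to reach the 128 × 128 format.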

The best generator and discriminator hyperparameters found for this experiment are presented in Table 2 and Table 3. These parameters were determined through trial and error for the presented study; a more systematic selection is one of the directions of future research. The GAN presented here is written using the TensorFlow software package and runs on a Tesla K80 GPU.

The generator receives a 100-element noise vector, which is passed through a fully-connected "dense" layer followed by a series of batch normalizations [31], Leaky ReLU activations [32], and 2D convolution transpose [33] operations. These operations, respectively, normalize layer activations for more expedient training, scale negative-valued activations to a fraction of their original magnitude, and upsample small images to larger images.

The dropout layer randomly switches off a set percentage of neurons in the preceding layer (10% for the generator) such that the network does not overfit to the training data it is fed [34] . Further details and examples are provided in the referenced papers. The discriminator receives a 128 x 128 spectrogram and subsequently convolves that image with 16 and then 32 kernels, respectively, before flattening the 32 resultant feature maps into a single vector for probability generation, as described in Section 2.1. The dropout rate for both dropout layers in the discriminator is 20%.
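To make the roles of these layer types concrete, the sketch below re-implements simplified versions of batch normalization, Leaky ReLU, and dropout in NumPy. These are stand-ins for illustration only and are not the TensorFlow layers used in the study: the batch normalization omits the learnable scale and shift, the Leaky ReLU slope `alpha=0.2` is an assumed value, and the dropout shown is the common training-time "inverted" variant.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance over the batch."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def leaky_relu(x, alpha=0.2):
    """Pass positive values through; shrink negative values by the factor alpha."""
    return np.where(x >= 0, x, alpha * x)

def dropout(x, rate, rng):
    """Zero a random fraction `rate` of entries and rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = rng.normal(loc=3.0, scale=2.0, size=(64, 8))  # hypothetical batch

normalized = batch_norm(activations)
activated = leaky_relu(normalized)
dropped = dropout(activated, rate=0.1, rng=rng)  # 10% rate, as for the generator

print(leaky_relu(np.array([-2.0, 3.0])))
```

The rescaling in `dropout` keeps the expected activation magnitude unchanged, which is why no correction is needed when dropout is switched off at evaluation time.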

The GAN is trained for 250 epochs on 800 compliant tool spectrograms. At the end of each epoch, the generator provides 400 spectrograms, which are combined with 400 compliant spectrograms before being sent to the discriminator. The discriminator then classifies the 800 combined images as either real or generator-made, and evaluation accuracy is determined by comparing discriminator predictions to labels. The discriminator trains and stabilizes to an average 92% discrimination accuracy vs. the generator across 10 reinitializations with a standard deviation of 5%. The network correctly identifies 95% of real spectrograms as being real and 89% of generator-made spectrograms as being generator-made. Stabilization at this threshold represents an equilibrium between D and G, and thus a stabilization of the value function V from Eq. (1). Images from the trained generator are shown in Fig. 5. The generated images have captured the large-scale spatial structures of the compliant tool audio shown in Fig. 4. The images do, however, struggle to replicate the fine detail of the compliant tool images.

Even with weak image generation, the generator performs the anomaly detection task well. Initialization data as well as final epoch data for 400 mixed-class validation spectrograms are shown in Fig. 6 through Fig. 9. As shown in Fig. 6, the final Euclidean loss distributions for compliant and noncompliant tools are more separated after the latent space search than before, and misclassification error probability drops from 33.93% to 9.44% because of the GAN methodology. Confusion matrices for these boundaries are shown in Fig. 7. Additionally, the area under the receiver operating characteristic (ROC) curve for the method is improved over the pre-search curve; this is also true for the precision-recall curve (PRC). This gain in area demonstrates the classification performance improvements gained because of the latent space search. The curves are shown in Fig. 8.
The GAN-based anomaly detection methodology results in classification accuracy improvements in all data classes. Aggregate compliant tool classification accuracy increases by 12.34% and aggregate noncompliant tool classification accuracy increases by 13.13%. These improvements are shown in Fig. 9.

Fig. 9. Classification performance before and after latent space search
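The headline figures quoted in the abstract follow directly from the misclassification error probabilities reported for the validation spectrograms (33.93% before the latent space search and 9.44% after). The short check below only restates that arithmetic; the variable names are illustrative.

```python
# Misclassification error probabilities reported in the text, in percent.
error_before_search = 33.93
error_after_search = 9.44

# Accuracy is the complement of the misclassification error.
accuracy_before = round(100.0 - error_before_search, 2)
accuracy_after = round(100.0 - error_after_search, 2)

# Improvement attributable to the latent space search, in percentage points.
improvement = round(accuracy_after - accuracy_before, 2)

print(accuracy_before, accuracy_after, improvement)  # 66.07 90.56 24.49
```

The improvement is therefore a difference in percentage points between the two accuracies, not a relative change.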

It has been shown that a GAN-based system of anomaly detection is capable of 90.56% accurate tool wear detection in a milling TCM application using acoustic signals measured during the milling process. Additionally, it has been demonstrated that the likelihood of a noncompliant tool being misclassified as compliant is reduced from 16.65% to 4.52% on the experimental dataset used, demonstrating the performance of the developed technique. Further, it has been shown that a GAN-based tool wear detector is capable of transforming data to be linearly separable, making the technique easy to use for real-world applications. Most importantly, all of this has been accomplished by training the aforementioned GAN on only normal acoustic signals of compliant cutting tools, eliminating the need to run a machine under suboptimal conditions in order to create data that resembles that from a noncompliant tool in order to train the network. As a result, the presented study represents a novel contribution to the literature of using acoustic TCM in milling operations and further demonstrates the power of GANs as a new approach towards smart manufacturing.

In order to further investigate the GAN-based tool wear detection technique, future research will address the following:

• Varying the wear threshold by changing the training data and determining binary classification ability as a function of wear condition.
• Including more tool wear classes in order to develop a multiclass condition model.
• Systematically investigating whether a one-dimensional convolutional neural network would provide results similar to those from a two-dimensional network.
• Investigating whether other machine learning methods, such as one-class support vector machines (OC-SVM) and Bayesian convolutional networks, outperform the GAN-based approach for acoustic TCM.
• Improving hyperparameter tuning, which, as is commonly done, is based on a trial-and-error approach. New research into methods of GAN construction and training has been reported, e.g. [35] [36], and future work will investigate such methods. These are expected to reduce generator and discriminator variability over repeated reinitializations as well as automate hyperparameter optimization.
• Explaining why the generator described herein performs weakly at generating realistic recreations of spectrograms but still performs strongly at anomaly detection. This discrepancy merits significant future work.
• Investigating whether the presented TCM methodology is extendable to general milling process anomaly detection.

The field of machine learning has been quickly evolving, providing new tools for manufacturing research. At the same time, it is raising as many interesting questions as it provides answers, such as network transparency, link to physics, etc. The success demonstrated in the presented acoustic TCM case study motivates future research on acoustics-based process and product quality control online, and in real-time.

[1] A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM

[2] A novel convolutional neural network based fault recognition method via image fusion of multi-vibration-signals

[3] Adversarial adaptive 1-D convolutional neural networks for bearing fault diagnosis under varying working condition

[4] ASM1D-GAN: An Intelligent Fault Diagnosis Method Based on Assembled 1D Convolutional Neural Network and Generative Adversarial Networks

[5] Tool wear classification using time series imaging and deep learning

[6] WaveletAE: A Wavelet-enhanced Autoencoder for Wind Turbine Blade Icing Detection

[7] Bearing Condition Monitoring Methods for Electric Machines: A General Review

[8] Friction and Lubrication in Metal Forming

[9] Realtime tool wear monitoring in milling using a cutting condition independent method

[10] Fuzzy logic based tool condition monitoring for end-milling

[11] Current rise criterion: a process-independent method for tool-condition monitoring and prognostics

[12] Estimation of tool wear during CNC milling using neural network-based sensor fusion

[13] Acoustic Signal Analysis for Prediction of Flank Wear during Conventional Milling

[14] Convolutional neural network-based tool condition monitoring in vertical milling operations using acoustic signals

[15] MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

[16] Transfer Learning from Audio-Visual Grounding to Speech Recognition

[17] LSTM Time and Frequency Recurrence for Automatic Speech Recognition

[18] Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling

[19] Adversarially Learned Anomaly Detection

[20] One-Class Convolutional Neural Network

[21] Gaussian Mixture Models

[22] Generative Adversarial Nets

[23] Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

[24] Adversarial Audio Synthesis

[25] Adam: A Method for Stochastic Optimization

[26] More is Different

[27] Anomaly Detection: A Survey

[28] Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery

[29] Investigation of the feasibility of using microphone arrays in monitoring machining conditions

[30] A Comparison of 1-D and 2-D Deep Convolutional Neural Networks in ECG Classification

[31] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[32] Empirical Evaluation of Rectified Activations in Convolutional Network

[33] Non-Local Color Image Denoising With Convolutional Neural Networks

[34] Dropout: A Simple Way to Prevent Neural Networks from Overfitting

[35] Improved Techniques for Training GANs

[36] Stabilizing Training of Generative Adversarial Networks through Regularization

[37] 1D Convolutional Neural Networks and Applications: A Survey

[38] Least Squares Support Vector Machine Classifiers