key: cord-0028732-fuktmp5p authors: Malhotra, Priyanka; Gupta, Sheifali; Koundal, Deepika; Zaguia, Atef; Enbeyle, Wegayehu title: Deep Neural Networks for Medical Image Segmentation date: 2022-03-10 journal: J Healthc Eng DOI: 10.1155/2022/9580991 sha: 8b3c307f5c7519deebd501567fccb45bde4123d5 doc_id: 28732 cord_uid: fuktmp5p
Image segmentation is a branch of digital image processing with numerous applications in image analysis, augmented reality, machine vision, and many more. The field of medical image analysis is growing, and the segmentation of organs, diseases, or abnormalities in medical images is in increasing demand. The segmentation of medical images helps in monitoring the growth of a disease such as a tumour, controlling the dosage of medicine, and controlling the dose of exposure to radiation. Medical image segmentation is a challenging task due to the various artefacts present in the images. Recently, deep neural models have been applied to various image segmentation tasks. This significant growth is due to the achievements and high performance of deep learning strategies. This work presents a review of the literature in the field of medical image segmentation employing deep convolutional neural networks. The paper examines the widely used medical image datasets, the different metrics used for evaluating segmentation tasks, and the performance of different CNN based networks. In comparison to the existing review and survey papers, the present work also discusses the various challenges in the field of medical image segmentation and the state-of-the-art solutions available in the literature.
Image segmentation involves partitioning an input image into different segments that correlate strongly with the region of interest (RoI) in the given image [1, 2].
The aim of medical image segmentation [3] is to represent a given input image in a meaningful form in order to study the anatomy, identify the region of interest (RoI), measure the volume of tissue and hence the size of a tumor, and help in deciding the dose of medicine, planning treatment prior to radiation therapy, or calculating the radiation dose. Image segmentation helps in the analysis of medical images by highlighting the region of interest. Segmentation techniques can be utilized for brain tumor boundary extraction in MRI images, cancer detection in biopsy images, mass segmentation in mammography, detection of borders in coronary angiograms, segmentation of pneumonia affected areas in chest X-rays, etc. A number of medical image segmentation algorithms have been developed and are in demand, as there is a shortage of expert manpower [4]. The earlier image segmentation models were based on traditional image processing approaches [3, 5], which include thresholding, edge-based, and region-based techniques. In the thresholding technique, pixels are allocated to different categories according to the range of values in which a particular pixel lies. In the edge-based technique, a filter is applied to an image and the pixels are classified as edge or non-edge pixels according to the filter output. In region-based segmentation methods, neighbouring pixels having similar values are grouped together, and groups of pixels having dissimilar values are split. Medical image segmentation is a difficult task due to various restrictions imposed by the medical image acquisition procedure, the type of pathology, and different biological variations [6]. The analysis of medical images must be done by experts, and there is a shortage of medical imaging experts [7]. In the last few years, deep learning networks have contributed to the development of newer image segmentation models with improved performance. Deep neural networks have achieved high accuracy rates on different popular datasets.
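The thresholding technique described above can be illustrated with a minimal sketch. The function name `threshold_segment`, the toy intensity values, and the two threshold values are assumptions chosen for illustration; they do not come from the reviewed literature.

```python
def threshold_segment(image, thresholds):
    """Label each pixel with the index of the intensity range it falls in.

    `thresholds` must be sorted in ascending order; a pixel gets label k
    when exactly k thresholds are less than or equal to its value.
    """
    return [[sum(pixel >= t for t in thresholds) for pixel in row]
            for row in image]


# Toy 3 x 3 grey-level image: dark background, mid-grey tissue, bright region.
image = [
    [10, 12, 200],
    [11, 130, 210],
    [9, 125, 205],
]

# Two thresholds split the intensity axis into three categories (0, 1, 2).
labels = threshold_segment(image, thresholds=[100, 180])
print(labels)
```

With a single threshold the same function produces a binary foreground/background mask.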
The image segmentation techniques can be broadly classified as semantic segmentation and instance segmentation. Semantic segmentation can be considered a pixel classification problem: each pixel in the image is assigned to a certain class. Instance segmentation detects and delineates each object of interest present in the input image. The present work covers the recent literature in medical image segmentation. It reviews different deep learning-based image segmentation models and explains their architectures. Many authors have reviewed the medical image segmentation task. Table 1 describes a few review papers utilizing deep CNNs in the field of medical image segmentation. All the aforementioned surveys discuss the various deep neural networks. This survey not only summarizes the different deep learning approaches but also provides an insight into the different medical image datasets used for training deep neural networks and explains the metrics used for evaluating the performance of a model. The present work also discusses the various challenges faced by DL based image segmentation models and their state-of-the-art solutions. The paper has several contributions, which are as follows. Firstly, the present study provides an overview of the current state of the deep neural network structures utilized for medical image segmentation, with their strengths and weaknesses. Secondly, the paper describes the publicly available medical image segmentation datasets. Thirdly, it presents the various performance metrics employed for evaluating deep learning segmentation models. Finally, the paper gives an insight into the major challenges faced in the field of image segmentation and their state-of-the-art solutions. The organization of the rest of the paper is given in
Deep learning is one of the most essential approaches to artificial intelligence.
A deep learning algorithm uses various layers to construct an artificial neural network. An artificial neural network (ANN) consists of [52] an input layer, hidden layer(s), and an output layer. The input layer of the network receives the signal, the output layer makes a decision regarding the input, and between the input and output layers there are hidden layers which perform computations (shown in Figure 1). A deep neural network consists of many hidden layers between the input and output layers. This section provides a review of different deep learning neural networks employed for the image segmentation task. The different deep neural network structures generally employed for image segmentation can be grouped as shown in Figure 2.
A convolutional neural network or CNN (see Figure 3) consists of a stack of three main neural layers: convolutional layer, pooling layer, and fully connected layer [52, 53]. Each layer has its own role. The convolution layer detects distinct features such as edges or other visual elements in an image. It performs the mathematical operation of multiplying the local neighbours of an image pixel with kernels and summing the results. A CNN uses different kernels to convolve the given image and generate its feature maps. The pooling layer reduces the spatial (width, height) dimensions of the input data for the next layers of the neural network. It does not change the depth of the data. This operation is called subsampling. This size reduction decreases the computational requirements of the subsequent layers. The fully connected layers perform high-level reasoning in the NN. These layers integrate the various feature responses from the given input image so as to provide the final results. Different CNN models have been reported in the literature, including AlexNet [54], GoogleNet [55], VGG [56], Inception [57], SqueezeNet [58], and DenseNet [59]. Each network uses a different number of convolution and pooling layers, with important processing blocks in between them.
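The roles of the convolution and pooling layers described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular network's implementation: the helper names `conv2d` and `max_pool2d`, the toy image, and the 1 × 2 edge kernel are assumptions, and the convolution is computed without kernel flipping (cross-correlation), as is conventional in CNN libraries.

```python
def conv2d(image, kernel):
    """Valid (no padding) sliding-window convolution of a 2D image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; reduces width and height by a factor of `size`."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# Toy 5 x 5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Horizontal-difference kernel: responds where intensity rises from left to right.
edge_kernel = [[-1, 1]]

feature_map = conv2d(image, edge_kernel)   # 5 x 4 map, nonzero only at the edge
pooled = max_pool2d(feature_map, size=2)   # 2 x 2 map, edge response preserved
print(feature_map[0], pooled)
```

The feature map is nonzero only at the edge location, and pooling keeps that response while shrinking the spatial size, which is the subsampling effect described in the text.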
CNN models have been employed mostly for classification tasks. In [60], SqueezeNet and GoogleNet have been employed to classify brain MRI images into three different categories. The performance of CNN segmentation models is limited by the following: (a) the fully connected layers in a CNN cannot manage different input sizes; (b) a convolutional neural network with a fully connected layer cannot be employed for the object segmentation task, because the number of objects of interest in an image is not fixed, so the length of the output layer cannot be constant.
2.1.1. Fully Convolutional Network. In a fully convolutional network (FCN), only convolutional layers exist. The different existing CNN architectures can be modified into an FCN by converting the last fully connected layer of the CNN into a fully convolutional layer. The model designed by [61] can output a spatial segmentation map and can produce dense pixel-wise predictions from a full-size input image instead of performing patch-wise predictions. The model uses skip connections, which upsample the feature maps from the final layer and fuse them with the feature maps of earlier layers. However, the FCN is not fast enough for real-time inference, and it does not consider the global context information efficiently. In an FCN, the resolution of the feature maps generated at the output is reduced by propagation through alternating convolution and pooling layers. This results in low resolution predictions with fuzzy object boundaries. An advanced FCN called ParseNet [63] has also been reported; it utilises global average pooling to attain global context. Approaches incorporating models such as conditional random fields and Markov random fields into DL architectures have also been reported.
Encoder-decoder based models employ a two-stage model to map data points from the input domain to the output domain. The encoder stage compresses the given input x into a latent space representation, while the decoder predicts the output from this representation.
The different types of encoder-decoder based models generally employed for medical image segmentation are discussed as follows.
2.2.1. U-Net. The U-Net model [64] has a downsampling part and an upsampling part. The downsampling section, with an FCN-like architecture, extracts features using 3 × 3 convolutions to capture context. The upsampling part performs deconvolution, which increases the spatial resolution while decreasing the number of computed feature maps. The feature maps generated by the downsampling (contracting) part are fed as input to the upsampling part so as to avoid any loss of information. The symmetric upsampling part provides precise localization. The model generates a segmentation map which categorizes each pixel present in the image. The U-Net model offers the following advantages: (a) it can perform efficient segmentation of images using a limited number of labelled training images; (b) its architecture combines the location information obtained from the downsampling path and the contextual information obtained from the upsampling path to predict a fair segmentation map. U-Net models also have a few limitations: (a) the input image size is limited to 572 × 572; (b) in the middle layers of deeper U-Net models, learning generally slows down, which causes the network to ignore the layers with abstract features; (c) the skip connections of the model impose a restrictive fusion scheme, which causes accumulation of the same-scale feature maps of the encoder and decoder networks. To overcome these limitations, different variants of the U-Net architecture have been proposed in the literature: U-Net++ [65], Attention U-Net [66], and SD-UNet [67].
VNet is also an FCN-based model employed for medical image segmentation [68]. The VNet architecture has two parts, a compression network and a decompression network. The compression network comprises convolution layers at each stage with a residual function. These convolution layers utilize volumetric kernels.
The decompression network extracts features and expands the spatial representation of the low resolution feature maps. It gives a two-channel probabilistic segmentation for the foreground and background regions.
2.3. Regional Convolutional Network. Regional convolutional networks have been utilized for object detection and segmentation tasks. The R-CNN architecture presented in [69] generates region proposals for bounding boxes using a selective search process. These region proposals are then warped to standard squares and forwarded to a CNN so as to generate a feature vector map as output. The output dense layer consists of features extracted from the image, and these features are then fed to a classification algorithm so as to classify the objects lying within each region proposal. The algorithm also predicts offset values for increasing the precision of the region proposal or bounding box. The processes performed in the R-CNN architecture are shown in Figure 4. The use of the basic R-CNN model is restricted due to the following: (a) it cannot be used in real time, as it takes around 47 seconds to classify the 2000 region proposals in a test image; (b) the selective search algorithm is a predetermined algorithm, so no learning takes place at that stage, which could lead to the generation of unfavourable candidate region proposals. To overcome these drawbacks, different variants of R-CNN (fast R-CNN, faster R-CNN, and Mask R-CNN) have been proposed in the literature.
In R-CNN, the proposed regions of the image overlap, and the same CNN computations are carried out again and again. The fast R-CNN reported by [70] is fed with an input image and a set of object proposals. The CNN then generates convolutional feature maps. After that, the RoI pooling layer reshapes each object proposal into a feature vector of fixed size. The feature vectors are sent to the last fully connected layers of the model.
At the end, the computed RoI feature vector is fed to a softmax layer for predicting the class and offset values of the proposed region [71]. Fast R-CNN is still slowed down by its use of the selective search algorithm.
In R-CNN and fast R-CNN, the proposed regions are created using selective search, which is a slow process. So, in the faster R-CNN architecture given by [72], a single convolutional network is deployed to carry out both the region proposal and classification tasks. The model employs a region proposal network (RPN), which passes a sliding window over the entire CNN feature map. For each window, it outputs K different potential bounding boxes with their respective scores representing the position of the object. These bounding boxes are fed to fast R-CNN to generate the precise classification boxes.
He et al. in [73] extended faster R-CNN to present Mask R-CNN for instance segmentation. The model can detect objects in a given image and generates a high-quality segmentation mask for each object. It uses an RoI-Align layer to conserve the exact spatial locations of the given image. The region proposal network (RPN) generates multiple RoIs using a CNN. The RoI-Align network generates multiple bounding boxes which are warped into fixed dimensions. The warped features computed in the previous step are fed to a fully connected layer so as to perform classification using a softmax layer. The model has three output branches: one branch computes the bounding box coordinates, the second determines the associated classes, and the last evaluates the binary mask for each RoI. The model trains all the branches jointly. The bounding boxes are refined by employing a regression model. The mask classifier outputs a binary mask for each RoI.
The DeepLab model employs a pretrained CNN model (ResNet-101 or VGG-16) with atrous convolution to extract the features from an image [74].
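Atrous (dilated) convolution, mentioned above, samples the input with gaps between kernel taps, so the same number of weights covers a wider field of view. The one-dimensional sketch below is an illustrative assumption rather than DeepLab's implementation; the function name `atrous_conv1d`, the toy signal, and the kernel are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1D valid convolution whose kernel taps are spaced `rate` samples apart.

    rate=1 is an ordinary convolution; a larger rate widens the receptive
    field to (len(kernel) - 1) * rate + 1 without adding any parameters.
    """
    span = (len(kernel) - 1) * rate
    return [sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
            for i in range(len(signal) - span)]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]                             # same three weights in both cases

dense = atrous_conv1d(signal, kernel, rate=1)   # receptive field of 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2) # receptive field of 5 samples
print(dense, dilated)
```

Both calls use three weights, but with `rate=2` each output value depends on input samples four positions apart, which is how a large effective field can be captured at fixed parameter count.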
The use of atrous convolutions gives the following benefits: (a) it controls the resolution of feature responses in CNNs; (b) it converts an image classification network into a dense feature extractor without requiring any more parameters to be learned. DeepLab also employs a conditional random field (CRF) to produce a finely segmented output. Several variants of DeepLab have been proposed in the literature, including DeepLabv1, DeepLabv2, DeepLabv3, and DeepLabv3+.
In DeepLabv1 [75], the input image is passed through a deep CNN with one or two atrous convolution layers (see Figure 5). This generates a coarse feature map. The feature map is then upsampled to the size of the original image by bilinear interpolation. The interpolated data is passed to a fully connected conditional random field to obtain the final segmented image.
In the DeepLabv2 model, multiple atrous convolutions are applied to the input feature map at different dilation rates, and the outputs are fused together. Atrous spatial pyramid pooling (ASPP) segments the objects at different scales. The ResNet model uses atrous convolution with different rates of dilation. By using atrous convolution, information from a large effective field can be captured with a reduced number of parameters and reduced computational complexity.
DeepLabv3 [20] is an extension of DeepLabv2 that adds image level features to the atrous spatial pyramid pooling (ASPP) module. It also utilises batch normalization so that the network is easier to train. The DeepLabv3+ model combines the ASPP module of DeepLabv3 with an encoder and decoder structure. The model uses the Xception model for feature extraction. It also employs atrous and depth-wise separable convolutions to compute faster. The decoder section merges the low-level and high-level features, which correspond to the structural details and the semantic information, respectively. DeepLabv3+ [76] consists of an encoding and a decoding module.
The encoding path extracts the required information from the input image using atrous convolution and a backbone network such as MobileNetv2, PNASNet, ResNet, or Xception. The decoding path rebuilds the output with the relevant dimensions using the information from the encoder path.
The different deep neural networks discussed in the above sections are employed for different applications. Each model has its own advantages and limitations. Table 3 gives a brief comparison between different deep learning-based image segmentation algorithms.
Deep learning networks have contributed to various applications such as image recognition and classification, object detection, image segmentation, and computer vision. A block diagram representing a deep learning-based system is given in Figure 5. The first step in a deep learning system consists of collecting data [77]. The collected data is then analyzed and preprocessed into a format acceptable to the next block. The preprocessed data is further divided into training, validation, and testing datasets. A deep neural network-based model is selected and trained. The trained model is tested and evaluated. At the end, the complete designed system is analysed. This basic layout of deep learning models (shown in Figure 6) is employed in various medical applications [78], including image segmentation. In image segmentation, the objects in an image are subdivided. The aim of medical image segmentation is to identify a region of interest (RoI) such as a tumor or lesion. The automatic segmentation of medical images is a difficult task because medical images are usually complex in nature due to the presence of different artifacts, inhomogeneity in intensity, etc. Different deep learning models have been proposed in the literature.
The choice of a particular deep learning model depends on various factors such as the body part to be segmented, the imaging modality employed, and the type of disease, as different body parts and ailments have different requirements. A fully automated framework based on 2D and 3D CNNs has been presented by [15] to segment cardiac MR images into left and right ventricular cavities and myocardium. The authors in [18] designed a deep CNN with layers performing convolution, pooling, normalization, and other operations to segment brain tissues in MR images. Christ et al. in [30] presented a design in which two cascaded FCNs were employed to segment the liver and then the lesions within the RoI. The final segmentation was produced by a dense 3D conditional random field. Hamidian et al. in [25] converted a 3D CNN with a fixed field of view into a 3D FCN and generated the score map for the complete volume of CT images in one pass. The authors employed the designed network for segmentation of pulmonary nodules in chest CT images. They concluded that employing an FCN increases the speed of the network and generates output scores quickly. In [32], the authors employed an FCN for liver segmentation in CT images. In [27], the authors proposed a fully convolutional spatial and channel squeeze and excitation module for segmentation of pneumothorax in chest X-ray images. Gordienko et al. [26] reported a U-Net based CNN for segmentation of lungs and bone shadow exclusion techniques on 2D CXR images. Zhang et al. in [19] designed the SDRes U-Net model, which embeds dilated and separable convolutions into a residual U-Net architecture. The network was employed for segmenting brain tumors present in MR images. In [33], the authors proposed the use of the Multi-ResUNet architecture for segmentation. They concluded that the Multi-ResUNet model generates better results in fewer training epochs than the standard U-Net model. In [29], the authors segmented pneumothorax on CT images.
The authors compared the performance of the U-Net model with PSPNet. Ferreira [17] employed the U-Net model to automatically segment the heart in short-axis DT-CMR images. The authors in [68] further designed an FCN for segmenting 3D MRI volumes and employed a VNet based network to segment the prostate in MRI images. Poudel et al. in [16] developed a recurrent fully convolutional network (RFCN) to detect and segment body organs. The given design ensures fully automatic segmentation of the heart in cardiac MR images. The authors concluded that the RFCN architecture reduces the computational time, simplifies the segmentation pipeline, and also enables real-time application. Mulay et al. in [31] presented a nested edge detection and Mask R-CNN network for segmentation of the liver in CT and MR images. The input images were first preprocessed by applying image enhancement so as to produce a sketch of the abdomen area. The network enhances the input images to obtain an edge map. Finally, the authors employed Mask R-CNN for segmenting the liver from the edge maps. In [28], the authors designed CheXLocNet, based on Mask R-CNN, to segment the area of pneumothorax from chest radiographs. In [22], the authors suggested a recurrent neural network utilizing multidimensional LSTM. The authors arranged the computations in a pyramidal fashion. They showed that the PyraMiD-LSTM design can be parallelized for 3D data and utilized the design for pixel-wise segmentation of MR images of the brain. Table 4 summarizes the different DL based models employed for segmentation in medical images.
Data plays an important role in deep learning, as deep learning models require a large amount of data.
It is difficult to collect medical image data, as there are data privacy rules governing the collection and labelling of data, and the data also requires time-consuming annotation by experts [79]. Medical image datasets can be categorized into three different categories: 2D images, 2.5D images, and 3D images [2]. In 2D medical images, each information element in the image is called a pixel. In 3D medical images, each element is called a voxel. 2.5D refers to RGB images. 3D images are also sometimes represented as a sequential series of 2D slices. CT, MR, PET, and ultrasound pixels represent 3D voxels. The images may exist in JPEG, PNG, or DICOM format.
Medical imaging is performed in different types of modalities [2], such as CT scan, ultrasound, MRI, mammograms, positron emission tomography (PET), and X-ray of different body parts. MR imaging allows images of variable contrast to be obtained by employing different pulse sequences. MR imaging shows the internal structure of the chest, liver, brain, pelvis, abdomen, etc. CT imaging uses X-rays to obtain information about the structure and function of body parts. CT imaging is used for the diagnosis of disease in the brain, abdomen, liver, pelvis, chest, and spine, and for CT based angiography. Figure 7 shows MRI and CT images of the brain. Mammography is a technique that uses X-rays to capture images of the internal structure of the breast. A chest X-ray (CXR) is a photographic image depicting the internal composition of the chest; it is produced by passing X-rays through the chest, where the rays are absorbed in different amounts by the different components of the chest [31]. The important publicly available medical image datasets are summarized in Table 5.
A metric helps in evaluating the performance of any designed model. The metrics quantify the accuracy of the designed model.
The popular metrics employed for assessing the effectiveness of a designed segmentation algorithm are expressed in terms of the following quantities [80]. True positive (TP) represents that both the actual data class and the class of predicted data are true. True negative (TN) represents that both the actual data class and the class of predicted data are false. False positive (FP) represents that the actual data class is false while the class of predicted data is true. False negative (FN) represents that the actual data class is true while the class of predicted data is false.
Precision is an evaluation metric that gives the proportion of the cases reported as true that are actually true, as represented in equation (1) [81]:
$$\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{1}$$
Recall, defined in equation (2), gives the percentage of the total relevant results which have been correctly classified by the model [81]:
$$\mathrm{Recall} = \frac{TP}{TP + FN}. \tag{2}$$
5.3. F1 Score. The F1 score indicates the accuracy of the model, as represented in the following equation. It is defined as the harmonic average of the precision and recall values [81]:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{3}$$
Pixel accuracy gives the percentage of pixels in a given input image which are correctly classified by the model [82]:
$$\mathrm{Pixel\ accuracy} = \frac{\text{number of pixels properly classified}}{\text{total number of pixels}}. \tag{4}$$
Intersection over union (IoU), or Jaccard index [82], is a metric commonly used for checking the performance of an image segmentation algorithm. It is the area of intersection between the predicted segment mask and the ground truth mask, divided by the area of union between the predicted segment mask and the ground truth mask:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, \tag{5}$$
where A represents the ground truth and B represents the predicted segmentation. Mean IoU is employed for evaluating modern segmentation algorithms. Mean IoU is the average of the IoU over all classes.
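The metrics defined above can be computed directly from a pair of binary masks. The sketch below is a minimal illustration under stated assumptions: the helper names (`confusion_counts`, `segmentation_metrics`) and the toy 3 × 4 masks are hypothetical, and IoU is expressed through the confusion counts as TP / (TP + FP + FN), which equals |A ∩ B| / |A ∪ B| for binary masks.

```python
def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN over two binary masks of identical shape."""
    tp = tn = fp = fn = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif not t and not p:
                tn += 1
            elif p:
                fp += 1          # predicted foreground, actually background
            else:
                fn += 1          # predicted background, actually foreground
    return tp, tn, fp, fn


def segmentation_metrics(truth, pred):
    """Precision, recall, F1, pixel accuracy, and IoU for one foreground class."""
    tp, tn, fp, fn = confusion_counts(truth, pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    pixel_accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "pixel_accuracy": pixel_accuracy, "iou": iou}


# Toy masks: 1 = region of interest, 0 = background.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0]]

counts = confusion_counts(truth, pred)
metrics = segmentation_metrics(truth, pred)
print(counts, metrics)
```

In this toy case the pixel accuracy (10 of 12 pixels) is noticeably higher than the IoU (3 of 5), because the many correctly labelled background pixels count toward accuracy but not toward IoU. Mean IoU would be obtained by repeating the computation per class and averaging.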
The Dice coefficient is defined in the following equation as twice the area of intersection between the predicted segment and the ground truth, divided by the total number of pixels in both the predicted segment and the ground truth image [83]:
$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|}. \tag{6}$$
The medical image segmentation field has gained from deep learning, but it is still a challenging task to employ deep neural networks due to the following.
Figure 7: (a) MR image of brain. (b) CT scan of brain [30].
The different challenges related to the dataset include the following.
Limited Annotated Dataset. Deep learning network models require a large amount of data, and the data required for training must be well annotated. The dataset plays an important role in various DL based medical procedures [84]. In medical image processing, the collection of large amounts of annotated medical images is tough [85]. Also, annotating fresh medical images is tedious and expensive and requires expertise. Several large-scale datasets are publicly available. A list of a few such datasets is provided in Table 2. There is still a need for more challenging datasets which can enable better training of DL models and are capable of handling dense objects. Typically, the existing 3D datasets [86] are not very large and a few of them are synthetic, so more challenging datasets are required. The size of the existing medical image datasets can be increased by (a) applying image augmentation transformations such as rotating the image by different angles, flipping the image vertically or horizontally, cropping, and shearing. These augmentation techniques can boost system performance. (b) Applying transfer learning from efficient models can provide a solution to the problem of limited data [87]. (c) Finally, data collected from various sources can be synthesized [87].
Class Imbalance in Datasets. Class imbalance is intrinsic in various publicly available medical image datasets.
Highly imbalanced data poses great difficulty in training a DL model and makes the model accuracy misleading. For example, in patient data where the disease is relatively rare and occurs in only 10% of the patients screened, the overall accuracy of the designed model would be high simply because most of the patients do not have the disease, and training may settle in a local minimum [88, 89]. The problem of class imbalance can be addressed by (a) oversampling the data, where the amount of oversampling depends on the extent of imbalance in the dataset; (b) changing the evaluation or performance metric; (c) applying data augmentation techniques to create new data samples; (d) combining minority classes.
Sparse Annotations. Providing full annotation for 3D images is a time-consuming task and is not always possible. So, only some of the slices in 3D images are labelled. It is challenging to train a DL model on these sparsely annotated 3D images [85]. In the case of a sparsely annotated dataset, a weighted loss function can be applied. The weights for the unlabeled data in the available dataset are all set to zero, so that the model learns only from the pixels which are labelled.
Intensity Inhomogeneities. In pathology images, colour and intensity inhomogeneities [90] are common. Intensity inhomogeneities cause shading over the image. This is particularly relevant in the segmentation of MR images. TEM images also have brightness variations due to the presence of nonuniform support films. The segmentation process becomes tedious due to these variations. For correcting intensity inhomogeneities [90], different algorithms are employed, and many nonparametric techniques have been proposed in the literature. A prefiltering operation can be employed before segmentation to remove inhomogeneities. Intensity inhomogeneities are also reduced by improvements in scanning devices.
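The zero-weight scheme for sparse annotations described above can be sketched as a masked binary cross-entropy, in which unlabelled pixels simply do not contribute to the loss. This is a minimal sketch under assumptions: the function name `masked_cross_entropy`, the use of `-1` as the "unlabelled" marker, and the toy probabilities are hypothetical choices for illustration, not taken from the cited works.

```python
import math


def masked_cross_entropy(probs, labels, ignore_label=-1):
    """Mean binary cross-entropy over labelled pixels only.

    `probs` holds predicted foreground probabilities and `labels` holds
    0, 1, or `ignore_label` for unlabelled pixels, which receive weight zero.
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == ignore_label:
            continue                         # zero weight: pixel is skipped
        p = min(max(p, 1e-7), 1 - 1e-7)      # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total / count if count else 0.0


# Flattened toy slice: only the first two pixels carry annotations.
probs = [0.9, 0.2, 0.5, 0.7]
labels = [1, 0, -1, -1]
loss = masked_cross_entropy(probs, labels)

# Changing predictions on unlabelled pixels leaves the loss untouched.
loss_changed = masked_cross_entropy([0.9, 0.2, 0.01, 0.99], labels)
print(loss, loss_changed)
```

Because the unlabelled pixels are skipped, gradients in a real training loop would flow only from annotated pixels, which is the behaviour the text describes.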
Complexities in Image Texture. In medical images, different artifacts may be introduced during the manipulation of images. The different sensors and electronic components used for capturing images create noise in the image [11, 91]. In the captured image, gray levels can be very close to each other and there may be weak image boundaries. There may be overlap between tissues and irregularities such as skin lines and hair in dermoscopic images. All these complexities make it difficult to identify the region of interest in medical images. To remove different artefacts and noise from the image, different image enhancement techniques are used before segmentation. An image enhancement technique suppresses the noise in the image while preserving the integrity of its edges.
The important challenging issues related to the training of DNNs for robust segmentation of medical images are as follows.
Overfitting the Model. Overfitting refers to the situation in which the model learns the details and regularities of the training dataset so closely that it achieves high accuracy on the training data but lower accuracy on unseen data. It mainly occurs while training the model with a small training dataset [9]. Overfitting can be handled [88] by (a) increasing the size of the dataset by applying augmentation techniques. (b) Dropout techniques [92] also help in handling overfitting by discarding the output of a random set of network neurons during each iteration.
Memory Efficient Models. Medical image segmentation models require a large amount of memory [93]. In order to make these models compatible with devices such as mobile phones, the models need to be simplified. Simpler models and model compression techniques can reduce the memory requirements of a DL model.
Training Time. The training of a deep neural network architecture takes time. In image segmentation, fast convergence of training for a deep NN is required.
The solution to this problem is (a) the application of batch normalization [93]. It centres the values around 0 by subtracting the mini-batch mean from each value and dividing by the mini-batch standard deviation. It is effective in providing fast convergence. (b) Adding pooling layers to reduce the dimensions of the feature maps can also provide faster convergence.
Vanishing Gradient. Deep neural networks face the problem of vanishing gradient [94]. It occurs when the gradient of the final loss cannot be backpropagated effectively to the earlier layers. The vanishing gradient problem is more pronounced in 3D models. There are several solutions to this problem. (a) The outputs of intermediate hidden layers are upscaled using deconvolution and passed through softmax [91] to obtain auxiliary losses, which are combined with the original loss to strengthen the gradient. (b) Careful initialization of the network weights [95] also helps to combat the vanishing gradient problem.
Computational Complexity. Deep learning algorithms that perform feature analysis need to operate at a high level of computational efficiency. These algorithms need high-performance computing devices and GPUs [96]. Some of the top algorithms may require supercomputers for training the model, which may not be available. To combat these issues, the researcher has to restrict the number of parameters and accept a limited level of accuracy.
Image segmentation techniques have come a long way from manual segmentation to automated segmentation using machine learning and deep learning approaches. ML/DL-based approaches can generate segmentations for large sets of images, which helps in the identification of meaningful objects and the diagnosis of diseases in the images. The image segmentation techniques discussed in the paper can be explored by future researchers for application to various datasets.
Future work may include a comparative study of the existing deep learning models discussed in the paper on publicly available datasets. Different combinations of layers and classifiers can also be explored to improve the accuracy of image segmentation models. An efficient solution for improving the performance of image segmentation models is still required, so new deep learning model designs can be explored by future researchers.
Deep learning-based automated diagnosis of diseases from medical images has become an active area of research. In the present work, we have summarized the most popular DL-based models employed for the segmentation of medical images, with their underlying advantages and disadvantages. An overview of the different medical image datasets employed for segmentation of diseases and of the various performance metrics used for evaluating image segmentation algorithms is also provided. The paper also investigates the different challenges faced in the segmentation of medical images using deep networks and discusses the state-of-the-art solutions to overcome these challenges. With advances in technology, deep learning plays a very important role in the segmentation of images. The studies reviewed in Section 3 confirm that deep neural networks outperform traditional image segmentation techniques in medical image segmentation tasks. The present work will help researchers in designing neural network architectures for the diagnosis of disease in the medical field. Researchers will also become aware of the possible challenges in the field of deep learning-based medical image segmentation and of the state-of-the-art solutions. This review paper provides reference material and valuable research in the area of medical image segmentation [97].
No data were used to support this study.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
A survey on neutrosophic medical image segmentation
A survey of semantic segmentation
Automated medical image segmentation techniques
National survey to identify subspecialties at risk for physician shortages in Canadian academic radiology departments
Image segmentation algorithms overview
Interaction in the segmentation of medical images: a survey
Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists
A survey on deep learning in medical image analysis
Deep learning in medical image analysis
Medical image analysis using convolutional neural networks: a review
Deep learning techniques for medical image segmentation: achievements and challenges
Medical image segmentation using deep learning: a survey
A review of deep-learning-based medical image segmentation methods
Fuzzy logic in surveillance big video data analysis: comprehensive review, challenges, and research directions
An exploration of 2D and 3D deep learning techniques for cardiac MR image segmentation
Recurrent fully convolutional neural networks for multi-slice MRI cardiac segmentation, in Reconstruction, segmentation, and analysis of medical images
Automating in vivo cardiac diffusion tensor postprocessing with deep learning-based segmentation
Deep convolutional neural networks for multi-modality isointense infant brain image segmentation
SDResU-Net: separable and dilated residual U-net for MRI brain tumor segmentation
Rethinking atrous convolution for semantic image segmentation
Deep neural networks segment neuronal membranes in electron microscopy images
Parallel multi-dimensional lstm, with application to fast biomedical volumetric image segmentation
Interactive medical image segmentation using deep learning with image-specific fine tuning
Segmentation of brain lesions from CT images based on deep learning techniques
3D convolutional neural network for automatic detection of lung nodules in chest CT
Deep learning with lung segmentation and bone shadow exclusion techniques for chest X-ray analysis of lung cancer
Automated segmentation and diagnosis of pneumothorax on chest X-rays with fully convolutional multi-scale ScSE-DenseNet: a retrospective study
CheXLocNet: automatic localization of pneumothorax in chest radiographs using deep convolutional neural networks
Pneumothorax segmentation in routine computed tomography based on deep neural networks
Automatic liver and tumor segmentation of CT and MRI volumes using cascaded fully convolutional neural networks
Liver segmentation from multimodal images using HED-mask R-CNN
3D deeply supervised network for automatic liver segmentation from ct volumes
MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation
Pancreas segmentation in CT and MRI images via domain specific network designing and recurrent neural contextual learning
Deep learning and structured prediction for the segmentation of mass in mammograms
Impact of image enhancement technique on CNN model for retinal blood vessels segmentation
Automatic retinal blood vessel segmentation based on fully convolutional neural networks
Fully convolutional network with hypercolumn features for brain tumor segmentation
Segmentation of knee images: a grand challenge
Recurrent residual U-Net for medical image segmentation
SegTHOR: segmentation of thoracic organs at risk in CT images
The KiTS19 challenge data: 300 kidney tumor cases with clinical context
Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI
Computer aided diagnosis of pneumonia from chest radiographs
A survey of deep learning and its applications: a new paradigm to machine learning
Imagenet classification with deep convolutional neural networks
Xception: deep learning with depthwise separable convolutions
Very deep convolutional networks for large-scale image recognition
Rethinking the inception architecture for computer vision
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size
Densely connected convolutional networks
A comparative analysis of efficient CNN-based brain tumor classification models
Fully convolutional networks for semantic segmentation
A review of semantic segmentation using deep neural networks
Parsenet: looking wider to see better
U-net: convolutional networks for biomedical image segmentation
Pulmonary vessel segmentation based on orthogonal fused U-Net++ of chest CT images
RA-UNet: a hybrid deep attention-aware network to extract liver and tumor in CT scans
SD-Unet: a structured Dropout U-net for retinal vessel segmentation
V-net: fully convolutional neural networks for volumetric medical image segmentation
Rich feature hierarchies for accurate object detection and semantic segmentation
Fast r-cnn
Object detection techniques: a comparison
Faster r-cnn: towards real-time object detection with region proposal networks
Mask r-cnn
Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs
Encoder-decoder with atrous separable convolution for semantic image segmentation
Fire segmentation using a DeepLabv3+ architecture
A survey of the recent architectures of deep convolutional neural networks
Identifying medical diagnoses and treatable diseases by image-based deep learning
Deep convolutional neural network based medical image classification for disease diagnosis
Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset
Performance analysis and comparison of machine and deep learning algorithms for IoT data classification
Fully convolutional networks for semantic segmentation
Optimizing the Dice score and Jaccard index for medical image segmentation: theory and practice
Probabilistic deep Q network for real-time path planning in censorious robotic procedures using force sensors
3D U-Net: learning dense volumetric segmentation from sparse annotation
Retrospective geometric correlation of MR, CT, and PET images
Deep learning and medical image processing for coronavirus (COVID-19) pandemic: a survey
Dense volume-to-volume vascular boundary detection
Antlion re-sampling based deep neural network model for classification of imbalanced multimodal stroke dataset
Intensity inhomogeneity correction of magnetic resonance images using patches
Neutrosophic sets in dermoscopic medical image segmentation
Geeps: scalable deep learning on distributed gpus with a gpu-specialized parameter server
Dropout: a simple way to prevent neural networks from overfitting
Batch normalization: accelerating deep network training by reducing internal covariate shift
Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation
Challenges and recent solutions for image segmentation in the era of deep learning
Computer-aided diagnosis in chest radiography: a survey
Supporting Project (number TURSP-2020/114), Taif University, Taif, Saudi Arabia.