key: cord-0760416-caaeiqek authors: Sujatha, D.; Subramaniam, M.; Rene Robin, Chinnanadar Ramachandran title: A new design of multimedia big data retrieval enabled by deep feature learning and Adaptive Semantic Similarity Function date: 2022-02-05 journal: Multimed Syst DOI: 10.1007/s00530-022-00897-8 sha: 36d8ede387e74e56053e7955ffc9643e913a1188 doc_id: 760416 cord_uid: caaeiqek
Nowadays, multimedia big data have grown exponentially in diverse applications such as social networks, transportation, health, and e-commerce. Accessing preferred data in large-scale datasets requires efficient and sophisticated retrieval approaches. Multimedia big data contain highly significant features across different types of data. Even though multimedia data come in various formats with corresponding storage frameworks, they often express similar semantic information. The overlap of semantic features is highly relevant to theory and research on semantic memory. Correspondingly, deep multimodal hashing has received increasing attention in recent years owing to its efficient performance in large-scale multimedia retrieval applications. However, deep multimodal hashing has made limited efforts to explore complex multilevel semantic structures. The main intention of this proposal is to develop enhanced deep multimedia big data retrieval with the Adaptive Semantic Similarity Function (A-SSF). The proposed model covers several phases: (a) data collection, (b) deep feature extraction, (c) semantic feature selection, and (d) adaptive similarity function for retrieval. The two main processes of multimedia big data retrieval are training and testing. After the dataset containing video, text, images, and audio is collected, the training phase starts.
Here, the deep semantic feature extraction is performed by a Convolutional Neural Network (CNN), and the extracted features are then subjected to semantic feature selection by a new hybrid algorithm termed Spider Monkey-Deer Hunting Optimization Algorithm (SM-DHOA). The final optimal semantic features are stored in the feature library. During testing, the selected semantic features are fed to the map-reduce framework in the Hadoop environment to handle the big data, thus ensuring proper big data distribution. The main contribution, termed A-SSF, computes the correlation between the multimedia semantics of the testing data and the training data, thus retrieving the data with minimum similarity. Extensive experiments on benchmark multimodal datasets demonstrate that the proposed method can outperform state-of-the-art methods for all types of data.
Communicated by A. Liu.
In recent years, multimedia data have grown rapidly in different applications such as social networks, transportation, health, and e-commerce. Data diversity has become one of the major components of multimedia big data [9], where multimedia documents vary in storage structures and data formats while expressing equivalent semantic information [10]. Thus, retrieving and managing multimedia documents in a way that reflects users' intent on heterogeneous big data platforms is considered a major problem. Accessing specific data in huge-scale datasets requires efficient and sophisticated retrieval approaches [11]. Moreover, owing to advancements in cloud computing algorithms and technologies, storage, communication, and sensing, large-scale data are being produced ever faster, giving rise to big data [12]. However, this gigantic volume of data poses many problems for industries and businesses.
Consequently, it offers more possibilities for future growth through the efficient use of data for evaluation [13]. Providing effective and reliable access to relevant data in huge-scale image repositories, which is difficult owing to the complexity of their contents, has been studied in recent years [18]. Multimedia big data retrieval benefits from switching to a distributed pattern for storing and processing enormous multimedia contents [14]. Even though multimedia computing solves the computational burden and maintenance issues, it still struggles with the storage and processing of multimedia big data. In a big data environment, a large number of commodity computers must provide more storage capacity and enormous computation power to handle the multimedia content being generated [2]. As more applications and services provide, process, edit, and retrieve multimedia documents, contents with different storage structures and data formats often express similar semantic information [15]. Hence, the storage and retrieval of multimedia big data are affected by a common limitation, heterogeneity. Another complexity in the multimedia retrieval process is users' intention. In most existing methods, the retrieval process is restricted to a single category of multimedia content, such as images [16]. This lowers the Quality of Service (QoS) of multimedia retrieval, because the lack of type diversity prevents the system from identifying the intention behind users' searches [17]. Semantic features can overlap, a property that plays a crucial role in research on semantic memory [24]. In recent years, more research has been carried out in the similarity measurement domain [19]. Different applications employ Euclidean distance [20] as a significant measure, and some specific scenarios use Mahalanobis distance [21] as the similarity metric.
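To make the contrast between the two measures cited above concrete, the following minimal NumPy sketch computes both for a pair of feature vectors. The helper names and the covariance matrix are illustrative assumptions; in practice the covariance would be estimated from the feature data. Because the first feature has four times the variance of the second, a difference of 2 along that axis counts as only 1 under the Mahalanobis distance.

```python
import numpy as np

def euclidean(x, y):
    """Straight-line distance between two feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(diff))

def mahalanobis(x, y, cov):
    """Distance that rescales the difference by the inverse feature covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of explicitly inverting cov
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Illustrative covariance: feature 0 has four times the variance of feature 1
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
print(euclidean([0, 0], [2, 0]))         # 2.0
print(mahalanobis([0, 0], [2, 0], cov))  # 1.0
```

With an identity covariance the two measures coincide, which is why the Mahalanobis distance is usually reserved for scenarios where feature scales or correlations differ.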
However, retrieval systems are affected by inappropriately expressed query conditions. Some deep learning methods address these problems by employing fuzzy matching techniques [22] to reflect the user's intent. However, these retrieval models do not precisely reflect users' intentions when the query information is incomplete [23]. Hence, this paper proposes a new multimedia big data retrieval model with a semantic similarity function. The major contributions of the developed model are as follows:
• To present a new multimedia big data retrieval model with deep semantic feature extraction, optimal semantic feature selection, and A-SSF-based retrieval under the map-reduce framework, aided by a new hybrid meta-heuristic algorithm.
• To choose the most significant semantic features of multimedia big data using a new hybrid algorithm called SM-DHOA, with the objective of minimizing the correlation among the features, so that this process yields unique and highly representative features for further processing.
• To develop a new A-SSF-based retrieval with SM-DHOA under the map-reduce framework for an efficient retrieval process, which aims to maximize the correlation and F1-score among the features so as to retrieve the most suitable data for the user query.
• To introduce a new algorithm called SM-DHOA that integrates the Deer Hunting Optimization Algorithm (DHOA) and Spider Monkey Optimization (SMO), both to select the optimal semantic features and to optimize the feature weight function in the map-reduce framework, maximizing retrieval performance with high precision and F1-score.
• To validate the efficiency of the suggested A-SSF-based retrieval model against different existing approaches in terms of standard performance measures and statistical analysis.
The remainder of this paper is organized as follows. The existing works are reviewed in Sect.
2. The proposed multimedia data retrieval using adaptive semantic similarity and deep learning is described in Sect. 3. The optimal semantic feature selection and map-reduce-based multimedia data retrieval are explained in Sect. 4. The proposed SM-DHOA for enhanced multimedia data retrieval is discussed in Sect. 5. The results are evaluated in Sect. 6. Finally, concluding remarks are given in Sect. 7. In 2015, Guo et al. [1] developed an economical and effective framework called "Semantic-based Heterogeneous Multimedia Retrieval (SHMR)" to retrieve and store semantic information at low cost. First, this model addressed the characteristics of heterogeneous multimedia retrieval in big data environments. Then, a new method was proposed for extracting and representing semantic information. Moreover, the model stored multimedia data semantically using a NoSQL-based technique, processed in parallel across distributed nodes. Last, the model achieved a good user experience and high retrieval precision through a user feedback-driven approach and a MapReduce-based retrieval technique. The suggested model showed better economic efficiency and retrieval performance. In 2018, Ahmad et al. [2] presented a more effective approach named Bi-Directional Fast Fourier Transform (BD-FFT) to obtain compact binary codes from high-dimensional deep features. This saves memory and simplifies computation for efficient retrieval. The transformed codes were used as hash codes to access images in huge-scale datasets through Approximate Nearest Neighbor (ANN) search methods. In experiments on Convolutional Neural Network (CNN) features, it achieved better retrieval accuracy with shorter codes.
The experiments were carried out on seven different huge-scale datasets to demonstrate the efficiency of the suggested retrieval model. In 2020, Xia et al. [3] developed a deep correlation mining method for multimedia retrieval, the "Levenberg-Marquardt algorithm-based Deep Canonical Correlation Analysis method (LM-DCCA)". This model trains diverse media features with the suggested algorithm; the correlation among the trained features is then computed to compare the features of different multimedia data. The Levenberg-Marquardt method addresses the problem of local optima. Experiments on different datasets showed the efficiency of cross-media retrieval, and the model outperformed other advanced multimedia retrieval approaches. In 2020, Sun et al. [4] implemented a new graph classification-based semantic similarity model with a feature reduction technique. The technique first learns vector representations of subtree patterns through neural language models. Semantically similar subtree patterns are then merged to obtain new features, and a new feature discrimination score is used to select the most discriminative features. Experiments on real datasets showed the efficiency of the method with the FRS-KELM graph classifier. In 2018, Beltran et al. [5] developed a new multimedia retrieval model using probabilistic Latent Semantic Analysis (pLSA) together with a data reduction strategy to address big data problems. This model aims to enhance the extraction procedure for Content-Based Multimedia Retrieval (CBMR).
The developed system focused on the effect of document reduction compared with other existing approaches based on pLSA and Latent Dirichlet Allocation (LDA) in CBMR, and the suggested model was evaluated in terms of retrieval performance in CBMR. In 2014, Yilmaz et al. [6] implemented a "RELIEF-based modality weighting approach termed RELIEF-MM". This method modified RELIEF to address challenges such as handling unbalanced datasets, multi-labeled and noisy data, class-specific feature selection, and applying the technique to machine learning-based identification. It focused on an enhanced weight estimation function that reflects the reliability and representation abilities of modalities along with their discrimination capacity, at lower computational complexity. Experiments on the "TRECVID 2007, TRECVID 2008, and CCV datasets" showed the robustness and accuracy of modality weighting for multimedia data retrieval. In 2018, Guo et al. [7] suggested a new heterogeneous multimedia data retrieval model using Semantic Ontology Retrieval (SOR) for storing and retrieving ontologies with big data tools. First, ontology representation and semantic extraction were addressed for multimedia big data. Second, SOR was defined as the retrieval technique. Third, MapReduce was used for parallel processing of SOR, yielding a new retrieval framework based on distributed nodes. Last, a user feedback method was presented to achieve a better user experience and high retrieval precision. The numerical results showed that the designed SOR was appropriate for heterogeneous multimedia big data and semantic-based retrieval. In 2019, Liu et al.
[8] implemented "Deep Hashing with Multilevel Similarity Learning (DHMSL)" to learn discriminative and compact hash codes that explore the "multilevel semantic similarity correlations of multimedia data." First, unified binary hash codes were learned to explore the multilevel similarity correlation by exploiting semantic label information and local structure. The unified hash codes were made compact by imposing quantization and bit-balance constraints. The model then used Deep Neural Networks (DNN) to learn feature representations from the learned unified binary codes, minimizing the prediction errors between the network outputs and the unified binary codes. Experiments on two widely used multimodal datasets showed its performance on both text-query-image and image-query-text tasks. In 2018, Alghamdi et al. [30] employed two transfer learning techniques (fine-tuning, and VGG-Net as a fixed feature extractor) to retrain the pre-trained VGG-Net, obtaining two new networks, VGG-MI1 and VGG-MI2. In the VGG-MI1 model, the last layer of VGG-Net was replaced with a task-specific layer, and various functions were optimized to reduce overfitting. Sedik et al. [31] suggested two data-augmentation models to enhance the learnability of CNN- and Convolutional Long Short-Term Memory (ConvLSTM)-based deep learning models (DADLMs), thereby boosting the accuracy of COVID-19 detection. In 2018, Sedik et al. [32] implemented a deep learning model based on CNN, and a hybrid model combining CNN with ConvLSTM, to compute a three-tier probability that a biometric has been tampered with.
Simulation-based experiments indicated that the alteration detection accuracy matches that of advanced methods, with superior performance in detecting central rotation alterations of fingerprints. Multimedia big data retrieval aims to obtain semantic information about video, text, images, or audio from data sources. Extracting such information is a major and challenging problem owing to the growth of media, so there is a huge demand for fast multimedia retrieval models with the smallest utility gap. These challenges arise from the size and diversity of media in this field. Numerous approaches with diverse features and challenges have been developed for multimedia retrieval, as given in Table 1. MapReduce [1] offers better economic efficiency and reduces hardware investment, but it cannot adapt to a real-time Internet environment and suffers from slow retrieval speed. BD-FFT [2] increases retrieval performance and overall efficiency and improves retrieval accuracy with deep features; however, it does not work on sparse features. LM-DCCA [3] enhances training with a better convergence effect and improves overall performance, but it does not allow users to select counter samples through a user feedback strategy. The FRS-KELM graph classifier [4] attains better classification accuracy and selects more discriminative features; however, its feature reduction approach is not suitable for Convolutional Neural Networks (CNN). Prior-based probabilistic Latent Semantic Analysis (PpLSA) [5] improves the efficiency of information reduction and also increases the number of parameters required for extracting the information, but it does not support incremental retrieval.
RELIEF-MM [6] guarantees a higher accuracy rate with better performance and is computationally efficient. However, this model does not use caching strategies, which affects its efficiency. MapReduce [7] achieves a good user experience and high retrieval precision, but it cannot be applied in real-time applications. Multimodal hashing [8] performs efficiently in multimedia retrieval; however, if non-discriminative hash bits dominate the hash code, retrieval performance can decrease. These challenges motivate the development of new retrieval models.
3 Proposed multimedia data retrieval using adaptive semantic similarity and deep learning
Multimedia big data have emerged in recent years along with diverse mobile technologies and online services. Hence, much research has focused on multimedia to address diverse aspects of big data analytics, such as capturing, storing, mining, indexing, and retrieving multimedia big data. Recently, multimedia data consisting of text, video, audio, and images have been used widely and rapidly, owing to their easy access and availability for managing multimedia systems. The major problem in big data analytics is how to minimize storage capacity and computational time while maintaining accurate outcomes, even for small datasets. Therefore, a multimedia retrieval system is necessary to obtain semantic information about text, video, audio, and images from data sources. However, extracting this multimedia information becomes more complex as media grow. Moreover, there is a high demand for faster multimedia retrieval with the least utility gap, which is hindered by the size and diversity of media. As the major requirement of multimedia retrieval is to match content and other information to the user's needs, it aims to offer best-matched data retrieval.
On the other hand, there are utility gaps between the delivered data and the expected data, which arise from conditions such as failures of retrieval models to reach every media item, false content descriptions, and poor query generation. These challenges must be considered when implementing a new retrieval model that assists in searching all items stored on the internet, whether they come from social networking sites or direct uploads. Moreover, a good multimedia retrieval model must have completeness and soundness. However, achieving soundness is challenging because irrelevant media are often retrieved. Likewise, completeness, i.e., the retrieval of all significant multimedia data relevant to the user query, is not properly attained on the internet. Multimedia big data face challenges in transmission, storage, additional compression, and analysis, related to ensuring computing and scalability efficiency, dealing with the complexity of understanding and cognition, organizing heterogeneous and unstructured data, and meeting QoS and real-time requirements. These technical and scientific issues call for new multimedia big data retrieval models with intelligent approaches, as depicted in Fig. 1. The proposed multimedia data retrieval model consists of two stages: training and testing. Initially, multimedia data including text, video, audio, and images are collected from benchmark datasets as multimedia queries and are first passed to the feature extraction stage to obtain the significant information. The suggested model uses a CNN to obtain semantic features that describe the semantic contents of the multimedia data. The pooling layer of the CNN keeps the most important features and reduces redundant data and dimensionality, which enhances retrieval performance.
The main advantage of the convolutional layer is that it consists of many filters, which apply a convolution operation to the input to capture specific features and pass the result to the next layer. Owing to these properties, a CNN can be trained properly even on varied images from the same class. Optimal feature selection reduces falsely selected features, and removing irrelevant data improves learning accuracy and reduces computation time. The extracted semantic features are passed to the feature selection process, where the newly proposed SM-DHOA technique obtains the optimal semantic features. This process is essential for the multimedia retrieval model to retain the necessary information: when a huge set of features is processed, the retrieval model suffers from high computational and time complexity, so feature selection is needed to obtain the most representative features. With weighted feature extraction, the trial-and-error process of determining the appropriate number of extracted features can be avoided; weight-based feature extraction has also been used to reduce the number of features in text classification. In the training stage, the weighted features are stored in the feature library, which is used for retrieval in the testing stage. In the testing stage, all the above-mentioned processes are carried out for the multimedia queries, and the semantic features selected by SM-DHOA are given to the map-reduce framework. This framework consists of two functions, a mapper and a reducer, for retrieving the appropriate multimedia data. The optimal features are fed to the mapper, which generates intermediate data that are passed to the reducer. The reducer performs retrieval using the A-SSF. To compute this similarity function, the optimal features of both the training data and the query data are multiplied by a weight function.
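The mapper/reducer flow and the weighting step described above can be illustrated with a small single-process Python sketch. It is neither Hadoop code nor the authors' A-SSF: the exact similarity function is not given in this text, so cosine similarity of element-wise weighted vectors is used as a stand-in, and the weights are fixed instead of being optimized by SM-DHOA. The function names (`mapper`, `reducer`, `run_job`, `weighted_similarity`) and the toy data are hypothetical; the input is a query id and the output is a ranked returned list, matching the description of the map-reduce model.

```python
import numpy as np
from collections import defaultdict

def mapper(query_id, query_features, library_ids):
    """Emit one intermediate (key, value) pair per candidate library item."""
    for lib_id in library_ids:
        yield query_id, (lib_id, query_features)

def weighted_similarity(fs_rd, fe_tr, wg_rd, wg_tr):
    """Hypothetical stand-in for A-SSF: cosine similarity of element-wise weighted vectors."""
    a = wg_rd * fs_rd   # weighted query-side (reducer) features
    b = wg_tr * fe_tr   # weighted feature-library features
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def reducer(query_id, values, library, wg_rd, wg_tr, top_k):
    """Score all candidates for one query and return its ranked list of library ids."""
    scored = [(lib_id, weighted_similarity(q, library[lib_id], wg_rd, wg_tr))
              for lib_id, q in values]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return query_id, [lib_id for lib_id, _ in scored[:top_k]]

def run_job(queries, library, wg_rd, wg_tr, top_k=3):
    """Single-process simulation of map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for qid, feats in queries.items():
        for key, value in mapper(qid, feats, library.keys()):
            groups[key].append(value)
    return dict(reducer(key, vals, library, wg_rd, wg_tr, top_k)
                for key, vals in groups.items())

library = {"img1": np.array([1.0, 0.0]),
           "img2": np.array([0.0, 1.0]),
           "img3": np.array([1.0, 1.0])}
queries = {"q1": np.array([1.0, 0.1])}
weights = np.ones(2)  # fixed here; in the described model these are tuned by SM-DHOA
print(run_job(queries, library, weights, weights, top_k=2))  # {'q1': ['img1', 'img3']}
```

Grouping by query id in `run_job` plays the role of the shuffle phase; in a real Hadoop deployment the feature library would be partitioned across many mappers instead of being iterated in one process.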
The weight function is optimized by SM-DHOA, and the semantic similarity function is computed on the weighted features; hence, it is termed A-SSF. The DHOA balances the exploration and exploitation phases well, which improves the accuracy of the proposed model. SMO is a swarm intelligence-based algorithm used for finding the best-fit solution in complex optimization problems. The optimization aims to maximize precision and F1-score for an efficient retrieval process. The developed multimedia big data retrieval model uses two datasets for experimentation, as follows.
Dataset 1: It is collected from "http://press.liacs.nl/mirflickr/mirdownload.html: Access Date: 2021-05-20". It consists of 25,000 multimedia files, such as images, video, text, and audio files, in different directories.
Dataset 2: It is gathered from "https://github.com/satishrdd/Fast_image_retrieval: Access Date: 2021-05-20". It consists of query and database images in separate lists, where the database includes 1,000 files and the query set consists of 54 files.
The input multimedia queries collected from both datasets are denoted $Q_n$, where $n = 1, 2, \ldots, N$ and $N$ is the total number of multimedia data in the dataset. Semantic feature extraction is the initial stage of the proposed multimedia big data retrieval model and is carried out by a CNN. Feature extraction, i.e., extracting the necessary features from multimedia data, is an essential step in developing a new retrieval model. Every feature of the multimedia queries must be extracted, since these features help retrieve the relevant multimedia data for the queries. This stage obtains high-dimensional features from the pooling layer of the CNN.
Fig. 2 CNN-based semantic feature extraction for multimedia retrieval process
CNN [27] is used in this multimedia big data retrieval model because it extracts features automatically and efficiently without human intervention, and it is computationally efficient. Moreover, this architecture depends less on pre-processing, and it is simple to process while offering higher accuracy in the retrieval process. A CNN is a feedforward network consisting of layers such as convolutional and pooling (or subsampling) layers, which are grouped into modules. The multimedia query is given as the input to the CNN and passes through the convolutional and pooling stages to obtain specific representations; finally, the output classes are obtained from the output layer. This paper extracts the necessary semantic features from the pooling layers. The input multimedia queries $Q_n$ are given to the input layer of the CNN, which serves as a feature extractor [30] for learning feature representations from $Q_n$. The convolution layer consists of neurons organized into feature maps, in which each neuron has its own receptive field. Each neuron is connected to a neighborhood of neurons in the previous layer through a set of trainable weights, called a filter bank. A new feature map is computed by convolving the inputs with the learned weights, and the convolved results are passed through a non-linear activation function. The $m$th output feature map $FE_m$ is computed as in Eq. (1):
$$FE_m = \sigma\left(R_m * Q_n\right) \qquad (1)$$
Here, $R_m$ denotes the convolutional filter associated with the $m$th feature map, $\sigma(\cdot)$ represents the non-linear activation function, and $*$ denotes the 2D convolution operator; together with the activation, it extracts non-linear features from the input $Q_n$.
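A minimal NumPy sketch of Eq. (1) followed by the max-pooling step used to obtain $FE_m$ is given below. It assumes a single-channel input, one hand-set 2x2 filter (in the model, filters are learned), ReLU as the activation $\sigma(\cdot)$, and non-overlapping 2x2 pooling regions; these choices and all helper names are illustrative, not the paper's configuration. As in deep learning libraries, the "convolution" is computed as cross-correlation, which makes no difference when filters are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Assumed activation sigma(.): element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size region."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

# Toy single-channel "query" and one hand-set 2x2 filter R_m (normally learned)
Q_n = np.arange(25.0).reshape(5, 5)
R_m = np.array([[-1.0, 0.0], [0.0, 1.0]])

FE_m = relu(conv2d_valid(Q_n, R_m))  # Eq. (1): 4x4 feature map
pooled = max_pool(FE_m)              # max pooling: 2x2 map
feature_vector = pooled.ravel()      # flattened semantic features
print(feature_vector)                # [6. 6. 6. 6.]
```

The reshape in `max_pool` splits the map into a grid of blocks so that the maximum over axes 1 and 3 is taken within each pooling region, mirroring the max over $(u, v)$ in the pooling formulation.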
The pooling layer of the CNN [31, 32] reduces the spatial resolution of the feature maps to obtain spatial invariance to input translations and distortions. Many models employ average pooling layers, which propagate the average of all input values in the pooling region. Max-pooling layers instead choose the largest element in the corresponding receptive field, as formulated in Eq. (2):
$$FE_{mps} = \max_{(u,v) \in \mathcal{R}_{ps}} Q_{nmuv} \qquad (2)$$
In Eq. (2), $Q_{nmuv}$ is the element at location $(u, v)$ within the pooling region $\mathcal{R}_{ps}$, which embodies a receptive field around the position $(p, s)$, and $FE_{mps}$ is the output of the pooling operation for the $m$th feature map. The final semantic features $FE_m$ are thus obtained from the pooling layer of the CNN, where $m = 1, 2, \ldots, M$ and $M$ is the total number of semantic features, here 784; these features are passed on to the subsequent stages. This CNN-based semantic feature extraction is depicted in Fig. 2.
The major contribution of this multimedia data retrieval model is the selection of optimal semantic features from the CNN-based semantic features $FE_m$. Feature selection removes noisy, redundant, and irrelevant features from the multimedia feature set. Here, the most significant and optimal features are selected from the extracted semantic features using SM-DHOA, which also reduces retrieval time. The semantic features selected by SM-DHOA are denoted $FS_{m^*}$, where $m^* = 1, 2, \ldots, OM$ and $OM$ is the total number of optimal semantic features. The objective is to minimize the correlation between features: the correlation $CT$ among the optimal semantic features $FS_{m^*}$ is computed in order to obtain minimally correlated features, as formulated in Eq. (3). In Eq. (3), $fs$ and $fh$ represent two features in the optimal semantic feature set, and the total number of optimal semantic features $OM$ is set to 10. Hence, the proposed multimedia retrieval model uses SM-DHOA to obtain unique semantic features with minimum correlation, which enhances retrieval efficiency. The optimal semantic feature selection is shown in Fig. 3.
This phase retrieves multimedia data using the map-reduce framework to handle the big data. The input to the map-reduce-based retrieval process is the set of optimal semantic features $FS_{m^*}$. The map-reduce model consists of two phases, a mapper function and a reducer function. The mapper maps the optimal features to intermediate values, and the reducer processes the intermediate values that correspond to the same intermediate key. In the map-reduce model, the input and output are the query id and the returned list, respectively. The optimally selected semantic features are given to the mapper, where they are mapped to obtain intermediate results. The reducer then computes the similarity function for efficient retrieval, in which a weight function is multiplied with the features of both the reducer and the feature library. For the feature library, this gives
$$FE^{TR(new)}_{tr} = wg^{TR}_{tr} \times FE^{TR}_{tr}$$
where $FE^{TR(new)}_{tr}$ is the new semantic feature set in the feature library after weighting, $wg^{TR}_{tr}$ is the weight function used for the feature library, and $FE^{TR}_{tr}$ denotes the features present in the feature library. Similarly, the features obtained from the reducer, $FS^{Rd}_{tr}$, are multiplied by a weight function to obtain better retrieval results, as derived in Eq. (6):
$$FS^{Rd(new)}_{tr} = wg^{Rd}_{tr} \times FS^{Rd}_{tr} \qquad (6)$$
Here, $FS^{Rd(new)}_{tr}$ denotes the new weighted semantic feature set, and $wg^{Rd}_{tr}$ represents the weight function employed for the reducer. The weighted features of both the feature library and the reducer are then compared using the adaptive similarity function, where the weights $wg^{Rd}_{tr}$ and $wg^{TR}_{tr}$ are optimized using SM-DHOA. The adaptive similarity function is formulated in the reducer as given in Eq. (7). To summarize, the mapper function takes pairs of record values and produces intermediate results. Then, the reducer processes each pair for multimedia retrieval using the adaptive similarity function. Finally, the returned list is obtained as the retrieved multimedia content. The adaptive semantic similarity in the map-reduce framework is shown in Fig. 4.
The proposed SM-DHOA is used in the map-reduce model to enhance multimedia data retrieval for big data. The DHOA [25] and SMO [26] algorithms were selected for their complementary features. DHOA efficiently balances the exploitation and exploration stages and has superior search capabilities, which lets it determine the optimal position effectively. It provides excellent sensitivity and solves real-time optimization problems owing to its robustness and fast convergence, although DHOA suffers from premature convergence. This motivates the new SM-DHOA, which incorporates the SMO algorithm because SMO converges quickly and requires fewer resources. In the proposed SM-DHOA, the propagation through the position angle of DHOA is replaced by the global leader phase of the SMO algorithm. Hence, if $f < 1$, the solutions are updated by DHOA using the leader and successor positions; otherwise, they are updated by the global leader phase of the SMO algorithm.
DHOA is inspired by the way human hunters move to encircle and attack a deer, with the hunters' positions guided by "leader and successor" solutions. The positions are updated until the deer is found. First, the hunters encircle the deer subject to constraints such as the "deer position and wind angle". The algorithm is characterized by efficient teamwork among the hunters and an effective attacking procedure that reaches the optimal location of the prey through the leader and successor positions. The prey also has distinctive characteristics, described in [25], which are taken into account for an efficient hunting process. DHOA starts by initializing the population as given in Eq. (8):
$X = \{X_1, X_2, \ldots, X_p\}; \quad 1 < k \le p$ (8)
Here, p denotes the number of hunters in population X. The deer-position and wind-angle constraints are derived in Eqs. (9) and (10), respectively. The wind angle is needed because the search space is circular. In these equations, k denotes the current iteration and r is a random number in the range [0, 1]; the remaining terms are the wind angle and the deer's position angle. The positions are then propagated under these constraints. Initially, the location of the optimal search area is unknown, so candidate solutions near the optimal space are taken at random and evaluated with the fitness function. As mentioned above, these solutions define the leader position $X^{ldr}$ and the successor position $X^{Sur}$, i.e., the first-best position and the successor position of the hunters, respectively.
(i) "Propagation through Leader's position": At the initial stage, the best positions are formulated, and every individual in the population then tries to reach the best position. The encircling behavior used to find the deer is formulated in Eq. (11). In Eq. (11), $X_{k+1}$ denotes the position at the next iteration, f is the random wind-speed number that varies in [0, 2], $X_k$ is the position at the current iteration, and Y and B are coefficient vectors formulated in Eqs. (12) and (13), respectively. Here, p is a parameter in the range [−1, 1], $j_{max}$ is the maximum number of iterations, and rnd is a random number in the range [−1, 1].
(ii) "Propagation through position angle" based on the global leader position of the SMO algorithm: SMO is a population-based technique inspired by the social interactions of spider monkeys, which follow a fission-fusion social structure for intelligent foraging. The monkeys are organized into groups with local and global leaders and the corresponding group members; this hierarchy maintains defensive boundaries and social bonds. SMO has six phases: "local leader phase, global leader phase, local leader learning phase, global leader learning phase, local leader decision phase, and global leader decision phase". In SM-DHOA, only the global leader phase is used for updating the position. Once the local leader phase has been performed to find food, all spider monkeys re-evaluate their positions based on the experience of the members of the local group and of the global leader. This process identifies the most suitable positions through Eq. (14). In Eq. (14), $X_{newkj}$ is the new position based on the global leader, $X_{kj}$ is the kth spider monkey in the jth dimension, rn denotes a random number, $GL_j$ is the global leader position in the jth dimension, and $X_{sj}$ is the sth spider monkey in the jth dimension, where j ∈ {1, 2, …, J} is a randomly chosen index.
(iii) "Propagation through Successor's position": The exploration process of DHOA is formulated by altering the vector B in the encircling behavior. When B < 1, a random search is initiated: the position is updated using the position of the successor rather than the first-best solution obtained, which permits a global search, as derived in Eq. (15). Here, $X^{Sur}$ denotes the successor position of the search agent in the current population. The designed SM-DHOA thus uses the random parameter f to decide whether a position is updated through SMO or DHOA. The pseudo-code of the proposed algorithm is given in Algorithm 1.
The proposed multimedia data retrieval model aims at efficient retrieval by combining optimal semantic feature selection with A-SSF, both driven by the new SM-DHOA. Weighted features are used to compute A-SSF, and the optimization strategy [33] tunes the weight function; this tuning is why the similarity function is called adaptive (A-SSF). Retrieval is carried out in the map-reduce framework, and the weight functions of the features in the reducer are optimized to increase precision and F1-score. The multi-objective function $FF_2$ for A-SSF using SM-DHOA is derived in Eq. (16). Precision is "the ratio of positive observations that are predicted exactly to the total number of observations that are positively predicted", as formulated in Eq. (17):
$prn = \frac{te_{ps}}{te_{ps} + fe_{ps}}$ (17)
The F1-score is the "harmonic mean between precision and recall. It is used as a statistical measure to rate performance", as derived in Eq. (18):
$F1 = \frac{2\,te_{ps}}{2\,te_{ps} + fe_{ps} + fe_{ng}}$ (18)
Here, $te_{ps}$, $te_{ng}$, $fe_{ps}$ and $fe_{ng}$ refer to the "true positives, true negatives, false positives, and false negatives", respectively. The multi-objective function thus targets retrieval performance through adaptive semantic similarity in the map-reduce framework using SM-DHOA.
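The reducer-side evaluation on which $FF_2$ is built can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: `run_map_reduce` is an in-memory stand-in for Hadoop, all function names are illustrative, and cosine similarity is used as a stand-in because the adaptive similarity function of Eq. (7) is not reproduced in this text. The mapper emits (query id, feature vector) pairs; the reducer multiplies the query features by the reducer weights (cf. Eq. (6)) and the feature-library features by the library weights, ranks the library, and returns (query id, returned list). Precision and F1-score then follow Eqs. (17) and (18), counting true positives, false positives, and false negatives against an assumed set of relevant items.

```python
from collections import defaultdict

import numpy as np


def mapper(record):
    """Map step: emit one intermediate (query_id, feature_vector) pair."""
    query_id, features = record
    yield query_id, np.asarray(features, dtype=float)


def cosine(a, b):
    """Stand-in similarity; NOT the paper's A-SSF of Eq. (7)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def reducer(query_id, vectors, library, w_rd, w_tr, top_k):
    """Reduce step: weight query and library features, return the top-k list."""
    query = np.mean(vectors, axis=0) * w_rd                 # weighted reducer features
    scores = {item_id: cosine(query, feat * w_tr)           # weighted library features
              for item_id, feat in library.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)   # most similar first
    return query_id, ranked[:top_k]


def run_map_reduce(records, library, w_rd, w_tr, top_k):
    """In-memory stand-in for Hadoop: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, vecs, library, w_rd, w_tr, top_k)
                for key, vecs in groups.items())


def precision_f1(returned, relevant):
    """Precision (Eq. (17)) and F1-score (Eq. (18)) of one returned list."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)        # te_ps
    fp = len(returned - relevant)        # fe_ps
    fn = len(relevant - returned)        # fe_ng
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, f1


if __name__ == "__main__":
    library = {"img1": np.array([1.0, 0.0]),
               "img2": np.array([0.0, 1.0]),
               "img3": np.array([0.9, 0.1])}
    w = np.ones(2)                                   # uniform weights for the demo
    results = run_map_reduce([("q1", [1.0, 0.0])], library, w, w, top_k=2)
    print(results)                                   # {'q1': ['img1', 'img3']}
    print(precision_f1(results["q1"], {"img1", "img2"}))  # (0.5, 0.5)
```

In this setting, an optimizer such as SM-DHOA would search over `w_rd` and `w_tr` to raise these scores; the exact combination used in $FF_2$ (Eq. (16)) is not assumed here.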
The proposed multimedia big data retrieval model was implemented in MATLAB 2020a, and the experimental analysis was carried out on two datasets: video-to-video and audio-to-image retrieval were analyzed using dataset 1, and text-to-image and image-to-image retrieval using dataset 2. The performance of the proposed model was compared with conventional methods such as PSO [28], GWO [29], DHOA [25], and SMO [26], and with existing models such as BD-FFT [2], FRS-KELM [4], and RELIEF-MM [6]. The number of iterations was set to 10 and the population size to 10. The performance metrics used for evaluation are described below.
(e) FNR: "the proportion of positives which yield negative test outcomes with the test".
(f) Sensitivity: "the number of true positives, which are recognized exactly".
(g) Specificity: "the number of true negatives, which are determined precisely".
(h) Accuracy: the "ratio of the observation of exactly predicted to the whole observations".
$FNR = \frac{fe_{ng}}{fe_{ng} + te_{ps}}$ (23)
$Se = \frac{te_{ps}}{te_{ps} + fe_{ng}}$ (24)
$Sp = \frac{te_{ng}}{te_{ng} + fe_{ps}}$ (25)
$Ac = \frac{te_{ps} + te_{ng}}{te_{ps} + te_{ng} + fe_{ps} + fe_{ng}}$ (26)
$Re = \frac{te_{ps}}{te_{ps} + fe_{ng}}$ (27)
The precision of the designed multimedia big data retrieval model using SM-DHOA with A-SSF is analyzed by varying the number of retrieved multimedia data, as depicted in Figs. 6 and 7 for the metaheuristic-based algorithms and the conventional models, respectively. With 20 retrieved items for text-to-image retrieval, SM-DHOA achieves 7.9%, 4.3%, 6.7%, 6%, 4.4%, 6.7%, and 5.5% higher precision than PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively. For image-to-image retrieval, the proposed SM-DHOA attains
5.7%, 4.5%, 6.97%, 5%, 6.9%, 6.9%, and 6.9% higher precision than PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively, when the number of retrieved multimedia data is 40. Similarly, for video-to-video retrieval with 60 retrieved items, the precision of SM-DHOA is 3.7%, 5%, 2.4%, 7.6%, 2.4%, 1.2%, and 9% higher than that of PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively. Likewise, for audio-to-image retrieval with 80 retrieved items, SM-DHOA improves on PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM by 11%, 1.28%, 2.5%, 6.7%, 8%, 6.75%, and 5.3%, respectively. Overall, the proposed model performs better than the existing approaches. For the statistical analysis, the number of retrieved multimedia data was set to R@100 for image retrieval. In our proposed model, the first actual and predicted images are compared in terms of recall, denoted R@1; the first two images are compared in terms of recall, denoted R@2; and the comparison continues in the same way up to R@100.
Fig. 7 Performance analysis on the precision of the proposed Multimedia Big Data Retrieval model with different machine learning-based algorithms for a text to image retrieval, b image to image retrieval, c video to video retrieval, and d audio to image retrieval
The F1-score of the proposed multimedia big data retrieval model is evaluated against different optimization-based algorithms and classifiers in Figs. 8 and 9, respectively. For text-to-image retrieval with 40 retrieved items, the F1-score of SM-DHOA is 8%, 9.4%, 10.7%, 6.8%, 8%, 6.8%, and 9.4% higher than that of PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively.
For image-to-image retrieval with 60 retrieved items, SM-DHOA is 8.75% better than PSO, GWO, and DHOA, 7.4% better than SMO, 4.8% better than BD-FFT, and 6% better than both FRS-KELM and RELIEF-MM. Likewise, for video-to-video retrieval with 80 retrieved items, SM-DHOA is 6.75%, 3.9%, 8.2%, 9.7%, 9.4%, 10.9%, and 8.7% better than PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively. Finally, for audio-to-image retrieval with 100 retrieved items, SM-DHOA is 11.7%, 13.4%, 8.5%, 16.9%, 5.6%, 8.69%, and 15.3% better than PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively. Thus, the multimedia big data retrieval model using SM-DHOA with A-SSF achieves a higher F1-score than the other approaches. The overall accuracy of the designed model is compared with different optimization-based algorithms and classifiers in Fig. 10. For text-to-image retrieval, the accuracy of SM-DHOA is 9.3%, 6.8%, 4.44%, and 4.44% higher than that of PSO, GWO, DHOA, and SMO, respectively. For video-to-video retrieval, it is 4.3%, 3.2%, and 1% higher than that of BD-FFT, FRS-KELM, and RELIEF-MM, respectively. Hence, the designed model with A-SSF using SM-DHOA achieves more accurate retrieval than the conventional models. The statistical analysis of precision for the developed multimedia data retrieval model is given in Tables 2 and 3 for different query data.
"The mean is the average value of the best and worst values and the median is referred to as the center point of the best and worst values, whereas the standard deviation is represented as the degree of deviation between each execution". For text-to-image retrieval, the accuracy of SM-DHOA is 7.8%, 8.3%, 7.8%, 8.5%, 9.2%, 9.6%, and 7.69% higher than that of PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively. Likewise, the statistical analysis of precision shows superior performance on all datasets compared with the other existing approaches. The statistical analysis of the F1-score of the proposed model with A-SSF is given in Tables 4 and 5. For audio-to-image retrieval, SM-DHOA is 7.5%, 8%, 6.7%, 7.6%, 4.4%, 5.5%, and 9.11% better than PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively, and its median is 5.9%, 7.22%, 5.9%, 5.3%, 3.4%, 5.2%, and 7.8% higher than theirs. Similarly, the proposed model outperforms the existing approaches. The performance of SM-DHOA for multimedia data retrieval is further validated in Tables 6 and 7. For image-to-image retrieval, SM-DHOA is 7%, 6%, 7%, 7.3%, 7.7%, 6.9%, and 8.3% better than PSO, GWO, DHOA, SMO, BD-FFT, FRS-KELM, and RELIEF-MM, respectively, and its median is 7.3%, 2.3%, 3.5%, 3.5%, 7.8%, 7%, and 7.8% higher than theirs.
Table 2 Statistical analysis of the proposed Multimedia Big Data Retrieval model with different meta-heuristic-based algorithms for four datasets in terms of precision
Therefore, the designed model is also validated in terms of recall against the other approaches.
This paper has developed a new deep multimedia big data retrieval model with A-SSF based on the hybrid SM-DHOA. The gathered multimedia data were passed to deep CNN-based semantic feature extraction to obtain significant features. The optimal semantic features were then selected using SM-DHOA so as to minimize the correlation among them. Furthermore, the map-reduce framework with A-SSF was developed using SM-DHOA for an efficient retrieval process. In the performance analysis on dataset 1, the accuracy of the developed SM-DHOA was 4.3%, 3.2%, and 1% higher than that of BD-FFT, FRS-KELM, and RELIEF-MM, respectively, for video-to-video retrieval. Hence, the proposed model outperformed the conventional algorithms. The proposed SM-DHOA model was evaluated only for text-to-image, image-to-image, video-to-video, and audio-to-image retrieval. In future work, the model will be extended to audio-to-video or audio-to-image-to-image processing using intelligent approaches such as ensemble learning.
Table 6 Statistical analysis of the proposed Multimedia Big Data Retrieval model with different meta-heuristic-based algorithms for four datasets in terms of recall
Table 7 Statistical analysis of the proposed Multimedia Big Data Retrieval model with different conventional models for four datasets in terms of recall
[1] An effective and economical architecture for semantic-based heterogeneous multimedia big data retrieval
[2] Efficient conversion of deep features to compact binary codes using Fourier decomposition for multimedia big data
[3] A cross-modal multimedia retrieval method using depth correlation mining in big data environment
[4] Feature reduction based on semantic similarity for graph classification
[5] Prior-based probabilistic latent semantic analysis for multimedia retrieval
[6] RELIEF-MM: effective modality weighting for multimedia information retrieval. Multimedia Syst
[7] SOR: an optimized semantic ontology retrieval algorithm for heterogeneous multimedia big data
[8] Multimedia retrieval by deep hashing with multilevel similarity learning
[9] OMIR: ontology-based multimedia information retrieval system for web usage mining
[10] ADAM pro: database support for big multimedia retrieval
[11] Differentially-private and trustworthy online social multimedia big data retrieval in edge computing
[12] Deep understanding of 3-D multimedia information retrieval on social media: implications and challenges
[13] Data mining and machine learning technologies for multimedia information retrieval and recommendation
[14] Query quality refinement in singular value decomposition to improve genetic algorithms for multimedia data retrieval
[15] An efficient method for estimating semantic similarity based on feature overlap: reliability and validity of semantic feature ratings
[16] Dynamic distance learning for joint assessment of visual and semantic similarities within the framework of medical image retrieval
[17] Feature reduction based on semantic similarity for graph classification
[18] MDCBIR-MF: multimedia data for content-based image retrieval by using multiple features
[19] Semantic link network-based model for organizing multimedia big data
[20] Multi-modal multimedia big data analyzing architecture and resource allocation on cloud platform
[21] Cloud computing model for big data processing and performance optimization of multimedia communication
[22] Discriminative deep quantization hashing for face image retrieval
[23] Semantic neighbor graph hashing for multimodal retrieval
[24] Deep semantic-preserving ordinal hashing for cross-modal similarity search
[25] Deer Hunting Optimization Algorithm: a new nature-inspired meta-heuristic paradigm
[26] Spider Monkey Optimization: a survey
[27] Convolutional neural networks for relevance feedback in content based image retrieval. Multimedia Tools Appl
[28] A Particle Swarm Optimization Algorithm for web information retrieval: a novel approach
[29] Novel real time content based medical image retrieval scheme with GWO-SVM. Multimedia Tools Appl
[30] Detection of myocardial infarction based on novel deep transfer learning methods for urban healthcare in smart cities
[31] Deploying machine and deep learning models for efficient data-augmented detection of COVID-19 infections
[32] Deep learning modalities for biometric alteration detection in 5G networks-based secure smart cities
[33] Area optimization of CMOS full adder design using 3T XOR