key: cord-0064967-57nxfvyr
authors: Nasif, Ammar; Othman, Zulaiha Ali; Sani, Nor Samsiah
title: The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on IoT Nodes in Smart Cities
date: 2021-06-20
journal: Sensors (Basel)
DOI: 10.3390/s21124223
sha: 345c7edcfdb744bbb81f2ec36e358a9218042215
doc_id: 64967
cord_uid: 57nxfvyr

Networking is crucial for smart city projects nowadays, as it offers an environment where people and things are connected. This paper presents a chronology of factors in the development of smart cities, including IoT technologies as network infrastructure. Increasing IoT nodes leads to increasing data flow, which is a potential source of failure for IoT networks. The biggest challenge of IoT networks is that the IoT may have insufficient memory to handle all transaction data within the IoT network. In this paper, we aim to propose a potential compression method for reducing IoT network data traffic. Therefore, we investigate various lossless compression algorithms, such as entropy- or dictionary-based algorithms, and general compression methods to determine which algorithm or method adheres to IoT specifications. Furthermore, this study conducts compression experiments using entropy (Huffman, adaptive Huffman) and dictionary (LZ77, LZ78) algorithms on five different types of datasets of IoT data traffic. Although all of the above algorithms can alleviate IoT data traffic, adaptive Huffman proved to be the best compression algorithm. Therefore, we propose a conceptual compression method for IoT data traffic that improves adaptive Huffman using deep learning concepts, namely weights, pruning, and pooling in the neural network. The proposed algorithm is believed to obtain a better compression ratio. Additionally, we discuss the challenges of applying the proposed algorithm to IoT data compression due to the limitations of IoT memory and IoT processors, so that it can later be implemented in IoT networks.

The UN reported that by 2030, almost 60% of the world's population will reside in big cities; cities such as Tokyo, with almost 38 million residents, followed by Delhi, Shanghai, Mexico City, São Paulo, and Mumbai, are ranked amongst the world's most populated [1]. In 2014, there were 28 mega-cities, almost three times as many as in 1990, and this number was estimated to exceed 41 cities by 2030. In the European Union, the urban population is expected to reach 80% by 2050. Now, more than 50% of the world's population live in urban areas, where they consume 75% of the energy and are responsible for 80% of the greenhouse effect [2]. By 2050 it is predicted that the largest 200 cities in the world will each have a minimum population of 3 million people and that Mumbai (Bombay) in India, for example, may exceed 42 million [3]. Cities' infrastructure has been developed to cater to the demands of the new urban population. In the beginning, when wireless technologies had not yet been introduced, governments tried to connect buildings through cables and wires, and the cities containing these buildings were referred to as wired cities [4]. Later, the term "virtual cities" was proposed.

Memory size can be considered a critical problem in the IoT network because the small available memory segments messages into many smaller packets that require more transmission time, leading to consumption of more power and more latency [36].
A realistic example of this was stated in [37], where the RootMetrics smart city project relied on the IoT network as its infrastructure, and the enormous network data traffic caused system failure because the tiny IoT memory was unable to handle such massive data without intelligent management. It has been shown that when sensed data is sent directly to a gateway or server, it not only consumes excessive power but also increases the chance of data loss [38]. As a solution, many previous research studies have focused on enhancing the transmission range and speed. Scratchpad Memory (SPM) and Non-Volatile Memory Express (NVMe) memory types were developed in order to hold small items of data for rapid retrieval in IoT devices [39]. SPMs are software-controlled and require additional programmer effort [40], while NVMe enables code to be executed directly; no code has to be copied to the Random Access Memory (RAM), which reduces the boot-up time as well [21]. However, SPMs and NVMe have been too expensive to be implemented widely in IoT devices.

The key contributions of this study are summarized as follows: (1) We study the technical side of IoT memory to clarify why small IoT memory cannot handle massive amounts of data. (2) We investigate lossless compression algorithms as well as previous and current related work that has been used to reduce data size, and illustrate the detailed differences between them to clarify which can be used for the IoT. (3) We present the fundamentals of deep learning, which later help us understand the techniques used for dimension reduction and how we can use them to compress data in IoT memory. (4) We run experiments on five datasets using lossless compression algorithms to determine which fits the IoT better and which is more suitable for numeric and time-series data, the typical IoT data types.

The paper is organized as follows: in Section 2 we investigate the technical details of IoT memory, why the small IoT memory cannot handle large data traffic, and how previous studies have tried to manage such large data using compression algorithms. In Section 3 we investigate the compression algorithms and methods in more detail and review algorithms that can be applied to numeric and time-series data because of their similar characteristics to IoT data. In Section 4, deep learning fundamentals are illustrated in order to understand the techniques used for dimensionality reduction. We also investigate current compression algorithms using deep learning in order to assess whether they, as well as traditional compression algorithms, can be used to compress IoT data. However, we found that compression algorithms in deep learning do not share a similar concept with traditional compression algorithms. Additionally, we discuss the potential of combining pruning and pooling from deep learning with any suitable traditional compression algorithm.
This paper describes how to minimize or compress the data to fit into the memory of an IoT node in order to alleviate data traffic in the IoT network. To illustrate in detail how an IoT system works, Figure 2 shows the IoT network architecture, where every IoT node can be connected to at least one sensor, one actuator, or both. The node contains many integrated modules such as a processing unit (microcontroller), power management, memory (SRAM, Flash memory, EEPROM), and communication modules (Wi-Fi, Bluetooth, 802.15.4 wireless, wired). IoT nodes can be connected to an IoT gateway, forming a local network. The gateway is connected to the internet, which allows end-users to access (monitor or control) things.

Memory is an essential component of an IoT device, as it stores both received and sent data. However, the performance of this memory depends on its type. One of these types is non-volatile memory (NVM), which retains data even if power is removed. The other type is volatile memory (VM), which loses data if power is removed. VM is faster than NVM but more expensive. Manufacturers using NVM for embedded devices have two options: one-time programmable (OTP) and multiple-time programmable (MTP). MTP suits applications that require long battery life; it is considered better than external flash memory and also lower in cost per bit. OTP is more suitable when the contents of memory cannot be modified once configured. For IoT devices, manufacturers have developed scratchpad memories (SPMs), which are high-speed internal memories used for the temporary storage of calculations, data, and other work in progress. Ratzke stated in [39] that SPM is used to hold small items of data for rapid retrieval in IoT devices. In [40], researchers stated that SPM is different from cache memory because cache memory is managed by hardware while SPM is managed by software and requires additional effort from programmers. However, many researchers have focused on improving the IoT network by improving SPMs for performance gains; instead of focusing on data allocation, they focused on instruction allocation because the IoT consists of embedded systems with particular and special uses [39]. The researchers in [39] discovered that dynamic allocation of memory is better than static; therefore, there is no need to fill the memory before execution; instead, the memory should be filled when needed. Therefore, they proposed an algorithm that decides whether to store memory objects (variables and code segments) in the SPM first or in the main memory before computing the addresses in the SPM. The SPM consists of an array of SRAM cells and is used as an alternative to cache due to its energy efficiency, time predictability, and scalability. However, the compiler or the programmer needs to allocate appropriate data to the SPM efficiently. Therefore, data management is the most challenging issue in systems equipped with SPMs, as researchers have stated in [41]. Furthermore, Lipman suggested that another way to improve IoT devices would be to use non-volatile memory (NVM). NVM is fast enough to allow executing the code directly, and there is no need to copy the code to the RAM, which would reduce the boot-up time as well.
However, there are still many improvements to be made, such as those in size and cost [21]. Because of this, manufacturers still use traditional memory, namely SRAM, to store data in IoT devices. IoT memory has a low capacity yet is used to cache enormous amounts of network data, and this insufficient memory space is a crucial problem for smart city projects that rely on IoT networks as infrastructure. Manufacturers of IoT devices have focused on increasing the speed of accessing data by proposing SPMs and NVM, as illustrated in the IoT memory section. Furthermore, they have focused on increasing the range of connections with low power consumption. Unfortunately, only a handful of researchers were interested in increasing the memory size, both because the process was expensive and because this was not a critical issue since data was not large in the past.

For more clarification, Figure 3 shows that IoT memories are of three types: non-volatile flash memory, which is used for programs and is also known as program memory, while the other two types are used for data and are known as data memory. A non-volatile EEPROM and a volatile SRAM are used to temporarily store data. Memory sizes differ by controller type and version; the data that is received and transmitted through the network is stored in the SRAM, while data for Wi-Fi credentials, such as usernames and passwords, is stored in the EEPROM.

One of the challenges faced here is the insufficient memory size that causes buffer overflow, which happens when software writes data to a buffer and exceeds the buffer's capacity, overwriting adjacent memory positions: information transmitted into a container with insufficient space replaces the data in neighboring locations. In the IoT, the SRAM works as a buffer when the node receives and transmits data. Most controllers have a small SRAM, for example the Arduino controllers, compared with many other boards (shown in Table 1) [42].

To clarify the problem, Figure 4 illustrates how many sensors (from Sensor 1 to Sensor n, where n is an undetermined number) try to send their data to the SRAM memory of a connected IoT node; sometimes the sensors send the data simultaneously and overflow the IoT SRAM.
Hence, the potential problems here are memory overflow and the possible loss of data due to buffer overflow. The probability of these problems increases, especially when more sensors are connected to the IoT node. The total number of messages arriving at a node in one millisecond can be expressed as

total messages in one millisecond = \sum_{i=1}^{n} S_i \cdot DF    (1)

where S_i denotes sensor i, n is the maximum number of sensors connected to an IoT node, and DF is the data flow from the sensor to the IoT node. If each sensor sends at least 2 bytes every millisecond, we can calculate the data flow size for one second from the following example: the total data sent in one second from sensor 1 alone is 2 bytes × 1000 = 2000 bytes ≈ 2 KB, and 2 KB is already the maximum capacity of the IoT memory (SRAM). It follows that the size of the transmitted data from all sensors can collapse the IoT node memory. To solve this problem, many solutions were proposed, such as limiting the number of sensors connected to the IoT node; adjusting the time interval that controls when the sensor sends the data, i.e., when the controller reads sensor data, although fewer reads mean less accuracy; or reducing the packet size sent from the sensor to the IoT node, which is unreliable because it forces sending fewer digits, for example, instead of sending the integer 25, sending 2 then 5, or just 2. Therefore, the best solution is to compress the data immediately when received, using a compression algorithm that works within the limits of the IoT memory and processor power. In the next section, we will investigate data compression algorithms.

After collecting the data from sensors inside IoT memories, every node sends its data packets to the servers through IoT gateways, as illustrated in the IoT architecture in Figure 2. Thus, the number of sensors and IoT nodes directly affects the size of the data transmitted to the server. However, there are limitations in any network system, such as connection bandwidth, which could overflow when trying to send massive data in a period that the bandwidth of the network cannot handle. Furthermore, connection overflow could occur when an abundance of connection requests is sent from clients to the server in a period that the server cannot handle, pushing the server to drop many of these connections. As a solution to these problems, compressing the data in the first stages, before sending it to the servers, will minimize connection sessions and reduce data traffic. Compression means that instead of sending the original data, we can send data of a smaller size, which consumes less battery and needs fewer connection sessions and less time. For example, if the original data is 100 MB and the network bandwidth is 10 MB/s, it would take 10 turns to send this data, where every turn takes a second, so sending the entire data needs 10 s. However, if this data is compressed to 10 MB, the time needed is reduced to one second, which reduces the network use by about 90%.
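As a quick check of the arithmetic in Equation (1) and the bandwidth example above, the following minimal Python sketch computes the aggregate sensor data rate against an assumed 2 KB SRAM buffer and the transmission time with and without compression. The sensor count, payload size, and compression figures are illustrative assumptions, not measurements from this paper.

```python
# Minimal sketch of Equation (1) and the bandwidth example above.
# All parameter values are illustrative assumptions, not measured results.

SRAM_CAPACITY_BYTES = 2 * 1024      # assumed 2 KB SRAM buffer on the IoT node
BYTES_PER_MS_PER_SENSOR = 2         # DF: each sensor sends ~2 bytes every millisecond

def bytes_per_second(num_sensors: int,
                     bytes_per_ms: int = BYTES_PER_MS_PER_SENSOR) -> int:
    """Equation (1) accumulated over one second: sum the data flow of every sensor."""
    return sum(bytes_per_ms for _ in range(num_sensors)) * 1000

def transmission_time_s(data_mb: float, bandwidth_mb_s: float) -> float:
    """Time to push `data_mb` megabytes through a link of `bandwidth_mb_s` MB/s."""
    return data_mb / bandwidth_mb_s

if __name__ == "__main__":
    for n in (1, 4, 8):
        rate = bytes_per_second(n)
        status = "overflows" if rate > SRAM_CAPACITY_BYTES else "fits in"
        print(f"{n} sensor(s): {rate} B/s ({status} a 2 KB SRAM buffer)")

    # Bandwidth example from the text: 100 MB at 10 MB/s, then compressed to 10 MB.
    original, compressed, bandwidth = 100.0, 10.0, 10.0
    t_orig = transmission_time_s(original, bandwidth)       # 10 s
    t_comp = transmission_time_s(compressed, bandwidth)     # 1 s
    print(f"uncompressed: {t_orig:.0f} s, compressed: {t_comp:.0f} s, "
          f"saving about {100 * (1 - t_comp / t_orig):.0f}% of network use")
```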
Accordingly, this reduces data traffic and makes bandwidth available for service and for transmitting other data. Many studies on aggregation and compression have been conducted in WSNs as the backbone of IoT networks [43,44]; however, they mostly applied compression at the servers because these nodes have more processing power than the sense/edge nodes and do not have power-consumption problems, and this did not reduce the traffic much [45]. On the other hand, an IoT network differs from a WSN in terms of connectivity between nodes, whereby an IoT node can be connected directly to the internet and has the ability to make decisions [46,47]. Therefore, a new way of performing aggregation and compression in IoT edge and sense nodes came into demand as the number of connected IoT devices and the volume of data increased exponentially in recent years [48,49]. To deal with such large IoT data, an update- and query-efficient index system was proposed in [4,50], with several criteria such as regular and necessary multidimensional updating of data. Some researchers stated that traditional database systems are not capable of handling large volumes of data and cannot support millions of data inputs per minute [51]. Other researchers in [52] stated that it could be nearly impossible to move enormous data from IoT peripheral nodes to the server in a timely fashion, and that IoT devices should therefore be able to store, process, and analyze data, and sometimes make decisions, in real time. Despite the IoT's memory limitations, many machine intelligence algorithms have been proposed in [53] (ASIC-based acceleration [54], FPGA-based acceleration [55], mobile SoC-based acceleration [53]) in order to accelerate convolutional neural networks (CNNs) on embedded platforms. They focused on accelerating processing [56] and decreasing its energy consumption [57,58]. Few researchers have focused on data compression, which minimizes data size while retaining identical information content [36]. Although different compression algorithms have been proposed, their performance differs because of various factors, such as power consumption [13], speed of data transmission [59], bandwidth [60], size of transmitted data [61], and processor power [62]. All these factors directly affect the IoT network's performance. The motivation to use compression algorithms comes from the small memory capacity of IoT devices, which works either as a buffer or as cache memory in IoT networks, as researchers have stated in [63]. Some researchers in [31] have suggested data compression as a technique to reduce data traffic in the network and to empower IoT capability, while others focused on power consumption; for example, Kimura and Latifi in [64] stated that the energy consumed to transmit one bit via radio is 480 times that of conducting one addition operation. Some researchers have classified compression algorithms depending on the type of data, for example, algorithms that rely on the temporal correlation of sequenced residue data, as shown in [44,65], where information is used for compression as in [66]; they proposed the S-LZW, SHuffman, and ND-encoding algorithms as examples.
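To illustrate how temporal correlation can be exploited in sensor streams, the following minimal sketch shows plain delta encoding: successive readings that change slowly are stored as small differences rather than full values. This is a generic illustration, not the S-LZW, SHuffman, or ND-encoding schemes cited above, and the sample readings are invented.

```python
# Minimal delta-encoding sketch for a slowly changing sensor stream.
# Illustrative only; not the S-LZW / SHuffman / ND-encoding schemes cited above.

from typing import List

def delta_encode(readings: List[int]) -> List[int]:
    """Store the first reading, then only the difference to the previous one."""
    if not readings:
        return []
    deltas = [readings[0]]
    for prev, curr in zip(readings, readings[1:]):
        deltas.append(curr - prev)
    return deltas

def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original readings by accumulating the differences."""
    readings, total = [], 0
    for d in deltas:
        total += d
        readings.append(total)
    return readings

if __name__ == "__main__":
    temperatures = [25, 25, 26, 26, 27, 27, 27, 26]    # invented sample readings
    encoded = delta_encode(temperatures)
    assert delta_decode(encoded) == temperatures        # lossless round trip
    print(encoded)   # [25, 0, 1, 0, 1, 0, 0, -1] -- mostly small values, cheap to encode further
```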
Another type of algorithm depends on data prediction [67], which is considered more complicated and has several drawbacks, such as high power consumption and large memory requirements that are not available in most IoT devices; the MinDiff algorithm in [66] is an example. Many data compression algorithms have been proposed, such as coding by ordering, which drops some sensor nodes and their data at an aggregator node [68]. Another method is pipelined in-network compression, which has been proposed for audio and video sensors and depends on the common similarity of bit values across data packets in order to delete the redundancies in the data packets. Yet another method was proposed as a low-complexity video compression algorithm in [55] for video surveillance sequences collected by a wireless sensor network, where researchers introduced a framework based on change detection and JPEG compression of the region of interest (ROI); they stated that the proposed compression algorithm is similar to MPEG-2 but available at a much lower computational cost. Another algorithm is distributed compression, which is used to obtain data from many spatial sources. The central node compares every sensor's partial data with the data from the reference node in order to determine whether there are any changes or errors, then decides what to send over the network and how to remove spatial redundancy [69,70]. Although some of these algorithms have been applied to WSNs, none of them have been applied to the IoT. The next section investigates compression techniques in order to determine which could fit better in IoT networks.

Compression is a way to represent massive data, which could be numeric, text, pictures, video, audio, or any other type, using a smaller data size. Compression is categorized into lossy and lossless. Lossy means the decompressed data differs from the original, while lossless means the decompressed data is identical to the original. The selection between the two types of compression techniques or algorithms depends on the type of data to be compressed. For example, to compress a picture using lossy compression, one only needs to keep enough information to know what is inside the picture, such as a car or a person. In contrast, lossy compression is not suitable for sensitive data such as financial or election data, where lossless compression is used instead, e.g., to alleviate transmission over the internet or when storing data on USB drives. Therefore, when every single bit of data is critical, lossless data compression is used; otherwise, lossy compression is used. For video, audio, and picture data, it is better to use lossy compression because the accuracy and the compression ratio are high; otherwise, the original files are too large to be transmitted. For text and numerals or symbols, it is better to use lossless compression because identical data is required when decompressing. For example, we cannot rely on two words to replace ten words when representing the names of students, nor can we rely on two numbers to represent ten numbers, because we would lose accuracy and sometimes transmit wrong data, which would lead to destructive results. However, IoT data only comes in numeric and text formats; therefore, using lossless data compression is the best solution. A high compression ratio for an algorithm does not imply it is the best algorithm for all data types. Every data type has many suitable compression methods and algorithms. Many factors affect choosing the best compression method for every data type.
However, it is known that the most influential compression factors are the speed of compression/decompression and the compression ratio. Real-time versus offline data also influences the selection of the compression algorithm. This paper focuses on lossless algorithms that have been proposed to compress numeric and time-series data, because the purpose of this paper is to investigate compression algorithms for IoT data. Therefore, three types of lossless compression algorithms were reviewed, categorized as entropy-, dictionary-, and general-based algorithms.

Entropy encoding is a lossless data compression scheme in information theory, regardless of the medium's specific characteristics. One of the main entropy coding types creates and assigns to every single symbol of the input a unique prefix-free code. More than 16 algorithms support entropy coding, such as Arithmetic Coding [71][72][73][74], Asymmetric Numeral Systems (ANS) [75][76][77], Golomb Coding [78,79], Adaptive Huffman [80][81][82], Canonical Huffman [83], Modified Huffman [84], Range encoding [85,86], Shannon [87], Shannon-Fano [88][89][90], Shannon-Fano-Elias [91], Tunstall coding [92,93], Unary coding [94][95][96], Universal Exp-Golomb [97,98], Universal Fibonacci Coding [99][100][101], Universal Gamma Coding [102,103], and Universal Levenshtein Coding [104]. The main concept of entropy coding is to replace each symbol with a prefix code that occupies less space in memory. Most of these algorithms need to store the symbols with their frequencies, which are then used to determine the replacement codes for the symbols, and this needs an abundance of memory. Furthermore, due to the complexity of searching and counting matched symbols and of the encoding process itself, these algorithms use more memory and need more processing power than is available in IoT devices; therefore, without modification, none of them would be suitable or applicable for IoT systems, and they cannot be implemented on IoT nodes. The most promising candidate algorithm to be used after modification is the adaptive Huffman, because it can process real-time inputs, which matches the nature of IoT inputs.

A dictionary-based algorithm is a scheme that creates a dictionary containing the symbols and the codewords assigned to them. The symbols are collected from the input data without redundancy and represent all the input data, and the codeword assigned to every symbol should be smaller than the symbol itself; otherwise, inflation could happen. Many applications and algorithms create the dictionary dynamically; hence, when there is a new input, the dictionary can be updated as needed. More than 19 algorithms support dictionary-based compression, such as Byte pair encoding [105], LZ77 [87,106,107], LZ78 [74], Lempel-Ziv-Welch (LZW) [108], Lempel-Ziv-Storer-Szymanski (LZSS) [103,109][110][111], Lempel-Ziv-Stac (LZS) [112], Lempel-Ziv-Oberhumer (LZO) [113,114], Snappy [115,116], Brotli [117,118], Deflate [119], Deflate64 [120], LZ4 [121][122][123], Lempel-Ziv Finite State Entropy (LZFSE) [124,125], Lempel-Ziv Jeff Bonwick (LZJB) [108], Lempel-Ziv-Markov chain Algorithm (LZMA) [108], Lempel-Ziv Ross Williams (LZRW) [108,121,126], LZWL [127,128], and LZX [129].
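Before turning to the drawbacks of these schemes, a small illustration of the entropy idea described above may help. The sketch below builds a static Huffman code for a tiny symbol stream using Python's heapq: frequent symbols receive short prefix-free codes, and the symbol/frequency table it keeps in memory is exactly the overhead discussed above. The sample string is invented; the adaptive Huffman variant favoured later in this paper updates the code on the fly instead of requiring a first pass over the data, and the dictionary-based scheme just listed instead replaces repeated substrings with indexes into a growing table, whose cost is discussed next.

```python
# Minimal static Huffman coding sketch (illustrative; sample data is invented).
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a prefix-free code: frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if len(freq) == 1:                      # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, tie-breaker, {symbol: partial code})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

if __name__ == "__main__":
    sample = "25 25 26 26 26 27"            # invented sensor readings sent as text
    codes = huffman_codes(sample)
    encoded = "".join(codes[ch] for ch in sample)
    original_bits = 8 * len(sample)
    print(codes)
    print(f"{original_bits} bits -> {len(encoded)} bits "
          f"(ratio {len(encoded) / original_bits:.2f})")
```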
Dictionary-based and sliding-window algorithms rely on assigning an index value to each symbol, with the rule that each entry in the dictionary is not repeated and has a unique index value. The dictionary size increases every time there is a new entry, which becomes a critical issue because the maximum size of the dictionary is limited by the size of the memory. The sliding window comes as a solution, limiting the entries for every interval. Every value in the sliding window is compared with previously indexed values in the dictionary. Hence, if the size of the dictionary increases, the search for matching symbols can take a long time, and this can make the encoding process even slower. All of these are considered obstacles to running any of these algorithms on an IoT node because of its low processing power and small memory. Many modifications are needed, such as reducing the sliding window size and limiting the dictionary size, to fit the IoT node specifications.

Lossless general compression algorithms are implemented by replacing symbols in the context with codes or numbers that refer to their counts or predictions in the data, or by the differences between values if the input data consists of integers. The methods of these algorithms come in many shapes and steps, such as prediction first, followed by arithmetic coding to encode the data. Hence, in this scheme, no dictionary or sliding window is used. More than 8 algorithms support lossless general compression, such as the Burrows-Wheeler transform (BWT) [130], Context tree weighting (CTW) [131], Delta [132,133], Prediction by partial matching (PPM) [134,135], Dynamic Markov compression (DMC) [136,137], Move-to-front transform (MTF) [138], PAQ [139], and RLE [140,141]. Lossless general compression algorithms differ from entropy- and dictionary-based algorithms in that they do not use a sliding window or create a dictionary. This is especially clear in the BWT, Delta, and RLE algorithms. The results of these algorithms depend on the sequence of the input data, which is not guaranteed when dealing with IoT data. Most of the others need a large memory that exceeds the limits of IoT nodes. Furthermore, there is the complexity of the encoding processes, such as the PPM and DMC algorithms that use arithmetic coding as a step, or PPM and PAQ that use context mixing in order to increase prediction precision. In MTF, many symbols move to the head of the stack, which, as with all the algorithms mentioned, exceeds the limits of IoT nodes.

Deep learning is an evolution of machine learning, mainly consisting of neural networks, that aims to automate systems for many applications. It consists of neurons arranged in layers. Deep learning has become popular recently due to its ability to provide accurate solutions to problems in many domains. It has neurons, weights, biases, and activation functions, which need to be adjusted to obtain the best solution. There are various variants of deep learning in neural network architectures that involve a wide variety of neural network training strategies [142,143]. Deep learning methods are divided according to whether they process labeled or unlabeled data. The autoencoder (AE) architecture [144,145] and the restricted Boltzmann machine (RBM) architecture [146], which were proposed by the so-called "Father of Deep Learning", Geoff Hinton, are considered the best for unsupervised learning and unlabeled data [147].
Both architectures are considered to belong to the feature-extractor family and are suited to pattern recognition. For work that involves processing time-series data, it is better to use a recurrent neural network (RNN) [148]. Supervised learning architectures are used for labeled data, for example recursive neural tensor networks (RNTN) and RNNs for sentiment analysis [149], parsing [150], and entity/object recognition [151]. Deep belief networks (DBN) [152,153] and CNNs [154,155] are used for images, objects [156], and speech recognition. RNNs are used for speech recognition [157,158], entity recognition [159], and time-series analysis [160]. Many of the current deep learning architectures use one or a combination of the previous solutions, depending on the data type they analyze. Researchers in [161] stated that some functions have a complexity that cannot be handled in IoT devices without machine learning or deep learning. Other researchers in [162] explained that the obstacles of low memory and low processing power were the reason behind this. Despite this, IoT and sensor data are the most common potential uses for brontobyte-level storage, which is equal to 10 to the 27th power of bytes, as stated in [163]. Therefore, many scientists have studied how to reduce data traffic in order to alleviate the load on memory, as stated in [164,165]. The next paragraphs illustrate the techniques used in deep learning to reduce the weights and the number of parameters. These techniques fall under dimensionality reduction, which represents big data using small, meaningful data by reducing its space [166]. Pruning and pooling are illustrated in more detail to see whether they can be used to reduce the data traffic.

Pruning is a method used for various applications and areas. It is very commonly used in different ways to minimize complexity [146]. For example, it is used for mining spatial high-utility co-location patterns based on actually shared weights and features [167]. In neural networks, however, pruning aims to make the network faster and smaller by reducing the number of learned weights [168]. After training the network for the first time, all connections with weights below a threshold are deleted from the network. This process is repeated whenever the network is retrained. The training results can minimize the network size by keeping sparse connections and neurons [169]. In [60], researchers used pruning and other techniques to compress neural networks. On the ImageNet ILSVRC-2012 dataset, they experimented on AlexNet (Caffe), pruning 89% of the weights for a 9× compression ratio, and on VGGNet-16, pruning 92.5% of the weights for a 13× compression ratio. They also experimented on the MNIST dataset with two architectures: the first, LeNet-300-100, is a fully connected network with two hidden layers of 300 and 100 neurons; the second, LeNet-5, has two convolutional layers and two fully connected layers. They achieved 92% of weights pruned with a 12× compression ratio for both architectures. The ImageNet results are reported per convolutional (Conv) and fully connected (FC) layer, while the MNIST results are reported per Conv layer and learnable parameters (lp); each entry gives the number of weights and the percentage of weights pruned. The effectiveness of the pruning process was assessed in terms of reducing the number of parameters and connections. Pruning removes the low-value weights and keeps only the high-value ones.
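A minimal sketch of the magnitude-based pruning step described above, using NumPy: weights whose absolute value falls below a threshold are zeroed out, and the surviving fraction indicates how much the layer could shrink if stored sparsely. The weight matrix and threshold are invented for illustration; they are not taken from the cited experiments.

```python
# Magnitude-based weight pruning sketch (illustrative values, not the cited experiments).
import numpy as np

def prune_by_magnitude(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Zero out every connection whose |weight| is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.normal(0.0, 1.0, size=(300, 100))   # e.g. one fully connected layer
    pruned = prune_by_magnitude(layer, threshold=1.0)

    kept = np.count_nonzero(pruned)
    total = layer.size
    print(f"kept {kept}/{total} weights "
          f"({100 * kept / total:.1f}%), pruned {100 * (1 - kept / total):.1f}%")
```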
The pooling layer is used to reduce the features or the spatial volume of the inputs. Pooling is usually used after a convolution layer or between two convolution layers [170]. The size of the dimensions after pooling is reduced [155]. There are three types of pooling: minimum, average, and maximum pooling. CNNs use pooling after convolution and before the classifier to reduce complexity and avoid overfitting. This depends on dividing the convolved layer into disjoint regions and then taking the maximum, minimum, or average value of every region's features [171,172]. Han in [60] proposed a deep learning algorithm to reduce the storage and energy required to run inference on large networks and deploy them on mobile devices in three phases. He used pruning to reduce redundant connections, then applied quantization to the weights to produce fewer codebooks that need to be stored, because many of the connections share the same weight. After that, Huffman coding was applied to the effective weights. Although the experiment was not applied to IoTs, the results were promising. Researchers in [173] tried to compress neural network structures into smaller matrices by finding the non-redundant elements. Other researchers in [174] proposed SparseSep for deep learning, which sparsifies the fully connected layers and separates the convolutional kernels in order to reduce the resource requirements. The authors in [175] stated that a large group of models could be transferred into one small model after training using distillation, and that this would be much better for deployment. In [176,177], researchers proposed a dynamic network surgery compression algorithm to reduce the complexity of the network using on-the-fly pruning. They limited pruning in order to preserve accuracy; therefore, they used a splicing method to compensate for important connections and weights that were pruned. Researchers in [178] worked on reducing the test time of large convolutional networks aimed at object recognition, starting by compressing each convolution layer and identifying the best low-rank approximation, then adjusting the top layers until the prediction performance was restored. Researchers in [179] investigated techniques for reducing complexity. Others tried to accelerate training by computing convolutions in the Fourier domain while reusing the same transformed feature map many times [180]. It has also been stated that most of the predicted parameter values need not be learned; architectures can be trained by learning a small number of weights and predicting the others [181]. In order to improve model discrimination of local patches within the receptive field, a new network structure called "network in network" was suggested. It is a micro neural network instantiated with a multi-layer perceptron; this micro network is slid over the input in the same manner as a CNN to generate the feature maps, and average pooling is used for classification [182,183]. Other researchers tried using ideas from information theory to determine the optimal neural network size by trading off complexity against training error using second-derivative information, which includes removing unimportant weights [184]. Researchers in [185] proposed a new method to train binarized neural networks at run-time; during forward propagation, this method greatly reduces the required memory size and replaces most operations with bit-wise operations [186].
However, binary weights were also proposed in [187], where researchers tried to replace multiply-accumulate operations with simple accumulations, because multipliers take up most of the space and are considered power-hungry components when a digital neural network is implemented. Another way to compress neural networks, using a hashing trick, was proposed in [188]: every group of weights falling in the same hash bucket is linked to a single parameter through a hash function. The proposed method managed to minimize the model sizes significantly by exploiting redundancy in neural networks. Other researchers in [189] found that using k-means to cluster weights can lead to a very good balance between the size of the model and the accuracy of the recognition.

According to the specifications of IoT data, this paper experiments on selected algorithms that need minimum memory, consume the least power, and have the potential to be modified and implemented in IoT nodes. The three algorithms selected are LZ77 from the sliding-window algorithms and LZ78 from the dictionary-based algorithms, because these are considered to have the lowest complexity, and the Huffman code from the entropy algorithms, which has been used in many compression applications and is very good for text compression with minimum complexity. Because IoT data can be heterogeneous, since it comes from many different sensors, it is better to deal with this data as text instead of numbers; otherwise, the data would have to be classified according to its sources, which would be more complex for the IoT device. The datasets used in the experiment are categorized into three types: (1) the first type is time-series data collected from sensors connected to IoT devices, (2) the second type is time-series data not collected by sensors or IoT devices, and (3) the third type is a collection of varied files, not time series and not collected by sensors or IoT devices. All three types of datasets were used in order to evaluate the performance of the algorithms. All the experiments used at least 17 threads on a Dell server with a 2.4 GHz, 8-core, 64-bit Intel Xeon E5620 processor and 100 GB of RAM. A Windows 10 Pro virtual machine was hosted on CentOS 6, the operating system of the server. The five dataset collections with various files used for the compression algorithm evaluation are: four datasets from Kaggle [190], five from the UCI database [191], six from AMPDs [192,193], and ten from the Calgary Corpus [194].

The compression algorithms were applied to these datasets in order to evaluate them by the compression ratio, obtained by dividing the size of the compressed file by the size of the uncompressed file. Before calculating the compression ratio, the compressed size of each file in the datasets is calculated for every compression algorithm used. Table 2 shows the results of the dataset compression. Figure 5 shows the results and ratios of the compression algorithms, categorized by the source of the datasets; panels a, c, e, g, and i show the compression results, whereas panels b, d, f, h, and j show the compression ratios. It is clear from the compression results that the adaptive Huffman algorithm had the best values on all the datasets, although it equaled the canonical Huffman in some results, such as ozone level detection for eight hours in panel c and Book1 in panel g.
In contrast, LZ77 gave the worst results; in many cases, the compressed files were even bigger than the original ones because of an inflation problem. However, there were cases where LZ78 obtained the worst results, especially for electricity monthly, electricity billing, and climate historical normals in panel e, which shows that compression results depend on the distribution and repetition of values in the data. The compression results in panels a, c, e, g, and i compare the compression algorithms when applied to the same files in the datasets, whereas panels b, d, f, h, and j show the differences between compression ratios, where a lower compression ratio means a better compression result. The adaptive Huffman also had the lowest compression ratio, with one exception in panel h, where LZ78 obtained the lowest value for Book1 in the Calgary Corpus dataset. Table 2 also shows the results categorized by data type; the minimum compression ratio is 32%, obtained using LZ78 on Book1 from the Calgary Corpus dataset, whereas the maximum compression ratio is 263%, obtained using LZ77 on the water billing data from the AMPDs dataset. For data type 1, the minimum compression ratio is 38%, obtained using adaptive Huffman, and for data type 2, the minimum ratio is 43%, also obtained using adaptive Huffman. For data type 3, LZ78 gives the lowest compression ratio, when applied to Book1; however, if we exclude Book1 from the dataset, adaptive Huffman would again have the lowest ratio, namely 58% on paper2 from the Calgary Corpus. This means adaptive Huffman is the best when compressing time-series and numeric data such as data types 1 and 2; however, it is not necessarily the best for data type 3.
The results clearly show that adaptive Huffman has a better compression ratio and is more significant than canonical Huffman. This means compressing real-time data is better than compressing offline data. On the other hand, LZ78, which is a dictionary-based algorithm, has more significant results than LZ77, which is a sliding-window-based algorithm. However, some anomalies can happen, such as the three results in the AMPDs dataset where LZ77 has better compression ratios; the reason for this was the data sequence and redundancy as well as the file sizes, and the inflation problem can be noticed in the LZ77 sliding window across all the datasets.

In the Compression section, it was found that not all the mentioned algorithms are suitable to be implemented in IoT nodes without being modified, because they require more memory and greater processing power than an IoT node can provide. However, compression algorithms can be implemented in cloud servers or in some aggregation nodes. These algorithms need a considerable amount of stack and heap space that has to be reserved according to each algorithm's code (arrays and pointers). Because of the differences between these codes, the size of the allocated memory cannot be known before implementation. Furthermore, the size of the data itself could, in some cases, require hours to be compressed.

The Deep Learning section explains that it is rather difficult to determine how many features are required to recognize an object, classify an image, or carry out other deep learning functions. These processes involve different tasks according to the architecture used, and they also depend on the data type under processing. Therefore, every deep learning architecture has a different scenario. All architectures aim to find the minimum number of features that is good enough to produce satisfactory outputs with minimal errors. They transform the high-dimensional data space into a low-dimensional data space that preserves the original data properties. High-dimensional data has many problems: it requires more time and space complexity and can also lead to overfitting. Furthermore, not all the features in high-dimensional data are involved in or related to the problem we are solving. Reducing the dimension of the data space reduces the noise and unnecessary parts of the data and helps to determine the features most related to the problem. Two approaches to dimensionality reduction have been proposed. The first is feature selection, where the features most related to the problem are selected. The second is feature extraction, where new features are derived from the high-dimensional data space to create the low-dimensional data space.
Many deep learning techniques could be used for this, such as principal component analysis (PCA), non-negative matrix factorization (NMF), kernel PCA, graph-based kernel PCA, linear discriminant analysis (LDA), generalized discriminant analysis (GDA), autoencoders, t-SNE, and UMAP. However, in order to avoid the problems or curses of dimensionality, the K-nearest neighbor algorithm (k-NN) is most commonly applied. Traditional compression algorithms, as illustrated earlier in the Compression section, carry a different meaning: in deep learning, compression in many architectures means minimizing the number of neurons or weights by removing them from layers, and this process is achieved using dimensionality reduction techniques. It is categorized as lossy compression, and losing information after compression does not fit the aim of IoT data compression.
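To illustrate why feature extraction of the kind listed above (PCA and its relatives) counts as lossy compression, the sketch below projects invented sensor readings onto one principal component with NumPy and reconstructs them; the reconstruction is close to, but not identical to, the input, which is exactly why such techniques do not satisfy the lossless requirement of IoT data.

```python
# PCA as lossy "compression" sketch: project to 1 component and reconstruct (toy data).
import numpy as np

rng = np.random.default_rng(1)
# Invented 2-feature readings that are strongly correlated (e.g. two temperature probes).
x = rng.normal(25.0, 2.0, size=(100, 1))
data = np.hstack([x, x * 0.9 + rng.normal(0.0, 0.2, size=(100, 1))])

mean = data.mean(axis=0)
centered = data - mean
# Principal directions from the SVD of the centered data.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
component = vt[:1]                       # keep only the first principal component

compressed = centered @ component.T      # 100 x 1 instead of 100 x 2
reconstructed = compressed @ component + mean

err = np.abs(reconstructed - data).mean()
print(f"stored values per sample: 1 instead of 2, mean reconstruction error ~ {err:.3f}")
```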
One of the first steps in deep learning architectures is initializing the values of the weights, which is done randomly, as illustrated in Figure 5. This step alone makes the output values of the first layer unequal to the input data, even though these output values can be of very high accuracy. Furthermore, the process of deep learning is carried out in one direction, from the input layer to the output layers. Activation functions are used throughout this process in order to determine which neuron values to rely on, i.e., whether to drop or keep these neurons and their connected weights. Hence, using activation functions breaks the linearity by randomly retaining sparse values and then training the model. When implementing the activation functions, the model starts from scratch with different weight values and leads to different results and outputs. However, previous results show that some cases have a very close similarity to the original inputs with smaller sizes and dimensions, as in lossy compression algorithms, which is acceptable in some cases and applications.

This paper reviewed smart cities' issues and the importance of the IoT in reducing data traffic, especially between sensors and IoT nodes. The current compression algorithms have limitations when one tries to implement them within the IoT's small memory. Lossy compression algorithms are not suitable due to the loss of information after transmission. In contrast, applying lossless compression algorithms is complex for IoT devices. Deep learning using pruning and pooling methods has been applied in order to reduce data; however, it uses a lossy approach and is not aimed at the connections between sensors and IoT devices. In the future, a new algorithm is needed that combines deep learning techniques with the least complex lossless compression algorithm that has the best compression ratio. The suggested algorithm should fit the sensor and IoT data types and aim to produce a good compression ratio on every IoT node, reducing the network data traffic, transmitting data faster, and achieving higher utilization and better throughput.

Author Contributions: This research has been carried out through a concerted effort by three authors. Hence, every author has participated in conducting every single part of the paper. Each author's basic role is summarized as follows: A.N. is the first author and is responsible for writing the paper, implementing the compression algorithms on the datasets, and conducting reviews of related, previous, and current works. [190][191][192][193][194]. The authors declare no conflict of interest.

World Urbanization Prospects The Smart City Concept in the 21st Century Socioeconomic Pathways and Regional Distribution of the World's 101 Largest Cities; Global Cities Institute Building an IoT data hub with elasticsearch, Logstash and Kibana Digital Cities and Digital Citizens Cybercity: Conception, technical supports and typical applications. Geo-Spat Digital Cities: Technologies, Experiences, and Future Perspectives-Google Books Intelligent Cities and Globalisation of Innovation Networks Architectural League of New York la psicología de la salud en el nuevo currículo de la diplomatura en enfermería The Position of Green Logistics in Sustainable Development of a Smart Green City Petri nets for systems and synthetic biology Offloading and transmission strategies for IoT edge devices and networks The Smart City as Disciplinary Strategy.
Urban Stud Cities: An IoT-centric Approach Reviewed paper Mapping Conflicts in the Development of Smart Cities: The Experience of Using Q Methodology for Smart Gusu Project Jib Kwon, S. The role of the Internet of Things in developing smart cities Open Smart Cities in Canada: Environmental Scan and Case Studies Competitiveness, distinctiveness and singularity in urban design: A systematic review and framework for smart cities Determining factors in becoming a sustainable smart city: An empirical study in Europe A Critical Design Consideration for IoT Applications Understanding smart cities: An integrative framework Tokyo Smart City Development in Perspective of 2020 Olympics Opportunities for EU-Japan Cooperation and Business Development Teena Maddox|US|Meet the Team-TechRepublic Challenges of IoT Based Smart City Development in Kuwait Deployment of an open sensorized platform in a smart city context Smart cities: Moving beyond urban cybernetics to tackle wicked problems Smart Cities at Risk! Privacy and Security Borderlines from Social Networking in Cities Barriers to the Development of Smart Cities in Indian Context The Internet of Things-A problem statement Online clustering of evolving data streams using a density grid-based method Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices. IEEE Solid-State Circuits Mag Learning IoT in Edge: Deep Learning for the Internet of Things with Edge Computing Data compression for energy efficient IoT solutions Smart Cities Face Challenges and Opportunities Data compression based on stacked RBM-AE model for wireless sensor networks An introduction to the research on Scratchpad memory with focus on performance improvement-Instruction SPM Low-Cost Memory Fault Tolerance for IoT Devices Scratchpad-Memory Management for Multi-Threaded Applications on Many-Core Architectures A Survey on Network Methodologies for Real-Time Analytics of Massive IoT Data and Open Research Issues Practical data compression in wireless sensor networks: A survey An energy efficient IoT data compression approach for edge machine learning Integration of a Wireless Sensor Network and IoT in the HiG University OECD The Internet of Things-Seizing the Benefits and Addressing the Challenges. OECD Digit. Econ. Pap. 2016 Robust IoT time series classification with data compression and deep learning An Efficient Data Collection Algorithms for IoT Sensor Board Data management for the Internet of Things: Design primitives and solution An efficient index for massive IOT data in cloud environment V A review of Internet of Things for smart home: Challenges and solutions Machine Intelligence on Resource-Constrained IoT Devices: The Case of Thread Granularity Optimization for CNN Inference Canestrini's Models of Leonardo da Vinci's friction Experiments, Figure 1a LOW-COMPLEXITY VIDEO COMPRESSION FOR WIRELESS SENSOR NETWORKS CERCOM-Center for Multimedia Radio Communications Internet of things in industries: A survey Business model analysis of public services operating in the smart city ecosystem: The case of SmartSantander Network optimizations in the Internet of Things: A review An Adaptive Edge Router Enabling Internet of Things Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. 
arXiv 2016 A survey on LPWA technology: LoRa and NB-IoT Survey of Real-time Processing Technologies of IoT Data Streams A study of distributed compressive sensing for the Internet of Things (IoT) A survey on data compression in wireless sensor networks RAKE: A simple and efficient lossless compression algorithm for the internet of things Comparison of local lossless compression algorithms for Wireless Sensor Networks Compressed sensing based traffic prediction for 5G HetNet IoT Video streaming Data funneling: Routing with aggregation and compression for wireless sensor networks Distributed compression for sensor networks An energy-efficient low-memory image compression system for multimedia IoT products Introduction To Arithmetic Coding Arithmetic coding for data compression Performance evaluation of arithmetic coding data compression for internet of things applications A Comparative Study of Image Compression Algorithms Coding with Asymmetric Numeral Systems The Use of Asymmetric Numeral Systems Entropy Encoding in Video Compression A tutorial on the range variant of asymmetric numeral systems Optimal Source Codes for Geometrically Distributed Integer Alphabets Adaptive run-length/golomb-rice encoding of quantized generalized gaussian sources with unknown statistics Weighted Adaptive Huffman Coding Design and Analysis of Dynamic Huffman Coding Algorithm 673: Dynamic Huffman coding A Lossless Compression Algorithm Based on Differential and Canonical Huffman Encoding for Spaceborne Magnetic Data Modified Huffman based compression methodology for Deep Neural Network Implementation on Resource Constrained Mobile Platforms LOP-RE: Range encoding for low power packet classification Range Encoding-Based Network Verification in SDN A Mathematical Theory of Communication The transmission of information to human receivers. Audio-Video Commun. Rev Homogeneous Image Compression Techniques with the Shannon-Fano Algorithm Cryptograph Rsa and Compression Shannon Fano Text File Services at Mobile Devices Image Compression using Shannon-Fano-Elias Coding and Run Length Encoding Variable-to-fixed length codes: A geometrical approach to low-complexity source codes Generalized Tunstall codes for sources with memory Unary Coding Controlled Simultaneous Wireless Information and Power Transfer Performance analysis of the unary coding aided SWIPT in a single-user Z-channel Generalized Unary Coding. Circuits Syst. Signal Process Design of high-resolution quantization scheme with exp-Golomb code applied to compression of special images Multispectral image compression using universal vector quantization A new coding/decoding algorithm using Fibonacci numbers On the Complexity of Fibonacci Coding A New Application to Coding Theory via Fibonacci and Lucas Numbers Universal Codeword Sets and Representations of the Integers LZAC lossless data compression Variable length integer coding revisited. Progr. Syst. Theory Appl. 
системы теoрия и прилoжения 2019 Sparse coding in authorship attribution for Polish tweets Complexity and Similarity for Sequences using LZ77-based conditional information measure A Universal Algorithm for Sequential Data Compression A Brief Study of Data Compression Algorithms Data Compression via Textual Substitution Repair and Restoration of Corrupted LZSS Files The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree Compression speed enhancements to LZO for multi-core systems Adaptive On-the-Fly Compression Improving Hadoop MapReduce performance with data compression: A study using wordcount job Gipfeli-High speed compression algorithm. Data Compress Comparison of brotli, deflate, zopfli, lzma, lzham and bzip2 compression algorithms Brotli: A general-purpose data compressor Hardware implementation of a novel adaptive version of deflate compression algorithm Comparison of open source compression algorithms on VHR remote sensing images for efficient storage hierarchy LZ4 compression algorithm on FPGA. Proc. IEEE Int. Conf. Electron. Circuits Syst Data Compression Device Based on Modified LZ4 Algorithm A two stage data compression and decompression technique for point cloud data Lightweight compression with encryption based on Asymmetric Numeral Systems. arXiv 2016 LZRW1 without hashing. Data Compress. Conf. Proc Compression of small text files using syllables Compression of Semistructured Documents Transform in lossless Data compression Problems on hybrid Computing Systems The Context-Tree Weighting Method: Basic Properties Potential benefits of delta encoding and data compression for HTTP DELTA: Delta encoding for less traffic for apps Romanized Arabic and Berber detection using prediction by partial matching and dictionary methods Experimental results in Prediction by Partial Matching and Star transformation applied in lossless compression of text files Data compression using dynamic markov modelling The structure of DMC Proceedings of the Proceedings DCC'95 Data Compression Conference Chain code lossless compression using move-to-front transform and adaptive run-length encoding. Signal Process A machine learning perspective on predictive coding with PAQ8. Data Compress Image Compression Using Proposed Enhanced Run Length Encoding Algorithm A novel RLE & LZW for bit-stream compression Artificial neural network classification for fatigue feature extraction parameters based on road surface response Multi-view deep clustering based on autoencoder Guided autoencoder for dimensionality reduction of pedestrian features The Self-Organizing Restricted Boltzmann Machine for Deep Representation with the Application on Classification Problems Hands-On Unsupervised Learning Using Python Deep learning methods for forecasting COVID-19 time-Series data: A Comparative study Deep learning for sentiment analysis of movie reviews. 
CS224N Proj Joint RNN-based greedy parsing and word composition Entity relationship extraction optimization based on entity recognition Learning to Diversify Deep Belief Networks for Hyperspectral Image Classification Convolutional deep belief networks on cifar-10 Convolutional Neural Network (CNN) for Image Detection and Recognition An experimental study of vehicle detection on aerial imagery using deep learning-based detection approaches A Machine Learning Approach for Beamforming in Ultra Dense Network Considering Selfish and Altruistic Strategy Performance Evaluation of Deep neural networks Applied to Speech Recognition: Rnn, LSTM and GRU SPEECH APPLICATIONS NTT Communication Science Laboratories Long short-term memory RNN for biomedical named entity recognition Predicting Computer Network Traffic: A Time Series Forecasting Approach Using DWT, ARIMA and RNN Prevent the Transmission of Useless/Repeated Data To the Network in Internet of Things Battery Sensory Data Compression for Ultra Narrow Bandwidth Iot Protocols What is Brontobyte?-Definition from WhatIs Real-time data reduction at the network edge of Internet-of-Things systems Adaptive sensor data compression in iot systems: Sensor data analytics based approach Klang vally rainfall forecasting model using time series data mining technique A Method of Mining Spatial High Utility Co-location Patterns Based on Feature Actual Participation Weight Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning Reducing network intrusion detection association rules using Chi-Squared pruning technique Orientation and Scale Based Weights Initialization Scheme for Deep Convolutional Neural Networks Maxpooling convolutional neural networks for vision-based hand gesture recognition Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework Machine learning in wireless sensor networks: Algorithms, strategies, and applications Distilling the Knowledge in a Neural Network. arXiv 2015 Dynamic network surgery for efficient DNNs Deep Learning for the Internet of Things Exploiting linear structure within convolutional networks for efficient evaluation Improving the speed of neural networks on CPUs Fast training of convolutional networks through FFTS. arXiv Predicting parameters in deep learning. arXiv 2013 Going deeper with convolutions Optimal brain damage Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv 2016 Training Binarized Neural Networks Using MIP and CP. Int. Conf. Princ. Pract. Constraint Program Training deep neural networks with binary weights during propagations. arXiv 2015 Compressing neural networks with the hashing trick Compressing Deep Convolutional Networks using Vector Quantization. arXiv Kaggle Time Series Datasets|Kaggle. Available online UCI 7 Time Series Datasets for Machine Learning. Available online The Almanac of Minutely Power dataset (Version 2)-Harvard Dataverse Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to Corpus The Canterbury Corpus