Dealing with CSI Compression to Reduce Losses and Overhead: An Artificial Intelligence Approach
Muhammad Karam Shehzad, Luca Rose, Mohamad Assaad
April 1, 2021

Abstract—Motivated by the issue of inaccurate channel state information (CSI) at the base station (BS), commonly caused by feedback/processing delays and compression, this paper introduces a scalable idea of artificial intelligence (AI)-aided CSI acquisition. The proposed scheme enhances the CSI compression, which is performed at the mobile terminal (MT), and enables accurate recovery of the estimated CSI at the BS. Simulation-based results corroborate the validity of the proposed scheme. Numerically, nearly 100% recovery of the estimated CSI is observed with lower overhead than the benchmark scheme. The proposed idea can bring potential benefits to wireless communication environments, e.g., ultra-reliable and low-latency communication (URLLC), where imperfect CSI and excessive overhead are intolerable.

I. INTRODUCTION

In wireless communication, particularly in multiple-input multiple-output (MIMO)-based wireless networks, channel state information (CSI) is indispensable for providing high data rates. Accurate CSI at the transmitter allows efficient precoding, optimal transmit power allocation, modulation and coding scheme selection, etc. However, the acquisition of CSI is a challenging task. For example, in the fifth-generation (5G) environment, several communication technologies, e.g., unmanned aerial vehicles (with high mobility) and millimeter wave (with shorter wavelengths), are prone to excessive channel fading [1], [2].
Albeit time-division-duplex (TDD) systems could exploit channel reciprocity to acquire CSI, in most current schemes, in both frequency-division-duplex (FDD) and TDD systems [3], [4], base stations (BSs) transmit reference symbols (RS) to the mobile terminals (MTs) to estimate the channel, and the MTs feed the CSI back to the BS. To reduce over-the-air (OTA) overhead, the CSI feedback is heavily compressed. At present, the Third Generation Partnership Project (3GPP) considers two strategies, i.e., type-I and type-II CSI reporting [5], [6]. Nevertheless, the downside is that both strategies involve strong compression of the estimated CSI, with a consequent deterioration of CSI accuracy. Lastly, it is important to remark that the need for feedback grows in massive MIMO (mMIMO), which makes it challenging to design an mMIMO system with a small number of feedback bits [7].

Interestingly, artificial intelligence (AI) has paved the way in many applications of wireless communication and has emerged as a paradigm shift for future needs [8], [9]. Therefore, owing to the above issues, we utilize AI to overcome the losses that occur due to CSI compression. In addition, we minimize OTA overhead by means of a channel predictor. To this end, we consider twin channel predictors, which are synchronized, at both ends of the communication system, and the feedback is evaluated on the basis of the prediction at the MT. Besides, different from most of the existing work, the training of the channel predictor is based on the quantized version of the estimated CSI rather than the actual CSI.

The remainder of the paper is organized as follows. In Section II, the system model is discussed. The conventional scheme is summarized in Section III. The AI-enabled proposed scheme is explained in Section IV. The results and their analysis are presented in Section V. Finally, conclusions are drawn in Section VI.

Notations: In this paper, $[\cdot]^T$ indicates the matrix transpose.
Additionally, matrices are represented by boldface upper-case letters, vectors by boldface lower-case letters, and scalars by normal lower-case letters. Furthermore, $\hat{H}$, $\bar{H}$, and $H$ represent the estimated, predicted, and actual channel, respectively.

II. SYSTEM MODEL

Consider a single-cell downlink point-to-point MIMO system with $N_t$ transmit and $N_r$ receive antennas. The MT receives dedicated RSs, transmitted by the BS, to estimate the channel. Once the channel is estimated, the CSI is compressed to reduce OTA overhead and then fed back to the BS. Further, let us assume that both network entities are equipped with a channel predictor. Without loss of generality, the MIMO system can be modeled as
$$ y(t) = H(t)\, s(t) + \gamma(t), $$
where $y(t) = [y_1(t), y_2(t), \ldots, y_{N_r}(t)]^T$ represents the received signal at time $t$, which has dimension $N_r \times 1$. In addition, $s(t) = [s_1(t), s_2(t), \ldots, s_{N_t}(t)]^T$ and $\gamma(t)$ denote the vector of transmitted symbols and the additive white Gaussian noise (AWGN), respectively. Moreover, $H(t) = [h_{n_r n_t}(t)]_{N_r \times N_t}$ depicts the channel matrix of dimension $N_r \times N_t$, and $h_{n_r n_t} \in \mathbb{C}^{1 \times 1}$ is the complex-valued flat-fading channel gain between the $n_t$-th transmit and $n_r$-th receive antenna, where $1 \le n_t \le N_t$ and $1 \le n_r \le N_r$.

The time-varying mobile radio channel can be modeled using an auto-regressive (AR) process [10], which generates future channel values by combining past realizations with AR coefficients. For the design of a discrete-time simulation of the model, the auto-correlation function (ACF) is given by
$$ R[n] = J_0\big(2\pi f_m |n|\big), $$
where $J_0(\cdot)$ represents the zeroth-order Bessel function. Additionally, $f_m = f_d \cdot T_s$ depicts the maximum Doppler frequency (in Hertz) normalized by the sampling rate, $1/T_s$. Also, $f_d$ is the maximum Doppler frequency (in Hertz), which can be written as $f_d = \varsigma/\lambda$, where $\varsigma$ is the speed of the mobile device and $\lambda$ is the carrier wavelength.
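To make the ACF concrete, the following Python sketch (our own illustration, not part of the paper's simulator; the function names are hypothetical) evaluates $R[n] = J_0(2\pi f_m |n|)$ from the mobile speed, carrier frequency, and sampling rate, approximating the Bessel function by its integral representation $J_0(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta)\,d\theta$ so that only the standard library is needed:

```python
import math

def bessel_j0(x):
    """Zeroth-order Bessel function via the trapezoid rule on
    J0(x) = (1/pi) * integral_0^pi cos(x * sin(t)) dt."""
    n = 2000
    h = math.pi / n
    total = 0.5 * (math.cos(0.0) + math.cos(x * math.sin(math.pi)))
    for i in range(1, n):
        total += math.cos(x * math.sin(i * h))
    return total * h / math.pi

def jakes_acf(lag, speed_mps, carrier_hz, sample_rate_hz):
    """ACF R[n] = J0(2*pi*f_m*|n|), with f_m = f_d * T_s and f_d = v / lambda."""
    wavelength = 3e8 / carrier_hz        # lambda = c / f_c
    f_d = speed_mps / wavelength         # maximum Doppler frequency (Hz)
    f_m = f_d / sample_rate_hz           # Doppler normalized by the sampling rate
    return bessel_j0(2.0 * math.pi * f_m * abs(lag))
```

For example, at a 2 GHz carrier with a 30 m/s mobile and 1 kHz sampling, $f_m = 0.2$ and the one-lag correlation already drops noticeably below 1, which motivates the AR model of the next section.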
Without loss of generality, a $u$-th order complex AR process, denoted by AR($u$), can be generated as
$$ h[n] = \sum_{i=1}^{u} \phi_i\, h[n-i] + \omega[n], $$
where $\phi_i$ represents a coefficient of the AR model and $\omega[n]$ is zero-mean complex AWGN with variance $\sigma_u^2$. Finally, the AR coefficients, $[\phi_1, \phi_2, \ldots, \phi_u]$, can be obtained by solving the set of $u$ Yule-Walker equations
$$ \mathbf{R}\,\boldsymbol{\phi} = \mathbf{r}, $$
where $\mathbf{R}$ is the $u \times u$ autocorrelation matrix with entries $[\mathbf{R}]_{ij} = R[i-j]$ and $\mathbf{r} = [R[1], R[2], \ldots, R[u]]^T$. Thus, having the AR coefficients, the channel at time $t$ can be obtained as
$$ h(t) = \sum_{i=1}^{u} \phi_i\, h(t-i) + \omega(t). $$
Similarly, for a MIMO channel, the above equation can be extended as
$$ H(t) = \sum_{i=1}^{u} C_i \odot H(t-i) + \Omega(t), $$
where $\odot$ represents the element-wise multiplication, i.e., the Hadamard product, of two matrices. Additionally, $C_i = [\phi_i^{n_r,n_t}]_{N_r \times N_t}$ is the AR coefficient matrix, whose entry $\phi_i^{n_r,n_t}$ represents the $i$-th coefficient of the AR model for the $n_r$-th receive and $n_t$-th transmit antenna.

In the following section, we briefly summarize the conventional channel estimation scheme. Later on, we discuss the proposed scheme along with its potential benefits.

III. CONVENTIONAL SCHEME

In the conventional scheme, the BS transmits a dedicated RS, at time $t$, to the MT to obtain an estimate of the channel. The MT then estimates the channel, denoted by $\hat{H}_{MT}$, using the received RS and transmits the feedback to the BS as
$$ \hat{H}_q(t) = Q_f\big(\hat{H}_{MT}(t)\big), \quad (12) $$
where $Q_f(\cdot)$ denotes the quantization function. Nonetheless, such a quantized channel, $\hat{H}_q$, can degrade the performance obtained with the estimated channel and can also result in higher OTA overhead. To overcome these problems, we introduce the proposed scheme below.

IV. PROPOSED SCHEME

The proposed scheme considers the use of twin channel predictors at both ends of the communication system, i.e., BS and MT. The key idea is to evaluate the feedback based on the prediction at the MT, thereby reducing the number of feedback reports depending on the predicted channel, and hence the overhead necessary to feed the estimated channel back from the MT. Therefore, in the proposed scheme, we first assume that, through the received data, the MT estimates the channel, $\hat{H}_{MT}$, using the conventional scheme.
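Returning to the AR fading model above: for the first-order case (the order used later in Section V), the Yule-Walker system collapses to the single equation $\phi_1 = R[1]/R[0]$, and the innovation variance follows as $\sigma_u^2 = R[0](1-\phi_1^2)$. A minimal stdlib-only Python sketch (our own illustration; function and variable names are hypothetical) generating such an AR(1) fading tap:

```python
import math
import random

def ar1_channel(acf1, length, seed=0):
    """Generate an AR(1) flat-fading tap: h[n] = phi1 * h[n-1] + w[n].

    For AR(1), the single Yule-Walker equation gives phi1 = R[1] / R[0];
    assuming a normalized ACF (R[0] = 1), the innovation variance is
    sigma_u^2 = 1 - phi1**2, which keeps the process at unit power.
    """
    rng = random.Random(seed)
    phi1 = acf1
    sigma_u = math.sqrt(max(1.0 - phi1 ** 2, 0.0))
    # Initial sample: circularly symmetric complex Gaussian, unit power.
    h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    out = [h]
    for _ in range(length - 1):
        w = complex(rng.gauss(0.0, sigma_u * math.sqrt(0.5)),
                    rng.gauss(0.0, sigma_u * math.sqrt(0.5)))
        h = phi1 * h + w
        out.append(h)
    return out
```

For a MIMO channel, one such generator per transmit-receive antenna pair reproduces the Hadamard-product extension above.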
Later on, the estimated channel is quantized and fed back to the BS. Consequently, the quantized estimated channel, denoted by $\hat{H}_q$, is available at both ends of the communication system. Furthermore, the proposed scheme consists of three phases: the prediction phase, the reporting phase, and the recovery phase (at the BS). In the following subsections, we provide the details of each phase.

A. Prediction Phase

During the prediction phase, both network entities adopt AI to predict the next channel realization based on the previous channel realizations. Importantly, both channel predictors use $\hat{H}_q$ for training. In the following, we describe the AI-enabled channel predictor adopted in our work.

Within the domain of AI, the recurrent neural network (RNN) is a type of algorithm with the potential of predicting time-series data [12]. Motivated by this capability, this work adopts an RNN-based channel predictor. The goal of the RNN-based predictor is to obtain a $K$-step-ahead prediction of the channel, denoted by $\bar{H}(t+K)$, which is as close as possible to $\hat{H}_q(t+K)$. A typical RNN architecture to predict a multi-step MIMO channel is drawn in Fig. 1. Specifically, Fig. 1 depicts a typical multi-input multi-output RNN, which consists of three layers: an input layer, a hidden layer, and an output layer. In AI terminology, this architecture is well known as a single-hidden-layer RNN, or a two-layer RNN (the input layer is generally excluded). Further, the input layer has $M$ input neurons, which take the external input and the feedback from the output. Similarly, the hidden layer has $M_h$ neurons, and the output layer is composed of $M_o$ neurons.

At time $t$, the corresponding quantized channel, $\hat{H}_q(t)$, along with its $k$-step delayed versions (as shown within the dotted-dashed box in the left part of Fig. 1), is fed as external input to the RNN. Nonetheless, in the context of AI, such input data should be pre-processed, i.e., unrolled into a vector, before being fed into the RNN architecture.
Therefore, we have drawn a pre-processing block, represented within the right side of the dotted-dashed box, which unrolls $\hat{H}_q$ into a vector as
$$ \hat{h}_q(t) = \big[\hat{h}_q^{1,1}(t), \ldots, \hat{h}_q^{N_r,N_t}(t)\big]^T. $$
Here, for convenience, we use $h$ to denote a vector. Alongside the external input at time $t$, a feedback or recurrent component, i.e., the delayed output of the network, is fed as an internal input to the RNN. Therefore, the combined input vector to the RNN at time $t$ can be written as
$$ x(t) = \big[\hat{h}_q^T(t), \hat{h}_q^T(t-1), \ldots, \hat{h}_q^T(t-k), \bar{h}^T(t+K-1)\big]^T. $$
Finally, the output of the RNN is simply the $K$-step-ahead prediction, denoted by $\bar{h}(t+K)$, which can be transformed into the MIMO predicted channel, i.e., $\bar{H}(t+K)$, using a post-processing block (denoted by the dotted-dashed box on the extreme right side of Fig. 1).

Importantly, the prediction behavior of an RNN fully depends on its weight values and activation function. As depicted in Fig. 1, we denote the weight connecting the $j$-th hidden neuron with the $m$-th predecessor neuron by $w_{jm}$, whereas $v_{oj}$ represents the weight for the $o$-th output neuron, where $1 \le m \le M$, $1 \le j \le M_h$, and $1 \le o \le M_o$. Moreover, the hyperbolic tangent (tanh) is generally used as the activation function in the hidden layer, defined as
$$ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. $$
Therefore, the output activation of the $j$-th hidden neuron at time $t$ can be written as
$$ a_j(t) = \tanh\big(w_j\, x(t)\big), \quad (18) $$
where $w_j = [w_{j1}, \ldots, w_{jM}]$. Similarly, given the activations of the predecessor layer, i.e., $a_j(t)$, the output of the $o$-th neuron, which depicts the $K$-step-ahead prediction for a particular communication channel, is given by
$$ \bar{h}_o(t+K) = \sum_{j=1}^{M_h} v_{oj}\, a_j(t), \quad (19) $$
where, in the context of our work, $o = n_t + (n_r - 1)N_t$. Finally, by substituting (18) into (19), the $K$-step-ahead prediction at a particular neuron, which depicts the prediction of the channel $\bar{h}_{n_r n_t}(t+K)$, can be expressed as
$$ \bar{h}_{n_r n_t}(t+K) = \sum_{j=1}^{M_h} v_{oj}\, \tanh\big(w_j\, x(t)\big). $$

In order to predict future channel realizations, the RNN needs to be trained. Therefore, once the parameters of the RNN, i.e., the number of neurons and layers, have been chosen, the training process can begin by providing both training and (optional) validation data along with labels.
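The forward pass just described — a tanh hidden layer followed by a linear output layer — can be sketched in a few lines of Python (an illustrative toy with hypothetical names, not the paper's implementation; weight training is omitted here):

```python
import math

def rnn_predict(x, W, V):
    """One forward pass of a single-hidden-layer RNN, as in Fig. 1.

    x : combined input vector of length M (unrolled quantized channels
        plus the recurrent component).
    W : M_h x M input-to-hidden weights (entries w_jm).
    V : M_o x M_h hidden-to-output weights (entries v_oj).
    Returns the M_o-dimensional K-step-ahead channel prediction.
    """
    # Hidden layer: a_j = tanh(w_j . x)
    a = [math.tanh(sum(w_jm * x_m for w_jm, x_m in zip(w_j, x))) for w_j in W]
    # Output layer: h_bar_o = sum_j v_oj * a_j (linear)
    return [sum(v_oj * a_j for v_oj, a_j in zip(v_o, a)) for v_o in V]
```

Neuron $o = n_t + (n_r - 1)N_t$ of the returned vector corresponds to sub-channel $(n_r, n_t)$ after the post-processing (reshaping) block.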
At each training iteration, the RNN evaluates the cost function and back-propagates the error to update the weights. This process is repeated until the convergence condition is reached, i.e., the cost function is minimized. Once the RNN has been trained, it can be used to predict the channel. For a deeper understanding of RNNs, interested readers are referred to, e.g., [12], [13]. In the following, we discuss the channel reporting strategy followed by the MT.

B. Reporting Phase

Once the RNN training phase is completed, the predictors at both ends of the communication link generate the same channel, for instance, at time instant $t-1$. At time $t$, the MT receives a dedicated RS to estimate the channel, $\hat{H}_{MT}(t)$. After the channel is estimated, the MT computes the difference between the predicted channel, $\bar{H}_{MT}(t-1)$, and the estimated channel as
$$ \Delta H(t) = \hat{H}_{MT}(t) - \bar{H}_{MT}(t-1). $$
In the next step, the update obtained in the above equation is quantized as
$$ \Delta\hat{H}_q(t) = Q_f^p\big(\Delta H(t)\big), \quad (22) $$
where $Q_f^p(\cdot)$ represents a quantization function. Later on, the quantized update, $\Delta\hat{H}_q(t)$, is reported to the BS.

C. Recovery Phase

In the final phase, by the time the BS receives the feedback, $\Delta\hat{H}_q(t)$, from the MT, it has also predicted the channel, denoted by $\bar{H}_{BS}(t-1)$. Therefore, the BS estimates the channel, at time $t$, as
$$ \hat{H}_{BS}(t) = \bar{H}_{BS}(t-1) + \Delta\hat{H}_q(t). \quad (23) $$
By substituting (22) into (23), the above equation can be written as
$$ \hat{H}_{BS}(t) = \bar{H}_{BS}(t-1) + Q_f^p\big(\hat{H}_{MT}(t) - \bar{H}_{MT}(t-1)\big). $$

The major benefits of this approach are as follows. Firstly, if the predicted channel, $\bar{H}_{MT}(t-1)$, and the estimated channel, $\hat{H}_{MT}(t)$, at the MT are the same, then there is no need to feed anything back, and in such a case $\hat{H}_{BS}(t) = \bar{H}_{BS}(t-1)$; thus, feedback-related overhead is eliminated. For example, in an extreme ideal scenario, this can happen when the MT is immobile (e.g., a user watching a football match in a stadium or, during a global pandemic, a user sitting at home most of the time), or when the variation in the channel is very low (e.g., the MT is walking in the street).
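The reporting and recovery phases can be condensed into a scalar toy model (a sketch under our own assumptions: a hypothetical mid-rise uniform quantizer stands in for both $Q_f$ and $Q_f^p$, and real-valued channels are used for brevity):

```python
def quantize(x, bits, x_max=1.0):
    """Hypothetical mid-rise uniform scalar quantizer over [-x_max, x_max]."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    idx = min(max(round((x + x_max) / step - 0.5), 0), levels - 1)
    return -x_max + (idx + 0.5) * step

def mt_report(h_est, h_pred_mt, bits):
    """MT side: quantize the prediction error and report it."""
    return quantize(h_est - h_pred_mt, bits)

def bs_recover(h_pred_bs, delta_q):
    """BS side: add the quantized update to its own prediction."""
    return h_pred_bs + delta_q
```

Because the twin predictors are synchronized, the MT-side and BS-side predictions coincide, so the BS recovery error reduces to the quantization error of the (small) residual rather than of the full channel, as in the conventional scheme.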
Nonetheless, in static scenarios, small variations may still occur, requiring only marginal updates; hence, in such scenarios, the overhead can be largely reduced. Secondly, if there is a difference between $\bar{H}_{MT}(t-1)$ and $\hat{H}_{MT}(t)$, then the quantization in (22) will introduce less noise than (12) (i.e., the conventional scheme); thereby, a more precise estimated channel can be reported to the BS.

Lastly, we conclude the description of the proposed scheme with a few remarks.
1) The key point of the proposed scheme is either to reduce the number of quantization bits necessary for the feedback, i.e., (22) will require fewer bits than (12) for similar performance, or to increase the performance with the same number of bits.
2) If the prediction at the MT were perfect, then no feedback would be necessary, bringing the number of necessary bits to 0.
3) Conversely, with a sufficiently high quantization resolution, it is possible to define $Q_f(\beta) = Q_f^p(\beta) = \beta$, where $\beta$ is the data to be quantized. In other words, this would imply that $\hat{H}_{BS}(t) = \hat{H}_{MT}(t)$, i.e., the proposed scheme gives no advantage. Correspondingly, the largest gain is acquired with low-resolution feedback. This is particularly relevant, as 3GPP CSI acquisition schemes consider a low number of bits to reduce feedback overhead.
4) Finally, different channel prediction algorithms can be considered and standardized. Possible standardization elements are: the prediction algorithm, the CSI memory, and the message exchanges between BS and MT.

V. SIMULATION RESULTS

This section showcases the performance of the proposed scheme by means of Monte-Carlo numerical simulations. For the sake of simplicity, we consider a single-user MIMO system, composed of a single BS and a single MT in a cell, equipped with $N_t = 2$ transmit and $N_r = 1$ receive antennas, respectively. In addition, the length of the tapped-delay line and the order of the AR process are both equal to 1.
We adopt the adaptive moment estimation (Adam) optimizer [14] to train the RNN, and the optimal number of hidden neurons is found to be $M_h = 16$. Moreover, in order to observe only the quantization effect, we assume zero estimation error in the channel estimated at the MT at time $t$. Finally, the results are scrutinized by considering the mean-squared error (MSE), calculated as
$$ \epsilon_{mse} = \mathbb{E}\big\{ \| \hat{H}_{MT}(t) - \hat{H}_{BS}(t) \|_F^2 \big\}, $$
where, importantly, in the case of the conventional scheme, $\hat{H}_{BS}(t) = Q_f(\hat{H}_{MT}(t))$. Besides, the performance of the proposed scheme is also verified by calculating the received signal-to-noise ratio (SNR), denoted by $\Gamma_P$, at the MT. For this purpose, we use $\hat{H}_{BS}(t)$ to obtain a simple matched-filter (MF) precoder.

To train the RNN, a data-set composed of past CSI realizations is extracted from $10^4$ consecutive data blocks, that is, $\{\hat{H}_q(t) \,|\, 1 \le t \le 10^4\}$. Out of these $10^4$ data blocks, 80% are used as training, 10% as validation, and 10% as test data. The training process starts by initializing random weights. At training iteration $t$, the corresponding channel matrix, $\hat{H}_q(t)$, is fed into the RNN along with its delayed versions, $\{\hat{H}_q(t-1), \ldots, \hat{H}_q(t-k)\}$. Afterwards, the resultant predicted channel, $\bar{H}(t+K)$, is compared with the desired channel, $\hat{H}_q(t+K)$, and the error $\bar{H}(t+K) - \hat{H}_q(t+K)$ is fed back to update the weights using a dedicated training algorithm, e.g., Adam in our case. This iterative process ends when a predefined convergence condition is satisfied.

Finally, to measure the prediction accuracy of the RNN algorithm, the test data-set is used. In this regard, the MSE is considered as a performance metric, written as
$$ \eta_{mse} = \frac{1}{P} \sum_{p=1}^{P} \big\| \bar{H}(p) - \hat{H}_q(p) \big\|_F^2, $$
where $p$ denotes the time instant of the $p$-th test channel matrix and $P$ represents the total length of the test data-set. In our experiments, $\eta_{mse} = 6 \times 10^{-3}$ and $\eta_{mse} = 7 \times 10^{-3}$ are observed for the real and imaginary parts of the channel matrix, respectively.
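The evaluation pipeline above (an 80/10/10 split of the CSI history, plus an averaged squared-error metric over the test set) can be sketched as follows; the names are illustrative, not the paper's code:

```python
def split_dataset(samples, train=0.8, val=0.1):
    """Chronological 80/10/10 split of past CSI realizations."""
    n = len(samples)
    n_tr, n_va = int(n * train), int(n * val)
    return samples[:n_tr], samples[n_tr:n_tr + n_va], samples[n_tr + n_va:]

def mse(pred, target):
    """Mean-squared error between predicted and desired channel samples."""
    return sum(abs(p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

Note the chronological (not shuffled) split: since the predictor forecasts a time series, the test blocks must come after the training blocks.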
Moreover, the computational complexity of the RNN-based predictor can be expressed in terms of the required complex multiplication operations [15]. For single-step prediction, the hidden layer performs $M \cdot M_h$ multiplications and the output layer $M_o \cdot M_h$. Therefore, the total number of required multiplications is
$$ C = M \cdot M_h + M_o \cdot M_h. \quad (27) $$
Additionally, the number of input neurons, $M$, depends on the number of MIMO sub-channels and the delayed taps. Therefore, the total number of input neurons can be calculated as
$$ M = (k+1) \cdot N_r \cdot N_t. \quad (28) $$
On the other hand, the number of output neurons, $M_o$, depends solely on the MIMO sub-channels. Therefore,
$$ M_o = N_r \cdot N_t. \quad (29) $$
By substituting (28) and (29) into (27), the total number of required multiplications can be written in the simplified form
$$ C = (k+2) \cdot N_r \cdot N_t \cdot M_h. $$
Further, for the sake of simplicity, let us denote by $\delta = N_r \cdot N_t$ the configuration of the MIMO system and by $\rho = k \cdot M_h$ the scale of the RNN. Thus, the one-step prediction complexity of the RNN can be written as $O(\delta\rho)$. Contrarily, the training complexity is also bound to the number of training samples, $S$, and the number of epochs, $\tau$; thereby, the training complexity of the RNN can be expressed as $O(\delta\rho S\tau)$. Nevertheless, the design of an optimal channel predictor is not the objective of our work. Therefore, in the following, we evaluate the performance of the conventional and proposed schemes using the metrics $\epsilon_{mse}$ and $\Gamma_P$.

Fig. 2 reveals the trend of the MSE, $\epsilon_{mse}$, for different numbers of quantization bits. The results are portrayed for both schemes, i.e., conventional and proposed. It can be seen that $\epsilon_{mse}$ reduces as the number of quantization bits increases. However, the superior performance of the proposed scheme can be clearly observed. For instance, for one quantization bit, the conventional scheme has $\epsilon_{mse} \approx 6.2 \times 10^{-2}$, whereas the proposed scheme has $\epsilon_{mse} \approx 0$; thus, the proposed scheme reduces the MSE by a huge margin. Similarly, increasing the quantization bits reduces the MSE of the conventional scheme, which approaches the proposed one.
On the other hand, the proposed scheme's MSE is squeezed to zero, which depicts 100% recovery of the estimated channel. In a nutshell, the proposed scheme not only saves quantization bits but also reduces the MSE significantly. Such a reduction in MSE can greatly improve the performance of the MIMO precoder, which we investigate in Fig. 3.

Fig. 3 shows the received SNR at the MT for a varying number of quantization bits. It can be seen that $\Gamma_P \approx -0.17$ dB and $\Gamma_P \approx 7.5 \times 10^{-3}$ dB for the conventional and proposed schemes, respectively, when one quantization bit is used. Nevertheless, in the case of 2 and 3 quantization bits, there is negligible change in the proposed scheme, as the variation in its MSE is extremely low, which can be verified from Fig. 2. In contrast, the conventional scheme catches up with the proposed scheme as the number of quantization bits increases; thus, remark 3, given in Section IV, holds true. By and large, the proposed scheme outperforms while using a lower number of quantization bits, thereby reducing OTA overhead; conversely, the conventional scheme requires a higher number of quantization bits to achieve similar performance.

VI. CONCLUSION

This paper introduced the potential use of AI not only to reduce the overhead of CSI feedback but also to obtain an accurate recovery of the estimated CSI at the BS. In particular, to eliminate the CSI loss due to compression, a novel compression strategy was introduced. For this purpose, twin AI-enabled channel predictors were utilized at the BS and MT, trained on previously available compressed CSI. Simulation results showed that approximately 100% of the estimated channel is recovered at the BS with a lower number of quantization bits compared to the conventional scheme. Moreover, the achieved precoding gain verified the validity of the proposed scheme. The proposed scheme can play a significant role in scenarios where overhead and inaccurate CSI are intolerable.
REFERENCES

A drone-aided blockchain-based smart vehicular network
Machine learning-based context aware sequential initial access in 5G mmWave systems
Method for frequency division duplex communications
Timing adjustment control for efficient time division duplex communication
What is the value of limited feedback for MIMO channels?
Learning equilibria with partial information in decentralized wireless networks
An introduction to deep learning for the physical layer
Autoregressive modeling for fading channel simulation
Kalman-filter channel estimator for OFDM systems in time and frequency-selective fading environment
Recurrent neural networks and robust time series prediction
Boosting recurrent neural networks for time series prediction
Adam: A method for stochastic optimization
A comparison of wireless channel predictors: Artificial intelligence versus Kalman filter