key: cord-0144481-u29h4qfj
authors: Maatouk, Ali; Assaad, Mohamad; Ephremides, Anthony
title: The Age of Incorrect Information: an Enabler of Semantics-Empowered Communication
date: 2020-12-24
journal: nan
DOI: nan
sha: 51bf71b2f8c9e2b8b37a27bcb618ca32bf2180ed
doc_id: 144481
cord_uid: u29h4qfj

In this paper, we introduce the Age of Incorrect Information (AoII) as an enabler for semantics-empowered communication, a newly advocated communication paradigm centered around data's role and its usefulness to the communication's goal. First, we shed light on how the traditional communication paradigm, with its role-blind approach to data, is vulnerable to performance bottlenecks. Next, we highlight the shortcomings of several proposed performance measures destined to deal with the traditional communication paradigm's limitations, namely the Age of Information (AoI) and the error-based metrics. We also show how the AoII addresses these shortcomings and captures more meaningfully the purpose of data. Afterward, we consider the problem of minimizing the average AoII in a transmitter-receiver pair scenario where packets are sent over an unreliable channel subject to a transmission rate constraint. We prove that the optimal transmission strategy is a randomized threshold policy, and we propose a low complexity algorithm that finds both the optimal threshold and the randomization parameter. Furthermore, we provide a theoretical comparison between the AoII framework and the standard error-based metrics counterpart. Interestingly, we show that the AoII-optimal policy is also error-optimal for the adopted information source model. At the same time, the converse is not necessarily true. Finally, we implement our proposed policy in various real-life applications, such as video streaming, and we showcase its performance advantages compared to both the error-optimal and the AoI-optimal policies.

In the last decade, communication systems have witnessed astronomical growth in both traffic demand and widespread deployment. Thanks to the technological advances in battery productions and the cheap cost of radio-enabled devices, communication systems are no longer constrained to the traditional data and voice exchange frameworks. Today, wireless devices provide essential services and play a vital role in various disciplines. For example, the Internet of Things (IoT) revolution is reshaping modern healthcare systems by incorporating technological, economic, and social prospects. This was witnessed lately amid the global COVID-19 pandemic, where wireless devices for tracking and collecting patient data were prevalent. This example barely scratches the surface as IoT systems are gaining massive momentum in many other domains. Given that we are just witnessing the tip of the iceberg, a natural question arises: are current communication paradigms suitable to deal with such demand? Furthermore, are we extracting the best possible performance from the communication networks?

Like any system, these networks' performance is contingent on the performance measure's choice that we set our goal to optimize. Traditionally, metrics like throughput, delay, and packet loss were adopted. Note that these metrics do not consider the packets' content and the amount of information they bring to the destination. Therefore, we can see that traditional communication paradigms follow a blind approach to data packets' content at both the physical and data link layers. In other words, at these layers, packets are treated equally regardless of the amount of information they will potentially bring to the destination. Inherently, this traditional approach creates a separation between sampling (data acquisition) and exploiting the available resources at the PHY/MAC levels. Given the anticipated astronomical growth in traffic demand and the potential interconnections between these systems, this content-blind approach to network optimization can lead to performance bottlenecks. Accordingly, researchers have been trying to push the boundaries of this traditional paradigm and establish more elaborate frameworks for network optimization. Perhaps one of the most recent successful efforts was the introduction of the Age of Information (AoI) [1] . The AoI quantifies the notion of information freshness by measuring the information time lag at the destination. By incorporating this metric in the network's optimization, we give another dimension to the data packets as they will no longer be treated equally at these layers. For example, a packet is given more importance when its destination has not been updated for a while. Following its introduction, a surge in the number of papers on the AoI can be seen (we refer the readers to [2] , [3] for a literature review). This surge is due to the expected performance improvement this added dimension will have, especially in applications where timely data is required. 
To that end, age-optimal sampling and transmission policies, along with age-aware scheduling schemes, were proposed in the literature (e.g., [4] - [10] ).

Although the AoI was shown to provide significant improvements to data freshness in various applications (e.g., UAV path planning [11] ), it exhibits some critical shortcomings. Precisely, the AoI infers the importance of packets through their timestamps only and does not consider their content. Due to this property, recent works showed that age-optimal sampling policies are incapable of minimizing the prediction/mean squared error in remote estimation applications [12] . In these applications, a remote monitor constructs an estimate of a physical process through the update packets it receives from the source. Given this shortcoming of the AoI, researchers have proposed data acquisition and scheduling schemes based on error minimization and the notion of the value of information in control theory (e.g., [12] - [14] ). The adoption of error-based metrics in data acquisition and transmission decisions at the PHY/MAC layers allows us to abolish the separation principle prevalent in the traditional communication frameworks. Even though this is a step forward in the right direction, error-based metrics come short in capturing a crucial aspect of the communication: its goal. In fact, these metrics do not consider what the packets are used for, but rather their optimization aims solely to reduce the mismatch between the physical process and its estimate at the destination. Given that the communication's goal is neglected, adopting these metrics could hinder achieving the desired goal.

To address these shortcomings, the present authors and several other researchers have been recently advocating for a new communication paradigm based on the notion of "Semantics of Data" [15] - [17] . Semantics is employed here with its etymological meaning, that of significance. Therefore, we mean by the semantics of data, the purpose of data, and its usefulness to the communication's goal. To understand this concept, let us consider an example of a communication network involving various temperature sensors and a central controller. In these settings, the goal is not to always have timely packets delivered about the sensors' temperature processes nor to minimize the mismatch between the temperature processes and their estimates at the controller. On the contrary, the sole goal is to make sure the controller reacts swiftly to any abnormal temperature rise. Therefore, to extract the best performance out of the network, our system's design must undoubtedly include the purpose of the data involved. In this case, when sampling or transmitting packets, we look at the bigger picture of how vital these packets are to achieve our prescribed goal. Using the notion of data semantics, the objective is to establish a network optimization framework that is adaptable to any communication goal by merely changing a set of parameters of a general performance metric. This brings us to the new notion of Age of Incorrect Information (AoII), proposed by the present authors in [15] .

The AoII was introduced to address the shortcomings of both the AoI and the error-based metrics by incorporating the semantics of data more meaningfully. As we will explain in Section II, the AoII takes into account the content of packets, the information knowledge at the destination, and the effect of the mismatch between the physical process and its estimate on the overall communication's goal. Interestingly, we will show that many real-life applications' communication goals are merely variants of the AoII obtained by tweaking specific parameters. To that end, we summarize in the following the key contributions of this paper:

• We first start by motivating the proposed AoII framework. We do so by putting in perspective several shortcomings of the AoI and the error-based metrics. Afterward, we show how the AoII metric captures the data's role in achieving the communication's goal, making it a key enabler of the semantics-empowered communication paradigm.

• Next, we consider the problem of minimizing the average AoII in a transmitter-receiver pair scenario where packets are sent over an unreliable channel subject to a transmission rate constraint. Compared to our previous work on the AoII [15] where a linear version of the AoII was studied, we consider a more general version of the AoII where any non-decreasing dissatisfaction function f(·) can be adopted. This generalization leads to numerous technical challenges that we address in this paper. Although the optimization problems in both this paper and [15] are modeled through an MDP framework where Lagrange approaches are standard, there exist significant technical differences between the two works. In [15], the optimal policy was derived by first providing a closed-form expression of the average cost of a threshold policy [15, Theorem 2] followed by a study of the behavior of a specific intersection point [15, Theorem 3]. This approach fails in the general function case as a characterization of the intersection point is not feasible. Therefore, in this paper, we adopt a different approach where
1) We provide structural results on the problem at hand for both unbounded and asymptotically bounded functions f(·),
2) In both cases, we derive an expression of the value function and the update rate for any threshold policy,
3) We leverage fundamental properties of the AoII to show that an optimal policy can be constructed through randomization,
4) Finally, we provide pseudocode of the optimal transmission policies and prove their logarithmic complexity.

• Afterward, we provide a thorough comparison between the AoII framework and the standard error-based metrics counterpart. This comparison is fundamental as it highlights the differences between the two frameworks. Curiously, our comparison leads to an interesting conclusion: for the adopted information source model, the AoII-optimal policy is also error-optimal. At the same time, the converse is not necessarily true. This comparison was only made possible with the generalization done in this paper, which further highlights its significance.

• Lastly, we provide several real-life applications where the communication's goal can be formulated as an AoII minimization problem by adequately choosing f(·). Such applications allow us to frame the AoII as an enabler of semantics-empowered communication, which is a radical new communication paradigm that has been receiving significant attention recently for 6G networks (e.g., [18]). Once more, this was not possible in the linear version of the AoII, which additionally puts into perspective the importance of the generalization found in this paper. For the applications mentioned above, we show how our approach achieves a significant performance advantage compared to the AoI and the standard error metrics frameworks.

The rest of the paper is organized as follows: Section II is dedicated to the motivation behind the AoII. The system model, along with the dynamics of the AoII, is presented in Section III. Section IV presents our optimization approach to the problem at hand, along with the main results of the paper. In Section V, we theoretically compare the AoII-optimal transmission policy to the error framework and provide a key comparison between them. In Section VI, we provide real-life applications that fall within our framework and showcase the advantages of the AoII compared to both the AoI and error-based approaches. Lastly, we conclude our paper in Section VII.

To understand the notion of AoII, it is best to consider a basic transmitter-receiver system where a process $X_t$ is observed by the transmitter. For example, $X_t$ can be a machine's temperature, a vehicle's velocity, or merely the state of a wireless channel. To that end, $X_t$ is subject to possible changes at any time instant $t$, and these changes have to be reported to the monitor (receiver) through the transmission of status update packets. Using these packets, the monitor creates an estimate of $X_t$ at each time $t$, denoted by $\hat{X}_t$. The monitor uses these estimates to complete tasks, make decisions, or carry out commands. Therefore, it is easy to see that the system's performance is contingent on a proper estimation of $X_t$ at each time $t$. Ideally, we would like to have a perfect estimation where $\hat{X}_t = X_t$ at any time instant $t$. However, given many limiting factors, such as the delay in wireless channels, this is not feasible in practice. Accordingly, one must adopt a particular penalty/utility function whose minimization/maximization helps us achieve the system's best possible performance.

Traditionally, wireless networks have been looked at as a content-agnostic data pipe. In other words, the content of the data packets and the role they play in the broader scope of an application at the receiver have been overlooked from a network optimization perspective. To that end, the conventional goal in the communication paradigms has been to merely optimize network-based metrics such as throughput or delay through a smart allocation of the available resources. However, this approach strips away the context from the data. Therefore, packets are treated as equally important, regardless of the amount of information they bring to the monitor. Given the astronomical growth in data demand, the ubiquitous wireless connectivity, and the abundance of remote monitoring applications, a more effective approach to network optimization has to be adopted. Accordingly, the research community has been intensively trying to propose new network optimization frameworks to achieve this efficacy. To this date, the proposed frameworks generally fall into one of the two following groups:

1) Age-based metrics framework
2) Error-based metrics framework

First, let us discuss the age-based metrics framework. The AoI, or simply the age, is defined as

$$\Delta_{\text{age}}(t) = t - U_t, \tag{1}$$

where $U_t$ is the timestamp of the last successfully received packet by the monitor at time $t$. Essentially, the AoI captures the information time-lag at the monitor. To that end, the minimization of age-based metrics like the time-average age has been widely regarded as a means to achieve freshness in communication [1]. This approach's idea is that with a guarantee of fresh data at the monitor, one would expect an overall better system performance. As one can see, contrary to the throughput and delay frameworks, adopting the AoI as a network performance metric avoids the equal treatment of packets. In fact, in this framework, data packets have the highest value when they are fresh. Consequently, the AoI lets us infer the importance of a packet using its generation time.
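To make the age definition concrete, the following minimal sketch computes the age $t - U_t$ slot by slot. The function name `aoi_trajectory`, the `(arrival_slot, generation_timestamp)` input format, and the convention that the monitor starts with a packet stamped at time 0 are illustrative assumptions, not part of the original paper.

```python
def aoi_trajectory(deliveries, horizon, initial_timestamp=0):
    """Return the age t - U_t for t = 0, ..., horizon - 1.

    deliveries: iterable of (arrival_slot, generation_timestamp) pairs
                (hypothetical input format chosen for this sketch).
    initial_timestamp: assumed timestamp held by the monitor at t = 0.
    """
    by_arrival = dict(deliveries)      # arrival slot -> generation timestamp
    last_timestamp = initial_timestamp  # plays the role of U_t
    ages = []
    for t in range(horizon):
        if t in by_arrival:
            # A newly received packet refreshes U_t (never move it backwards).
            last_timestamp = max(last_timestamp, by_arrival[t])
        ages.append(t - last_timestamp)
    return ages


# Two packets: generated at 2 and received at 3, generated at 5 and received at 6.
print(aoi_trajectory([(3, 2), (6, 5)], horizon=8))
```

Note that the age keeps growing between deliveries regardless of whether the monitor's estimate is actually correct, which is the property discussed next.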

Although the AoI is a step forward in the right direction, we can witness its fundamental flaw in many applications. To put this flaw into perspective, let us consider a time interval $[t_1, t_2]$ in which $X_t = \hat{X}_t$. In other words, during this interval, the monitor has a perfect estimate of the information process $X_t$. As seen from the age definition (1) and Fig. 1a, the system is still penalized even in this time interval. Due to this unnecessary penalization of the system, we can expect a waste of vital resources on useless status updates. This flaw is inherent in the AoI definition as it does not consider the current value of the information process and its estimate at the monitor. For this reason, age-optimal sampling policies were found to be sub-optimal in many remote estimation applications (e.g., [12]). This leads us to the next class of proposed optimization frameworks: the error-based metrics framework.

Remark 1. It is worth mentioning that several time-based metrics have been proposed in the literature to address various shortcomings of the AoI. For example, the Age of Synchronization (AoS), which measures the time elapsed since a new update was generated, was introduced for caching systems [19]. Although they address several shortcomings of the AoI, these metrics remain time-based and do not depend on the mismatch between $X_t$ and $\hat{X}_t$, which limits their usage in remote estimation applications.

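The error-based metrics framework consists of taking as a network performance measure a quantitative representation of the difference between $\hat{X}_t$ and $X_t$. The hope is that, by incorporating the information on $X_t$ and $\hat{X}_t$ in the performance metric, we can better utilize the available resources to keep $\hat{X}_t$ close to $X_t$. Among the most common error-based metrics, we have

$$\Delta_{\text{ind}}(X_t, \hat{X}_t) = \mathbb{1}\{X_t \neq \hat{X}_t\}, \tag{2}$$

$$\Delta_{\text{sq}}(X_t, \hat{X}_t) = (X_t - \hat{X}_t)^2, \tag{3}$$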

where $\mathbb{1}\{\cdot\}$ is the indicator function. By minimizing the time-average of the metrics found in (2) and (3), we obtain the celebrated Minimum Prediction Error (MPE) and the Minimum Mean Squared Error (MMSE) policies respectively [12], [20], [21]. It is clear that this framework does not have the AoI's fundamental shortcomings. For example, as illustrated in Fig. 1b, the penalty of the system is equal to 0 in the time interval $[t_1, t_2]$ in which $X_t = \hat{X}_t$. Additionally, one can notice that, similarly to the AoI framework, adopting an error-based metric as a network performance measure avoids the equal treatment of data packets. Interestingly, in this framework, data packets have the highest value when the difference between the information they carry and $\hat{X}_t$ is large. Although the error-based metrics add a sense of meaning to the packets compared to throughput and delay, they also have underlying flaws. As seen in (2)-(3), the error-based metrics only consider the difference between $X_t$ and $\hat{X}_t$ to infer the importance of the packets. Given that a perfect match $X_t = \hat{X}_t$ for all $t$ is not feasible in realistic scenarios, we can see that this approach fails to capture the effect their mismatch has on the overall communication's goal. To see this more clearly, let us consider that the information process $X_t \in \{0, 1\}$ tracks the temperature of a machine. Let us suppose that $X_t = 0$ indicates that the machine is operating at a normal temperature at time $t$ while $X_t = 1$ indicates that it is overheating. We consider that the estimate $\hat{X}_t$ is used by the monitor to react to any sudden temperature spike in the machine. Now, let us assume that a spike occurs in the time interval $[0, t_1]$.

As illustrated in Fig. 1b, the error-based metrics will lead to a constant penalization of the system. However, it is well-known from the physical characteristics of materials that an abnormal temperature rise's repercussions become more severe the longer that spike is prolonged. In the same spirit, this flaw is highlighted when we consider the phenomenon of error bursts. As seen in Fig. 2a, the system's error penalty due to two bursts of errors of one timeslot is equivalent to that resulting from a single error of two timeslots. However, it is well-known that in a large variety of applications, the repercussions of a long burst of error are far more severe (e.g., video streaming [22]). Therefore, a better performance measure takes into account not just the mismatch between $X_t$ and $\hat{X}_t$, but also how long that mismatch has been prevailing. By adopting such a metric, we capture more of the context of data and their purpose. Accordingly, we can then enable semantics-empowered communication in the network, which is more elaborate than the AoI and the error-based frameworks. This leads us to our proposed metric: the AoII. We define the AoII as

$$\Delta_{\text{AoII}}(X_t, \hat{X}_t, t) = f(t) \times g(X_t, \hat{X}_t), \tag{4}$$

where $f : [0, +\infty) \to [0, +\infty)$ is a non-decreasing function and $g : D \times D \to [0, +\infty)$, where $D$ is the state space of $X_t$. The AoII is therefore a combination of two elements:
1) A function $g(\cdot, \cdot)$ that reflects the gap between $X_t$ and $\hat{X}_t$.
2) A function $f(\cdot)$ that plays the role of increasingly penalizing the system the more prolonged a mismatch between $X_t$ and $\hat{X}_t$ is.

To better understand the metric, let us go back to the machine temperature example. As seen in Fig. 1c, the AoII is 0 in the time interval $[t_1, t_2]$ in which no mismatch exists. In the interval $[0, t_1]$, we can see that, unlike the error-based metrics, we are penalizing the system more the longer the mismatch lasts. As we have previously explained, this allows us to capture the purpose of the data being transmitted more meaningfully. Given that a network designed to take into account the purpose of data will outperform any semantics-blind network, we delve into more detail on the proposed AoII metric. The proposed AoII metric is quite general and presents itself as an umbrella for a large variety of performance measures depending on the selected functions $f(\cdot)$ and $g(\cdot, \cdot)$. For example, we can adopt for the function $g(\cdot, \cdot)$ any of the standard error-based metrics such as

• The indicator error function:

$$g_{\text{ind}}(X_t, \hat{X}_t) = \mathbb{1}\{X_t \neq \hat{X}_t\}. \tag{5}$$

We can choose this function when any mismatch between $X_t$ and $\hat{X}_t$, regardless of how big it is, equally harms the system's performance.

• The squared error function:

$$g_{\text{sq}}(X_t, \hat{X}_t) = (X_t - \hat{X}_t)^2. \tag{6}$$

Choosing this function implies that the larger the gap between $X_t$ and $\hat{X}_t$ is, the more significant its impact on the system's performance is.

• The threshold error function:

$$g_{\text{threshold}}(X_t, \hat{X}_t) = \mathbb{1}\{|X_t - \hat{X}_t| \geq c\}, \tag{7}$$

where $c > 0$ is a predefined threshold. This is an adequate choice when the system's performance is immune to small mismatches between $X_t$ and $\hat{X}_t$.

Fig. 2: Illustration of the burst errors and information process. (a) Illustration of the burst errors situation. (b) Illustration of the process model.

Next, to provide examples of the function $f(\cdot)$, we first define $V_t$ as the last time instant where $g(X_t, \hat{X}_t)$ was equal to 0. Specifically, $V_t$ is the last time instant where the monitor had sufficiently accurate information about the process $X_t$. With this notion in mind, we provide in the following a few examples of $f(\cdot)$.

• The linear time-dissatisfaction function:

$$f_{\text{linear}}(t) = t - V_t. \tag{8}$$

We can choose this function when the impact of the mismatch between $X_t$ and $\hat{X}_t$ grows uniformly with time.

• The degree $m$ monomial function:

$$f_{\text{monomial}}(t) = (t - V_t)^m, \tag{9}$$

where $m > 1$ is a positive integer. The choice of $m$ reflects how quickly and heavily the mismatch between $X_t$ and $\hat{X}_t$ deteriorates the system's performance.

• The time-threshold dissatisfaction function:

where $c > 0$ is a fixed threshold. We can choose this function when the system's performance is immune to the mismatch between $X_t$ and $\hat{X}_t$ for a certain time duration $c$.

Depending on the application at hand, we can adopt an appropriate choice of $f(\cdot)$ and $g(\cdot, \cdot)$ to capture the data's purpose. In later sections, we will provide many real-life applications that showcase how the AoII minimization framework achieves a semantics-empowered communication suitable for these applications.
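The burst-error argument made earlier can be checked numerically with a small sketch. It assumes that the AoII is the product of a time term $f(t - V_t)$ and a gap term $g(X_t, \hat{X}_t)$, and that the estimate was correct just before slot 0; the function names and default choices (linear $f$, indicator $g$) are illustrative only.

```python
def error_trajectory(x, x_hat, g=lambda a, b: float(a != b)):
    """Per-slot error penalty g(x_t, x_hat_t)."""
    return [g(a, b) for a, b in zip(x, x_hat)]


def aoii_trajectory(x, x_hat, f=lambda elapsed: elapsed,
                    g=lambda a, b: float(a != b)):
    """Per-slot AoII, taken here as f(t - V_t) * g(x_t, x_hat_t).

    V_t is the last slot in which g was 0.
    Assumption of this sketch: the estimate was correct at slot -1.
    """
    last_correct = -1
    penalties = []
    for t, (x_t, x_hat_t) in enumerate(zip(x, x_hat)):
        gap = g(x_t, x_hat_t)
        if gap == 0:
            last_correct = t  # the monitor is (sufficiently) correct now
        penalties.append(f(t - last_correct) * gap)
    return penalties


estimate = [0, 0, 0, 0, 0]
two_short_bursts = [0, 1, 0, 1, 0]  # two errors of one slot each
one_long_burst = [0, 1, 1, 0, 0]    # one error lasting two slots

# The error metric cannot tell the two situations apart...
print(sum(error_trajectory(two_short_bursts, estimate)),
      sum(error_trajectory(one_long_burst, estimate)))
# ...while the linear AoII penalizes the prolonged burst more.
print(sum(aoii_trajectory(two_short_bursts, estimate)),
      sum(aoii_trajectory(one_long_burst, estimate)))
```

Swapping in a different `f` (for instance `lambda elapsed: elapsed ** 2`) or a different `g` changes how strongly prolonged or large mismatches are penalized, without touching the rest of the computation.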

We consider a transmitter-receiver system where time is assumed to be slotted and normalized to the slot duration (i.e., the slot duration is taken as 1). The transmitter observes a specific process, denoted by $(X_t)_{t \in \mathbb{N}}$, and its goal is to send status updates to keep the receiver up-to-date on the process' values. To understand how this system works, let us suppose that the transmitter decides to transmit a packet at time $t$. Therefore, a sample of $X_t$ is generated, and the transmission stage immediately begins. The packet is transmitted over an unreliable channel where transmission errors may occur. We suppose that the channel realizations are independent and identically distributed over the timeslots and follow a Bernoulli distribution. In particular, the channel realization $h_t$ is equal to 1 if the packet is successfully decoded by the receiver side and is 0 otherwise. Given the Bernoulli assumption, we define the transmission success and failure probabilities as $\Pr(h_t = 1) = p_s$ and $\Pr(h_t = 0) = p_f = 1 - p_s$ respectively. If the transmission is successful, the status update is delivered at time $t + 1$, and the transmitter receives an instantaneous Acknowledgement (ACK) packet. The quick delivery of the ACK packets is a widely used assumption as these packets are typically small. Accordingly, their transmission time can be considered negligible [4], [10]. Note that if an ACK is not received at $t + 1$, the transmitter discards the old packet and generates a new status update if it opts for a new transmission. By leveraging this feedback mechanism, the transmitter can have perfect knowledge of the packets that arrive at the receiver and those that do not.

Using the information found in the received status updates, the receiver constructs an estimate of the information process, denoted by $(\hat{X}_t)_{t \in \mathbb{N}}$. Similar to [12], we consider that the receiver's estimate of the information source is the content of the last successfully received packet, i.e.,

$$\hat{X}_t = X_{U_t}.$$

We also let $d_t = \mathbb{1}\{|X_t - \hat{X}_t| \geq c\}$, where $c$ is a predefined threshold. To that end, $d_t$ represents whether or not the receiver has sufficiently correct knowledge of the information process. Large values of $c$ suggest that the system can tolerate mismatches between $X_t$ and $\hat{X}_t$ to a certain extent, while small values of $c$ suggest its sensitivity to these mismatches. To model $d_t$, we consider that if no packets are delivered to the receiver, $(d_t)_{t \in \mathbb{N}}$ evolves as a two-state discrete Markov chain depicted in Fig. 2b with parameters $\alpha$ and $\beta$.

Although simple, this model encompasses a variety of real-life settings and has been adopted in numerous research works (e.g., [13]). For instance, suppose that the observed process $X_t$ is a certain channel state and $\hat{X}_t$ is its estimate at the transmitter. By adopting a Markovian channel model, it can be shown that without any training using pilot symbols, $d_t$ can be modeled using a Markov chain similar to Fig. 2b. Note that Markovian channels are a typical assumption for fading channels, and their usefulness is supported by experimental results (we refer the readers to [23]). On another note, Markov chains are also widely used to discretize and approximate continuous-valued processes (e.g., diffusion processes [24], continuous-valued autoregression processes [25]). This puts in perspective the applicability of the adopted Markov chain model in various settings despite its simplicity. In addition, the simplicity of the model enables a better understanding of the dynamics and merits of the new performance measure. Lastly, as is the case in realistic scenarios, we consider that the transmitter cannot send status updates at each timeslot. Precisely, due to battery limitations, for example, an average transmission frequency $\delta$ cannot be surpassed. Given the constraint on the transmission frequency and the random nature of the channel, the transmission policy's choice has an immense effect on the system's performance. As motivated in the previous subsection, we adopt the AoII as a performance measure of the system. To fully understand the evolution of the AoII, we provide details on its dynamics in the next subsection.

In this paper, we focus on the class of AoII measures having the function $g(\cdot, \cdot)$ as

$$g(X_t, \hat{X}_t) = \mathbb{1}\{|X_t - \hat{X}_t| \geq c\}$$

for a certain $c > 0$. To that end, let us define the system's state $S_t$ at time $t$ as

$$S_t = t - V_t.$$

Given that $t \in \mathbb{N}$, we have $S_t \in \mathbb{N}$. Next, as seen in the AoII examples given in (8)-(10), the function $f(\cdot)$ is generally written as a function of $t - V_t$. With that in mind, we can rewrite the AoII as

$$\Delta_{\text{AoII}}(S_t) = f(S_t),$$

where $f : [0, +\infty) \to [0, +\infty)$ is a non-decreasing function. Therefore, to characterize the AoII's evolution, it is sufficient to report the evolution of the system's state $S_t$. To that end, let $\psi_t$ denote the action taken at time $t$, where $\psi_t = 1$ if a transmission is initiated and 0 otherwise. Given the available actions that the transmitter can take and the possible transitions of the process $(d_t)_{t \in \mathbb{N}}$, it is essential to characterize the relationship between $S_{t+1}$ and $S_t$. To that end, we distinguish between two cases:

• Case 1 - $S_t = 0$: In this case, the monitor has sufficiently accurate information on $X_t$ at time $t$. Let us now assume that the transmitter decides to remain idle for the duration of the timeslot $t$. At the next timeslot $t + 1$, we could end up in one of the following situations: 1) either $X_{t+1}$ changes drastically to the point where $d_{t+1}$ becomes equal to 1, or 2) $X_{t+1}$ keeps a value relatively similar to $X_t$. As per our adopted Markovian model for $d_t$, these two events happen with a probability $1 - \alpha$ and $\alpha$ respectively. To that end, we obtain

$$\Pr(S_{t+1} = 1 \mid S_t = 0, \psi_t = 0) = 1 - \alpha, \qquad \Pr(S_{t+1} = 0 \mid S_t = 0, \psi_t = 0) = \alpha.$$

Let us now consider that the transmitter proceeds with a transmission at time $t$. Given that the monitor already has accurate information on $X_t$, no substantial information will be conveyed to the monitor through this transmission. Accordingly, regardless of the channel realization, we have

$$\Pr(S_{t+1} = 1 \mid S_t = 0, \psi_t = 1) = 1 - \alpha, \qquad \Pr(S_{t+1} = 0 \mid S_t = 0, \psi_t = 1) = \alpha.$$

• Case 2 - $S_t \neq 0$: In this case, the monitor has inaccurate information on $X_t$ at time $t$. Let us now consider that the transmitter opted out from any transmission at time $t$. At the next timeslot $t + 1$, we may end up in one of the following situations: 1) either the monitor will keep having inaccurate information on the process, or 2) the information process undergoes a drastic change that leads to the monitor being back to having accurate information on the process. As per our adopted Markovian model for $d_t$, these two events happen with a probability $\beta$ and $1 - \beta$ respectively. To that end, we obtain

$$\Pr(S_{t+1} = S_t + 1 \mid S_t \neq 0, \psi_t = 0) = \beta, \qquad \Pr(S_{t+1} = 0 \mid S_t \neq 0, \psi_t = 0) = 1 - \beta. \tag{17}$$

Let us now consider that the transmitter decides to transmit a status update to the monitor at time t. By taking into account the possible channel realizations, we distinguish between two cases:

• h t = 0: In this case, the packet is not successfully delivered to the monitor. Accordingly, from the monitor's perspective, this is similar to the case where no transmission is initiated. Therefore, the evolution of S t follows the transitions reported in (17) . • h t = 1: In this case, the packet is successfully delivered to the monitor. To that end, we distinguish between two possible scenarios: 1) by the time the packet arrives at time t + 1, the information process did not drastically change, or 2) the information process undergoes a drastic change during the transmission that even with the newly delivered packet, the monitor still ends up with inaccurate information at t + 1. As per our adopted Markovian model for d t , these two events happen with a probability 1 − β and β respectively. Consequently, we have

By taking into account the independence between the transitions of the process $(d_t)_{t \in \mathbb{N}}$ and the channel realizations, we can summarize the transitions of $S_t$ as follows

Given the above system's dynamics, one can notice a necessity to impose some restrictions on the parameters and functions involved. Effectively, for packet transmission to be useful to the system's performance, we need to have

If this condition is violated, then transmitting a packet does not improve the system's overall performance. Specifically, this means that the information process changes drastically at each timeslot to the point that if we transmit a packet, the packet becomes obsolete by the time it arrives at the receiver. From (20) , we can conclude that the condition is equivalent to having a < β. Next, let us consider that a packet is transmitted at each timeslot. Given the dynamics of the system, we have

In other words, even if a packet is transmitted at every timeslot, there is still a chance for the system's penalty to grow. To prevent the situation where even a transmission at each timeslot will still lead to an unbounded penalty, it is necessary to impose the following condition

Note that, for similar reasons, analogous conditions have been previously adopted in the AoI framework for communication over unreliable channels [26] . With the system's evolution clarified, we can now formulate our problem and find its optimal solution.
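As a rough illustration of these dynamics, the following Python sketch advances the state S t by one timeslot. It rests on assumptions that are not quoted from the text: from state 0 the system stays at 0 with probability α whatever the action, an idle transmitter in a state S > 0 sees the penalty grow with probability β, and a transmitting one sees it grow with probability a (the quantity compared with β above). The function and argument names are hypothetical.

```python
import random


def step(S, psi, alpha, beta, a, rng):
    """Draw the next state from state S under action psi (0 = idle, 1 = transmit).

    Assumed one-slot dynamics (an illustration, not the paper's stated equations):
      - S == 0: stay at 0 w.p. alpha, move to 1 otherwise, for both actions;
      - S > 0, idle: move to S + 1 w.p. beta, return to 0 otherwise;
      - S > 0, transmit: move to S + 1 w.p. a, return to 0 otherwise.
    """
    u = rng.random()  # uniform draw in [0, 1)
    if S == 0:
        return 0 if u < alpha else 1
    grow_probability = a if psi == 1 else beta
    return S + 1 if u < grow_probability else 0
```

Under these assumptions, the requirement a < β simply says that a transmission makes a further increase of the penalty less likely than idling does.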

Let π represent a transmission policy that determines the packets being sent over time. The transmission policy π is defined as a sequence of actions π = (ψ π 0 , ψ π 1 , . . .). Let Π denote the set of all causal scheduling policies, i.e., where the decisions are taken without any knowledge of the future. Our optimization problem can be formulated as follows

where f : [0, +∞) → [0, +∞) is a non-decreasing function of S π t , and 0 < δ ≤ 1 is the highest update rate allowed. The above problem belongs to the family of Constrained Markov Decision Processes (CMDPs), which are known to be generally challenging to solve optimally. To address these challenges, we proceed in the sequel with a Lagrange approach and provide a step-by-step analysis to solve problem (22) optimally.

The Lagrange approach consists of transforming the constrained problem (22) to an unconstrained one by incorporating the constraint in the objective function. Specifically, let us introduce the Lagrange multiplier λ ∈ R + . We define the Lagrangian function as g(λ, π) = lim sup

Given that λ ≥ 0, it can be regarded as a penalty that is paid for a packet transmission. Ideally, we would like to find a certain λ * for which minimizing the function (23) across all policies Π allows us to derive the optimal policy of the constrained problem (22) . To proceed in that direction, let us consider the following optimization problem min π∈Π g(λ, π),

for any fixed λ ∈ R + . Knowing that λδ is independent of the chosen policy π, the above minimization problem is equivalent to the following

Therefore, we focus on the optimization problem (25) . Based on the system's dynamics previously detailed in Section III-B, the above problem can be cast into an infinite horizon average cost Markov Decision Process (MDP) as follows

• States: The state of the system S t coincides with that reported in Section III-B. Accordingly, the state space of interest S is the space of natural numbers N.
• Actions: At any time t, the possible actions that can be taken by the transmitter are to either initiate a new transmission (ψ t = 1) or to stay idle (ψ t = 0).
• Transition probabilities: The transition probabilities between the different states correspond to those previously reported in Section III-B.
• Cost: Given the objective function of the problem, the instantaneous cost is set to C(S t , ψ t ) = f (S t ) + λψ t .

To obtain the optimal policy of an infinite horizon average cost MDP, it is well-known that it is sufficient to solve the following Bellman equation [27] θ +V (S) = min

where Pr(S → S′ | ψ) is the transition probability from state S to S′ given the action ψ, θ is the optimal value of (25), and V (S) is the differential cost-to-go function. However, solving this equation is known to be a notoriously challenging task [27] . We leverage our system's particularity to circumvent these challenges and provide key structural results on the value function V (·). Using these results, we proceed to solve the Bellman equation, as will be seen in the sequel.

As previously explained, we start by studying the particularity of the value function. Before doing so, we first distinguish between two types of functions f (S) based on their behavior for large S. To that end, we define the following two classes.
• Unbounded f (·): In this case, the penalty of the system grows without bound, i.e., lim S→+∞ f (S) = +∞. The list of such functions includes the linear and monomial functions reported in (8)-(9).
• Bounded f (·): In this case, the penalty of the system saturates and reaches a fixed limit

An example that belongs to this family of functions is the time-threshold function reported in (10) . To analyze the bounded function case, we will proceed with a truncation of the state space S = N. Specifically, from the limit definition, we have

Accordingly, we can choose an arbitrarily small ε such that f (S) ≈ f (S thresh ), ∀S ≥ S thresh . To that end, we let S = {0, 1, . . . , S thresh } ⊆ N. Although this truncation will have a negligible effect on the performance for a small ε, it will prove to have analytical benefits in deriving the optimal transmission policy. With this distinction in mind, we lay out the following lemma.

Lemma 1 (Non-decreasing Property of V (·)). For both function classes, the differential cost-to-go function V (S) is a nondecreasing function of S.

Proof. The proof is in Appendix A.

Next, we leverage the above lemma to establish the fundamental proposition below.

Proposition 1 (Structure of the Optimal policy). For any λ ∈ R + , and for both function classes, the transmission policy that optimally solves problem (25) is a threshold policy.

Proof. The proof is in Appendix B.

The above proposition allows us to have a road-map to solve the Bellman equation. Knowing that a threshold policy is optimal, we restrict our attention to this class of policies to simplify and solve the Bellman equation. Consequently, we lay out the following theorem.

Theorem 1 (Optimal Policy). The optimal transmission policy π * λ can be summarized as follows
• Unbounded f (·): π * λ is a threshold policy such that a transmission is initiated when S t ≥ n * λ where n * λ = inf{n ∈ N * : H(n) > 0} − 1,

and H(n) and θ n are equal to

and the optimal threshold n * λ is equal to

where H (n) and θ n are reported in Table I . Otherwise, the optimal transmission policy π * λ is to never transmit, and we set n * λ = S thresh + 1.

Proof. The proof is in Appendix C.

The next step consists of deriving a closed-form expression of C π * λ for any λ ∈ R + . Finding this expression will allow us to propose an iterative algorithm later on that finds the optimal transmission policy, as will be seen in Section IV-D. To that end, we provide the following proposition.

Proposition 2 (Update Rate). The average update rate of the transmission policy π * λ is • Unbounded f (.):

(35) • Bounded f (.): it coincides with the unbounded case expression for any n * λ ∈ S, and is equal to 0 if n * λ = S thresh + 1.

Proof. The proof is in Appendix D.

Thus far, we have focused on finding the optimal transmission policy π * λ that solves problem (25) , which in turn solves (24) . However, our primary goal remains to optimally solve the original constrained problem reported in (22) . It turns out that we can relate the optimal policy for the constrained problem to that of (24) if certain conditions are satisfied. To that end, let us first define λ * ≜ inf{λ ∈ R + : C π * λ ≤ δ} and ϑ = (1−α)/(2−α−a). With these definitions in mind, we summarize our findings in the following theorem.

Theorem 2 (Optimal Policy of the Constrained Problem). The optimal transmission policy of problem (22) can be summarized as follows
• Unbounded f (·): the optimal transmission policy is a randomized threshold policy with parameter µ * such that
- The thresholds n * λ * − 1 and n * λ * are adopted with probability µ * and 1 − µ * respectively.
- µ * is chosen to ensure that the randomized policy has an average update rate equal to δ. In other words,

where C π * λ * ,1 and C π * λ * ,2 are the average update rates when the thresholds n * λ * −1 and n * λ * are used respectively.
• Bounded f (·): the optimal transmission policy coincides with the unbounded function case if δ < ϑ. Otherwise, an optimal transmission policy is to transmit a packet in every timeslot t where S t ≠ 0.

Proof. The proof is in Appendix E.
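To make the role of µ * concrete, the short Python sketch below solves for the randomization parameter under the assumption that the update rate of the randomized policy is the convex combination µ * C π * λ * ,1 + (1 − µ * ) C π * λ * ,2 of the two threshold rates. The function name and its arguments are hypothetical, and the clamping to [0, 1] is a defensive choice of this sketch rather than part of the theorem.

```python
def randomization_parameter(delta, rate_lower_threshold, rate_upper_threshold):
    """Mixing weight mu placed on the lower threshold n* - 1.

    rate_lower_threshold: average update rate with threshold n* - 1 (the larger rate).
    rate_upper_threshold: average update rate with threshold n* (the smaller rate).
    Assumes the mixture rate is mu * rate_lower + (1 - mu) * rate_upper and
    solves for the mu that makes it equal to delta.
    """
    gap = rate_lower_threshold - rate_upper_threshold
    if gap == 0:
        return 0.0  # both thresholds give the same rate; no mixing needed
    mu = (delta - rate_upper_threshold) / gap
    return min(1.0, max(0.0, mu))  # keep the weight a valid probability
```

For instance, with hypothetical rates 0.5 and 0.2 and a budget δ = 0.3, the weight on the lower threshold is one third, and the mixture spends exactly the allowed budget.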

To obtain the above optimal transmission policy, we implement a specific low-complexity algorithm, as explained in the following. The first step in our algorithm implementation consists of finding the optimal threshold for any fixed λ ∈ R + . To that end, we recall from our analysis in the proof of Theorem 1 that the functions H(n) and H (n) are both non-decreasing with n. Accordingly, we can use the binary search algorithm [28] to find the optimal threshold for any λ. Specifically, starting from an initial interval I = [1, 2], we exponentially enlarge this interval as long as n * λ ∉ I. When the interval is large enough to contain n * λ , a binary search algorithm is adopted to find it. Interestingly, this whole procedure is computationally efficient as it requires at most O(log n * λ ) iterations. (37) Accordingly, we can always choose an endpoint to the summation in a way that satisfies a predefined precision criterion.
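The interval-doubling and binary search steps can be sketched as follows in Python. This is a minimal illustration that takes the non-decreasing function H as a black-box callable; the name `first_positive` is hypothetical, and the closed form of H is not reproduced here.

```python
def first_positive(H):
    """Smallest integer n >= 1 with H(n) > 0.

    Assumes H is non-decreasing in n and eventually positive; otherwise the
    doubling loop below does not terminate.
    """
    if H(1) > 0:
        return 1
    lo, hi = 1, 2                # invariant: H(lo) <= 0
    while H(hi) <= 0:            # exponentially enlarge the interval
        lo, hi = hi, 2 * hi
    while hi - lo > 1:           # binary search on (lo, hi], where H(hi) > 0
        mid = (lo + hi) // 2
        if H(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi
```

With this helper, the threshold of Theorem 1 for the unbounded case would be obtained as `first_positive(H) - 1`. Both loops evaluate H a number of times that is logarithmic in the returned value.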

Next, we derive a scheme to find the optimal Lagrange multiplier λ * . To that end, we note that the average update rate C π * λ is non-increasing with λ [29] , [30] and that C π * 0 = 1. Accordingly, we employ a bisection search method to find λ * [28] , which also has low complexity. Specifically, as it was done for the binary search algorithm, we start with an initial interval I 0 = [λ 0 min , λ 0 max ] where λ 0 min = 0 and λ 0 max = 1. As long as C π * λ t max > δ, we set λ t+1 min = λ t max and λ t+1 max = 2λ t max . We do so until we end up with an interval I t = [λ t min , λ t max ] such that C π * λ t min > δ and C π * λ t max ≤ δ for some t ≥ 0. The next step consists of evaluating the middle point of the interval

We keep doing this until a convergence criterion is satisfied and the algorithm outputs ξ ∞ . Consequently, to get the optimal transmission policy, it is sufficient to set n * λ * of Theorem 2 to n * ξ∞ . Finally, the randomization parameter µ * can be easily concluded using the resulting n * λ * that we adopt. A pseudo-code of the algorithm is reported in Appendix F.
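The search for the Lagrange multiplier can be sketched in the same spirit. The Python snippet below treats the map from λ to the average update rate as a black-box callable that is assumed to be non-increasing; the names `find_multiplier` and `update_rate` and the tolerance-based stopping rule are hypothetical choices of this sketch.

```python
def find_multiplier(update_rate, delta, tol=1e-9):
    """Approximate inf{lam >= 0 : update_rate(lam) <= delta} by bisection.

    Assumes update_rate is non-increasing in lam and eventually drops to
    delta or below; otherwise the doubling loop does not terminate.
    """
    lam_min, lam_max = 0.0, 1.0
    if update_rate(lam_min) <= delta:
        return lam_min                      # the constraint is already met at lam = 0
    while update_rate(lam_max) > delta:     # enlarge the interval by doubling
        lam_min, lam_max = lam_max, 2.0 * lam_max
    while lam_max - lam_min > tol:          # bisection on [lam_min, lam_max]
        mid = 0.5 * (lam_min + lam_max)
        if update_rate(mid) > delta:
            lam_min = mid
        else:
            lam_max = mid
    return lam_max                          # feasible endpoint: rate <= delta
```

Returning the upper endpoint keeps the output on the feasible side of the constraint, which matters because the update rate of a threshold policy changes in jumps as λ varies.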

An interesting question is how comparable our framework is to the standard error-based measure approach previously discussed in Section II. This section answers this question by comparing the performance of the error-optimal and the AoII-optimal policies. Interestingly, we can obtain the error-optimal transmission policy π * e by adopting the function f error (S t ) = 1 if S t ≠ 0 and f error (0) = 0, and applying Theorem 2. By doing so, we minimize the long-term average of the error measure d t depicted in Section III. Let us now consider a simple AoII measure f 1 (S t ) = S t , and let π * a denote the corresponding AoII-optimal policy. Moreover, suppose that α = 0.2, β = 0.9 and p s = 0.8. We compare the two policies in terms of average AoII and average error in the table below.

Table II: Performance comparison between π * a and π * e .

Interestingly, the AoII-optimal policy achieves the same error performance as the error-optimal policy (i.e., the AoII-optimal policy is also error-optimal). On the other hand, the error-optimal approach is not AoII-optimal, as seen by the two policies' performance gap. This was first observed numerically in the work of Clement et al. [31] and our work here provides a rigorous understanding of this phenomenon. To that end, we note an essential consequence of Theorems 1 and 2 and their proofs: the average update rate of all AoII-optimal policies when S t ≠ 0 is the same. Namely, if an AoII-optimal policy for a certain function f 1 (·) transmits 90% of the time when S t ≠ 0, then an AoII-optimal policy for any other function f 2 (·) will do the same, given of course that it satisfies the conditions we imposed on f (·). Now, knowing that the error penalty is equal to 0 when S t = 0 and 1 for any S t ≠ 0, the only thing that matters to obtain an error-optimal performance is the policy's average update rate when S t ≠ 0. With that in mind, we can lay out the following conclusion.

Adopting AoII-optimal policies minimizes the average error while also helping achieve the communication's goal. On the contrary, the converse is not necessarily true.

In this section, we provide real-life applications of the AoII and compare the performance between the AoII-optimal, the AoI-optimal [32] , and the error-optimal schemes.

We consider a transmitter-receiver pair where real-time video stream packets are sent from one end to the other. Time is slotted and normalized to the slot duration (i.e., the slot duration is taken as 1). The video stream comprises frames, each of which is a 1-D vector of length M in line-scan order. The stream's total duration is T timeslots. At each timeslot, a frame of the video stream is sent by the transmitter side. We suppose that the channel at timeslot t is X t , and its estimate at the transmitter's side is X̂ t . The transmitter can send pilot signals and learn the channel at the beginning of each timeslot. However, this training succeeds with a probability 0 < p s < 1 and incurs a cost, knowing that an average cost budget δ cannot be surpassed. As explained in Section III, by adopting a Markovian channel model, it can be shown that without channel learning, the process d t can be modeled using a Markov chain. We suppose that this chain's parameters are α and β as previously depicted in earlier sections. We assume that the receiver successfully decodes packets if d t = 0 and a transmission error occurs otherwise. At the receiver, we assume a simple loss concealment scheme where the lost frame due to a transmission error is replaced by the previous frame. The error propagation process is modeled with a geometric attenuation factor resulting from spatial filtering. Let us assume that each error introduces an initial error power γ, and the cross-correlation factor between each successive error is ρ. By following the derivations in [22] , we can show that the video distortion is a particular case of the AoII where f (0) = 0 and for any S t = S > 0, we have (38) where τ = 1 + α 0 ρ + c, and (α 0 , c) are two parameters of the video stream. It is important to note that the channel training goal is to minimize the total average distortion of the receiver's video signal. We are not interested in having fresh estimates of X t (AoI metric) or minimizing the channel prediction error (standard error metric). Therefore, we can see how by tweaking the function f (.), the AoII allows us to capture the channel training's goal. To highlight our AoII approach's benefits, we compare it to the AoI and the standard error-based frameworks for this particular scenario. Specifically, we evaluate the average video distortion resulting from adopting the optimal policies for these different metrics. We consider α = 0.5, β = 0.8, p s = 0.8, T = 10^6 , ρ = 0.8, c = 2, γ = 1, and α 0 = 4. As seen in Fig. 3a , the AoII-optimal policy outperforms the two other policies for any δ.

In this scenario, we assume that a transmitter informs a remote monitor about whether or not the monitored electrical machine is overheating. An abnormal increase in temperature in electrical devices creates thermal stress on the machine, leading to the breakdown of the electrical insulation (e.g., motor winding insulation). This itself will lead to an eventual malfunction of the machine. Therefore, the transmitter needs to inform the monitor of the temperature's status and solicit instructions to minimize the probability of the machine malfunction. We suppose that the transmitter is limited on how often it can update the monitor. The average update rate allowed is δ. We consider that d t = 1 when the machine is overheating at time t and d t = 0 otherwise. We also assume that d t evolves as a Markov chain, and its parameters are α and β. Following the study in [33] , the probability of an insulation breakdown under temperature stress follows a Weibull distribution. Precisely,

where γ and ρ are parameters that depend on the machine's characteristics. When there is no temperature stress, the breakdown probability is negligible. The communication goal is to choose the update times such that the probability of a breakdown since the stress was applied is minimized.

We can see that this probability is a special case of the AoII where f (0) = 0, and for any S t = S > 0, we have f (S) = 1 − exp(−(S/γ)^ρ ). We evaluate the average breakdown probability that results from adopting the optimal policies for the standard three metrics. We consider α = 0.2, β = 0.9, p s = 0.8, ρ = 1, and γ = 1. As seen in Fig. 3b , the AoII-optimal policy outperforms the two other policies for any δ.
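The Weibull-type penalty above is an instance of the bounded function class, since it saturates at 1. As a small illustration, the Python sketch below evaluates it and looks for a truncation state beyond which the penalty is within a tolerance ε of its limit. The helper `saturation_state` and the way ε is used are assumptions of this sketch, intended only to mirror the state-space truncation discussed for bounded functions.

```python
import math


def breakdown_penalty(S, gamma=1.0, rho=1.0):
    """f(0) = 0 and f(S) = 1 - exp(-(S / gamma) ** rho) for S > 0."""
    if S == 0:
        return 0.0
    return 1.0 - math.exp(-((S / gamma) ** rho))


def saturation_state(f, limit, eps, max_state=10_000):
    """Smallest state S >= 1 with limit - f(S) <= eps (f assumed non-decreasing)."""
    for S in range(1, max_state + 1):
        if limit - f(S) <= eps:
            return S
    raise ValueError("no saturation state found below max_state")
```

With the parameters used in this scenario (γ = 1, ρ = 1) and a hypothetical tolerance ε = 10^-3, `saturation_state(breakdown_penalty, 1.0, 1e-3)` returns 7, so a handful of states already captures the behaviour of the penalty.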

Contrary to the previous cases, we consider an application outside the scope of traditional communication networks. Specifically, we consider a scenario where fires happen independently and fire stations have to respond to them. Accordingly, this application falls under the decision problems umbrella. As found by the UK fire research station, the spread of fire can be represented through an exponential statistical model [34] . Specifically,

where F (t) is the amount of fire damage at time t since ignition, F init is the initial ignite damage, γ is the fire growth parameter, F max is the maximum possible damage, and t fire is the ignition time. Given the restricted resources, the fire stations are limited on how often they can respond to fires, as an average response rate of δ cannot be surpassed. We consider that d t = 1 when a fire is happening at time t and d t = 0 otherwise. We also assume that d t evolves as a Markov chain, and its parameters are α and β such that β = 1.

The goal is to minimize the total average fire damage. Using (40), we can see that the fire damage is a special case of the AoII where f (0) = 0, and for any S t = S > 0, we have f (S) = min{F max ; F 0 exp(γS)}. We evaluate the average fire damage that results from adopting the AoII-optimal and the error-optimal policies. We consider α = 0.2, p s = 1, F max = 10, γ = 0.1, and F init = 1. As seen in Fig. 3c , the AoII-optimal policy outperforms the error approach for any δ. This example shows that the AoII is not restricted to communication networks and can be utilized in various other frameworks.
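For this scenario the penalty saturates exactly rather than asymptotically, so the truncation state can be read off directly. The Python sketch below assumes that F 0 in the expression above denotes the initial damage F init ; the function and argument names are hypothetical.

```python
import math


def fire_penalty(S, f_init=1.0, growth=0.1, f_max=10.0):
    """f(0) = 0 and f(S) = min(f_max, f_init * exp(growth * S)) for S > 0.

    Assumes F_0 in the text is the initial damage F_init.
    """
    if S == 0:
        return 0.0
    return min(f_max, f_init * math.exp(growth * S))


def first_saturated_state(f, f_max, max_state=10_000):
    """Smallest state S >= 1 at which the penalty has reached f_max."""
    for S in range(1, max_state + 1):
        if f(S) >= f_max:
            return S
    raise ValueError("penalty never reaches f_max below max_state")
```

With F max = 10, γ = 0.1 and F init = 1 as above, the damage reaches its cap at S = 24, since exp(2.3) is still below 10 while exp(2.4) exceeds it.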

In this paper, we have shown how the AoII metric enables semantics-empowered communication, where the communication's goal is taken into account. We have also shown how it addresses several shortcomings of the AoI and the standard error metrics approaches. Additionally, we have developed an optimal transmission policy that minimizes the AoII, and we showcased its substantial performance advantages compared to the approaches mentioned above. Future research directions include the extension to more general information source models, examining continuous-time systems, investigating multiuser scenarios, and providing an even broader range of real-life applications of the AoII.

The first step consists of simplifying the Bellman equation. For the unbounded function class, and given the dynamics of the system reported in Section III-B, we can rewrite the Bellman equation as follows

Notice that the upper part of the minimization in (41) is associated with choosing ψ = 0, i.e., letting the transmitter idle, and the lower part with ψ = 1, i.e., initiating a transmission. To prove the desired results, we leverage the Relative Value Iteration Algorithm (RVIA) [27] . The RVIA is an iterative algorithm that calculates the differential cost-to-go function V (S) of the Bellman equation reported in (26) . To that end, and for any state S ∈ N, let V t (S) designate the differential cost-to-go function estimate at iteration t. Also, let us denote by T (V t )(S) the mapping obtained by applying the right-hand side of the Bellman equation

where Pr(S → S′ | ψ) is the transition probability from state S to S′ given the action ψ. Without loss of generality, we suppose that V 0 (S) = 0 for all states S ∈ N and we let S = 0 be the reference point of the algorithm. With that in mind, the estimate of the differential cost-to-go function is updated as follows

Note that V t (0) = 0 holds for all iterations t. As stated in [27, Proposition 3.1], the above algorithm converges to the differential cost-to-go function V (S) (i.e., lim t→+∞ V t (S) = V (S), ∀S ∈ N). Accordingly, if we can show the non-decreasing property of V t (S) for any time t ∈ N, then we can assert that this property also holds for the differential cost-to-go function. Therefore, our goal is to show that

Note that we restrict our attention to non-zero states since V t (0) = 0 for any t ∈ N. We prove the non-decreasing property reported in (44) by induction. First, given that V 0 (S) = 0 for all states S ∈ N, the above property holds for t = 0. Next, we suppose that the property holds up till iteration t > 0. By investigating eq. (41) for S = 0, we can see that the optimal action is to stay idle. Therefore, we have

Therefore, we can rewrite the update rule of the RVIA as

Next, given the system's dynamics reported in Section III-B, we can conclude that

Using the above equations, and by leveraging our assumption on V t (·) and the non-decreasing property of f (·), we can

Concerning the bounded function case, we first note that the equations in (41) hold for any S ∈ S \ {S thresh }. Moreover, we have

By following the same analysis as the one done in the unbounded case, we can prove that V (·) is also non-decreasing in the bounded function case. This concludes our proof.
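The RVIA recursion used in this proof can also be run numerically. The Python sketch below is an illustration under stated assumptions rather than the exact update rule of the appendix: the state space is truncated to a finite number of states (the last state maps to itself when the penalty grows), state 0 moves to state 1 with probability 1 − α under both actions, an idle transmitter in S > 0 sees the state grow with probability β, and a transmitting one with probability a. All names and default values are hypothetical.

```python
def relative_value_iteration(f, lam, alpha, beta, a, n_states=60, iters=500):
    """Relative value iteration with state 0 as the reference point.

    Returns (V, policy): V[S] estimates the differential cost-to-go and
    policy[S] is 1 if transmitting is strictly better than idling in state S.
    """
    V = [0.0] * n_states
    last = n_states - 1
    for _ in range(iters):
        updated = []
        for S in range(n_states):
            nxt = min(S + 1, last)  # truncation: the last state stays put
            if S == 0:
                # Transmitting in state 0 only adds the cost lam, so idle.
                q = f(0) + alpha * V[0] + (1 - alpha) * V[1]
            else:
                idle = f(S) + beta * V[nxt] + (1 - beta) * V[0]
                send = f(S) + lam + a * V[nxt] + (1 - a) * V[0]
                q = min(idle, send)
            updated.append(q)
        reference = updated[0]
        V = [q - reference for q in updated]  # keeps V[0] = 0 at every iteration
    policy = [0] * n_states
    for S in range(1, n_states):
        # Transmit when lam + (a - beta) * V(S + 1) is negative (recall V[0] = 0).
        if lam + (a - beta) * V[min(S + 1, last)] < 0:
            policy[S] = 1
    return V, policy
```

Running it with a linear penalty, for example `relative_value_iteration(lambda S: float(S), lam=5.0, alpha=0.2, beta=0.9, a=0.82)`, gives iterates that are non-decreasing in S, in line with Lemma 1, and a policy that switches from idling to transmitting at most once.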

Let us first focus on the unbounded function case. To establish the optimal policy of problem (25) , one has to resort to solving the Bellman equation. However, without any knowledge of the optimal policy structure, deriving a closed-form expression of V (·) can be challenging. To address these challenges, we recall that the RVIA allows us to find the differential cost-to-go function iteratively. To that end, let us define V 1 t+1 (S) and V 0 t+1 (S) as the differential cost-to-go function estimate by the RVIA at iteration t + 1 if the optimal action is ψ = 1 and ψ = 0 respectively. Given the RVIA update rule reported in (46), we have

(50) Next, we define the difference between the two quantities as

By definition, the sign of ∆V t+1 (S) allows us to conclude the optimal action that minimizes the Right Hand Side (RHS) of the update rule reported in (46). For example, if ∆V t+1 (S) ≥ 0, then the minimum of the RHS in (46) is achieved for ψ = 0 and vice-versa. Note that, as we explained previously in Section III-B, we have a < β. Moreover, we recall the results of Lemma 1 where we have shown that V t (S + 1) is a non-decreasing function of S for all t ∈ N. With these two things in mind, we can conclude that ∆V t+1 (S) is nothing but the sum of a non-negative constant λ and a non-increasing, non-positive function (a − β)V t (S + 1). Knowing that the RVIA converges to the differential cost-to-go function V (·) when t → +∞, we can deduce that the optimal action is increasing with S from ψ = 0 to ψ = 1. In other words, the difference ∆V (S) decreases with S, and at a certain point, it could change sign and become negative. When that happens, the action of transmitting becomes more beneficial than remaining idle. Therefore, we can conclude that the optimal transmission policy is of a threshold nature.

As for the bounded function case, the same analysis holds, and ∆V t+1 (S) is the sum of a non-negative constant λ and a non-increasing, non-positive function for any S ∈ S. Accordingly, the difference ∆V (S) also decreases with S and, at a certain point, it could change sign and become negative. When that happens, the action of transmitting becomes more beneficial than remaining idle. However, the subtle difference with the unbounded function case is that the sign change might not happen. In this case, the optimal policy is to never transmit a packet. This is a natural consequence of the finite state space assumption resulting from the boundedness of the function. In fact, λ can be so high that letting the system evolve on its own becomes optimal, given that f (·) is always bounded.

APPENDIX C PROOF OF THEOREM 1

As always, we start by investigating the unbounded function case. Given that the optimal policy is a threshold policy, we can affirm that an integer value n ∈ N exists such that the optimal action is ψ = 1 and ψ = 0 when S ≥ n and S < n respectively. With that in mind, and by utilizing the RHS of the Bellman equation in (41), we can conclude that

Without loss of generality, we suppose in the sequel that V (0) = 0. To that end, and by rearranging the above terms, the following condition for activity can be deduced

In other words, the optimal action is to transmit whenever the system is in a state S that verifies the above condition.

Given the threshold property of the optimal policy, the Bellman equation can be rewritten for any state S ≥ n as follows

Note that we add the subscript n to θ to indicate that the average cost θ results from adopting the threshold n. By following a forward induction, we obtain

(55) Given that a < 1, we can invoke the geometric series sum property to end up with

Given the above equation, we can particularly conclude that

Next, we investigate the case where the system is in a state S < n. For any state S < n, the optimal action is to remain idle. Consequently, using the RHS of the Bellman equation, we obtain

By following a backward induction, we wind up with the following identity for any 1 ≤ S < n

(59) Knowing that V (0) = 0, and by using eq. (58), we get

Using the expression of V (n) in (57), and by replacing S with 1 in eq. (59) and equating it to V (1) in (60), we end up with the following relationship between θ n , λ, and n

This fundamental relationship will be pivotal to our subsequent analysis to find the threshold n. The next step of our analysis revolves around deriving a criterion that will allow us to find n.

To that end, we recall the activity condition reported in (53). Given that n is the threshold, we can assert that

Therefore, it is sufficient to find the value n that verifies the above equation. This is however easier said than done as one has to prove the existence of such a solution. To proceed in this direction, we recall the results of Lemma 1 where we have shown that V (·) is a non-decreasing function. With that in mind, we recall that the function f (·) is unbounded. Therefore, by leveraging the limit definition, we have

Using the above property of f (·) and the expression of V (n) found in (57), we can easily show that

Therefore, we are assured that a solution to eq. (62) exists in this case. In particular, the optimal threshold is

where

To understand the intuition behind these results, we recall that λ can be seen as a penalty paid for transmitting a packet. As f (·) is unbounded when S → +∞, we can deduce that no matter how high λ is, transmitting a packet will eventually become the optimal action. Let us now investigate the bounded function case. To that end, similarly to the previous case, we suppose that the optimal threshold is equal to n ∈ S. Following the same analysis above, we end up with the expressions of V (.) and θ n reported in Table III . Moreover, from the Bellman equation, we can conclude that the activity condition is

Given the above condition, we can conclude that if it is optimal to transmit when S = S thresh , then it is also optimal to transmit when S = S thresh − 1. Accordingly, we focus on n being in the set S \ {S thresh }. With the above activity condition in mind, the threshold n ∈ S \ {S thresh } is simply the first state that verifies V (n + 1) > λ/(β − a). In other words,

where

Now, unlike the unbounded function case, an interesting phenomenon can take place here: the activity penalty λ can be so high that it is optimal to simply not transmit, even if S is high. In other words, transmitting a packet will cost us more than letting the system evolve on its own without any intervention. Our aim becomes to characterize this regime and derive a condition on λ that allows us to know when this phenomenon occurs. If a threshold exists, it can be found using eq. (68). Therefore, if H (S thresh ) ≤ 0, then the optimal policy is to stay idle. In other words, if

then the optimal policy is to stay idle. On the other hand, if λ does not verify the above inequality, then the optimal policy is a threshold policy where the threshold can be found using eq. (68).

To proceed with our proof, we recall that the optimal transmission policy π * λ is a threshold policy with a threshold n * λ . Trivially, if n * λ = 0, a packet transmission is initiated at each timeslot and C π * λ = 1. In the case where n * λ > 0, we note that the system's state S t evolves as a Discrete-Time Markov Chain (DTMC) reported in Fig. 4 . Note that we first focus on the case of unbounded function f (·). By leveraging the general balance equations, we can show that the stationary distribution of the DTMC is

Given the above expressions, and knowing that C π * λ = ∑_{k=n}^{+∞} σ k (n * λ ), we can obtain the results of the proposition. By following a similar analysis for the bounded function case, we can show that the stationary distribution has the following expression

Next, we can demonstrate that the average update rate C π * λ = ∑_{k=n}^{S thresh} σ k (n * λ ) has the same expression as the one in the unbounded case for n * λ ∈ S. Note that the average update rate is equal to 0 when n * λ > S thresh .
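As a numerical companion to this derivation, the Python sketch below computes the update rate of a threshold policy from the balance equations. It is not the closed form of Proposition 2; it relies on the assumed one-step dynamics in which state 0 moves to state 1 with probability 1 − α, idle states 1, ..., n − 1 grow with probability β, and transmitting states n, n + 1, ... grow with probability a < 1. The function name is hypothetical.

```python
def threshold_update_rate(n, alpha, beta, a):
    """Stationary probability of being in a state >= n under threshold n.

    Weights are expressed relative to the stationary probability of state 0:
      state k in 1..n-1      : (1 - alpha) * beta ** (k - 1)
      states n, n + 1, ...   : (1 - alpha) * beta ** (n - 1) * a ** j, j >= 0,
    and the geometric tail sums to (1 - alpha) * beta ** (n - 1) / (1 - a).
    """
    if n == 0:
        return 1.0  # a transmission is initiated in every timeslot
    idle_mass = sum((1 - alpha) * beta ** (k - 1) for k in range(1, n))
    send_mass = (1 - alpha) * beta ** (n - 1) / (1 - a)
    return send_mass / (1.0 + idle_mass + send_mass)
```

For the threshold n = 1, i.e., transmitting whenever the state is non-zero, this sketch returns (1 − α)/(2 − α − a), which agrees with the quantity ϑ used in Theorem 2. The rate also decreases as the threshold grows, which is the behaviour exploited by the search over λ.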

Let us first focus on the unbounded function case. To establish our theorem, we first need to show that the constrained problem reported in (22) verifies key properties listed in the assumptions of [29, Theorem 2.5] . To proceed in that direction, let R(s, G) be the class of policies such that

and the expected time m sG of a first passage from s to G is finite. Let R * (s, G) be the class of policies π ∈ R(s, G) such that, in addition, the expected AoII and update cost of a first passage from s to G are finite. With these definitions in mind, we can tackle the assumptions and prove that our problem verifies them. Assumption 1 -For all r > 0, the set G(r) = {s : there exists an action ψ such that f (s)+ψ ≤ r} is finite: To prove this assumption, we note that f (·) is a non-decreasing unbounded function. Accordingly, given that lim S→+∞ f (S) = +∞, we have

Therefore, knowing that ψ ∈ {0, 1}, we can set M = r to conclude that G(r) ⊆ [0, S − 1], asserting that it is a finite set. Assumption 2 -There exists a stationary policy π such that it induces a Markov chain where the state space consists of a single (nonempty) positive recurrent class R and a set U of transient states such that π ∈ R * (i, R), for i ∈ U , and both the average AoII and update rate are finite: To prove this assumption, we consider the always update policy π au that transmits a packet at each timeslot. Given the system's dynamics reported in Section III-B, we can conclude that the state space of the Markov chain induced by this policy consists of a single recurrent class R = N (the transient set U is empty). Moreover, we have C π = 1 and, given the assumption on f (·) found in (21), we can deduce that the average AoII of π au is finite.

Assumption 3 -Given any two states S 1 ≠ S 2 , there exists a policy π such that π ∈ R * (i, j): To prove this assumption, let us suppose without loss of generality that S 2 ≥ S 1 . By considering the always update policy again, we can easily see that there is a non-zero probability to go from state S 1 to state S 2 and vice-versa. The expected AoII and update costs of the first passage from S 1 to S 2 (or vice-versa) are trivially finite.

Assumption 4 -If a stationary policy π has at least one positive recurrent state, then it has a single positive recurrent class R. Moreover, if 0 ∈ R, then π ∈ R * (0, R): To prove this assumption, we simply note that whatever the transmission policy is, there is a non-zero probability to go from any state S ∈ N * to state 0 and vice-versa. Therefore, any recurrent class must contain the state 0. With this in mind, we can conclude that there can only be one single positive recurrent class.

Assumption 5 -There exists a policy π such that the average AoII is finite and C π < δ: To prove this assumption, we can consider a threshold policy π n0 where the threshold n 0 = inf{n ∈ N : C πn < δ}. Note that the update rate C πn is strictly decreasing with n [29] , which ensures the existence of n 0 . Given the assumption on f found in (21) , we can conclude that the AoII is finite.

Given the above assumptions, we can leverage the results of [29] (in particular, Theorem 2.5, Proposition 3.2, Lemma 3.4, and Lemma 3.9). These results affirm that the optimal transmission policy of the constrained problem is a mixture of two policies such that
• The two policies coincide with the optimal policy of problem (24) for a certain λ* ≥ 0, but differ from each other in at most a single state.
• λ* is defined as λ* ≜ inf{λ ∈ R+ : C^{π*_λ} ≤ δ}.
• The mixture parameter µ* ∈ [0, 1] is chosen in a way to ensure that the update rate constraint is verified with equality.
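The equality condition on the update rate in the last bullet can be illustrated numerically. The sketch below assumes, as a simplification that the source does not state in this form, that the update rate of the mixture is the µ-weighted average of the update rates of the two policies; the names `c1`, `c2`, and `mixture_parameter` are hypothetical.

```python
def mixture_parameter(c1: float, c2: float, delta: float) -> float:
    """Return mu in [0, 1] such that mu*c1 + (1 - mu)*c2 == delta.

    c1 and c2 are the (assumed known) update rates of the two policies,
    with c2 <= delta <= c1. The linear mixing rule is an assumption of
    this sketch.
    """
    if not (c2 <= delta <= c1):
        raise ValueError("delta must lie between the two update rates")
    if c1 == c2:
        return 1.0  # both policies already meet the constraint with equality
    return (delta - c2) / (c1 - c2)


# Illustrative numbers only.
mu = mixture_parameter(c1=0.5, c2=0.25, delta=0.3)
mixed_rate = mu * 0.5 + (1.0 - mu) * 0.25
```

With these illustrative rates, the first policy is used about one fifth of the time, which brings the mixed update rate back to δ.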

Given the above results, and the threshold structure of the optimal policy of problem (24), we can conclude the statements of our theorem.

As for the bounded function case, we first discuss the validity of Hypothesis 2.2 and Hypothesis 4.1 of [30] for our problem. To that end, we have:

Hypothesis 2.2 - For any simple stationary policy, the state 0 is accessible from any S ∈ S: This hypothesis trivially holds for our problem, as seen in the system's dynamics reported in Section III-B.

Hypothesis 4.1 - Let Π denote the set of optimal policies for the unconstrained version of the problem in eq. (22). Suppose that C^π > δ for every π ∈ Π and that there exists a stationary policy whose average update rate is strictly smaller than δ: First, it is easy to see that the never-transmit policy has an average update rate of 0. Next, the first condition requires a more careful investigation, as there could be cases where an optimal policy of the unconstrained problem has an update rate no larger than δ. To see this more clearly, consider the stationary policy where a transmission is initiated only when S_t ≠ 0. By leveraging the expression provided in Proposition 2, it can be shown that its update rate is (1 − α)/(2 − α − a). Moreover, given the system's dynamics reported in Section III-B, the Bellman equation in state 0 for the unconstrained version of the problem in eq. (22) can be written as follows: θ + V(0) = min{f(0) + αV(0) + (1 − α)V(1); f(0) + αV(0) + (1 − α)V(1)}, where the two arguments of the minimum, corresponding to idling and transmitting, coincide.

In other words, transmitting a packet in state S = 0 does not have any impact on the performance. Therefore, if δ ≥ (1 − α)/(2 − α − a), then the constraint becomes redundant and the AoII-optimal policy can be obtained by transmitting whenever S ≠ 0. Now, let us focus on the case where δ < (1 − α)/(2 − α − a). In this case, the two hypotheses hold. Let us now define λ* ≜ inf{λ ∈ R+ : C^{π*_λ} ≤ δ}. By leveraging Theorem 4.4 of [30], we can conclude that the optimal transmission policy of the constrained problem is a mixture of two policies such that
• The two policies coincide with the optimal policy of problem (24) for λ* ≥ 0, but differ from each other in at most a single state.
• The mixture parameter µ* ∈ [0, 1] is chosen in a way to ensure that the update rate constraint is verified with equality.

Given the above results, and the threshold structure of the optimal policy of problem (24), we can conclude the statements of the theorem.

Algorithm 1 AoII Optimal Policy - Unbounded Function
  Input: the system's parameters α, β, p_s, δ and the convergence tolerance
  if δ = 1 then
    skip the algorithm and transmit at every timeslot t
  else
    Init. λ_min ← 0, λ_max ← 1
    n*_{λ_max} ← FindThreshold(α, β, p_s, λ_max)
    C ← C^{π*_{λ_max}} using Proposition 2
    while C > δ do
      λ_min ← λ_max, λ_max ← 2λ_max
      n*_{λ_max} ← FindThreshold(α, β, p_s, λ_max)
      C ← C^{π*_{λ_max}} using Proposition 2
    end while
    ξ ← (λ_min + λ_max)/2
    while |ξ − λ_max| > (convergence tolerance) do
      n*_ξ ← FindThreshold(α, β, p_s, ξ)
      C ← C^{π*_ξ} using Proposition 2
      if C > δ then λ_min ← ξ
      ...
    if C > δ then n*_{λ*} ← n* + 1, C^{π*_{λ*},1} ← C, C^{π*_{λ*},2} ← ...
    else n*_{λ*} ← n*, C^{π*_{λ*},2} ← C
    if n* = 1 then C^{π*_{λ*},1} ← 1
    else n* ← n* − 1
    C^{π*_{λ*},1} ← ...
    ...
    Output: n*_{λ*}, µ*
  end if

  procedure FINDTHRESHOLD(α, β, p_s, λ)
    Init. N_LB ← 1, N_UB ← 1
    while H(N_UB) ≤ 0 do
      ...
    Output: the optimal threshold n*_λ ← n − 1
  end procedure
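The search over λ in Algorithm 1 (a doubling phase followed by a bisection phase) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `rate_at` is a hypothetical callable that returns the update rate of the λ-optimal policy (in the source, this combines FindThreshold and Proposition 2) and is assumed non-increasing in λ. The `else` branch of the bisection is not legible in the listing above and is filled in here as the standard bisection update, which is an assumption. The toy rate 1/(1 + λ) is purely illustrative.

```python
from typing import Callable


def search_lambda(rate_at: Callable[[float], float],
                  delta: float,
                  tol: float = 1e-9,
                  lam_cap: float = 1e12) -> float:
    """Approximate the smallest lam >= 0 with rate_at(lam) <= delta.

    rate_at is a hypothetical stand-in for the update rate of the
    lam-optimal policy and is assumed non-increasing in lam.
    """
    lam_min, lam_max = 0.0, 1.0
    # Doubling phase: enlarge the bracket until the constraint is met.
    while rate_at(lam_max) > delta:
        lam_min, lam_max = lam_max, 2.0 * lam_max
        if lam_max > lam_cap:
            raise ValueError("constraint not met for any lam below lam_cap")
    # Bisection phase: keep rate_at(lam_max) <= delta throughout.
    xi = 0.5 * (lam_min + lam_max)
    while abs(xi - lam_max) > tol:
        if rate_at(xi) > delta:
            lam_min = xi
        else:
            lam_max = xi  # assumed update; this step is missing in the source
        xi = 0.5 * (lam_min + lam_max)
    return lam_max


# Toy example (illustrative rate only).
toy_rate = lambda lam: 1.0 / (1.0 + lam)
lam_star = search_lambda(toy_rate, delta=0.3)
```

Returning `lam_max` rather than the midpoint keeps the returned multiplier on the feasible side of the constraint, which matches the infimum definition of λ* given in the proof.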

Real-time status: How often should one update?

Age of information: An introduction and survey

Optimization of Wireless Networks: Freshness in Communications

Update or wait: How to keep your data fresh

On the optimality of the Whittle's index policy for minimizing the age of information

On the age of information in a CSMA environment

Age-optimal updates of multiple information flows

Minimizing the age of information: NOMA or OMA?

Timely Status Update in Massive IoT Systems: Decentralized Scheduling for Wireless Uplinks

Scheduling policies for minimizing age of information in broadcast wireless networks

Age-based path planning and data acquisition in UAV-assisted IoT networks

Sampling of the Wiener process for remote estimation over a channel with random delay

A unified sampling and scheduling approach for status update in multiaccess wireless networks

Age-of-information vs. value-of-information scheduling for cellular networked control systems

The age of incorrect information: A new performance metric for status updates

Beyond age: Urgency of information for timeliness guarantee in status update systems

Semantics-empowered communication for networked intelligent systems

6G networks: Beyond Shannon towards semantic and goal-oriented communications

Two freshness metrics for local cache refresh

Optimal estimation with limited measurements

Optimal causal rate-constrained sampling for a class of continuous Markov processes

Analysis of packet loss for compressed video: Effect of burst losses and correlation between error frames

Finite-state Markov channel - a useful model for radio communication channels

A robust discrete state approximation to the optimal nonlinear filter for a diffusion

Finite state Markov-chain approximations to univariate and vector autoregressions

AoI-penalty minimization for networked control systems with packet loss

Dynamic Programming and Optimal Control

Constrained average cost Markov decision chains

Optimal policies for controlled Markov chains with a constraint

Age of incorrect information for remote estimation of a binary Markov source

Average age of information with hybrid ARQ under a resource constraint

Exponential model of fire growth

Algorithm 2 AoII Optimal Policy - Bounded Function
  Input: the system's parameters α, β, p_s, δ, S_thresh and the convergence tolerance
  if δ ≥ (1 − α)/(2 − α − a) then
    skip the algorithm and transmit at every timeslot t when S_t ≠ 0
  else
    ...
    C ← C^{π*_{λ_max}} using Proposition 2
    while C > δ do
      λ_min ← λ_max, λ_max ← 2λ_max
      n*_{λ_max} ← FindThreshold(α, β, p_s, λ_max), C ← C^{π*_{λ_max}} using Proposition 2
    end while
    ξ ← (λ_min + λ_max)/2
    while |ξ − λ_max| > (convergence tolerance) do
      n*_ξ ← FindThreshold(α, β, p_s, ξ), C ← C^{π*_ξ} using Proposition 2
      if C > δ then λ_min ← ξ
      ...
    if n* = 1 then C^{π*_{λ*},1} ← 1
    else n* ← n* − 1
    ...
    Output: n*_{λ*}, µ*
  end if

  procedure FINDTHRESHOLD(α, β, p_s, λ)
    if λ ≥ ...
    ...
    Output: the optimal threshold n*_λ ← n − 1
  end procedure