k-parametric Dynamic Generalized Linear Models: a sequential approach via Information Geometry

Ra'ira Marotta, Mariane Branco Alves, Helio S. Migon

2022-01-14

Dynamic generalized linear models may be seen simultaneously as an extension to dynamic linear models and to generalized linear models, formally treating serial autocorrelation inherent to responses observed through time. The present work revisits inference methods for this class, proposing an approach based on information geometry, focusing on the k-parametric exponential family. Among others, the proposed method accommodates multinomial responses and can be adapted to compositional responses on k = d + 1 categories, while preserving the sequential aspect of the Bayesian inferential procedure and producing real-time inference. The updating scheme benefits from the conjugate structure of the exponential family, assuring computational efficiency. Concepts such as the Kullback-Leibler divergence and the projection theorem are used in the development of the method, placing it close to recent approaches to variational inference. Applications to real data are presented, demonstrating the computational efficiency of the method, which compares favorably to alternative approaches, as well as its flexibility to quickly accommodate new information when strategically needed, preserving monitoring and intervention analysis, as well as discount factors, which are usual in sequential analyses.

In practical applications, it is usual to have data sets which should be modeled by multi-parametric (not necessarily Gaussian) distributions, as for example in Cargnoni et al. (1997); da Silva et al. (2011); Gamerman et al. (2013); Souza et al. (2016). We focus on the class of dynamic generalized linear models (DGLM), defined by West et al.
(1985), which assume time-indexed observations in the one-parameter exponential family, and may be seen as an extension to both generalized linear models (GLM, Nelder and Wedderburn, 1972) and dynamic linear models (Harrison and Stevens, 1976). Our main focus is on inference for DGLMs with k-parametric, uni- or multivariate exponential family responses, preserving the sequential aspect of the learning process with efficient computational time and allowing for on-line information updating.

Dynamic models are based on two levels of specification: time-indexed observables y_t, t = 1, 2, . . ., are assumed conditionally independent given latent states θ_t, governed by stochastic evolution rules which assign those non-observable quantities a Markovian temporal structure and formally address marginal autocorrelation among observables. The latent dynamic states guide the behavior of a dynamic predictor based on structural components which may relate to trends, seasonal patterns, covariate effects, propagation of intervention impacts, among others. Under normality of the response (conditional on states) and of the evolution errors, and a few additional assumptions (see Migon et al., 2005, for details), the Bayesian updating process is analytically conducted by well-defined equations, which guide the evolution from the posterior distribution of non-observables at time t − 1 to the prior and predictive distributions at time t, finally leading to the updated posterior distribution for the states at time t, when the cycle repeats. In that framework, retrospective analysis is also analytically viable (see West and Harrison, 1997, Chap. 4). In the broader non-Gaussian DGLM context, there is no general analytical solution for the updating scheme and some sort of approximation is needed. Our aim is to produce on-line inference in the context of DGLMs defined in terms of uni- or multivariate k-parametric exponential families, preserving the tractability inherited from conjugacy properties.
The present work revisits inference methods for this class, proposing an approach based on information geometry, focusing on: i) dealing with responses in the k-parametric exponential family (among others, the proposed method accommodates multinomial responses on k = d + 1 categories); ii) accommodating structural uncertainty in decision making; iii) preserving the sequential aspect of the Bayesian learning procedure, in order to produce real-time inference in dynamic contexts including intervention, monitoring, etc.; iv) benefiting from the conjugate setup in the exponential family; v) assuring computational efficiency; vi) allowing dynamics in every structural component of interest, through time-varying parameters.

Markov chain Monte Carlo (MCMC) methods have become a usual methodology to produce inference for DGLMs. Proposals aimed at improving Markov chain mixing rates and convergence in the context of non-Gaussian or non-linear dynamic models were fruitful in the literature in the 1990s and early 2000s; see for instance Frühwirth-Schnatter (1994); Carter and Kohn (1994); Gamerman (1997, 1998); Shephard and Pitt (1997); Durbin and Koopman (2002). Despite succeeding in what they set out to do, all those approaches carry the burden of computational cost due to the need to process the whole batch of data in each iteration of MCMC algorithms, making real-time inference unfeasible. Other approaches worth mentioning in the context of non-Gaussian / non-linear dynamic models are Kitagawa (1987) and Singh and Roberts (1992), which later inspired the construction of proposals for Metropolis-Hastings steps in MCMC schemes for DGLMs.
In the 1980s, approaches designed to overcome the analytical intractability of the Bayesian information update in non-Gaussian/non-linear dynamic models resorted to simplifications of the inferential procedure, for instance restricting the inference on states to partial specifications, based on first and second moments or on modes and curvatures around the mode (West et al., 1985; Fahrmeir, 1992, 1997). Our approach relates closely to these and, in some sense, we turn back to the 1980s, motivated by the computational efficiency of those methods, allied to their sequential updating structure.

DGLMs formally relate the canonical parameter vector of the exponential family to linear predictors on the states vector. No matter which form of prior specification is made for the non-observable states vector (partial or complete), it must be compatible with the prior specification for the canonical parameters of the DGLM. For instance, the conjugate prior and the prior implied by the states' specification may be reconciled as in West et al. (1985), by solving a system of equations equalizing their first and second moments. We assume a normal prior distribution for the states, which implies that the linear predictor also follows a normal prior, but we still assign conjugate prior distributions to the canonical parameters, and these must be made compatible. Our approach is based on Bregman's projection theorem (Amari, 2016, p. 26, Theorem 1.4), which leads to the minimization of a divergence measure between the canonical and states-induced specifications. The Bayesian update of the canonical parameters is trivially achieved via conjugacy, resulting in a posterior distribution which must be projected back to the domain of the states. We accomplish that, once again, relying on Bregman's projection theorem. We preserve the tractability and computational efficiency of West et al.
(1985), as well as the desirable property of producing on-line inference; additionally, our proposal accommodates k-parametric exponential families, overcoming a limitation of that approach, which deals solely with uniparametric exponential families. We show that our proposal compares favorably with methods that induce dynamics solely through a locally steady level (e.g. Smith, 1979; Harvey and Fernandes, 1989; Gamerman et al., 2013), in scenarios in which such a simple specification is not able to handle all the relevant structural behavior of the data, for instance when time-varying seasonal patterns occur.

Several papers in the literature are related to ours. Grunwald et al. (1993) developed a dynamic model for compositional data based on Dirichlet responses driven by latent states. However, it lacks flexibility when it comes to addressing dynamic effects of covariates, for example. For Bayesian prediction of multinomial time series, Cargnoni et al. (1997) used a hierarchical framework; the inferential procedure is based on Markov chain Monte Carlo, losing the sequential aspect of the Bayesian updating and resulting in undesirably high computational times, depending on the dimension of the data. Terui et al. (2010) employed DGLMs to adapt and forecast a multivariate time series of sales counts and, by modeling each category as Poisson dynamic counts, implicitly induced a multinomial structure. Similarly, Berry and West (2020) used univariate models for series of non-negative counts, decoupled conditionally on components shared across series, and then re-coupled these marginal models by using a separate model to assess and forecast common factors.
Our proposal, on the other hand, deals with the context of time-varying counts on several categories by directly assuming a joint multinomial distribution for the categories' counts, enjoying its conjugacy properties, which is a key feature for computational efficiency as well as for preserving the sequential aspect of the analysis, allowing, for instance, interventions at strategic moments. Our proposal originates in Marotta (2018) and extends the method and results presented there. One of the main contributions of the present work is the formulation of a methodology for the analysis of multivariate time series of counts, directly assuming a dynamic multinomial distribution for the time-varying response, thus naturally accommodating the dependency among categories. The proposed framework may be adapted to deal with compositional Dirichlet responses and allows for flexibility in the specification of the predictor of each category, accommodating particular or common features in each one. The proposed inferential procedure proves to be computationally efficient compared to MCMC schemes, preserving the sequential aspect of the analysis and allowing for real-time monitoring and possible interventions, when necessary.

The paper is organized as follows: Section 2 presents a brief review of dynamic generalized linear models and alternative approaches for producing inference in this class; in Section 3, the necessary elements of information geometry for the development of this work are enunciated and we describe the algorithmic construction of our proposal for on-line inference in k-parametric DGLMs. Computations involved in the implementation of the method are described for Multinomial, Normal, Poisson and Binomial responses. In Section 4, the proposed method is applied to classical time series data, with a Poisson dynamic model applied to quarterly sales and a normal dynamic model applied in the context of stochastic volatility.
Section 5 presents a case study, jointly modeling age groups' counts of hospital admissions for diarrhea and gastroenteritis in Brazil, via our multinomial dynamic proposal. Section 6 closes the paper, with final remarks and work in progress.

In this section we review conjugacy properties of the exponential family and set the notation that will be adopted throughout the paper. Dynamic generalized linear models (DGLMs) are described and inferential approaches for sequential learning in this class are revisited, with particular focus on conjugate updating proposals which are close to our own proposal, to be presented in Section 3.

Dynamic generalized linear models (West et al., 1985) assume time-varying observations in the exponential family. The class enjoys nice conjugacy properties that confer it partial analytical tractability. Throughout this work we focus on observations y_1, y_2, . . ., conditionally independent given k-dimensional parameter vectors η_t, t = 1, 2, . . ., distributed in the k-parametric exponential family; that is, the observables' probability density (or mass) function is given by

p(y_t | η_t) = c(y_t) exp{Σ_{i=1}^k c_i φ_i(η_t) h_i(y_t) − b(η_t)},    (1)

which can be rewritten in canonical form as:

p(y_t | ψ_t) = c(y_t) exp{ψ_t' H(y_t) − b(ψ_t)},    (2)

where H(y_t) = (h_1(y_t), . . . , h_k(y_t))' is a sufficient statistics vector with the exact dimension of the parameter vector η_t, and ψ_t = (ψ_1t, . . . , ψ_kt)', with ψ_it = c_i φ_i(η_t), i = 1, . . . , k, is the natural or canonical parameter at time t. Note that the parametric vector is k-dimensional, but y_t may be uni- or multivariate. The conjugate family for ψ_t is given by q(ψ_t | τ_0, τ) = K(τ_0, τ) exp{τ'ψ_t − τ_0 b(ψ_t)}, with τ = (τ_1, . . . , τ_k)' and K(τ_0, τ) the normalizing constant. It is well known that E[H(y)] = ∇b(ψ) and V[H(y)] = ∇²b(ψ). Moreover, for regular exponential families (Bernardo and Smith, 2009), under conjugate analysis, posterior and predictive distributions are analytically available.
In particular, the posterior density for ψ_t is given by p(ψ_t | y_t, τ_0, τ) = p(ψ_t | τ_0*, τ*), where τ_0* = τ_0 + 1 and τ* = τ + H(y_t) = (τ_1 + h_1(y_t), . . . , τ_k + h_k(y_t))', and the predictive distribution for a future observable y_f is p(y_f | y_t, τ_0, τ) = c(y_f) K(τ_0*, τ*) / K(τ_0* + 1, τ* + H(y_f)), with K(·, ·) denoting the normalizing constant of the conjugate density.

The class of dynamic generalized linear models was defined by West et al. (1985), assuming data in the uniparametric exponential family, and relates the parameters η_t to a dynamic linear predictor F_t'θ_t through a link function g. The natural parameter ψ_t = cφ(η_t) follows a conjugate prior of the form CP(τ_0t, τ_t), for some τ_0t and τ_t. A DGLM is thus defined by three equations: an observational model in the k-parametric exponential family, an equation relating the parameter vector η_t to a dynamic predictor governed by states θ_t, and an evolution or system equation ruling the temporal dynamics of the latent states, as follows:

g(η_t) = F_t'θ_t = λ_t and θ_t = G_t θ_{t−1} + ω_t, ω_t ∼ [0, W_t].

F_t is a known (p × k) dynamic regression matrix; G_t is a (p × p) state evolution matrix; W_t is a (p × p) evolution covariance matrix; g is an invertible link function. If λ_it = ψ_it, i = 1, . . . , k, g is the canonical link.

West et al. (1985) proposed the following procedure to produce real-time inference in uniparametric exponential family DGLMs: at each time t = 1, 2, . . . , T, a prior distribution for the states θ_t is partially specified, in terms of its first and second moments, and the guiding relation g(η_t) = λ_t is used in order to make the moments of the conjugate prior compatible with the moments induced by the states' partial specification, eventually making use of low-order approximations for g(µ_t). The learning system is trivially updated to accommodate the observed y_t, resulting in a closed form for the posterior distribution of the natural parameter, from which analytically tractable predictive distributions are easily obtained.
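The conjugate update above is a one-line operation in practice. As a minimal sketch (the function and variable names are ours, not from the paper), the generic update τ_0* = τ_0 + 1, τ* = τ + H(y_t) reads:

```python
def conjugate_update(tau0, tau, suff_stat):
    """Exponential-family conjugate update: tau0* = tau0 + 1,
    tau* = tau + H(y_t), where suff_stat = H(y_t)."""
    return tau0 + 1, [t + h for t, h in zip(tau, suff_stat)]

# Example: a single sufficient statistic h(y) = y, with observed y = 5.
tau0_new, tau_new = conjugate_update(1, [3.0], [5])  # -> (2, [8.0])
```

This is the entire "learning" step of the scheme; all remaining effort lies in reconciling (τ_0, τ) with the moments of the states, as discussed next.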
Finally, the updated information must be passed on to the states' moments. Linear Bayes arguments (see West and Harrison, 1997, Chapter 4) are used to deal with the unbalanced dimensions of the scalar natural parameter and the p-dimensional vector of states. The posterior moments of the states evolve to prior ones through the system equation and the cycle is repeated up to the last period of time, T, when backwards relations may be used in order to generate smoothed distributions for non-observables.

The class of extended dynamic generalized linear models (EDGLM) was presented by Souza et al. (2016), for the biparametric exponential family, building on the methods proposed by West et al. (1985). In conjugate form, a joint prior distribution for η_t (mean and precision) is provided at each time t, resulting in closed-form predictive distributions. A bivariate link function connects the linear predictors determined by states θ_t to η_t. The sequential inference algorithm is based on the conjugate distribution and linear Bayes estimation, as proposed by West et al. (1985). Because the equating step of the conjugate updating algorithm has more conditions than unknown quantities, an additional step is required to make the inference procedure possible; a solution inspired by the generalized method of moments is then proposed. This step is not required in our proposal: as shown in Section 3, we deal with the unbalanced dimensions of the natural parameter vector and the vector of states in a very straightforward manner, using normal theory and conditional expectation properties. Section 3 presents our proposal, building upon West et al. (1985) and Souza et al. (2016) and introducing information geometry arguments, extending these works in the sense of providing a general formulation for sequential inference in k-parametric dynamic generalized linear models.
In this section, we present our proposal for sequential updating of DGLMs based on the k-parametric exponential family. An important aspect to highlight in our proposal is that the sequential nature of the method naturally accommodates the possibility of carrying out monitoring and interventions. We assume conditionally independent observations y_1, y_2, . . ., characterized by an observational equation in the form of Equation (2), with dynamic predictors g(η_t) = F_t'θ_t = λ_t governed by latent states that evolve through time according to the system equation θ_t = G_t θ_{t−1} + ω_t, ω_t ∼ N[0, W_t]. Let D_0 denote the initial set of information, previous to any observation. The model is completed with the prior specification θ_0 | D_0 ∼ N[m_0, C_0]. Discount factors are used to implicitly specify W_t (Ameen and Harrison, 1984). Our proposal relies on information geometry arguments, which are briefly reviewed in Subsection 3.1. The algorithm for sequential updating is detailed in Subsection 3.2 and some special cases of the formulation, for members of the exponential family, are described in Section 3.3.

In this section we present a short review of the main concepts involved in our proposed algorithm for inference in k-parametric DGLMs (k-DGLM). The central result is the projection theorem, which is applied in order to reconcile the prior and posterior distributions induced by the state vector on the linear predictors with the conjugate specification (and vice-versa). Let M be a manifold of probability distributions where p(y | η) denotes a probability density function and η ∈ E can be viewed as a coordinate system. In particular, we are interested in the manifold of exponential family probability distributions, as presented in equation (1). Here, for ease of notation, we drop the temporal indexes. The function b(ψ) is convex, called the cumulant generating function, and we are interested in the divergence induced by b(ψ).
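The discount-factor specification of W_t mentioned above admits a compact implementation: with a discount δ ∈ (0, 1], the prior scale is inflated as R_t = G_t C_{t−1} G_t'/δ, which is equivalent to setting W_t = ((1 − δ)/δ) G_t C_{t−1} G_t'. A minimal sketch (the function names are ours, not the paper's code):

```python
import numpy as np

def discounted_evolution(m_prev, C_prev, G, delta=0.95):
    """Evolution step with W_t implicitly set by a discount factor:
    a_t = G m_{t-1}, R_t = G C_{t-1} G' / delta."""
    a = G @ m_prev
    P = G @ C_prev @ G.T
    return a, P / delta  # R_t = P + W_t with W_t = (1 - delta)/delta * P
```

Smaller δ injects more evolution noise and makes the corresponding block of states adapt faster; δ = 1 recovers a static (no-noise) component.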
The Bregman divergence associated with b is defined as

D_b[ψ_p : ψ_q] = b(ψ_p) − b(ψ_q) − ∇b(ψ_q)'(ψ_p − ψ_q).

For the special case where b(ψ) is the free energy convex function of the exponential family, the Bregman divergence is the Kullback-Leibler (KL) divergence between the distributions p and q. We are now ready to present an important result for our proposed methodology, which can be viewed as a special case of Theorem 1.4 in Amari (2016). Although the theorem is stated, in general, in terms of approximations for arbitrary quantities y, we emphasize that it will be applied, throughout this work, in the sense of approximating two different prior or posterior specifications for a parametric vector.

Theorem 3.1 (projection theorem). Let p(y) be a probability distribution on a set Y and let S be an exponential family over Y. The distribution q(y) ∈ S that minimizes the divergence D_KL[p : q] is such that

E_q[H_q(y)] = E_p[H_q(y)],

where H_q denotes a vector of sufficient statistics under q. A proof of Theorem 3.1 may be seen in Appendix A.1.

In the conjugate updating schemes presented in Section 2, the probabilistic characterization of the states is given only in terms of its first and second moments. It is thus natural to go one step further and impose a normal prior distribution for the states. This specification implies a k-dimensional Gaussian distribution for the linear predictors, and we aim at minimizing the Kullback-Leibler divergence between the prior induced by this specification and the joint conjugate prior for λ_t = g(η_t). Thus, although we operate on the minimization of the divergence between two prior densities, at this point our proposed method reduces to an optimization problem on the parametric space. Specifically, we search for the parameters τ_t of the conjugate prior for η_t that minimize the KL divergence between the prior induced by the Gaussian assumption for the states and the conjugate one.
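Theorem 3.1 has a simple Monte Carlo illustration: the KL-closest Gaussian to any distribution p matches E_p of the Gaussian sufficient statistics (y, y²), i.e. it is N(mean of p, variance of p). The sketch below checks this on a two-component Gaussian mixture (the mixture weights and component moments are illustrative choices of ours, not from the paper):

```python
import random
import statistics

random.seed(0)
# Draws from p: a 0.3 * N(-1, 0.5^2) + 0.7 * N(2, 1) mixture.
draws = [random.gauss(-1, 0.5) if random.random() < 0.3 else random.gauss(2, 1.0)
         for _ in range(200_000)]

# Projection onto the Gaussian family = moment matching on (y, y^2):
mu_q = statistics.fmean(draws)       # estimates E_p[y]
var_q = statistics.pvariance(draws)  # estimates E_p[y^2] - E_p[y]^2

# Exact mixture moments, for comparison:
mu_p = 0.3 * (-1) + 0.7 * 2                           # = 1.1
var_p = 0.3 * (0.5**2 + 1) + 0.7 * (1 + 4) - mu_p**2  # = 2.665
```

Up to Monte Carlo error, (mu_q, var_q) agrees with (mu_p, var_p): the projected q matches the expected sufficient statistics exactly as the theorem prescribes, regardless of how non-Gaussian p is.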
Once the conjugate prior is completely specified and y_t is observed, the updating at time t is completed via Bayes' theorem, which results, by conjugacy, in a posterior distribution in the same parametric family as the prior, with updated parameters denoted by τ*. Notice that the Kullback-Leibler divergence does not enjoy symmetry properties (in general, D_KL[p(y) : q(y)] ≠ D_KL[q(y) : p(y)]); thus, after performing the updating step for the natural parameters, we come across another optimization problem: this time, we must obtain parameters for the k-variate normal posterior that minimize its divergence to the conjugate posterior, thus producing an updated distribution for the vector of linear predictors, λ_t. Under normality of the states, a multivariate Gaussian distribution is implied for (θ_t, λ_t | D_{t−1}), where D_{t−1} = {D_0, y_1, . . . , y_{t−1}}. Thus, by normal theory properties, the conditional (θ_t | λ_t, D_{t−1}) results in closed analytical form. The first and second moments of (θ_t | D_t) are obtained following arguments similar to those in West and Harrison (1997, p. 639). The posterior distribution at time t induces a prior for time t + 1 and the learning cycle repeats.

It is worth mentioning that, in our understanding, it makes much more sense to work on the compatibilization of the full prior or posterior densities than to restrict ourselves to the compatibilization of specifications solely in terms of a few moments. In short, in the process of compatibilizing prior or posterior distributions, it is much more reasonable to measure discrepancies between density curves, since proximities measured on the parametric space do not necessarily reflect close prior or posterior specifications.
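For reference, the conditional-expectation step that passes the updated predictor moments (f_t*, Q_t*) back to the states can be written explicitly. Under joint normality of (θ_t, λ_t | D_{t−1}), arguments as in West and Harrison (1997) give (our transcription of the linear Bayes relations; in the k-variate case Q_t^{−1} denotes the matrix inverse):

```latex
m_t = a_t + R_t F_t Q_t^{-1} \left( f_t^{*} - f_t \right),
\qquad
C_t = R_t - R_t F_t Q_t^{-1} \left( Q_t - Q_t^{*} \right) Q_t^{-1} F_t' R_t .
```

When the update is informative (Q_t* smaller than Q_t), C_t shrinks relative to the prior variance R_t, as expected.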
Nevertheless, it is also interesting to note that, although our proposal for the compatibilization of prior and posterior distributions takes into account the divergence between two densities, the optimization problem still has its support on the parametric space.

Algorithm 1 describes the steps of the information filtering process in our proposed methodology, with the following notation for the mean vectors and covariance matrices: i) prior distribution for the states: θ_t | D_{t−1} ∼ N[a_t, R_t]; ii) posterior distribution for the states: θ_t | D_t ∼ N[m_t, C_t]; iii) predictive distribution for the linear predictor: λ_t | D_{t−1} ∼ N[f_t, Q_t]; iv) posterior distribution for the linear predictor: λ_t | D_t ∼ N[f_t*, Q_t*].

Algorithm 1: Filtering and one-step-ahead prediction in DGLMs via Information Geometry
Step 1 (Evolution): Given m_{t−1}, C_{t−1} and F_t'θ_t = λ_t, compute a_t = G_t m_{t−1}, R_t = G_t C_{t−1} G_t' + W_t, f_t = F_t'a_t and Q_t = F_t'R_t F_t.
Step 2: Given f_t and Q_t, solve for τ_t:
  Step 2.1: Obtain the vector of sufficient statistics H_q(η_t) of the conjugate distribution q(η_t | τ_t).
  Step 2.2: Solve E_q[H_q(η_t)] = E_p[H_q(η_t)] in τ_t, with p the density implied by λ_t | D_{t−1} ∼ N[f_t, Q_t].
Step 3: Update the conjugate parameters using Bayes' theorem: τ_0t* = τ_0t + 1, τ_t* = τ_t + H(y_t).
Step 4: Given τ_t*, obtain (f_t*, Q_t*) by solving E_q[H_q(η_t)] = E_p[H_q(η_t)], with p now the conjugate posterior, and pass the updated information to the states, obtaining m_t and C_t.

One could also be interested in obtaining j-steps-ahead forecasts, for j = 1 : J, and smoothed estimates. Once the prior and posterior distributions of the states θ_t, t = 1 : T, have been obtained, smoothed estimates of the posterior moments, E[θ_{t−k} | D_t] and var[θ_{t−k} | D_t], k = 1, . . . , t − 1, can be evaluated. Recall that the states are normally distributed, with prior means a_t and variances R_t, and posterior means m_t and variances C_t, t = 1 : T. Similar comments apply to the j-steps-ahead forecasts: once the posterior moments of the states have been obtained and F_{t+j}, G_{t+j} are defined, it is possible to obtain j-steps-ahead distributions for (θ_{t+j} | D_t) and predictive distributions. The steps for smoothing and prediction are described in Algorithm 2.

Algorithm 2: Smoothing and J-steps-ahead Forecast Distributions
Step 1: Given a_t, R_t, m_t and C_t, obtain the smoothed posterior moments for t = 1, . . .
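Algorithm 1 can be prototyped in a few lines for the Poisson case. The sketch below is our code, not the authors' implementation: it combines the discount evolution, the closed-form Gamma projection τ_1 = 1/q, τ_2 = exp{−(f + q/2)}/q derived for the Poisson model in Section 3.3, the conjugate update, the projection back to the predictor, and a linear Bayes pass to the states; digamma and trigamma are approximated by finite differences of log-gamma to keep the sketch self-contained:

```python
import math
import numpy as np

def digamma(x, h=1e-5):
    # finite-difference digamma from log-gamma (sketch-level accuracy)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def trigamma(x, h=1e-4):
    # second central difference of log-gamma
    return (math.lgamma(x + h) - 2 * math.lgamma(x) + math.lgamma(x - h)) / h**2

def poisson_dglm_filter(ys, F, G, m0, C0, delta=0.9):
    """Sketch of Algorithm 1 for a Poisson DGLM with log link.
    Step 2 uses the closed-form Gamma projection tau1 = 1/q,
    tau2 = exp(-(f + q/2))/q; Step 4 passes the updated predictor
    moments back to the states by a linear Bayes step."""
    m, C = np.asarray(m0, float), np.asarray(C0, float)
    out = []
    for y in ys:
        # Step 1: evolution (discount factor) and predictor prior moments
        a = G @ m
        R = (G @ C @ G.T) / delta
        f = float(F @ a)
        q = float(F @ R @ F)
        # Step 2: project the N(f, q) predictor prior onto the Gamma family
        tau1 = 1.0 / q
        tau2 = math.exp(-(f + q / 2.0)) / q
        # Step 3: conjugate Bayes update
        tau1_s, tau2_s = tau1 + y, tau2 + 1.0
        # Step 4: project the Gamma posterior back to the predictor...
        f_s = digamma(tau1_s) - math.log(tau2_s)
        q_s = trigamma(tau1_s)
        # ...and pass the information to the states (linear Bayes)
        RF = R @ F
        m = a + RF * (f_s - f) / q
        C = R - np.outer(RF, RF) * (q - q_s) / q**2
        out.append((f, q, m.copy(), C.copy()))
    return out
```

On a short count series the filtered state mean settles near the log of the observed level; for other exponential-family members only the two projection steps change.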
, T.
Step 2: Given m_T, a_T, R_T, C_T, F_{t+j} and G_{t+j}, obtain the j-steps-ahead prior for the linear predictor.

Once the j-steps-ahead prior distribution for the linear predictor is obtained, it must be reconciled with the conjugate prior for the canonical parameters, and the j-steps-ahead predictive distribution for the observable Y_{t+j} | D_t is obtained directly by conjugacy properties.

In this subsection, we illustrate our proposal, summarizing the analytical developments involved in the process of information updating for some members of the exponential family: Multinomial, Normal, Poisson and Bernoulli, starting with the multinomial dynamic model, which is one of the main contributions of this work. In what follows, to simplify notation, we suppress time indices, but one must keep in mind that all the relations described below hold for each time t, t = 1, . . . , T.

Multinomial: Let η = π = (π_1, π_2, . . . , π_{d+1}) and y | π ∼ Multinom(m, π), with 0 < π_l < 1, Σ_{l=1}^d π_l ≤ 1, π_{d+1} = 1 − Σ_{l=1}^d π_l, Σ_{l=1}^d y_l ≤ m and y_{d+1} = m − Σ_{l=1}^d y_l. Thus the observational model is a member of the exponential family with c(y) = m! / Π_{l=1}^{d+1} y_l!, H(y) = y, h_l(y) = y_l and ψ_l = log(π_l / π_{d+1}), l = 1, . . . , d. The conjugate prior is a Dirichlet density q(π | τ) ∝ Π_{l=1}^{d+1} π_l^{τ_l − 1}. Consider a vector of linear predictors λ = [λ_1, . . . , λ_d]' = [log(π_1/π_{d+1}), . . . , log(π_d/π_{d+1})]' = F'θ. Therefore, using the canonical link function ψ_l = λ_l = log(π_l/π_{d+1}) and remembering that λ ∼ N_d(f, Q), a system of equations in τ follows to be solved, that is, E_q[H_q(π)] = E_p[H_q(π)], with H_q(π) = (log(π_1/π_{d+1}), . . . , log(π_d/π_{d+1}), log(π_{d+1}))', resulting in the system:

γ(τ_l) − γ(τ_{d+1}) = f_l, l = 1, . . . , d,
γ(τ_{d+1}) − γ(τ_0) = log(1 / (1 + Σ_{l=1}^d e^{f_l})) + (1/2) tr(H̄Q), with τ_0 = Σ_{l=1}^{d+1} τ_l,

with H̄ denoting the Hessian of log(1 / (1 + Σ_{l=1}^d e^{λ_l})), evaluated at f, and γ(·) the digamma function. After observing y, we obtain τ_l* = τ_l + y_l, l = 1, . . .
, d, and τ_0* = τ_0 + m, the updated parameters of the Dirichlet posterior distribution, and need to evaluate the parameters (f*, Q*) of the posterior distribution of the linear predictors which are compatible with them, solving:

f_l* = γ(τ_l*) − γ(τ_{d+1}*), l = 1, . . . , d,
Q_{l,l}* = γ'(τ_l*) + γ'(τ_{d+1}*), l = 1, . . . , d,
Q_{l,l'}* = γ'(τ_{d+1}*), l ≠ l',

with γ'(·) denoting the trigamma function.

Normal: Let η = (µ, φ) and (y | µ, φ) ∼ Normal(µ, φ), µ ∈ R, φ > 0, with φ denoting the precision. Thus the observational model is a member of the exponential family with c(y) = (2π)^{−1/2}, H(y) = (y, y²) and ψ = (µ, log(φ)). The conjugate prior is a Normal-Gamma density, hierarchically structured as µ | φ ∼ N[µ_0, (c_0 φ)^{−1}] and φ ∼ Gamma[n_0/2, d_0/2]. Consider the vector of linear predictors λ = (λ_1, λ_2)' = (µ, log φ)' = F'θ. H_q = (φµ², φµ, φ, log φ) is a vector of sufficient statistics for Ψ_q = (−c_0/2, c_0µ_0, −(c_0µ_0² + d_0)/2, (n_0 − 1)/2) = (τ_1, τ_2, τ_3, τ_0), and we highlight this reparametrization. Since λ ∼ N_2(f, Q), a system of equations in τ follows to be solved, that is, E_q[H_q] = E_p[H_q], with q denoting a Normal-Gamma prior distribution and p the density implied by the joint Gaussianity of the linear predictors, resulting in a system of four equations involving the digamma function γ(·). After observing y, we obtain τ* = (τ_1*, τ_2*, τ_3*, τ_0*) = (τ_1 − 1/2, τ_2 + y, τ_3 − y²/2, τ_0 + 1/2), the updated canonical parameters of a Normal-Gamma posterior density for (µ, φ), and need to evaluate the parameters (f*, Q*) of the posterior distribution of the linear predictors which are compatible with them, solving E_q[H_q] = E_p[H_q], considering now that p is a Normal-Gamma density for (µ, φ) and q is a bivariate normal for (µ, log φ). In this direction, H_q = (λ, λλ') and the system reduces to f* = E_p[λ] and Q* = E_p[λλ'] − f*f*'.

In the following, we present two univariate particular cases, which are usual in practical applications: a Poisson dynamic model, adopted for quarterly sales in Section 4.1, and a Binomial dynamic model. Since these are uniparametric exponential families, West et al.
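The Dirichlet-to-Normal direction above is fully closed-form, since the log-ratios log(π_l/π_{d+1}) of a Dirichlet vector have known digamma/trigamma moments. A sketch (our helper names; digamma and trigamma are approximated by finite differences of math.lgamma to stay self-contained):

```python
import math

def digamma(x, h=1e-5):
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def trigamma(x, h=1e-4):
    return (math.lgamma(x + h) - 2 * math.lgamma(x) + math.lgamma(x - h)) / h**2

def dirichlet_to_normal(tau):
    """Posterior moments (f*, Q*) of the predictors lambda_l =
    log(pi_l / pi_{d+1}) implied by a Dirichlet(tau_1, ..., tau_{d+1}):
    f*_l = digamma(tau_l) - digamma(tau_{d+1});
    Q*_{ll} = trigamma(tau_l) + trigamma(tau_{d+1});
    Q*_{lk} = trigamma(tau_{d+1}) for l != k."""
    d = len(tau) - 1
    f = [digamma(tau[l]) - digamma(tau[d]) for l in range(d)]
    Q = [[trigamma(tau[d]) + (trigamma(tau[l]) if l == k else 0.0)
          for k in range(d)] for l in range(d)]
    return f, Q
```

The shared trigamma(τ_{d+1}*) off-diagonal term captures the negative dependence among categories induced by the common baseline category d + 1.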
(1985) is an alternative approach which we do not expect to produce results, or computational times, that differ significantly from ours.

Poisson: Let y | η ∼ Po(η), η > 0, a member of the exponential family with c(y) = 1/y!, h(y) = y, ψ = ψ(η) = log(η) and b(ψ) = exp(ψ). The conjugate prior is a Gamma density q(η | τ) = (1/η) exp[τ_1 log(η) − τ_2 η − (log Γ(τ_1) − τ_1 log(τ_2))]. Therefore, using the canonical link function λ = log(η) and remembering that λ ∼ N[f, q], a system of equations in τ follows to be solved, that is, E_q[H_q(η)] = E_p[H_q(η)], with H_q(η) = (log(η), η), resulting in the system: γ(τ_1) − log(τ_2) = f and τ_1 = τ_2 exp(f + q/2). Using an approximation for the digamma function results in: τ_1 = q^{−1} and τ_2 = q^{−1} exp[−(f + q/2)], the parameters of the unique conjugate Gamma prior compatible with the normal linear predictor. After observing y, we obtain τ_1* = τ_1 + y and τ_2* = τ_2 + 1, the updated parameters of the Gamma posterior distribution, and need to obtain the parameters (f*, q*) of the posterior distribution of the linear predictor compatible with them, solving: f* = γ(τ_1*) − log(τ_2*) and q* + (f*)² = γ'(τ_1*) + [γ(τ_1*) − log(τ_2*)]², which leads to the solution: f* = γ(τ_1*) − log(τ_2*) and q* = γ'(τ_1*).

Bernoulli: Let η = π and y | π ∼ Ber(π), 0 < π < 1, a member of the exponential family with c(y) = 1, h(y) = y, ψ = ψ(π) = logit(π) = log(π/(1 − π)) and b(ψ) = log(1 + e^ψ). The conjugate prior is a Beta density q(π | τ) = exp{τ_1 log(π/(1 − π)) + τ_0 log(1 − π) − [log Γ(τ_1 + 1) + log Γ(τ_0 − τ_1 + 1) − log Γ(τ_0 + 2)]}. Therefore, using the canonical link function λ = logit(π), remembering that λ ∼ N[f, q] and taking a second-order Taylor expansion of log(1 − π), a system of equations in τ follows to be numerically solved, that is, E_q[H_q(π)] = E_p[H_q(π)], with H_q(π) = (log(π/(1 − π)), log(1 − π)), resulting in the system: γ(τ_1 + 1) − γ(τ_0 − τ_1 + 1) = f and γ(τ_0 − τ_1 + 1) − γ(τ_0 + 2) = log(1/(1 + e^f)) − q e^f / (2(1 + e^f)²).
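The closed-form Poisson projection above is easy to check numerically: the pair (τ_1, τ_2) = (1/q, exp{−(f + q/2)}/q) matches E[η] exactly, and E[log η] up to the digamma approximation γ(x) ≈ log(x) − 1/(2x). A sketch (our names, with a finite-difference digamma from math.lgamma):

```python
import math

def digamma(x, h=1e-5):
    # finite-difference digamma from log-gamma (sketch-level accuracy)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def gamma_prior_from_predictor(f, q):
    """Gamma(tau1, tau2) projection of a N(f, q) prior on
    lambda = log(eta), per the closed form in the text."""
    tau1 = 1.0 / q
    tau2 = math.exp(-(f + q / 2.0)) / q
    return tau1, tau2

tau1, tau2 = gamma_prior_from_predictor(1.0, 0.05)
# E[eta] matches exactly: tau1 / tau2 == exp(f + q/2);
# E[log eta] = digamma(tau1) - log(tau2) is close to f = 1.0.
```

The mean constraint holds exactly by construction, while the log-mean constraint is met to the accuracy of the digamma approximation, which improves as q shrinks (i.e. as τ_1 grows).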
After observing y, we have τ_1* = τ_1 + y and τ_0* = τ_0 + 1, the updated parameters of the Beta posterior distribution, and need to evaluate the parameters (f*, q*) of the posterior distribution of the linear predictor compatible with them, solving: f* = γ(τ_1* + 1) − γ(τ_0* − τ_1* + 1) and q* = γ'(τ_1* + 1) + γ'(τ_0* − τ_1* + 1), with γ'(·) denoting the trigamma function. The next section presents two real data applications of our proposal.

This section aims to illustrate the proposed methodology. Our main goal is to show that our method can capture stochastic model components, such as seasonality and trends, for the k-parametric exponential family. We present two illustrative examples. In the first one, a classical time series on quarterly sales is modeled using a Poisson dynamic model with a growth trend and a stochastic seasonal pattern, showing advantages of our proposal over alternatives based on locally constant level models. Next, we illustrate the proposed methodology with a normal model in the context of stochastic volatility, with dynamic predictors for both mean and precision, applied to monthly IBM stock log-returns.

In this subsection, we adopt a Poisson dynamic model to adjust and predict a time series on quarterly sales, which is known in the literature for exhibiting a growth trend and a stochastic seasonal pattern. The data are exhibited as points in Figure 1. The purpose of using this series is to show the efficiency of the proposed method, which deals, in reduced computational time and preserving the sequential aspect of the analysis, with predictive structures that can be based on several dynamic components, allowing to accommodate various stochastic patterns often present in real time series. We fit the data using a Poisson log-linear model with a linear growth term and two pairs of harmonics for a Fourier description of the seasonal pattern, with harmonic evolution blocks

G_k = [ cos(kw)  sin(kw) ; −sin(kw)  cos(kw) ],

w = 2π/4 and k = 1, 2.
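The seasonal part of the evolution matrix G_t described above is block-diagonal, with one rotation block per harmonic. A sketch of its construction (our helper names, not code from the paper):

```python
import math
import numpy as np

def harmonic_block(k, w):
    """Rotation block for the k-th harmonic of a Fourier seasonal component."""
    c, s = math.cos(k * w), math.sin(k * w)
    return np.array([[c, s], [-s, c]])

def seasonal_evolution(w, harmonics):
    """Block-diagonal seasonal evolution matrix G."""
    n = 2 * len(harmonics)
    G = np.zeros((n, n))
    for i, k in enumerate(harmonics):
        G[2*i:2*i+2, 2*i:2*i+2] = harmonic_block(k, w)
    return G

G = seasonal_evolution(2 * math.pi / 4, [1, 2])  # quarterly: w = 2*pi/4
```

Each block rotates its pair of states by kw per period, so G raised to the fourth power returns to the identity, reproducing the quarterly cycle; combined with discount factors, this lets the seasonal amplitude drift stochastically over time.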
We fit the same model through the alternative Conjugate Updating approach by West et al. (1985). The prior specification, for both approaches, was defined as follows: in our proposal, we assumed $\theta_0|D_0 \sim N(m_0 = 0, C_0 = 1)$ and used a diagonal block discount factor matrix to indirectly specify the covariance evolution $W_t$. The Conjugate Updating approach was run with the same prior first and second order moments. Both procedures initiate with a posterior specification for $\theta_0|D_0$, which is naturally updated to the prior for $\theta_1|D_0$; this (complete or partial) prior specification then induces a prior structure on the predictor, which must be reconciled with the conjugate prior for the Poisson family, a Gamma distribution for $\eta_t$ (equivalently, a log-Gamma prior for the predictor), whose parameters are inherited from the prior compatibility step following the specific method recommended by each approach: matching moments or minimizing the KL divergence between prior densities. We also compare our proposal to the one by Gamerman et al. (2013), which unifies Smith and Miller (1986) and Harvey and Fernandes (1989), introducing the Gamma Family Dynamic Models (GFDM), in which a local steady structure based on a dynamic level is coupled to a predictor with time-fixed coefficients. The evolution rule follows Smith and Miller (1986). GFDMs form an elegantly defined class for a broad family of observable models (Poisson, Gamma, Weibull, Normal, among others), some of which lie outside the exponential family. The local level proposal by Gamerman et al. (2013) does not allow for dynamics on components other than the level. In order to adopt a model structure as close as possible to the one that we fit via our proposal, we defined the following regressors, incorporating a growth trend and a seasonal pattern in an alternative model: $x_1 = t$, $x_2 = \cos(\omega t)$, $x_3 = \sin(\omega t)$, $x_4 = \cos(2\omega t)$, where $t = 1, \ldots, 35$ and $\omega = 2\pi/4$. We apply Gamerman et al.
(2013)'s local level approach to the following model: The prior specification for model 3 was defined as follows: we assigned a Gamma(0.01, 0.01) prior for $\eta_0$; a Uniform(0, 1) prior distribution was assigned to the uncertainty inflation parameter, which is responsible for the smoothness of the dynamic level evolution (see Gamerman et al. (2013) for details on the evolution of the states); and a Normal(0, 100) prior was assigned to $\beta_k$, $k = 1, \ldots$ The method proposed by Gamerman et al. (2013), which involves obtaining samples of the posterior distributions through MCMC methods, demanded more than a hundred times the computational time necessary to obtain estimates from our proposed method. On the other hand, our method and the Conjugate Updating, in addition to having the advantage of preserving the sequential aspect of the analysis, both took less than 1 second to obtain the model's estimates. As can be seen in Figure 1, one-step predictions obtained via our proposal are very close to the ones obtained through the Conjugate Updating method, as expected, since both methods are able to capture the change in the seasonal pattern, without any intervention. The local level model, however, did not capture the change in the seasonal pattern. If we analyze the structural decomposition of the fitted models, it may be seen, in Figure 2(a), that the level estimates obtained through the three approaches are very close. However, the estimates obtained through the local level model are lower than those obtained using our proposal and the Conjugate Updating, and we need to keep in mind that these last two were based on the same model 3, which allowed for temporal evolution for the level as well as for the growth trend.

Figure 1: One-step-ahead predictions via information geometry, West et al. (1985) and Gamerman et al. (2013), along the observed series (points). The solid line represents our proposal, the dotted line represents the method proposed by West et al. (1985) and the dashed line represents the method proposed by Gamerman et al. (2013).

An aspect which is important to highlight in this example is that, when we analyze the smoothed estimates of the seasonal pattern, as may be seen in Figure 2(b), the local level model was not able to capture the changes in the seasonal pattern through time, since it does not allow for dynamic components to deal with them when model 3 was fit to the data. Almost indistinguishable from our proposal in this particular application, the Conjugate Updating is flexible enough to accommodate several dynamic components in a model's predictive structure and is adequate to fit model 3, but, unlike our proposal, which naturally deals with k-parametric exponential families, the Conjugate Updating is restricted to uni-parametric ones. For a Poisson family, we did not expect significant differences between our proposal and the one by West et al. (1985), both in terms of capturing stochastic changes in structural patterns and in computational times. On the other hand, as shown by our results, depending on the behavior of the data, local level approaches like the one by Gamerman et al. (2013), which assign all the dynamics in the predictor to a single component, may lack flexibility to capture stochastic changes in the patterns exhibited by the time series. In what follows, we focus on applications involving k-parametric exponential families, which are not accommodated by West et al. (1985)'s Conjugate Updating approach.

In financial applications, stockholders often face decision problems that usually depend on measures of volatility as a measure of risk. Therefore, stochastic volatility (SV) models are useful tools for modelling time-varying variances.
The foundations of SV models are strongly related to financial economic theories and, from a practical point of view, they are able to appropriately capture the main empirical properties often observed in daily series of financial returns. Many approaches assume that the log returns follow a zero-mean normal distribution with a time-varying variance. Therefore, the log of the squared return follows the model: $\log(y_t^2) = h_t + a_t$, where $h_t$ evolves through time as a first-order DLM and $a_t$, a log-$\chi^2$ error, is approximated by a mixture of normal distributions (Chib et al., 2009). Our goal is to illustrate the proposed methodology for time series of normally distributed data, with unknown time-varying mean and variance, which is a member of the biparametric exponential family. Let $y_t \sim N(\mu_t, \phi_t^{-1})$, with dynamic predictors assigned to both the mean and the precision, where $\gamma$ is an autoregressive coefficient; $y_t$ is thus modelled as a 2-parameter DGLM. The proposed model will be applied to IBM's monthly log-return series, which may be found in Tsay (2005). There were 876 observations in total, dating back to January 1926 and ending in December 1999. The average in Figure 3(a) is about zero, which is consistent with the literature. Its variations correspond to the series' movements, indicating a good match. The volatility result (Figure 3(b)) is also consistent with what has been found in the literature. The same application example may be found in Souza et al. (2016). Their approach ends up with a system in which there are more equations than parameters in the normal case, and the authors propose using the Generalized Method of Moments (GMM) to solve it. Although efficient, the GMM methodology adds yet another approximation method to the estimating process, with its own set of processing overhead.
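The log-squared transform behind the classical SV formulation can be illustrated with simulated data; a minimal sketch in which $\gamma = 0.95$ and the innovation scale are illustrative values, not estimates from the IBM series:

```python
import numpy as np


def simulate_sv(T=20000, gamma=0.95, sigma_h=0.2, seed=0):
    """Simulate zero-mean returns with AR(1) log-volatility h_t and apply
    the transform log(y_t^2) = h_t + a_t, with a_t a log-chi^2_1 error."""
    rng = np.random.default_rng(seed)
    h = np.zeros(T)
    for t in range(1, T):
        h[t] = gamma * h[t - 1] + sigma_h * rng.normal()
    y = np.exp(h / 2.0) * rng.normal(size=T)  # y_t | h_t ~ N(0, exp(h_t))
    z = np.log(y ** 2)                        # z_t = h_t + a_t
    return h, y, z
```

The error $a_t = \log(y_t^2) - h_t$ has mean $\mathrm{digamma}(1/2) + \log 2 \approx -1.27$ and variance $\pi^2/2$, which is precisely what motivates approximating it by a mixture of normals.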
We do not face the same problem here, which is an advantage. We consider hospital admissions counts for three age groups, aiming to capture the patterns associated with each one: children under 5 years old (which we refer to, from now on, as the early-childhood group), adults over 50 years old (which we refer to, from now on, as the senior group) and others. Our main interest here is to understand the behavior of the number of admissions for the first two groups, since both children and elderly individuals are more susceptible to severe effects of diarrhoea and gastroenteritis. Both time series are plotted as gray dots in Figure 4. Let $Y_t|\eta_{1t}, \eta_{2t} \sim \mathrm{Multinomial}(\eta_{1t}, \eta_{2t}, \eta_{3t})$, where $\eta_{3t} = 1 - \eta_{1t} - \eta_{2t}$, and assume the same predictive structure for each category, so that $F_t = \mathrm{blockdiag}[F_{1t}, F_{2t}]$ and $G_t = \mathrm{blockdiag}[G_{1t}, G_{2t}]$. Note also that we assigned discount factors for trend and seasonality, respectively 0.95 and 0.975. The odds for the early-childhood group slowly grow, tending to nearly stabilize until the beginning of the pandemic period, when they drop to a level such that there is no significant difference to the reference group. On the other hand, the senior group presented a generally increasing pattern in the odds for hospital admissions, if compared to the reference group, and finally no significant difference in relation to this group from the beginning of the Covid-19 pandemic on. Figure 7 exhibits the smoothed growth factors for the levels of both groups and highlights our previous comments on the level variations along the analysis period, with negative growth factors associated to decaying trends and positive growth factors associated to growing trends. Once again it is worth noticing that not only the levels of the processes varied through time, but also their growth factors.
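The block-diagonal structure with per-block discount factors can be sketched as follows. The trend and seasonal block sizes and the seasonal period are illustrative assumptions; only the discounts (0.95 for trend, 0.975 for seasonality) come from the text:

```python
import numpy as np
from scipy.linalg import block_diag


def discounted_W(G, C_prev, blocks, deltas):
    """W_t implied by block discount factors: each diagonal block of
    P = G C_{t-1} G' is inflated by (1 - delta) / delta."""
    P = G @ C_prev @ G.T
    W = np.zeros_like(P)
    for idx, d in zip(blocks, deltas):
        ix = np.ix_(idx, idx)
        W[ix] = (1.0 - d) / d * P[ix]
    return W


# Same trend + seasonal structure replicated for the two modelled categories
G_trend = np.array([[1.0, 1.0], [0.0, 1.0]])
w = 2 * np.pi / 52                              # illustrative seasonal period
G_seas = np.array([[np.cos(w), np.sin(w)], [-np.sin(w), np.cos(w)]])
G_cat = block_diag(G_trend, G_seas)
G_t = block_diag(G_cat, G_cat)                  # blockdiag[G_1t, G_2t]
blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]       # trend/seasonal per category
deltas = [0.95, 0.975, 0.95, 0.975]
W_t = discounted_W(G_t, np.eye(8), blocks, deltas)
```

With a larger discount (0.975 for seasonality), less uncertainty is injected per step, so the seasonal pattern evolves more smoothly than the trend.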
The smoothed seasonal pattern (Figure 8) clearly shows that there was significant variation in the seasonal pattern during the analysis period and that, by allowing for dynamics on various structural components, our formulation was able to capture such changes.

In this work, we proposed a methodology for sequential information updating in dynamic generalized linear models, for univariate and multivariate responses in the k-parametric exponential family. The proposed method naturally preserves desirable aspects of time series analysis, such as monitoring and the possibility of intervention, when strategically needed. It produces filtered and smoothed distributions as well as forecasts for future values of the response, in extremely reduced computational time, which makes it an attractive alternative in situations that demand real-time analysis and timely decision-making. The proposal was detailed for the particular cases: Multinomial (multivariate, k-parametric), Bernoulli and Poisson (univariate, uniparametric) and Normal with a dynamic linear predictor for mean and variance (univariate, biparametric). The development relies on conjugacy properties associated with the exponential family and on the Projection Theorem, used to reconcile conjugate prior and posterior distributions for the canonical parameter vector with the distributions induced by the assumption of normality for the states that govern the dynamics of the linear predictor. Such predictors easily accommodate stochastic trends, seasonality and dynamic effects of covariates. We are currently working on adapting the method for Dirichlet compositional responses, as well as on its application to multivariate normal outputs in the context of stochastic volatility models.

Assume the canonical form of the exponential family as presented before. It is well known that minimizing $D_{KL}[p : q]$ is equivalent to maximizing $E_p[\log q(y)]$.
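The equivalence between minimizing $D_{KL}[p:q]$ and maximizing $E_p[\log q(y)]$ can be checked numerically; a minimal sketch with a toy choice $p = N(2, 1)$ and candidate family $q_m = N(m, 1)$ (assumed for illustration, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(2.0, 1.0, size=200_000)   # draws from p = N(2, 1)


def expected_log_q(m):
    # Monte Carlo estimate of E_p[log q_m(y)] for q_m = N(m, 1)
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (sample - m) ** 2)


grid = np.linspace(0.0, 4.0, 401)
best_m = grid[np.argmax([expected_log_q(m) for m in grid])]
```

Since $E_p[\log p(y)]$ does not depend on $m$, the maximizer of $E_p[\log q_m(y)]$ is exactly the KL minimizer; here it recovers $m \approx 2$, i.e. moment matching.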
From the definition of the exponential family, it follows that $l(\psi_q) = \int p(y)\log q(y)\,dy = E_p[\log c(y)] + \psi_q E_p[H_q(y)] - b(\psi_q)$. Differentiating $l(\psi_q)$ with respect to $\psi_q$ and equating this derivative to zero, one obtains the parameters of the distribution $q(y)$ that best approximates the distribution $p(y)$; that is, the following system must be solved: $E_q[H_q(y)] = E_p[H_q(y)]$.

References
Stochastic volatility in mean models with scale mixtures of normal distributions and correlated errors: a Bayesian approach
Information geometry and its applications
Discount weighted estimation
Bayesian Theory
Bayesian forecasting of many count-valued time series
Bayesian forecasting of multinomial time series through conditionally Gaussian dynamic models
On Gibbs sampling for state space models
Multivariate stochastic volatility
Dynamic Bayesian beta models
A simple and efficient simulation smoother for state space time series analysis
Posterior mode estimation by extended Kalman filtering for multivariate dynamic generalized linear models
Penalized likelihood estimation and iterative Kalman smoothing for non-Gaussian dynamic regression models
Data augmentation and dynamic linear models
Sampling from the posterior distribution in generalized linear mixed models
Markov chain Monte Carlo for dynamic generalised linear models
A non-Gaussian family of state-space models with exact marginal likelihood
Time series of continuous proportions
Bayesian forecasting
Time series models for count or qualitative observations
Non-Gaussian state-space modeling of nonstationary time series (with discussion)
Revisiting DLM for p-dimensional exponential family
Dynamic models
Generalized linear models
CODA: convergence diagnosis and output analysis for MCMC
Time series analysis of compositional data using a dynamic linear model approach
Compositional time series analysis of mortality proportions
Likelihood analysis of non-Gaussian measurement time series
State space modelling of cross-classified time series of counts
A generalization of the Bayesian steady forecasting model
A non-Gaussian state space model and application to prediction of records
Extended dynamic generalized linear models: the two-parameter exponential family
Finding market structure by sales count dynamics: multivariate structural time series models with hierarchical structure for count data
Analysis of financial time series
Bayesian Forecasting and Dynamic Models
Dynamic generalized linear models and Bayesian forecasting