authors: Levy, Bruno P. C.; Lopes, Hedibert F.
title: Dynamic Portfolio Allocation in High Dimensions using Sparse Risk Factors
date: 2021-05-13

We propose a fast and flexible method to scale multivariate return volatility predictions up to high dimensions using a dynamic risk factor model. Our approach increases parsimony via time-varying sparsity on factor loadings and is able to sequentially learn whether parameters and volatilities should be constant or time-varying. In a dynamic portfolio allocation problem with 452 stocks from the S&P 500 index, we show that our dynamic risk factor model produces more stable and sparse predictions, achieving not just considerable portfolio performance improvements but also higher utility gains for the mean-variance investor compared to the traditional Wishart benchmark and to passive investment in the market index.

Portfolio allocation is one of the most common problems in finance. Since the seminal work of Markowitz (1952), mean-variance optimization has been the most traditional way to select stocks in the financial industry and is commonly applied in the academic literature. The covariance matrix of returns is the key input to generate optimal portfolio weights, which makes its forecast accuracy crucial for out-of-sample portfolio performance. However, the universe of assets available for allocation is vast nowadays, increasing the dimension of such covariance matrices potentially to hundreds or even thousands of stocks. Because the number of parameters in a covariance matrix grows quadratically with the number of assets, the traditional Markowitz problem runs straight into the curse of dimensionality. Since large covariance matrices are extremely susceptible to estimation errors and instabilities, producing poor out-of-sample predictions for portfolio construction, we propose what we call a Dynamic Risk Factor Dependency Model (DRFDM). By using economically motivated risk factors and inducing time-varying sparsity on factor loadings, our approach achieves higher model parsimony, dramatically reducing the parameter space and improving predictions for final portfolio decisions. The DRFDM combines a factor structure with sparsity in a conjugate and sequential fashion without the use of MCMC schemes, making the estimation process much faster and allowing investors to backtest a universe of hundreds of assets in a matter of a few minutes.

The use of factor models in the financial literature is not new. With different applications in the asset pricing literature, since the CAPM of Sharpe (1964), the APT of Ross (1976) and the seminal work of Fama and French (1992), these models consider that all systematic variation in returns is driven by a set of common factors, which can be observable financial indices or unknown latent variables. This framework is commonly used for evaluating return anomalies and portfolio manager performance, but also for portfolio construction. Our paper focuses on the use of observable risk factor models to build optimal portfolios. To illustrate the idea behind factor models, consider an exact K-risk-factor model for an N-dimensional vector of asset returns:

r_t = B_t f_t + ε_t,   ε_t ~ N(0, Ω_t),     (1)

where B_t is a N × K matrix of factor exposures (loadings) to the K risk factors f_t and Ω_t = diag(σ²_{1t}, . . . , σ²_{Nt}).
If Var(f_t) = Σ_{f,t}, the model in (1) implies an unconditional variance for returns of

Σ_{r,t} = B_t Σ_{f,t} B_t' + Ω_t,     (2)

which is divided between a systematic component, related to factor exposures and factor covariances, and an idiosyncratic component for each individual asset return. Note that using K ≪ N implies a strong reduction in the total number of parameters to be estimated in Σ_{r,t}. When latent factor models are considered, f_t is treated as an unobserved variable estimated from the data. Some important references are Aguilar and West (2000), Han (2006), Lopes and Carvalho (2007), Carvalho, Lopes, and Aguilar (2011), Zhou, Nakajima, and West (2014), Kastner, Frühwirth-Schnatter, and Lopes (2017) and Kastner (2019). Both Carvalho et al. (2011) and Zhou et al. (2014) explore the notion of time-varying sparsity, with factor loadings being equal to zero in different periods of time. Kastner (2019) induces static sparsity through shrinkage priors on factor loadings, pulling coefficients toward zero. Although his approach induces greater parsimony, shrinkage priors do not force coefficients to be exactly zero, so a portion of estimation uncertainty is still carried into the covariance matrix. The major drawback of latent factor models is that they require MCMC schemes to simulate from joint posteriors, imposing a great computational burden when scaling models up to high dimensions. The problem is intensified by the fact that strategies in quantitative finance require a sequential analysis for backtests, so the MCMC needs to be repeated for each period of time, which can take not just several days but weeks to complete. Therefore, latent factor models are prohibitive for sequential analysis in high dimensions (see Gruber and West, 2016, for a deeper discussion on model scalability). In order to achieve fast sequential analysis and higher model flexibility, instead of estimating latent factors we include observable risk factors commonly used in the financial literature to represent the common movements of returns. We use the traditional five factors of Fama and French (2015) as the main representation, also showing results for different subsets of risk factors. Several papers in the literature have already addressed the estimation of the covariance matrix of returns using observable risk factors. Recent key references include Wang, Reeson, and Carvalho (2011), Brito, Medeiros, and Ribeiro (2018), Puelz, Hahn, and Carvalho (2020) and De Nard, Ledoit, and Wolf (2020). The main advantage of the DRFDM relative to those papers is its ability to take model uncertainty into account in a dynamic fashion across different model settings for individual asset returns. As we explain in the next section, the DRFDM uses what has become known in the recent econometric literature as the "Decouple/Recouple" concept. The basic idea is to decouple the multivariate dynamic model into several customized univariate dynamic linear models (DLMs) that can be solved in parallel and then recoupled for forecasting and decisions. It is strictly related to the popular Cholesky-style multivariate stochastic volatility of Lopes, McCulloch, and Tsay (2021), Shirota, Omori, Lopes, and Piao (2017) and Primiceri (2005), and is also applied in a similar fashion in Zhao, Xie, and West (2016), Fisher, Pettenuzzo, Carvalho et al. (2020), Lavine, Lindon, West et al. (2020) and Levy and Lopes (2021a).
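To make the dimension reduction implied by Equation (2) concrete, the following Python sketch (purely synthetic sizes and values, not the paper's estimates; all names are ours) assembles the factor-implied covariance and compares its number of free parameters with that of an unrestricted covariance matrix.

```python
import numpy as np

# Synthetic illustration of Equation (2): Sigma_r = B Sigma_f B' + Omega.
N, K = 452, 5                                     # assets and risk factors (K << N)
rng = np.random.default_rng(0)

B = rng.normal(0.0, 0.5, size=(N, K))             # factor loadings (N x K)
A = rng.normal(0.0, 0.1, size=(K, K))
Sigma_f = A @ A.T + 0.05 * np.eye(K)              # positive-definite factor covariance (K x K)
Omega = np.diag(rng.uniform(0.01, 0.09, N))       # diagonal idiosyncratic variances

# Systematic plus idiosyncratic decomposition of the return covariance
Sigma_r = B @ Sigma_f @ B.T + Omega

# Free parameters: full covariance versus the factor structure
full_cov = N * (N + 1) // 2                       # 102,378 for N = 452
factor_cov = N * K + K * (K + 1) // 2 + N         # 2,727 for N = 452, K = 5
print(Sigma_r.shape, full_cov, factor_cov)
```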
This framework allow us to take the model uncertainty problem into the univariate context, making highly flexible dynamic model choices. Since it is well known that the environment of the economy and the financial market is continuously changing, model flexibility becomes a extremely appealing feature to be incorporated nowadays. The common patterns in stock returns in the 90s are different from those during the Great Financial Crisis or during the recent Covid-19 pandemic. Factor exposures can be lower or higher, depending on calm or stressed periods. Additionaly, we can consider the fact that for some periods a subset of stock returns are not loading on specific risk factors. This motivates our work to impose time-varying sparsity, where inspired by the works of Raftery, Kárnỳ, and Ettler (2010) , Dangl and Halling (2012) and Koop and Korobilis (2013) we use a Dynamic Model Selection (DMS) approach to sequentially select risk factors via dynamic model probabilities, where a risk factor is included if it is empirically wanted. We also consider a model space that is determined not just by different risk factors, but also by different degrees of variation in factor loadings, return volatilities and factor volatilities. Therefore, the DRFDM can adapt to environments of higher, lower or no variation in coefficients, for each specific entry of matrices in Equation (2). Hence, one specific asset return can be much more volatile than others and some risk factors can vary differently over time, also moving from constant to time-varying parameters. This last setting has a similar flavour of parsimony in the sense of Lopes et al. (2021) . Dynamic risk factor selection also offers an important benefit in terms of model parsimony since it imposes factor loadings of non-selected factors to be exactly equal to zero. It tends to induce much higher parsimony compared to continuous shrinkage priors applied in Kastner (2019) and many other papers nowadays. As argued in Huber, Koop, and Onorante (2020) and Hauzenberger, Huber, and Onorante (2020) , continuous shrinkage priors offer a lower bound of accuracy to be achieved and, for highly parametrized models, parameter uncertainty over-inflate predictive variances. Finally, an additional contribution of DRFDM to the literature relies on its simple and fast computation. Since we rely on a conjugate model, with closed-form solutions for posterior distributions and predictive densities, there is no need for expensive MCMC methods, making the whole process extremely fast and easily scalable for high-dimensions. As we show in the empirical section, we perform a portfolio allocation procedure using data covering both the Great Financial Crisis and the initial stage of the Covid-19 pandemic with almost 500 stock returns from the S&P 500 index that can be backtested in just few minutes. We compare results with different specification choices and the traditional Wishart Dynamic Linear Model (W-DLM) as a benchmark. The W-DLM has been a standard model in the Bayesian financial time series and in the financial industry, because of its scalability and availability of sequential filtering. Our results show that the DRFDM is able to produce not just much stronger statistical improvements but substantial increase in Sharpe Ratios and risk reduction. 
We also show that a mean-variance investor would be willing to pay a considerable management fee to switch to the Dynamic Risk Factor Dependency Model from the W-DLM benchmark, from several well-known estimation methods in the literature, and from passive investment in the S&P index.

The remainder of the paper is organized as follows. The general econometric framework is introduced in Section 2. Section 3 details the time-varying model and factor selection approach and how it can be applied to impose sparsity. In Section 4 we perform our empirical analysis, providing an out-of-sample statistical and economic performance evaluation in a high-dimensional environment. Section 5 concludes.

As mentioned in Section 1, our work is inspired by the Cholesky-style framework in Lopes et al. (2021) and Primiceri (2005), being closely related to the Dynamic Dependency Network Model of Zhao et al. (2016). The great advantage of this framework is that it models the cross-sectional contemporaneous relations among different series while allowing customized univariate DLMs. Let r_t be an N-dimensional vector of asset return time series r_{j,t} and consider the following dynamic system:

r_t = α_t + B_t r_t + ε_t,   ε_t ~ N(0, Ω_t),     (3)

where α_t is an N-dimensional vector of time-varying intercepts and Ω_t = diag(σ²_{1t}, . . . , σ²_{Nt}). All contemporaneous relations among different asset returns come from the N × N matrix B_t, whose off-diagonal elements β_{ji,t} (for j ≠ i) capture the dynamic contemporaneous relationship between series j and i at time t; B_t has zeroes on the main diagonal. In the work of Lopes et al. (2021), Zhao et al. (2016) and Levy and Lopes (2021a), B_t is a lower triangular matrix with zeroes on and above the main diagonal:

B_t = [ 0          0          · · ·  0              0
        β_{21,t}   0          · · ·  0              0
        ⋮          ⋮          ⋱      ⋮              ⋮
        β_{N1,t}   β_{N2,t}   · · ·  β_{N,N−1,t}    0 ]     (4)

Since the error terms ε_t are contemporaneously uncorrelated, the triangular contemporaneous dependencies among asset returns in Equation (4) generate a fully recursive system, known as a Cholesky-style framework (West, 2020). Hence, each equation j of the system has its own set of parents (r_{pa(j),t}), that is, it depends contemporaneously on all asset returns above equation j, following the triangular format in Equation (4). In words, the top asset return in the system has no parents, the second return from the top has the first return series as a parent and loads on it, the third return has the first two returns as parents and loads on them, and so on, all the way down to the last asset return, which depends on all other N − 1 returns above it. Equation (3) can be rewritten in reduced form as

r_t = A_t α_t + u_t,     (5)

where A_t = (I_N − B_t)^{−1} and u_t = A_t ε_t. The modified Cholesky decomposition clearly appears in Σ_t = A_t Ω_t A_t', which is now a full variance-covariance matrix capturing the contemporaneous relations among the N asset returns. Given the parental triangular structure of B_t in (4), the equations are conditionally independent, bringing in the "Decoupled" aspect of the multivariate model. In other words, the multivariate model can be viewed as a set of N conditionally independent univariate DLMs that can be handled in a parallelizable fashion. The outputs of each equation are then used to compute B_t and Ω_t, hence recovering the full time-varying covariance matrix Σ_t. Although the model above offers great flexibility, it remains highly parameterized and susceptible to producing poor out-of-sample forecasts.
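As a minimal illustration of the recoupling step behind the reduced form in Equation (5), the Python sketch below (synthetic values, small N; variable names are ours) builds a lower-triangular B_t as in (4), forms A_t = (I_N − B_t)^{−1} and recovers the full covariance Σ_t = A_t Ω_t A_t' from the idiosyncratic variances of the decoupled univariate models.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4                                              # small synthetic cross-section

# Lower-triangular contemporaneous dependencies: zeroes on and above the diagonal
B_t = np.tril(rng.normal(0.0, 0.3, size=(N, N)), k=-1)

# Diagonal idiosyncratic variances, one per decoupled univariate equation
Omega_t = np.diag(rng.uniform(0.01, 0.05, N))

# Recouple: modified Cholesky decomposition of the reduced-form covariance
A_t = np.linalg.inv(np.eye(N) - B_t)
Sigma_t = A_t @ Omega_t @ A_t.T

assert np.allclose(Sigma_t, Sigma_t.T)             # full symmetric covariance matrix
print(np.round(Sigma_t, 4))
```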
Notice that for a model with hundreds or thousands of equations, those asset returns at the bottom part will load on several hundreds of other assets, making out-of-sample forecasts too unstable. Additionally, the triangular form in Equation (4) makes the system dependent of the asset return ordering. As highlighted by Levy and Lopes (2021a) , it is an important drawback of the Cholesky-style framework, because imposing a specific order structure can lead to inferior final decisions and harm portfolio performance. Since the "correct" series ordering is uncertain and the environment of the economy is continuously changing, Levy and Lopes (2021a) propose what they call a Dynamic Ordering Learning. That is a flexible model that deals with the ordering uncertainty in a dynamic fashion, where the econometrician is able to sequentially learn the contemporaneous relations among different series over time. However, since there are N! possible orders to learn for each period of time, even with a flexible model it is a prohibitive task in high-dimensions. In order to impose greater model parsimony and overcome the ordering uncertainty in high-dimensions, we propose what we call a Dynamic Risk Factor Dependency Model. Inspired by the literature on observable risk factors, in this method we augment the Nvector of asset returns with K economically motivated risk factors, for K << N, and impose the restriction that all dynamic contemporaneous dependencies among asset returns are coming from them. It drastically reduces both the parameter space and the ordering uncertainty problem to a low-dimension one, being easily implemented by the Dynamic Ordering Learning approach of Levy and Lopes (2021a) . Defining a new vector of returns R t = (F t , r t ) , augmented by the K-dimensional vector of known risk factors, F t , we rewrite Equation (3) as: where the tilde upscript represents the extended to K + N dimension version of previous vectors and matrices of Equation (3). Now, B t is a new (K + N) × (K + N) matrix containing all the dynamic contemporaneous dependencies, where both factor and asset returns are allowed to load only on the set of observable chosen factors: Our approach can be viewed as an extension of the work of Zhao et al. (2016) , however, instead of following the whole triangular format as in Equation (4), we impose that any asset return dependencies are coming from common factors in the economy and not on specific movements of different asset returns. Hence, for j = 1, . . . , K, the matrix B t follows the usual triangular format. However, for j > K, asset returns are restricted to load until series (factor) K. This new representation allows us to estimate a much lower number of parameters, since B t is filled with zeroes after the K-th column. The sparser B t is, the more stable and efficient the resulting inferences are, producing better out-of-sample predictions for decision analysis. Additionaly, it gives higher economic intuition for the variation of asset returns, since the common movements follow exposures to well know risk factors developed by the financial literature. As mentioned before, given the structure of B t , the set of K + N univariate models can be represented as K + N univariate recursive dynamic regressions, where we have for each j = 1, . . . , K, . . . , K + N: 1 where for j = 1, . . . 
, K, the parental set R_{pa(j),t} = F_{pa(j),t} represents all risk factor series in R_t that are above series j and, for j > K, R_{pa(j),t} possibly contains all series up to series K, i.e., all risk factors in F_t. We represent the dynamic coefficients in Equation (8) as evolving according to random walks,

θ_{jt} = θ_{j,t−1} + ω_{jt},   ω_{jt} ~ N(0, W_{jt}).

Discount methods are used to induce time variation in the evolution of the parameters; they have been extensively used in many applications (Raftery et al., 2010, Dangl and Halling, 2012, Koop and Korobilis, 2013, and McAlinn, Aastveit, Nakajima, and West, 2020) and are well documented in West and Harrison (1997), Gamerman and Lopes (2006) and Prado and West (2010). Given the information D_{t−1} available for the one-step-ahead forecast at t − 1, the (prior) predictive distribution of y_jt is a Student's t distribution with r_jt degrees of freedom,

y_jt | y_{pa(j),t}, D_{t−1} ~ T_{r_jt}(f_jt, q_jt),

with f_jt = F_jt' a_jt and q_jt = s_{j,t−1} + F_jt' R_jt F_jt. It is important to notice that in this framework we have a conjugate analysis for forward filtering and one-step-ahead forecasting. Therefore, we are able to compute closed-form predictive densities for each equation j. Hence, conditional on parents, it is easy to compute the joint predictive density for y_t,

p(y_t | D_{t−1}) = ∏_{j=1}^{K+N} p(y_jt | y_{pa(j),t}, D_{t−1}),     (12)

which is simply the product of the already computed K + N univariate Student's t densities. After the time series are decoupled for sequential analysis, they are recoupled for multivariate forecasting. In our decision analysis in Section 4, we divide the recoupled part in two: one related to the dynamics of the K factors and the other to the dynamics of the N asset returns. For the portfolio allocation study we will be concerned with the mean and variance of each of these parts: the K-vector of expected factor means together with the K × K expected factor covariance matrix, and the N-vector of expected asset returns together with the N × N expected covariance matrix of returns. Further details about the derivations of the evolution, forecasting and updating distributions can be found in Appendices A and B.

As argued by Kastner (2019), even when parsimony is imposed through a factor structure, models are still rich in parameters in a high-dimensional portfolio problem. This motivates us to induce stronger parsimony through sparsity on factor loadings in a time-varying and sequential form. This approach is convenient for improving out-of-sample predictions and introducing greater model flexibility, since in each period different assets can load on just a subset of risk factors. Model uncertainty is a well-known challenge among applied researchers and industry practitioners who are interested in producing forecasts for decision-making problems. In recent decades, the Bayesian literature has addressed this question with great success. Bayesian Model Averaging and Bayesian Model Selection are well-known methodologies for static models when there is uncertainty about which predictors to include (Raftery, 1994 and Hoeting et al., 1999). More recently, Raftery et al. (2010) proposed a dynamic version of Bayesian Model Selection, called Dynamic Model Selection (DMS). They suggest the use of reasonable approximations, borrowing ideas from discount (forgetting) methods, avoiding simulation of transition probability matrices and maintaining the conjugate form of posterior distributions, allowing analytical solutions for forward filtering and forecasting, which significantly reduces the computational burden of the process.
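As a concrete reference for the conjugate recursions above, the Python sketch below runs one discounted DLM step for a single equation j, in the spirit of the standard discount recursions of West and Harrison (1997): it forms the discounted prior, the Student's t one-step predictive with moments f_jt and q_jt, and the conjugate update. All inputs are hypothetical and the function name is ours; the DMS machinery discussed next chooses among many such univariate models using exactly these predictive densities.

```python
import numpy as np
from scipy import stats

def dlm_discount_step(y, F, m, C, s, n, delta=0.999, kappa=0.995):
    """One conjugate step for equation j: y = F'theta + noise, theta follows a random walk.
    (m, C, s, n) are the time t-1 posterior state mean/scale, volatility estimate and
    degrees of freedom; delta and kappa are the state and volatility discount factors
    (setting them to 1 gives constant coefficients and constant volatility)."""
    a, R = m, C / delta                       # discounted prior for the states
    r = kappa * n                             # discounted degrees of freedom
    f = F @ a                                 # predictive mean  f_jt = F'a
    q = s + F @ R @ F                         # predictive scale q_jt = s + F'RF
    logpred = stats.t.logpdf(y, df=r, loc=f, scale=np.sqrt(q))
    e, A = y - f, R @ F / q                   # forecast error and adaptive gain
    n_new = r + 1.0
    s_new = s + (s / n_new) * (e**2 / q - 1.0)
    m_new = a + A * e
    C_new = (s_new / s) * (R - np.outer(A, A) * q)
    return m_new, C_new, s_new, n_new, f, q, logpred

# Hypothetical step: intercept plus loadings on two parent risk factors
F_t = np.array([1.0, 0.8, -0.2])              # [1, parent factor values at time t]
state = (np.zeros(3), 0.1 * np.eye(3), 0.02, 10.0)
print(dlm_discount_step(0.012, F_t, *state)[-3:])   # predictive mean, scale, log-density
```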
The DMS approach has also shown great success, being applied recently in macroeconomics and finance (Dangl and Halling, 2012, Koop and Korobilis, 2013, Catania et al., 2019 and Levy and Lopes, 2021a). In our portfolio allocation problem we apply DMS to each individual equation of the system in (6), learning sequentially which specifications to choose, such as the main risk factors and the best discount factors driving time variation in factor loadings and volatilities. Therefore, not only can risk factors have stochastic or constant volatility, but each individual asset return can also have constant or time-varying factor loadings and volatilities, switching its behavior depending on the environment of the economy. Using DMS we are able to dynamically select the best risk factors by setting the factor loadings of non-selected risk factors exactly to zero. For a highly parametrized model this can be viewed as an advantage over traditional shrinkage priors, which set coefficients close to but not equal to zero, increasing uncertainty in predictions. It is important to highlight that the main focus of our dynamic factor selection approach is to induce model sparsity in order to deflate the covariance structure among the universe of asset returns and to produce more stable predictions of expected returns. Although we take advantage of the fact that our DRFDM is able to predict expected returns and we use these predictions for portfolio decisions, we will not be particularly concerned with inference about asset pricing models or the best risk factors to explain stock returns. For recent advances in Bayesian model selection/averaging focused on investigating expected returns based on factor asset pricing models, we refer to Bryzgalova, Huang, and Julliard (2019) and Hwang and Rubesam (2020).

In order to perform DMS in each equation, we introduce here the idea of dynamic model probabilities. Consider a specific equation j in the system discussed in the previous section. Suppose the dependent series of this equation can load on p different available risk factors. Hence, there are 2^p possible combinations of models defined by the parental set. Also, when considering n_δ and n_κ different possible discount factors for factor loadings and volatilities, equation j will have a total of n_j = 2^p × n_δ × n_κ possible models to choose from in the univariate model space. The DMS approach deals with model uncertainty by assigning probabilities to each possible model. Denote by π_{t−1|t−1,i,j} = p(M_i^j | D_{t−1}) the posterior probability of model i for equation j at time t − 1. Following Raftery et al. (2010), the predicted probability of model i given all the data available until time t − 1 is expressed as

π_{t|t−1,i,j} = π_{t−1|t−1,i,j}^α / ∑_{l=1}^{n_j} π_{t−1|t−1,l,j}^α,

where 0 ≤ α ≤ 1 is a forgetting factor. The main advantage of using α is avoiding the computational burden associated with expensive MCMC schemes to simulate the transition matrix between possible models over time. This approach has also been extensively used in the Bayesian econometric literature in the last decade (Koop and Korobilis, 2013, Zhao et al., 2016, Lavine et al., 2020 and Beckmann et al., 2020). After observing new data at time t, we update our model probabilities through a simple Bayes' update: the posterior probability of model i at time t is

π_{t|t,i,j} ∝ π_{t|t−1,i,j} p_i(y_jt | y_{pa(j),t}, D_{t−1}),

where p_i(y_jt | y_{pa(j),t}, D_{t−1}) is the predictive density of model i evaluated at y_jt.
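The two steps above translate directly into a few lines of code. The Python sketch below (illustrative names; the log predictive densities are hypothetical inputs, for example produced by the DLM step sketched earlier) applies the forgetting-based prediction and the Bayes update of the model probabilities for one equation j.

```python
import numpy as np
from scipy.special import logsumexp

def dms_update(log_post_prev, log_pred_dens, alpha=0.99):
    """Dynamic Model Selection probability update for one equation j.
    log_post_prev: log posterior model probabilities at t-1 (length n_j).
    log_pred_dens: log one-step predictive densities of y_jt under each candidate model.
    alpha: forgetting factor (alpha = 1 recovers static Bayesian model selection)."""
    log_pred_prob = alpha * log_post_prev             # prediction step (forgetting)
    log_pred_prob -= logsumexp(log_pred_prob)         # renormalize
    log_post = log_pred_prob + log_pred_dens          # Bayes update with the new data point
    return log_post - logsumexp(log_post)

# Hypothetical example with three candidate models for equation j
log_post = np.log(np.array([0.5, 0.3, 0.2]))          # posterior probabilities at t-1
log_pred = np.array([2.1, 2.4, 1.9])                  # log predictive densities at time t
log_post = dms_update(log_post, log_pred, alpha=0.99)
print(np.exp(log_post), int(np.argmax(log_post)))     # DMS keeps the highest-probability model
```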
Hence, upon the arrival of a new data point, the investor is able to measure the performance of each univariate model i and to assign higher probability to those models that perform better. One possible interpretation of the forgetting factor α is through its role in discounting past performance. Combining the predicted and posterior probabilities, we can show that

π_{t|t,i,j} ∝ ∏_{s=1}^{t} [ p_i(y_js | y_{pa(j),s}, D_{s−1}) ]^{α^{t−s}}.     (17)

Since 0 < α ≤ 1, Equation (17) can be viewed as a discounted predictive likelihood, where past performance is discounted more heavily than recent performance. It implies that models that performed better in the recent past will receive higher predictive model probabilities. What counts as the recent past is controlled by α, since a lower α discounts past data more heavily and generates faster switching between models over time. The idea of DMS is to select the model with the highest model probability in each period of time. Given that the equations are conditionally independent, once we select the best model for each equation, the posterior model probability of the multivariate model can be viewed as the product of the K + N univariate model probabilities,

π*_{t|t} = ∏_{j=1}^{K+N} π*_{t|t,j},

where the asterisk represents the selected model. A major benefit of our approach is that each equation in the system in (6) is conditionally independent, so model uncertainty can be addressed independently for each equation j. Hence, if each equation has a model space of size n_j, this implies a total of ∑_{j=1}^{K+N} n_j possible models. In a high-dimensional environment with hundreds of equations and hundreds of possible univariate models for each one, this is a massive reduction in the total model space, since a multivariate model with no conditional independence would require a total of ∏_{j=1}^{K+N} n_j models. The use of dynamic model probabilities offers substantial flexibility to our approach. The DMS approach allows us to customize individual series, proposing different models with different risk factors and degrees of variation in coefficients and switching between them in a dynamic fashion. This is an important advantage over the traditional Wishart-DLM, where all series are stuck with the same discount factors for state evolutions.

As highlighted in Section 2, the Cholesky-style framework is based on the triangular structure of Equation (4). Therefore, models within this framework depend on the series ordering selected by the researcher, leading to different contemporaneous relations among series. The recent work of Levy and Lopes (2021a) shows evidence of instability in the "correct" ordering structure and proposes a method to deal with the problem of ordering uncertainty dynamically. Using dynamic ordering probabilities, the authors propose a Dynamic Ordering Learning (DOL) approach in which the econometrician is able to select or average the outputs of different orderings in a sequential fashion. They show that taking the ordering uncertainty among different series into account improves out-of-sample predictions and portfolio performance. In our high-dimensional environment, computing the total number of possible orderings is prohibitive, since for N financial time series there are N! possible orders. However, once we impose a factor dependency structure as in (7), we restrict the ordering uncertainty to a low dimension, because only the top K variables rely on a triangular format. Since K ≪ N, we are able to follow the DOL approach of Levy and Lopes (2021a).
The idea is to assign ordering probabilities for the risk factors in a similar manner as we discussed in Section (3.1). However, instead of selecting the best factor ordering over time, we average the outputs of each possible order, weighting by their order probabilities. For more details of the DOL procedure we refer to the paper of Levy and Lopes (2021a). In order to test how the DRFDM performs in a real world problem, in this section we compare our method in terms of out-of-sample statistical and portfolio performance compared to different model settings and the Wishart-DLM. Our data is based on weekly stock returns from the Standard & Poor's 500 index. A stock is included if it was traded over the full horizon 2002-2020:5 and was a constituent of the index at some period during that time, resulting in N = 452 stocks over 959 time periods. The data was colected from Bloomberg Terminal and log-returns were used for the analysis. In our main empirical analysis, we use as observable risk factors the 5 factors of Fama and French (2015) downloaded for the same period from the website of Kenneth French. 6 The five risk factors are the market excess return (MKT), a size factor (SMB), a value factor (HML), a profitability factor (RMW) and a investment factor (CMA). In our portfolio performance below, we also show results when considering different subset of risk factors, such as the first three factors (MKT, SMB and HM), the three factors plus the Momentum factor (Carhart, 1997) and the five factors plus Momentum. We group stocks based on eleven sectors of the Global Industry Classification Standard (GICS). It makes easier the visual interpretation of figures below. Table 1 list the number of stocks in each sector. First, we illustrate in Figure 1 the mean posterior factor loadings of our DRFDM with the five factors for three specific time periods. Each row represents the factor loads of an individual stock return loading on different risk factors of each column. The first figure at the left is referred to a calm time period, while the second and third dates were intentionally chosen to consider highly turbulence periods in the stock market, the Global Financial Crisis of 2008 and the the Covid-19 pandemic. The white colour represents factor loadings equal to zero. For the three periods, there is considerable sparsity on factor loadings, being stronger at the end of 2006 and weaker for the other stressed periods. Also note that not just the sparsity is changing over time, but also the mean of factor loadings are substantially varying, with periods of higher and lower values, with signs of many asset returns changing for the same risk factor. One interesting aspect of Figure 1 is the great importance of the market factor for asset returns, being quite rare to observe sparsity on its loadings. To clarify the dynamic sparsity pattern of our approach, Figure 5 in Appendix C focus on the time-varying movements of posterior mean factor loadings of two specific stocks of different sectors over time. In fact, Figure 5 highlights how differently they load on risk factors, with one company inducing much more sparsity than the other. Interestingly, with the exception of the market factor, all factor loadings are set to zero at least once. Also, depending on the company, some factor loadings are set to zero for the majority of the sample period. Although the estimation of factor loadings plays an important role, they are not the only ingredients to predict the covariance matrix of returns. 
As discussed in previous sections, the investor also needs to learn the time-variation on the factor covariance matrix. Figure 2 shows the predicted mean of the time-varying factor correlation matrix over the same selected three periods. The first aspect to note is how the correlations among factors change over time. As an example is the sign changing in the correlation between the MKT and the HML factors at the end of 2006 to the stressed periods. Additionally, it seems that during bad periods the correlations tend to be stronger, specially for the three factors MKT, SMB and HML. One exception is the CMA factor, presenting very low factor correlations during these bad times. The evolution of correlations among stock returns is illustrated in Figure 3 , where we display the predicted mean correlation matrix for the selected periods. Stock returns are grouped according and following the same ordering as in Table 1 . Considering the end of 2006, the sparser factor loadings in Figure 1 and weaker factor correlations of Figure 2 are translated into a lighter correlation matrix of returns. Again, as lighter the colours the closer to zero the correlations are. Note a sligthly stronger correlation among stocks from the Financial sector and how they are also more correlated to almost all other sectors. This patterns are repeated for the three selected periods. At the other hand, we can note how stocks from the Health Care sector are much less correlated with other stock returns. When we focus on the stressed periods of the Global Financial Crisis and the Covid-19 pandemic, the higher factor loadings and greater factor correlations are translated in higher stock return correlations. It is an expected pattern, since in bad market periods stock returns variations tend to be more related to factor movements. Interestingly, during the Covid-19 pandemic the Consumer Staples sector and some companies from the Energy sector presented a strong correlation reduction, while the Financial sector is very high correlated not just among its own companies but with other sectors. Finally, Figure 4 investigates the evolution of risk factors inclusion probabilities. It can be defined as the posterior probability of an asset return including a specific covariate variable. Hence, for a given asset return j, the inclusion probability of a risk factor can be computed as the sum of probabilities of all univariate models for return j including this specific risk factor. Since each asset return has different inclusion probabilities for different risk factors, we display in Figure 4 the cross-sectional average of inclusion probabilities for each period of time. We can notice that the probability of including the market factor is considerably higher than all other risk factors, which demonstrates its importance in dictating common movements in the cross-section of stock returns. Since the market factor inclusion probability always fluctuates slightly close to 100%, we conclude that models including this factor tend produce much stronger predictive densities compared to models excluding the traditional market factor. It is interesting to note a slight drop in the importance of the market factor since 2016, but with the advent of the Covid-19 pandemic, its importance returned to past values. Indeed, except the CMA factor, all other risk factors have become substantially more relevant since the beginning of the pandemic. This increase in importance was also observed during the Great Recession for SMB, HML and CMA. 
The general assessment after the Great Recession is that the HML factor demonstrated a higher inclusion probability than the SMB, RMW and CMA factors for the great majority of the time. Therefore, Figure 4 clearly shows how our DRFDM is able to infer about the importance of different risk factors in a dynamic fashion, providing evidences of a time-varying impact of different risk factors on stock returns. In what follows in the next subsections, we study analysis of several variants and re- strictions of our DRFDM based on different discount factor specifications and risk factors to be considered. We let the first four years of data (from 2002 to 2005) as a training period and we perform statistical and portfolio out-of-sample evaluation for the next years. Therefore, we discard the first 208 data points to train our models and use the next 751 data points for evaluation. Model specifications: Before going to the statistical and portfolio analysis, we detail here several model variants and restrictions to be compared. Different model settings are based on previous tests and experience using the initial training sample and values are similar to the commonly applied by the econometric literature West, 2017 and Zhao et al., 2016) : • DRFDM: This model considers the 5 Fama-French factors (Fama and French, 2015) as asset returns parents, applying dynamic risk factor and discount factors selection. Hence, it learns automatically the variation in betas (factor loadings), the variation in factor covariances, the variation in return volatilities and the selection of the best risk factors for each individual asset return and for each period of time. In terms of discount factors for different degrees of variation in coefficients, we let the model choose between: κ r ∈ {0.99, 0.995, 1} for return volatilities; δ ∈ {0.998, 0.999, 1} for factor loadings; and κ f ∈ {0.999, 1} for factor volatilities. Model probabilities are computed using a forgetting factor α = 0.99. • DRFDM (α = 0.98): The same as DRFDM, but considering α = 0.98 for model probabilities. • DRFDM (α = 1): The same as DRFDM, but considering α = 1 for model probabilities. In this case, it applies BMS, the static version of DMS. • DRFDM (No Spars.): A dense version of DRFDM. It does not induce sparsity by dynamic factor selection. Therefore, all 5 Fama-French factors are always considered. • 3F-DRFDM: The same as DRFDM, but restricting the set of risk factors to MKT, SMB and HML (Fama and French, 1992) . • 4F-DRFDM: The same as 3F-DRFDM, but including MOM as an additional risk factor (Carhart, 1997) . • 6F-DRFDM: The same as DRFDM, but including MOM as an additional risk factor. • W-DLM: It is the standard multivariate Wishart-DLM (see Prado and West, 2010, Ch. 10) . We use δ = 0.997 for the local level evolution discounting and κ = 0.99 for the multivariate volatility discount factor. 7 • Factor W-DLM: It is a multivariate Wishart-DLM using the 5 Fama-French factors as observable common predictors. We have considered δ = 0.997 for states discounting and κ = 0.99 for the multivariate volatility discount factor. 8 We also show different combination of models still applying dynamic factor selection, but considering the following restrictions: 7 For both W-DLM and Factor W-DLM we set the initial prior location S 0 to a diagonal matrix whose diagonal elements are given by 0.1. We also tested using the residual variances over the training period from an OLS model, but using 0.1 instead produced a stronger benchmark. 
Moura, Santos, and Ruiz (2020) have shown that using a Wishart process with shrinkage towards a diagonal covariance matrix delivers better portfolio performance with lower turnover. 8 We model risk factors in a separate W-DLM and predict asset returns and covariances conditional on the risk factor predictions.

• TVB: A model using time-varying betas (factor loadings), where δ = 0.999.
• CB: A model using constant betas (factor loadings), where δ = 1.
• FSV: A model using factor stochastic volatility, where κ_f = 0.999.
• FCV: A model using constant factor volatility, where κ_f = 1.
• SV: A model using return stochastic volatility, where κ_r = 0.995.
• CV: A model using constant return volatility, where κ_r = 1.

As we explain in Appendix A, discount factors equal to one represent the case of no variation in coefficients, while discount factors lower than one induce time variability in coefficients. Hence, the DRFDM is the most flexible model, since it is allowed to switch between constant and time-varying parameters and dynamically selects different subsets of risk factors over time whenever this is empirically desirable.

In order to evaluate the different model settings in terms of statistical accuracy, we compute density forecasts and hit rates. As we explain in detail in Appendix B, in each period an individual asset return j can load on a different subset of risk factor parents. Hence, in order to generate an asset return forecast for the next week, the investor combines the predictions obtained from the specific risk factor parents for that asset with the prior coefficients,

f_{jt|t−1} = a_{jαt} + a_{jβt}' λ_{pa(j)t|t−1},

where f_{jt|t−1} is the forecast of asset return j for the next week and λ_{pa(j)t|t−1} represents the vector of risk factor predictions. It is important to highlight that for our Dynamic Risk Factor Dependency Model this set of risk factor dependencies changes over time and across asset returns. Finally, a_{jαt} and a_{jβt} are the prior means at time t for, respectively, the intercept and factor loading distributions given all data available until time t − 1.

In Table 2 we show the Log-Predictive Density (LPD) and sign accuracy (Acc.). The first is computed as the sum of the log of the 1-step-ahead predictive densities over the evaluation period, as in Equation (12), for the N assets available; higher values represent better performance. The LPD gives a sense of how well a model performs out-of-sample considering its whole predictive distribution and not just its mean. Therefore, it suits our portfolio analysis quite well, since it also takes into account the impact of the covariance structure. This metric has been applied recently in many Bayesian econometric papers and has become common practice for model comparison (see Koop and Korobilis, 2013, Zhou, Nakajima, and West, 2014, Gruber and West, 2017 and McAlinn, Aastveit, Nakajima, and West, 2020). The second statistical metric is simply computed as the number of correct sign predictions averaged across all assets and over the out-of-sample evaluation period. Hence, we represent the accuracy of model k over the evaluation period as

Acc_k = (1 / (N(T − T_0))) ∑_{j=1}^{N} ∑_{t=T_0+1}^{T} 1{sign(f^k_{jt|t−1}) = sign(r_jt)},

where T_0 represents the training period and 1{sign(f^k_{jt|t−1}) = sign(r_jt)} is an indicator function equal to one when the sign of the forecasted asset return j for period t using model k equals the sign of the actual observed asset return j for period t (a sign is equal to 1 when the return is positive and −1 when it is negative). Table 2 provides the statistical out-of-sample performances.
First, what can be noticed is that all the different settings perform much better than the Wishart approach in terms of density forecasts. In general, models with constant parameters also produce worse density forecasts, especially with CV. In terms of the number of factors to be considered, the density forecast measure is quite similar across choices, with a slight worsening for models including the CMA and RMW factors. In terms of correct return sign forecasts, differences are quite small. However, the recent literature on empirical asset pricing has shown evidence that even models providing only a small improvement in return predictability are capable of generating significant impacts on portfolio performance over time (Chinco, Clark-Joseph, and Ye, 2019, Gu, Kelly, and Xiu, 2020, Jiang, Kelly, and Xiu, 2020 and Levy and Lopes, 2021b). For instance, one may note in Table 2 that the model with no sparsity on factor loadings produced statistical performance very similar to the models inducing sparsity, with just a small statistical deterioration. As we show in the empirical portfolio analysis (Section 4.3), achieving sparsity promotes strong stability of mean and variance predictions. In a high-dimensional portfolio allocation with thousands of factor loadings in each period, a reduction in the parameter space and small differences in predictions are able to generate great improvements in final portfolio performance. We can also conclude from Table 2 that models with constant factor loadings tend to produce worse out-of-sample accuracy in terms of both predictive densities and hit rates.

The Table reports out-of-sample forecasting performance for different model settings. The first column displays Log-Predictive Densities (LPD) as a measure of density forecasting. The second column shows hit rates (Acc.), represented as the average number of correctly forecasted return signs.

It is important to notice that even though both W-DLM and Factor W-DLM provide much weaker predictive densities than the other model settings, they show stronger out-of-sample accuracy in terms of hit rates, with the Factor W-DLM showing the same accuracy as the main DRFDM. However, as we show in the empirical portfolio allocation section, both Wishart approaches produce portfolios with performance considerably lower than our DRFDM. In fact, these results confirm the evidence in Cenesizoglu and Timmermann (2012), where the authors show a stronger correlation of final portfolio performance with density forecasts than with point forecast metrics. Since a point forecast metric is not able to incorporate the uncertainty around predictions or the covariance structure among returns, it tends to generate a lower correlation with final portfolio performance. Finally, at the bottom of Table 2 we provide the hit rate using the momentum signal as a return predictor. In this case, instead of using an econometric model, the investor only considers the momentum over the previous 12 months as a measure to predict future returns. This type of approach has received increasing attention in the recent literature and has been applied to high-dimensional portfolio problems (Engle, Ledoit, and Wolf, 2019, De Nard, Ledoit, and Wolf, 2020 and Moura et al., 2020).
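For reference, a short Python sketch of how the two metrics could be computed from stored one-step-ahead forecasts is given below; the prediction arrays are hypothetical placeholders rather than the paper's output.

```python
import numpy as np

def log_predictive_density(log_dens):
    """LPD: sum of log one-step-ahead predictive densities over assets and weeks.
    log_dens: array (T_eval, N) with log p(r_jt | D_{t-1}) evaluated at realized returns."""
    return float(np.sum(log_dens))

def hit_rate(forecasts, realized):
    """Acc.: share of correctly forecasted return signs (+1 if positive, -1 otherwise)."""
    return float(np.mean(np.sign(forecasts) == np.sign(realized)))

# Hypothetical evaluation sample: 751 weeks, 452 assets, placeholder values
rng = np.random.default_rng(2)
f = rng.normal(0.001, 0.02, size=(751, 452))      # stored forecasts f_{jt|t-1}
r = rng.normal(0.001, 0.03, size=(751, 452))      # realized returns r_{jt}
ld = rng.normal(2.0, 0.5, size=(751, 452))        # log predictive densities (placeholders)
print(log_predictive_density(ld), hit_rate(f, r))
```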
After describing the set of possible models considered in this study and their statistical performance, we now discuss how our DRFDM is able to improve final investor decisions. We take the perspective of an investor who allocates her wealth among all the different stocks available in the dataset. In each period, the investor applies two steps. The first is to use the econometric method to generate one-week-ahead forecasts of return means and covariances. In the second step, she dynamically rebalances the portfolio by finding new optimal portfolio weights. In our main analysis, we perform a mean-variance portfolio optimization, where the investor uses both the vector of predicted mean stock returns and the respective predicted covariance matrix, as we describe below. Although it is not the focus of our analysis, in Appendix C we also show additional results for the case where the investor only considers the predicted covariance matrix in a Global Minimum Variance portfolio. Hence, from this setup we are able to assess the economic value of the DRFDM under different settings within a dynamic framework, implementing an efficient-frontier strategy subject to a return target or a minimum-risk portfolio. Below we describe the portfolio strategies in more detail.

The Mean-Variance Portfolio (MVP), also known as the Efficient Frontier (EFF) portfolio or the Markowitz portfolio due to the seminal work of Markowitz (1952), solves the following investment problem in the absence of short-sales constraints:

min_{ω_t} ω_t' Σ_t ω_t   subject to   ω_t' µ = τ  and  ω_t' 1 = 1,     (18)

where 1 is a vector of ones, Σ_t is the covariance matrix of returns at time t, ω_t represents the portfolio weights, µ is the expected return vector and τ is the return target. Replacing µ by the predicted vector of returns from our econometric approach, f_{t|t−1} = α_{t|t−1} + β_{t|t−1} λ_{t|t−1} (more details about the computation of the predictive moments can be found in Appendix B), and Σ_t by the respective point estimate of the predicted covariance matrix, Σ_{t|t−1}, the optimal portfolio weights can be expressed as

ω*_t = ((C_t − τ B_t)/D_t) Σ_{t|t−1}^{−1} 1 + ((τ A_t − B_t)/D_t) Σ_{t|t−1}^{−1} f_{t|t−1},     (19)

with A_t = 1' Σ_{t|t−1}^{−1} 1, B_t = 1' Σ_{t|t−1}^{−1} f_{t|t−1}, C_t = f_{t|t−1}' Σ_{t|t−1}^{−1} f_{t|t−1} and D_t = A_t C_t − B_t². Following Engle and Kelly (2012), in our main results we consider an annualized return target of τ = 10%, but we also report additional results for τ = 15% and 20%. In Appendix C we also show results for a restricted portfolio optimization, where we solve a problem similar to (18) but include one additional restriction: the maximum (absolute) weight on individual stocks is 5%.

In Appendix C we show, as an additional analysis, results for the Global Minimum Variance portfolio (GMV) with no short-sales constraints. The main investment problem of the GMV is to reduce total portfolio risk and can be expressed as

min_{ω_t} ω_t' Σ_t ω_t   subject to   ω_t' 1 = 1,     (20)

where again Σ_t is the covariance matrix of returns at time t and ω_t represents a vector of portfolio weights. Replacing Σ_t by the predicted covariance matrix, the optimal portfolio weights for the GMV are given by

ω*_t = Σ_{t|t−1}^{−1} 1 / (1' Σ_{t|t−1}^{−1} 1).

Since the focus of the GMV is to reduce portfolio volatility, it ignores the mean of returns and tends to generate lower Sharpe ratios in practice. Although it is not widely used by practitioners, this approach can be viewed as a way to evaluate covariance matrix estimation and is still common practice in academic work. Besides the GMV portfolio as in Equation (20), we also show results when we include an additional restriction of maximum (absolute) weights on individual stocks of 5%, as we did in the MVP case.
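Both closed-form solutions are straightforward to implement. The Python sketch below (synthetic inputs; the helper names and the weekly return-target convention are our own assumptions) computes the mean-variance weights of Equation (19) and the GMV weights implied by Equation (20).

```python
import numpy as np

def mvp_weights(mu, Sigma, tau):
    """Mean-variance weights for Equation (18): full investment, expected return tau,
    short sales allowed (closed-form solution as in Equation (19))."""
    ones = np.ones(len(mu))
    Sinv = np.linalg.inv(Sigma)
    A, B, C = ones @ Sinv @ ones, ones @ Sinv @ mu, mu @ Sinv @ mu
    D = A * C - B**2
    return ((C - tau * B) / D) * (Sinv @ ones) + ((tau * A - B) / D) * (Sinv @ mu)

def gmv_weights(Sigma):
    """Global minimum variance weights for Equation (20)."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)
    return w / (ones @ w)

# Hypothetical one-week-ahead inputs for 5 assets and an annualized target of 10%
rng = np.random.default_rng(3)
mu = rng.normal(0.002, 0.001, 5)                  # predicted mean returns f_{t|t-1}
L = rng.normal(0.0, 0.02, (5, 5))
Sigma = L @ L.T + 0.001 * np.eye(5)               # predicted covariance Sigma_{t|t-1}
tau = 0.10 / 52                                   # assumed weekly equivalent of the target
w_mvp, w_gmv = mvp_weights(mu, Sigma, tau), gmv_weights(Sigma)
print(round(w_mvp.sum(), 6), round(w_mvp @ mu - tau, 10), round(w_gmv.sum(), 6))
```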
After producing the forecast outputs used to dynamically build portfolios, we backtest our models in terms of out-of-sample portfolio performance. Investors face portfolio allocation problems and, at the end of the day, what matters is not just out-of-sample predictability but how predictions are translated into better final decisions, i.e., better portfolio choices. With this in mind, we evaluate portfolios based on financial metrics such as portfolio turnover, annualized mean excess returns (Mean), standard deviations (SD) and Sharpe ratios (SR). The latter is commonly used among practitioners in the financial market and by academics. Despite their popularity, those portfolio metrics are unconditional measures and are not well suited for dynamic allocations with time-varying and sequential predictions (see Marquering and Verbeek, 2004). Also, they do not take the investor's risk aversion into account. In order to overcome these problems and improve our model comparisons, we follow Fleming, Kirby, and Ostdiek (2001) and provide a measure of economic utility for investors. We compute the ex-post average utility for a mean-variance investor with quadratic utility. As in Fleming et al. (2001) and Della Corte, Sarno, and Tsiakas (2009), we can calculate the performance fee that an investor would be willing to pay to switch from the traditional Wishart Dynamic Linear Model (W-DLM) benchmark to our Dynamic Risk Factor Dependency Model. The performance fee is computed by equating the average utility of the W-DLM portfolio with the average utility of the DRFDM portfolio (or any other alternative portfolio), considering the latter net of a management fee Φ:

∑_t [ (R^{DRFDM}_{p,t} − Φ) − (γ / (2(1 + γ))) (R^{DRFDM}_{p,t} − Φ)² ] = ∑_t [ R^{W−DLM}_{p,t} − (γ / (2(1 + γ))) (R^{W−DLM}_{p,t})² ],

where γ is the investor's degree of relative risk aversion, R^{DRFDM}_{p,t} is the gross excess return of the DRFDM portfolio and R^{W−DLM}_{p,t} is the gross excess return of the W-DLM portfolio. As in Fleming et al. (2001), we report our estimates of Φ as annualized fees in basis points using γ = 10. All economic measures displayed in Section 4.3 are already net of transaction costs (TC). Following Marquering and Verbeek (2004), we deduct transaction costs from the portfolio return ex post. Although the great majority of the papers on covariance matrix estimation for portfolio allocation cited above do not take transaction costs into account in their findings, in our main results we consider TC = 5 bps of the traded volume in an effort to bring our results closer to a real-world example. In general, there is disagreement about which transaction cost to incorporate. In the past, many papers applied transaction costs of 50 bps, but recently this value has been substantially reduced for the most liquid stocks (French, 2008). Hence, in the same spirit as Moura, Santos, and Ruiz (2020), we display additional results for TC ∈ {0, 10} bps.

Here we detail some additional covariance matrix estimation approaches included in our portfolio performance analysis. We consider some recent traditional benchmarks from the literature. We also show results for two benchmarks that do not require the use of econometric models to be estimated: the equally-weighted portfolio from our universe of stocks and the passive investment in the S&P 500 index. The latter can be viewed as a strong benchmark, since it is well known that it is quite hard to beat the market.
• EFM: this is a static estimator based on an exact factor model, where Σ f is given by the sample covariance matrix of risk factors and the residual covariance matrix Ω is a diagonal matrix filled with the sample variances estimates of residuals. We estimate residuals and factor loadings using four years rolling regressions. • DCC-NL: it is a dynamic estimator using the multivariate GARCH of Engle et al. (2019) . • AFM-DCC-NL: it is the dynamic approximate factor model of De Nard, Ledoit, and Wolf (2020) . 14 • LW: it is the static linear shrinkage estimator of Ledoit and Wolf (2004) . • EWMA: the traditional exponentially weighted moving average estimator where Σ t+1 = (1 − λ)y t y t + λΣ t . We consider two values for the decay factor, λ ∈ {0.97, 0.99}. • EW: a simple strategy considering a equal-weighted portfolio over our universe of stocks. As claimed by DeMiguel, Garlappi, and Uppal (2009) , it tends to perform better than the simple unconditional covariance matrix of returns and it has been claimed to be hard to be outperformed. • S&P: it represents the passive investment (buy-and-hold) on the S&P 500 index over the evaluation period. It is important to highlight that for models EFM, DCC-NL, AFM-DCC-NL, LW and EWMA detailed above, we use the average momentum signal from the previous 12 months as vector of predicted mean returns to find optimal portfolio weights in Equation (19). It follows a similar procedure as applied in Engle et al. (2019 ), De Nard et al. (2020 and Moura et al. (2020) and can be viewed as a competitive method to forecast returns. We recognize the existence of other recent competing models in the literature involving Bayesian analysis using MCMC methods, as we have described in Section 1. However, the simulation schemes dramatically limit scalability and require repeat MCMC simulation analyses at each time point which makes the whole backtesting procedure computationally prohibitive. Although factor stochastic volatility models of Kastner et al. (2017) and Kastner (2019) have been implemented in the R package factorstochvol, 15 the estimation is not sequential, which requires the model to be rerun at each time over the evaluation period. As argued by Gruber and West (2017) , it will take several weeks to backtest a universe of stocks like ours. At the other hand, since our DRFDM estimation is sequential and does not require any simulation schemes, it is able to handle the whole estimation procedure in only few minutes. In order to evaluate the validity of the DRFDM out-of-sample in a real world context, we conduct a backtest analysis using the different portfolio strategies and model settings described in the previous sections. One of the great advantages of our approach is its flexibility and fast computation. For the sake of curiosity, using parallel computations among 32 cores we run our main model (DRFDM with 5 Fama-French factors) for the entire dataset and produced portfolio backtests in less than 10 minutes. 16 15 See Hosszejni and Kastner (forthcoming) 16 We also have repeated the exercise using almost one thousand stocks from the Russel 1000 Index and we were able to run the whole estimation procedure and portfolio backtest in less than 25 minutes. The Table reports out-of-sample portfolio performances for several covariance models following optimal portfolio weights from Equation (19) and using an annualized return target of τ = 10%. 
Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms whereas turnover are in weekly terms. Results are computed using returns net of transaction costs of 5 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10. The main focus of our portfolio analysis is to find an econometric model able to satisfactorily handle the best balance between risk and return. Table 3 present results related to the MVP portfolio with 5 bps transaction costs during the out-of-sample evaluation period (from January, 6, 2006 until May, 22, 2020 . The main columns to be analyzed from this table are in terms of SR and annualized management fee (Φ). What can be noticed from Table 3 is that regardless of the selected model setting, all models outperform the EW portfolio suggested by DeMiguel et al. (2009) . It is interesting to notice that although the EW strategy presents the lowest weekly turnover, it has the worst SR because of its high volatility that is caused by the absence of a covariance structure among returns on its formulation. Also, the great majority of the models were able to outperform the passive investment on the S&P500 index. In special, the SR from the DRFDM and its different subsets of risk factors counterparts dominate all other models, with DRFDM presenting an out-of-sample SR of 0.82, almost two times the SR from the S&P500 index, the Factor W-DLM and the W-DLM benchmark. The latter performed quite similar to the S&P500 index in terms of SR, but since it was able to produce much lower portfolio volatility, it generates utility gains for the investor compared to the market index. However, since our DRFDM was able to reduce volatilty even further and produce quite stable return predictions, it has produced a considerable utility gain for the investor. In fact, a mean-variance investor would be willing to pay an annualized management fee of 585 bps to switch fromm the traditional W-DLM benchmark to the DRFDM approach. Using different subsets of risk factors in our DRFDM setting also produced quite strong portfolio results. We also included in our analysis what we call DRFDM (Mom signal), which represents the same model as the main DRFDM, but instead of using its predicted mean returns for portfolio optimization in Equation (19), we used the same momentum signal as we did for competing models at the top of Table 3 . What we observe is that using the predictions from our approach generates more stable estimates and portfolio performance than using the momentum signal. Although the differences are small, the DRFDM (Mom signal) induces higher turnovers, what harms final portfolio performances as higher transaction costs start to be considered, as Table 4 shows. Also, as Table 2 had demonstrated, predictions from the DRFDM generated higher out-of-sample accuracy than the Momentum Signal. Hence, we see the mean predictions from our approach as a clear competitor to the classical Momentum Signal broadly used in the academic literature and among practitioners. In terms of the benefits of inducing sparsity on the covariance structure of returns, we see from Table 3 that the DRFDM (No Spars.) produced a much more volatile portfolio than when we allow to dynamically select the best risk factors for each stock return, which is translated in lower utility gains and SR for the investor. 
The benefits of time-varying sparsity on factor loadings are observed in all the different portfolio settings of this paper, regardless of the optimization problem, the amount of transaction costs or the risk aversion. When the DRFDM sets many factor loadings to exactly zero, it is indeed improving covariance matrix estimation by deflating its whole structure. We also compare different models within our model structure but restricting their variation in coefficients to follow the same pattern for all periods of time by fixing their discount factors.
The Table reports out-of-sample portfolio performances for several covariance models following optimal portfolio weights from Equation (19) and using an annualized return target of τ = 10%. Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 10 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
For instance, models with constant volatilities (CV) tend to produce portfolios with higher volatilities and lower SR and utility gains than models with time-varying volatilities. It is interesting to notice that models with both time-varying factor volatilities (FSV) and time-varying factor loadings (TVB) were not able to considerably improve portfolio performance, giving evidence of no significant increment in predictive power from allowing time-varying betas and factor volatilities for all periods of time. On the other hand, since our DRFDM was allowed to dynamically select different discount factors over time, it was able to switch between periods of low and high variation in parameters, producing better adaptation to the data and stronger portfolio improvements. In terms of the forgetting factor α for model probabilities, we see an increase in portfolio volatility when no forgetting is applied (α = 1), which is equivalent to a static Bayesian model selection approach and comes with a lower SR and utility gain for the investor. However, when a higher forgetting is allowed using α = 0.98, a considerable portfolio volatility reduction is observed, but worse mean excess returns and higher turnovers are produced, affecting final performance. Hence, although models with different α's still deliver robust results and are able to outperform several competitor models, applying the intermediate value α = 0.99 seems to generate a better balance between risk and return. When we focus on the competitor models at the top of Table 3, we see lower SR and utility gains compared to our approach, together with much higher turnover. In particular, the EWMA (0.97) produced large turnovers and volatility. Although the AFM-DCC-NL model presented lower SR and utility gains than the DRFDM, it was the approach with the lowest annualized volatility, a pattern that is repeated throughout the different portfolio specifications in this paper. The DCC-NL and LW approaches were also able to deliver lower volatilities but, like the AFM-DCC-NL, they fail to produce a good balance between risk and return, delivering lower SR and utility gains. Since those approaches are quite unstable, they require the portfolio to be heavily rebalanced over time, harming final performance.
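For intuition on the forgetting factor α discussed above, the sketch below follows the standard recursion of Raftery et al. (2010): last period's model probabilities are flattened by raising them to the power α and are then reweighted by each model's one-step-ahead predictive likelihood, with dynamic model selection (DMS) keeping the highest-probability specification. This is only a minimal illustration of the mechanism we take to be at work; the paper's own specification of the model space and of the probability updates may differ in detail.

```python
import numpy as np

def dms_update(log_probs, log_pred_liks, alpha=0.99):
    """One step of the forgetting-based model probability recursion.

    log_probs     : (M,) log posterior model probabilities at t - 1
    log_pred_liks : (M,) log one-step-ahead predictive densities p(y_t | model m, D_{t-1})
    alpha         : forgetting factor (alpha = 1 recovers static Bayesian model selection)
    """
    pred = alpha * log_probs                  # forgetting: p_{t|t-1}(m) proportional to p_{t-1}(m)^alpha
    pred -= np.logaddexp.reduce(pred)         # normalize in log space
    post = pred + log_pred_liks               # Bayes update with the predictive likelihoods
    post -= np.logaddexp.reduce(post)
    best = int(np.argmax(post))               # DMS: keep the highest-probability model
    return post, best
```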
Figure (6) in the Appendix compares turnovers from our DRFDM to the dynamic estimators AFM-DCC-NL and DCC-NL, and the ability of the DRFDM approach to produce more stable rebalancing and lower turnovers is clear, especially during the Great Recession and the Covid-19 pandemic. This can also be seen as an advantage of the DRFDM for portfolio managers interested in investing in low-liquidity markets with much higher transaction costs. Tables 4 and 5 repeat the same portfolio procedure as Table 3, using different transaction costs. The conclusions are quite similar to those described before. However, when a higher transaction cost of 10 bps is considered, we can notice from Table 4 the considerable negative impact on those competitor models with high turnovers, such as the DCC-NL, EWMAs, Factor W-DLM and W-DLM. Due to the high diversification and low portfolio rebalancing of the DRFDM, it improves even further in terms of utility gains compared to the W-DLM benchmark, with the investor now willing to pay an annualized management fee of 712 bps to change from the W-DLM to the DRFDM. When no transaction costs are considered, the W-DLM improves, which reduces the utility gains of several models, including the DRFDM. However, our main model gets an SR of 0.86, the same as the 6F-DRFDM and almost 23% higher than the SR obtained from the AFM-DCC-NL and DCC-NL models.
The Table reports out-of-sample portfolio performances for several covariance models following optimal portfolio weights from Equation (19) and using an annualized return target of τ = 10%. Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are reported considering no transaction costs (TC = 0 bp). Annualized management fees are computed considering a relative risk aversion of γ = 10.
In Appendix C the reader can find several additional tables reporting results using different return targets, risk aversions and portfolio constraints, and applying the Minimum Variance Portfolio optimization of Equation (20). In Table 8 we show annualized management fees when lower relative risk aversions are considered as a robustness analysis. In fact, the main conclusions remain the same for different levels of risk aversion; i.e., inducing time-varying sparsity and dynamically selecting different discount factors for the variation in coefficients, as our DRFDM does, considerably adds economic value for the investor compared to the traditional W-DLM benchmark and other common competitor models, regardless of the risk aversion. Tables 6 and 7 show mean-variance portfolio performances using higher annualized portfolio return targets of 15% and 20%, respectively. What can be observed is that not only are the conclusions the same as those using the 10% return target of Table 3, but they are even stronger in terms of the Sharpe Ratio and the management fees the investor would pay to use the DRFDM. The same happens for the GMV optimization in Tables 9, 10 and 11 using different transaction costs. The DRFDM and its different specifications were able to deliver volatility reductions compared to the W-DLM, together with much lower turnovers and greater economic value for the investor.
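For reference, the two optimization problems referred to throughout this section have well-known textbook solutions. The sketch below gives the fully invested target-return mean-variance weights and the global minimum-variance weights; it is only a sketch of the generic problems, and the paper's own formulations in Equations (18)-(21) may include additional constraints or differ in detail.

```python
import numpy as np

def mean_variance_weights(mu, sigma, tau):
    """Textbook weights minimizing w' Sigma w subject to w' mu = tau and w' 1 = 1.

    mu    : (N,) vector of predicted mean returns
    sigma : (N, N) predicted covariance matrix of returns
    tau   : annualized return target
    """
    ones = np.ones_like(mu)
    sigma_inv_1 = np.linalg.solve(sigma, ones)
    sigma_inv_mu = np.linalg.solve(sigma, mu)
    A = ones @ sigma_inv_1
    B = ones @ sigma_inv_mu
    C = mu @ sigma_inv_mu
    D = A * C - B**2
    return ((C - B * tau) * sigma_inv_1 + (A * tau - B) * sigma_inv_mu) / D

def gmv_weights(sigma):
    """Global minimum-variance weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)
    return w / w.sum()
```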
Although the DCC-NL, AFM-DCC-NL and LW generated lower SDs than our DRFDM, they fail to deliver the main advantage of our approach and the investor's main interest at the end of the day: better risk-adjusted returns and strong utility gains. Last but not least, Tables 12 and 13 show mean-variance and global minimum-variance portfolios with maximum weight constraints, where little impact was observed for our approach because of the high diversification quality of the covariance matrix from the DRFDM: the vast majority of portfolio weights were already below the 5% limit in absolute terms when the formulas from Equations (18) and (20) were applied. Interestingly, competitor models with high turnovers and low portfolio diversification, such as the EWMAs and Wishart models, were able to considerably reduce portfolio turnovers and risks under these constraints. Summarizing, the different tables and results in this empirical section convey a similar message: regardless of the parameters set by the investor, she always benefits from the dynamic choices made by the DRFDM, which produce lower risks, lower portfolio turnovers and utility improvements. Therefore, a mean-variance investor who dynamically learns about the best risk factors, variation in coefficients and volatilities in an automatic and online fashion improves final decisions and utility measures compared to the benchmarks.
Dynamic portfolio allocation in high dimensions requires a refined estimate of the covariance matrix of returns. The goal of this paper was to introduce a fast and flexible multivariate model for returns that is able to improve predictions for Bayesian portfolio decisions. Inspired by the Cholesky-style framework for multivariate inference, we impose economically motivated risk factor dependencies that are able to overcome the curse of dimensionality. Due to the low dimension of the risk factors, we are able to deal with the problem of ordering uncertainty in a sequential fashion. The conjugate format of the forward filters and the use of discount factors for state evolutions and dynamic model probabilities allow our model to sequentially select the best specification choices for each asset return in parallel, such as the best risk factors and the degree of variation in factor loadings, volatilities and factor volatilities. We show that, by the use of dynamic factor selection, we can achieve higher model parsimony through time-varying sparsity on the parameter space in an online fashion. We have found that the Dynamic Risk Factor Dependency Model is able to improve final portfolio decisions compared to the traditional W-DLM benchmark, the equal-weighted portfolio, the passive investment in the S&P 500 index and many other competitor models in the literature. It generates not just risk reduction and better Sharpe ratios, but also significantly lower portfolio turnovers and higher utility gains for investors. We show that a mean-variance investor would be willing to pay a considerable management fee to switch from those strategies to the DRFDM approach. Also, since our approach does not require expensive MCMC schemes to draw from posterior distributions, we can backtest high-dimensional portfolios in a matter of a few minutes. This is good news for portfolio managers interested in improving investment strategies in a financial world with a large number of assets available, high model uncertainty and rapid and complex changes over time. Following Zhao et al.
(2016) and Levy and Lopes (2021a), we give details about the evolution and updating steps for the set of K + N univariate DLMs.
Posterior at t − 1: At time t − 1 and for each series j, we define the initial states for $\theta_{j,t-1}$ and the volatility $\sigma_{j,t-1}$ as
$$\left(\theta_{j,t-1},\, \sigma^{-2}_{j,t-1} \mid D_{t-1}\right) \sim NG\!\left(m_{j,t-1},\, C_{j,t-1},\, n_{j,t-1},\, s_{j,t-1}\right). \qquad (23)$$
Equation (23) is the joint posterior distribution of the model parameters at time t − 1, known as a Normal-Gamma distribution. Hence, given the initial states, posteriors at t − 1 evolve to priors at t via the evolution equations
$$\theta_{jt} = \theta_{j,t-1} + \omega_{jt}, \quad \omega_{jt} \sim N(0, W_{jt}), \qquad \sigma^{-2}_{jt} = \sigma^{-2}_{j,t-1}\,\eta_{jt}/\kappa_j,$$
where we can rewrite $W_{jt}$ as a discounted function of $C_{j,t-1}$, $W_{jt} = C_{j,t-1}(1-\delta_j)/\delta_j$ for $0 < \delta_j \le 1$, and the beta random variable $\eta_{jt}$ is defined by the discount factor $0 < \kappa_j \le 1$. Discount methods are used to induce time variation in the evolution of parameters and have been extensively used in many applications (Raftery et al., 2010, Dangl and Halling, 2012, Koop and Korobilis, 2013, McAlinn et al., 2020) and are well documented in Prado and West (2010). Note that lower values of δ and κ induce higher degrees of variation in the parameters, and when the discount factors are equal to one both coefficients and volatilities are constant. Hence, the prior for time t is given by
$$\left(\theta_{jt},\, \sigma^{-2}_{jt} \mid D_{t-1}\right) \sim NG\!\left(a_{jt},\, R_{jt},\, r_{jt},\, s_{j,t-1}\right),$$
where $r_{jt} = \kappa_j n_{j,t-1}$, $a_{jt} = m_{j,t-1}$ and $R_{jt} = C_{j,t-1}/\delta_j$.
1-step ahead forecasts at time t − 1: The predictive distribution for time t at time t − 1, given the risk factor parental set for equation j, is a Student's-t distribution with $r_{jt}$ degrees of freedom:
$$y_{jt} \mid y_{pa(j)t}, D_{t-1} \sim T_{r_{jt}}\!\left(f_{jt},\, q_{jt}\right), \qquad f_{jt} = F_{jt}'a_{jt}, \qquad q_{jt} = s_{j,t-1} + F_{jt}'R_{jt}F_{jt}.$$
To make it explicit, if we partition
$$a_{jt} = \begin{pmatrix} a_{j\alpha t} \\ a_{j\beta t} \end{pmatrix} \quad \text{and} \quad R_{jt} = \begin{pmatrix} R_{j\alpha t} & R_{j\alpha\beta t} \\ R_{j\alpha\beta t}' & R_{j\beta t} \end{pmatrix},$$
we have
$$f_{jt} = x_{j,t-1}'a_{j\alpha t} + y_{pa(j)t}'a_{j\beta t}, \qquad q_{jt} = s_{j,t-1} + y_{pa(j)t}'R_{j\beta t}\,y_{pa(j)t} + 2\,y_{pa(j)t}'R_{j\alpha\beta t}'\,x_{j,t-1} + x_{j,t-1}'R_{j\alpha t}\,x_{j,t-1}.$$
Updating at time t: With the previous prior, the Normal-Gamma posterior has parameters following the standard updating equations:
Posterior mean vector: $m_{jt} = a_{jt} + A_{jt}e_{jt}$
Posterior covariance matrix factor: $C_{jt} = \left(R_{jt} - A_{jt}A_{jt}'q_{jt}\right)z_{jt}$
Posterior degrees of freedom: $n_{jt} = r_{jt} + 1$
Posterior residual variance estimate: $s_{jt} = s_{j,t-1}z_{jt}$
where
1-step ahead forecast error: $e_{jt} = y_{jt} - F_{jt}'a_{jt}$
1-step ahead forecast variance factor: $q_{jt} = s_{j,t-1} + F_{jt}'R_{jt}F_{jt}$
Adaptive coefficient vector: $A_{jt} = R_{jt}F_{jt}/q_{jt}$
Volatility update factor: $z_{jt} = \left(r_{jt} + e_{jt}^2/q_{jt}\right)/\left(r_{jt} + 1\right)$
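The recursions above translate almost line by line into code. Below is a minimal sketch of one evolution, forecast and update cycle for a single equation j, assuming the state moments are held in NumPy arrays; the variable names and the default discount values are ours, chosen only for illustration, and the regression vector F stacks the $x_{j,t-1}$ block with the selected parental series $y_{pa(j)t}$.

```python
import numpy as np

def dlm_discount_step(m, C, n, s, F, y, delta=0.97, kappa=0.97):
    """One evolution/forecast/update cycle of a univariate discount DLM
    with a Normal-Gamma posterior, following the recursions in the text.

    m, C : posterior mean vector and covariance factor at t - 1
    n, s : posterior degrees of freedom and residual variance estimate at t - 1
    F    : regression vector at time t
    y    : observed value y_jt
    """
    # Evolution: posterior at t - 1 becomes the prior at t via the discount factors
    a = m.copy()
    R = C / delta
    r = kappa * n

    # One-step-ahead forecast (Student-t with r degrees of freedom)
    f = float(F @ a)
    q = float(s + F @ R @ F)

    # Updating at time t
    e = y - f                           # 1-step ahead forecast error
    A = (R @ F) / q                     # adaptive coefficient vector
    z = (r + e**2 / q) / (r + 1.0)      # volatility update factor
    m_new = a + A * e                   # posterior mean vector
    C_new = (R - np.outer(A, A) * q) * z  # posterior covariance factor
    n_new = r + 1.0                     # posterior degrees of freedom
    s_new = s * z                       # posterior residual variance estimate
    return m_new, C_new, n_new, s_new, f, q
```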
The Table reports out-of-sample portfolio performances for several covariance models following optimal portfolio weights from Equation (19) and using an annualized return target of τ = 15%. Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 5 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
The Table reports out-of-sample portfolio performances for several covariance models following optimal portfolio weights from Equation (19) and using an annualized return target of τ = 20%. Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 5 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
The Table reports out-of-sample annualized management fees (Φ) an investor would be willing to pay to switch from the Wishart Dynamic Linear Model (W-DLM) to the following covariance models. Results refer to optimal portfolio weights from mean-variance strategies as in Equation (19). Management fees are computed using returns net of transaction costs of 5 bps and relative risk aversions of γ ∈ {2, 6}.
The Table reports out-of-sample portfolio performances for several covariance models following optimal Minimum Variance portfolio weights from Equation (21). Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 5 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
The Table reports out-of-sample portfolio performances for several covariance models following optimal Minimum Variance portfolio weights from Equation (21). Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 10 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
The Table reports out-of-sample portfolio performances for several covariance models following optimal Minimum Variance portfolio weights from Equation (21). Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are reported considering no transaction costs (TC = 0 bp). Annualized management fees are computed considering a relative risk aversion of γ = 10.
The Table reports out-of-sample portfolio performances for several covariance models in a mean-variance problem as in Equation (18) after including a maximum (absolute) weight constraint of 5%. We use an annualized return target of τ = 10%. Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 5 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
The Table reports out-of-sample portfolio performances for several covariance models in a global minimum-variance optimization problem as in Equation (20) after including a maximum (absolute) weight constraint of 5%. Mean excess returns (Mean), volatilities (SD), Sharpe Ratios (SR) and management fees (Φ) are reported in annual terms, whereas turnovers are in weekly terms. Results are computed using returns net of transaction costs of 5 bps. Annualized management fees are computed considering a relative risk aversion of γ = 10.
References
Bayesian dynamic factor models and portfolio allocation
Exchange rate predictability and dynamic Bayesian learning
Forecasting large realized covariance matrices: The benefits of factor models and shrinkage
Bayesian solutions for the factor zoo: We just ran two quadrillion models
On persistence in mutual fund performance
Dynamic stock selection strategies: A structured factor model framework
Forecasting cryptocurrencies under model and parameter instability
Do return prediction models add economic value?
Sparse signals in the cross-section of returns
Predictive regressions with time-varying coefficients
Factor models for portfolio selection in large dimensions: The good, the better and the ugly
An economic evaluation of empirical exchange rate models
Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?
Dynamic equicorrelation
Large dynamic covariance matrices
The cross-section of expected stock returns
A five-factor asset pricing model
Optimal asset allocation with multivariate Bayesian dynamic linear models
The economic value of volatility timing
Presidential address: The cost of active investing
MCMC-Stochastic Simulation for Bayesian Inference
GPU-accelerated Bayesian learning and forecasting in simultaneous graphical dynamic linear models
Bayesian online variable selection and scalable multivariate volatility forecasting in simultaneous graphical dynamic linear models
Empirical asset pricing via machine learning
Factor momentum everywhere
Asset allocation with a high dimensional latent factor stochastic volatility model
Combining Shrinkage and Sparsity in Conjugate Vector Autoregressive Models
Bayesian model averaging: a tutorial
Modeling Univariate and Multivariate Stochastic Volatility in R with stochvol and factorstochvol
Inducing sparsity and shrinkage in time-varying parameter models
Bayesian Selection of Asset Pricing Factors Using Individual Stocks
(Re-)Imag(in)ing Price Trends
Sparse Bayesian time-varying covariance estimation in many dimensions
Efficient Bayesian inference for multivariate factor stochastic volatility models
Large time-varying parameter VARs
Adaptive variable selection for sequential prediction in multivariate dynamic models
Honey, I shrunk the sample covariance matrix
Dynamic Ordering Learning in Multivariate Forecasting
Trend-Following Strategies via Dynamic Momentum Learning
Factor stochastic volatility with time varying loadings and Markov switching regimes
Parsimony inducing priors for large scale state-space models
Model selection and accounting for model uncertainty in graphical models using Occam's window
Portfolio selection
The economic value of predicting stock index returns and volatility
Multivariate Bayesian predictive synthesis in macroeconomic forecasting
Comparing high-dimensional conditional covariance matrices: Implications for portfolio selection
Time series: modeling, computation, and inference
Time varying structural vector autoregressions and monetary policy
Portfolio selection for individual passive investing
Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill
The arbitrage theory of capital asset pricing
Capital asset prices: A theory of market equilibrium under conditions of risk
Cholesky realized stochastic volatility model
Dynamic financial index models: Modeling conditional dependencies via graphs
Bayesian forecasting of multivariate time series: scalability, structure uncertainty and decisions
Bayesian forecasting and dynamic models
Dynamic dependence networks: Financial time series forecasting and portfolio decisions
Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models
After computing the predictive density for each equation j, we are able to compute the joint predictive density for $y_t$ conditional on its risk factor parents,
$$p(y_t \mid D_{t-1}) = \prod_{j=1}^{K+N} p\!\left(y_{jt} \mid y_{pa(j)t}, D_{t-1}\right),$$
which is simply the product of the K + N univariate Student's-t distributions already computed.
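Since the joint predictive density factorizes over the K + N equations, its evaluation is a simple sum of univariate Student's-t log densities. A minimal sketch, assuming the per-equation forecast locations $f_{jt}$, variance factors $q_{jt}$ and degrees of freedom $r_{jt}$ have been collected into arrays (the names below are ours):

```python
import numpy as np
from scipy import stats

def joint_log_predictive(y, f, q, r):
    """Joint one-step-ahead log predictive density of the K + N series,
    i.e. the sum of the univariate Student-t log densities
    y_jt | y_pa(j)t, D_{t-1} ~ T_{r_jt}(f_jt, q_jt).

    y, f, q, r : length K + N arrays of observations, forecast locations,
                 forecast variance factors and degrees of freedom.
    """
    y, f, q, r = map(np.asarray, (y, f, q, r))
    # scipy parameterizes the t density by scale = sqrt(q)
    return float(np.sum(stats.t.logpdf(y, df=r, loc=f, scale=np.sqrt(q))))
```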
Hence, after the series are decoupled for sequential analysis, they are recoupled for multivariate forecasting. The recoupling step can be divided in two parts: one related to the dynamics of the K risk factors and the other related to the N asset returns to be allocated by the investor. We use the predictions from the first part to produce predictions for the second. Therefore, in our decision analysis in Section 4, the investor is concerned with the mean and variance of each of these parts for the portfolio allocation study: $\lambda_{t|t-1}$ and $\Sigma^f_{t|t-1}$ for the K-vector of expected factor means and the K × K expected factor covariance matrix, and $f_{t|t-1}$ and $\Sigma^r_{t|t-1}$ for the N-vector of expected asset returns and the N × N expected covariance matrix of returns. The structure in Equation (7) allows for a recursive computation of moments according to the factor dependency. Since the first dependent variable has an empty parental set, the forecast mean and variance for j = 1 are given by
$$\lambda_{1t|t-1} = x_{1,t-1}'a_{1\alpha t}, \qquad q^f_{1t|t-1} = \frac{r_{1t}}{r_{1t}-2}\left(s_{1,t-1} + x_{1,t-1}'R_{1\alpha t}\,x_{1,t-1}\right),$$
inserting $\lambda_{1t|t-1}$ as the first element of $\lambda_{t|t-1}$ and $q^f_{1t|t-1}$ as the (1, 1) element of $\Sigma^f_{t|t-1}$. For j = 2, . . . , K, we can recursively find the subsequent predicted moments. Their conditional distributions also follow Student's-t distributions, with predictive moments given by
$$\lambda_{jt|t-1} = x_{j,t-1}'a_{j\alpha t} + \lambda_{pa(j)t|t-1}'a_{j\beta t}, \qquad q^f_{jt|t-1} = \frac{r_{jt}}{r_{jt}-2}\left(s_{j,t-1} + u_{jt} + a_{j\beta t}'\,\Sigma^f_{pa(j)t|t-1}\,a_{j\beta t}\right),$$
with
$$u_{jt} = \lambda_{pa(j)t|t-1}'R_{j\beta t}\,\lambda_{pa(j)t|t-1} + \mathrm{tr}\!\left(R_{j\beta t}\,\Sigma^f_{pa(j)t|t-1}\right) + 2\,x_{j,t-1}'R_{j\alpha\beta t}\,\lambda_{pa(j)t|t-1} + x_{j,t-1}'R_{j\alpha t}\,x_{j,t-1}.$$
Now we just need to plug $\lambda_{jt|t-1}$ in as the j-th element of $\lambda_{t|t-1}$ and $q^f_{jt|t-1}$ as the (j, j) element of $\Sigma^f_{t|t-1}$. Finally, the covariance vector between factor $y_{jt}$ and its parents $y_{pa(j)t}$ is computed as $C\!\left(y_{jt}, y_{pa(j)t} \mid D_{t-1}\right) = \Sigma^f_{pa(j)t|t-1}\,a_{j\beta t}$. Hence, after reaching j = K, we have filled all elements of the K-vector $\lambda_{t|t-1}$ and the K × K covariance matrix $\Sigma^f_{t|t-1}$. For j = K + 1, . . . , K + N, we can recursively find the subsequent predicted moments in the same fashion:
$$f_{jt|t-1} = x_{j,t-1}'a_{j\alpha t} + \lambda_{pa(j)t|t-1}'a_{j\beta t}, \qquad q^r_{jt|t-1} = \frac{r_{jt}}{r_{jt}-2}\left(s_{j,t-1} + u_{jt} + a_{j\beta t}'\,\Sigma^f_{pa(j)t|t-1}\,a_{j\beta t}\right).$$
The K-vector of predicted risk factor means, $\lambda_{pa(j)t|t-1}$, the K-vector of predicted mean factor loadings, $a_{j\beta t}$, and the K × K predicted factor covariance matrix, $\Sigma^f_{pa(j)t|t-1}$, may all be filled with zeroes when those elements were not selected by the DMS procedure. We then plug $f_{jt|t-1}$ in as the (j − K)-th element of $f_{t|t-1}$ and $q^r_{jt|t-1}$ as the (j − K, j − K) element of $\Sigma^r_{t|t-1}$. After reaching j = K + N, we have filled all elements of the N-vector $f_{t|t-1}$ and all diagonal elements of $\Sigma^r_{t|t-1}$. Now we just need to fill all the off-diagonal elements of $\Sigma^r_{t|t-1}$ with the off-diagonal elements of the matrix
$$\beta_{t|t-1}\,\Sigma^f_{t|t-1}\,\beta_{t|t-1}',$$
where $\beta_{t|t-1}$ is the N × K matrix containing all predicted mean factor loadings, which may also be filled with zeroes for those factors not selected by the DMS procedure. Finally, we denote $\alpha_{t|t-1} = (a_{K+1,\alpha,t}, \ldots, a_{K+N,\alpha,t})'$ as the N-vector containing the time-varying intercepts of each asset return. Then the vector of predicted mean asset returns can be represented as
$$f_{t|t-1} = \alpha_{t|t-1} + \beta_{t|t-1}\,\lambda_{t|t-1}.$$
Appendix C: Additional Results
We use a window size of four weeks as a proxy for monthly turnovers.
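To make the turnover convention concrete, the sketch below computes weekly turnover as the sum of absolute changes in portfolio weights between consecutive rebalances and aggregates it over a four-week window, as mentioned above. Both the turnover definition and the use of a rolling sum over the window are our own assumptions for illustration; the paper's exact computation may differ.

```python
import numpy as np

def weekly_turnover(weights):
    """Weekly turnover as the sum of absolute weight changes between
    consecutive weekly rebalances.

    weights : (T, N) array of portfolio weights at each weekly rebalance.
    """
    return np.abs(np.diff(weights, axis=0)).sum(axis=1)

def monthly_turnover_proxy(weights, window=4):
    """Rolling sum of weekly turnover over a four-week window, used here
    as a proxy for monthly turnover."""
    wt = weekly_turnover(weights)
    return np.convolve(wt, np.ones(window), mode="valid")
```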