Deep Calibration of Interest Rates Model
Mohamed Ben Alaya, Ahmed Kebaier, Djibril Sarr
2021-10-28

Abstract. Any financial institution needs to be able to apprehend the behaviour of interest rates. Although the use of Deep Learning is growing very fast, classic rate models such as CIR or the Gaussian family are still widely used, for many reasons (expertise, ease of use, ...). We propose to calibrate the five parameters of the G2++ model using Neural Networks. To achieve that, we construct synthetic data sets of parameters drawn uniformly around a reference set of parameters calibrated from the market. From those parameters, we compute Zero-Coupon and Forward rates and their covariances and correlations. Our first model is a Fully Connected Neural Network and uses only covariances and correlations; we show that covariances are better suited to the problem than correlations. The second model is a Convolutional Neural Network using only Zero-Coupon rates with no transformation. The methods we propose perform very quickly (less than 0.3 seconds for 2 000 calibrations) and have low errors and good fitting.

Governments, industries and banks all have to handle the behaviour of financial quantities. Whether it is to manage risks or to optimize investment returns, it is necessary to be able to understand, forecast and stress the drivers of, for example, the fair value of assets. Among those drivers are interest rates (IR). They can affect, for instance, IR derivatives such as IR swaps, swaptions, cross-currency basis swaps and so on. IR models are widely used in banks and financial institutions to apprehend the behaviour of IRs. Many models exist. The Vasicek model paved the way [Vasicek, 1977] and was then improved by many others. The Cox-Ingersoll-Ross model (CIR) [Cox et al., 1985] is often employed as it is quite simple to use and to calibrate; however, it is one-factor, which limits its use. The Gaussian G2++ model [Brigo and Mercurio, 2006] allows two factors, as it is a deterministically-shifted two-factor Vasicek model. This document focuses on the calibration of the G2++ model (two-factor Hull-White) using deep learning (DL) techniques. More precisely, we propose two different approaches, applied to different sets of relevant data, to calibrate the G2++ model using Neural Networks (NN). This approach is expected to be at least as accurate as the classic ones (see section 2 for the state of the art), to perform faster and, above all, to be easier to use in terms of the treatments required on the data. We recall the equations defining the G2++ model:

$$
\begin{aligned}
r(t) &= \phi(t) + x(t) + y(t),\\
dx(t) &= -K_x\, x(t)\,dt + \sigma_x\, dW_x(t), \quad x(0) = x_0,\\
dy(t) &= -K_y\, y(t)\,dt + \sigma_y\, dW_y(t), \quad y(0) = y_0,
\end{aligned}
\qquad \text{with } K_x, K_y, \sigma_x, \sigma_y > 0 \text{ and } d\langle W_x, W_y\rangle_t = \rho\,dt. \tag{1}
$$

In what follows, section 2 gives an overview of the state of the art regarding both the calibration of IR models and the use of deep learning in finance. Section 3 then describes the models we propose in this article; for each model, we also explain how the data set used for training was constructed. The results obtained, both in terms of accuracy and computational performance, are presented in section 4. This section gives a brief insight into the two main matters addressed in this document: firstly the calibration of IR models, and then the application of deep learning to finance.
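As a point of reference for the dynamics in equation (1), the short sketch below simulates the two factors x and y with a simple Euler scheme. It is a minimal illustration only: the time grid, number of paths and parameter values are placeholders of ours, and the deterministic shift φ is left out since it does not affect the factors.

```python
# Minimal illustration of the G2++ factor dynamics in equation (1): Euler simulation of
# the two mean-reverting factors x and y with correlated Brownian increments.
# The time grid, number of paths and parameter values below are placeholders.
import numpy as np

def simulate_g2pp_factors(k_x, k_y, sigma_x, sigma_y, rho,
                          n_steps=520, dt=1.0 / 52.0, n_paths=1000, seed=0):
    """Euler scheme for dx = -k_x x dt + sigma_x dW_x and dy = -k_y y dt + sigma_y dW_y,
    with d<W_x, W_y> = rho dt and x(0) = y(0) = 0."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_paths, n_steps + 1))
    y = np.zeros((n_paths, n_steps + 1))
    for i in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_paths)
        x[:, i + 1] = x[:, i] - k_x * x[:, i] * dt + sigma_x * np.sqrt(dt) * z1
        y[:, i + 1] = y[:, i] - k_y * y[:, i] * dt + sigma_y * np.sqrt(dt) * z2
    return x, y

# The short rate is then r(t) = phi(t) + x(t) + y(t) for the chosen deterministic shift phi.
x, y = simulate_g2pp_factors(k_x=0.07, k_y=0.09, sigma_x=0.095, sigma_y=0.095, rho=-0.99)
```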
Regarding the first question, the state of the art offers many solutions, most of which rely on financial derivatives, especially swaption prices. As for the second, the literature shows a growing interest in applying deep learning to finance and to the calibration of its models, although option pricing currently seems to receive more attention. In what follows, regardless of the model, θ will denote its parameters and θ* the calibrated parameters. The accuracy of the model used for IR is highly dependent on the quality of the calibration. For the one-factor Hull-White (HW) model, [Gurrieri et al., 2009] define the calibration through these three elements:

1. The choice of constant or time-dependent mean reversion and volatility.
2. The choice of products to calibrate to, and whether to calibrate locally or globally.
3. Whether to optimize on the mean reversion a(t) and the volatility σ(t) together or separately, and in the latter case, how to estimate one independently of the other.

We can generalize this definition by extending the first and third points, not only to the mean reversion and volatility, but to any parameters required by the model. Each of these three points is important and will be discussed in a moderate to thorough manner. Independently of these three points, the calibration process generally involves minimizing the discrepancy between a theoretical value obtained from the model and the corresponding market observation. [Hull and White, 2001] for example use numerical optimization techniques such as the Levenberg-Marquardt algorithm to find the set of volatility parameters minimizing the error between model and market prices for cap & floor options and swaptions. The error used is the sum of squared differences (SSD), i.e.

$$\mathrm{SSD}(\theta) = \sum_{i=1}^{N} \big(P_i(\theta) - P_i^{\mathrm{mkt}}\big)^2,$$

with $P_i(\theta)$ the model price and $P_i^{\mathrm{mkt}}$ the market price of option $i$, and $N$ the number of options included in the calibration basket. The same idea of an optimization on swaption prices can also be found in [Russo and Torri, 2019], where, instead of the SSD, the relative error $\big(P_i(\theta) - P_i^{\mathrm{mkt}}\big)/P_i^{\mathrm{mkt}}$ is used. [Schlenkrich, 2012] also calibrates the HW model on swaptions; the optimization is done using Gauss-Newton and adjoint Broyden quasi-Newton methods, the required derivatives being obtained by automatic differentiation techniques. Other calibration methods address more specific and advanced questions. For instance, in the current macroeconomic context with low to negative IRs, the methods above might not be well suited. [Russo and Fabozzi, 2017] address the problem of negative IRs with a focus on the second point of our definition, the choice of products to calibrate to. More precisely, while they still use swaptions, they aim at finding the best swaption quotation. They come to the conclusion that calibrating the Hull-White model, the shift-extended CIR model or the shift-extended squared Gaussian model using the shifted log-normal or normal volatility yields more stable quotations, which in turn gives more stable parameters. [Orlando et al., 2019] tackle the calibration of the CIR model with negative or near-zero IR values. Their approach consists in translating the IRs to positive values, thereby keeping the initial volatility. The rate is shifted by a constant α (allowing one to keep the initial dynamics), i.e. the shifted rate is r(t) + α. They use the 99th percentile of the empirical IR probability distribution as the constant.
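To make the classic optimization-based calibration described above concrete, here is a minimal sketch that minimizes the sum of squared differences between model and market prices with SciPy's Levenberg-Marquardt routine. The `pricer(theta, instrument)` callable is a hypothetical stand-in for whichever analytical pricing formula the chosen model provides; it is not taken from the papers cited above.

```python
# Sketch of a classic SSD calibration: find theta* minimizing
# sum_i (pricer(theta, instrument_i) - market_price_i)^2.
# `pricer` is a hypothetical placeholder for the model's pricing formula.
import numpy as np
from scipy.optimize import least_squares

def calibrate_ssd(pricer, theta0, instruments, market_prices):
    """Return the calibrated parameter vector theta*."""
    market_prices = np.asarray(market_prices, dtype=float)

    def residuals(theta):
        return np.array([pricer(theta, inst) for inst in instruments]) - market_prices

    # method="lm" is SciPy's Levenberg-Marquardt implementation.
    result = least_squares(residuals, theta0, method="lm")
    return result.x
```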
For the calibration itself, instead of minimizing the error between theoretical and market data, the authors take advantage of the fact that the likelihood of the parameters of the CIR model is available (see for example [Kladívko, 2007, Ben Alaya and Kebaier, 2013]). DL, and especially NNs, is being applied more and more in most fields, including finance, where many applications have been found. Many NN architecture types exist. According to [Huang et al., 2020] for example, feedforward neural networks (FNN), which are multiple layers of fully connected perceptrons ([Pal and Mitra, 1992]), and Long Short-Term Memory networks (LSTM) ([Hochreiter and Schmidhuber, 1997]), which are recurrent networks (NNs that can process information in two directions instead of one, unlike FNNs), are the architectures mainly used in forecasting (exchange rates, option prices, risks, stock markets, ...). DL can also be used to replicate financial instruments or characteristic behaviours. [Heaton et al., 2016] for example use auto-encoders [Ng et al., 2011], neural architectures originally designed to automatically learn features from unlabeled data by replicating the inputs. The authors' purpose is to replicate indexes such as the S&P 500. The same idea of replication can also be found in [Bloch, 2019], where the authors use an NN with dimensionality reduction through Principal Component Analysis (PCA) to learn the dynamics of the implied volatility surface. Replicating the dynamics of indexes or volatilities can be a first step towards choosing an investment or a hedging strategy, but DL can also be used directly for both of these purposes. Regarding investment strategies, [Heaton et al., 2017] use an approach called "Deep Portfolios" which tries to understand the key factors driving asset prices before generating the mean-variance efficient portfolio. As for hedging, [Buehler et al., 2019], with the "Deep Hedging" approach, use reinforcement learning (which essentially relies on a reward system so that the algorithm can learn from its experience; in-depth elements can be found in [Kaelbling et al., 1996]) to find the optimal hedge given the available instruments in a fictitious market driven by the Heston model. Another application of DL in finance which is getting more and more attention is asset pricing. The approach of [Chen et al., 2020] for example combines three architectures with different purposes: an FNN aiming at capturing non-linearities, an LSTM network to find a set of economic state processes that help the model understand the macroeconomic conditions relevant for asset pricing, and finally a generative adversarial network (GAN) (about which the reader can learn more in [Creswell et al., 2018]) to identify the strategies with the most unexplained pricing information. The added value of the authors' work also lies in their inclusion of the no-arbitrage assumption via a stochastic discount factor, which they show improves performance. [Jang et al., 2021] use a deep FNN to predict the values of market options. Their performance is improved by pre-training the network, in order to overcome the lack of market data and unbalanced data sets, with data generated using parametric option pricing methods (Black-Scholes, Monte Carlo, finite differences, and binomial tree). Finally, another application of DL to finance, which is the main interest of the work presented here, is the calibration of financial models. [Pironneau, 2019] for example calibrated the Heston model for European options using an FNN.
While the author achieves good precision with a quite shallow architecture, performance stalls at some point. [Hernandez, 2016] also uses FNNs, but to calibrate financial models in general, and applies the method to the one-factor Hull-White model using swaption prices. The author outperforms classic methods in terms of calibration time. The approach we present differs from the two aforementioned DL calibration processes in both the input data (we do not use option prices) and the architectures we propose. Indeed, depending on the input data, we will use either an FNN or a Convolutional Neural Network (CNN) (see [Albawi et al., 2017] for details). In what follows, we present two different approaches to calibrate IR models. Neither approach is actually model-dependent: they can be applied to any model (financial or not) as long as what we call the Observable Quantity Of Interest (OQOI), which has to be a relevant information bearer, can be both observed (in the market, in our financial context) and obtained through an analytical expression from the model. One of the main concerns of our DC models is ease of use regarding data: we want data that is easily available in the market and that does not need much transformation. For the training of our DL models we decide, in both approaches, to use synthetic data. Several reasons support this choice:

• The data set becomes of unlimited size;
• We can modify the data to be more representative of a specific context (a stressed one for example), or we can decide to include many different contexts;
• Unlike the other calibration papers mentioned previously, we can measure our error directly against the real parameters; we do not need the intermediary of another quantity such as an option price;
• Our DL architectures actually get to learn what drives the IR model, which means that, when calibrating out-of-sample (outside the learning process), the DL model finds the parameters that really suit the IR model. If we used real-world data instead, we would find the parameters that bring the IR model closest to the real-world data. In other words, using synthetic data allows us to replicate the behaviour of the IR model we are calibrating.

The last point is of high importance, as it also helps us stay consistent with our aim of calibrating IR models rather than inferring IR dynamics, which has already been done using DL (see for example [Oh and Han, 2000, Jacovides, 2008, Vela, 2013]). The main reason supporting this decision is that, as of today, professionals and researchers using IR forecasts have more mastery of IR models than of DL models, since the former have been widely used for many decades. Also, IR models have many advantages; for example, it is quite straightforward to stress IR forecasts via classic models (through real-world simulation or alterations of the parameters). The following two subsections present each approach by explaining how the analytical expressions allowing synthetic data generation are obtained, how the data set is constructed and which architecture is chosen. Both will, in different ways, use the risk-neutral Zero-Coupon (ZC) bond price and/or the forward (FWD) rate. We recall the expression of the ZC bond price P(·,·) at time t and maturity T (see [Brigo and Mercurio, 2006] for the proof):

$$P(t,T) = \frac{P^M(0,T)}{P^M(0,t)}\, e^{\mathcal{A}(t,T)}, \qquad \mathcal{A}(t,T) = \frac{1}{2}\big[V(t,T) - V(0,T) + V(0,t)\big] - \frac{1 - e^{-K_x(T-t)}}{K_x}\, x(t) - \frac{1 - e^{-K_y(T-t)}}{K_y}\, y(t), \tag{2}$$

with $P^M(0,T)$ the initial ZC market curve at maturity T and $V(\cdot,\cdot)$ verifying

$$\begin{aligned} V(t,T) = {} & \frac{\sigma_x^2}{K_x^2}\Big[T - t + \frac{2}{K_x}e^{-K_x(T-t)} - \frac{1}{2K_x}e^{-2K_x(T-t)} - \frac{3}{2K_x}\Big] + \frac{\sigma_y^2}{K_y^2}\Big[T - t + \frac{2}{K_y}e^{-K_y(T-t)} - \frac{1}{2K_y}e^{-2K_y(T-t)} - \frac{3}{2K_y}\Big] \\ & + 2\rho\,\frac{\sigma_x\sigma_y}{K_x K_y}\Big[T - t + \frac{e^{-K_x(T-t)} - 1}{K_x} + \frac{e^{-K_y(T-t)} - 1}{K_y} - \frac{e^{-(K_x+K_y)(T-t)} - 1}{K_x + K_y}\Big]. \end{aligned}$$

Since for the ZC rate Z(·,·) we have $P(t,T) = e^{-(T-t)Z(t,T)}$:

$$Z(t,T) = -\frac{1}{T-t}\Big[\ln\frac{P^M(0,T)}{P^M(0,t)} + \mathcal{A}(t,T)\Big]. \tag{3}$$

For the FWD rate f(·,·) we have $f(t,T) = -\frac{\partial}{\partial T}\ln P(t,T)$. From these expressions, the covariances and correlations of ZC and FWD rates are obtained in closed form in terms of two functions X(T) and Y(T), depending on whether G(T) denotes the ZC or the FWD rate; by symmetry, the expression of Y(T) follows from that of X(T).
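For concreteness, the sketch below transcribes the expressions above into code. The function and argument names are ours, and the initial market curve $P^M(0,\cdot)$ is passed as a callable (a flat placeholder curve is used in the usage line).

```python
# Direct transcription of the expressions above: V(t,T), the ZC bond price P(t,T) and the
# ZC rate Z(t,T) under G2++, given a factor state (x_t, y_t) and the initial market
# discount curve P^M(0, .). Function and argument names are ours.
import numpy as np

def V(t, T, k_x, k_y, s_x, s_y, rho):
    tau = T - t
    def single(k, s):
        return (s / k) ** 2 * (tau + 2.0 / k * np.exp(-k * tau)
                               - 1.0 / (2.0 * k) * np.exp(-2.0 * k * tau) - 3.0 / (2.0 * k))
    cross = 2.0 * rho * s_x * s_y / (k_x * k_y) * (
        tau + (np.exp(-k_x * tau) - 1.0) / k_x + (np.exp(-k_y * tau) - 1.0) / k_y
        - (np.exp(-(k_x + k_y) * tau) - 1.0) / (k_x + k_y))
    return single(k_x, s_x) + single(k_y, s_y) + cross

def zc_price(t, T, x_t, y_t, pm, k_x, k_y, s_x, s_y, rho):
    """P(t,T); `pm` is the initial market discount curve P^M(0, .) as a callable."""
    A = 0.5 * (V(t, T, k_x, k_y, s_x, s_y, rho)
               - V(0.0, T, k_x, k_y, s_x, s_y, rho)
               + V(0.0, t, k_x, k_y, s_x, s_y, rho)) \
        - (1.0 - np.exp(-k_x * (T - t))) / k_x * x_t \
        - (1.0 - np.exp(-k_y * (T - t))) / k_y * y_t
    return pm(T) / pm(t) * np.exp(A)

def zc_rate(t, T, x_t, y_t, pm, *params):
    """Z(t,T) = -ln P(t,T) / (T - t); `params` = (k_x, k_y, s_x, s_y, rho)."""
    return -np.log(zc_price(t, T, x_t, y_t, pm, *params)) / (T - t)

# Usage with a flat placeholder discount curve (for illustration only):
pm = lambda T: np.exp(-0.01 * T)
z = zc_rate(1.0, 5.0, 0.0, 0.0, pm, 0.07, 0.09, 0.095, 0.095, -0.99)
```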
The data set construction is made in five steps:

1. Choosing reference parameters;
2. Extending the reference parameters to define a range of acceptance and drawing the parameters within the corresponding intervals;
3. Computing, from the closed-form expressions above, the set of covariances and correlations for ZCs and FWDs;
4. Choosing the scaling transformations needed;
5. Finally, splitting the data set for training and validation.

Choice of reference parameters: We perform a classic error-minimisation calibration from market Euro ZCs, ranging from June 2019 to November 2020. The five calibrated parameters obtained are 0.07173132, 0.08930784, 0.09465584, 0.094675523 and -0.999318 (the last value being the correlation ρ). Choosing γ = 2/3, we get the corresponding intervals around each reference value and decide to draw uniformly in each interval N = 100 000 parameters. Figure 1 shows histograms of the generated parameters.

For the computation of the covariances and correlations, the only question is which maturities to use. As liquidity can affect the short end of the rate curves ([Covitz and Downing, 2007]), we only use tenors of 1 year and above. As we do not want too many features, we also exclude long tenors, above 12 years. We finally take tenors from 1 to 12 years with a 1-year step. The covariance and correlation matrices are turned into vectors by keeping the triangular part (including the diagonal for covariances only) and stacking the columns, hence obtaining an array of dimension nf (nf = 66 for correlations and nf = 78 for covariances). For three sets of randomly selected parameters, Figure 2 shows covariances of ZCs and Figure 3 correlations of FWDs.

Scaling transformations: We have three possibilities: performing no transformation, transforming both the features (covariances and correlations) and the targets (the parameters we are trying to find), or scaling only the features or only the targets. The results presented in section 4 for the indirect calibration are obtained when scaling both the features and the labels, as this gave better results. The transformation used for a feature or a label u is the min-max scaler:

$$u_{\mathrm{scaled}} = \frac{u - \min(u)}{\max(u) - \min(u)}.$$

Splitting the data set: We now have four sets, each of 10 000 entries (correlation and covariance sets for ZCs and FWDs). Each of them is randomly split into two subsets, one of 8 000 entries to train the model and one of 2 000 to test it.

As said previously, we want to keep a simple FCN architecture here. Our NN has five layers: one input layer with nf (the number of features) neurons, three hidden layers (the first with h1 = 1 000 neurons, the second with h2 = 1 500 and the last with h3 = 1 000 neurons) and finally one output layer of np neurons, np = 5 being the number of parameters to calibrate. The first three layers are activated with a ReLU function; the last one, being a prediction layer, has no activation. To help prevent overfitting, we also include a dropout of probability 0.25 between the last hidden layer and the prediction layer. The loss minimized during training is the mean squared error

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{\mathrm{size(set)}} \sum_{i=1}^{\mathrm{size(set)}} (y_i - \hat{y}_i)^2.$$
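A minimal PyTorch sketch of this fully connected architecture, as we read the description above (four linear maps between the five layers of neurons), is given below. The stated elements (layer widths 1 000 / 1 500 / 1 000, ReLU activations, dropout 0.25, MSE loss, Adam with mini-batches of 1 000 as stated later in the text) are kept; anything else (initialization, learning rate for this network) is left at framework defaults.

```python
# Sketch of the indirect-DC fully connected network: nf input features (66 correlations
# or 78 covariances), hidden layers of 1000, 1500 and 1000 ReLU neurons, dropout 0.25
# before the np = 5 prediction neurons (no activation on the output layer).
import torch
import torch.nn as nn

class IndirectDCNet(nn.Module):
    def __init__(self, nf: int = 78, np_out: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(nf, 1000), nn.ReLU(),
            nn.Linear(1000, 1500), nn.ReLU(),
            nn.Linear(1500, 1000), nn.ReLU(),
            nn.Dropout(p=0.25),
            nn.Linear(1000, np_out),     # prediction layer, no activation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# MSE loss and the Adam optimizer, used with mini-batches of 1 000 as stated in the text.
model = IndirectDCNet()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()
```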
We now want to consider calibration directly using the ZC curves, which are direct market observations, unlike the correlations and covariances. Several complications might occur, mostly regarding the dimensions. First of all, because calibration generally aims at capturing a historical behaviour, it would not make much sense to use observations from a single date only. Also, to get enough information, it is better to still include many maturities. Altogether, this means that we have to deal with matrices where one dimension conveys the temporal depth and the other the maturities. An FCN is not well suited for this purpose, so we will instead use CNNs. The process is mostly the same as for the indirect DC; only step 3 (the actual computation of the OQOI) changes.

Computing ZCs: We use equation 3. However, the expression of $\mathcal{A}(t,T)$ (see equation 2) holds a stochastic part in x(t) and y(t), which means that there is not a single value of Z(t,T). To get around this issue, instead of using 3D arrays, we take the expectation of the computed ZC rates. If, in equation 1, we set $x_0 = y_0 = 0$, it is straightforward to show that

$$\mathbb{E}[Z(t,T)] = -\frac{1}{T-t}\Big[\ln\frac{P^M(0,T)}{P^M(0,t)} + \frac{1}{2}\big(V(t,T) - V(0,T) + V(0,t)\big)\Big].$$

For the $P^M(0,t)$ curve, we take the Euro curve as of 2020/11/04. Unlike in the indirect calibration, with a CNN we do not have issues with the number of tenors. Also, even if the initial market curve shows an inhomogeneous behaviour on the first short tenors with respect to the long ones, we can expect this behaviour not to have a negative influence on the DC as long as we have enough tenors. We hence take a set of 28 maturities. Finally, for the propagation, the step is 1 week and we make 105 propagation steps (about 2 years). Our inputs are thus in $\mathbb{R}^{nb_{steps} \times nb_{tenors}}$ with $nb_{steps} = 106$ and $nb_{tenors} = 28$. For a randomly selected set of parameters, Figure 6 shows an example of generated ZC curves for a set of projection dates; Appendix B shows market ZC rates, and we can observe similar curves.

The architecture, combining convolution and linear layers, is still quite shallow. We start with a convolution layer on inputs of dimension (C = 1, H = 106, W = 28), with a filter of size 7 x 7. The stride of the convolution is 2 and we also allow padding. We then add a pooling layer, also of stride 2. Finally come two linear layers, the first one with 100 neurons and the second one (the prediction layer) with np = 5 neurons. Here also we include a dropout of probability 25% on the second-to-last layer. Similarly to the indirect DC, we use mini-batches of size 1 000 and the Adam algorithm without weight decay applied to the MSE. The learning rate is 0.0002 and the number of epochs is 4 000.
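A minimal PyTorch sketch of this convolutional network is shown below. The stated elements (7 x 7 filter, stride 2 with padding, pooling of stride 2, linear layers of 100 and np = 5 neurons, 25% dropout, Adam with learning rate 0.0002 and no weight decay) are kept; the number of convolution output channels, the exact padding, the activation after the convolution and the pooling type (max pooling here) are not specified in the text and are assumptions of ours.

```python
# Sketch of the direct-DC CNN: input of shape (1, 106, 28), a 7x7 convolution with
# stride 2 and padding, max pooling with stride 2, then 100-neuron and 5-neuron
# linear layers with 25% dropout before the prediction layer.
import torch
import torch.nn as nn

class DirectDCNet(nn.Module):
    def __init__(self, n_steps=106, n_tenors=28, np_out=5, channels=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            flat = self.features(torch.zeros(1, 1, n_steps, n_tenors)).numel()
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 100), nn.ReLU(),
            nn.Dropout(p=0.25),
            nn.Linear(100, np_out),      # prediction layer, no activation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_steps, n_tenors)
        return self.head(self.features(x))

model = DirectDCNet()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=0.0)
loss_fn = nn.MSELoss()
```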
Regarding accuracy, both algorithms lead to pretty good results, while time efficiency is also remarkable: calibration on the test set (2 000 entries) is performed in less than 0.30 seconds with the indirect DC model and in less than 0.10 seconds with the direct DC. Table 3 below recapitulates the MSE obtained on the validation set for each parameter. The really interesting result is that the error made using covariances is about half the error made with correlations: covariances appear to bring significantly more information to the calibration process. We can verify this assertion numerically by observing the derivative curves of the covariances and correlations with respect to the parameters. We select, arbitrarily, the couple of maturities (5 years, 7 years) and a subset of 100 parameters for each of K_x, σ_y and ρ, and we compute the derivatives. Figures 8, 10 and 12 show the derivatives of the ZC covariances for this sample of parameters, while Figures 9, 11 and 13 show the derivatives of the ZC correlations for the same sample. We directly see that the correlation derivatives quickly vanish for the three parameters. For the covariances, only the derivative with respect to K_x vanishes, and it does so more slowly; for σ_y and ρ, we do not even observe that behaviour. The results obtained with the indirect DC could be improved by focusing on the covariances, allowing more tenors and proceeding to a dimension reduction via auto-encoders or PCA (Principal Component Analysis). Deepening the network could also improve performance, but might require complexifying the model, for instance with regularization, to avoid overfitting.

Direct DC: Compared to the indirect DC with ZC covariances, the direct method leads to slightly less accurate calibrations. Even though the differences are small, this actually makes sense: the indirect DC benefits from information already extracted from the ZC curve. Improving this method would be of great use, as it is the easiest one for the researcher or the professional to use. A first idea would be to reflect more deeply on the tenors that really bring information; choosing them more wisely could lead to better results and better computational performance. Another lead would be to challenge the CNN architecture: as the data is more complex than the indirect DC's inputs, choosing the right pooling and activation functions, the right hyper-parameters (including the number of layers, i.e. the depth of the CNN) and so on could have a great impact on accuracy.

While the MSE analysis allows us to compare the different models, visualization allows us to observe the fitting and assess whether the results look good. Figure 14 shows the fitting on the validation data set for the indirect DC, with a zoom on a randomly selected interval, and Figure 15 shows the fitting for the correlations of FWDs. Finally, Figure 16 shows the fitting for the direct DC.

Despite the growing use of DL and related Artificial Intelligence (AI) techniques in finance, analytical models, such as IR models like the G2++, remain commonly used due to the expertise professionals and researchers have developed with them over the years. This does not mean, however, that DL, and AI in general, should not try to improve our use of those models. In this paper we have shown that DL can be used to calibrate IR models. Starting from the parameters of a classic market calibration of the G2++ model, we have generated a synthetic data set of parameters by uniformly sampling around those reference parameters. Using results from the literature, this synthetic set of parameters allowed us to compute two types of data sets. The first one, for the so-called indirect DC, consists of correlations and covariances of ZCs and FWDs. The second one serves the direct DC and consists of raw ZC rate curves, which are observable in the market. For the first model we used a shallow FCN and for the second a shallow CNN. We have shown that covariances give more information to the NNs; as a consequence, the indirect DC using correlations made about double the errors of the covariance-based indirect DC. The direct DC with ZCs, like the indirect covariance one, gave a good accuracy. We have also identified leads for improvement. For the first method, it could be useful to include more maturities and to combine the model with another one for dimension reduction.
For the second, we expect a better / deeper architecture and good hyper-parameter tuning to result in smaller errors. The errors and computational performances obtained allow us to conclude that using DL leads to very fast calibration with low errors for the G2++ model, and subsequently for any other model fitting our requirements on the OQOI, i.e. to be observable in the market and to admit an analytical expression. Also, as we aimed, the models are really effortless to use and only require very easily available data with little to no transformation. As a next step, besides trying the improvements identified, it would be of high interest to apply our pre-trained models to other models, such as option pricing ones or models with time-dependent parameters.

We can observe ZC covariance and correlation curves similar to those simulated for our synthetic data set. We also observe market ZC curves similar to those simulated for our synthetic data set.

References:
An equilibrium characterization of the term structure
A theory of the term structure of interest rates
Interest rate models - theory and practice: with smile, inflation and credit
Calibration methods of Hull-White model. Available at SSRN 1514192
The general Hull-White model and supercalibration
Calibration of one-factor and two-factor Hull-White models using swaptions
Efficient calibration of the Hull-White model
Calibrating short interest rate models in negative rate environments
Interest rates calibration with a CIR model
Maximum likelihood estimation of the Cox-Ingersoll-Ross process: the MATLAB implementation
Asymptotic behavior of the maximum likelihood estimator for ergodic and nonergodic square-root diffusions
Deep learning in finance and banking: A literature review and classification
Multilayer perceptron, fuzzy sets, and classification
Long short-term memory
Deep learning in finance
Sparse autoencoder. CS294A lecture notes
Neural networks based dynamic implied volatility surface. Available at SSRN 3492662
Deep learning for finance: deep portfolios
Deep hedging
Reinforcement learning: A survey
Deep learning in asset pricing
Generative adversarial networks: An overview
DeepOption: A novel option pricing framework based on deep learning with fused distilled data from multiple parametric methods
Calibration of Heston model with Keras
Model calibration with neural networks
Understanding of a convolutional neural network
Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications
Forecasting interest rates from the term structure: Support vector machines vs neural networks
Forecasting Latin-American yield curves: An artificial neural network approach
Liquidity or credit risk? The determinants of very short-term corporate yield spreads
Adam: A method for stochastic optimization