The StanMoMo package includes two methods for model selection and model averaging based on leave-future-out validation: stacking and pseudo-BMA. We first briefly discuss why standard Bayesian model averaging and leave-one-out cross-validation should be avoided for mortality forecasting. We then explain, with some mathematical detail, how stacking and pseudo-BMA can be used for mortality forecasting based on leave-future-out validation.
Instead of choosing a single model, model averaging stems from the idea that a combination of candidate models from a model list \(\mathcal{M}=(M_1,\dots,M_K)\) may perform better than any single model. The standard Bayesian approach, called Bayesian model averaging (BMA), consists of weighting each model by its posterior model evidence. This approach should be avoided for mortality forecasting for several reasons. Among them, BMA is very sensitive to prior choices and tends to select a single model asymptotically (see e.g. Fernandez et al. (2001)). Moreover, like the Bayesian Information Criterion (BIC), BMA measures how well a model fits the past, not how well it predicts the future.
As an alternative, several authors have considered model selection and averaging based on predictive performance on hold-out data. For instance, Yao et al. (2018) proposed Bayesian model averaging approaches based on leave-one-out cross-validation (LOO-CV). While LOO-CV is valid as a measure of fit, it is problematic for forecasting.
Indeed, leaving out only one observation at a time allows information from the future to influence predictions of the past (i.e., data from times \(t+1, t+2, \dots\) would inform predictions for time \(t\)). It is more appropriate to use leave-future-out validation: in our context of mortality forecasting, instead of leaving one point out, we leave the last \(M\) years of data out and evaluate the predictive performance on these \(M\) years. This approach, proposed in Barigou et al. (2021), is further discussed in the next section.
In order to select the weights by validation, the data is split into a training set and a validation set as follows: the first \(N\) years of data, \(y_{1:N}\), form the training set, and the remaining \(M\) years, corresponding to times \(t_{N+1},\dots,t_{N+M}\), form the validation set.
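The split can be sketched in a few lines of base R (the year range and \(M=10\) below are purely illustrative):

```r
# Leave-future-out split for annual mortality data: the last M years are
# held out for validation, so no future information leaks into training.
years <- 1970:2017
M <- 10
train_years <- years[1:(length(years) - M)]                   # 1970-2007
valid_years <- years[(length(years) - M + 1):length(years)]   # 2008-2017

stopifnot(max(train_years) < min(valid_years))  # strict past/future separation
```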
After fitting the mortality model to \(y_{1:N}\), we can obtain an empirical distribution of the future death rates \(\mu_{x,t}\) for \(t=t_{N+1},\dots,t_{N+M}\) based on the MCMC samples obtained via Stan. The stacking approach searches for the averaged model which maximizes the likelihood of the observed number of deaths on the validation set. In simple words, if the validation length is \(M=10\), the stacking approach will find the best forecasting averaged model for the last 10 years of data. The stacking maximization problem can be expressed as \[ \max _{w \in \mathcal{S}_{1}^{K}} \sum_{x=x_1}^{x_n} \sum_{j=t_{N+1}}^{t_{N+M}} \log \sum_{k=1}^{K} w_{k} p\left(d_{x,j} \mid y_{1: N},M_k\right), \] where \(\mathcal{S}_{1}^{K}=\left\{w \in[0,1]^{K}: \sum_{k=1}^{K} w_{k}=1\right\}\) is the probability simplex and \(p\left(d_{x,j} \mid y_{1: N}, M_{k}\right)\) is the posterior predictive density of the number of deaths \(d_{x,j}\) at age \(x\) in year \(j\) under model \(M_k\).
By construction, the averaged distribution maximizes the log likelihood of the observed number of deaths in the validation set among all distributions in the convex hull: \[\mathcal{C}=\left\{\sum_{k=1}^{K} w_{k} \times p\left(\cdot \mid M_{k}\right): \sum_{k} w_{k}=1, w_{k} \geq 0\right\}\]
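The stacking optimisation above can be sketched with base R. This is a minimal illustration, not StanMoMo's internal implementation: the matrix `lp` of log predictive densities \(\log p(d_{x,j} \mid y_{1:N}, M_k)\) (one row per validation cell \((x,j)\), one column per model) is simulated here, whereas in practice it comes from the MCMC output.

```r
# Toy log predictive densities: 100 validation cells, K = 3 candidate models
set.seed(1)
lp <- matrix(rnorm(300, mean = -5), nrow = 100, ncol = 3)

stacked_loglik <- function(theta) {
  w <- exp(theta) / sum(exp(theta))  # softmax keeps the weights on the simplex
  m <- apply(lp, 1, max)             # rowwise max for a stable log-sum-exp
  # sum over cells of log sum_k w_k p(d | y, M_k)
  sum(m + log(rowSums(exp(lp - m) * rep(w, each = nrow(lp)))))
}

# Maximise the stacked log likelihood over the simplex (fnscale = -1 flips
# optim's default minimisation into maximisation)
fit <- optim(rep(0, ncol(lp)), stacked_loglik, control = list(fnscale = -1))
w_stack <- exp(fit$par) / sum(exp(fit$par))
round(w_stack, 3)  # nonnegative weights summing to one
```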
As an alternative approach, we consider an AIC-type weighting using leave-future-out validation, called pseudo-BMA. This method is based on the expected log predictive density (elpd), an aggregate measure of predictive accuracy, given in our mortality setting by \[ \operatorname{elpd}^{k}= \sum_{x=x_1}^{x_n} \sum_{j=t_{N+1}}^{t_{N+M}} \log p\left(d_{x,j} \mid y_{1: N},M_k\right). \] Typically, \(\operatorname{elpd}^{k}\) will be higher if model \(M_k\) better predicts the observed deaths in the validation set. The pseudo-BMA weighting is then given by \[ w_{k}=\frac{\exp \left(\operatorname{elpd}^{k}\right)}{\sum_{k=1}^{K} \exp \left(\operatorname{elpd}^{k}\right)}. \] Hence, pseudo-BMA assigns each model a weight proportional to the exponential of its elpd. As discussed in our paper Barigou et al. (2021), while this method provides a simple weighting mechanism, stacking tends to outperform pseudo-BMA.
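The pseudo-BMA formula is straightforward to compute. The sketch below uses made-up elpd values for three hypothetical models; subtracting the maximum before exponentiating does not change the weights but avoids numerical underflow, since elpd values are typically large negative numbers.

```r
# Pseudo-BMA weights from elpd values (toy numbers for illustration only)
elpd <- c(LC = -5210, RH = -5195, APC = -5230)
w <- exp(elpd - max(elpd))   # shift by the max for numerical stability
w_pbma <- w / sum(w)
round(w_pbma, 4)  # RH, the model with the highest elpd, dominates
```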
The package includes the `mortality_weights` function, which performs Bayesian model averaging via stacking and pseudo-BMA. First, the user fits the different models with an extra argument `validation`, which represents the number of years of validation. In a second step, the user calls the `mortality_weights` function with the list of model fits as argument.
For instance, if we want to apply the averaging methods to the LC, RH, and APC models on French data with 10 years of validation for ages 50-90 and years 1970-2017, the weights can be obtained via:
library(StanMoMo)

# Extract deaths and exposures for the selected ages and years
ages.fit <- 50:90
years.fit <- 1970:2017
deathFR <- FRMaleData$Dxt[formatC(ages.fit), formatC(years.fit)]
exposureFR <- FRMaleData$Ext[formatC(ages.fit), formatC(years.fit)]

# Fit the three mortality models, holding out the last 10 years for validation
fitLC <- lc_stan(death = deathFR, exposure = exposureFR, forecast = 10,
                 validation = 10, family = "poisson", cores = 4)
fitRH <- rh_stan(death = deathFR, exposure = exposureFR, forecast = 10,
                 validation = 10, family = "poisson", cores = 4)
fitAPC <- apc_stan(death = deathFR, exposure = exposureFR, forecast = 10,
                   validation = 10, family = "poisson", cores = 4)

# Compute the stacking and pseudo-BMA model weights
model_weights <- mortality_weights(list(fitLC, fitRH, fitAPC))
Fernandez, C., Ley, E., & Steel, M. F. (2001). Benchmark priors for Bayesian model averaging. Journal of Econometrics, 100(2), 381-427.
Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3), 917-1007.
Barigou K., Goffard P-O., Loisel S., Salhi Y. (2021). Bayesian model averaging for mortality forecasting using leave-future-out validation. arXiv preprint arXiv:2103.15434.