# Install necessary packages
list_packages <- c("AER", "dynlm", "tidyverse", "fpp", "fpp2", 
                   "forecast", "readxl", "stargazer", "scales",
                   "quantmod", "urca", "vars", "tseries", "sarima")
new_packages <- list_packages[!(list_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load necessary packages
lapply(list_packages, require, character.only = TRUE)

So far we have dealt with univariate time series data. What about multiple variables?

1 Time Series Regression

The time series models in the previous chapters allow for the inclusion of information from past observations, but not for the inclusion of other variables.

We now consider regression models of the form \[ Z_t=\beta_0 + \beta_1 X_{1,t} + \cdots + \beta_k X_{k,t} + \epsilon_t \]

In other words, \(Z_t\) is a linear combination of \(X_{1,t},\dots,X_{k,t}\). When \(\epsilon_t\) is assumed to be an uncorrelated error term (i.e., white noise), this is the usual multiple linear regression model.
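When the errors really are white noise, such a regression can be fitted directly with lm() or with tslm() from the forecast package. A minimal sketch using the uschange data analysed later in this section (quarterly changes in US consumption and income):

library(fpp2)                                   # provides uschange and tslm()
data(uschange)
# Ordinary multiple regression with (assumed) white-noise errors
fit_lm <- tslm(Consumption ~ Income, data = uschange)
summary(fit_lm)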

We further allow the errors from a regression to contain autocorrelation. To emphasise this change, we replace \(\epsilon_t\) with \(\eta_t\). The error series \(\eta_t\) is assumed to follow an ARIMA model. For example, if \(\eta_t\) follows an ARIMA(1,1,1) model, we can write

\[ Z_t=\beta_0 + \beta_1 X_{1,t} + \cdots + \beta_k X_{k,t} + \eta_t\\ (1-\phi_1 B)(1- B)\eta_t = (1-\theta_1 B) a_t \]

Notice that the model has two error terms here — the error from the regression model, which we denote by \(\eta_t\), and the error from the ARIMA model, which we denote by \(a_t\). Only the ARIMA model errors are assumed to be white noise.

Note that if \(X_{1,t}\),…,\(X_{k,t}\) are excluded from the model, the model becomes an ARIMA model.

1.1 Estimation

We cannot use ordinary least squares to estimate this model because:

  • The estimated coefficients are no longer the best estimates, as some information has been ignored in the calculation;
  • Any statistical tests associated with the model (e.g., t-tests on the coefficients) will be incorrect. In most cases, the p-values associated with the coefficients will be too small, and so some predictor variables will appear to be important when they are not. This is known as “spurious regression”.
  • The AICc values of the fitted models are no longer a good guide as to which is the best model for forecasting.

Instead, we use maximum likelihood estimation.

An important consideration when estimating a regression with ARMA errors is that all of the variables in the model must first be stationary. Thus, we first have to check that \(Z_t\) and all of the predictors appear to be stationary. If we estimate the model when any of these are non-stationary, the estimated coefficients will not be consistent estimates (and therefore may not be meaningful). One exception to this is the case where non-stationary variables are co-integrated. If there exists a linear combination of the non-stationary \(Z_t\) and the predictors that is stationary, then the estimated coefficients will be consistent. We will explain this later.
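For example, we can check how many differences each series appears to need with ndiffs() from the forecast package, or run an explicit unit root test with ur.df() from the urca package loaded above. A minimal sketch on the uschange data used below:

# How many differences does each series appear to need? (0 = already stationary)
ndiffs(uschange[, "Consumption"])
ndiffs(uschange[, "Income"])
# Augmented Dickey-Fuller test on the response as an explicit check
summary(ur.df(uschange[, "Consumption"], type = "drift", selectlags = "AIC"))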

It is often desirable to maintain the form of the relationship between \(Z_t\) and the predictors, and consequently it is common to difference all of the variables if any of them need differencing. The resulting model is then called a “model in differences”, as distinct from a “model in levels”, which is what is obtained when the original data are used without differencing.

If the above regression model with ARIMA(1,1,1) errors is differenced, we obtain the model

\[ \nabla Z_t=\beta_1 \nabla X_{1,t} + \cdots + \beta_k \nabla X_{k,t} + \nabla \eta_t\\ (1-\phi_1 B) \nabla \eta_t = (1-\theta_1 B) a_t \]

(Note that differencing removes the constant \(\beta_0\).)

The R function Arima() will fit a regression model with ARIMA errors if the xreg argument is used; the order argument specifies the order of the ARIMA error model.

The auto.arima() function will also handle regression terms via the xreg argument. The user must specify the predictor variables to include, but auto.arima() will select the best ARIMA model for the errors.

If differencing is required, then all variables are differenced during the estimation process, although the final model will be expressed in terms of the original variables. To include a constant in the differenced model, specify include.drift=TRUE.
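For example, a regression with ARIMA(1,1,1) errors and a constant in the differenced model could be specified as follows; this is only a sketch, with a hypothetical response y and predictor matrix x:

# y and x are hypothetical placeholders for a response series and a matrix of
# predictors; include.drift = TRUE adds a drift term, i.e. a constant in the
# differenced model.
fit_drift <- Arima(y, order = c(1, 1, 1), xreg = x, include.drift = TRUE)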

Here is an example.

We analyze the quarterly changes in personal consumption expenditure and personal disposable income from 1970 to 2016 Q3. We would like to forecast changes in expenditure based on changes in income.

library(fpp2)
data(uschange)
head(uschange)
##         Consumption     Income Production   Savings Unemployment
## 1970 Q1   0.6159862  0.9722610 -2.4527003 4.8103115          0.9
## 1970 Q2   0.4603757  1.1690847 -0.5515251 7.2879923          0.5
## 1970 Q3   0.8767914  1.5532705 -0.3587079 7.2890131          0.5
## 1970 Q4  -0.2742451 -0.2552724 -2.1854549 0.9852296          0.7
## 1971 Q1   1.8973708  1.9871536  1.9097341 3.6577706         -0.1
## 1971 Q2   0.9119929  1.4473342  0.9015358 6.0513418         -0.1
autoplot(uschange[,1:2], facets=TRUE) +
  xlab("Year") + ylab("") +
  ggtitle("Quarterly changes in US consumption and personal income")

The data are clearly already stationary, so no differencing is required.

(fit <- auto.arima(uschange[,"Consumption"], xreg=uschange[,"Income"]))
## Series: uschange[, "Consumption"] 
## Regression with ARIMA(1,0,2) errors 
## 
## Coefficients:
##          ar1      ma1     ma2  intercept    xreg
##       0.6922  -0.5758  0.1984     0.5990  0.2028
## s.e.  0.1159   0.1301  0.0756     0.0884  0.0461
## 
## sigma^2 estimated as 0.3219:  log likelihood=-156.95
## AIC=325.91   AICc=326.37   BIC=345.29
(fit <- Arima(uschange[,"Consumption"], order=c(1,0,2), xreg=uschange[,"Income"]))
## Series: uschange[, "Consumption"] 
## Regression with ARIMA(1,0,2) errors 
## 
## Coefficients:
##          ar1      ma1     ma2  intercept    xreg
##       0.6922  -0.5758  0.1984     0.5990  0.2028
## s.e.  0.1159   0.1301  0.0756     0.0884  0.0461
## 
## sigma^2 estimated as 0.3219:  log likelihood=-156.95
## AIC=325.91   AICc=326.37   BIC=345.29

The fitted model is

\[ Z_t = 0.599 + 0.203 X_t + \eta_t,\\ \eta_t = 0.692 \eta_{t-1} + a_t - 0.576 a_{t-1} + 0.198 a_{t-2} \]

where \(a_t\) is white noise.

We can recover estimates of both the \(\eta_t\) and \(a_t\) series using the residuals() function.

cbind("Regression Errors" = residuals(fit, type="regression"),
      "ARIMA errors" = residuals(fit, type="innovation")) %>%
  autoplot(facets=TRUE)

It is the ARIMA errors that should resemble a white noise series.

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,2) errors
## Q* = 5.8916, df = 3, p-value = 0.117
## 
## Model df: 5.   Total lags used: 8
uschange %>%
  as.data.frame() %>%
  ggplot(aes(x=Income, y=Consumption)) +
    ylab("Consumption (quarterly % change)") +
    xlab("Income (quarterly % change)") +
    geom_point() +
    geom_smooth(method="lm", se=FALSE)

1.2 Forecast

To forecast using a regression model with ARIMA errors, we need to forecast the regression part of the model and the ARIMA part of the model, and combine the results.

When future values of the predictors are unknown, we must either model them separately, or use assumed future values for each predictor.

We will calculate forecasts for the next 20 quarters, assuming that the future percentage changes in personal disposable income will be equal to the mean percentage change over the historical sample.

fcast <- forecast(fit, xreg=rep(mean(uschange[,2]),20))
autoplot(fcast) + xlab("Year") +
  ylab("Percentage change")

The prediction intervals for this model are narrower than those from an ARIMA model for \(Z_t\) alone (fitted below for comparison), because we are now able to explain some of the variation in the data using the income predictor.

fit2 <- Arima(uschange[,"Consumption"], order=c(1,0,2))
fcast2 <- forecast(fit2,h=20)
autoplot(fcast2) + xlab("Year") +
  ylab("Percentage change")

It is important to realise that the prediction intervals from regression models (with or without ARIMA errors) do not take into account the uncertainty in the forecasts of the predictors. So they should be interpreted as being conditional on the assumed (or estimated) future values of the predictor variables.
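As noted above, an alternative to plugging in an assumed value is to model the predictor separately and feed its point forecasts into xreg. A minimal sketch (this still does not propagate the uncertainty of the income forecasts into the prediction intervals):

# Forecast Income with its own ARIMA model, then use those point forecasts
# as the future values of the regressor.
income_fit <- auto.arima(uschange[, "Income"])
income_fc  <- forecast(income_fit, h = 20)
fcast_alt  <- forecast(fit, xreg = income_fc$mean)
autoplot(fcast_alt) + xlab("Year") + ylab("Percentage change")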

1.3 Gasoline price

Suppose we are interested in the level of gas prices (the data are in Gas_prices_1.xlsx, imported below) as a function of various explanatory variables.

We observe Gas Prices \(Y_t\) over \(n\) time periods, \(t = 1, 2, \dots, n\).

  1. We can forecast \(Y_t\) purely from time and its own past;

  2. Or we can bring in possible explanatory variables: Personal Disposable Income, the Consumer Price Index, Unemployment, Housing Starts, and the price of crude oil (for example via a regression with ARIMA errors, or a VAR).

# my_data1 <- Gas_prices[,-c(1:3,10:29)]
# Import the data ---------------------------------------------------------
GasPrices <- readxl::read_xlsx(path = "Gas_prices_1.xlsx")
head(GasPrices)
## # A tibble: 6 x 29
##   Date   Year Month Unleaded Crude_price SP500   PDI   CPI RR_sales Unemp
##   <chr> <dbl> <dbl>    <dbl>       <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
## 1 15-J~  1996     1     109.        18.9  636. 5660   154.   134926   5.6
## 2 15-F~  1996     2     109.        19.1  640. 5713.  155.   136781   5.5
## 3 15-M~  1996     3     114.        21.3  646. 5749.  156.   137527   5.5
## 4 15-A~  1996     4     123.        23.5  654. 5733.  156.   137504   5.6
## 5 15-M~  1996     5     128.        21.2  669. 5816   157.   138302   5.6
## 6 15-J~  1996     6     126.        20.4  671. 5851.  157.   137881   5.3
## # ... with 19 more variables: Housing <dbl>, Demand <dbl>, Production <dbl>,
## #   Imports <dbl>, Stocks <dbl>, Time <dbl>, L1_Unleaded <dbl>,
## #   L1_Crude_price <dbl>, L1_SP500 <dbl>, L1_PDI <dbl>, L1_CPI <dbl>,
## #   L1_RR_sales <dbl>, L1_Unemp <dbl>, L1_Housing <dbl>, L1_Demand <dbl>,
## #   L1_Production <dbl>, L1_Imports <dbl>, L1_Stocks <dbl>, Recession <dbl>
if (!require("forecast")){install.packages("forecast")}
Unleaded <- ts(GasPrices$Unleaded, start=1996, frequency = 12, end=c(2015, 12))

# Transformation----------------------------------------------------------
log_Unleaded <- log(Unleaded)

# auto.arima
fit.auto <- auto.arima(log_Unleaded)
print(fit.auto)
## Series: log_Unleaded 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.5588
## s.e.  0.0488
## 
## sigma^2 estimated as 0.003014:  log likelihood=354.82
## AIC=-705.64   AICc=-705.59   BIC=-698.69
checkresiduals(fit.auto)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 46.458, df = 23, p-value = 0.002622
## 
## Model df: 1.   Total lags used: 24
The residuals from the pure ARIMA(0,1,1) model still show significant autocorrelation (the Ljung-Box p-value is about 0.003), so we bring in explanatory variables. First, an ordinary linear regression:

mod1 <- lm(Unleaded~Crude_price + CPI + Demand, data = GasPrices)
summary(mod1)
## 
## Call:
## lm(formula = Unleaded ~ Crude_price + CPI + Demand, data = GasPrices)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.549 -10.814  -1.954   7.357  65.137 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -48.26392   19.52878  -2.471   0.0142 *  
## Crude_price   2.30659    0.06870  33.575  < 2e-16 ***
## CPI           0.76253    0.09357   8.149 2.14e-14 ***
## Demand       -0.14675    0.14335  -1.024   0.3070    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.58 on 236 degrees of freedom
## Multiple R-squared:  0.9681, Adjusted R-squared:  0.9677 
## F-statistic:  2386 on 3 and 236 DF,  p-value: < 2.2e-16
As discussed earlier, OLS is not appropriate when the errors are autocorrelated, so we refit the same regression with ARIMA errors:

fit <- auto.arima(GasPrices$Unleaded, xreg=as.matrix(GasPrices[,c("Crude_price", "CPI", "Demand")]))
summary(fit)
## Series: GasPrices$Unleaded 
## Regression with ARIMA(1,0,2) errors 
## 
## Coefficients:
##          ar1     ma1     ma2  intercept  Crude_price     CPI  Demand
##       0.7134  0.4971  0.1281  -207.2742       1.6327  1.6563  0.3211
## s.e.  0.0955  0.0997  0.0950    57.5287       0.1951  0.3035  0.2907
## 
## sigma^2 estimated as 92.26:  log likelihood=-880.77
## AIC=1777.55   AICc=1778.17   BIC=1805.39
## 
## Training set error measures:
##                       ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.06211942 9.463917 6.632496 -0.1367517 3.007046 0.6452159
##                     ACF1
## Training set -0.00322775
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,2) errors
## Q* = 6.8914, df = 3, p-value = 0.07544
## 
## Model df: 7.   Total lags used: 10

1.4 Another Example: Daily Electricity Demand

data(elecdaily)
elecdaily[, c("Demand","Temperature")] %>% autoplot(facets=TRUE)

elecdaily %>% 
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point()

xreg <- cbind(MaxTemp = elecdaily[, "Temperature"],
              MaxTempSq = elecdaily[, "Temperature"]^2,
              Workday = elecdaily[, "WorkDay"])
fit <- auto.arima(elecdaily[, "Demand"], xreg = xreg)
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(2,1,2)(2,0,0)[7] errors
## Q* = 28.229, df = 4, p-value = 1.121e-05
## 
## Model df: 10.   Total lags used: 14
The residuals still show significant autocorrelation, so the prediction intervals may not be entirely reliable. We nevertheless forecast the next 14 days, assuming a maximum temperature of 26 degrees:

fcast <- forecast(fit,
  xreg = cbind(MaxTemp=rep(26,14), MaxTempSq=rep(26^2,14),
    Workday=c(0,1,0,0,1,1,1,1,1,0,0,1,1,1)))
autoplot(fcast) + ylab("Electricity demand (GW)")

3 VAR

One limitation of the models that we have considered so far is that they impose a unidirectional relationship: the forecast variable is influenced by the predictor variables, but not vice versa. However, there are many cases where the reverse should also be allowed for, i.e., where all variables affect each other. In the example above, the changes in personal consumption expenditure were forecast based on the changes in personal disposable income. In that case a bi-directional relationship may be more suitable: an increase in consumption will lead to an increase in income, and vice versa.

Such feedback relationships are allowed for in the vector autoregressive (VAR) framework. In this framework, all variables are treated symmetrically. They are all modelled as if they all influence each other equally. In more formal terminology, all variables are now treated as “endogenous”.

A VAR model is a generalisation of the univariate autoregressive model for forecasting a vector of time series. It comprises one equation per variable in the system. To keep it simple, we will consider a two-variable VAR with one lag. We write a 2-dimensional VAR(1) as

\[ Z_{1,t} = c_1 + \phi_{11,1} Z_{1,t-1} + \phi_{12,1} Z_{2,t-1} + a_{1,t}\\ Z_{2,t} = c_2 + \phi_{21,1} Z_{1,t-1} + \phi_{22,1} Z_{2,t-1} + a_{2,t} \] where \(a_{1,t}\) and \(a_{2,t}\) are white noise processes that may be contemporaneously correlated.

If the series are stationary, we forecast them by fitting a VAR to the data directly (known as a “VAR in levels”). If the series are non-stationary, we take differences of the data in order to make them stationary, then fit a VAR model (known as a “VAR in differences”).

The other possibility is that the series may be non-stationary but cointegrated, which means that there exists a linear combination of them that is stationary. In this case, a VAR specification that includes an error correction mechanism, usually referred to as a vector error correction model (VECM), should be used.
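Cointegration can be tested with, for example, the Johansen procedure, available as ca.jo() in the urca package loaded at the start. A minimal sketch, with y1 and y2 standing in for two hypothetical (possibly non-stationary) series:

# Johansen trace test for the cointegrating rank of cbind(y1, y2);
# y1 and y2 are hypothetical series, K is the lag order of the VAR in levels.
johansen <- ca.jo(cbind(y1, y2), type = "trace", ecdet = "const", K = 2)
summary(johansen)
# If cointegration is found, a VECM can be estimated with cajorls(johansen, r = 1),
# or converted to its VAR representation with vars::vec2var(johansen, r = 1).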

Forecasts are generated from a VAR in a recursive manner. The one-step-ahead forecasts are generated by

\[ \hat{Z}_{1,t}(1) = \hat{c}_1 + \hat{\phi}_{11,1} Z_{1,t} + \hat{\phi}_{12,1} Z_{2,t}\\ \hat{Z}_{2,t}(1) = \hat{c}_2 + \hat{\phi}_{21,1} Z_{1,t} + \hat{\phi}_{22,1} Z_{2,t} \]

There are two decisions one has to make when using a VAR to forecast, namely how many variables (denoted by K) and how many lags (denoted by p) should be included in the system. The number of coefficients to be estimated in a VAR is equal to \(K+pK^2\) (or \(1+pK\) per equation). For example, the two-variable VAR(3) fitted below has \(1+3\times 2=7\) coefficients per equation. It is usual to keep K small and include only variables that are correlated with each other, and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags to be included.

VAR models are implemented in the vars package in R. It contains a function VARselect() for selecting the number of lags p. 

VARs are useful in several contexts, for example for forecasting a collection of related variables, and for testing whether one variable is useful in forecasting another (the basis of Granger causality tests, illustrated below).

Example

A VAR model for forecasting US consumption

VARselect(uschange[,1:2], lag.max=8,
  type="const")[["selection"]]
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      1      1      5

The R output shows the lag length selected by each of the information criteria available in the vars package. We have met the AIC before, and SC is simply another name for the BIC (SC stands for Schwarz Criterion, after Gideon Schwarz who proposed it). HQ is the Hannan-Quinn criterion, and FPE is the “Final Prediction Error” criterion. Care should be taken when using the AIC as it tends to choose large numbers of lags. Instead, for VAR models, we prefer to use the BIC.

There is a large discrepancy between the VAR(5) selected by the AIC and the VAR(1) selected by the BIC. This is not unusual. As a result we first fit a VAR(1), as selected by the BIC.

var1 <- VAR(uschange[,1:2], p=1, type="const")
serial.test(var1, lags.pt=10, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var1
## Chi-squared = 49.102, df = 36, p-value = 0.07144
var2 <- VAR(uschange[,1:2], p=2, type="const")
serial.test(var2, lags.pt=10, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var2
## Chi-squared = 47.741, df = 32, p-value = 0.03633

In similar fashion to the univariate ARIMA methodology, we test that the residuals are uncorrelated using a portmanteau test. The residuals from both the VAR(1) and the VAR(2) show some evidence of serial correlation (the VAR(2) fails at the 5% level and the VAR(1) at the 10% level), so we fit a VAR(3).

var3 <- VAR(uschange[,1:2], p=3, type="const")
serial.test(var3, lags.pt=10, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var3
## Chi-squared = 33.617, df = 28, p-value = 0.2138
var3
## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation Consumption: 
## ================================================ 
## Call:
## Consumption = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const 
## 
## Consumption.l1      Income.l1 Consumption.l2      Income.l2 Consumption.l3 
##     0.19100120     0.07836635     0.15953548    -0.02706495     0.22645563 
##      Income.l3          const 
##    -0.01453688     0.29081124 
## 
## 
## Estimated coefficients for equation Income: 
## =========================================== 
## Call:
## Income = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const 
## 
## Consumption.l1      Income.l1 Consumption.l2      Income.l2 Consumption.l3 
##     0.45349152    -0.27302538     0.02166532    -0.09004735     0.35376691 
##      Income.l3          const 
##    -0.05375916     0.38749574

The summary() output shows the estimated coefficients of each equation together with their individual significance. A formal Granger causality test (a joint test that the lags of one variable can be excluded from the other variable's equation) is sketched after the output below.

summary(var3)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Consumption, Income 
## Deterministic variables: const 
## Sample size: 184 
## Log Likelihood: -380.155 
## Roots of the characteristic polynomial:
## 0.7821 0.4981 0.4981 0.4441 0.2857 0.2857
## Call:
## VAR(y = uschange[, 1:2], p = 3, type = "const")
## 
## 
## Estimation results for equation Consumption: 
## ============================================ 
## Consumption = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const 
## 
##                Estimate Std. Error t value Pr(>|t|)    
## Consumption.l1  0.19100    0.07965   2.398 0.017518 *  
## Income.l1       0.07837    0.05453   1.437 0.152445    
## Consumption.l2  0.15954    0.08252   1.933 0.054792 .  
## Income.l2      -0.02706    0.05723  -0.473 0.636831    
## Consumption.l3  0.22646    0.07970   2.841 0.005018 ** 
## Income.l3      -0.01454    0.05425  -0.268 0.789055    
## const           0.29081    0.08300   3.504 0.000581 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.5955 on 177 degrees of freedom
## Multiple R-Squared: 0.2132,  Adjusted R-squared: 0.1865 
## F-statistic: 7.994 on 6 and 177 DF,  p-value: 1.216e-07 
## 
## 
## Estimation results for equation Income: 
## ======================================= 
## Income = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const 
## 
##                Estimate Std. Error t value Pr(>|t|)    
## Consumption.l1  0.45349    0.11560   3.923 0.000125 ***
## Income.l1      -0.27303    0.07915  -3.450 0.000702 ***
## Consumption.l2  0.02167    0.11977   0.181 0.856663    
## Income.l2      -0.09005    0.08306  -1.084 0.279788    
## Consumption.l3  0.35377    0.11568   3.058 0.002572 ** 
## Income.l3      -0.05376    0.07875  -0.683 0.495699    
## const           0.38750    0.12047   3.217 0.001543 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.8643 on 177 degrees of freedom
## Multiple R-Squared: 0.1755,  Adjusted R-squared: 0.1475 
## F-statistic: 6.279 on 6 and 177 DF,  p-value: 5.309e-06 
## 
## 
## 
## Covariance matrix of residuals:
##             Consumption Income
## Consumption      0.3546 0.1845
## Income           0.1845 0.7470
## 
## Correlation matrix of residuals:
##             Consumption Income
## Consumption      1.0000 0.3585
## Income           0.3585 1.0000
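The joint Granger causality tests mentioned above can be carried out with the causality() function from the vars package; a minimal sketch:

# Test whether Income Granger-causes Consumption, and vice versa,
# using the fitted VAR(3).
causality(var3, cause = "Income")$Granger
causality(var3, cause = "Consumption")$Granger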

The residuals for this model pass the test for serial correlation. The forecasts generated by the VAR(3) are plotted below.

forecast(var3) %>%
  autoplot() + xlab("Year")