We require the following R packages.

# Install necessary packages
list_packages <- c("forecast", "readxl", "stargazer", "fpp", 
                   "fpp2", "scales", "quantmod", "urca",
                   "vars", "tseries", "ggplot2", "dplyr")
new_packages <- list_packages[!(list_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load necessary packages
lapply(list_packages, require, character.only = TRUE)

1 Model Identification

Given time series data, we first need to identify the form of the model: is it AR, MA, ARMA, or ARIMA? And what is the order \((p,d,q)\)?

The main tools are the sample ACF \(\hat{\rho}_k\) and the sample PACF \(\hat{\phi}_{kk}\).

Model          ACF \(\rho_k\)                         PACF \(\phi_{kk}\)
AR(p)          decays exponentially                   cuts off at lag \(p\)
MA(q)          cuts off at lag \(q\)                  decays exponentially
ARMA(p,q)      decays exponentially after lag \(q\)   decays exponentially
ARIMA(p,d,q)   decreases slowly                       decays exponentially

Model identification requires a good understanding of the AR, MA, ARMA, and ARIMA processes, in particular their ACFs and PACFs.

In practice, the true ACF and PACF are unknown. For given time series data, they have to be estimated.

In model identification, our goal is to match patterns in the sample ACF and PACF with the known patterns of ACF and PACF of a specific model.
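For example, here is a minimal sketch using stats::ARMAacf() to display the theoretical ACF and PACF of an AR(1) with \(\phi_1=0.5\) (an illustrative choice); these are the patterns we try to match:

# Theoretical ACF and PACF of an AR(1) with phi = 0.5
phi <- 0.5
ARMAacf(ar = phi, lag.max = 10)               # ACF: rho_k = phi^k decays exponentially
ARMAacf(ar = phi, lag.max = 10, pacf = TRUE)  # PACF: phi_11 = 0.5, zero afterwards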

To get the sample ACF and PACF:

In R, we use

acf()
pacf()

In SAS, we use

IDENTIFY VAR=Z NLAG=50;

Examples

Simulated data:

# Simulate an AR(1) with phi = 0.5 and inspect the sample ACF and PACF
n=200
ts1=arima.sim(n = n, list(order = c(1,0,0), ar = c(0.5)), sd = 1)
par(mfrow=c(1,3))
plot.ts(ts1)
acf(ts1,ylim=c(-1,1),lag.max=10)
pacf(ts1,ylim=c(-1,1),lag.max=10)

# With a much larger sample, the sample ACF/PACF are close to the theoretical patterns
n=20000
ts1=arima.sim(n = n, list(order = c(1,0,0), ar = c(0.5)), sd = 1)
par(mfrow=c(1,3))
plot.ts(ts1)
acf(ts1,ylim=c(-1,1),lag.max=10)
pacf(ts1,ylim=c(-1,1),lag.max=10)
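For contrast, a similar sketch for a simulated MA(1) (coefficient 0.5 in R's parameterization, an illustrative choice): the roles reverse, with the ACF cutting off at lag 1 and the PACF decaying.

n=20000
ts2=arima.sim(n = n, list(order = c(0,0,1), ma = c(0.5)), sd = 1)
par(mfrow=c(1,3))
plot.ts(ts2)
acf(ts2,ylim=c(-1,1),lag.max=10)
pacf(ts2,ylim=c(-1,1),lag.max=10)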

2 Cases

Case 1: Business Inventories data. We analyze the quarterly change in business inventories, stated at annual rates in billions of dollars. We examine 60 observations covering the period from the first quarter of 1955 through the fourth quarter of 1969. The data is seasonally adjusted.

case1=as.ts(scan("case1.txt"))
case1
## Time Series:
## Start = 1 
## End = 60 
## Frequency = 1 
##  [1]  4.4  5.8  6.7  7.1  5.7  4.1  4.6  4.3  2.0  2.2  3.6 -2.2 -5.1 -4.9  0.1
## [16]  4.1  3.8  9.9  0.0  6.5 10.8  4.1  2.7 -2.9 -2.9  1.5  5.7  5.0  7.9  6.8
## [31]  7.1  4.1  5.5  5.1  8.0  5.6  4.5  6.1  6.7  6.1 10.6  8.6 11.6  7.6 10.9
## [46] 14.6 14.5 17.4 11.7  5.8 11.5 11.7  5.0 10.0  8.9  7.1  8.3 10.2 13.3  6.2
par(mfrow=c(1,3))
plot(case1)
acf(case1)
pacf(case1)

fit <- arima(case1,order=c(1,0,0))
fit
## 
## Call:
## arima(x = case1, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.6803     6.0423
## s.e.  0.0914     1.2870
## 
## sigma^2 estimated as 10.88:  log likelihood = -157.04,  aic = 320.08
par(mfrow=c(1,2))
plot(forecast(fit,h=30))
fitauto <- auto.arima(case1)
fitauto
## Series: case1 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.5017  -0.8767
## s.e.  0.1643   0.0918
## 
## sigma^2 estimated as 11.34:  log likelihood=-154.63
## AIC=315.26   AICc=315.7   BIC=321.5
plot(forecast(fitauto,h=30))

Case 2: Saving Rate data. The saving rate is personal saving as a percent of disposable personal income. Some economists believe shifts in this rate contribute to business fluctuations. For example, when people save more of their income, they spend less on goods and services. This reduction in total demand for output may cause national production to fall and unemployment to rise.

We analyze 100 quarterly observations of the US saving rate for the years 1955-1979. The data is seasonally adjusted.

case2=as.ts(scan("case2.txt"))
case2
## Time Series:
## Start = 1 
## End = 100 
## Frequency = 1 
##   [1] 4.9 5.2 5.7 5.7 6.2 6.7 6.9 7.1 6.6 7.0 6.9 6.4 6.6 6.4 7.0 7.3 6.0 6.3
##  [19] 4.8 5.3 5.4 4.7 4.9 4.4 5.1 5.3 6.0 5.9 5.9 5.6 5.3 4.5 4.7 4.6 4.3 5.0
##  [37] 5.2 6.2 5.8 6.7 5.7 6.1 7.2 6.5 6.1 6.3 6.4 7.0 7.6 7.2 7.5 7.8 7.2 7.5
##  [55] 5.6 5.7 4.9 5.1 6.2 6.0 6.1 7.5 7.8 8.0 8.0 8.1 7.6 7.1 6.6 5.6 5.9 6.6
##  [73] 6.8 7.8 7.9 8.7 7.7 7.3 6.7 7.5 6.4 9.7 7.5 7.1 6.4 6.0 5.7 5.0 4.2 5.1
##  [91] 5.4 5.1 5.3 5.0 4.8 4.7 5.0 5.4 4.3 3.5
par(mfrow=c(1,3))
plot(case2)
acf(case2)
pacf(case2)

Case 3: Coal Production data. The data in this case is monthly bituminous coal production in the US from January 1952 through December 1959, a total of 96 observations. The data is seasonally adjusted.

case3=as.ts(scan("case3.txt"))
case3
## Time Series:
## Start = 1 
## End = 96 
## Frequency = 1 
##  [1] 47730 46704 41535 41319 36962 32558 31995 32993 44834 29883 39611 40099
## [13] 38051 36927 37272 39457 38097 40226 43589 39088 39409 37226 34421 34975
## [25] 32710 31885 32106 30029 29501 31620 34205 32153 32764 33230 35636 35550
## [37] 34529 37498 37229 36021 38281 36676 44541 40850 38404 37575 41476 42267
## [49] 43062 45036 43769 42298 44412 40498 37830 42294 38330 43554 42579 36911
## [61] 42541 42430 43465 44468 43597 40774 42573 41635 39030 41572 37027 34732
## [73] 36817 34295 33218 32034 31417 35719 30001 33096 35196 36550 33463 37195
## [85] 34748 36461 35754 36943 35854 37912 30095 28931 31020 31746 34613 37901
par(mfrow=c(1,3))
plot(case3)
acf(case3)
pacf(case3)

Case 4: Housing Permits data. We develop a model to forecast the index of new private housing units authorized by local building permits. These are quarterly, seasonally adjusted data covering the years 1947-1967.

case4=as.ts(scan("case4.txt"))
case4
## Time Series:
## Start = 1 
## End = 84 
## Frequency = 1 
##  [1]  83.3  83.2 105.3 117.7 104.6 108.8  93.9  86.1  83.0 102.4 119.6 141.4
## [13] 158.6 161.3 158.2 136.1 121.9  97.7 103.3  92.7 106.8 102.1 110.3 114.1
## [25] 109.1 105.4  97.6 100.7 102.7 110.9 120.2 131.3 138.9 130.9 123.1 110.8
## [37] 108.8 103.8  97.0  93.2  89.7  89.9  90.2  89.6  85.8  96.9 112.7 122.7
## [49] 119.8 117.4 111.9 104.7  98.3  94.9  93.3  90.9  91.9  97.2 104.7 107.7
## [61] 108.2 110.7 113.1 114.6 112.2 120.2 122.1 126.6 122.3 115.9 116.9 110.1
## [73] 110.4 108.9 112.1 117.6 112.2  96.0  78.0  66.9  83.5  95.8 107.7 113.7
par(mfrow=c(1,3))
plot(case4)
acf(case4)
pacf(case4)

Case 5: Rail Freight data. We model the quarterly freight volume carried by Class I railroads in the US, measured in billions of ton-miles. The data cover the period 1965-1978, a total of 56 observations. The data is seasonally adjusted.

case5=as.ts(scan("case5.txt"))
case5
## Time Series:
## Start = 1 
## End = 56 
## Frequency = 1 
##  [1] 166.8 172.8 178.3 180.3 182.6 184.2 188.9 184.4 181.7 178.5 177.6 181.0
## [13] 186.5 185.7 186.4 186.3 189.3 190.6 191.7 196.1 189.3 192.6 192.1 189.4
## [25] 189.7 191.9 182.0 175.7 192.0 192.8 193.3 200.2 208.8 211.4 214.4 216.3
## [37] 221.8 217.1 214.0 202.4 191.7 183.9 185.2 194.5 195.8 198.0 200.9 199.0
## [49] 200.6 209.5 208.4 206.7 193.3 197.3 213.7 225.1
par(mfrow=c(1,3))
plot(case5)
acf(case5)
pacf(case5)

Case 6: AT&T Stock Price data. The data is the weekly closing price of American Telephone and Telegraph (AT&T) common shares for the year 1979, a total of only 52 observations. The observations were taken from various issues of the Wall Street Journal.

case6=as.ts(scan("case6.txt"))
case6
## Time Series:
## Start = 1 
## End = 52 
## Frequency = 1 
##  [1] 61.000 61.625 61.000 64.000 63.750 63.375 63.875 61.875 61.500 61.625
## [11] 62.125 61.625 61.000 61.875 61.625 59.625 58.750 58.750 58.250 58.500
## [21] 57.750 57.125 57.750 58.875 58.000 57.875 58.000 57.125 57.250 57.375
## [31] 57.125 57.500 58.375 58.125 56.625 56.250 56.250 55.125 55.000 55.125
## [41] 53.000 52.375 52.875 53.500 53.375 53.375 53.500 53.750 54.000 53.125
## [51] 51.875 52.250
par(mfrow=c(1,3))
plot(case6)
acf(case6)
pacf(case6)

Case 7: Real-Estate Loan data. We analyze the monthly volume of commercial bank real-estate loans in billions of dollars, from January 1973 to October 1978, a total of 70 observations. The data is derived from reports to the Federal Reserve System from large commercial banks.

case7=as.ts(scan("case7.txt"))
case7
## Time Series:
## Start = 1 
## End = 70 
## Frequency = 1 
##  [1] 46.5 47.0 47.5 48.3 49.1 50.1 51.1 52.0 53.2 53.9 54.5 55.2 55.6 55.7 56.1
## [16] 56.8 57.5 58.3 58.9 59.4 59.8 60.0 60.0 60.3 60.1 59.7 59.5 59.4 59.3 59.2
## [31] 59.1 59.0 59.3 59.5 59.5 59.5 59.7 59.7 60.5 60.7 61.3 61.4 61.8 62.4 62.4
## [46] 62.9 63.2 63.4 63.9 64.5 65.0 65.4 66.3 67.7 69.0 70.0 71.4 72.5 73.4 74.6
## [61] 75.2 75.9 76.8 77.9 79.2 80.5 82.6 84.4 85.9 87.6
par(mfrow=c(1,3))
plot(case7)
acf(case7)
pacf(case7)

Case 9: Air-Carrier Freight data. We study the volume of freight, measured in ton-miles, hauled by air carriers in the US. There are 120 monthly observations covering the years 1969-1978.

case9=as.ts(scan("case9.txt"))
case9
## Time Series:
## Start = 1 
## End = 120 
## Frequency = 1 
##   [1] 1299 1148 1345 1363 1374 1533 1592 1687 1384 1388 1295 1489 1403 1243 1466
##  [16] 1434 1520 1689 1763 1834 1494 1439 1327 1554 1405 1252 1424 1517 1483 1605
##  [31] 1775 1840 1573 1617 1485 1710 1563 1439 1669 1651 1654 1847 1931 2034 1705
##  [46] 1725 1687 1842 1696 1534 1814 1796 1822 2008 2088 2230 1843 1848 1736 1826
##  [61] 1766 1636 1921 1882 1910 2034 2047 2195 1765 1818 1634 1818 1698 1520 1820
##  [76] 1689 1775 1968 2110 2241 1803 1899 1762 1901 1839 1727 1954 1991 1988 2146
##  [91] 2301 2338 1947 1990 1832 2066 1952 1747 2098 2057 2060 2240 2425 2515 2128
## [106] 2255 2116 2315 2143 1948 1460 2344 2363 2630 2811 2972 2515 2536 2414 2545
par(mfrow=c(1,3))
plot(case9)
acf(case9)
pacf(case9)

Case 10: Profit Margin data. The data is the after-tax profit, measured in cents per dollar of sales, for all US manufacturing corporations. The data covers the period 1953-1972, a total of 80 observations.

case10=as.ts(scan("case10.txt"))
case10
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##  [1] 4.4 4.3 4.4 4.0 4.3 4.6 4.5 4.7 5.2 5.4 5.5 5.6 5.4 5.4 5.0 5.1 5.3 4.9 4.7
## [20] 4.3 3.6 3.7 4.4 4.8 5.0 5.3 4.6 4.4 5.0 4.4 4.3 3.9 3.8 4.2 4.4 4.7 4.6 4.4
## [39] 4.5 4.7 4.4 4.7 4.7 5.0 5.1 5.2 5.3 5.3 5.6 5.5 5.6 5.6 5.8 5.7 5.6 5.4 5.0
## [58] 5.0 4.9 5.1 5.1 5.0 5.1 5.1 5.1 4.9 4.8 4.5 4.1 4.2 4.0 3.6 4.0 4.2 4.2 4.1
## [77] 4.2 4.2 4.3 4.5
par(mfrow=c(1,3))
plot(case10)
acf(case10)
pacf(case10)

Case 13: Cigar Consumption data. The data represents monthly cigar consumption (withdrawals from stock) for the years 1969-1976.

case13=as.ts(scan("case13.txt"))
case13
## Time Series:
## Start = 1 
## End = 96 
## Frequency = 1 
##  [1] 484 498 537 552 597 576 544 621 604 719 599 414 502 494 527 544 631 557 540
## [20] 588 593 653 582 495 510 506 557 559 571 564 497 552 559 597 616 418 452 460
## [39] 541 460 592 475 442 563 482 562 520 346 466 403 465 485 507 483 403 506 442
## [58] 576 480 339 418 380 405 452 403 379 399 464 443 533 416 314 351 354 372 394
## [77] 397 417 347 371 389 448 349 286 317 288 364 337 342 377 315 356 354 388 340
## [96] 264
par(mfrow=c(1,3))
plot(case13)
acf(case13)
pacf(case13)

Case IMA: Refrigerator data

d=scan("case_refrigerator.txt")
# keep every second entry of the scanned file as the series
case_ref=as.ts(d[seq(2,length(d),2)])
case_ref
## Time Series:
## Start = 1 
## End = 52 
## Frequency = 1 
##  [1] 390 323 371 326 358 538 533 458 414 489 306 654 458 507 362 367 306 223 281
## [20] 317 238 286 306 307 275 284 298 318 340 497 349 380 379 526 272 401 553 527
## [39] 485 722 474 510 760 515 560 751 842 818 746 672 854 692
par(mfrow=c(1,3))
plot(case_ref)
acf(case_ref)
pacf(case_ref)

Case WWWusage: the number of users connected to the Internet through a server, recorded once per minute.

par(mfrow=c(1,3))
plot.ts(WWWusage)
acf(WWWusage)
pacf(WWWusage)

fit=arima(WWWusage,order=c(3,1,0))
fit
## 
## Call:
## arima(x = WWWusage, order = c(3, 1, 0))
## 
## Coefficients:
##          ar1      ar2     ar3
##       1.1513  -0.6612  0.3407
## s.e.  0.0950   0.1353  0.0941
## 
## sigma^2 estimated as 9.363:  log likelihood = -252,  aic = 511.99

3 Advanced Model Identification Tools for ARMA

Identifying the order of a pure AR or a pure MA model is relatively easy.

What about ARMA(p,q)? It is much harder.

Minimum Information Criterion

Akaike’s information criterion: choose the fitted model which minimizes \[ AIC=-2 \log \mathcal{L} + 2 k \] where \(\mathcal{L}\) is the likelihood, \(k\) is the number of parameters to estimate.

The model with the minimum value of the AIC is often the best model for forecasting. For large values of \(n\), minimizing the AIC is equivalent to minimizing the cross-validation error.

For small values of \(n\), the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed.

AIC corrected for small sample bias (AICc) is defined as

\[ AICc=AIC+\frac{2k(k+1)}{n-k-1}. \]

Bayesian information criterion: choose the fitted model which minimizes \[ BIC = -2 \log \mathcal{L} + k \log n \] where \(\mathcal{L}\) is the likelihood, \(k\) is the number of parameters to estimate.

The smaller the AIC or BIC, the better the model according to that criterion: AIC(model1) < AIC(model2) means model1 is preferred to model2, and likewise BIC(model1) < BIC(model2) means model1 is preferred to model2.

AIC focuses on predictive power; it is good for forecasting but tends to overestimate the order of the AR part.

BIC focuses on model specification and is good for identifying the true model.

It is important to note that these information criteria tend not to be good guides to selecting the appropriate order of differencing (d) of a model, but only for selecting the values of p and q. This is because the differencing changes the data on which the likelihood is computed, making the AIC values between models with different orders of differencing not comparable. So we need to use some other approach to choose d, and then we can use the AICc to select p and q.
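In fact, auto.arima() follows exactly this recipe: it chooses d using unit-root tests and then selects p and q by minimizing an information criterion (AICc by default). A small sketch, reusing case1 from Case 1; the ic argument switches the criterion:

# auto.arima(): d from unit-root tests, then p and q by information criterion
fit_aicc <- auto.arima(case1)              # default ic = "aicc"
fit_bic  <- auto.arima(case1, ic = "bic")  # select p and q by BIC instead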

case1=as.ts(scan("case1.txt"))
par(mfrow=c(1,3))
plot(case1)
acf(case1)
pacf(case1)

fit <- arima(case1,order=c(1,0,0))
AIC(fit)
## [1] 320.0826
BIC(fit)
## [1] 326.3657
fit2 <- arima(case1,order=c(1,0,1))
AIC(fit2)
## [1] 321.279
BIC(fit2)
## [1] 329.6564
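To compare more candidates systematically, here is a minimal sketch that fixes d = 0 and tabulates AIC and BIC over a small grid of (p, q) values (the grid bounds are arbitrary); auto.arima() automates a search of this kind:

# Tabulate AIC and BIC over a small (p, q) grid with d fixed at 0
grid <- expand.grid(p = 0:2, q = 0:2)
ic <- t(apply(grid, 1, function(o) {
  fit <- arima(case1, order = c(o[1], 0, o[2]))
  c(AIC = AIC(fit), BIC = BIC(fit))
}))
cbind(grid, ic)[order(ic[, "AIC"]), ]  # sorted, best (lowest AIC) first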

4 Parameter Estimation

After the form of the model and the order are determined in the model identification phase, we need to estimate the parameters, e.g., \(\phi_1\), \(\theta_1\), etc.

There are many estimation methods, e.g., conditional least squares (CLS), unconditional least squares (ULS), and maximum likelihood (ML).

In SAS, we use

ESTIMATE P=1/2/3 Q=1/2/3 METHOD=CLS/ULS/ML

In R, we simulate a series and fit it with arima(), which estimates by maximum likelihood (the default method "CSS-ML" uses conditional sums of squares for starting values, then full ML):
n=200
ts1=arima.sim(n = n, list(order = c(1,0,0), ar = c(0.5)), sd = 1)
par(mfrow=c(1,3))
plot.ts(ts1)
acf(ts1,ylim=c(-1,1),lag.max=10)
pacf(ts1,ylim=c(-1,1),lag.max=10)

fit = arima(ts1,order =  c(1,0,0))
fit
## 
## Call:
## arima(x = ts1, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.4675    -0.1679
## s.e.  0.0623     0.1380
## 
## sigma^2 estimated as 1.089:  log likelihood = -292.45,  aic = 590.89
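R's arima() exposes a similar choice of estimation method through its method argument; a quick sketch comparing conditional sum of squares with full maximum likelihood on the simulated AR(1) above (the estimates should be close):

coef(arima(ts1, order = c(1,0,0), method = "CSS"))  # conditional sum of squares
coef(arima(ts1, order = c(1,0,0), method = "ML"))   # full maximum likelihood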

5 Diagnostic Checking

Goal: determine model adequacy and suggest alternative models.

General idea: we have a fitted model. Is the model adequate? If not, how do we modify it?

5.1 Fitted Values

For a time series model, let’s first consider the expectation of \(Z_t\) conditional on the past observations, \[ \mathbb{E}[Z_t|\boldsymbol{I}_{t-1}] = f(\mu, \boldsymbol{\phi},\boldsymbol{\theta},\boldsymbol{I}_{t-1}). \]

This is essentially a function of the true parameters and the past observations. Here we have:

  • \(\mu\): time series model level.

  • \(\boldsymbol{\phi}= (\phi_1,...,\phi_p)\): AR parameters.

  • \(\boldsymbol{\theta}= (\theta_1,...,\theta_q)\): MA parameters.

  • \(\boldsymbol{I}_{t-1}=z_{t-1},z_{t-2},...\) i.e., the past observations.

In practice, we don’t know the true parameters. We fit the model to the data and estimate the parameters as \(\hat{\mu}\), \(\hat{\boldsymbol{\phi}}\), \(\hat{\boldsymbol{\theta}}\). Plugging these estimates into the conditional expectation \(f\), we define the fitted value as

\[ \hat{Z}_t = f(\hat{\mu}, \hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}}, \boldsymbol{I}_{t-1}) \]

\(\hat{Z}_t\) is called the fitted value of \(Z_t\).

Example: For an AR(2), we have \(\tilde{Z}_t = \phi_1 \tilde{Z}_{t-1}+\phi_2 \tilde{Z}_{t-2}+a_t\), or equivalently \(Z_t=\mu + \phi_1 (Z_{t-1}-\mu) + \phi_2 (Z_{t-2}-\mu) + a_t\); then the conditional expectation becomes

\[\begin{align*} \mathbb{E}[Z_t|\boldsymbol{I}_{t-1}]&=\mathbb{E}[\mu + \phi_1 \tilde{Z}_{t-1}+\phi_2 \tilde{Z}_{t-2}+a_t | Z_{t-1}=z_{t-1}, Z_{t-2}=z_{t-2}]\\ &=\mu + \phi_1 (z_{t-1}-\mu) + \phi_2 (z_{t-2}-\mu) \end{align*}\]

Suppose the parameter estimates are \(\hat{\mu}=4.97\), \(\hat{\phi}_1=1.56\), \(\hat{\phi}_2=-0.72\), and suppose we know \(z_{1}=3.81\), \(z_{2}=3.82\), \(z_{3}=3.83\). Then

\[\begin{align*} \hat{Z}_3&=4.97+1.56(3.82-4.97)-0.72(3.81-4.97)=4.01\\ \hat{Z}_4&=4.97+1.56(3.83-4.97)-0.72(3.82-4.97)=4.02 \end{align*}\]

Note we cannot obtain the fitted values \(\hat{Z}_1\) and \(\hat{Z}_2\), since they would require the unobserved \(z_0\) and \(z_{-1}\).
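A quick sketch verifying this hand computation in R, using the assumed estimates and observations above:

# Fitted values for the AR(2) example, computed directly
mu <- 4.97; phi1 <- 1.56; phi2 <- -0.72
z <- c(3.81, 3.82, 3.83)
mu + phi1*(z[2] - mu) + phi2*(z[1] - mu)  # Z3-hat: about 4.01
mu + phi1*(z[3] - mu) + phi2*(z[2] - mu)  # Z4-hat: about 4.02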

Example

set.seed(123)
n=50
ts1=arima.sim(n = n, list(order = c(2,0,0), ar = c(0.4,0.2)), sd = 1)
fit = arima(ts1,order =  c(2,0,0))
plot.ts(ts1)
points(fitted(fit),pch=20,col="grey")
points(fitted(fit),type="l",col="grey")

5.2 Residuals

The residuals from a time series model are defined as \[\begin{align*} \hat{a}_t = Z_t - \hat{Z}_t \end{align*}\]

Alternatively, in operator notation, the residuals can be understood in the following way. If the model is \[\begin{align*} (Z_t-\mu) - \phi_1 (Z_{t-1}-\mu) - ... - \phi_p (Z_{t-p}-\mu) &= a_t - \theta_1 a_{t-1} - ... - \theta_q a_{t-q} \end{align*}\] where \(Z_t\) is observed and \(\mu\), \(\phi_1\), …, \(\phi_p\), \(\theta_1\), …, \(\theta_q\) are estimated, then \(a_t\) can be inferred from \(Z_t\), \(\hat{\mu}\), \(\hat{\phi}_1\), …, \(\hat{\phi}_p\), \(\hat{\theta}_1\), …, \(\hat{\theta}_q\).

\[\begin{align*} \phi(B) (Z_t-\mu) &= \theta(B) a_{t}\\ a_{t} &= \frac{\phi(B) (Z_t-\mu)}{\theta(B)}\\ \end{align*}\]

Therefore, the residuals can be obtained as \[\begin{align*} \hat{a}_{t} &= \frac{\hat{\phi}(B) (Z_t-\hat{\mu})}{\hat{\theta}(B)}\\ \end{align*}\]

Remarks:

  • It can be shown that \(\hat{a}_{t}=a_t + O(1/\sqrt{n})\) if the model is correct. As more data become available, i.e., \(n \to \infty\), \(\hat{a}_t\) closely approximates \(a_t\), which is white noise. Therefore, studying \(\hat{a}_t\) can indicate model inadequacy; for example, checking the residual ACF can reveal model misspecification.

  • If the model is correct, \(\hat{a}_{t}\) and \(\hat{a}_{t+k}\) should be uncorrelated for all \(k\neq 0\), and should be normal with mean 0 and variance estimated by \(\hat{\sigma}^2_a=\sum \hat{a}_{t}^2 / (n-r)\), where \(r\) is the total number of parameters estimated in the model.

  • test of normality of residuals: QQ-plot, Anderson-Darling test, Shapiro-Wilk test, Ryan-Joiner test, Kolmogorov-Smirnov test

  • test of zero mean: check that \(\sum \hat{a}_{t}/n\) is close to zero

  • test of zero correlation: ACF of \(\hat{a}_{t}\) (a combined sketch of these checks follows below)
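A minimal sketch of these checks in R, assuming fit is the AR(2) fit from the example in 5.1:

# Basic residual diagnostics for a fitted arima() model
res <- resid(fit)
mean(res)                  # zero-mean check: should be close to 0
acf(res)                   # zero-correlation check: spikes within the bands
qqnorm(res); qqline(res)   # normality check: points near the line
shapiro.test(res)          # Shapiro-Wilk normality test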

5.3 Residual Analysis to Detect Lack of Fit

The ACF of the residuals reveals much about the lack of fit. If the model fits the data well, the residuals should behave like white noise.

5.3.1 ACF of Residuals

We can test whether the ACF of the residuals is zero. If we can reject \(H_0: \rho_k(a_t)=0\), then the residuals are not white noise, which indicates lack of fit.

To test \(H_0: \text{Corr}(\hat{a}_t,\hat{a}_{t+k})=\rho_k(\hat{a}_t)=0\) for a particular \(k\), we reject \(H_0\) if \(|\hat{\rho}_k(\hat{a}_t)| > 2 \text{SD}(\hat{\rho}_k(\hat{a}_t))\).

For a large or moderate \(k\), \(\text{SD}(\hat{\rho}_k(\hat{a}_t)) \approx 1/\sqrt{n}\), by Bartlett's theorem. This is useful because the width of the confidence band for the ACF is then essentially \(4 \cdot \text{SD}(\hat{\rho}_k(\hat{a}_t)) \approx 4/\sqrt{n}\), i.e., \(\pm 2/\sqrt{n}\).

Simulation Example

par(mfrow=c(1,3))
n=10
sqrt(1/n)*2
## [1] 0.6324555
ts1=arima.sim(n = n, list(order = c(1,0,0), ar = c(0.5)), sd = 1)
acf(ts1,ylim=c(-1,1),lag.max=8)

n=100
sqrt(1/n)*2
## [1] 0.2
ts2=arima.sim(n = n, list(order = c(1,0,0), ar = c(0.5)), sd = 1)
acf(ts2,ylim=c(-1,1),lag.max=8)

n=1000
sqrt(1/n)*2
## [1] 0.06324555
ts3=arima.sim(n = n, list(order = c(1,0,0), ar = c(0.5)), sd = 1)
acf(ts3,ylim=c(-1,1),lag.max=8)

However, the approximation is not very good for small \(k\), where the true SD is more complex. For example, for an AR(1), \(\tilde{Z}_t=\phi_1 \tilde{Z}_{t-1} + a_t\), we have

\[\begin{align*} \text{Var}(\hat{\rho}_1(\hat{a}_t)) &\approx \frac{\phi_1^2}{n} < \frac{1}{n}\\ \text{Var}(\hat{\rho}_2(\hat{a}_t)) &\approx \frac{1-\phi_1^2+\phi_1^4}{n} < \frac{1}{n}\\ \text{Var}(\hat{\rho}_k(\hat{a}_t)) &\approx \frac{1}{n} \text{ for }k \geq 3 \end{align*}\]

For MA(1), \(\tilde{Z}_t=a_t - \theta_1 a_{t-1}\), we have

\[\begin{align*} \text{Var}(\hat{\rho}_1(\hat{a}_t)) &\approx \frac{\theta_1^2}{n} < \frac{1}{n}\\ \text{Var}(\hat{\rho}_2(\hat{a}_t)) &\approx \frac{1-\theta_1^2+\theta_1^4}{n} < \frac{1}{n}\\ \text{Var}(\hat{\rho}_k(\hat{a}_t)) &\approx \frac{1}{n} \text{ for }k \geq 3 \end{align*}\]

Bartlett Theorem: Let \(z_1,...,z_n\) be a realization from a stationary time series model with Gaussian random error. Then for large \(n\), we have

  1. If \(\rho_k=0\) for all \(k > 0\), then \[\text{Var}(\hat{\rho}_k) \approx \frac{1}{n} \text{ for } k > 0.\]

  2. If \(\rho_k \neq 0\) for \(k \leq q\) and \(\rho_k = 0\) for \(k > q\), then \[\text{Var}(\hat{\rho}_k) \approx \frac{1}{n}\left(1+2\sum_{i=1}^{q}\rho_i^2\right) \text{ for } k > q.\]

This theorem is useful for determining whether \(\rho_k\) is effectively zero. This matters because the behavior of the ACF not only helps us identify a time series model, but also indicates whether the series is stationary.

R.L. Anderson showed that the sampling distribution of \(\hat{\rho}_k\) is approximately normal with mean zero for large \(n\).
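In R, the ACF plot can draw the wider Bartlett bands from part 2 of the theorem via ci.type = "ma"; here is a small sketch with a simulated MA(1) (the parameter 0.8 is an arbitrary choice):

# Compare white-noise bands (part 1) with Bartlett "ma" bands (part 2)
x <- arima.sim(n = 200, list(ma = c(0.8)), sd = 1)
par(mfrow=c(1,2))
acf(x)                   # flat bands, approximately +/- 2/sqrt(n)
acf(x, ci.type = "ma")   # bands widen to reflect the nonzero rho_1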

Example

Case 1: business inventories

case1=as.ts(scan("case1.txt"))
case1
## Time Series:
## Start = 1 
## End = 60 
## Frequency = 1 
##  [1]  4.4  5.8  6.7  7.1  5.7  4.1  4.6  4.3  2.0  2.2  3.6 -2.2 -5.1 -4.9  0.1
## [16]  4.1  3.8  9.9  0.0  6.5 10.8  4.1  2.7 -2.9 -2.9  1.5  5.7  5.0  7.9  6.8
## [31]  7.1  4.1  5.5  5.1  8.0  5.6  4.5  6.1  6.7  6.1 10.6  8.6 11.6  7.6 10.9
## [46] 14.6 14.5 17.4 11.7  5.8 11.5 11.7  5.0 10.0  8.9  7.1  8.3 10.2 13.3  6.2
1/sqrt(length(case1))*2
## [1] 0.2581989
par(mfrow=c(2,3))
plot(case1)
acf(case1,ylim=c(-1,1))
pacf(case1,ylim=c(-1,1))
fit <- arima(case1,order=c(1,0,0))
fit
## 
## Call:
## arima(x = case1, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.6803     6.0423
## s.e.  0.0914     1.2870
## 
## sigma^2 estimated as 10.88:  log likelihood = -157.04,  aic = 320.08
plot(fit$residuals)
acf(fit$residuals,ylim=c(-1,1))
pacf(fit$residuals,ylim=c(-1,1))

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0) with non-zero mean
## Q* = 4.4135, df = 8, p-value = 0.818
## 
## Model df: 2.   Total lags used: 10

5.3.2 A Portmanteau Lack of Fit Test

As an alternative and adjunct to testing each \(\hat{\rho}_k(\hat{a}_t)\) individually, we can examine the first few \(\hat{\rho}_k(\hat{a}_t)\) taken as a whole to indicate model inadequacy.

Result: Suppose we consider the first \(L\) residual autocorrelations (\(L\) is the maximum lag considered). If the fitted model is correct, then the Box-Pierce test statistic has the following form:

\[\begin{align*} Q=n\sum_{k=1}^{L} \hat{\rho}_k^2(\hat{a}_t) \sim \chi^2(L-r) \end{align*}\]

where \(r\) is the number of parameters estimated in the model (i.e., \(p\), \(q\), or \(p+q\)), and \(n\) is the sample size, i.e., the number of observations. This is called a portmanteau test; this particular version is the Box-Pierce test.

If the model is inadequate, \(Q\) tends to be large. If the model is adequate, then \(\hat{a}_t\) is approximately white noise and \(Q\) tends to be small.

Reject \(H_0\) if \(Q > \chi^2_{1-\alpha,L-r}\), in which case we conclude the model is inadequate.

A more accurate test is the Box-Ljung test, based on

\[\begin{align*} Q^*=n(n+2)\sum_{k=1}^{L} (n-k)^{-1} \hat{\rho}_k^2(\hat{a}_t) \sim \chi^2(L-r) \end{align*}\]

Box.test(resid(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
## 
##  Box-Ljung test
## 
## data:  resid(fit)
## X-squared = 4.4135, df = 9, p-value = 0.8822
acf_res=acf(resid(fit))

# reproduce Q* by hand from the Ljung-Box formula (lags 1 to 10)
length(case1)*(length(case1)+2)*sum(acf_res$acf[2:11]^2/(length(case1)-1:10))
## [1] 4.413462
  • A close look at the empirical distribution of \(Q\):
n = 100
MC=10000
Q_vec=rep(NA,MC)
for (it in 1:MC)
{
  z = rnorm(n)
  test = Box.test(z, lag = 10, type = "Ljung-Box", fitdf = 0)
  Q_vec[it]=test$statistic
}
hist(Q_vec,xlab="Q",breaks=30,main="Q's distribution")
abline(v=qchisq(0.95,df=10-0))

  • A close look at the statistical derivation of \(Q\):

\[ Q=n \sum_{k=1}^{L} \hat{\rho}_k^2(\hat{a}_t)=\sum_{k=1}^{L} [\sqrt{n}\hat{\rho}_k(\hat{a}_t)]^2 \] We know \(\hat{\rho}_k(\hat{a}_t)\) is approximately \(N(0,1/n)\) (standard deviation \(1/\sqrt{n}\)) by Bartlett's theorem; therefore, \(\sqrt{n}\hat{\rho}_k(\hat{a}_t)\) is approximately \(N(0,1)\).

Note that if \(X \sim N(0,1)\), \(Y \sim N(0,1)\), and \(X\) and \(Y\) are independent, then \(X^2 \sim \chi^2(df=1)\), \(Y^2 \sim \chi^2(df=1)\), and \(X^2+Y^2 \sim \chi^2(df=2)\).

Summing the squared terms over \(k\) (the sample ACFs of the residuals are no longer independent once \(r\) parameters have been estimated, which costs \(r\) degrees of freedom), we have

\[ \sum_{k=1}^{L} [\sqrt{n}\hat{\rho}_k(\hat{a}_t)]^2 \sim \chi^2(df=L-r) \]
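To see where the \(L-r\) degrees of freedom come from empirically, here is a Monte Carlo sketch: computing \(Q^*\) on residuals of a fitted AR(1) (so \(r=1\)) should match a \(\chi^2(L-1)\) density.

# Q* on residuals of a fitted AR(1): empirical distribution vs chi-square(L - 1)
n <- 200; MC <- 2000; L <- 10
Q_vec <- rep(NA, MC)
for (it in 1:MC) {
  z <- arima.sim(n = n, list(order = c(1,0,0), ar = 0.5), sd = 1)
  fit_mc <- arima(z, order = c(1,0,0))
  Q_vec[it] <- Box.test(resid(fit_mc), lag = L, type = "Ljung-Box", fitdf = 1)$statistic
}
hist(Q_vec, xlab = "Q*", breaks = 30, freq = FALSE, main = "Q* on AR(1) residuals")
curve(dchisq(x, df = L - 1), add = TRUE)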

5.3.3 Summary

The residual ACF \(\hat{\rho}_k(\hat{a}_t)\) is useful for detecting lack of fit by:

  • providing a check of the white noise property of \(\hat{a}_t\) through the individual ACFs

  • providing an overall check of adequacy through \(Q\)

5.4 Revise Model Specification

If the residuals do not behave like white noise, we need to re-specify the model. Here the residual ACF \(\hat{\rho}_k(\hat{a}_t)\) can also suggest alternative models, i.e., aid model identification.

Suppose a model is fitted as MA(1)

\[\begin{align*} \tilde{Z}_{t} = a_t - \theta_1 a_{t-1} \end{align*}\]

and the residuals \(\hat{a}_t\) and the residual ACF \(\hat{\rho}_k(\hat{a}_t)\) were obtained as below.

k                             1      2      3      4      5      6      7      8      9
\(\hat{\rho}_k(\hat{a}_t)\)   0.62   0.35   0.17   0.08   0.03   0.01   -0.01  -0.02  0.03
plot(0:9,c(1,0.62 , 0.35 , 0.17 , 0.08 , 0.03 , 0.01 , -0.01 , -0.02 , 0.03 ),type="h")
abline(h=0)

The residual ACF \(\hat{\rho}_k(\hat{a}_t)\) dies out exponentially in \(k\), as if \(\hat{a}_t\) came from an AR(1) with \(\hat{\phi}_1 \approx 0.6\). It is certainly not white noise.

Therefore, the original MA(1) model is probably wrong, i.e., \(\hat{a}_t\) is correlated.

In fact, from the form of the residual ACF \(\hat{\rho}_k(\hat{a}_t)\), it looks like \(\hat{a}_t\) might be AR(1), i.e.,

\[\begin{align*} a_t - \phi_1 a_{t-1} = e_t \end{align*}\]

where \(e_t\) is white noise.

So we write

\[\begin{align*} \tilde{Z}_t = a_t - \theta_1 a_{t-1} &\text{ and } \tilde{Z}_{t-1} = a_{t-1} - \theta_1 a_{t-2}\\ \tilde{Z}_t - \phi_1 \tilde{Z}_{t-1} &= ( a_t - \theta_1 a_{t-1} )- \phi_1 ( a_{t-1} - \theta_1 a_{t-2} )\\ &= ( a_t - \phi_1 a_{t-1} )- \theta_1 ( a_{t-1} - \phi_1 a_{t-2} )\\ &= e_t -\theta_1 e_{t-1} \end{align*}\]

where \(e_t\) is white noise.

Therefore, \(\tilde{Z}_t\) has the form of an ARMA(1,1) instead of an MA(1), because the residual is an AR(1).

In operator notation, we originally have

\[\begin{align*} \tilde{Z}_t = (1-\theta_1 B ) a_t \end{align*}\]

From the form of the residual ACF, we suspect \(\hat{a}_t\) is an AR(1)

\[\begin{align*} (1-\phi_1 B )a_t = e_t \end{align*}\]

where \(e_t\) is white noise. In other words

\[\begin{align*} a_t = \frac{1}{1-\phi_1 B}e_t \end{align*}\]

Substituting the above formula into \(\tilde{Z}_t = (1-\theta_1 B ) a_t\), we have

\[\begin{align*} \tilde{Z}_t &= \frac{1-\theta_1 B}{1-\phi_1 B}e_t\\ (1-\phi_1 B)\tilde{Z}_t &= (1-\theta_1 B)e_t \end{align*}\]

This type of manipulation can be generalized. For example, if the original series is fitted as an AR(1)

\[\begin{align*} (1-\phi_1 B ) \tilde{Z}_t = a_t \quad \quad (*) \end{align*}\]

But the residual ACF behaves like an AR(1)

\[\begin{align*} (1-\phi^* B ) a_t = e_t \end{align*}\]

Then substituting the above into \((*)\), i.e., multiplying both sides of \((*)\) by \((1-\phi^* B)\), we have

\[\begin{align*} (1 - \phi^* B )(1-\phi_1 B ) \tilde{Z}_t &= (1-\phi^* B )a_t = e_t\\ (1 - (\phi^* + \phi_1) B + \phi^* \phi_1 B^2 ) \tilde{Z}_t &= e_t\\ (1 - \phi_{01} B - \phi_{02} B^2 ) \tilde{Z}_t &= e_t \end{align*}\]

which is an AR(2) model.
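The following sketch illustrates the first manipulation by simulation: generate an ARMA(1,1) (the parameters are illustrative assumptions), deliberately underfit an MA(1), and the residual ACF dies out gradually rather than behaving like white noise.

# Underfitting an ARMA(1,1) with an MA(1) leaves AR-like structure in the residuals
set.seed(1)
z <- arima.sim(n = 1000, list(ar = 0.8, ma = 0.5), sd = 1)
bad <- arima(z, order = c(0,0,1))  # misspecified: MA(1) only
acf(resid(bad))                    # gradual decay, not white noise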

Other types of model inadequacy include:

  • changing parameter values over time

  • change of model over time.

To deal with these issues, we use change point detection, divide the time series data into parts, and refit each part.

Case 2: Saving Rates

case2=as.ts(scan("case2.txt"))
par(mfrow=c(1,3))
plot(case2)
acf(case2)
pacf(case2)

From the ACF and PACF, it seems to be an AR(1). So we fit an AR(1) to the data as follows.

fit=arima(case2,order=c(1,0,0))
fit
## 
## Call:
## arima(x = case2, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.8117     6.0134
## s.e.  0.0613     0.3590
## 
## sigma^2 estimated as 0.4834:  log likelihood = -106.09,  aic = 218.17

After fitting AR(1), we obtain the residuals and plot their ACF and PACF as follows.

par(mfrow=c(1,3))
plot(resid(fit))
acf(resid(fit))
pacf(resid(fit))

Clearly, the ACF at lag 2 stands out, which means the residuals are not white noise. In fact, the Box-Ljung test gives:

Box.test(resid(fit), lag = 3, type = "Ljung-Box", fitdf = 1)
## 
##  Box-Ljung test
## 
## data:  resid(fit)
## X-squared = 8.3423, df = 2, p-value = 0.01543

which rejects the null hypothesis. Since we are concerned about the nonzero ACF at lag 2, we need to revise our model. But how? Looking at the ACF of the residuals, they are possibly an MA(2), since the ACF cuts off at lag 2. Therefore, we refit an ARMA(1,2) to the data.

fit2=arima(case2,order=c(1,0,2))
fit2
## 
## Call:
## arima(x = case2, order = c(1, 0, 2))
## 
## Coefficients:
##          ar1      ma1     ma2  intercept
##       0.7673  -0.0587  0.3197     6.0066
## s.e.  0.0952   0.1268  0.1158     0.3521
## 
## sigma^2 estimated as 0.4403:  log likelihood = -101.56,  aic = 213.13

The output shows that most of the parameters are significant. We further look at the residuals.

par(mfrow=c(1,3))
plot(resid(fit2))
acf(resid(fit2))
pacf(resid(fit2))

The residual ACF and PACF all look fine too. So have we arrived at the final model? Not quite. Looking at the output from fit2, one coefficient in the ARMA(1,2) is not significant, namely \(\theta_1\): its estimate is less than twice its standard error. Therefore, we can further simplify the model by fixing \(\theta_1=0\). Here is the code for imposing this constraint.

fit3=arima(case2,order=c(1,0,2),fixed=c(NA,0,NA,NA))
fit3
## 
## Call:
## arima(x = case2, order = c(1, 0, 2), fixed = c(NA, 0, NA, NA))
## 
## Coefficients:
##          ar1  ma1     ma2  intercept
##       0.7376    0  0.3414     6.0207
## s.e.  0.0770    0  0.1064     0.3319
## 
## sigma^2 estimated as 0.4412:  log likelihood = -101.67,  aic = 211.34

As we can see, all coefficients are significant. The residuals below look fine too.

par(mfrow=c(1,3))
plot(resid(fit3))
acf(resid(fit3))
pacf(resid(fit3))

So our final model is ARMA(1,2) with \(\theta_1=0\).

In fact, recalling that the lag-1 ACF of an MA(2) with \(\theta_1=0\) is \(\rho_1 = \frac{-\theta_1+\theta_1 \theta_2}{1+\theta_1^2 + \theta_2^2} = 0\), we can confirm the pattern by simulation:

n=20000
ts1=arima.sim(n = n, list(ma = c(0, 0.5)), sd = 1)
par(mfrow=c(1,2))
acf(ts1,ylim=c(-1,1),lag.max=10)
pacf(ts1,ylim=c(-1,1),lag.max=10)

Our final model is

\[\begin{align*} (1 - \phi_{1} B ) (Z_t - \mu) &= (1 - \theta_{1} B - \theta_{2} B^2) a_t \end{align*}\]

Note that R's arima() parameterizes the MA part as \((1 + \theta_1 B + \theta_2 B^2)\), opposite in sign to the Box-Jenkins convention used here, so the reported ma2 = 0.3414 corresponds to \(\theta_2 = -0.3414\). With \(\theta_1 = 0\),

\[\begin{align*} (1 - 0.7376 B ) (Z_t - 6.0207) &= (1 + 0.3414 B^2) a_t \end{align*}\] where \(a_t \sim N(0,\sigma^2_a=0.4412)\).
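As a closing check, we can confirm that the constrained model is preferred by AIC (the values below are those reported in the fits above) and produce forecasts from it; the horizon h = 8 is an arbitrary choice.

AIC(fit2)  # 213.13, unconstrained ARMA(1,2)
AIC(fit3)  # 211.34, ARMA(1,2) with theta_1 fixed at 0
plot(forecast(fit3, h = 8))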