1 A Case Study on the Price of Gasoline

1.1 Do you believe the ARIMA model can do the magic?

1.1.1 YES!

Give me your reasons.

1.1.2 NO!

Why?

1.2 Import the data

Gas_prices <- readxl::read_xlsx(path = "data/Gas_prices_1.xlsx")
head(Gas_prices)
tail(Gas_prices)
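# Install forecast if missing, then load it (needed for seasonplot() below)
if (!requireNamespace("forecast", quietly = TRUE)) install.packages("forecast")
library(forecast)
# Equal-length vectors: in file order, Unleaded[t] is paired with Crude_price[t + 1]
L1_Crude_price <- Gas_prices$Crude_price[-1]
Unleaded <- Gas_prices$Unleaded[-nrow(Gas_prices)]
ts_Unleaded <- ts(Unleaded, start = 1996, frequency = 12) # monthly, from Jan 1996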
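
Because the two index operations above decide which month of crude price is matched with which month of gasoline price, it can help to check the pairing on a toy example. The sketch below uses made-up vectors (`crude` and `unleaded` are hypothetical, not the gasoline data) and assumes the rows are in chronological order. It shows the pairing produced by the indexing used above next to the alternative that matches each gasoline price with the previous period's crude price.

```r
# Hypothetical toy series in chronological order (not the gasoline data)
crude    <- c(10, 20, 30, 40)
unleaded <- c(1, 2, 3, 4)
n <- length(crude)

# Same indexing as in the import step: drop first crude, drop last unleaded
pairs_as_coded <- cbind(unleaded = unleaded[-n], crude = crude[-1])

# Alternative: drop last crude, drop first unleaded,
# so unleaded[t] is matched with crude[t - 1]
pairs_prev_period <- cbind(unleaded = unleaded[-1], crude = crude[-n])

print(pairs_as_coded)
print(pairs_prev_period)
```

Which pairing is intended depends on how the spreadsheet rows are ordered and on whether its `Crude_price` column is already lagged; neither is visible in the output shown here.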

1.3 Time-series exploration

plot(ts_Unleaded) # Nonstationary time-series

seasonplot(ts_Unleaded) # Strong evidence of seasonality
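
A seasonal plot can be complemented by a numeric summary. As a minimal sketch on a made-up monthly series (`toy` is hypothetical), `cycle()` gives the position of each observation within the year, so averaging by that index yields one mean per calendar month. The same idea could be applied to `ts_Unleaded`, with the caveat that for a strongly trending series the raw monthly means mix trend and seasonality, so this is only a rough first look.

```r
# Hypothetical monthly series: three years with an identical within-year pattern
toy <- ts(rep(1:12, times = 3), start = c(1996, 1), frequency = 12)

# cycle() returns the position within the year (1 = January, ..., 12 = December)
monthly_means <- tapply(toy, cycle(toy), mean)
print(monthly_means)
```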

1.4 Scatter plot between Unleaded and L1_Crude_price

plot(Unleaded~L1_Crude_price)

1.5 Simple Linear Model

model1 <- lm(Unleaded ~ L1_Crude_price) # obtain the least-squares estimates
plot(L1_Crude_price,Unleaded,pch=20) 
abline(model1, col="red")

1.6 Interpretation of Least Squares Estimates

summary(model1)
## 
## Call:
## lm(formula = Unleaded ~ L1_Crude_price)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.109 -14.510  -5.058  11.189  90.272 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    67.17064    3.43101   19.58   <2e-16 ***
## L1_Crude_price  2.84481    0.05435   52.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.13 on 237 degrees of freedom
## Multiple R-squared:  0.9204, Adjusted R-squared:   0.92 
## F-statistic:  2740 on 1 and 237 DF,  p-value: < 2.2e-16
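
To connect the coefficient table to formulas: in simple linear regression the least-squares slope is the sample covariance of x and y divided by the sample variance of x, the intercept is the mean of y minus the slope times the mean of x, and each t value is the estimate divided by its standard error. The sketch below checks the first two facts on made-up data (`x` and `y` are hypothetical) and the third against the printed table. Read this way, the slope estimate says that a one-unit increase in `L1_Crude_price` is associated with an average increase of about 2.84 units in `Unleaded`; the units themselves are not stated in the output.

```r
# Hypothetical toy data (not the gasoline data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# lm() should reproduce them
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = unname(coef(fit))))

# t value = estimate / standard error, using the numbers printed above
t_intercept <- 67.17064 / 3.43101
t_slope     <- 2.84481 / 0.05435
print(round(c(t_intercept, t_slope), 2))
```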

1.7 Fitted Values and Graphical Display

Merge1 <- cbind(Unleaded, Fitted=model1$fitted.values)
rownames(Merge1) <- Gas_prices$Date[-dim(Gas_prices)[1]]
head(Merge1)
##             Unleaded   Fitted
## 15-JAN-1996    109.0 121.4780
## 15-FEB-1996    108.9 127.8504
## 15-MAR-1996    113.7 134.0236
## 15-APR-1996    123.1 127.3952
## 15-MAY-1996    127.9 125.2616
## 15-JUN-1996    125.6 127.7650
plot(L1_Crude_price, Unleaded, pch=20, ylim = c(90,450)) 
points(L1_Crude_price, model1$fitted.values, pch=4)
abline(model1, col="red")
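
Each fitted value is the intercept plus the slope times the corresponding value of `L1_Crude_price`. As a small worked example using the coefficient estimates printed in the summary output, the sketch below evaluates the fitted line at a crude price of 20, a value chosen purely for illustration.

```r
# Coefficient estimates copied from the summary(model1) output
intercept <- 67.17064
slope     <- 2.84481

# Hypothetical lagged crude price, chosen only for illustration
crude_new <- 20

# Point on the fitted line
unleaded_hat <- intercept + slope * crude_new
print(unleaded_hat)
```

With the fitted object available, `predict(model1, newdata = data.frame(L1_Crude_price = crude_new))` should give the same number up to the rounding of the printed coefficients.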

1.8 Interpretation of Fitted Values

Merge2 <- cbind(Merge1, Residuals=model1$residuals)
head(Merge2)
##             Unleaded   Fitted  Residuals
## 15-JAN-1996    109.0 121.4780 -12.478008
## 15-FEB-1996    108.9 127.8504 -18.950376
## 15-MAR-1996    113.7 134.0236 -20.323607
## 15-APR-1996    123.1 127.3952  -4.295207
## 15-MAY-1996    127.9 125.2616   2.638398
## 15-JUN-1996    125.6 127.7650  -2.165032
plot(L1_Crude_price, Unleaded, pch=20, ylim = c(90,450))
points(L1_Crude_price, model1$fitted.values, pch=4)
# Draw each residual as a vertical segment; Unleaded has one element fewer than Gas_prices has rows
for (i in seq_along(Unleaded))
{
  lines(c(L1_Crude_price[i], L1_Crude_price[i]),
        c(model1$fitted.values[i],Unleaded[i]), col="red")
}
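
A residual is the observed value minus the fitted value, which is the signed length of each red segment in the plot. The first six printed rows are enough to check this. The sketch below copies them from the output above, so agreement is only up to the rounding of the printed fitted values.

```r
# First six rows copied from the printed Merge2 output
unleaded      <- c(109.0, 108.9, 113.7, 123.1, 127.9, 125.6)
fitted_vals   <- c(121.4780, 127.8504, 134.0236, 127.3952, 125.2616, 127.7650)
resid_printed <- c(-12.478008, -18.950376, -20.323607, -4.295207, 2.638398, -2.165032)

# Residual = observed - fitted
resid_by_hand <- unleaded - fitted_vals

# Largest discrepancy; should be no bigger than the rounding error of the printout
print(max(abs(resid_by_hand - resid_printed)))
```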

1.9 Coefficient of Determination, \(R^2\)

For the gas price data set, we have fitted a simple linear regression of the gasoline price \(y\) on the lagged crude oil price \(x\). But how strong is the relationship between \(x\) and \(y\)? To answer this question, we need the coefficient of determination, \(R^2\). It is built from a decomposition of each observation's deviation from the mean:

\[y_i-\bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i -\bar{y})\]

\[\underbrace{\sum_{i=1}^{n} (y_i-\bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE} + \underbrace{\sum_{i=1}^{n} (\hat{y}_i -\bar{y})^2}_{SSR}\]

Here is a figure illustrating this decomposition.

Figure: R-squared explanation
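
The step from the pointwise identity to the sum-of-squares identity is not automatic, because squaring the right-hand side produces a cross term. A short sketch of why it vanishes, writing \(e_i = y_i - \hat{y}_i\) for the residuals and \(\hat{y}_i = b_0 + b_1 x_i\) for the fitted line (notation introduced here for convenience):

\[\sum_{i=1}^{n} (y_i-\bar{y})^2 = \sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} (\hat{y}_i -\bar{y})^2 + 2\sum_{i=1}^{n} e_i(\hat{y}_i -\bar{y})\]

The least-squares normal equations give \(\sum_{i=1}^{n} e_i = 0\) and \(\sum_{i=1}^{n} e_i x_i = 0\), so

\[\sum_{i=1}^{n} e_i(\hat{y}_i -\bar{y}) = b_0\sum_{i=1}^{n} e_i + b_1\sum_{i=1}^{n} e_i x_i - \bar{y}\sum_{i=1}^{n} e_i = 0\]

and only SSE and SSR remain. This argument relies on the model containing an intercept.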

1.10 Definition of \(R^2\)

\(R^2\) is defined as the share of the total variation in \(y\) (SST) that the regression explains (SSR):

\[R^2=\frac{SSR}{SST} = \frac{\sum_{i=1}^{n} (\hat{y}_i -\bar{y})^2}{\sum_{i=1}^{n} (y_i-\bar{y})^2} = 1-\frac{SSE}{SST}\]
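
As a minimal sketch of these definitions, the code below computes SST, SSE, and SSR on made-up data (`x` and `y` are hypothetical), checks the decomposition, and compares the resulting \(R^2\) with the value reported by `summary()`. The last lines use the identity \(R^2 = F/(F + df)\), which follows from \(F = SSR/(SSE/df)\) in simple regression, to check the printed output for the gasoline model; since the printed F-statistic is rounded, the match is approximate.

```r
# Hypothetical toy data (not the gasoline data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
y_hat <- fitted(fit)

SST <- sum((y - mean(y))^2)      # total sum of squares
SSE <- sum((y - y_hat)^2)        # error (residual) sum of squares
SSR <- sum((y_hat - mean(y))^2)  # regression sum of squares

r2_by_hand <- SSR / SST
print(c(SST = SST, SSE_plus_SSR = SSE + SSR))
print(c(by_hand = r2_by_hand, one_minus = 1 - SSE / SST, lm = summary(fit)$r.squared))

# Check against the printed output: F = 2740 on 1 and 237 degrees of freedom
r2_from_F <- 2740 / (2740 + 237)
print(round(r2_from_F, 4))
```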

2 Summary

Social science:

Huge variation in human behavior; very hard to get \(R^2\) values much above, say, 25% or 30%.

Engineering:

More exact systems; higher \(R^2\).