Macroeconomic Forecasting Lecture
Each observed series is one draw (a realization) from a random process.
Just to remind you….
The first step is visual inspection: graph and observe your data.
“You can observe a lot just by watching” Yogi Berra
Differencing can remove the trend.
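A minimal sketch of the idea, assuming a simulated random walk with drift (the seed, sample size and drift value are illustrative choices, not taken from the lecture data): the level trends upward, while the first difference fluctuates around the drift.

```r
# Sketch: first differences of a simulated random walk with drift.
set.seed(1)                          # illustrative seed, for reproducibility
n     <- 200                         # assumed sample size
drift <- 0.5                         # assumed drift per period
y     <- cumsum(drift + rnorm(n))    # y_t = y_{t-1} + drift + e_t (trending level)
dy    <- diff(y)                     # dy_t = y_t - y_{t-1} = drift + e_t

# The level keeps rising; the differences have a stable mean near the drift.
c(first_level = y[1], last_level = y[n], mean_of_difference = mean(dy))
```

Plotting `y` and `dy` with `plot.ts()` gives the visual counterpart of the same comparison.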
Assuming that the process is stationary, there are three basic types that interest us:
Autoregressive: AR(p) process: \(y_t=a+b_1y_{t-1}+b_2y_{t-2}+...+b_py_{t-p}+\epsilon_t\)
Moving Average: MA(q) process: \(y_t=\mu+\phi_1\epsilon_{t-1}+\phi_2\epsilon_{t-2}+...+\phi_q\epsilon_{t-q}+\epsilon_t\)
Combined: ARMA(p,q) process: \(y_t=a+b_1y_{t-1}+b_2y_{t-2}+...+b_py_{t-p}+\phi_1\epsilon_{t-1}+\phi_2\epsilon_{t-2}+...+\phi_q\epsilon_{t-q}+\epsilon_t\)
Some notation: AR(p), MA(q), ARMA(p,q), where p,q refer to the order (maximum lag) of the process
\(\epsilon_t\) is a white noise disturbance:
\(E(\epsilon_t)=0\), \(Var(\epsilon_t)=\sigma^2\), \(Cov(\epsilon_t,\epsilon_s)=0\) if \(t\neq s\)
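As a quick check of these three properties, here is a sketch that simulates a disturbance and looks at its sample moments. It assumes Gaussian draws with \(\sigma=2\); white noise does not require normality, so this is only one convenient case.

```r
# Sketch: sample moments of a simulated white noise disturbance.
set.seed(42)                         # illustrative seed
sigma <- 2                           # assumed standard deviation
eps   <- rnorm(5000, mean = 0, sd = sigma)

eps_mean <- mean(eps)                # should be close to 0
eps_var  <- var(eps)                 # should be close to sigma^2 = 4
# Sample autocorrelations at lags 1 to 5 (lag 0 dropped); should be close to 0
eps_acf  <- acf(eps, lag.max = 5, plot = FALSE)$acf[-1]

round(c(mean = eps_mean, variance = eps_var, max_abs_acf = max(abs(eps_acf))), 3)
```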
Where are we? Where are we going?
Stationary process (visual inspection) y
Learned about possible processes for y
Need to identify which one in order to understand, then eventually forecast y
Tools to help identify the process
Autocovariance \(\gamma_j=Cov(y_t,y_{t+j})\) and autocorrelation \(\rho_j=\gamma_j/\gamma_0\) measure the relations between observations at different lags.
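A sketch of the sample counterparts, assuming the usual estimators \(\hat{\gamma}_j=\frac{1}{T}\sum_{t=1}^{T-j}(y_t-\bar{y})(y_{t+j}-\bar{y})\) and \(\hat{\rho}_j=\hat{\gamma}_j/\hat{\gamma}_0\). The series is a simulated AR(1) used purely for illustration; the hand-rolled values are compared with R's built-in `acf()`.

```r
# Sketch: sample autocovariance and autocorrelation computed by hand.
set.seed(7)                                        # illustrative seed
y     <- as.numeric(arima.sim(model = list(ar = 0.7), n = 300))
T_obs <- length(y)
y_bar <- mean(y)

# gamma_hat_j: average cross-product of deviations j periods apart (divisor T)
sample_autocov <- function(j) {
  sum((y[1:(T_obs - j)] - y_bar) * (y[(j + 1):T_obs] - y_bar)) / T_obs
}

gamma_hat <- sapply(0:5, sample_autocov)           # lags 0 to 5
rho_hat   <- gamma_hat / gamma_hat[1]              # rho_hat_j = gamma_hat_j / gamma_hat_0

# Built-in estimate for comparison (first element is lag 0)
builtin <- as.numeric(acf(y, lag.max = 5, plot = FALSE)$acf)
round(rbind(by_hand = rho_hat, builtin = builtin), 4)
```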
We now have tools (ACF, PACF) to help us identify the stochastic process underlying the time series we are observing. Now we will:
- Summarize the basic patterns to look for
- Observe an actual data series and make an initial guess
ACFs that do not go to zero could be a sign of nonstationarity
The ACF of both AR and ARMA processes decays gradually; for an MA(q) process it drops to 0 after lag q
The PACF of both ARMA and MA processes decays gradually; for an AR(p) process it drops to 0 after lag p
Possible approach: begin with parsimonious low order AR, check residuals to decide on possible MA terms.
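The patterns above can be seen on simulated data. This sketch assumes an AR(1) with coefficient 0.8 and an MA(1) with coefficient 0.6, and a long sample (n = 2000, an illustrative choice) so that the sample ACF and PACF sit close to their theoretical shapes.

```r
# Sketch: sample ACF/PACF patterns for a simulated AR(1) and MA(1).
set.seed(123)                                   # illustrative seed
n   <- 2000                                     # long sample to reduce noise
ar1 <- arima.sim(model = list(ar = 0.8), n = n)
ma1 <- arima.sim(model = list(ma = 0.6), n = n)

# ACF values at lags 1 to 6 (lag 0 dropped); pacf() already starts at lag 1
acf_ar  <- acf(ar1,  lag.max = 6, plot = FALSE)$acf[-1]
pacf_ar <- as.numeric(pacf(ar1, lag.max = 6, plot = FALSE)$acf)
acf_ma  <- acf(ma1,  lag.max = 6, plot = FALSE)$acf[-1]
pacf_ma <- as.numeric(pacf(ma1, lag.max = 6, plot = FALSE)$acf)

patterns <- rbind(acf_ar, pacf_ar, acf_ma, pacf_ma)
colnames(patterns) <- paste0("lag", 1:6)
round(patterns, 2)
```

Reading the rows: `acf_ar` should decay gradually while `pacf_ar` is near zero after lag 1, and `acf_ma` should be near zero after lag 1 while `pacf_ma` dies out more slowly.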
When looking at ACF, PACF
Aim to eliminate autocorrelation in the residuals (could indicate that model does not reflect the lag structure well)
Plot the “standardized residuals” (\(\epsilon_t/\sigma\)). No more than about 5% of them should lie outside [-2,+2] over all periods
Look at \(r_s\), \(b_s\) (and their significance) at different lags
Box-Pierce statistic: joint significance test up to lag \(s\):
\(Q=T \sum_{k=1}^{s} r_k^2\)
\(H_0\) : all \(r_k=0\), \(H_1\): at least one \(r_k\neq0\)
\(Q\) is distributed as \(\chi^2(s)\) under \(H_0\)
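A sketch of these residual checks, assuming a simulated AR(1) series and a fitted AR(1) model (both illustrative). It computes the share of standardized residuals outside [-2,+2], then \(Q\) by hand from the formula above, and compares it with `Box.test()` from base R.

```r
# Sketch: residual diagnostics for a fitted AR(1) on simulated data.
set.seed(123)                                    # illustrative seed
y   <- arima.sim(model = list(ar = 0.8), n = 200)
fit <- arima(y, order = c(1, 0, 0))
res <- residuals(fit)

# Standardized residuals and the share lying outside [-2, +2]
std_res       <- res / sqrt(fit$sigma2)
share_outside <- mean(abs(std_res) > 2)

# Box-Pierce statistic by hand: Q = T * sum of squared residual autocorrelations
s       <- 10                                    # assumed maximum lag
T_obs   <- length(res)
r       <- acf(res, lag.max = s, plot = FALSE)$acf[-1]   # lags 1..s
Q       <- T_obs * sum(r^2)
p_value <- pchisq(Q, df = s, lower.tail = FALSE) # chi-square(s) under H0

# Built-in version for comparison
bp <- Box.test(res, lag = s, type = "Box-Pierce")
c(share_outside = share_outside, Q_by_hand = Q,
  Q_builtin = unname(bp$statistic), p_value = p_value)
```

When the residuals come from a fitted ARMA(p,q) model, a common adjustment is to use \(s-p-q\) degrees of freedom; `Box.test()` supports this through its `fitdf` argument.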
Can assess how well the model forecasts “out of sample”:
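One way to do this is to hold back the last observations, fit on the rest, and compare forecasts with what actually happened. The sketch below assumes a simulated AR(1) series, a 20-period holdout, and RMSE as the accuracy measure; the sample mean of the training data serves as a simple benchmark forecast. All of these are illustrative choices.

```r
# Sketch: out-of-sample forecast evaluation with a holdout sample.
set.seed(123)                                    # illustrative seed
y       <- as.numeric(arima.sim(model = list(ar = 0.8), n = 220))
n_train <- 200                                   # assumed estimation sample
train   <- y[1:n_train]
holdout <- y[(n_train + 1):length(y)]
h       <- length(holdout)                       # forecast horizon (20 periods)

# Fit on the training sample only, then forecast h steps ahead
fit <- arima(train, order = c(1, 0, 0))
fc  <- as.numeric(predict(fit, n.ahead = h)$pred)

# Root mean squared error of the model versus a constant-mean benchmark
rmse_model     <- sqrt(mean((holdout - fc)^2))
rmse_benchmark <- sqrt(mean((holdout - mean(train))^2))
c(model = rmse_model, benchmark = rmse_benchmark)
```

These are multi-step forecasts from a single origin; re-estimating and forecasting one step ahead at each date is an alternative design.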
How to approach:
Own priors/suspicion: Chow test for parameter change
If priors not strong, recursive estimation, tests for parameter stability over the sample, for example, CUSUM
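A sketch of the Chow test computed by hand, assuming an AR(1) regression and a known suspected break date. The simulated series, the break at period 100 and the coefficient shift from 0.2 to 0.8 are all hypothetical. The F-statistic compares the pooled regression with separate regressions on the two subsamples.

```r
# Sketch: Chow test for a parameter change at a known (assumed) break date.
set.seed(123)                                    # illustrative seed
n        <- 200
break_at <- 100                                  # hypothetical suspected break date
e        <- rnorm(n)
y        <- numeric(n)
for (i in 2:n) {
  b    <- if (i <= break_at) 0.2 else 0.8        # AR coefficient shifts at the break
  y[i] <- b * y[i - 1] + e[i]
}

d   <- data.frame(y = y[-1], y_lag = y[-n], period = 2:n)
rss <- function(model) sum(residuals(model)^2)   # residual sum of squares

pooled <- lm(y ~ y_lag, data = d)                            # restricted: same parameters
first  <- lm(y ~ y_lag, data = d[d$period <= break_at, ])    # before the break
second <- lm(y ~ y_lag, data = d[d$period >  break_at, ])    # after the break

k      <- 2                                      # parameters per regression
N      <- nrow(d)
rss_u  <- rss(first) + rss(second)               # unrestricted RSS
F_stat <- ((rss(pooled) - rss_u) / k) / (rss_u / (N - 2 * k))
p_value <- pf(F_stat, df1 = k, df2 = N - 2 * k, lower.tail = FALSE)
c(F = F_stat, p_value = p_value)
```

A small p-value is evidence against stable parameters across the two subsamples.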
Let’s simulate MA(1) and AR(1) series in R
library(ggplot2) # Needed for the plots below
# Set the parameters
mu <- 0 # Mean of the series
theta <- 0.6 # Moving average coefficient
n <- 200 # Number of time points
# Simulate data from the MA(1) model; mu is added afterwards so that it is the series mean
set.seed(123) # For reproducibility
ma1_data <- mu + arima.sim(model = list(ma = theta), n = n)
# Create a data frame with time series data (as.numeric drops the ts class)
time_series_data <- data.frame(Time = 1:n, Value = as.numeric(ma1_data))
# Create a ggplot2 line plot
ggplot(time_series_data, aes(x = Time, y = Value)) +
geom_line(linewidth = 1) + labs(title = "MA(1) Model", x = "Time", y = "Value")
# Set the parameters
phi <- 0.8 # Autoregressive coefficient
n <- 200 # Number of time points
# Simulate data from the AR(1) model
set.seed(123) # For reproducibility
ar1_data <- arima.sim(model = list(ar = phi), n = n)
# Create a data frame with time series data (as.numeric drops the ts class)
time_series_data <- data.frame(Time = 1:n, Value = as.numeric(ar1_data))
# Create a ggplot2 line plot
ggplot(time_series_data, aes(x = Time, y = Value)) +
geom_line(linewidth = 1) + labs(title = "AR(1) Model", x = "Time", y = "Value")
Now let’s work with real-world data
Rows: 733
Columns: 8
$ dateid01 <date> 1954-02-01, 1954-03-01, 1954-04-01, 1954-05-01, 1954-06-01, …
$ dateid <dttm> 1954-02-28 23:59:59, 1954-03-31 23:59:59, 1954-04-30 23:59:5…
$ date <date> 1954-02-26, 1954-03-31, 1954-04-30, 1954-05-31, 1954-06-30, …
$ pe_aus <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ pe_ind <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ pe_ndo <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ pe_saf <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ pe_usa <dbl> 9.9200, 10.1700, 10.5700, 11.0000, 11.0800, 12.1740, 11.7600,…
Use the sample ACF (SACF) and sample PACF (SPACF) to choose candidate model(s), then run auto.arima and see which of the two wins: your judged model or the auto.arima choice.
The ACF and PACF patterns indicate that the series is nonstationary, so we compute the ACF and PACF of the first difference of the series.
From these two graphs, the model appears to be ARIMA(0,1,1).
Series: pe_us$pe_usa
          ar1     ma1
      -0.7554  0.8268
s.e.   0.1106  0.0944
sigma^2 = 0.7151: log likelihood = -914.95
AIC=1835.91 AICc=1835.94 BIC=1849.69
So our judged model was ARIMA(0,1,1), while auto.arima selects ARIMA(1,1,1): not bad.
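To put a judged model and an alternative side by side, one option is to fit both and compare information criteria. The sketch below uses base R `arima()` on a simulated stand-in series, because the `pe_us` data are not reproduced here. The simulated ARIMA(0,1,1) process and its coefficient are assumptions, and the numbers it produces say nothing about the actual P/E series.

```r
# Sketch: comparing two candidate ARIMA models by AIC and BIC.
set.seed(123)                                    # illustrative seed
# Stand-in series (NOT the pe_usa data): simulated ARIMA(0,1,1)
y <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.3), n = 500)

fit_011 <- arima(y, order = c(0, 1, 1), method = "ML")   # judged model
fit_111 <- arima(y, order = c(1, 1, 1), method = "ML")   # alternative

comparison <- data.frame(
  model = c("ARIMA(0,1,1)", "ARIMA(1,1,1)"),
  AIC   = c(AIC(fit_011), AIC(fit_111)),
  BIC   = c(BIC(fit_011), BIC(fit_111))
)
comparison                                       # lower values are preferred

# With the forecast package installed, the automatic search would be:
# forecast::auto.arima(y)
```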
Recall from earlier part on stationarity:
\(Cov(Y_t,Y_{t+j})= Cov(Y_s,Y_{s+j})= \gamma_j\)
Thus, if any of these conditions does not hold, we say that y is nonstationary:
There is no long-run mean to which the series returns (economic concept of long-term equilibrium)
The variance is time-dependent; for example, it could go to infinity as the number of observations goes to infinity
Theoretical autocorrelations do not decay; sample autocorrelations decay only very slowly.
Nonstationary series can have a trend:
\(y_t=\mu+\beta t+u_t\) , where \(u_t\) is “iid”
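A sketch of this trend model, assuming \(\mu=1\), \(\beta=0.05\) and standard normal \(u_t\) (all illustrative values): regressing \(y_t\) on \(t\) recovers the trend, and the residuals are the detrended series.

```r
# Sketch: simulating and removing a deterministic linear trend.
set.seed(123)                                    # illustrative seed
n          <- 200
mu         <- 1                                  # assumed intercept
beta       <- 0.05                               # assumed trend slope
time_index <- 1:n
u          <- rnorm(n)                           # iid disturbance
y          <- mu + beta * time_index + u

# Estimate the trend by OLS and keep the residuals as the detrended series
trend_fit <- lm(y ~ time_index)
beta_hat  <- unname(coef(trend_fit)["time_index"])
detrended <- residuals(trend_fit)

c(beta_hat = beta_hat, mean_detrended = mean(detrended))
```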