Having selected a model and fitted its parameters to a given time series, the model can then be used to estimate new values of the time series. If such values are estimated for a time period following the final data point $X_T$ of the given time series, we speak of a prediction or forecast. The estimation of values lying between given data points is called interpolation. The question now arises as to how a model such as those given in Equations 31.6 or 31.13 could be used to obtain an "optimal" estimate. To answer this question, the forecasting error $X_{T+k} - \hat{X}_{T+k}$, $k \in \mathbb{N}$, between the estimated values $\hat{X}_{T+k}$ and the actually observed time series values $X_{T+k}$ can be used if the last value used in the calibration of the model was $X_T$. The best forecast is the one that minimizes the mean square error (MSE for short). The MSE is defined as the expectation of the squared forecasting error
$$\mathrm{MSE} := \mathrm{E}\left[\left(X_{T+k} - \hat{X}_{T+k}\right)^2\right] \qquad (32.1)$$
This expression is the mathematical formulation of the intuitive concept of the "distance" between the estimated and the actual values, which is to be minimized "on average" (more cannot be expected when dealing with random variables). Minimizing this mean square error yields the result that the best forecasting estimate (called the optimal forecast) is given by the conditional expectation
$$\hat{X}_{T+k} = \mathrm{E}\left[X_{T+k} \mid X_T, \ldots, X_2, X_1\right] \qquad (32.2)$$
This is the expectation of $X_{T+k}$, conditional on all available information about the time series up to and including $T$.
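The optimality of the conditional expectation in Equation 32.2 can be illustrated numerically. The following sketch (not from the text; all parameter values are assumptions chosen for illustration) simulates many paths of a simple AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$, for which the conditional expectation of $X_{T+k}$ given the history up to $T$ is $\phi^k X_T$. It then compares the sample MSE of this optimal forecast with that of a naive forecast that simply repeats the last observed value $X_T$:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma = 0.8, 1.0    # assumed AR(1) coefficient and noise standard deviation
T, k = 200, 3            # length of observed series, forecast horizon
n_paths = 20_000         # number of independent simulated paths

# Simulate n_paths AR(1) paths of length T + k, started at 0
# (after T = 200 steps the process is effectively stationary).
x = np.zeros((n_paths, T + k))
for t in range(1, T + k):
    x[:, t] = phi * x[:, t - 1] + sigma * rng.standard_normal(n_paths)

x_T = x[:, T - 1]        # last value available at calibration time T
actual = x[:, T - 1 + k] # realized value X_{T+k}

forecast_opt = phi**k * x_T   # conditional expectation E[X_{T+k} | X_T, ...]
forecast_naive = x_T          # naive forecast: repeat the last observation

mse_opt = np.mean((actual - forecast_opt) ** 2)
mse_naive = np.mean((actual - forecast_naive) ** 2)

print(f"MSE optimal forecast: {mse_opt:.3f}")
print(f"MSE naive forecast:   {mse_naive:.3f}")
```

For the AR(1) process the k-step forecast error of the conditional expectation is a sum of $k$ future noise terms, so its MSE converges to $\sigma^2(1 + \phi^2 + \dots + \phi^{2(k-1)})$ as the number of paths grows; any other forecast, such as the naive one above, produces a strictly larger sample MSE.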
Hans-Peter Deutsch, Mark W. Beinker