Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!ames!attctc!smunews!ti-csl!m2!oh
From: oh@m2.csc.ti.com (Stephen Oh)
Newsgroups: comp.dsp
Subject: Re: Just one more AR/MA/ARMA/Marple-type question
Keywords: AR MA ARMA Sir Lawrence Marple confusion p ip N << >>
Message-ID: <104815@ti-csl.csc.ti.com>
Date: 8 Jan 90 15:16:10 GMT
References: <1819@mrsvr.UUCP> <1420@umigw.MIAMI.EDU>
Sender: news@ti-csl.csc.ti.com
Organization: TI Computer Science Center, Dallas
Lines: 102

In article <1420@umigw.MIAMI.EDU> mariano@umigw.MIAMI.EDU (Arthur Mariano) writes:
>In article <1819@mrsvr.UUCP>, kohli@gemed (Jim Kohli) writes:
>...
>> When I learned about autocorrelations, the autocorrelation
>> vector was either considered to be the same dimension as the
>> dataset (i.e., p=N), or it was padded with zeroes (p>N).
>
>Dear Jim, This is wrong. Real data is noisy. Thus, your estimated
>correlations do not equal the true correlations. A good correlation estimate
>uses the same data in the numerator as the denominator, viz.
>C(k)=sum x(s)*x(s+k)/(sum sqrt(x(s)*x(s+k))), where x is the detrended
>(very important) data, k is the lag, * is multiplication and the sum is
>over all possible s. A rule of thumb is never calculate your correlation
>function for lags greater than 1/4 the data length. The rationale behind
>this is that for large lags, very few (relative to zero and small lags)
>data points go into the products needed for C(k), e.g. for lags equal to
>N-1, only one product can be calculated. Thus large lags have high
>estimation error that will corrupt fits to or transforms of your
>ESTIMATED correlation function. So keep p small to get best results.
>Cheers, Arthur
>-- 
>Arthur Mariano                     Inet: mariano@umigw.miami.edu [128.116.10.1]
>SPAN: miami::arthur (host 3074::)      arthur%miami.span@star.stanford.edu
>UUCP: ...!ncar!umigw!mariano               arthur%miami.span@vlsi.jpl.nasa.gov

Yea, I agree with Arthur.
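
To make Arthur's rule of thumb concrete, here is a minimal Python sketch of a
lag-limited correlation estimate. It is only an illustration under stated
assumptions: the function name autocorr is made up, "detrending" is reduced
to removing the mean, and the normalization is the usual one (divide by the
lag-0 sum), not the exact expression written in the quoted article.

```python
def autocorr(x, max_lag=None):
    """Sample autocorrelation of x for lags 0..max_lag (normalized by lag 0)."""
    n = len(x)
    if max_lag is None:
        max_lag = n // 4              # rule of thumb: stop at 1/4 of the data length
    mean = sum(x) / n
    d = [v - mean for v in x]         # assumption: mean removal stands in for detrending
    c0 = sum(v * v for v in d)
    if c0 == 0.0:
        raise ValueError("constant data: correlation is undefined")
    # Lag k has only n - k products available, so large lags are poorly estimated.
    return [sum(d[s] * d[s + k] for s in range(n - k)) / c0
            for k in range(max_lag + 1)]


r = autocorr([1, 2, 3, 4, 5, 6, 7, 8])   # lags 0, 1, 2 only (8 // 4 = 2)
```

At lag N-1 the inner sum would contain a single product, which is exactly why
the large lags are left out.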

For AR/ARMA models, it is not always true that a higher-order AR model is
better than a lower-order one.

Principle of Parsimony (in estimation sense):
	The larger the number of unknown parameters to be estimated for the
	same number of measurements,
	the lower is the accuracy of the estimate.

But also note that a lower-order model is not necessarily better than a
higher-order one, either. There is no solid theoretical ground for it, but it
is known that as long as the order (ip) of the AR/ARMA model is no greater
than 1/5 or 1/4 of the total number of observations (N), the higher-order
AR/ARMA model is generally the better one. I say "generally" because I cannot
promise that this always holds. In fact, for AR models fitted with Burg's,
the Yule-Walker, or the Modified Covariance method, the estimated variance of
the residual process gets smaller as the order increases. That does not mean
that the AR coefficient estimates are more accurate, though, because of the
principle of parsimony.
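
As a small illustration of the shrinking residual variance, here is a sketch
of the Levinson-Durbin recursion for the Yule-Walker method only (not Burg or
Modified Covariance). The function name residual_variances is made up, and
the input is assumed to be a valid autocovariance sequence r[0], r[1], ...
with r[0] > 0.

```python
def residual_variances(r, max_order):
    """Levinson-Durbin recursion on autocovariances r[0..max_order].

    Returns [p_0, p_1, ..., p_max_order], where p_k is the residual
    (prediction-error) variance of the order-k Yule-Walker AR fit.
    """
    a = []                      # a[j] holds AR coefficient a_(j+1) of the current order
    var = [r[0]]                # order 0: no prediction, variance is r[0]
    for k in range(1, max_order + 1):
        acc = r[k] + sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        refl = -acc / var[-1]   # reflection coefficient of order k
        a = [a[j] + refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        # Each step multiplies by (1 - refl^2) <= 1, so the variance never grows.
        var.append(var[-1] * (1.0 - refl * refl))
    return var


# Exact autocovariances of an AR(1) process with coefficient 0.5 (scaled so r[0] = 1).
variances = residual_variances([1.0, 0.5, 0.25, 0.125], 3)
# variances == [1.0, 0.75, 0.75, 0.75]
```

The variance drops at order 1 and then stays flat: the extra coefficients buy
nothing here, yet the variance estimate alone would never tell you to stop.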
Yea, Yea, I know it is confusing, but like I said, the rule of thumb for
determining the order of an AR model is as follows:

1. Let the maximum order be ip = (20 or 25 %) * N.
2. Fit AR models of orders up to ip using any of the three methods:
   Burg, Y-W, M Cov.
3. Compute information statistics such as AIC, MDL, CAT, etc. (see appendix).
4. Determine the order of the AR model based on step 3: choose the order
   that minimizes the information criterion value.
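
Steps 3 and 4 can be sketched as below. This is only an illustration: the
names aic and select_order are made up, the residual variances are assumed to
come from whichever fitting method was used in step 2, and the numbers in the
example are invented, not measured.

```python
import math


def aic(var_k, k, n):
    """AIC(k) = N ln(p_k) + 2k for residual variance var_k at order k."""
    return n * math.log(var_k) + 2 * k


def select_order(variances, n, criterion=aic):
    """Pick the order that minimizes the criterion.

    variances[k - 1] is the residual-variance estimate of the order-k fit,
    for k = 1 .. len(variances); n is the number of observations.
    """
    scores = [criterion(v, k, n) for k, v in enumerate(variances, start=1)]
    best = min(range(len(scores)), key=scores.__getitem__) + 1
    return best, scores


n = 100
max_order = n // 4    # step 1: ip = 25% of N (only the first 6 orders shown below)
# Invented residual variances for orders 1..6, for illustration only.
made_up = [2.0, 1.2, 1.0, 0.99, 0.985, 0.98]
best, scores = select_order(made_up, n)
# best == 3: the variance keeps falling after order 3, but the 2k penalty wins
```

Any other criterion with the same (variance, order, N) arguments can be
passed in place of aic.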

Comments? Questions? Post it!!


---------------------  Appendix -------------------------------------

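Information Statistics: for the background, see pp. 229-231 of Marple's book,
or pp. 234-237 of Kay's book. I am listing the six best-known information
criteria here. In each of them, N is the number of observations, k is the
candidate order, and p with a hat and subscript k is the estimated variance
of the residual process for order k: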

1. FPE (Final Prediction Error)

            FPE(k) = \hat{p}_k ( N + k + 1 ) / ( N - k - 1 )

2. AIC (Akaike's Information Criterion)

            AIC(k) = N ln( \hat{p}_k ) + 2k

3. MDL (Minimum Description Length)

            MDL(k) = N ln( \hat{p}_k ) + k ln(N)

4. CAT (Criterion Autoregressive Transfer)

            CAT(k) = (1/N) \sum_{j=1}^{k} 1/\bar{p}_j  -  1/\bar{p}_k

5. KIC (Kashyap's Information Criterion)

            KIC(k) = ( N - k ) ln( \hat{p}_k ) + (k+1) ln( N / (2*pi) )

6. HIC (Hannan's Information Criterion)

            HIC(k) = N ln( \hat{p}_k ) + k ln ln N
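
For anyone who wants to try these, the six formulas can be written out as
plain Python functions. This is a sketch that follows the formulas exactly as
listed; the function names and argument order are made up. For CAT the
formula uses a barred p that is not defined in this post, so the sketch takes
the barred values as input rather than guess how they are computed.

```python
import math

# p is the residual-variance estimate for order k; n is the number of observations.


def fpe(p, k, n):
    return p * (n + k + 1) / (n - k - 1)


def aic(p, k, n):
    return n * math.log(p) + 2 * k


def mdl(p, k, n):
    return n * math.log(p) + k * math.log(n)


def cat(p_bar, k, n):
    """p_bar[j - 1] is the barred variance for order j, j = 1 .. k (supplied by caller)."""
    return sum(1.0 / p_bar[j] for j in range(k)) / n - 1.0 / p_bar[k - 1]


def kic(p, k, n):
    return (n - k) * math.log(p) + (k + 1) * math.log(n / (2 * math.pi))


def hic(p, k, n):
    return n * math.log(p) + k * math.log(math.log(n))
```

In every case the order to choose is the k that gives the smallest value.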

+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  Stephen Oh         oh@csc.ti.com     |  Texas Instruments     |
|  Speech and Image Understanding Lab.  | Computer Science Center|
+----+----+----+----+----+----+----+----+----+----+----+----+----+