Newsgroups: comp.lang.apl
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!geac!itcyyz!yrloc!hui
From: hui@yrloc.ipsa.reuter.COM (Roger Hui)
Subject: Statistical Functions in J
Message-ID: <1991May12.145907.19563@yrloc.ipsa.reuter.COM>
Reply-To: hui@yrloc.ipsa.reuter.COM (Roger Hui)
Organization: Iverson Software Inc.
Date: Sun, 12 May 91 14:59:07 GMT

The definitions of mean (m.), normalize (n.), and spread (s.) have
changed in response to a detailed critique by Professor Fraser Jackson
of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson).
Professor Jackson's comments are as follows (minor editing by me):
 
--------------------------------------------------------------------
 
The concept of expectation is central to both probability and
statistics.  Indeed it is quite possible to axiomatise
probability theory in terms of expectations rather than
probabilities.  One didactic advantage of doing so is that the
whole apparatus of measure theory essential for an
axiomatisation based on probabilities can be introduced much
later.  From the classical viewpoint, P. Whittle, Probability,
Penguin, 1970?  (To appear in a revised edition by Springer
Verlag) gives an illustration of this alternative approach in
a text designed to cover the subject from first principles.
He illustrates numerous results from quantum theory to
investment analysis where the expectation is the natural
function to use.  From the subjective probability perspective,
De Finneti's classic published by Wiley also provides an
axiomatisation based on expectation.
 
m.
 
I believe therefore that the function  m.  should be the
expectation.   This would give
 
     m.  y        as the usual mean
     x  m.  y     as the mean formed with weights given
                  by the vector  x%+/x
 
Hence the monadic case would just be  1 m. y
 
This definitions permits a very simple treatment of something
which appears at the beginning of virtually all statistical
texts - frequency distributions.
 
For a frequency distribution with frequencies f and values y
this gives  (in expressions which I am sure can be written
more simply)
 
     the mean as   f m. y
     deviations from the mean   y  -  f m. y
     the mean squared deviation   f m. *: y - f m. y
 
In doing calculations with density functions the weighted mean
is always required to find statistics of the distribution, and
if there is any transformation of the variable then that also
requires use of a weighted mean calculation to find parameters of
the transformed distribution.  Since variable transformation is
often required the function proves very useful in numerical studies.
 
The expectation of a function g of y under a density  p y
becomes just  (p y) m. g y  or better  (p m. g) y
 
Using an obvious notation, the two commonest price index
numbers become
 
     laspeyres      (p0*q0) m. p1%p0
     paasche        (p0*q1) m. p1%p0
 
In the Dictionary, the concept of using the left argument for
the degrees of freedom is introduced.  The degrees of freedom
are generally only important in second moment calculations
with unweighted data.  It seems to me that is giving a special
case unwarranted importance.  The degrees of freedom are
simply corrected for using a multiplier.  In the simplest case
it is just  (#%(_1&+&#)) y  , but a whole range of other cases
are equally simply dealt with.
 
I note that Table 1 uses  moment  as the description of the
dyadic case of  m.  If that represents an earlier (or later
idea) it again seems to me inappropriate.  The moments are
highly specialised means and best dealt with using the mean of
some transformation of the variable, e.g. in the monadic case
 
     m. (y - m. y)^4
 
with the dyadic case
 
     f  m. ( y - f m. y)^4
 
n.
 
I would define  n.  to be consistent with  m.  Hence the
monadic case would be with divisor  #y, and the dyadic case
would give normalised deviations from a weighted mean.  Again
if there is a need to adjust degrees of freedom for a
particular case I would prefer to see a multiplier used since
in many cases items have non-integer weights.  Alternatively,
a similar procedure could be adopted as with the fit
conjunction, this time however using a degrees of freedom
conjunction.  That solution could give I believe the features
you want with the added flexibility and power permitted by
generalising to weighted means.  The DF conjunction would then
only be needed in cases where the data were individual
observations and the adjustment was required for a particular
statistic or calculation.
 
The dictionary definition of the dyadic case seems to me to
give an expression almost never used since apart from squared
deviations we seldom use a mean with divisor less than #y.
In the squared deviations case when it is used it is not then
common to normalise the squared values in any calculations I
have had occasion to use.
 
s.
 
I believe a case can be made here for two functions.  The
first would give a measure of spread for the variables
considered individually, and be defined across arbitrary arrays.  
For that I would prefer either the sum of squared deviations or 
the mean squared deviation on the grounds noted below.
 
The second is a function which generates information about the
covariation of the variables in each item of the data object.
If there is to be just one function, I would favour this, in
view of the importance of many areas of multi-variate analysis.
 
The standard deviation is a useful measure but very
specialised to the univariate case.  Further the standard
deviation is not an additive measure like the variance (using
it now with divisor #y) and the sum of squares.  Variances
can be simply combined, using the m. function above.  (For a
population the variance is just the weighted mean of the
variances if the populations consists of subsets with
different variances.)  The standard deviation needs to be
squared before it can be combined.  As soon as you move to
multivariate statistics it is the covariance matrix which is
the basic tool, not the correlation matrix or the variable
standard deviations.  In multivariate analyses it has been my
experience that the speed of formation of a cross product
matrix  X'X  is crucial.  This is so important that it deserves
to be a special function and programmed for speed of execution.
So my primitive function would be to form  (|:y) +/ . * y  in the
monadic case.  A case can be made for using the mean value of
the outer products of the items of y for consistency with the
previous definitions though I do not find that compelling.
The avoidance of centering within the function is deliberate.
Nearly all of the multivariate texts and the works on regression
express the theory in terms of the original observation units.
The case where the data are deviations from the mean then
becomes a special case.  With the  m.  primitive function it is
easily treated as just that.
 
The conventional covariance matrix then is
 
     (#y)%~(|:yd) +/ . * yd =. y - m. y
 
but for the occasions where that is required, if  s.  is the
cross product verb it is simple enough to enter
 
     (#y)%~ s. y - m. y
 
Using the normalised values we obtain the correlation matrix as
 
     (#y)%~ s. n. y
 
The choice of dyadic form is more difficult.  I believe that
consistency with the previous definitions would lead to choice
of the dyadic form  x s. y  as weighting the items in y by
%:x  where x is a vector of weights.  In this way each item in
y is given a weight in the crossproduct matrix equal to the
corresponding item in x.  The weights are not normalised to sum to 1
since in the applications a range of normalisations will be used.
 
This provides a very powerful primitive function, encompassing
all of the weighted regression models, and the powerful tools of
generalised least squares as outlined by McCullough and Nelder in
'Generalised Linear Models' (2nd ed) Chapman and Hall, 1989.
It would also provide a primitive function for many methods based
on weighted observations in Econometrics, in particular methods
due to H. White of constructing consistent estimators in the
common case when there are heteroscedastic errors.  It would also
be a core function in those methods of non-linear optimisation
designed to find the extreme of a likelihood function.
 
An alternative would be to define it as forming the expression
(|:x)+/ . * y  since this is also often required.  This would
be of value if the code for it was much faster than the using
the general expression above.
 
I prefer the definition which preserves consistency with
the concept of weighted values.  It is a more natural
extension consistent with the usage in the definitions of m.
and n. proposed.  By introducing a weighting function it is
signficantly generalising the monadic operation.  Second, and
less important from a language design persepective (but
perhaps as important for potential users), it is rather messy
to program otherwise, and exploiting the special structure of
the function could lead to important speed gains in execution.
 
I recognise that speed of execution has not been a primary
factor in your design, but it is an important consideration,
especially with the data intensive calculations likely in
multi-variate studies.  If the functions selected contribute
both to ease of statement of the problem, and fast execution
then they seem to me to have a strong case for inclusion.
 
Looking at these three functions now as a group, and
endeavouring to put them into Dictionary style --
 
m.
 
MEAN (_)  m. is the mean value, and  m. y  is  1 m. y
MEAN (_ _)  x m. y  is the weighted mean of the values of y
with weights given by x.  The weights will be normalised to
sum to one in forming the weighted mean.
 
(It may be desirable to restrict the dimensionality of the
arguments to (1 _) though the general case is often useful.)
 
n.
 
NORMALIZE (_)  The result of  n. y  is y translated to have a
mean of zero and a mean squared deviation of 1.  It is defined
as  1 n. y  .
 
NORMALIZE (_ _)  x n. y  provides normalised values of y
which when applied with weights x sum to zero and have mean
squared deviation equal to 1.  The weights x will be
normalised to sum to 1.  The result of  x n. y  is therefore
(y-x m. y)% %:x m.*:y-x m. y   Thus  x m. x n. y  is 0
and  x m. *: x n. y  is 1.  In the case where the weights are
frequencies and it is desired to standardise so that variance
is 1 the values must be multiplied by  %:(+/x)%(+/x)-1
 
(Again a restriction on the arguments to (1 _) could be
considered.)
 
s.
 
SPREAD (2)  s. y  is the crossproduct matrix of the values in
y and equal to  (|:y)+/ . * y  .  Consistent with the usage of
the other statistical functions,  s. y  is  1 s. y  .  The
covariance matrix is just  (#y)%~ s. y - m. y   and the
correlation matrix is  (#y)%~ s. n. y
 
SPREAD (1 2)  x s. y  is the crossproduct matrix
s. (%:x)*"0 1 y .  Note that the weights are not normalised
to sum to one in this case.  The dyadic form weights the outer
product of each item of y by the corresponding item in x.
 
----------------------------------------------------------
 
The new definitions have been implemented in J 3.1.  They are
modelled thus:
 
rx  =. ($@$@[ <@}. $@]) $&> [
ry  =. ($@$@] <@}. $@[) $&> ]
m2  =. (rx * ry) %&(+/) rx
m1  =. +/ % #
 
dev =. ry - m.
rms =. [ %:@m. *:@dev
n2  =. dev % rms
n1  =. 1&n2
 
s1  =. |: +/ . * ]
s2  =. s1@(%:@[ *"0 1 ])
 
m.  is   m1 : m2
n.  is   n1 : n2
s.  is   s1 : s2 " 2 1 2

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096