Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site philabs.UUCP
Path: utzoo!linus!philabs!dpb
From: dpb@philabs.UUCP (Paul Benjamin)
Newsgroups: net.sport.baseball
Subject: Re: Re: Re: Re: Re: playoff slugging + onbase avg.
Message-ID: <513@philabs.UUCP>
Date: Fri, 15-Nov-85 13:17:51 EST
Article-I.D.: philabs.513
Posted: Fri Nov 15 13:17:51 1985
Date-Received: Sat, 16-Nov-85 03:44:31 EST
References: <483@philabs.UUCP> <941@water.UUCP> <489@philabs.UUCP>
Distribution: na
Organization: Philips Labs, Briarcliff Manor, NY
Lines: 101

> >> 1940's :  7-3
> >> 1950's :  7-3
> >> 1960's :  5-5
> >> 1970's :  5-5
> >> 80 & 81:  0-2
> >> -------------
> >> 42 years 24-18
> >>
> >42 Series aren't significant?!?! That's over an entire season's worth of
> >games! Perhaps you should look up the definition of statistically
> >significant. If we ignore these stats, we might as well ignore all season
> >stats.
> 
> If you are looking only at who wins the series, you only have 42 cases.  If
> you want the results to reflect the number of games, you have to have the
> statistics by game, not by series.
> 
> Also, the statistics for the 42 series *do* tend to support the importance
> of the statistic.  Not as strongly as I would have expected, but well
> within the normal range of variation.  If the expected number is 30 out
> of 42, the standard deviation is about 2.9.  Thus 24 is not much more than
> two standard deviations away.  About a one in twenty shot.

But 24 is closer to 21 out of 42 (three wins away) than to 30 out of 42 (six
wins away). If you really like statistical arguments, how can you prefer an
expectation of 30/42 to 21/42 unless you were biased to begin with?
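
To put numbers on this, the sketch below treats each Series as an independent
trial (an assumption neither of us has established) and measures how many
standard deviations the observed 24-18 record sits from an expected 30 of 42
and from the coin-flip expectation of 21 of 42. The records come from the
table quoted above; the name z_score is made up for illustration.

```python
import math

# Decade-by-decade Series record from the quoted table: (wins, losses)
records = {
    "1940s": (7, 3),
    "1950s": (7, 3),
    "1960s": (5, 5),
    "1970s": (5, 5),
    "1980-81": (0, 2),
}
wins = sum(w for w, _ in records.values())
losses = sum(l for _, l in records.values())
n = wins + losses  # 42 Series in all

def z_score(k: int, trials: int, p: float) -> float:
    """How many binomial standard deviations k lies from the mean trials*p.
    Assumes independent trials, each won with probability p."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (k - mean) / sd

# Expectation from the quoted article: 30 of 42 (SD about 2.9)
print(f"expect 30/42: z = {z_score(wins, n, 30 / 42):+.2f}")  # z = -2.05
# Coin-flip expectation: 21 of 42 (SD about 3.2)
print(f"expect 21/42: z = {z_score(wins, n, 0.5):+.2f}")  # z = +0.93
```

Under those assumptions, 24 is about two standard deviations below 30 but
less than one above 21.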

> >> >Also
> >> >note the circular nature of your argument #2. You state that the Yankees
> >> >dominated in all statistical departments. This applies only to those stats
> >> >in which the Yankees dominated!
> >> But batting average, slugging average, on base average, earned run average,
> >> and runs scored weren't retrofitted to the data.  These are standard
> >> statistics which are generally applied.  Since the measures are pre-
> >> selected, the argument is not circular.
> >
> >Think again. They dominated only in the stats in which they dominated. Also
> >please note that those "standard" stats are highly redundant - they all
> >are different ways of saying similar things. For example, team runs and the
> >opposing team's ERA are very similar. And note that there are stats in which
> >the Pirates led, such as game-winning RBI.
> >
> >Also realize that these statistical categories were not handed down by
> >God. They arose because they were retrofitted at one time to previous data.
> >Thus, they were never pre-selected. BA and ERA did not exist before
> >baseball!
> 
> They were pre-selected *for that series*.  That is, they were the established
> criteria by which the play in the series would be judged, when it was played.
> Game winning RBI, by contrast, is a retro-fit for that series.  (It also
> bears such a trivial relationship to winning that one can hardly regard it
> as a *predictor* of victory.  Any more than pitcher's win/loss records are.)

You're missing the point. The '60 Yankees dominated in stats which the
papers find easy to compute from boxscores. These stats are highly redundant.
There exist many other stats which could be computed. I am not talking about
game-winning hits. I am referring to things like "BA with men in scoring
position", "BA when your team is losing or tied or 1 run ahead", etc. Stats
like this reduce the impact of blowouts. After all, a HR when your team is
8 runs ahead in the late innings is worth less than a single when the
score is tied. I have always objected, and always will object, to simple-minded
statistics. Your postings show that you understand more than a little about
statistics; you know about standard deviations, etc. So why are you so fond of
a simple sum of averages like SA+OBA? If you were to try to build a
mathematical model of the game, would you include only statistical means,
or would you include more complex statistics? The papers aren't going to
try to compute things like "BA with team behind, tied, or ahead by 1 run"
or a more complicated nonlinear scheme, such as weighting runs by the
probability that the other team will come from behind. Does this mean that
the stats the papers publish are the best?
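
A situational stat of this kind is easy to state precisely, even if it is
tedious to compile by hand. The sketch below computes an ordinary batting
average and the "losing, tied, or 1 run ahead" variant from a list of plate
appearances. The PlateAppearance record and its fields (is_at_bat, is_hit,
margin) are hypothetical; a standard box score does not give the score at the
time of each at-bat, so the input would have to come from play-by-play
accounts.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class PlateAppearance:
    # Hypothetical record layout, for illustration only.
    is_at_bat: bool  # official at-bat (walks, sacrifices, HBP excluded)
    is_hit: bool
    margin: int      # batter's team runs minus opponent's runs when he came up

def batting_average(pas: Iterable[PlateAppearance]) -> float:
    """Hits divided by official at-bats; NaN if there are no at-bats."""
    at_bats = [pa for pa in pas if pa.is_at_bat]
    if not at_bats:
        return float("nan")
    return sum(pa.is_hit for pa in at_bats) / len(at_bats)

def close_game_average(pas: Iterable[PlateAppearance]) -> float:
    """BA when the batter's team is losing, tied, or ahead by one run.
    At-bats with a lead of two or more runs are ignored."""
    return batting_average(pa for pa in pas if pa.margin <= 1)
```

Changing the filter (for example `pa.margin == 0` for tie games) gives the
other splits mentioned above. A scheme that weights runs by comeback
probability would also need a model of those probabilities, which is not
sketched here.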

> >> >The only thing we can say with certainty is that SA+OBA clearly does not
> >> >correlate with winning a short series in the last 20 or so years (since
> >> >artificial turf, night baseball, etc.).
> >> 
> >> The only thing we can say with certainty is that we don't know.
> >
> >No. We DO know that SA+OBA does not correlate with winning a short series
> >in the last 20 years or so, which is EXACTLY what I said.
> 
> But that data is not statistically significant, so we don't know; which is
> EXACTLY what I said.  (By the way, night baseball goes back to the 30's.)

But what you said was in response to my statement that the correlation does
not exist. EXACTLY what I said was "SA+OBA clearly does not correlate with
winning a short series in the last 20 or so years." I did not claim a
negative correlation. I stated that no correlation exists for those
20 years. It doesn't matter whether the data is significant or not! If the
data were not significant, an existing correlation could be put in doubt,
but since the correlation does not exist, there is no evidence to
support SA+OBA from short series results. Again, I have not claimed that
this disproves the importance of SA+OBA; I have only said that it means
there is no evidence to support SA+OBA. That is all I have to show. Those
who wish to proclaim the importance of a stat must provide evidence for it.
In a sense, we are both right, because we are saying different things. I
am saying that there is no evidence for SA+OBA from recent short series
results, and you are saying that there aren't enough data points to provide
evidence either way. Both statements mean that the data shows no correlation
that supports SA+OBA.
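
For the "last 20 or so years" in the quoted table (1960s 5-5, 1970s 5-5,
1980 and 1981 0-2), the SA+OBA leaders went 10-12. The sketch below applies
an exact two-sided binomial test against a fair coin, again assuming the
Series are independent trials. The helper name binomial_two_sided_p is
illustrative, and math.comb needs Python 3.8 or later.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n fair-coin trials.
    With p = 1/2 the distribution is symmetric, so double the smaller tail."""
    tail = min(k, n - k)
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)

# 1960s, 1970s, 1980-81 rows of the quoted table
wins, losses = 5 + 5 + 0, 5 + 5 + 2
p = binomial_two_sided_p(wins, wins + losses)
print(f"{wins}-{losses}, p = {p:.2f}")  # prints 10-12, p = 0.83
```

A p-value of about 0.83 says that a 10-12 record is entirely consistent with
SA+OBA having no effect on short series, and also that this sample alone
cannot count as significant evidence in either direction.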

					Paul Benjamin