Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site philabs.UUCP
Path: utzoo!linus!philabs!dpb
From: dpb@philabs.UUCP (Paul Benjamin)
Newsgroups: net.sport.baseball
Subject: Re: Re: Re: Re: Re: playoff slugging + onbase avg.
Message-ID: <513@philabs.UUCP>
Date: Fri, 15-Nov-85 13:17:51 EST
Article-I.D.: philabs.513
Posted: Fri Nov 15 13:17:51 1985
Date-Received: Sat, 16-Nov-85 03:44:31 EST
References: <483@philabs.UUCP> <941@water.UUCP> <489@philabs.UUCP>
Distribution: na
Organization: Philips Labs, Briarcliff Manor, NY
Lines: 101

> >> 1940's : 7-3
> >> 1950's : 7-3
> >> 1960's : 5-5
> >> 1970's : 5-5
> >> 80 & 81: 0-2
> >> -------------
> >> 42 years 24-18
> >>
> >42 Series aren't significant?!?! That's over an entire season's worth of
> >games! Perhaps you should look up the definition of statistically
> >significant. If we ignore these stats, we might as well ignore all season
> >stats.
>
> If you are looking only at who wins the series, you only have 42 cases. If
> you want the results to reflect the number of games, you have to have the
> statistics by game, not by series.
>
> Also, the statistics for the 42 series *do* tend to support the importance
> of the statistic. Not as strongly as I would have expected, but well
> within the normal range of variation. If the expected number is 30 out
> of 42, the standard deviation is about 2.9. Thus 24 is not much more than
> two standard deviations away. About a one in twenty shot.

But it's closer to 21 out of 42 than to 30 out of 42. If you really like
statistical arguments, how can you prefer an expectation of 30/42 to 21/42
unless you are previously biased?

> >> >Also
> >> >note the circular nature of your argument #2. You state that the Yankees
> >> >dominated in all statistical departments. This applies only to those stats
> >> >in which the Yankees dominated!
> >> But batting average, slugging average, on base average, earned run average,
> >> and runs scored weren't retrofitted to the data. These are standard
> >> statistics which are generally applied. Since the measures are
> >> pre-selected, the argument is not circular.
> >
> >Think again. They dominated only in the stats in which they dominated. Also
> >please note that those "standard" stats are highly redundant - they all
> >are different ways of saying similar things. For example, team runs and the
> >opposing team's ERA are very similar. And note that there are stats in which
> >the Pirates led, such as game-winning RBI.
> >
> >Also realize that these statistical categories were not handed down by
> >God. They arose because they were retrofitted at one time to previous data.
> >Thus, they were never pre-selected. BA and ERA did not exist before
> >baseball!
>
> They were pre-selected *for that series*. That is, they were the established
> criteria by which the play in the series would be judged, when it was played.
> Game winning RBI, by contrast, is a retro-fit for that series. (It also
> bears such a trivial relationship to winning that one can hardly regard it
> as a *predictor* of victory. Any more than pitcher's win/loss records are.)

You're missing the point. The '60 Yankees dominated in stats which the
papers find easy to compute from box scores. These stats are highly
redundant. There exist many other stats which could be computed. I am not
talking about game-winning hits. I am referring to things like "BA with men
in scoring position", "BA when your team is losing or tied or 1 run ahead",
etc. Stats like these reduce the impact of blowouts. After all, a HR when
your team is 8 runs ahead in the late innings is worth less than a single
when the score is tied.

I have always objected, and always will object, to simple-minded statistics.
Your postings reveal that you understand more than a little about
statistics - you know about standard deviations, etc.
Why do you like a simple average like SA+OBA so much? If you were to try to
build a mathematical model of the game, would you include only statistical
means, or would you include more complex statistics? The papers aren't going
to try to compute things like "BA with team behind, tied, or ahead by 1 run"
or a more complicated nonlinear scheme, such as weighting runs by the
probability that the other team will come from behind. Does this mean that
the stats the papers publish are the best?

> >> >The only thing we can say with certainty is that SA+OBA clearly does not
> >> >correlate with winning a short series in the last 20 or so years (since
> >> >artificial turf, night baseball, etc.).
> >>
> >> The only thing we can say with certainty is that we don't know.
> >
> >No. We DO know that SA+OBA does not correlate with winning a short series
> >in the last 20 years or so, which is EXACTLY what I said.
>
> But that data is not statistically significant, so we don't know; which is
> EXACTLY what I said. (By the way, night baseball goes back to the 30's.)

But what you said was in response to my statement that the correlation does
not exist. EXACTLY what I said is "SA+OBA clearly does not correlate with
winning a short series in the last 20 or so years." I did not state a
negative correlation. I stated that the correlation doesn't exist for those
20 years. It doesn't matter whether the data is significant or not! If the
data is insignificant, then an existing correlation could be put in doubt,
but since the correlation does not exist, there is no evidence to support
SA+OBA from short series results. Again, I have not stated that this
disproves the importance of SA+OBA; I have only said that it means that
there is no evidence to support SA+OBA. That is all I have to show. Those
who wish to proclaim the importance of a stat must provide evidence for it.

In a sense, we are both right, because we are saying different things.
I am saying that there is no evidence for SA+OBA from recent short series
results, and you are saying that there aren't enough data points to provide
evidence either way - which still means that there is no correlation, based
upon the data, to support SA+OBA.

Paul Benjamin
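
The arithmetic behind the two expectations argued over above (30 of 42
versus 21 of 42) can be checked with a short sketch. It assumes each Series
is an independent trial with a fixed probability that the statistical leader
wins, which is a modeling assumption and not something established in this
thread. The function names are illustrative only.

```python
import math


def binomial_sd(n, p):
    """Standard deviation of the success count in n independent trials."""
    return math.sqrt(n * p * (1 - p))


def z_score(observed, n, p):
    """Distance of the observed count from n * p, in standard deviations."""
    return (observed - n * p) / binomial_sd(n, p)


def lower_tail(observed, n, p):
    """Exact P(X <= observed) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(observed + 1)
    )


N_SERIES = 42  # Series in the quoted tally
WINS = 24      # Series won by the team leading in the statistic

# Compare the observed 24 wins against both candidate expectations.
for label, p in (("30 of 42 expected", 30 / 42), ("21 of 42 expected", 0.5)):
    sd = binomial_sd(N_SERIES, p)
    z = z_score(WINS, N_SERIES, p)
    tail = lower_tail(WINS, N_SERIES, p)
    print(f"{label}: sd={sd:.2f}  z={z:+.2f}  P(X<={WINS})={tail:.4f}")
```

Under the 30-of-42 expectation this gives a standard deviation of about 2.93,
with 24 wins sitting about 2.05 standard deviations below it. Under the
21-of-42 expectation the standard deviation is about 3.24 and 24 wins sits
about 0.93 standard deviations above it. The lower_tail helper gives the
exact one-sided binomial probability for anyone who prefers that to the
normal approximation.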