Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!brutus.cs.uiuc.edu!apple!usc!bbn!bbn.com!cosell
From: cosell@bbn.com (Bernie Cosell)
Newsgroups: comp.text
Subject: Re: Urban Legends (was Re: Dvorak Keyboard Layout)
Message-ID: <43460@bbn.COM>
Date: 28 Jul 89 17:38:11 GMT
References: <787@dms> <10500004@prisma>
Sender: news@bbn.COM
Reply-To: cosell@BBN.COM (Bernie Cosell)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 31

In article <10500004@prisma> kolstad@prisma writes:
}If one divides the keyboard like this (I copied this keyboard from an
}earlier article and split it as best I could) and ran /usr/dict/words
}through a trivial script:
}
} ...
}
}Now if we count the transitions, we should be able to measure the
}`goodness' of a keyset.  (I'm doing this in real time as I type, and I
}have to think about this for a moment.  For you, it will be appear to
}be an instant cuz you'll get this all at once!)  Let's make a chart:

This is the right kind of analysis, but absolutely the *wrong* way to compute
it.  The problem is that every word appears in /usr/dict/words with equal
probability (that is, once), but the probabilities in normal English are
nothing of the kind [e.g., a scan of /usr/dict/words does not reflect the
fact that 'the' occurs a LOT more often than 'cwm', although both have
contributed the same 'weight' to your stats].

Try rerunning your analysis over English text, instead of a dictionary.  A
reasonable and easy way to do this is to pick a mostly-text newsgroup (one
that doesn't have a lot of "tty graphics" and acronyms and odd words and
such), and run your program over the bodies of the messages in it (e.g.,
talk.politics.misc would be pretty good, but comp.dcom.telecom is all filled
with NXX's and ISDNx and LATAs and such that'll skew the stats).
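
To make the weighting point concrete, here is a rough sketch in Python.  It
is not the script from the quoted article (that one is elided above); the
names hand_changes, alternation_rate and word_counts are made up for
illustration, and LEFT assumes the conventional touch-typing split of a
QWERTY keyboard rather than the split used in that article.  It computes the
same hand-alternation figure twice: once counting each distinct word once,
as a dictionary scan does, and once weighting each word by how often it
occurs in the text.

```python
import re
from collections import Counter

# Assumption: letters typed by the left hand under the conventional
# touch-typing split of QWERTY. Swap in another set to test another layout.
LEFT = set("qwertasdfgzxcvb")


def hand_changes(word, left=LEFT):
    """Count adjacent letter pairs in `word` typed by different hands."""
    hands = [c in left for c in word]
    return sum(a != b for a, b in zip(hands, hands[1:]))


def alternation_rate(weights, left=LEFT):
    """Fraction of within-word letter pairs that switch hands.

    `weights` maps each word to the weight it carries in the total.
    """
    changes = 0
    pairs = 0
    for word, n in weights.items():
        if len(word) < 2:
            continue  # a one-letter word has no letter pairs
        changes += n * hand_changes(word, left)
        pairs += n * (len(word) - 1)
    return changes / pairs if pairs else 0.0


def word_counts(text):
    """Count occurrences of each run of letters in `text`, ignoring case."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Toy sample standing in for real message bodies.
text = "the cat and the dog saw the cwm"
counts = word_counts(text)

# Dictionary-style: every distinct word weighs 1, so 'the' == 'cwm'.
dictionary_style = alternation_rate(dict.fromkeys(counts, 1))

# Usage-style: every word weighs as much as it actually occurs.
usage_style = alternation_rate(counts)

print(f"dictionary-style: {dictionary_style:.4f}")
print(f"usage-style:      {usage_style:.4f}")
```

On this toy sentence the dictionary-style figure is 7/12 (about 0.58) and
the usage-style figure is 11/16 (about 0.69); the whole difference comes
from 'the' being counted three times instead of once.  For a real run,
feed word_counts the concatenated message bodies instead of the sample
string.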

Beyond that, your numbers, while well-intentioned, really aren't very useful
even as a rough basis for comparison, because the underlying distribution of
words from which they gather their statistics is so utterly wrong.

  /Bernie\