Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!brutus.cs.uiuc.edu!apple!usc!bbn!bbn.com!cosell
From: cosell@bbn.com (Bernie Cosell)
Newsgroups: comp.text
Subject: Re: Urban Legends (was Re: Dvorak Keyboard Layout)
Message-ID: <43460@bbn.COM>
Date: 28 Jul 89 17:38:11 GMT
References: <787@dms> <10500004@prisma>
Sender: news@bbn.COM
Reply-To: cosell@BBN.COM (Bernie Cosell)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 31

In article <10500004@prisma> kolstad@prisma writes:
}If one divides the keyboard like this (I copied this keyboard from an
}earlier article and split it as best I could) and ran /usr/dict/words
}through a trivial script:
}
} ...
}
}Now if we count the transitions, we should be able to measure the
}`goodness' of a keyset.  (I'm doing this in real time as I type, and I
}have to think about this for a moment.  For you, it will be appear to
}be an instant cuz you'll get this all at once!)  Let's make a chart:

This is the right kind of analysis, but absolutely the *wrong* way to
compute it.  The problem is that every word appears in /usr/dict/words
with equal probability (that is, once), but the probabilities in normal
English are nothing like that [e.g., a scan of /usr/dict/words will not
reflect the fact that 'the' occurs a LOT more often than 'cwm', although
both have now contributed the same 'weight' to your stats].

Try rerunning your script over English text instead of a dictionary.  A
reasonable and easy way to do this is to pick a mostly-text newsgroup
(one that doesn't have a lot of "tty graphics" and acronyms and odd
words and such), and run your program over the bodies of the messages in
it (e.g., talk.politics.misc would be pretty good, but comp.dcom.telecom
is all filled with NXX's and ISDNx and LATAs and such that'll skew the
stats).

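To make the weighting point concrete, here is a minimal sketch in
Python.  It is not the script from the quoted article: the hand split
(LEFT) is an assumed standard touch-typing split of QWERTY, standing in
for the elided keyboard split, and hand_transitions, words_in and the
sample sentence are made up purely for illustration.

```python
from collections import Counter
import re

# Assumption: standard touch-typing split of the QWERTY letters. The split
# used in the quoted article is elided, so this is only a stand-in.
LEFT = set("qwertasdfgzxcvb")


def hand_transitions(word_counts):
    """Tally same-hand vs. alternating-hand letter pairs.

    word_counts maps each word to its weight, so a word that occurs
    n times contributes n times as much to the tally.
    """
    tally = Counter()
    for word, n in word_counts.items():
        for a, b in zip(word, word[1:]):
            kind = "same" if (a in LEFT) == (b in LEFT) else "alternate"
            tally[kind] += n
    return tally


def words_in(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Made-up sample; in practice this would be the message bodies of a
# mostly-text newsgroup.
sample = "the cat and the dog saw the cwm"
tokens = words_in(sample)

# Dictionary-style: every distinct word counted once, like /usr/dict/words.
dictionary_style = hand_transitions(Counter(set(tokens)))

# Running text: every occurrence counted, so 'the' outweighs 'cwm'.
running_text = hand_transitions(Counter(tokens))

print("dictionary-style:", dict(dictionary_style))
print("running text:    ", dict(running_text))
```

The two tallies differ only in the weights handed to hand_transitions,
which is the whole point: same transition counting, different
underlying word distribution.
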
Beyond that, your numbers, while well intentioned, really aren't very
useful even as a rough comparison vehicle, because the underlying
distribution of words from which they gather their statistics is so
utterly wrong.

  /Bernie\