Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!mailrus!ames!lll-winken!uunet!ncrlnk!ncr-sd!hp-sdd!hplabs!hpfcdc!hpldola!hp-lsd!prisma!kolstad
From: kolstad@prisma
Newsgroups: comp.text
Subject: Re: Urban Legends (was Re: Dvorak Keyboard Layout)
Message-ID: <10500004@prisma>
Date: 22 Jul 89 01:12:00 GMT
References: <787@dms>
Lines: 328

I wasn't able to mail this to Mr. Leichter:
-----------------------------------------------------------------------
In comp.text, you say:
> ...But it turns out that that model is just
> plain wrong! ...
> A side-effect of the Scholes layout is to place many
> of the common "units" on alternating hands, which makes typing them easier.
> Dvorak, on the other hand, tends to place many units under the SAME hand,
> which interferes with typing.

I am not a real fan of the Dvorak keyboard, but I knew someone who could
hit 160 WPM (Andrew Shapira: shapira@docsun.rpi.edu).  Because he could
type a couple per cent faster than I, my ego was bruised and I tried the
keyboard for a bit (at the behest of Dan Kopetzky, I believe).

At any rate, while I never became proficient at all, in the limited number
of tests I did (i.e., writing letters like this one) I found that your
thesis that the units are under the same hand is not borne out.  It is
understood that one will always have a few combinations that turn out that
way (witness the word `recede' on the QWERTY keyboard); nevertheless, the
number of digrams and trigrams that were true alternations appeared to me
to be very high on the Dvorak keyboard.

Suppose one divides the keyboard like this (I copied this keyboard from an
earlier article and split it as best I could):

        left                 right
    / , . P Y     ---     F G C R L
    A O E U I     ---     D H T N S
    ; ' Q J K X   ---     B M W V Z

and runs /usr/dict/words through a trivial script that replaces each
letter with the hand (l or r) that types it:

    tr "'PYAOEUIQJKXFGCRLDHTNSBMWVZpyaoeuiqjkxfgcrldhtnsbmwvz" \
       llllllllllllrrrrrrrrrrrrrrrlllllllllllrrrrrrrrrrrrrrr < /usr/dict/words

Then we have a file that tells which hand types each letter (here's an
excerpt):

    l
    lll         <-- obviously bad
    lllr
    llrrlr
    llrlr
    lrl         <-- the best we can do
    lrlrl       <-- the best we can do
    lrlrl       <-- the best we can do
    lrlrlr      <-- the best we can do
    lrlrlrl     <-- the best we can do

Now if we count the transitions (hand changes), we should be able to
measure the `goodness' of a keyset.  (I'm doing this in real time as I
type, and I have to think about this for a moment.  For you, it will
appear to be an instant cuz you'll get this all at once!)  Let's make a
chart:

                         number of words with n `alternations'
    length of word       0    1    2    3    4    5   ...
          2              x    x
          3              x    x    x
          4              x    x    x    x
          5              x    x    x    x    x
          6              x    x    x    x    x    x
          7              and so on...

The program appears as Appendix A below.  Running

    tr script < /usr/dict/words | program

yields (Dvorak):

                 0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
     2 n= 131   48   83
     3 n= 775   60  517  198
     4 n=2152   29  864 1163   96
     5 n=3093   16  462 1902  679   34
     6 n=3794    3  130 1698 1619  329   15
     7 n=3929    0   23  913 2013  896   82    2
     8 n=3484    0    7  366 1299 1441  347   22    2
     9 n=2970    0    1  121  735 1292  694  121    6    0
    10 n=1883    0    0   22  287  680  673  195   26    0    0
    11 n=1052    0    0    0   66  248  429  266   42    1    0    0
    12 n= 542    0    0    1   13   70  169  203   76   10    0    0    0
    13 n= 260    0    0    0    2   20   55   98   66   15    4    0    0    0
    14 n= 102    0    0    0    1    5   15   27   34   17    3    0    0    0    0
    15 n=  39    0    0    0    0    3    1   10   12    9    4    0    0    0    0    0

    [29 uninteresting cases of words with 16 or more characters omitted]

Now, it doesn't look too bad, but I can't think of a quick metric that says
`oh, it's obvious this is great'.  Let's quickly write another tr script
for QWERTY to have some raw data to compare:

    tr "qwertyasdfgzxcvbyuiophjklnm.'&QWERTYASDFGZXCVBYUIOPHJKLNM" \
       llllllllllllllllrrrrrrrrrrrrrrllllllllllllllllrrrrrrrrrrr \
       < /usr/dict/words > /tmp/alternates

Now how does QWERTY do?
                 0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
     2 n= 131   68   63
     3 n= 775  186  388  201
     4 n=2152  266  837  790  259
     5 n=3093  211  795 1168  749  170
     6 n=3794  142  696 1217 1164  482   93
     7 n=3929   85  398  983 1249  850  322   42
     8 n=3484   39  231  616  979  929  505  164   21
     9 n=2970   18  135  421  691  782  565  280   71    7
    10 n=1883    7   46  154  339  453  460  283  111   29    1
    11 n=1052    3   17   56  135  238  270  173  109   42    9    0
    12 n= 542    0    1    8   50   92  142  111   90   36    9    3    0
    13 n= 260    0    0    5   11   36   39   68   58   32    8    3    0    0
    14 n= 102    0    0    0    6    6   16   25   29   10    6    4    0    0    0
    15 n=  39    0    0    0    1    1    5    7   11   10    2    1    1    0    0    0

    [29 uninteresting cases of words with 16 or more characters omitted]

Unfortunately, I must admit that there doesn't seem to be a tremendous
difference in the alternating behavior.  There is some, but it's not
overwhelming.  Consider the most common word length, 7 letters:

                         0    1    2    3    4    5    6
    dvorak:  7 n=3929    0   23  913 2013  896   82    2
    qwerty:  7 n=3929   85  398  983 1249  850  322   42

Now QWERTY has a few more perfect words and a bunch more almost-perfect
ones, but it also has dramatically more `poor' words (0 and 1
alternations).  Let's calculate the average number of alternations:

    dvorak: ( 0*0 +  23*1 + 913*2 + 2013*3 + 896*4 +  82*5 +  2*6) / 3929 = 3.02723
    qwerty: (85*0 + 398*1 + 983*2 + 1249*3 + 850*4 + 322*5 + 42*6) / 3929 = 2.89463

This shows a very slight improvement for Dvorak: about 0.13 more
alternations per 7-letter word, or roughly 4.6% more than QWERTY.

I'll go back and modify the program to calculate this for us (see
Appendix C):

     l n        qwerty   dvorak
     2 n= 131   0.4809   0.6336
     3 n= 775   1.0194   1.1781
     4 n=2152   1.4842   1.6162
     5 n=3093   1.9586   2.0818
     6 n=3794   2.3761   2.5762
     7 n=3929   2.8946   3.0272
     8 n=3484   3.3789   3.5250
     9 n=2970   3.7832   3.9912
    10 n=1883   4.3542   4.4302
    11 n=1052   4.8042   4.9743
    12 n= 542   5.4244   5.5277
    13 n= 260   5.9769   6.0269
    14 n= 102   6.3627   6.4804
    15 n=  39   6.9231   6.8974

Well, the Dvorak keyboard wins at every length except 15, but not by much!
Maybe what we REALLY want to know is how much time we spend off the home
row.  Could that be the REALLY important metric?
Let's translate into keyboard row numbers for Dvorak:

    tr "/,.PYFGCRLAOEUIDHTNS;'QJKXBMWVZ&pyfgcrlaoeuidhtnsqjkxbmwvz" \
       1111111111222222222223333333333411111112222222222333333333 \
       < /usr/dict/words > /tmp/alternates

And let's modify the program to calculate on-home-row vs. off-home-row
keystrokes (see Appendix D).

[My buddy just pointed out to me that few people type the dictionary and
we should use more realistic text like a book or a newsgroup/notesfile.
OOPS.  I'll just continue on this tack for now.]

OK, that done, let's also make a tr script for the QWERTY keyboard rows:

    tr "qwertyasdfgzxcvbyuiophjklnm.'&QWERTYASDFGZXCVBYUIOPHJKLNM" \
       111111222223333311111122233324111111222223333311111122233 \
       < /usr/dict/words > /tmp/alternates

Running the home-row calculation program yields the following (with a bit
of text editing for ease of reading).  Here nhome is the number of
home-row keystrokes, and the percentage is nhome divided by the total
number of keystrokes (n words times the word length):

                qwerty                 dvorak
     2 n= 131   nhome=  75 = 28.63%   nhome=  157 = 59.92%
     3 n= 775   nhome= 747 = 32.13%   nhome= 1367 = 58.80%
     4 n=2152   nhome=2914 = 33.85%   nhome= 5196 = 60.36%
     5 n=3093   nhome=4892 = 31.63%   nhome= 9328 = 60.32%
     6 n=3794   nhome=6778 = 29.78%   nhome=14335 = 62.97%
     7 n=3929   nhome=8080 = 29.38%   nhome=17297 = 62.89%
     8 n=3484   nhome=8279 = 29.70%   nhome=18078 = 64.86%
     9 n=2970   nhome=7376 = 27.59%   nhome=17359 = 64.94%
    10 n=1883   nhome=4841 = 25.71%   nhome=12474 = 66.25%
    11 n=1052   nhome=2827 = 24.43%   nhome= 7604 = 65.71%
    12 n= 542   nhome=1579 = 24.28%   nhome= 4323 = 66.47%
    13 n= 260   nhome= 796 = 23.55%   nhome= 2260 = 66.86%
    14 n= 102   nhome= 335 = 23.46%   nhome=  933 = 65.34%
    15 n=  39   nhome= 126 = 21.54%   nhome=  389 = 66.50%
    16 n=  15   nhome=  50 = 20.83%   nhome=  157 = 65.42%
    17 n=   6   nhome=  20 = 19.61%   nhome=   70 = 68.63%
    18 n=   4   nhome=  19 = 26.39%   nhome=   46 = 63.89%
    20 n=   1   nhome=   5 = 25.00%   nhome=   11 = 55.00%
    21 n=   2   nhome=   8 = 19.05%   nhome=   26 = 61.90%
    22 n=   1   nhome=   5 = 22.73%   nhome=   12 = 54.55%

Well, it appears that the Dvorak keyboard stays on the home row about
60-65% of the time and that the QWERTY keyboard stays on the home row
about 20-30% of the time (for the most part).  That is roughly a factor
of two (or better) improvement in home-row keystrokes.  Not bad.  I'll bet
that's the big difference.  [electroencephalography is the 22-letter word,
by the way.]

So, in summary:

  * Alternation is just a bit better (pretty much always)
  * Home-row keys are phenomenally better placed

Now we know.  Thanks for providing fodder for this interesting exercise.

ps: Re-reading your note and this one, I find that I might have been a bit
more clever about my treatment of common digrams and trigrams.  Oh well.

============================= program listings (appendices) follow =======

APPENDIX A
-----------------------------------
tr script < /usr/dict/words | program

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 40

int nlengths[MAXLEN];             /* number of words of each length */
int nalternates[MAXLEN][MAXLEN];  /* [length][number of hand changes] */

int
main (void)
{
    char buf[512];
    int l;              /* length of this word */
    int n;              /* counter of alternates */
    int i, j;
    char *p;
    char thishand;

    while (fgets (buf, sizeof buf, stdin) != NULL) {
        buf[strcspn (buf, "\n")] = '\0';    /* drop the newline */
        l = strlen (buf);
        if (l < 2 || l >= MAXLEN)
            continue;
        nlengths[l]++;
        p = buf;
        thishand = *p++;
        for (n = 0; *p; p++)
            if (*p != thishand) {   /* hand changed: remember it, count it */
                thishand = *p;
                n++;
            }
        nalternates[l][n]++;
    }
    for (i = 2; i < MAXLEN; i++) {
        if (nlengths[i] == 0)
            continue;
        printf ("%2d n=%4d ", i, nlengths[i]);
        for (j = 0; j < i; j++)
            printf ("%4d ", nalternates[i][j]);
        printf ("\n");
    }
    exit (0);
}
-------------------------------------------------------
APPENDIX B

The actual tr script for dvorak:

    tr ".'PYAOEUIQJKXFGCRLDHTNSBMWVZ&pyaoeuiqjkxfgcrldhtnsbmwvz" \
       lllllllllllllrrrrrrrrrrrrrrrrlllllllllllrrrrrrrrrrrrrrr \
       < /usr/dict/words > /tmp/alternates

The actual tr script for qwerty:

    tr "qwertyasdfgzxcvbyuiophjklnm.'&QWERTYASDFGZXCVBYUIOPHJKLNM" \
       llllllllllllllllrrrrrrrrrrrrrrllllllllllllllllrrrrrrrrrrr \
       < /usr/dict/words > /tmp/alternates
-------------------------------------------------------
APPENDIX C

The program which computes average alternations:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 40

int nlengths[MAXLEN];             /* number of words of each length */
int nalternates[MAXLEN][MAXLEN];  /* [length][number of hand changes] */

int
main (void)
{
    char buf[512];
    int l;              /* length of this word */
    int n;              /* counter of alternates */
    int i, j;
    char *p;
    char thishand;

    while (fgets (buf, sizeof buf, stdin) != NULL) {
        buf[strcspn (buf, "\n")] = '\0';    /* drop the newline */
        l = strlen (buf);
        if (l < 2 || l >= MAXLEN)
            continue;
        nlengths[l]++;
        p = buf;
        thishand = *p++;
        for (n = 0; *p; p++)
            if (*p != thishand) {   /* hand changed: remember it, count it */
                thishand = *p;
                n++;
            }
        nalternates[l][n]++;
    }
    for (i = 2; i < MAXLEN; i++) {
        double sum;

        if (nlengths[i] == 0)
            continue;
        sum = 0;
        printf ("%2d n=%4d ", i, nlengths[i]);
        for (j = 0; j < i; j++)
            sum += j * nalternates[i][j];   /* weight each count by j */
        printf ("%6.4f\n", sum / nlengths[i]);
    }
    exit (0);
}
-------------------------------------------------------