Newsgroups: comp.arch
Path: utzoo!henry
From: henry@utzoo.uucp (Henry Spencer)
Subject: Re: Understanding variations in Dhrystone performance
Message-ID: <1989May15.173631.3029@utzoo.uucp>
Organization: U of Toronto Zoology
References: <474@estevax.UUCP>
Date: Mon, 15 May 89 17:36:31 GMT

In article <474@estevax.UUCP> wck353@estevax.UUCP (HrDr Weicker Reinhold ) writes:
>... Note that
>processors with an instruction that checks a word for a null byte (such
>as AMD's 29000 and Intel's 80960) have an advantage here...

Only a small one; you can do the same check on a machine without the
fancy instruction by being clever.  Consider:

	(((x & ~0x80808080) - 0x01010101) & 0x80808080)

The result is nonzero if, and only if, there was a NUL byte in x
(counting a 0x80 byte as NUL, since the mask clears its high bit).  This
is a bit more expensive than a single instruction, but not a whole lot
if you put the constants in registers... especially on a machine where
you can juggle the code to put most of the operations in load-delay
slots.  If you're into benchmarksmanship seriously, you can omit the
first "&" if you're careful to use only ASCII (or if you expect high-bit
characters to be rare and are willing to do a more precise check
afterward to eliminate false alarms).  There are a number of variations.

>If the fixed-length and word-alignment assumption can be used, a wide
>bus that permits fast multi-word load instructions certainly does help;

Beware that there are alignment restrictions here too:  you don't want
a multi-word load to cross a page boundary unless you are sure the string
crosses it too.  Accessing the next page may cause a trap.
-- 
Subversion, n:  a superset     |     Henry Spencer at U of Toronto Zoology
of a subset.    --J.J. Horning | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
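
The NUL-byte test and the page-boundary caution in the post above can be
sketched in C roughly as follows.  This is only an illustration under
stated assumptions: 32-bit words, and function names (nul_check_masked,
nul_check_unmasked, nul_check_exact, load_crosses_page) that are made up
here and do not appear in the post.  The "exact" form is one widely used
variation, not necessarily one the post had in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_ONES  UINT32_C(0x01010101)
#define HIGH_BITS UINT32_C(0x80808080)

/* The expression from the post.  Nonzero when some byte of x is 0x00.
 * It also fires for a byte equal to 0x80, because the mask clears that
 * byte to zero before the subtraction. */
static uint32_t nul_check_masked(uint32_t x)
{
    return ((x & ~HIGH_BITS) - LOW_ONES) & HIGH_BITS;
}

/* The cheaper form with the first "&" omitted.  Exact for 7-bit ASCII;
 * bytes 0x81..0xFF raise false alarms that need a precise recheck. */
static uint32_t nul_check_unmasked(uint32_t x)
{
    return (x - LOW_ONES) & HIGH_BITS;
}

/* A common variation (an assumption here, not taken from the post):
 * "& ~x" discards bytes whose high bit was already set, so the result
 * is nonzero only when some byte of x really is 0x00. */
static uint32_t nul_check_exact(uint32_t x)
{
    return (x - LOW_ONES) & ~x & HIGH_BITS;
}

/* Hypothetical helper for the page-boundary warning: reports whether an
 * n-byte load starting at addr would touch two pages.  page_size must
 * be a power of two; the caller supplies it. */
static int load_crosses_page(uintptr_t addr, size_t n, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(n != 0);
    return (addr & (page_size - 1)) + n > page_size;
}
```

A string routine built on this would presumably use the cheapest check
in its inner loop and fall back to a byte-by-byte scan of any word that
raises an alarm, issuing a multi-word load only where
load_crosses_page() says it stays within one page.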