Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!decwrl!decvax!ima!cfisun!lakart!dg
From: dg@lakart.UUCP (David Goodenough)
Newsgroups: comp.unix.questions
Subject: Re: Detecting type of file in a program
Message-ID: <414@lakart.UUCP>
Date: 9 Feb 89 17:37:11 GMT
References:
Organization: Lakart Corporation, Newton, MA
Lines: 12

tale@pawl.rpi.edu (David C Lawrence) sez:

Stuff about file(1) deleted

> (Aside: I am curious how it determines something is English
> text rather than just ascii text.)

I'd hazard a guess that it looks at the letter distributions. English has
well defined (well fairly well defined) ratios of letters. So you count how
many E's, T's etc. etc. occur, see how close you are to the "standard". If
you are close, say it's English, else say it's ascii.

This may be wrong - those in the know are welcome to correct me, but it's
one possibility that could be made to work.
-- 
        dg@lakart.UUCP - David Goodenough               +---+
                                                IHS     | +-+-+
        ....... !harvard!xait!lakart!dg                 +-+-+ |
AKA:    dg%lakart.uucp@xait.xerox.com                     +---+
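
For anyone who wants to try the letter-counting guess above, here is a minimal sketch in Python. It is not a description of what file(1) actually does. The frequency table holds commonly quoted approximate percentages for the twelve most frequent English letters, and the names ENGLISH_FREQ, letter_distance, guess_type and the threshold of 3.0 are all assumptions made up for this illustration.

```python
from collections import Counter

# Approximate percentages for common English letters. These are assumed,
# commonly quoted round figures, not values taken from file(1).
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8,
}


def letter_distance(text):
    """Mean absolute gap, in percentage points, between the letter
    frequencies of `text` and the reference table above."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    if not letters:
        return float("inf")  # no letters at all: as far from English as possible
    counts = Counter(letters)
    total = len(letters)
    gaps = (abs(100.0 * counts[ch] / total - ref)
            for ch, ref in ENGLISH_FREQ.items())
    return sum(gaps) / len(ENGLISH_FREQ)


def guess_type(text, threshold=3.0):
    """Say "English text" if the distribution is close to the table,
    else "ascii text". The threshold is an arbitrary illustrative cutoff."""
    if letter_distance(text) <= threshold:
        return "English text"
    return "ascii text"


if __name__ == "__main__":
    sample = ("it looks at the letter distributions and sees "
              "how close they are to the standard")
    print(guess_type(sample))
    print(guess_type("xqzj" * 50))
```

Short inputs will wander a long way from the table, so a real tool would want a minimum sample size and probably a better distance measure than a plain average of gaps.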