Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!zaphod.mps.ohio-state.edu!usc!chaph.usc.edu!girtab.usc.edu!jeenglis
From: jeenglis@girtab.usc.edu (Joe English Muffin)
Newsgroups: comp.unix.programmer
Subject: Re: File "type"
Message-ID: <12141@chaph.usc.edu>
Date: 23 Sep 90 09:24:04 GMT
References: <171@alchemy.UUCP> <13114@june.cs.washington.edu>
Sender: news@chaph.usc.edu
Organization: Joe's Homeopathic Hangover Remedies
Lines: 52
Nntp-Posting-Host: girtab.usc.edu

robertb@cs.washington.edu (Robert Bedichek) writes:
>In article <171@alchemy.UUCP> bbs@alchemy.UUCP (BBS Administration) writes:
>>
>>    Could someone explain how the command "file" works?  Specifically, I am
>>writing a program that allows users to navigate their $HOME directory and
>
>I suggest that you read the man page for 'file'.  Also, read the file
>that the man pages specifies as the database that 'file' uses.

Not all versions of 'file' use a separate database; I believe the
4.2BSD 'file' has it hardcoded.  (Not to mention the fact that not
all Unices have on-line man pages, and not all sites make the
hard-copy versions easy to get to, but that's another gripe :-)

To answer the original question, 'file' first does a stat() (or
lstat(), which is what it takes to see a symbolic link rather than
its target) to determine if the file is an executable, setuid,
symbolic link, etc.  Then it reads in the first N characters of the
file and checks them against a predefined set of patterns.

Many of the patterns are just ``magic numbers''; for example, under
SunOS the file types "mc68020 demand paged dynamically linked
executable" and "shell script" are determined from the first two
bytes of the file.

Some of the other patterns it looks for are a little more
complicated; for example, a period at the beginning of a line
indicates "[nt]roff, tbl, or eqn input" (which is why it tends to
think makefiles are for troff so often).  Certain patterns of
punctuation and capitalization (not too sure what they are)
distinguish "English text" from "ascii text."
If none of the patterns match, it looks for non-printable characters;
if there are any it will report "data", otherwise "ascii text."

>There are many file types that editors will like besides files reported
>by 'file' as text.  For example shell scripts are usually reported as
>such and not as text.  So the result of 'file' isn't what I think that
>you want.  Also, some text editors can edit any file, including
>executable files.

This is true.  Your best bet is to write a simple C program that reads
in the first block of the file and checks for non-printing characters
and possibly for lines that are too long as well.

--Joe English

  jeenglis@alcor.usc.edu
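
For what it's worth, the "simple C program" suggested above might look
roughly like the sketch below.  This is only an illustration, not what
'file' itself does: the names looks_like_text() and
file_looks_like_text(), the 512-byte BLOCK_SIZE and the 256-byte
MAX_LINE limit are all assumptions made up for the example, and tab,
carriage return and form feed are arbitrarily treated as acceptable
in text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define BLOCK_SIZE 512  /* assumed size of the "first block" */
#define MAX_LINE   256  /* assumed limit for a line that is "too long" */

/*
 * Return 1 if the first len bytes of buf look like editable text:
 * no non-printing characters (apart from newline, tab, CR and form
 * feed) and no line longer than max_line bytes.  Return 0 otherwise.
 * An empty buffer counts as text.
 */
int looks_like_text(const unsigned char *buf, size_t len, size_t max_line)
{
    size_t i;
    size_t line_len = 0;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\n') {            /* end of line: restart the count */
            line_len = 0;
            continue;
        }
        if (!isprint(c) && c != '\t' && c != '\r' && c != '\f')
            return 0;               /* non-printing character */
        if (++line_len > max_line)
            return 0;               /* line is too long */
    }
    return 1;
}

/*
 * Read the first block of the named file and apply the check above.
 * Returns 1 for text, 0 for non-text, -1 if the file cannot be read.
 */
int file_looks_like_text(const char *path)
{
    unsigned char buf[BLOCK_SIZE];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return looks_like_text(buf, n, MAX_LINE);
}
```

A directory browser could call file_looks_like_text() on each entry
and only offer to start the editor when it returns 1.  Because only
the first block is examined, a file with binary data further in will
still pass, which is the same trade-off 'file' makes by reading only
the first N characters.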