Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!usc!jarthur!polyslo!vlsi3b15!vax1.cc.lehigh.edu!sei.cmu.edu!krvw
From: XPUM01@prime-a.central-services.umist.ac.uk (Dr. A. Wood)
Newsgroups: comp.virus
Subject: Arrayboundcheck in C
Message-ID: <0007.9002131641.AA18689@ge.sei.cmu.edu>
Date: 13 Feb 90 10:17:52 GMT
Sender: Virus Discussion List
Lines: 71
Approved: krvw@sei.cmu.edu

This is not a virus as such; but program mishaps cause more system
upsets and loss of data than viruses do, and users should eliminate all
other causes of error before definitely suspecting a virus. One main
cause of mishaps is writing to arrays out of bounds.

In Fortran, Algol60, Algol68 and similar languages, writing a compiler
so that it can compile in array bound check mode is easy; but C can step
pointers along by adding arithmetic values, which complicates the job a
lot. I don't know if there are any C compilers with full array bound
check, but Prime's C compiler hasn't got one. Bound checking array
accesses is easy; the problem is bound checking pointer accesses.

I hereby submit a possible method of bound checking pointer accesses in
C programs.

In C, I define a 'table' as any group of stored values, all of the same
type, which are all adjacent in store. There are these types of
tables:-

(1) Arrays, declared at compile time with [ ], e.g.
    'int x[12],y[4][7];'. A pointer to an array is created by one of
    these forms:-
      a) An array element preceded by '&', e.g. &x[i]  &y[3][j+k]
      b) An arrayname followed by 'too few' subscripts, e.g. y[h]
      c) An arrayname without subscripts, e.g. x  y
    The arrayname can be some compound form such as a struct field or
    the like, e.g. 'z.c' after 'struct {int a; char c[12]; } z;'.
(2) Allocations created by calls of malloc() and similar functions,
    which return a pointer to the allocation thus created.
(3) Runs of consecutive members of a struct which are all of the same
    type, e.g. 'x,y,z' in
    'struct density {float x,y,z; double value; } den;'.
    Pointers to them are created by prefixing a '&', e.g. '&den.y'.
(4) Other cases where users are tempted to step a pointer over several
    values, e.g. 'a,b,c,d' in the declaration 'double a,b,c,d;', are
    compiler dependent and I will not consider them further.

My suggestion is for every pointer value to be accompanied by two other
pointer values which contain its safe range limits. (Thus the sizeof a
pointer, which == 6 in Prime C ordinarily, will become 18 in Prime C
compiled in array bound check mode.) Examples are:-

declaration assumed     pointer value  lower limit       upper limit       type
int x[4];               &x[3]          &x[0]             &x[4]             int*
int x[4];               x              &x[0]             &x[4]             int*
int y[6][7];            y[i]           y[0]              y[6]              int*
struct{int w,x,y,z;}a;  &a.x           &a.w              &a.z+1            int*
int k;                  malloc(k)      (returned value)  (same + k bytes)
int s; /* not table */  &s             &s                &s+1              int*

Procedure in the various uses of pointers:- (Here, b and c are
pointers)

sort of use                 example    procedure
accessing value pointed at  *b         check that b is within its limits.
pointer +- integer          b+i        check that b+i is within limits of
                                       b; copy limits of b as limits of
                                       b+i.
pointer with ++ or --       b++        (ditto)
pointer-pointer             b-c        error unless b and c have same
                                       ranges.
pointer[integer]            b[i]       treat as *(b+i).
casting a pointer           (float*)c  if casting to a pointer to a
                                       pointer, or to a pointer to a
                                       struct with a pointer member, the
                                       compiler should moan that "array
                                       bound check can't help here".

A pointer to an allocation which is lost by a call of 'free()' will then
be invalid, although its stored limits still look valid. Best not to
call 'free()' when running in array bound check mode.

----------------------------------------------------------------------
This should ensure that any pointer will only point to within the bounds
of the table that it was intended to point to.

{A.Appleyard} (email: APPLEYARD@UK.AC.UMIST), Tue, 13 Feb 90 08:38:40 GMT
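
The scheme in this post (each pointer value travelling with a lower and
an upper limit) can be tried out by hand in ordinary C, without compiler
support. What follows is only a minimal sketch under stated assumptions,
not the compiler mechanism the post proposes: the names checked_ptr,
cp_make, cp_add, cp_read, cp_write, cp_index_read and cp_diff are all
hypothetical, it handles tables of int only, and it assumes that a
pointer may legally rest on its upper limit (one past the last element)
but may not be dereferenced there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat" pointer: the value plus its two safe range limits. */
typedef struct {
    int *p;   /* the pointer value itself */
    int *lo;  /* lower limit: first element of the table */
    int *hi;  /* upper limit: one past the last element of the table */
} checked_ptr;

/* Build a checked pointer to element i of a table of n ints at base. */
static checked_ptr cp_make(int *base, size_t n, size_t i)
{
    checked_ptr c;
    assert(i <= n);          /* i == n means "resting on the upper limit" */
    c.p = base + i;
    c.lo = base;
    c.hi = base + n;
    return c;
}

/* pointer +- integer: if b+i stays within b's limits, store it in *out
   (copying b's limits) and return 1; otherwise return 0 and leave *out
   untouched. The test is done on offsets so that an out-of-range
   pointer is never actually formed. */
static int cp_add(checked_ptr b, ptrdiff_t i, checked_ptr *out)
{
    ptrdiff_t off = b.p - b.lo;   /* current index within the table */
    ptrdiff_t len = b.hi - b.lo;  /* number of elements in the table */
    if (i < -off || i > len - off)
        return 0;
    *out = b;
    out->p = b.p + i;
    return 1;
}

/* *b as a read: allowed only when lo <= p < hi. */
static int cp_read(checked_ptr b, int *value)
{
    if (b.p < b.lo || b.p >= b.hi)
        return 0;
    *value = *b.p;
    return 1;
}

/* *b as a write: same check as cp_read. */
static int cp_write(checked_ptr b, int value)
{
    if (b.p < b.lo || b.p >= b.hi)
        return 0;
    *b.p = value;
    return 1;
}

/* b[i] is treated as *(b+i). */
static int cp_index_read(checked_ptr b, ptrdiff_t i, int *value)
{
    checked_ptr t;
    return cp_add(b, i, &t) && cp_read(t, value);
}

/* b-c: an error (return 0) unless b and c have the same ranges. */
static int cp_diff(checked_ptr b, checked_ptr c, ptrdiff_t *out)
{
    if (b.lo != c.lo || b.hi != c.hi)
        return 0;
    *out = b.p - c.p;
    return 1;
}
```

For the 'int x[4]; &x[3]' row of the examples table, cp_make(x, 4, 3)
gives limits &x[0] and &x[4]. Adding 1 with cp_add then succeeds, since
the result rests on the upper limit, but a following cp_read is refused;
adding 2 is refused outright. Returning 0 instead of aborting is a
choice made here for ease of testing; a compiler in array bound check
mode would more likely stop the program with a message.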