Path: utzoo!telly!attcan!uunet!mcvax!kth!draken!tut!tukki!tarvaine
From: tarvaine@tukki.jyu.fi (Tapani Tarvainen)
Newsgroups: gnu.utils.bug
Subject: bug in Gnu e?grep / regex.c
Summary: all machines don't have flat address space
Message-ID: <920@tukki.jyu.fi>
Date: 17 Jun 89 13:08:52 GMT
Reply-To: tarvaine@tukki.jyu.fi (Tapani Tarvainen)
Distribution: gnu
Organization: University of Jyvaskyla, Finland
Lines: 106

I tried porting Gnu e?grep to MS-DOS (Turbo C 2.0).  In the process I
found something of a bug, or at least a piece of not-so-portable code
in regex.c.

The program compiled easily, with only a few trivial modifications
like different include files and specifying stack size - in only one
place did I change actual code (in displaying usage it assumes the
directory separator is /).  And of course the makefile had to be
changed rather drastically to get Borland's make to digest it.

At first it worked fine, until I tried a rather complicated regexp and
got "Memory exhausted".  Well, I recompiled it with -mc (compact
memory model, i.e., far data pointers).  After which it still gave
"Memory exhausted" for just about anything but fixed strings,
regardless of how much memory was available.

I traced the problem to the following macro, used in function
re_compile_pattern in regex.c:

#define EXTEND_BUFFER \
  { char *old_buffer = bufp->buffer; \
    if (bufp->allocated == (1<<16)) goto too_big; \
    bufp->allocated *= 2; \
    if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \
    if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
      goto memory_exhausted; \
    c = bufp->buffer - old_buffer; \
    b += c; \
    if (fixup_jump) \
      fixup_jump += c; \
    if (laststart) \
      laststart += c; \
    begalt += c; \
    if (pending_exact) \
      pending_exact += c; \
  }

What do you think a stupid compiler with 16-bit ints makes out of an
expression like 1<<16?  Right, zero.  I substituted 1L<<16
(what would be the aesthetically correct form?)
and changed the definition of allocated in struct re_pattern_buffer in
regex.h from int to long, and the problem disappeared.  (BTW, is there
some machine where this could harm anything?  I mean, both ints and
longs are 32 bits in 32-bit machines anyway, aren't they?)

So far so good.  But then I tried some even more complicated regexps
and - the machine crashed.  Oh well, debugger out again, and it turned
out the problem was again in the above macro.  Look at this piece of
code:

    c = bufp->buffer - old_buffer;
    b += c;

Pointer subtraction is only guaranteed to work when both pointers
point into the same object, which is not the case here once realloc
has moved the buffer.  And indeed, in the 80x86 large memory model
pointer subtraction is done by subtracting offsets only, which is OK
as long as individual structures are <64K, *as long as the segments
are the same*.  And here they may not be.

Using huge pointers would solve the problem but waste time, and I
wanted a portable (and standard-conforming) solution.  This one seems
to fit the bill:

#define EXTEND_BUFFER \
  { char *old_buffer = bufp->buffer; \
    if (bufp->allocated == (1L<<16)) goto too_big; \
    bufp->allocated *= 2; \
    if (bufp->allocated > (1L<<16)) bufp->allocated = (1L<<16); \
    if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
      goto memory_exhausted; \
    c = b - old_buffer; \
    b = bufp->buffer + c; \
    if (fixup_jump) { \
      c = fixup_jump - old_buffer; \
      fixup_jump = bufp->buffer + c; \
    } \
    if (laststart) { \
      c = laststart - old_buffer; \
      laststart = bufp->buffer + c; \
    } \
    c = begalt - old_buffer; \
    begalt = bufp->buffer + c; \
    if (pending_exact) { \
      c = pending_exact - old_buffer; \
      pending_exact = bufp->buffer + c; \
    } \
  }

I *think*

    b = bufp->buffer + (b - old_buffer);

etc. should also work, but some compiler might rearrange it as

    b = (bufp->buffer - old_buffer) + b;

which again would fail.  Anyway, a decent compiler (like gcc) should
produce equally good code either way.
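
For illustration only, here is a minimal sketch of the same rebasing
idea written as a function instead of a macro.  It is not the regex.c
code: struct pattern_buf, extend_buffer, MAX_BUF_SIZE, MAX_TRACKED and
the EXTEND_* return codes are all made-up names.  It also differs from
the macro above in one respect: the offsets are recorded *before*
realloc is called, since in standard C the old pointer value should
not be relied on once the block has been reallocated.  The sketch
assumes size_t is wide enough to hold the cap, which may not be true
on a 16-bit compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap mirroring the 64K limit in the macro.  Written as a
   long constant so the shift is not evaluated in a 16-bit int. */
#define MAX_BUF_SIZE (1L << 16)
#define MAX_TRACKED 8   /* hypothetical limit on pointers to rebase */

enum { EXTEND_OK = 0, EXTEND_TOO_BIG = -1, EXTEND_NO_MEMORY = -2 };

/* Hypothetical stand-in for the pattern buffer; only the two fields
   the macro touches are modelled. */
struct pattern_buf {
    char *buffer;
    long allocated;
};

/* Double the buffer (up to MAX_BUF_SIZE) and rebase every tracked
   pointer.  ptrs[i] is the address of a pointer into the buffer;
   a tracked pointer that is null stays null. */
int extend_buffer(struct pattern_buf *bufp, char **ptrs[], size_t nptrs)
{
    long offsets[MAX_TRACKED];
    long new_size;
    char *new_buffer;
    size_t i;

    assert(nptrs <= MAX_TRACKED);
    assert(bufp->allocated > 0);

    if (bufp->allocated >= MAX_BUF_SIZE)
        return EXTEND_TOO_BIG;

    /* Record offsets while every pointer still points into the live
       block, so each subtraction stays within one object. */
    for (i = 0; i < nptrs; i++)
        offsets[i] = *ptrs[i] ? (long) (*ptrs[i] - bufp->buffer) : -1L;

    new_size = bufp->allocated * 2;
    if (new_size > MAX_BUF_SIZE)
        new_size = MAX_BUF_SIZE;

    /* Assumes size_t can represent new_size. */
    new_buffer = realloc(bufp->buffer, (size_t) new_size);
    if (!new_buffer)
        return EXTEND_NO_MEMORY;   /* the old block is still valid */

    bufp->buffer = new_buffer;
    bufp->allocated = new_size;

    /* Rebuild each pointer from the new base plus its saved offset. */
    for (i = 0; i < nptrs; i++)
        if (offsets[i] >= 0)
            *ptrs[i] = new_buffer + offsets[i];

    return EXTEND_OK;
}
```

A caller would pass the addresses of b, fixup_jump, laststart, begalt
and pending_exact in the ptrs array and branch to too_big or
memory_exhausted on the corresponding return code.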
--
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi