Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!MCC.COM!rfg
From: rfg@MCC.COM (Ron Guilmette)
Newsgroups: gnu.g++
Subject: Compiler reliability & testing
Message-ID: <8906201624.AA12323@pink.aca.mcc.com>
Date: 20 Jun 89 16:24:22 GMT
Sender: daemon@tut.cis.ohio-state.edu
Distribution: gnu
Organization: GNUs Not Usenet
Lines: 92


Recently, Eugene Brooks wrote:

> Several posters have indicated, and later strongly supported,
> statements about reliability problems with GCC.  As you might
> guess at this point, the reliability problems are real.
> 
> The reason why new releases are often flakey is probably
> the lack of sufficient regression testing.

Congratulations Mr. Brooks!  Why don't you tell us that it gets
cooler at night than it is during the daytime, or that Arabs don't
like Israelis.  Sorry.  I'm not attacking you or your statement,
but I would be surprised if anybody *DIDN'T* understand that this
is exactly and precisely THE MAIN THING that is holding back the
reliability of both GCC and G++ (not to mention GDB, for which there
could also be automated tests).

> A regression
> test package should be built and distributed as part of
> GCC.

Amen.  Portable compilers have a higher than normal need for
complementary validation suites.  Such a suite (for GCC) could
significantly reduce the uncertainty left after an initial port to
a new machine is done.

> Before each publicly announced release FSF should
> ship the new release to several friendly sites which will
> run the regression tests on hardware FSF does not
> have direct access to.

This is happening now, except that rms does *not* give people any
special guidance on *what* to use as a test.  Usually, the pre-testers
just use their favorite packages.  If there are enough pre-testers,
this can give good coverage, but it is still too haphazard for my tastes.

> The major problem with regression
> test packages is the sheer volume of code which must be
> written, with the writer thinking that he is not doing something
> useful (like working directly on the compiler).  The solution
> to this problem is to request that bug reports include if
> possible a test program which is in the standard regression
> test format.

I have been slowly building up test suites for G++ and GCC (i.e. C++ and C).
I now have an automated means of executing all of the test cases in the
suites, analyzing the outcomes of each test, and reporting the results.
The "driver" routines are Bourne shell scripts.  They were originally
written as C-shell scripts, but I converted them so that AT&T could
run my C++ tests also.
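
For readers who want a concrete picture, here is a minimal sketch of what
a driver of this kind might look like.  It is not the actual script: the
function name run_suite, the one-directory layout, and the summary line
are all assumptions made for illustration, and it uses POSIX sh arithmetic
that a very old Bourne shell may lack.

```shell
#!/bin/sh
# Hypothetical driver sketch (names and layout are assumptions):
# run every executable test found in a directory, treat exit status 0
# as a pass, and print a one-line summary at the end.

run_suite() {
    dir=$1
    passed=0
    failed=0
    for t in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$t" ] && [ -x "$t" ] || continue
        if "$t" >/dev/null 2>&1; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
            echo "FAILED: $t"
        fi
    done
    echo "passed=$passed failed=$failed"
    # The driver's own status is zero only when no test failed.
    [ "$failed" -eq 0 ]
}
```

Called as `run_suite ./tests`, it names each failing test and then prints
the totals, so the driver can itself be used as a step in a larger script.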

I have also encouraged at least one person to convert his base of tests
to my format.  I will gladly publish my format here (and accept new
contributions) if that will help.

This approach, of building up tests based on bugs found, is certainly
better than nothing, but a far more robust suite could be built through
a concerted effort based on an analysis of the ANSI standard (as was
done in the case of Ada).  This could take up an enormous amount of
manpower (and would probably have a non-trivial cost in $$), but if
the resulting suite were to be placed into the public domain then
everyone would benefit greatly.

> A single program which runs and produces the message PASSED or FAILED
> on standard output.   The regression test driver can trigger on these
> keywords and inform the tester.  Liberal use of tests for failure
> with suitable printing of line and file numbers of where the compilation
> failure occurred.  A pointer to the source of the test program in case help
> is required in bug shooting the compiler.

My test suite drivers now do all of the above, except that I have decided
that it is simpler and easier to have the executable tests simply return
a zero or non-zero exit code to indicate pass/fail results.
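
The two conventions are easy to bridge.  As a rough sketch (the function
name report and the exact message wording are assumptions, not part of my
drivers), a small wrapper can run any test command and turn its exit code
into the PASSED/FAILED keywords suggested above:

```shell
#!/bin/sh
# Hypothetical helper (name and output format are assumptions):
# run a test command and translate its exit status into a keyword line.

report() {
    name=$1
    shift
    status=0
    # Discard the test's own output; only the exit status matters here.
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "PASSED $name"
    else
        echo "FAILED $name (exit status $status)"
    fi
}
```

For example, `report t001 ./t001` prints one line per test that a person
or another script can search for the keyword.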

> With all the bug reports which are coming in, it should be no time
> at all that we have a large and useful regression test library for
> GCC.  At least we won't have repeat visits on previous bugs.

I have *not* been collecting GCC bug reports very intensely, but I have
got a nice set of tests for G++.  One thing worth noting is that I
always try to get the tests themselves down to fewer than 100 lines.
Thus, when I see a G++ bug report (usually from a novice) come across
on bug-g++, and I see that the mail message is greater than about 10K
long, I don't even bother with it, because it is too much work to
hack it down to size and to figure out what it is *supposed* to do
when it is working correctly.

// Ron Guilmette  -  MCC  -  Experimental Systems Kit Project
// 3500 West Balcones Center Drive,  Austin, TX  78759  -  (512)338-3740
// ARPA: rfg@mcc.com
// UUCP: {rutgers,uunet,gatech,ames,pyramid}!cs.utexas.edu!pp!rfg