Path: utzoo!attcan!uunet!mcvax!hp4nl!eurtrx!euraiv1!evas
From: evas@euraiv1.UUCP (Eelco van Asperen)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Microsoft Vs. Borland; benchmarks!
Summary: MSC wins on points in the benchmark-battle...
Message-ID: <788@euraiv1.UUCP>
Date: 6 Oct 88 14:41:56 GMT
References: <876@galaxy> <8254@haddock.ima.isc.com>
Organization: Erasmus University EF/AIV,Rotterdam,Netherlands
Lines: 291


[Here's a comparison of MSC and Turbo C as my contribution to the 
"Microsoft vs. Borland" discussion. I wrote this a couple of months 
ago and tried to post it then, but that failed for administrative reasons.
Note that I don't have Turbo C v2.0 yet; if anybody wants the source 
of the benchmarks to run them under Turbo C v2.0, I'll be happy to send 
them. For a fair comparison, you should run them on the same type
of machine; I also have access to an Olivetti M24 (a.k.a. AT&T 6300) and
an Olivetti M280.					-EvAs.]


Benchmarking Borland's Turbo C v1.5 and Microsoft's C v5.1 Compilers

			
INTRO

To bring some clarity to the continuing debate about the Microsoft and
Borland C compilers, I've benchmarked them using some of the
benchmarks from the article "Benchmarking C Compilers", which
appeared in Dr. Dobb's Journal (DDJ), August 1986. (Philip Freidin, one
of the authors, was kind enough to send them to me. Thanks, Phil!)

The compilers compared are Microsoft C v5.1 (MS-C) and Borland Turbo C 
v1.5 (TC). All tests were done on an AT clone running at 12 MHz with 
zero wait states under MS-DOS v3.3; to keep the speed of the hard 
disk out of the results, I ran the programs from a RAM disk.

The programs were compiled with all optimizations enabled; for MS-C,
the flags are '-Ox -Gs' and for TC they are '-G -O -Z -r'. The tests
were compiled and run for each memory model available in both
compilers. TC's Tiny model has not been included because MS-C has no
comparable model (at least, the compiler does not generate special
code for it; I haven't checked whether programs compiled with the
Small model can be converted to .COM files after linking). Each test
was repeated a number of times to increase accuracy; the loop count for
each test is given in the table.


THE BENCHMARKS

A brief description of the benchmarks used:

ARRAY tests the compiler's ability to efficiently access arrays using
conventional array operations. A 10x10x10 int-array is copied using
three nested for-loops.
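A minimal sketch of the kind of copy ARRAY times. The DDJ source isn't reproduced here, so the array names `src` and `dst` and the function `array_copy` are hypothetical; only the 10x10x10 int size and the three nested for-loops come from the description above.

```c
#define N 10

/* Hypothetical source and destination arrays for an ARRAY-style test. */
int src[N][N][N];
int dst[N][N][N];

/* Copy src into dst element by element with three nested for-loops,
   so every access goes through ordinary array subscripting. */
void array_copy(void)
{
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                dst[i][j][k] = src[i][j][k];
}
```

Calling `array_copy()` 1500 times would correspond to the loop count given for ARRAY in the table.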

ATOX tests the atoi, atol and atof functions; it contains 21 atoi calls,
16 atol calls and 8 atof calls. Each call passes a string constant, some
of which have many leading blanks or zeros.
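A much shortened, hypothetical ATOX-style iteration; the string constants and the name `atox_sample` are made up for illustration (the real test contains 21 atoi, 16 atol and 8 atof calls). Leading blanks are skipped by all three functions, and leading zeros are still read as decimal, so "0000042" converts to 42, not to an octal value.

```c
#include <stdlib.h>

/* Convert a few string constants with leading blanks or zeros.
   Returns the sum of the integer conversions; the sum of the
   floating-point conversions is stored through dsum. */
long atox_sample(double *dsum)
{
    long total = 0;

    total += atoi("   12345");         /* leading blanks are skipped  */
    total += atoi("0000042");          /* leading zeros: still decimal */
    total += atol("      -2000000");   /* needs a long on 16-bit int  */
    total += atol("000000123456");
    *dsum = atof("   3.25") + atof("00001.5e2");
    return total;
}
```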

CPYBLK copies a file of 10,000 bytes using fread and fwrite in 1024-byte
blocks.

CPYCHR copies the same file, this time using fgetc and fputc; comparing
the times for CPYBLK and CPYCHR shows the cost of character I/O
relative to block I/O.
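The two copy strategies can be sketched as follows, assuming both files are already open; the function names `copy_blocks` and `copy_chars` are hypothetical. In the character loop, `c` has to be an int rather than a char, so that EOF can be told apart from a valid byte.

```c
#include <stdio.h>

#define BLKSIZE 1024

/* CPYBLK-style copy: move the data in 1024-byte blocks with fread/fwrite.
   Returns the number of bytes copied, or -1 on a write error. */
long copy_blocks(FILE *in, FILE *out)
{
    char buf[BLKSIZE];
    size_t n;
    long total = 0;

    while ((n = fread(buf, 1, BLKSIZE, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1L;
        total += (long) n;
    }
    return total;
}

/* CPYCHR-style copy: one library call per character in each direction.
   c is an int, not a char, so that EOF can be told apart from data. */
long copy_chars(FILE *in, FILE *out)
{
    int c;
    long total = 0;

    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)
            return -1L;
        total++;
    }
    return total;
}
```

For the 10,000-byte file, the block version makes 11 fread calls (ten that return data plus one that returns 0), while the character version makes 10,001 fgetc calls and 10,000 fputc calls.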

DISKIO does random seeks in a 240 KB file and thus measures the
speed of fseek.
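A sketch of a DISKIO-style loop, assuming "240 Kb" means 240 * 1024 bytes and that each seek is followed by a one-byte read; the function name `random_seeks` and the use of rand() are assumptions, not taken from the DDJ code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: "240 Kb" is taken to mean 240 * 1024 bytes. */
#define FILESIZE (240L * 1024L)

/* Seek to count pseudo-random offsets in fp and read one byte at each.
   Returns how many seek-and-read pairs succeeded. */
int random_seeks(FILE *fp, int count)
{
    int i, ok = 0;
    long offset;

    for (i = 0; i < count; i++) {
        /* RAND_MAX may be only 32767, so build the offset from a
           random 1 KB block number plus a random position inside it. */
        offset = (long) (rand() % 240) * 1024L + rand() % 1024;
        if (fseek(fp, offset, SEEK_SET) == 0 && fgetc(fp) != EOF)
            ok++;
    }
    return ok;
}
```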

FIBTEST is the standard recursive Fibonacci number generator, called
with an argument of 24. It mainly tests function entry and exit code.
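The usual form of this benchmark is sketched below; that the DDJ version uses exactly this base case is an assumption. With fib(1) = fib(2) = 1, fib(24) is 46368 and takes 2 * 46368 - 1 = 92,735 calls, almost all of which do nothing but enter and leave the function.

```c
/* Doubly recursive Fibonacci. The result is unsigned so that
   fib(24) = 46368 still fits in a 16-bit unsigned int, as on the
   MS-DOS compilers compared here (a signed 16-bit int stops at 32767). */
unsigned fib(int n)
{
    if (n <= 2)
        return 1;
    return fib(n - 1) + fib(n - 2);
}
```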

FILLSCR writes 1,248 characters to the screen as 16 runs of 78 a's,
each followed by a carriage return. This measures the speed of
screen output in the absence of scrolling. (The test is done just
after a CLS.)
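A sketch of the FILLSCR output loop, written against any stream so it can also be pointed at a file; the function name `fill_screen` is hypothetical. Passing `stdout` corresponds to the screen test, and replacing the `'\r'` with `'\n'` gives the SCROLL variant described further on.

```c
#include <stdio.h>

/* FILLSCR-style output: `lines` runs of 78 'a's, each followed by a
   carriage return instead of a newline, so the cursor goes back to the
   start of the same row and nothing scrolls. Returns the number of 'a'
   characters written (16 runs give the 1,248 of the test). */
long fill_screen(FILE *out, int lines)
{
    int i, j;
    long count = 0;

    for (i = 0; i < lines; i++) {
        for (j = 0; j < 78; j++) {
            putc('a', out);
            count++;
        }
        putc('\r', out);
    }
    return count;
}
```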

The FUNCOVR programs (funcov0 to funcov3 in the table) test
function-call overhead; they consist of functions with zero, one, two
and three arguments, respectively, and an empty body.
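A sketch of the four empty functions; the names `f0` to `f3` and the driver `call_all` are hypothetical, and the original uses a separate program per argument count. A modern optimizer would probably inline such calls away entirely, which is the same problem with artificial benchmarks that comes up again in the discussion of the results.

```c
/* Empty functions taking zero to three arguments; a call to one of
   them costs only argument passing plus entry and exit code. */
void f0(void) { }
void f1(int a) { (void) a; }
void f2(int a, int b) { (void) a; (void) b; }
void f3(int a, int b, int c) { (void) a; (void) b; (void) c; }

/* Call each function n times; returns the total number of calls. */
long call_all(long n)
{
    long i;

    for (i = 0; i < n; i++) {
        f0();
        f1(1);
        f2(1, 2);
        f3(1, 2, 3);
    }
    return 4L * n;
}
```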

DFUNCRET tests the ability to return function-values efficiently; the
function returns a double.
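On the 8086, int results normally come back in AX and long results in DX:AX, but an 8-byte double fits in neither, so each compiler needs a different convention for returning it. A hypothetical DFUNCRET-style pair of functions (names and body made up for illustration):

```c
/* Trivial body: the cost measured is mostly that of handing an
   8-byte double back to the caller. */
double dret(double x)
{
    return x + 1.0;
}

/* Chain n calls, each taking the previous result as its argument,
   so the value returned by every call is actually used. */
double dret_loop(long n)
{
    double d = 0.0;
    long i;

    for (i = 0; i < n; i++)
        d = dret(d);
    return d;
}
```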

LOOPTST does a simple for-loop test.

MEMORY tests the speed of malloc and free; in each loop, 500 blocks
of 50 bytes are allocated. Then every fifth one is freed and 100
blocks of 35 bytes are allocated, after which all allocated blocks
are freed.
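A sketch of one MEMORY iteration under stated assumptions: the block pointers live in a hypothetical array `blk`, and the 35-byte blocks are stored in the array slots emptied by the every-fifth pass. How the DDJ code actually keeps track of its blocks isn't described above.

```c
#include <stdlib.h>

/* One MEMORY-style pass. Returns 0 on success, -1 if malloc fails;
   in either case every block that was allocated is freed again. */
int memory_pass(void)
{
    char *blk[500];
    int i, status = 0;

    for (i = 0; i < 500; i++)
        blk[i] = NULL;
    for (i = 0; i < 500 && status == 0; i++)        /* 500 blocks of 50 bytes */
        if ((blk[i] = malloc(50)) == NULL)
            status = -1;
    if (status == 0) {
        for (i = 0; i < 500; i += 5) {              /* free every fifth: 100 blocks */
            free(blk[i]);
            blk[i] = NULL;
        }
        for (i = 0; i < 500 && status == 0; i += 5) /* 100 blocks of 35 bytes */
            if ((blk[i] = malloc(35)) == NULL)
                status = -1;
    }
    for (i = 0; i < 500; i++)
        free(blk[i]);                               /* free(NULL) is a no-op */
    return status;
}
```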

The MIN programs are used to determine the minimum size of a program:
	MINMAIN         no code; this measures startup + exit code
	MINPRTF         calls printf from main 
	MINPUTS         uses puts rather than printf 
	MINFIO          calls fopen, fgetc, fputc, fread, fwrite and 
			fclose

OPTIMIZE is meant to test the compiler's ability to optimize code; as
the authors of the DDJ article note, this is one of the weakest
benchmarks, because even a relatively simple optimizer could reduce it
to nothing. With more and more optimizing compilers arriving, this
will become one of the hardest things to test.

POINTER is a pointer-version of the ARRAY-test; it uses 6 pointers and
three levels of indirection to copy the 10x10x10 array.
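One possible reading of "6 pointers and three levels of indirection" is a plane pointer, a row pointer and an element pointer for each of the two arrays. This is an assumption about the DDJ code, and the names `psrc`, `pdst` and `pointer_copy` are hypothetical.

```c
#define N 10

int psrc[N][N][N];
int pdst[N][N][N];

/* Copy psrc to pdst using six pointers, one per level (plane, row,
   element) for each of the two arrays, instead of subscripts. */
void pointer_copy(void)
{
    int (*sp)[N][N], (*dp)[N][N];   /* plane pointers   */
    int (*sq)[N],    (*dq)[N];      /* row pointers     */
    int *sr, *dr;                   /* element pointers */

    for (sp = psrc, dp = pdst; sp < psrc + N; sp++, dp++)
        for (sq = *sp, dq = *dp; sq < *sp + N; sq++, dq++)
            for (sr = *sq, dr = *dq; sr < *sq + N; sr++, dr++)
                *dr = *sr;
}
```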

PRTF is meant to determine the speed of printf; the results should be
compared to the result of SCROLL. (They print the same line.)

RSIEVE and SIEVE are versions of the infamous sieve benchmark program;
RSIEVE uses register-variables whereas SIEVE does not.
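The version below is the widely published form of the sieve benchmark (8190 flags, odd numbers only); that the DDJ code matches it exactly is an assumption. The SIEVE variant would be identical except that `i` and `k` are declared without `register`.

```c
#define SIZE 8190

char flags[SIZE + 1];

/* Sieve of Eratosthenes over odd numbers only: flags[i] stands for
   2*i + 3. Returns the number of odd primes found (1899 for SIZE 8190). */
int rsieve(void)
{
    register int i, k;
    int prime, count = 0;

    for (i = 0; i <= SIZE; i++)
        flags[i] = 1;
    for (i = 0; i <= SIZE; i++) {
        if (flags[i]) {
            prime = i + i + 3;
            for (k = i + prime; k <= SIZE; k += prime)
                flags[k] = 0;        /* strike out odd multiples of prime */
            count++;
        }
    }
    return count;
}
```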

SCROLL is similar to FILLSCR but instead of the carriage-return, a
newline is printed.

STORAGE measures the differences between the storage classes in C;
four variables are declared automatic (AUTOTST), register (REGTEST) or
static (STATTST). To see whether the compiler will allocate more than
two registers for variables, the register test is also done with just
two of the four variables declared register (REG2TEST).
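A sketch of what the four variants might look like; the loop body and the function names are hypothetical. On the 8086, a compiler typically has only two registers (SI and DI) to spare for register variables, which is why the version with just two register requests is of interest.

```c
/* The same loop with four counters in different storage classes.
   Each function returns 10 * n: the counters grow by 1, 2, 3 and 4. */

long auto_test(int n)
{
    int a = 0, b = 0, c = 0, d = 0;            /* automatic: on the stack */
    int i;

    for (i = 0; i < n; i++) { a++; b += 2; c += 3; d += 4; }
    return (long) a + b + c + d;
}

long static_test(int n)
{
    static int a, b, c, d;                     /* static: fixed addresses */
    int i;

    a = b = c = d = 0;                         /* statics keep values between calls */
    for (i = 0; i < n; i++) { a++; b += 2; c += 3; d += 4; }
    return (long) a + b + c + d;
}

long reg_test(int n)
{
    register int a = 0, b = 0, c = 0, d = 0;   /* four register requests */
    int i;

    for (i = 0; i < n; i++) { a++; b += 2; c += 3; d += 4; }
    return (long) a + b + c + d;
}

long reg2_test(int n)
{
    register int a = 0, b = 0;                 /* only two register requests */
    int c = 0, d = 0, i;

    for (i = 0; i < n; i++) { a++; b += 2; c += 3; d += 4; }
    return (long) a + b + c + d;
}
```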

STRINGS assesses the quality of the library-routines strcat, strcpy,
strncpy, strlen, strcmp, and strncmp.
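A hypothetical single pass over the six routines; the strings and the name `strings_pass` are made up for illustration. The final length is returned so that the calls have an observable result.

```c
#include <string.h>

/* Exercise strcpy, strcat, strcmp, strncpy, strncmp and strlen once.
   Returns the length of the final string, or 0 if a comparison fails. */
size_t strings_pass(void)
{
    char buf[64];

    strcpy(buf, "Turbo");
    strcat(buf, " C");                   /* buf is now "Turbo C" */
    if (strcmp(buf, "Turbo C") != 0)
        return 0;
    strncpy(buf, "Microsoft C", sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';          /* strncpy may not terminate */
    if (strncmp(buf, "Micro", 5) != 0)
        return 0;
    return strlen(buf);                  /* "Microsoft C" has 11 chars */
}
```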

TDOUBLE and TFLOAT test floating-point performance; in each loop, 40
additions, subtractions and multiplications and 20 divisions are done.
A compiler that conforms to the ANSI C standard (yeah, I know; this
should read 'conforms to a draft version of' etc.) should be faster
than a K&R compiler in the TFLOAT test, because it doesn't have to
convert floats to doubles before each operation.
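In K&R C, every float operand is widened to double before the arithmetic and the result is narrowed again on assignment; an ANSI compiler may do float arithmetic directly in single precision. The source code is the same either way; only the code the compiler is allowed to generate differs. A hypothetical TFLOAT-style kernel (name and values are illustrative only):

```c
/* One add, subtract, multiply and divide per iteration on floats.
   Under K&R rules r, x and y would be converted to double for each
   operation; an ANSI compiler may keep everything in float. The 'f'
   suffix keeps the constants float, so no double sneaks in. */
float float_ops(float x, float y, int n)
{
    float r = 0.0f;
    int i;

    for (i = 0; i < n; i++) {
        r = r + x;
        r = r - y;
        r = r * 2.0f;
        r = r / 2.0f;
    }
    return r;
}
```

A TDOUBLE-style kernel would be the same loop with double throughout.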

TINT and TLONG attempt to measure the performance of int and long
arithmetic, respectively. In each loop, 1,500 additions, 1,600
subtractions, 200 multiplications and 200 divisions are done.
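A hypothetical pair of kernels doing the same four operations on int and on long (the real tests use the operation counts given above). On the 16-bit 8086, a long add or subtract takes two instructions, and long multiply and divide are usually done by calls to run-time library helpers, so TLONG should cost more than TINT under either compiler; how large that gap gets is what differs between them.

```c
/* The same four operations on int and on long. */
int int_ops(int a, int b)
{
    int r = a;

    r = r + b;
    r = r - b;
    r = r * b;
    r = r / b;
    return r;
}

long long_ops(long a, long b)
{
    long r = a;

    r = r + b;
    r = r - b;
    r = r * b;    /* 32-bit multiply: typically a helper call on the 8086 */
    r = r / b;    /* 32-bit divide: likewise */
    return r;
}
```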

TRIG times the speed of the trigonometric functions sin, cos and tan.
For each loop, these functions are called 12 times.

And now for the real stuff; here are the...

EXECUTION TIMES

Model:             Small       Compact      Medium        Large        Huge

Test     Loops   TC    MSC    TC    MSC    TC    MSC    TC    MSC   TC    MSC
--------------+------------+------------+------------+------------+----------
array     1500| 24.9   2.4 | 25.5   2.4 | 25.0   2.4 | 25.5   2.4 | 25.5  2.4
atox       100|  1.1   1.7 |  1.2   1.7 |  1.2   1.7 |  1.2   1.7 |  1.2  1.7
cpyblk      15|  7.8   2.3 |  8.8   3.0 |  7.9   2.3 |  9.1   3.1 |  9.2  3.2
cpychr      15|  9.5   6.4 | 10.3   6.9 |  9.8   6.6 | 11.0   7.3 | 11.4  7.3
diskio     350| 15.7  15.6 | 15.7  15.6 | 15.7  15.6 | 15.7  15.7 | 15.8 15.6
fibtest     18| 14.1  13.4 | 14.4  13.4 | 15.3  14.5 | 15.4  14.5 | 17.9 14.5
fillscr     12|  9.0   3.2 |  9.0   8.9 |  9.0   3.3 |  9.0   8.9 |  9.0  8.8
funcov0  10000| 16.3  15.1 | 17.3  15.1 | 22.2  16.8 | 22.9  16.8 | 34.3 16.8
funcov1  10000| 22.7  22.6 | 22.9  22.6 | 28.0  26.0 | 28.3  26.0 | 35.6 26.0
funcov2  10000| 24.9  24.3 | 23.8  24.3 | 29.6  27.2 | 28.7  27.2 | 37.1 27.2
funcov3  10000| 29.7  28.3 | 30.6  28.2 | 34.8  31.8 | 35.6  31.7 | 43.0 31.8
ifuncret  2500| 12.0  11.7 | 11.8  11.7 | 13.1  13.4 | 13.7  13.4 | 17.4 13.4
lfuncret  2500| 16.7  15.1 | 16.2  15.1 | 18.9  17.6 | 19.1  17.6 | 22.3 17.6
dfuncret   250| 37.8  28.2 | 37.6  29.7 | 37.8  28.4 | 38.1  29.9 | 38.2 29.9
looptst    500|  7.6   0.0 |  6.9   0.0 |  7.6   0.0 |  6.9   0.0 |  6.9  0.0
memory     500| 30.8  11.7 |196.5  14.5 | 31.4  12.3 |198.8  15.3 |206.3 17.5
optimize   100|  4.0   0.5 |  4.1   0.6 |  4.1   0.5 |  4.0   0.6 |  4.0  0.6
pointer   1500|  6.8   5.2 | 12.5   2.5 |  6.7   5.2 | 12.4   2.5 | 12.5 20.8
prtf        12| 12.6   7.0 | 12.6   7.1 | 12.6   7.0 | 12.6   7.1 | 12.6  7.1
rsieve     140| 13.9  11.8 | 13.7  11.8 | 14.0  11.8 | 13.6  11.8 | 13.6 11.8
scroll      12| 12.4   6.5 | 12.4  12.3 | 12.4   6.5 | 12.4  12.3 | 12.3 12.3
sieve      140| 14.0  12.7 | 13.6  12.7 | 14.0  12.7 | 13.7  12.7 | 13.7 12.7
storage:
 autotst   150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8  0.0
 stattst   150| 15.2   0.0 | 16.1   0.0 | 15.2   0.0 | 16.1   0.0 | 15.2  0.0
 regtest   150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8  0.0
 reg2test  150| 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8   0.0 | 12.8  0.0
strings   1000|  2.0   1.7 |  2.0   1.7 |  2.0   1.7 |  2.0   1.7 |  2.0  1.7
switch1   1000|  0.6   1.8 |  0.6   1.8 |  0.6   1.8 |  0.6   1.8 |  0.6  1.8
switch2   1000|  0.6   0.7 |  0.6   0.7 |  0.6   0.7 |  0.6   0.7 |  0.7  0.7
switch3   1000|  0.6   0.7 |  0.7   0.7 |  0.6   0.7 |  0.7   0.7 |  0.7  0.7
tdouble    500| 21.0  10.3 | 21.0  10.3 | 21.0  10.3 | 21.0  10.3 | 21.0 10.3
tfloat     500| 22.6  10.1 | 22.6  10.1 | 22.5  10.1 | 22.6  10.1 | 22.6 10.1
tint      1500|  5.7   2.0 |  5.7   2.0 |  5.7   2.0 |  5.7   2.0 |  5.7  2.0
tlong     1000| 34.0   2.7 | 34.3   2.7 | 34.1   2.7 | 34.3   2.7 | 34.3  2.7
trig       100|  6.4   0.0 |  6.4   0.0 |  6.4   0.0 |  6.4   0.0 |  6.4  0.0
--------------+------------+------------+------------+------------+----------
all times are in seconds.


CODE SIZE

           Small     Compact      Medium      Large       Huge
          TC  MS-C   TC   MS-C   TC   MS-C   TC   MS-C   TC   MS-C
-------- ---- ----  ---- -----  ---- -----  ---- -----  ---- -----
minfio   7560 9319  9700 12049  7806  9523 10474 12253 11978 12301
minmain  2402 4399  2942  4567  2472  4469  3012  4637  3382  4637
minprtf  6214 9081  7762 11315  6356  9263  7904 11497  9025 11497
minputs  4572 7233  6072  9691  4706  7373  6206  9847  7296  9847

(Sizes are in bytes; the programs were compiled with all optimization flags on.)


In addition to these tests, I ran the Dhrystone benchmark (compiled
with the Small memory model):

	TC	2590  dhrystones/second
	MS-C	3401  dhrystones/second


The results clearly show that the Microsoft compiler produces better
code than Borland's. In several cases the MS-C code outperformed TC's
by a factor of ten or more, for example in the ARRAY test (24.9 vs. 2.4
seconds in the Small model) and the TLONG test (34.0 vs. 2.7 seconds).
MS-C's good optimization also causes a problem: since benchmarks are
artificial and try to measure the efficiency of one particular type of
operation, they are extremely prone to being optimized away, i.e.
reduced to no code at all. The LOOPTST, STORAGE and TRIG programs show
this best; MS-C's times for them are 0.0 seconds in every model. We
definitely need a new class of benchmarks for future tests.
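The 0.0 timings are what one would expect if the optimizer deleted work whose result is never used. One way to write benchmarks that survive optimization is to store results in a volatile object, since every access to a volatile object must actually be performed. This assumes a compiler that honours the draft-ANSI `volatile` qualifier; whether MS-C 5.1 and TC 1.5 do is not verified here, and the names `sink` and `kept_loop` are hypothetical.

```c
/* Hypothetical sink: each store to a volatile object is an
   observable side effect, so the compiler must perform every one. */
volatile long sink;

/* A loop the optimizer may not delete. Without volatile, nothing
   reads the stored values and the whole loop could be removed. */
void kept_loop(long n)
{
    long i;

    for (i = 0; i < n; i++)
        sink = i;
}
```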

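The main areas in which TC has the lead are switch statements and the
ATOX benchmark. The switch lead is marginal in SWITCH2 and SWITCH3, but
in SWITCH1 TC needs only 0.6 seconds against MS-C's 1.8. The results of
the MEMORY test are dramatic for TC: malloc and free get very slow in
the large-data models (about 200 seconds in Compact, Large and Huge
against about 31 in Small and Medium), while MS-C performs more or less
the same in all models.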


COMPILATION SPEED

The price one usually pays for better optimization is longer
compile times; to check this, I timed the compilation and linking of
the test suite for the Small memory model. For TC, the Turbo Linker
TLINK was used; since TLINK is fast but limited, I reran the TC test
with the standard linker that was also used for MS-C, MS LINK
v5.01.04. Before each run, I ran a disk-compaction (defragmentation)
utility to make sure that file fragmentation would not distort the
timings. The DDJ review used a different method to measure
compilation speed; since I don't have the files they used for that
test, this will have to do.

Compile and Link Times:

				Optimization
			Enabled		Disabled
			-------		--------
	TC with TLINK :	284.9		284.3
	TC with LINK  :	331.5		331.0
	MS-C with LINK:	681.7 		642.8


The compile time for the following program

	int alfa;

should give us some idea of the amount of overhead associated with
calling the compiler.

		Compile Load 
		------- ----
	TC:	 2.8	2.0
	MS-C:	 9.7	1.4

'Compile' is the total time required to compile this mini-program and
'Load' is the time needed to load the compiler.  (All times are given
in seconds.)


CONCLUSIONS

Based on the data presented here and my experiences with both products,
Microsoft C wins the battle; it generates by far the better code. Turbo
C's one-pass compiler has shorter compile times and creates smaller
executables, but the code it produces is inferior to MS-C's.

Furthermore, when it comes to writing a reference manual for a language,
the boys (and girls) at Borland could learn something from the
Unix community: start each reference entry on a separate page! In its
current form, the TC reference manual is a real pain to use. As they
use the same style in the Turbo Pascal 3.0 and 4.0 manuals, I guess
this is a Borland "feature" used to save paper and thus money on the
cost of the manual.

One of the things missing from both compilers (and from most PC
C compilers, for that matter) is profiling, i.e. the ability to get an
overview of where your program spends most of its time when executing.
As both can already do stack-overflow checking on function entry,
this should not be hard to add.

Naturally, this test has not been as extensive as the one performed by
the DDJ editors; their annual C issue will certainly contain an updated
overview of the C compiler battlefield.

[Well, DDJ ain't what it used to be; their last C compiler test was
rather bleak compared to the August '86 one. They left out the
extensive tables that made the '86 review stand out. See
comp.misc for the discussion on the death of DDJ....]


-- 
Eelco van Asperen.		
uucp:        evas@eurtrx / mcvax!eurtrx!evas	#include <inews/filler.h>
earn/bitnet: asperen@hroeur5			#include <stdjunk.h>
"We'ld like to know a little bit about you for our files" - Mrs.Robinson,	 Simon & Garfunkel