Path: utzoo!mnetor!uunet!husc6!necntc!ima!johnl
From: johnl@ima.UUCP
Newsgroups: comp.compilers
Subject: Re: Assemblers can be fast
Message-ID: <813@ima.ISC.COM>
Date: 20 Dec 87 00:28:51 GMT
Sender: johnl@ima.ISC.COM
Reply-To: Green Eric Lee <ihnp4!usl!usl-pc!jpdres10>
Lines: 49
Approved: compilers@ima.UUCP

> [It's true, assemblers used to run in real memory just like any other
> program.  I've used very credible assemblers that ran in 4K.  But if you
> have infinite memory, sure, what the heck, buffer it all in memory.  The
> scheme in your pass two is the same one that existing Unix assemblers use
> now to do branch optimization.  It's not quite optimal, but the extra work
> to ensure optimality isn't worth it.  But be prepared for users to punch
> you in the nose when they add two lines to their assembler program and
> the assembler says "you lose, can't buffer it in 640K."  -John]

Real computers aren't limited to 640K :-). And if anybody wrote a 640K
assembly language program all in one file, and really expected my
assembler to assemble the darn thing, I'd have to say that they got
what they deserved :-). There's always a limit, somewhere -- if it
ain't 640K, it's the amount of space available for your symbol table,
or what have you. You're never going to be able to assemble
infinite-size source files.

But, seriously... one-pass assembling (one pass over the source, that
is) isn't extremely difficult. On the first pass, you emit object code
with "fillers" for expressions you couldn't evaluate, and in memory you
keep a symbol table and a list of the expressions that couldn't be
evaluated, along with their locations (NOT a copy of the entire source
program). Then, for the second pass, you go back and fix up the object
code with the right addresses.  It's easy enough to do for something
like a 6502, where there's no long-branch vs. short-branch to worry
about (the only thing to worry about is zero-page vs. absolute, and
most assemblers wimp out by saying that if the symbol isn't defined the
first time the expression is hit, then it's absolute).
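
To make the filler-and-fix-up scheme concrete, here is a minimal sketch
in Python. It is a toy, not anybody's real assembler: the input format
(a list of label/mnemonic/operand tuples), the EQU pseudo-op, and all
the names are assumptions made up for illustration. Only the opcode
bytes are the real 6502 ones. It also shows the "wimp out" rule: an
operand that isn't defined yet when it is hit is assumed absolute.

```python
# Real 6502 opcode bytes; everything else here is a hypothetical toy.
ABSOLUTE = {"JMP": 0x4C, "LDA": 0xAD, "STA": 0x8D}
ZERO_PAGE = {"LDA": 0xA5, "STA": 0x85}
IMPLIED = {"NOP": 0xEA, "RTS": 0x60}


def assemble(program, origin=0x0600):
    """Assemble a list of (label, mnemonic, operand) tuples in one pass.

    Any field may be None. ("name", "EQU", 0x10) defines a constant.
    """
    code = bytearray()
    symbols = {}   # symbol name -> value
    fixups = []    # (offset into code, symbol name) for each filler

    for label, mnemonic, operand in program:
        if mnemonic == "EQU":
            symbols[label] = operand
            continue
        if label is not None:
            symbols[label] = origin + len(code)
        if mnemonic is None:
            continue
        if mnemonic in IMPLIED:
            code.append(IMPLIED[mnemonic])
            continue

        value = symbols.get(operand)
        if value is not None and value < 0x100 and mnemonic in ZERO_PAGE:
            # Already known to be in zero page: use the short form.
            code += bytes([ZERO_PAGE[mnemonic], value])
        else:
            code.append(ABSOLUTE[mnemonic])
            if value is None:
                # Not defined yet: assume absolute, remember where the
                # filler is, and emit zeros for now.
                fixups.append((len(code), operand))
                value = 0
            code += value.to_bytes(2, "little")

    # "Second pass": walk the fix-up list, not the source.
    for offset, name in fixups:
        if name not in symbols:
            raise ValueError(f"undefined symbol: {name}")
        code[offset:offset + 2] = symbols[name].to_bytes(2, "little")
    return bytes(code)


program = [
    ("ptr", "EQU", 0x10),
    (None, "LDA", "ptr"),    # known zero-page symbol: short form
    (None, "JMP", "done"),   # forward reference: filler, patched later
    (None, "NOP", None),
    ("done", "RTS", None),
]
print(assemble(program).hex())
```

The memory cost is the symbol table plus one small entry per
unresolved reference, which is the point of the scheme. A real
assembler would keep whole expressions in the fix-up list rather than
bare symbol names.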
    For more complex architectures, more work is required, but the
principle is the same.  Note that you do NOT have to buffer the entire
thing in RAM (just some, the object itself can be buffered on disk),
and yes, there is a limit, but there's always a limit, somewhere -- I
regularly use a 2-pass assembler-editor package that takes up 12K on an
8-bit machine, and what I'm always running out of is symbol table
entries (darned symbol table is limited to 8K, total).
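
As a rough illustration of keeping the object on disk rather than in
RAM, the fix-up step can seek into the object file and overwrite each
filler in place. This is only a sketch under assumptions: the function
name, the flat file of raw bytes, and the 16-bit little-endian
addresses are all hypothetical choices, not any particular object
format.

```python
import os
import tempfile


def patch_object(path, fixups, symbols):
    """Overwrite 16-bit little-endian fillers in an object file in place.

    fixups is a list of (file offset, symbol name); symbols maps names
    to their final values. Only these two tables live in memory.
    """
    with open(path, "r+b") as obj:
        for offset, name in fixups:
            obj.seek(offset)
            obj.write(symbols[name].to_bytes(2, "little"))


def demo():
    # First pass wrote: JMP <filler>, NOP, RTS (real 6502 opcode bytes).
    fd, path = tempfile.mkstemp()
    os.write(fd, bytes([0x4C, 0x00, 0x00, 0xEA, 0x60]))
    os.close(fd)
    # The symbol turned out to live at 0x0604; patch the filler at offset 1.
    patch_object(path, [(1, "done")], {"done": 0x0604})
    with open(path, "rb") as obj:
        patched = obj.read()
    os.remove(path)
    return patched


print(demo().hex())
```

On a slow disk the seeks are not free, so sorting the fix-up list by
offset before patching would be a reasonable refinement.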

Still, on the great assembly-language vs. direct-to-object debate, I
must conclude that emitting assembly language slows things down
a lot. This is especially true on microcomputers, with their slow disk
drives, which is why most microcomputer compilers directly emit object
files.
--
Eric Green  elg@usl.CSNET       P.O. Box 92191, Lafayette, LA 70509
{ihnp4,cbosgd}!killer!elg,      {ut-sally,killer}!usl!elg
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request