Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rochester!ritcv!cci632!rb
From: rb@cci632.UUCP (Rex Ballard)
Newsgroups: net.arch
Subject: Re: paging and loading
Message-ID: <389@cci632.UUCP>
Date: Mon, 22-Sep-86 15:08:33 EDT
Article-I.D.: cci632.389
Posted: Mon Sep 22 15:08:33 1986
Date-Received: Mon, 22-Sep-86 21:33:29 EDT
References: <832@hou2b.UUCP>
Reply-To: rb@ccird1.UUCP (Rex Ballard)
Organization: CCI, Rochester Development, Rochester, NY
Lines: 126
Summary: Situations for each.

In article <832@hou2b.UUCP> dwc@hou2b.UUCP (D.CHEN) writes:
>>>3) The real reason for virtual memory (and the one which won't
>>>away when memories get big) is that I can quickly load [a]
>>>working set of a large program ...
>
>>... but it seems to me that you'll use up [the faster startup] and
>>a lot more in page faults.  Since you are reading the program
>>piecemeal into virtual memory you are going to be a lot slower
>>because of the extra seek and rotational delays. It is sort of like
>>a guy driving on a surface street, instead of going over and getting
>>on the freeway. You get started faster, but you hit all those
>>traffic lights. The only case that is valid is if the overwhelming
>>majority of the pages in the program are never referenced.

In computing, there is an appropriate analogy to the "freeway"
described.  This would be the single process, single processor,
single task, single level, single loop, in which data is either
accessed in a nearly "random" or "large loop" fashion.  For such
applications, such as scientific matrix manipulation or calculations,
on single processor systems like the Cray 1, VM is definitely a lose.

The next question is, is this model still appropriate?  Scientific
and number crunching applications are finding a better home in
multi-processor environments, which effectively become multi-tasking
systems even when only a single "program" is being run.

>another analogy (and one that i've been using) is this:  imagine
>that you have to xerox ten sets of notes and each set consists of
>ten pages.  if there is a large setup time on the machine, you would
>like to copy the 100 pages in one shot.  even without a large setup
>time, if there are other people on line, you would probably want to
>copy all of your work in one shot instead of getting on the end of
>the line every 10 pages.

Ok, let's continue with this.  Do you really want to make the person
who needs two copies wait until your "batch job" is complete?  It
may be more desirable to have a second machine for the "smaller jobs",
or to split the 100 page copy among more machines, by making one
copy and putting that on another machine.  If you make the "little
jobs" wait, then they are more likely to collect many "little jobs"
to get one "big job" that is "worth the wait".

I even lean toward the "grocery store" analogy.  Many such stores
have "express lanes" so that the person with 2 items doesn't have
to wait for five people who are buying for the month.  The "bulk
buyers" line may also be staffed with a bagger, a scanner, and
a nice conveyer belt, while the "express lane" might only be a
counter and a manual cash register.  Many stores adhere to a "3
person" rule, where clerks are added as the number of people
in a line exceeds 3, be they express or bulk.  Without this
technique, people would only come to the grocery store when
they were buying large quantities, and go to other stores for
small purchases.

>the answer is "it depends".  if your executable is on a unix file
>system, you probably would have to do multiple i/os to load the
>entire address space anyway.  however, if it is contiguous on some
>swap device, then it depends on program behavior.

Some Unix variations, which are specifically designed to support
VM, keep executable binaries in contiguous form.  The read-only
portion need not be mapped or swapped.

>one important aspect that people rarely consider when talking about
>response time and loading is what happens if, in the process of demand
>paging (and loading of "working sets"), memory runs out?  what are
>the implications on response time then?  it would seem that without
>any other aids, pure demand paging is a clear loser in this situation.

Depending on the mechanism used, this might require up to 500 processes
to be running simultaneously, assuming only a modest 1 meg memory
of 1K pages.  There may be an argument for "request without fault"
type accesses, where the OS could be advised by the application that
a new page will be needed in the near future.
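
One way to arrive at a figure of that order is a back-of-the-envelope
count of page frames.  The sketch below is only that: it assumes each
process must hold at least two resident pages to make progress (the
MIN_RESIDENT_PAGES value is my assumption, not a measurement) and it
ignores whatever memory the OS itself occupies.

```python
# Back-of-the-envelope sketch; MIN_RESIDENT_PAGES is an assumed figure.
MEMORY_BYTES = 1024 * 1024   # the "modest 1 meg" of memory
PAGE_BYTES = 1024            # 1K pages
MIN_RESIDENT_PAGES = 2       # assumption: fewest pages a process needs to run


def page_frames(memory_bytes, page_bytes):
    """Number of page frames that physical memory divides into."""
    return memory_bytes // page_bytes


def max_runnable(memory_bytes, page_bytes, min_pages):
    """Processes that fit before every frame has been claimed."""
    return page_frames(memory_bytes, page_bytes) // min_pages


frames = page_frames(MEMORY_BYTES, PAGE_BYTES)
procs = max_runnable(MEMORY_BYTES, PAGE_BYTES, MIN_RESIDENT_PAGES)
print(frames, "frames,", procs, "processes before memory runs out")
```

Under these assumptions memory holds 1024 frames and is exhausted at
roughly 500 processes; a larger per-process minimum lowers that count
proportionally.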

>danny chen
>ihnp4!hou2b!dwc

There are several types of processing.  Batch, interactive, pipelined,
I/O intensive, and transaction.  Some unpleasant experiences with
a non-virtual memory transaction processing system have given good
cause to prefer VM for this type of application.  Similar experiences
with interactive applications also lead to the same conclusion.
Even a batch processing compiler that attempted to do "hand optimised"
overlays in a VM environment (and thrashed itself to death) tends to
make a case for VM.

Most applications spend 90% of their time executing only 10% of their
code.  Interactive and transaction processing applications spend
nearly 95% of their time in a "wait for I/O, parse, loop" with an
occasional "do special purpose 'case' processing".  There are more
than a few stories of such applications where swapping that involved
a loop smaller than 2K caused serious degradation due to the 64K to
1Mb "tails" that would get swapped in with them.
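
To put rough numbers on those "tails", here is a small sketch comparing
the bytes moved to bring a 2K loop back into memory under whole-process
swapping versus demand paging.  The 1K page size and the assumption that
the loop is page-aligned are mine; an unaligned loop could touch one
extra page.

```python
# Illustrative comparison only; page size and alignment are assumptions.
PAGE_BYTES = 1024          # assumed 1K pages
LOOP_BYTES = 2 * 1024      # the "loop smaller than 2K", rounded up to 2K


def pages_needed(nbytes, page_bytes=PAGE_BYTES):
    """Pages covering nbytes, assuming the region starts on a page boundary."""
    return -(-nbytes // page_bytes)  # ceiling division


def swap_in_bytes(image_bytes):
    """Whole-process swapping moves the entire image, tail and all."""
    return image_bytes


def page_in_bytes(hot_bytes, page_bytes=PAGE_BYTES):
    """Demand paging moves only the pages the loop actually touches."""
    return pages_needed(hot_bytes, page_bytes) * page_bytes


for image in (64 * 1024, 1024 * 1024):
    ratio = swap_in_bytes(image) // page_in_bytes(LOOP_BYTES)
    print(image // 1024, "K image: swapping moves", ratio, "times as much")
```

With these figures, swapping moves 32 times as much data for a 64K
image and 512 times as much for a 1Mb image.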

Perhaps when it is possible to get 4 Gigabytes of 50ns ram for
$200-$300 that will require only a few watts, VM will become
unnecessary.  Even then however, the lack of need for "heap compaction"
and the ability to "remap" instead of "copy" data from one place
to another will continue to be a win for most applications.
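
The "remap" versus "copy" point can be shown with a toy page table.
This is a hypothetical model written for illustration (ToyAddressSpace
and its methods are invented names, not any real MMU or OS interface):
moving a page by copy touches every byte, while moving it by remap
changes a single table entry.

```python
# Toy model only; ToyAddressSpace is a hypothetical illustration.
PAGE_BYTES = 1024  # assumed page size


class ToyAddressSpace:
    def __init__(self, n_frames):
        self.frames = [bytearray(PAGE_BYTES) for _ in range(n_frames)]
        self.page_table = {}   # virtual page number -> frame index
        self.bytes_copied = 0  # running count of bytes physically moved

    def map(self, vpage, frame):
        self.page_table[vpage] = frame

    def read(self, vpage):
        return bytes(self.frames[self.page_table[vpage]])

    def move_by_copy(self, src_vpage, dst_vpage):
        """Copy the page contents into the frame backing dst_vpage."""
        src = self.frames[self.page_table[src_vpage]]
        dst = self.frames[self.page_table[dst_vpage]]
        dst[:] = src
        self.bytes_copied += PAGE_BYTES

    def move_by_remap(self, src_vpage, dst_vpage):
        """Point dst_vpage at the existing frame; no data is moved."""
        self.page_table[dst_vpage] = self.page_table.pop(src_vpage)


space = ToyAddressSpace(n_frames=2)
space.map(0, 0)
space.map(1, 1)
space.frames[0][:5] = b"hello"

space.move_by_copy(0, 1)    # data now also visible at virtual page 1
space.move_by_remap(0, 7)   # virtual page 7 takes over frame 0
print(space.read(7)[:5], space.bytes_copied)
```

Only the copy adds to bytes_copied; the remap costs one table update
regardless of page size.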

Even this statement should have :-)'s all over it.  I remember thinking
that 2K was a lot of memory on my VIP; the transition from PDP-8
to PDP-11 thrilled many because of the thought of 64K addressing
space.  Microsoft's Bill Gates couldn't imagine why anyone would
need more than 640K with MS-DOS, and even Motorola was surprised
to discover that some systems couldn't fit in the 16 megabyte
virtual address space of the 68010.

There is also the issue of software costs in relation to the
system costs.  When adding an enhancement requires 10-50 times
the actual enhancement costs to "make room" for the enhancement,
the long-term systems costs can get out of hand very quickly.

With complete, detailed structure charts, data-flow diagrams
and data-hierarchy, it is possible to "manually" support
anything from simple "physical=logical" addressing, bank switching,
overlays, segment registers, and/or swapping to "virtual memory
with virtual libraries".  For "automatic" support via linkers
and compilers, however, the benefits of "virtual memory" are
difficult to match.

Finally, one of the main trade-offs of VM is the TLB lookup time.
When the processor is spending 10% of its time "looking for
something to do", CPU speed becomes much less important than
Memory Management, Disk Caching, and Co-processor interlock.