Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!unmvax!pprg.unm.edu!hc!lll-winken!uunet!microsoft!brianw
From: brianw@microsoft.UUCP (Brian Willoughby)
Newsgroups: comp.sys.apple
Subject: Re: multitasking, softswitch, etc.
Summary: HINT to Apple II Design Team...
Keywords: virtual memory, OS calls, RAM speed
Message-ID: <1189@microsoft.UUCP>
Date: 31 Mar 89 02:33:13 GMT
References: <8903181259.aa27052@SMOKE.BRL.MIL> <10183@bloom-beacon.MIT.EDU>
Organization: Microsoft Corp., Redmond WA
Lines: 98

In article <10183@bloom-beacon.MIT.EDU>, dcw@athena.mit.edu (David C. Whitney) writes:
> I think everyone here needs a bit of enlightenment. As much as I would LOVE a
> multitasking environment on my //GS, I *know* that it can't be done within
> any reasonable degree of efficiency.

Very true, I agree, but I have a few comments to enlighten you with...

>
> Even Macintoshes don't have "true" multitasking (ie, Multifinder is a good
> hack, but it isn't "for real"). Only Mac IIs with the PMMU installed (or any

I'd settle for a "good hack" for the IIGS.

> IIx - 68030) can even *hope* to do multitasking. The problem? A system for
> handling virtual memory is a MUST for multitasking. Now, a virtual memory
> handler *could* be written in assembler, but one running on a screeching 68030
> going at 45MHz still wouldn't go fast enough to make the machine come within
> tolerable speed limits. That's why there is the PMMU - that's Paged Memory
> Management Unit. It handles page faults in hardware, so things go reasonably
> fast (although most A/UX people will tell you it's still too slow).

Due to the intelligent design of the 65816, hardware controlled virtual memory
is actually possible. Although it would work differently from the asynchronous
memory interface of the 68000, a bus fault could still be handled by the 65816.
All it would take is an intelligent clock generator, and some cache memory to
store the virtual translation table (i.e. a PMMU816).
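A minimal sketch of the translation-table idea, assuming 256-byte pages and a
made-up class name (Pmmu816); nothing here corresponds to a real part or API.
A miss is modeled the way this post proposes: the clock is held while the page
is brought in, so the CPU only ever sees one long memory cycle. There is no
eviction policy; the sketch simply hands out free frames in order.

```python
PAGE_BITS = 8                        # assumed page size of 256 bytes (illustrative only)
OFFSET_MASK = (1 << PAGE_BITS) - 1


class Pmmu816:
    """Toy model of the hypothetical 'PMMU816' translation table."""

    def __init__(self, free_frames):
        self.table = {}                       # virtual page -> physical frame
        self.free_frames = list(free_frames)  # frames not yet assigned
        self.stalls = 0                       # times the CPU clock was frozen

    def load_page(self, vpage):
        # Stand-in for a helper processor fetching the page from disk.
        # Raises IndexError when no frame is free (no eviction modeled).
        frame = self.free_frames.pop(0)
        self.table[vpage] = frame

    def translate(self, vaddr):
        """Map a virtual address to a physical one, stalling on a miss."""
        vpage, offset = vaddr >> PAGE_BITS, vaddr & OFFSET_MASK
        if vpage not in self.table:
            self.stalls += 1                  # clock held until the page is resident
            self.load_page(vpage)
        # Hit (or completed miss): swap the upper address bits.
        return (self.table[vpage] << PAGE_BITS) | offset


mmu = Pmmu816(free_frames=[0x40, 0x41])
first = mmu.translate(0x123456)    # miss: page 0x1234 is loaded into frame 0x40
second = mmu.translate(0x1234FF)   # hit: same page, no further stall
```

The instruction that caused the miss is never restarted in this model; the
call to translate() just takes longer to return, which is the point of the
frozen-clock approach.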
With its single cycle memory access, you can adjust the period of the clock
cycle to match the speed of the memory that the 65816 is interfaced to. This is
how the GS accesses slow video RAM, and it is the same method used by the
TransWarp accelerator in my II+. Each access to memory could be checked against
the translation table to see where that page is stored in memory. If the page
is found, the upper address bits are swapped to form the physical address;
if the page is not found, the clock is frozen until the page has been
loaded. The 65816 would never realize what happened, so you would avoid the
instruction restart problems that plague the 68000 virtual memory method
(sometimes a simple CPU design has its advantages!). In other words, you
wouldn't have to abort and then restart the instruction later; you could just
pause the CPU and let the clock run again once the page is loaded.

It might actually mean that you need a second processor to load the page into
RAM if it is currently on disk, but considering that Apple already has a
separate 6502 to handle the Desktop Bus in the IIGS that is only about 2 cm
square, I think it would be easy (and cost-effective) to design this sort of
power into a new GS!

Of course, Apple probably won't do this unless they see a demand for such power
in a 65816 based machine. But the problem is that there WILL BE NO DEMAND unless
Apple comes out with a IIGS that shows people how powerful the new 65xxx series
is! Personally, I'd love to be on the Apple design team, because I know that
current technology can make the II scream. For example, the dual-port video RAM
in the SE/30 would remove the restriction that requires the GS clock to change
to 1 MHz for a single cycle when writing to screen memory. Why not have a
powerful IIGS/8MHz (in the spirit of the SE/30) that is aimed towards those who
can afford it, and lower the price of the original GS for the price-conscious
consumer?
(the current position of the Mac Plus).

These designs would open up other speed enhancements. You could have different
speeds of RAM in use at once. The virtual memory controller could place the
most frequently accessed pages in static RAM addresses, with the less expensive
dynamic RAM storing less frequently used data. This would help in an area that
EVERY high speed processor faces: how to get RAM speeds fast enough to match CPU
speed. (Someone on this net was complaining that he didn't want an 8 MHz
65816-based GS because it would require 60ns RAM. The problem is that EVERY fast
CPU [don't be fooled into comparing clock speed to RAM speed] needs RAM to be
that fast. That's why Compaq has a special bus on their Deskpro to allow static
32-bit RAM to be added; any normal PC RAM card would really slow down a 16 MHz
or greater 80x86 machine.)

>
> The next problem is designing a way to quickly make calls to the OS. The
> Mac (more accurately, the 680x0 series), along with 80x86 machines use
> something called "traps." They are basically undefined processor instructions
> that can be defined by the host computer. The Mac implements its toolbox
> calling by using traps. As far as I know, the 65816 doesn't have a trap
> mechanism (although, I suppose the COP instruction might get useful here...).

If you expand the code that gets called by the Mac OS traps, you will see
several lines of assembly code which look up the address in a BIG table and then
call the proper routine based on the lower bits of the trap opcode. Only ONE
trap is used, so they have to use software to access more than one OS call.
Except for the method used to call this routine on the 68000 ($Axxx opcode
exception trap), the 65816 could achieve the same result at the same speed. The
only advantage of the method used on the Mac is that each OS call only takes 2
bytes (one opcode), while the 65816 would need 3 or 4 bytes (JSR or JSL), so
what is the big problem?
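The table lookup described above can be sketched roughly as follows. This is
an illustration only: the 12-bit index, the table contents and the routine
names are assumptions made for the example, not the real Mac trap layout
(which reserves some of those bits for flags).

```python
def make_dispatcher(table):
    """Build a dispatcher that routes $Axxx opcodes through one lookup table."""

    def dispatch(opcode, *args):
        if opcode & 0xF000 != 0xA000:
            raise ValueError("not an $Axxx trap opcode: %#06x" % opcode)
        index = opcode & 0x0FFF      # assumed: all 12 low bits select the routine
        routine = table[index]       # one table lookup, done in software
        return routine(*args)

    return dispatch


# Hypothetical routines standing in for OS and toolbox calls.
def add_words(a, b):
    return (a + b) & 0xFFFF


def upper_string(s):
    return s.upper()


dispatch = make_dispatcher({0x001: add_words, 0x002: upper_string})
result = dispatch(0xA001, 2, 3)      # routes to add_words
```

Only the entry point differs between the two CPUs in this picture: an
exception on the 68000 versus a JSR or JSL to the dispatcher on the 65816.
The lookup itself is ordinary code either way.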
Most of the 65816 instructions are shorter than 68000 opcodes, so the greater
size (only 1 or 2 extra bytes per call) shouldn't affect the overall code size.

Another method would be to use the BRK opcode ($00), which is a software
initiated interrupt and is currently not used for anything important on the II
series. Each BRK could be followed by a code for the OS call needed. If you
smartly used one byte for frequent calls, you could have 255 2-byte calls, with
the 256th code reserved as an extension flag.

>
> Dave Whitney    A junior in Computer Science at MIT

Brian Willoughby
microsoft!brianw@uunet.UU.NET
  or uw-beaver!microsoft!brianw
  or just microsoft!brianw
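A small sketch of the BRK-plus-code encoding proposed in this post. The post
only says that the 256th code is reserved as an extension flag; using $FF for
that flag and following it with a single extra byte are assumptions made here
so that the example is complete.

```python
BRK = 0x00          # 65816 BRK opcode
EXTENSION = 0xFF    # assumed value for "the 256th code"


def encode_call(number):
    """Encode an OS call number as BRK followed by one or two code bytes."""
    if 0 <= number < EXTENSION:
        return bytes([BRK, number])              # 255 frequent calls, 2 bytes each
    ext = number - EXTENSION
    if 0 <= ext <= 0xFF:
        return bytes([BRK, EXTENSION, ext])      # assumed one-byte extension
    raise ValueError("call number out of range: %d" % number)


def decode_call(code, pc):
    """Return (call number, address of the next instruction)."""
    if code[pc] != BRK:
        raise ValueError("no BRK at offset %d" % pc)
    selector = code[pc + 1]
    if selector != EXTENSION:
        return selector, pc + 2
    return EXTENSION + code[pc + 2], pc + 3


short_form = encode_call(5)       # two bytes: BRK, $05
long_form = encode_call(300)      # three bytes: BRK, $FF, $2D
```

A BRK handler would perform the decode_call step on the bytes following the
opcode and then index a routine table, much as a trap dispatcher does.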