Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!uwm.edu!linac!att!pacbell.com!tandem!zorch!amiga0!mykes
From: mykes@amiga0.SF-Bay.ORG (Mike Schwartz)
Newsgroups: comp.sys.amiga.programmer
Subject: Re: Lemmings - a tutorial Part V (last)
Message-ID:
Date: 1 Apr 91 00:24:54 GMT
References: <23788@well.sf.ca.us> <23837@well.sf.ca.us> <781@tnc.UUCP> <20213@cbmvax.commodore.com>
Organization: Amiga makes it possible
Lines: 278

In article <20213@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:
>In article mykes@amiga0.SF-Bay.ORG (Mike Schwartz) writes:
>>the machine will boot from the floppy. In any case, the ROM Kernel
>>loads what is called the boot program from track 0 of the floppy
>>disk into RAM and does a JSR to it. The standard Amiga OS bootsector
>>program simply opens dos.library and then does an RTS and the system
>>continues to boot up into the normal Operating System.
>
> You should learn more about the OS... It actually never returns
>if it opens dos.library. Dos starts the initial process, and then kills
>the initial task (itself). The initial process continues the boot process.
>(I'm the person who rewrote the Dos in C and asm.)
>

See pages VII-4 and VII-5 of AmigaMail, which show the source code of the
"official" bootsector program. If the program OPENs dos.library, it
returns a ZERO in D0; if it fails, it returns -1. Not that I am arguing,
I am just telling you what your own documentation says.

>>a BIOS of sorts. In addition to this 8K of KERNEL code, there is another
>>12K of floppy disk drivers, because I will not have the operating system
>>running to read any further data from the floppies.
>
> Note that even 2.0 trackdisk is only about 7K long.
>

I provide additional features that trackdisk doesn't, like the ability to
search for a disk in any drive. That function is NOT performed by
trackdisk, but by DOS.
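Searching for a disk by name across every drive is a small loop once the low-level driver can report what is in each drive. The sketch below is a minimal illustration in C, not code from the post: `find_disk`, the `probe_fn` callback and the fake drive table are all hypothetical. A real loader would get the volume name by reading the AmigaDOS root block (block 880 on an 880K floppy); here a stub stands in for that.

```c
#include <assert.h>
#include <string.h>

#define MAX_DRIVES 4 /* DF0: through DF3: */

/* Hypothetical probe: on success copies the volume name of the disk in
 * `drive` into `name` and returns 1; returns 0 for an absent or empty drive. */
typedef int (*probe_fn)(int drive, char *name, size_t len);

/* Return the number of the first drive holding a disk named `wanted`,
 * or -1 if no drive does. */
static int find_disk(const char *wanted, probe_fn probe)
{
    char name[32];
    for (int drive = 0; drive < MAX_DRIVES; drive++) {
        if (probe(drive, name, sizeof name) && strcmp(name, wanted) == 0)
            return drive;
    }
    return -1;
}

/* Stand-in drive table for testing (hypothetical): DF2: is empty. */
static const char *const fake_disks[MAX_DRIVES] = { "Disk1", "Disk2", NULL, "Disk3" };

/* Stub probe that "reads" the volume name from the fake table. */
static int fake_probe(int drive, char *name, size_t len)
{
    if (fake_disks[drive] == NULL)
        return 0;
    strncpy(name, fake_disks[drive], len - 1);
    name[len - 1] = '\0';
    return 1;
}
```

With the fake table, `find_disk("Disk3", fake_probe)` returns 3 and an unknown name returns -1; a game would typically loop on this, prompting the player until the wanted disk turns up in some drive.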
>>Since I am in supervisor mode, there are NO illegal instructions that
>>can be executed (privilege violations cause a GURU under the OS). The
>>User Stack Pointer (USP) also can be used as a quick place to save
>>an address register (this is 2x faster than a push on the stack).
>>The upper byte of the status register is available to disable various
>>levels of interrupts (the INTENA on Paula is just as useful), and
>>the TRACE bit is available for debugging purposes.
>
> Hopefully, if you programmed your game right, you shouldn't have
>to worry about executing an illegal instruction by mistake (except of
>course ILLEGAL). Disabling interrupts in the processor is faster than
>disabling them in Paula, since you don't have to get on the chip bus to
>do it.
>

Agreed. If you access the IM (interrupt mask) bits to disable interrupts
and you are not in supervisor mode, you guru. I do use ILLEGAL for
breakpoints. I also use the TRACE bit for single stepping and other
features. A couple of neat tricks that can be done with the TRACE bit:

1) Instead of a standard single-step trace handler, you can install a
   trace handler that fetches the PC from the stack and stores it in a
   circular buffer. When your program has a "banana land" bug (i.e. you
   push some number on the stack and do an RTS without popping it by
   accident), you can examine the circular buffer to see what sequence
   of instructions got you there.

2) You can implement a trace handler that checks to see if a memory
   location is being clobbered. Useful for detecting wild stores.

>>The floppy disk drivers I wrote use a single 10K buffer to handle as
>>many disk drives (up to 4) as may be configured. The OS routines
>>will steal 40K if you have 4 drives (10K per drive).
>
> Amusing.
>First, you need more than 10K for an MFM buffer, since
>the number of bytes (decoded) per track can be as high as 6812, so the MFM
>buffer must be at LEAST 13624, and you actually want it a bit larger in
>some cases (we use 15296 - we keep a gap's worth of NULLs (aaaaaaaa) before
>the spot where we read, to make writing faster/easier). Sure you don't mean
>16K? (Which is what the OS in 1.3 used, though it was a bit more than was
>needed.)
>
>>enhanced performance a few ways. One way is that the blitter is in
>>nasty mode, so encoding/decoding the MFM data is as fast as possible.
>
>
> Decoding is faster with the processor, if you also are going to
>check the checksum. Nasty mode will hurt your interrupt response time.
>Sounds like the classic "optimize the routine to within an inch of its
>life, and miss the fact that a different algorithm would be twice as fast".
>BTW, when there are more than 2 bitplanes in hires (more than 4 in
>320x200/400), running code from the ROMs is faster than from RAM, since
>you don't have to pay the penalty for getting cycles to/from the chip bus
>(in your way of programming, all your code ends up in chip RAM - annoying
>for something that could use the extra horsepower, like 3D games).

I actually use 12K or 14K. I decode the MFM a sector at a time when
needed. Are you sure that using the blitter in NASTY mode isn't faster
than the CPU? I've been asking for ROM routines to do what 90% of games
do (without the OS). And one last point (I do know the OS :) the OS (1.3)
uses QBlit to encode/decode MFM. Not the fastest way, by any means.

>
>>When data is written to diskette, it is arranged so that NO extra disk
>>revolutions will be made during readback. This is done by timing
>>the read and write routines so that by the time a track has been read
>>in and the head stepped, the next start of track is under the head
>>and ready to go.
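Randell's point that the CPU wins once you also verify the checksum is easier to see with the data layout in front of you. The sketch below is written in C for a host machine and assumes the standard AmigaDOS sector format: a block's longwords are stored as all their odd bits followed by all their even bits, each raw MFM longword carries data in the 0x55555555 positions and clock bits in the 0xAAAAAAAA positions, and the data checksum is the XOR of the raw longwords masked with 0x55555555. Decoding one longword is then a shift, two ANDs and an OR, and the checksum falls out of the same loop. The function names are illustrative, not trackdisk's or Commodore's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_MASK  0x55555555u /* data bits sit in the even bit positions */
#define CLOCK_MASK 0xAAAAAAAAu /* clock bits sit in the odd bit positions */

/* Figures from the thread: MFM doubles every bit, so 6812 decoded bytes
 * need at least 13624 raw bytes of buffer. */
enum { TRACK_DECODED_MAX = 6812, TRACK_MFM_MIN = 2 * TRACK_DECODED_MAX };

/* Add MFM clock bits to a word holding data only in DATA_MASK positions.
 * A clock bit is 1 only when the data bits on both sides of it are 0;
 * prev_last is the final bit of the previously written word. */
static uint32_t add_clock(uint32_t data, uint32_t prev_last)
{
    uint32_t neighbours = (data << 1) | (data >> 1) | (prev_last << 31);
    return data | (~neighbours & CLOCK_MASK);
}

/* Encode n longwords into 2*n raw MFM longwords: the odd bits of every
 * word first, then the even bits. */
static void mfm_encode(const uint32_t *src, size_t n, uint32_t *dst)
{
    uint32_t prev = 0; /* assumption: the bit before this block was a 0 */
    for (size_t i = 0; i < n; i++) {
        dst[i] = add_clock((src[i] >> 1) & DATA_MASK, prev);
        prev = dst[i] & 1u;
    }
    for (size_t i = 0; i < n; i++) {
        dst[n + i] = add_clock(src[i] & DATA_MASK, prev);
        prev = dst[n + i] & 1u;
    }
}

/* Decode 2*n raw longwords back into n data longwords with the CPU and
 * return the checksum (XOR of the raw longwords, masked to data bits),
 * computed in the same pass. */
static uint32_t mfm_decode(const uint32_t *src, size_t n, uint32_t *dst)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t odd = src[i], even = src[n + i];
        sum ^= odd ^ even;
        dst[i] = ((odd & DATA_MASK) << 1) | (even & DATA_MASK);
    }
    return sum & DATA_MASK;
}
```

The blitter can do the mask-and-shift part, but the checksum still needs a pass over the raw data, which is why folding both into one CPU loop is attractive.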
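The "no extra revolution" arrangement amounts to a track skew: each track's start is placed far enough around the disk that, by the time the previous track has been read and the head has stepped and settled, the new track's start is just arriving. A back-of-envelope sketch in C follows. It assumes a 300 RPM drive (200 ms per revolution), 2-microsecond MFM bit cells for double-density media, and step and settle delays of 3 ms and 15 ms chosen purely as example values, not figures from the post. Delays are expressed in CIA timer ticks, which count the E clock (about 709379 Hz on PAL machines, 715909 Hz on NTSC) whatever CPU is fitted, which is why a CIA timer gives the same timing on a 68000 and an 030.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed drive model, for illustration only. */
#define REV_US          200000u  /* 300 RPM -> 200 ms per revolution */
#define BIT_CELL_US     2u       /* double-density MFM bit cell */
#define RAW_TRACK_BYTES (REV_US / BIT_CELL_US / 8u) /* 12500 raw bytes */

/* CIA timers count E-clock ticks, independent of the CPU. */
#define ECLOCK_PAL_HZ   709379u
#define ECLOCK_NTSC_HZ  715909u

/* Convert microseconds to CIA timer ticks, rounding up so the delay is
 * never shorter than requested. */
static uint32_t us_to_cia_ticks(uint32_t us, uint32_t eclock_hz)
{
    return (uint32_t)(((uint64_t)us * eclock_hz + 999999u) / 1000000u);
}

/* Raw bytes that pass under the head while it steps one track and settles,
 * rounded up. Offsetting the start of track N+1 by this much relative to
 * track N lets the new track's start arrive just as the head is ready,
 * instead of one revolution later. */
static uint32_t skew_bytes(uint32_t step_us, uint32_t settle_us)
{
    uint64_t busy_us = (uint64_t)step_us + settle_us;
    return (uint32_t)((busy_us * RAW_TRACK_BYTES + REV_US - 1u) / REV_US);
}
```

With these example numbers, `skew_bytes(3000, 15000)` gives 1125 bytes of skew, and a 3 ms step is 2129 ticks on a PAL machine. A CIA timer is only 16 bits wide, so at the PAL rate one countdown covers at most about 92 ms; longer waits have to be chained.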
>
> Unless you're pulling partial tracks off and using them before
>the revolution is complete, or unless you're going to write it out again,
>this makes no difference - and for writing all it saves you is a block-move
>to eliminate the gap.
>

It DOES save a block move, which is not a particularly fast thing to do.
I hope that 2.0 pulls partial tracks off and uses them before the
revolution is complete!

>> The routines also make use of the DSKSYNC capability
>>of the drives, which the OS routines don't (under 2.0 they probably
>>do).
>
> Yup. Not as big a win as you'd think (I thought it would be a big
>win, but floppy rotation time swamps almost anything).

Again, I time things out so that immediately after stepping and letting
the head settle, the start of the next track is right there. Can't get
any faster.

>
>> The routines use a CIA timer to get perfect timing, no matter
>>what processor. Try popping the disk out with the disk light on (it
>>works). Try ejecting the disk in the middle of a load and putting it
>>into a different drive (it works). Try that with the OS and watch
>>your disk go bad in 1 second, thanks to the disk validator.
>
> I guarantee that if you pop a disk while it's writing, it WILL go bad.
>Even with your code. If it's reading, it may forcefully ask for it back, but
>it won't go bad.
>

True. Once the write head is turned on, it will write all over whatever
tracks it is over as the disk is ejected.

>>And what if you
>>allow multitasking and some CHIP RAM pig program (like DPaint) is already
>>running? GURU. And you need to test with 4 floppy drives under the OS,
>>just to make sure the OS hasn't taken more memory than you can allow.
>
> You should learn to check allocation returns. It's not hard.
>There's even a tool for selectively denying memory allocations to
>stress-test your program that we distribute (written by Bill Hawes to help
>test 2.0).
>

There ARE cases where the allocation routines don't return (at least
under 1.3).
In too many cases, the OS throws up the ol' AG_NoMemory guru alert
instead of waiting for some other process to free some memory and
retrying.

>>Once you start
>>using the floppy disk hardware directly, for example, you must put
>>the CIAs back into a state that the OS wants them in. What state is that?
>>The ROM Kernel manuals LIE. Have fun finding out what page they lie on,
>>because there is NO index.
>
> Sure, the 1.1 RKMs had the CIA allocations backwards. The 1.3 RKMs
>(which have been out a long time now) had the correct information (and
>indexes). We told people this. This caused our worst 2.0 compatibility
>problems, though we solved almost all of them (by dint of truly tricky
>programming...)
>
>>>On the other hand, games like Sim-City and Lemmings have no real use
>>>for that kind of environment. I agree that the game comes first,
>>>*BUT* if you don't need total control, *DON'T* take it.
>
>>I agree with this 100%. If you don't need to take over the machine,
>>don't. If you want to push the machine to its limits, there is NO
>>other way.
>
> True.
>
>> Let the game come first. If you know you can do the
>>game in a small amount of RAM and that performance is not an issue,
>>go ahead and use the OS. In my approach, if I have RAM left over,
>>I use it for more sounds or instrument samples to make the music
>>better, or to cache more data from the floppy drives.
>
> You don't let the _game_ come first, you put your _implementation_
>first. There is a difference. As for your approach, you may find your
>game didn't need all these tricks, and had RAM left over when you're
>done. But since you programmed yourself into a corner you can't go back
>and cooperate with the system, so you just look for ways to use up the
>RAM in a more-or-less-useful manner.
>

Using the OS is just as easy a way to paint yourself into a corner. When
I did Budokan, I could easily have made it run under the OS in a few days.
I specifically chose not to, and wouldn't want to do it any other way.
It had nothing to do with laziness. I deliberately chose NOT to use the
OS and would do it exactly the same way if I had to do it over.

>>Unfortunately, the sales life of a game is about 3 months. Royalties
>>don't keep trickling in.
>
> Good _games_, ones that have a depth beyond flashy graphics, and
>have replayability, do continue to sell (though they do best when first
>released, like most authored products). Sure, they do eventually trend
>towards 0, but by no means do they walk off a cliff for a good game
>(or even a well-done flashy game).
>

Wrong. If a dealer has the luck to sell out every copy of a game that he
has after 3 months, he won't order more. He'd rather use the shelf space
for a new game.

>>The OS does have bugs, however. I spent weeks finding them for other
>>people at EA. Ever hear about the trackdisk bug? It seems that if
>>you have an external floppy drive and have no disk in it, and do intensive
>>disk access to the internal drive, it gurus after a random (long) amount
>>of time.
>
> Actually it has nothing to do with internal versus external. This
>was fixed in one of the 1.3 releases in SetPatch. Note: I don't ever
>remember seeing a bug report from EA about this (or in fact about almost
>anything, though some of the people who sell things through EA do report bugs
>well, and this has changed somewhat for the better in the last few years).
>Perhaps there was one, it was a while ago, but I don't remember it. Developer
>support is a 2-way street.
>

SetPatch didn't fix it. There is a program that comes with CrossDos called
TDPatch that might fix it, though. We WERE in touch with CATS on this one.
At least you ADMIT there is such a bug. Just one of too many gotchas.
Thankfully, 2.0 will someday be available to EVERYONE and should have far
fewer gotchas.

>>Did you know that LoadRGB4() takes a full 60th of a second on a 68000?
>
> No, it merely doesn't take effect until the next vblank (it
>modifies the copperlist).
>

No, it doesn't return until the next frame on a 68000. It is faster on an
030. I specifically ran into this problem with a library of routines that
I wrote for someone else. I made a loop that looks like this (to test
whether LoadRGB4 was the culprit):

Loop:   bsr     WaitTOF         ; wait for the top of the next frame
        bsr     LoadRGB4        ; load the color table into the ViewPort
        bra     Loop            ; repeat, timing each pass

It takes two 60ths of a second per loop.

>>How long does MrgCop() take (pick a random number)? How long does
>>RethinkDisplay() take (seconds)? How long does BltMaskBitMapRastPort()
>>take?
>
> (1) those are not called all the time, (2) you're exaggerating
>by a lot. BMBMRP() is not fast because it operates on arbitrary rectangles,
>and some sets require using the A-channel as a mask. So if you do know
>the alignments are ok, OwnBlitter() and program it directly.

This is not supposed to be a good programming practice, because future
versions of the OS and hardware might be different, no? My point is that
if the OS were intended for making games, there would be routines for
doing these things fast. The OS is a FINE general-purpose OS, and despite
how ridiculously slow BMBMRP() is, it still blows the doors off of
CopyMaskBits() on the Mac. I should also point out that BMBMRP() is THE
most important subroutine in the whole library for supporting a game.

>
>--
>Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
>{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup
>Thus spake the Master Ninjei: "To program a million-line operating system
>is easy, to change a man's temperament is more difficult."
>(From "The Zen of Programming")  ;-)

--
********************************************************
* Appendix A of the Amiga Hardware Manual tells you    *
* everything you need to know to take full advantage   *
* of the power of the Amiga. And it is only 10 pages!  *
********************************************************