Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!ames!vsi1!zorch!amiga0!mykes
From: mykes@amiga0.SF-Bay.ORG (Mike Schwartz)
Newsgroups: comp.sys.amiga.programmer
Subject: Re: Lemmings - a tutorial Part V (last)
Message-ID:
Date: 30 Mar 91 14:38:44 GMT
References: <23788@well.sf.ca.us> <23837@well.sf.ca.us> <781@tnc.UUCP>
Organization: Amiga makes it possible
Lines: 330

Sorry to paste this whole thing again, but it is the best article done so far.

In article <781@tnc.UUCP> m0154@tnc.UUCP (GUY GARNETT) writes:
>
>[As much as I've been trying to ignore this discussion, now I'm going
>to open my big mouth ...]
>
>I, too would like to see sources for both permanently taking out the
>operating system (for a high performance game), and for suspending and
>resuming multitasking properly. How about a list of trade-offs (what
>parts of the OS you can and can't use, and how v2.0 impacts the whole
>scheme). This would be useful information, no matter which "side" of
>the discussion you are on.
>

No sources will be posted here, but I have done a small game (written in assembler in 3 days) whose source code I intend to publish at a future date. And believe it or not, it does not kill the OS to the point that it can't be restored. In other words, it runs from DOS and returns to DOS, but it does NOT multitask.

But here is a short description of how I do things.

I always develop for an Amiga 500 with 512K of RAM. My 500 has 1 Meg, the old Agnus, and 1.2 ROMs. This configuration is guaranteed to give you the basic features of the vast majority of Amigas around, and I get to see the game perform exactly as the end user will, including floppy disk access, the whole time I develop.

When the Amiga first boots, it asks for a Workbench disk. If you have an autobooting hard drive and a bootable floppy is inserted, the machine will boot from the floppy.
In any case, the ROM Kernel loads what is called the boot program from track 0 of the floppy disk into RAM and does a JSR to it. The standard Amiga OS boot sector program simply opens dos.library and then does an RTS, and the system continues to boot up into the normal operating system.

Well, I wrote my own boot sector program that just doesn't return to the OS. In this boot sector program, I call several ROM Kernel routines, but this is the ONLY time when they will be available to be called. Upon entry to the boot sector program, Exec is already running, as is trackdisk.device, and Commodore was even nice enough to make the A1 register point to an already opened IORequest structure for use with trackdisk. I use AllocMem to allocate enough memory to hold 2 tracks' worth of code and use the IORequest to read 2 tracks into this memory. I also make sure that the allocated memory is above $40000. I also use OS calls to find out how many floppy drives are connected, where any FAST memory is, and what kind of CPU the machine has.

The two tracks that I read in contain an 8K kernel of code that replaces ALL of the ROM Kernel routines that are needed for a game. Consider it a BIOS of sorts. In addition to this 8K of kernel code, there is another 12K of floppy disk drivers, because I will not have the operating system running to read any further data from the floppies. As soon as the 2 tracks are read in, I turn off interrupts and DMA, and the OS is officially dead. I then jump to the beginning of the allocated block, which contains the kernel. The first thing the kernel does is copy itself down to low memory ($200). The kernel initialization installs all the low memory vector handlers I want to use, including all those nasty GURU vectors, VBlank, Copper, Audio, CIA, etc.
To make things easy to debug, I have 2K of code in the kernel that allows debugging out the serial port at 4 x 57.6K (230.4K) baud; these routines are conditionally assembled, so I can ship a version of the game without the debugging kernel. I should also point out that immediately after copying itself down to low memory, the kernel puts itself into supervisor mode (benefits described below). When the game boots normally, the trackdisk routines in the kernel are used to load the actual game code into memory, the kernel jumps to it, and everything is hunky dory.

So what are the benefits of taking over?

Well, I am guaranteed that ALL 512K are mine to use. I can ORG any graphics, code, or audio data at any hard-coded location I want. This practice allows things like blitter routines to have hard-coded constant addresses in them, which saves CPU cycles where you need them the most. I can put graphics screens and the stack anywhere I want. For a 16-color game, I put the stack at $80000 and a screen at $78000. The stack needed for any program I write is < 512 bytes. The resulting memory map gives me from $100 to $78000 to squeeze the game into. And I do mean squeeze.

EVERY single instruction that ever gets executed is my own code. When I single-step through routines, I get symbolic information for every single instruction. I never see jsr offset(a6) and wonder what the heck is going on. When I do use the OS and step partway into one of the ROM routines, I am appalled by how ugly and inefficient the code is. When I write the code myself, I am in full control of every clock cycle and byte that is used by the program.

Since I am in supervisor mode, no privileged instruction is off limits (under the OS, a privilege violation causes a GURU). The User Stack Pointer (USP) can also be used as a quick place to save an address register (this is 2x faster than a push on the stack).
The upper byte of the status register is available to disable various levels of interrupts (INTENA on Paula is just as useful), and the TRACE bit is available for debugging purposes.

Since my program is the only one running (i.e., no multitasking OS), lots of programming techniques that violate normal programming practices become valid. For example, you can busy-wait for the blitter to finish (if you need to) without being considered a HOG. Another trick I like to use is to put the blitter into NASTY mode ALL of the time. This effectively STOPS the CPU (even a 68030) whenever it accesses chip memory, until the blitter is finished. Without blitter nasty, which is how the OS works, blits take at least twice as long to perform. Another technique that is not valid under the OS is to set up blitter registers on a semipermanent basis. By doing this, you only need to store 2 or 3 blitter registers to start each blit instead of 14.

In order to get the most performance out of the Amiga, you should keep the blitter busy almost all the time. VBlank is a particularly precious time period, especially if you are using a graphics mode that steals CPU/blitter time. If you can do all of your blits to the screen during VBlank, you don't need to do double buffering. When the OS is active, a huge amount of VBlank time is used by the standard OS handler because it has to handle server chains, etc. I want every single clock cycle I can get during this time. It is important to note that on a PAL machine, VBL is a bit longer than on NTSC. Those Europeans get all the breaks :)

I also get the benefit of putting my variables in low memory. You see, the 68000 lets you use the absolute short addressing mode to access anything below $8000, which frees a register for other things. People have ragged on this as useless, but if you note the tone of what I am writing about, I am saving/shaving every clock cycle out of everything I can find. I am SEEKING the best performance possible.
The register that would normally point at your variables under the OS (Manx uses A4...) I point at $dff000 instead, so I get fast access to the hardware registers all the time (including in interrupt handlers).

The floppy disk drivers I wrote use a single 10K buffer to handle as many disk drives (up to 4) as may be configured. The OS routines will steal 40K if you have 4 drives (10K per drive). My drivers read and write standard trackdisk format, so the floppies can be copied with DiskCopy (or by dragging the disk's icon onto a blank disk's icon) under the OS, to allow users to make as many backups as they want. They provide enhanced performance in a few ways. One way is that the blitter is in nasty mode, so encoding/decoding the MFM data is as fast as possible. When data is written to diskette, it is arranged so that NO extra disk revolutions will be made during readback. This is done by timing the read and write routines so that by the time a track has been read in and the head stepped, the next start of track is under the head and ready to go. The routines also make use of the DSKSYNC capability of the disk hardware, which the OS routines don't (under 2.0 they probably do). The routines use a CIA timer to get perfect timing, no matter what processor. Try popping the disk out with the disk light on (it works). Try ejecting the disk in the middle of a load and putting it into a different drive (it works). Try that with the OS and watch your disk go bad in 1 second, thanks to the disk validator.

I NEVER need to do dynamic memory allocation, so my memory never fragments. Under the OS, if an application doesn't respond quickly enough to IntuiMessages, or you have enough windows opened, the OS starts allocating memory and never frees it up. And the OS has serious problems with low memory situations (mostly it gurus).

I implement my own BOBs and multitasking routines. I preallocate enough memory to hold 80 task structures and 80 BOB structures by just reserving part of the memory map for them.
It takes exactly 4 instructions to allocate a task or BOB structure and 4 to free it up. Typically, every OBJECT in the game is implemented as both a BOB and a task. The task code is a finite state machine that controls the animation and movement of the object in the game. Any time you fire a bullet, for example, a new BOB and a new task are created. A task switch takes about 10 instructions. My tasking scheme easily supports 80 tasks getting a slice of the CPU in a 60th of a second. The performance of the BOB system depends on the number and size of the BOBs, naturally. All tasks share the same stack, which again is < 512 bytes for the whole game.

Playtesting is a piece of cake. You don't have to try too many hardware configurations of Amigas to see how compatible the code is. There are only a few software configurations to check out, too (like 1.0, 1.1, 1.2, 1.3, and 2.0). If the game were written under the OS, you'd have nightmares testing all the possible software configurations. For example, does the game work with GOMF installed? How about GOMF and DMouse? How many possibilities do you see? I see bazillions :) And what if you allow multitasking and some CHIP RAM pig program (like DPaint) is already running? GURU. And you need to test with 4 floppy drives under the OS, just to make sure the OS hasn't taken more memory than you can allow.

>I, too have been involved with writing high-performance games on the
>Amiga (no, you won't find my name in any credits; I was a technical
>advisor and algorithm guru rather than a programmer --- most of the
>programmers I worked with were high-school or college kids who taught
>themselves everything; lots of raw talent, and most of them hadn't
>the foggiest idea how to figure out a blitter minterm, so I taught
>them).
>At the time, the only way we knew of to get effective arcade
>games was to kill the OS at boot-up time; I would have preferred to be
>able to suspend and then resume the OS, but couldn't figure out a way
>to do it without causing a crash later on. Most of the
>"whiz-kids" working with me thought of the OS as an obstacle anyway.
>

No sh*t sherlock! The blitter is a very powerful coprocessor and is no piece of cake to learn. On the other hand, the OS routines are PIG slow. In many cases, you have to restore the hardware to a state that the OS needs so the system won't crash. Once you start using the floppy disk hardware directly, for example, you must put the CIAs back into the state the OS wants them in. What state is that? The ROM Kernel manuals LIE. Have fun finding out what page they lie on, because there is NO index.

>On the other hand, games like Sim-City and Lemmings have no real use
>for that kind of environment. I agree that the game comes first,
>*BUT* if you don't need total control, *DON'T* take it. With the OS
>comes a lot more flexibility (HD installation, a real file system,
>multitasking, and lots of "Wow! What a neat game, and it multitasks,
>TOO!"). Take what you need, but be sure you need it before you take
>it.
>

I agree with this 100%. If you don't need to take over the machine, don't. If you want to push the machine to its limits, there is NO other way. Let the game come first. If you know you can do the game in a small amount of RAM and performance is not an issue, go ahead and use the OS. In my approach, if I have RAM left over, I use it for more sounds or instrument samples to make the music better, or to cache more data from the floppy drives.

>There is no excuse for software to break on 68010, 68020, and 68030
>machines, and too many programs (most of them games, but does anybody
>remember TDI Modula-2?) break on accelerated systems. Be aware of
>what you are doing when you write your code, and make it upward
>compatible.
>For timing, use one of the many timebases supplied by the
>OS, or if you have killed it, then program one of the CIAs directly.
>Pay attention to compatibility; it will ensure that royalties come
>trickling in for years to come, instead of months.
>

In most cases, the things that break on 680x0 (where x >= 1) do not depend on whether you are using the OS or not. Self-modifying code breaks in either case. Using the CPU for timing breaks either way. Use VBL for timing, or the beam position register, but don't use a software loop. Don't use the upper byte of address pointers (either in RAM or in address registers) for flags/variables. Read your AmigaMail (sent to registered developers), because it constantly remarks on which practices are invalid. I sure wish Commodore would collect all this information in one place, post it to the net, and publish it in all the manuals, etc.

Unfortunately, the sales life of a game is about 3 months. Royalties don't keep trickling in.

>*NOT* learning how the OS works is a kind of intellectual laziness,
>even if you take the effort to "roll your own". There is always a
>programmer or program out there who can show you a trick or two, and a
>lot of clever people spent a lot of time working on the AmigaOS (I
>have nothing but respect for people like -=RJ=- and the rest of the
>Amiga crew). The OS is decidedly *NOT* full of bugs (and if the last
>time you looked at it was in v1.1, take another look!) and can be your
>ally, if you learn to control it ("Use the OS, Luke ... Use the OS!"
>;-)
>

I agree here too (except about "Use the OS"). But I know it too well... The OS does have bugs, however. I spent weeks finding them for other people at EA. Ever hear about the trackdisk bug? It seems that if you have an external floppy drive with no disk in it and do intensive disk access to the internal drive, it gurus after a random (long) amount of time. Did you know that LoadRGB4() takes a full 60th of a second on a 68000?
How long does MrgCop() take (pick a random number)? How long does RethinkDisplay() take (seconds)? How long does BltMaskBitMapRastPort() take?

The OS is definitely worth knowing. I do know it. The first game I did used the OS (the only one I will ever do that way). I have written megabytes of software that uses the OS. It is a great OS. No argument from me. It is just built to outperform a Mac, not a C64. The hardware blows away everything from C64s to Macs to the Genesis, but you wouldn't know it from watching "performance-oriented" games that use it. By the way, I have an A2000 with a 2630 (25 MHz 68030) and run 1.3. I have written my own libraries and devices and DOS handlers and lots of other things that many people don't even get into. It is a clever piece of work and I am not at all trying to bash it. I am only saying that it is not good for games that need performance.

>I'm not trying to flame or put down anyone, but better games than the
>current crop can still be written! Better both in terms of
>awesome-take-over-the-machine-graphics, and better in terms of
>awesome-playability-and-multitasking-code.
>

I agree here too. Looking at most Amiga games next to C64 games makes me want to puke. It is painful to see Amiga games run at 8 frames per second while the C64 version of the same game runs at 60. There is no excuse for this. If you can't achieve the performance under the OS, boot it by all means. That is what the C64 guys do. It is a proven technique. You left out what I expect to see from games. It is awesome-take-over-the-machine-awesome-playability-code.

BTW, has anybody ever tried Music-X? It is a performance-oriented piece of MIDI software. The first thing I did with it was to plug in my $175 synth and try to sequence in the built-in demo song. Poof - guru. Damn software couldn't keep up with all those MIDI events... haha. Anybody doubt David Joyner's abilities? I don't.
The program is extremely well done; it's just that the OS can't keep up with MIDI's 31.25K baud.

>I suggested earlier that Mike Farren expand and polish his articles,
>include source code examples which make a complete, small game, and send
>the whole thing off to AC's Tech. I still feel very strongly that this
>should be done. I also would like to encourage Mike Schwartz to do
>the same: write up his techniques for taking over the system and
>programming down to the hardware, include sample code, and send it
>off to one of the technical magazines (like AC's Tech).
>

Mike already seems to have this idea in mind, except for the part about writing the game. His Lemmings posts (which actually started this whole thread) were full of copyright notices because he expected his stuff to be published. You will see something from me in the not-so-distant future.

>Wildstar

--
********************************************************
* Appendix A of the Amiga Hardware Manual tells you    *
* everything you need to know to take full advantage   *
* of the power of the Amiga. And it is only 10 pages!  *
********************************************************