=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tiny_hgr8
an 8-byte hi-res Apple II demo
by Deater / dSr
Lovebyte 2023
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

TLDR: I wrote an Apple II graphics demo that's only 8 bytes
      of 6502 assembly language

LINK: https://youtu.be/8QYezzXC9PA

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

I really wanted to make a hi-res 8-byte demo but that is trickier
than you might think.

=== THE CHALLENGE ===

The Apple II has a 6502 processor. To enable hi-res graphics you
need three bytes, typically a jump to the HGR routine in the
Applesoft BASIC ROM:

        JSR HGR

The HGR routine will flip the proper soft-switches to enable
graphics mode, enable split graphics/text mode, select viewing the
8K of graphics info in PAGE1 and then clear the screen to black.
(The nearby HGR2 call is similar but makes the graphics full-screen
and uses PAGE2 instead).

Once you set hi-res mode you still need to draw some graphics. It is
hard to do this compactly. The most obvious way is the ROM HPLOT
call, but this depends on the A, X, and Y registers holding the
screen co-ordinates as well as the desired color being set up at a
zero page location.

When I create 16 byte demos I often use the built-in ROM vector
drawing shapetable/XDRAW functionality which avoids the need for
color setting because it just XORs pixels. However you still usually
need to call the HPOSN routine to set up the co-ordinate values in
the zero page such as GBASL/GBASH. The default values from
uninitialized RAM at boot usually aren't useful.

You can try drawing directly to screen memory at addresses $2000*
(PAGE1) or $4000 (PAGE2), but that takes 3 bytes and if you want to
draw to all 8K of the screen you need to have a way to increment a
16-bit pointer. If we were lucky at boot there'd be an indirect
pointer in the zero page with a good address for this, but alas
there isn't.

So to summarize, to do hi-res graphics it takes 3 bytes to init,
at least 3 to draw a pixel, and then 2 bytes for a loop.
We're at 8 bytes already and we haven't even done anything useful
like increment the pixel location or change the color. So is all
hope lost?

* note a leading $ is how you traditionally indicate hexadecimal
  numbers on 6502 computers

=== THE CHRGET TRICK ===

We can use a trick I found in a previous lo-res graphics entry shown
at Lovebyte 2022. We can abuse some code put into the zero page by
the Applesoft ROM at boot (this is available on any Apple II from
the Apple II+ onward, which is to say most of them). Applesoft uses
this code when parsing BASIC programs, and it is apparently put into
the zero page so the address being loaded can be self-modified.

The code looks like this:

        CHRGET:
        00B1-   E6 B8       INC   $B8
        00B3-   D0 02       BNE   $00B7
        00B5-   E6 B9       INC   $B9
        00B7-   AD 05 02    LDA   $0205
        00BA-   C9 3A       CMP   #$3A
        00BC-   B0 0A       BCS   $00C8
        00BE-   C9 20       CMP   #$20
        00C0-   F0 EF       BEQ   $00B1

What the code originally does is not important. What is interesting
is that it does a 16-bit increment of the address operand of the LDA
(load accumulator) instruction at $B7, and there's a convenient BEQ
(branch if equal) back to the beginning of the routine at $C0. If we
drop our code in between these two chunks of code we can just barely
do some interesting graphics.

=== THE PLAN ===

The first thing we need to do is get into hi-res graphics mode. As
discussed earlier, a 3-byte jsr HGR2 will do this. It uses
soft-switches to enable graphics, switch to hi-res, set it to
full-screen (no text), and finally display the graphics from PAGE2
($4000). It then drops into a routine that does a linear clear of
the screen to color 0 (black). This might seem boring, but on the
Apple II, due to the weird (and clever) way Woz designed the
DRAM/video refresh circuitry, this gives a venetian-blind effect
which looks pretty neat.

This is great, but we want some pretty pixels on the screen too.
It turns out that if we jump into the middle of the previously
mentioned routine we can hit the screen clearing code at a point
where it is drawing the pattern in the A register to the screen. So
if we do a jsr BKGND0 it will fill the screen with a nice pattern.
This is an unofficial entry point in the ROM, but for various
complex reasons involving the license with Microsoft it turns out
Apple never updated the Applesoft BASIC ROMs despite there being
various known bugs, so the address can be relied on.

So now, in theory, we have 6 bytes of code we can drop into the
middle of the CHRGET routine and have it repeatedly clear the screen
to a color and then clear it back to black, with a nice blinds
effect.

That's boring though, can we switch up the colors drawn? It'd be
nice to load a random value into the accumulator (A register) before
the call to fill the screen. The existing code does a load from an
always-incrementing 16-bit address, so let's point it into the ROM
code and that can act as a random enough series of bytes.

== LOAD ADDRESS CONSIDERATIONS ==

The CHRGET load address starts at $800, the default load address of
BASIC programs. We want to point it to ROM, which is at the top of
the address space. The easiest way to do this is to have some high
address bytes at the start of the code and load the program so it
drops into the middle of the LDA instruction.

If we were running code by entering it into the assembly language
monitor that would be fine, as we could load the bytes and then jump
to an arbitrary memory offset. However for the competition we are
going to load from disk, so we have to start executing from the
start of our binary. This means these address bytes also need to be
valid code with no bad side effects. An obvious choice would be the
no-operation NOP instruction, which is $EA. Convenient, as $EAEA
points nicely into the ROM.

It turns out there are some fun** complications with doing this.

** As per 4am, no fun is actually guaranteed in this process

=== WHEREIN WE GET A BEEP AND ===
====== A TEXT SCREEN OF Ws =======

So we set our code to load in the middle of CHRGET, calling BKGND0
immediately after the LDA which puts the needed color pattern into
the A register. We can't call HGR2 first as it will always reset A
to be $60.

Sadly, if you run this, you'll get a text screen filled with
characters before crashing into the monitor. The problem here is
that BKGND0 assumes the high byte of the address of the graphics
page you want to fill is in zero-page location HGR_PAGE ($E6). On
bootup this is likely uninitialized (it often ends up $00 or $FF),
so when you call the routine it happily writes your color pattern
across the first 8k of RAM, which unfortunately is where the
zero page, stack, and your code live. Not Good.

We need a way to skip BKGND0 the first time through the loop.

=== SKIPPING CHUNKS OF INSTRUCTIONS ===
= SURPRISINGLY YOU DO THIS A LOT WHEN =
======= WRITING 6502 ASSEMBLY ========

There's one famous way to skip ahead on the 6502. This is to use the
BIT instruction. By putting a $2C byte in your code it will do a BIT
(logical AND to set flags but throw away the result) and it will use
the two bytes following (the ones you are trying to skip) as an
address. This is usually harmless (unless those address bytes point
to a soft-switch). You can use this trick to compactly have code
where you can jump into the middle of the BIT instruction to execute
the two address bytes as code, but otherwise execute the BIT as sort
of a 3-byte almost-NOP.

We can construct our code so the entry point is a BIT instruction
that skips the first JSR. On later loop iterations execution comes
in earlier, at the top of CHRGET, so the $2C byte is instead part of
the address operand of the LDA instruction and the JSR happens as
normal.

So the first time through the loop BKGND0 is skipped and HGR2 gets
called first. HGR2 usefully sets the HGR_PAGE value in $E6 to a good
value so the BKGND0 call works in all future loop iterations.

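For anyone who has not met the BIT trick before, here is a minimal
sketch of the idiom on its own, separate from the demo. The labels
(one, two) and the zero-page location $06 are made up for
illustration, and the syntax assumes a ca65-style assembler.

```asm
; Two entry points sharing one tail, using the $2C (BIT absolute) skip.
; Hypothetical example: labels and the $06 destination are assumptions.

one:    lda #$01        ; entry 1: A = 1
        .byte $2C       ; BIT abs opcode; the next two bytes become its operand
two:    lda #$02        ; entry 2: A = 2 (assembles to A9 02)
        sta $06         ; shared tail: store whichever value was loaded
        rts
```

Entering at "one" runs LDA #$01, then BIT $02A9 (a harmless read of
RAM that leaves A alone but does change the N, V and Z flags), then
the STA. Entering at "two" runs LDA #$02 and the STA. The skip costs
one byte, where a branch would cost two and a JMP three.
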
=== ALMOST ON THE HOME STRETCH ===

We should be just about there, right?

There is a problem though. The first time through the loop the BIT
consumes the next two bytes, avoiding the JSR to BKGND0. However it
means the address of BKGND0, $F3F4 (actually $F4, $F3 as the 6502 is
little-endian), gets executed as code. Is this a problem?

It turns out those two bytes are invalid opcodes on both 6502 and
65c02 processors. Luckily, though, instead of trapping like a modern
processor would, the processor tries to execute them anyway. You can
look up the side effects for these invalid instructions online; on
the NMOS 6502 at least you get behavior based on the don't-care
terms in the instruction-decode PLA (programmable logic array, not
the PLA instruction). Happily though in our case the instructions
are close enough to NOPs that our code will work.

=== POINTING TO ROM ===

So with the BIT in place the last step is to make sure we are
pointing to ROM when we load the accumulator. If we load our 8 bytes
of code at address $B8 we can have the $2C of the BIT as the low
byte of the LDA instruction address, and the high byte can be
anything we want. I arbitrarily put a NOP ($EA) there, even though
that byte never gets executed as code, because $EA works to give a
nice "random" set of color patterns starting at $EA2C (if you're
curious, this is in the middle of the ROM floating point addition
routine).

=== FINALLY, THE LOOP ===

We can't forget we need to loop. If we load our code at $B8, the
8 bytes stop just short of the BEQ branch-if-equal instruction back
to the beginning. BEQ checks the Zero flag, but luckily the HGR2
call always ends with the Zero flag set so this nicely turns the BEQ
into a branch-always.

=== ALL FINISHED ===

The program loads, it skips the first color fill, inits the screen,
then loops back, alternately setting and clearing the screen based
on a color pattern from an incrementing pointer into ROM, leading to
a colorful animated venetian-blind pattern.

It actually looks lovely, arguably nicer than many of the 16-byte
intros I've done.

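To make the two overlapping readings of the code concrete, here is
an annotated view of the zero page once the 8 bytes are loaded at
$B8. The bytes and addresses come from this writeup; the comments
are an interpretation, not a dump from real hardware.

```asm
; Path 1: entry at $B8 (first pass only)
00B8-   2C EA 20    BIT   $20EA   ; swallows $EA and the first JSR opcode
00BB-   F4                        ; \ address bytes of BKGND0 run as
00BC-   F3                        ; / invalid opcodes (NOP-like here)
00BD-   20 D8 F3    JSR   $F3D8   ; HGR2: init hi-res, set HGR_PAGE
00C0-   F0 EF       BEQ   $00B1   ; Z is set after HGR2, so always taken

; Path 2: entry at $B1 (every later pass)
00B1-   E6 B8       INC   $B8     ; bump low byte of the LDA address
00B3-   D0 02       BNE   $00B7
00B5-   E6 B9       INC   $B9     ; carry into the high byte
00B7-   AD 2D EA    LDA   $EA2D   ; operand as it reads after the first INC
00BA-   20 F4 F3    JSR   $F3F4   ; BKGND0: fill screen with pattern in A
00BD-   20 D8 F3    JSR   $F3D8   ; HGR2: clear back to black
00C0-   F0 EF       BEQ   $00B1
```

Only the byte at $B7 (the LDA opcode) and the BEQ at $C0 are left
over from CHRGET; everything from $B8 to $BF is the demo. Because
the pointer is incremented before each load, the stored address
begins at $EA2C but the first byte actually fetched is at $EA2D.
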
=== TRY IT FOR YOURSELF ===

On an Apple II (or emulator) get to the ']' BASIC prompt and enter
these commands to run it for yourself:

        CALL -151
        B8: 2C EA 20 F4 F3 20 D8 F3
        B8G

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
by Vince `deater` Weaver
http://www.deater.net/weave
11 February 2023
with apologies to 4AM for vaguely stealing his writeup format