============================================================================== 64 Documentation by Jarkko Sonninen, Jouko Valta, John West, and Marko M"akel"a (sonninen@lut.fi, jopi@stekt.oulu.fi, john@ucc.gu.uwa.edu.au, msmakela@hylk.helsinki.fi) [Ed's Note: I'm leaving this file as is because of its intention to serve as a reference guide, and not necessarily to be presented in article format. The detail and clarity with which the authors have presented the material is wonderful!!] # # $Id: 64doc,v 1.3 93/06/21 13:37:18 jopi Exp $ # # This file is part of Commodore 64 emulator # and Program Development System. # # See README for copyright notice # # This file contains documentation for 6502/6510/8502 instruction set. # # Written by # Jarkko Sonninen (sonninen@lut.fi) # Jouko Valta (jopi@stekt.oulu.fi) # John West (john@ucc.gu.uwa.edu.au) # Marko M"akel"a (msmakela@hylk.helsinki.fi) # # $Log: 64doc,v $ # Revision 1.3 93/06/21 13:37:18 jopi # X64 version 0.2 PL 0 # # Revision 1.2 93/06/21 13:07:15 jopi # *** empty log message *** # # # 6510 Instructions by Addressing Modes ++++++++ Positive ++++++++++ -------- Negative ---------- 00 20 40 60 80 a0 c0 e0 mode +00 BRK JSR RTI RTS NOP* LDY CPY CPX Impl/immed +01 ORA AND EOR ADC STA LDA CMP SBC (indir,x) +02 t t t t NOP*t LDX NOP*t NOP*t ? /immed +03 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* (indir,x) +04 NOP* BIT NOP* NOP* STY LDY CPY CPX Zeropage +05 ORA AND EOR ADC STA LDA CMP SBC -"- +06 ASL ROL LSR ROR STX LDX DEC INC -"- +07 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* -"- +08 PHP PLP PHA PLA DEY TAY INY INX Implied +09 ORA AND EOR ADC NOP* LDA CMP SBC Immediate +0a ASL ROL LSR ROR TXA TAX DEX NOP Accu/impl +0b ANC** ANC** ASR** AR 7 Cancel 8 /duck/mailserv/hacking> 7 Interrupt 8 /duck/mailserv/hacking> 6510 Instructions by Addressing Modes ++++++++ Positive ++++++++++ -------- Negative ---------- 00 20 40 60 80 a0 c0 e0 mode +00 BRK JSR RTI RTS NOP* LDY CPY CPX Impl/immed +01 ORA AND EOR ADC STA LDA CMP SBC (indir,x) +02 t t t t NOP*t LDX NOP*t NOP*t ? /immed +03 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* (indir,x) +04 NOP* BIT NOP* NOP* STY LDY CPY CPX Zeropage +05 ORA AND EOR ADC STA LDA CMP SBC -"- +06 ASL ROL LSR ROR STX LDX DEC INC -"- +07 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* -"- +08 PHP PLP PHA PLA DEY TAY INY INX Implied +09 ORA AND EOR ADC NOP* LDA CMP SBC Immediate +0a ASL ROL LSR ROR TXA TAX DEX NOP Accu/impl +0b ANC** ANC** ASR** ARR** ANE** LXA** SBX** SBC* Immediate +0c NOP* BIT JMP JMP STY LDY CPY CPX Absolute +0d ORA AND EOR ADC STA LDA CMP SBC -"- +0e ASL ROL LSR ROR STX LDX DEC INC -"- +0f SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* -"- +10 BPL BMI BVC BVS BCC BCS BNE BEQ Relative +11 ORA AND EOR ADC STA LDA CMP SBC (indir),y +12 t t t t t t t t ? +13 SLO* RLA* SRE* RRA* SHA** LAX* DCP* ISB* (indir),y +14 NOP* NOP* NOP* NOP* STY LDY NOP* NOP* Zeropage,x +15 ORA AND EOR ADC STA LDA CMP SBC -"- +16 ASL ROL LSR ROR STX y) LDX y) DEC INC -"- +17 SLO* RLA* SRE* RRA* SAX* y) LAX* y) DCP ISB -"- +18 CLC SEC CLI SEI TYA CLV CLD SED Implied +19 ORA AND EOR ADC STA LDA CMP SBC Absolute,y +1a NOP* NOP* NOP* NOP* TXS TSX NOP* NOP* Implied +1b SLO* RLA* SRE* RRA* SHS** LAS** DCP* ISB* Absolute,y +1c NOP* NOP* NOP* NOP* SHY** LDY NOP* NOP* Absolute,x +1d ORA AND EOR ADC STA LDA CMP SBC -"- +1e ASL ROL LSR ROR SHX**y) LDX y) DEC INC -"- +1f SLO* RLA* SRE* RRA* SHA**y) LAX* y) DCP ISB -"- Legend: t Jams the machine *t Jams very rarely * Undocumented command ** Unusual operation y) indexed using IY instead of IX 6510/8502 Undocumented Commands -- A brief explanation about what may happen while using don't care states. ANE $8B AC = (AC | #$EE) & IX & #byte same as AC = ((AC & #$11 & IX) | ( #$EE & IX)) & #byte In real 6510/8502 the internal parameter #$11 may occasionally be #$10, #$01 or even #$00. This occurs probably when the VIC halts the processor right between the two clock cycles of this instruction. LXA $AB C=Lehti: AC = IX = ANE Alternate: AC = IX = (AC & #byte) TXA and TAX have to be responsible for these. SHA $93,$9F Store (AC & IX & (ADDR_HI + 1)) SHX $9E Store (IX & (ADDR_HI + 1)) SHY $9C Store (IY & (ADDR_HI + 1)) SHS $9B SHA and TXS, where X is replaced by (AC & IX). Note: The value to be stored is copied also to ADDR_HI if page boundary is crossed. SBX $CB Carry and Decimal flags are ignored but set in substraction. This is due to the CMP command, which is executed instead of the real SBC. Many undocumented commands do not use AND between registers, the CPU just throws the bytes to a bus simultaneously and lets the open-collector drivers perform the AND. I.e. the command called 'SAX', which is in the STORE section (opcodes $A0...$BF), stores the result of (AC & IX) by this way. More fortunate is its opposite, 'LAX' which just loads a byte simultaeously into both AC and IX. $CB SBX IX <- (AC & IX) - Immediate The 'SBX' ($CB) may seem to be very complex operation, even though it is combination of subtraction of accumulator and parameter, as in the 'CMP' instruction, and the command 'DEX'. As a result, both AC and IX are connected to ALU but only the subtraction takes place. Since the comparison logic was used, the result of subtraction should be normally ignored, but the 'DEX' now happily stores to IX the value of (AC & IX) - Immediate. That is why this instruction does not have any decimal mode, and it does not affect the V flag. Also Carry flag is ignored in the subtraction but set according to the result. Proof: These test programs show if your machine is compatible with ours regarding the opcode $CB. The first test, vsbx, shows that SBX does not affect the V flag. The latter one, sbx, shows the rest of our theory. The vsbx test tests 33554432 SBX combinations (16777216 different AC, IX and Immediate combinations, and two different V flag states), and the sbx test doubles that amount (16777216*4 D and C flag combinations). Both tests have run successfully on a C64 and a Vic20. They ought to run on C16, +4 and the PET series as well. The tests stop with BRK, if the opcode $CB does not work expectedly. Successful operation ends in RTS. As the tests are very slow, they print dots on the screen while running so that you know that the machine has not jammed. On computers running at 1 MHz, the first test prints approximately one dot every four seconds and a total of 2048 dots, whereas the second one prints half that amount, one dot every seven seconds. If the tests fail on your machine, please let us know your processor's part number and revision. If possible, save the executable (after it has stopped with BRK) under another name and send it to us so that we know at which stage the program stopped. The following program is a Commodore 64 executable that Marko M"akela developed when trying to find out how the V flag is affected by SBX. (It was believed that the SBX affects the flag in a weird way, and this program shows how SBX sets the flag differently from SBC.) You may find the subroutine at $C150 useful when researching other undocumented instructions' flags. Run the program in a machine language monitor, as it makes use of the BRK instruction. The result tables will be written on pages $C2 and $C3. Other undocumented instructions usually cause two preceding opcodes being executed. However 'NOP' seems to completely disappear from 'SBC' code $EB. The most difficult to comprehend are the rest of the instructions located on the '$0B' line. All the instructions located at the positive (left) side of this line should rotate either memory or the accumulator, but the addressing mode turns out to be immediate! No problem. Just read the operand, let it be ANDed with the accumulator and finally use accumulator addressing mode for the instructions above them. The rest two instructions on the same line, called 'ANE' and 'LXA' ($8B and $AB respectively) often give quite unpredictable results. However, the most usual operation is to store ((A | #$ee) & X & #$nn) to accumulator. Note that this does not work reliably in a real 64! On 8502 opcode $8B uses values 8C,CC, EE, and occasionally 0C and 8E for the OR instead of EE,EF,FE and FF used by 6510. With 8502 running at 2 MHz #$EE is always used. Opcode $AB does not cause this OR taking place on 8502 while 6510 always performs it. Note that this behaviour depends on chip revision. Let's take a closer look at $8B (6510). AC <- IX & D & (AC | VAL) where VAL comes from this table: IX high D high D low VAL even even --- $EE (1) even odd --- $EE odd even --- $EE odd odd 0 $EE odd odd not 0 $FE (2) (1) If the bottom 2 bits of AC are both 1, then the LSB of the result may be 0. The values of IX and D are different every time I run the test. This appears to be very rare. (2) VAL is $FE most of the time. Sometimes it is $EE - it seems to be random, not related to any of the data. This is much more common than (1). In decimal mode, VAL is usually $FE. Two different functions has been discovered for LAX, opcode $AB. One is AC = IX = ANE (see above) and the other, encountered with 6510 and 8502, is less complicated AC = IX = (AC & #byte). However, according to what is reported, the version altering only the lowest bits of each nybble seems to be more common. What happens, is that $AB loads a value into both AC and IX, ANDing the low bit of each nybble with the corresponding bit of the old AC. However, there are exceptions. Sometimes the low bit is cleared even when AC contains a '1', and sometimes other bits are cleared. The exceptions seem random (they change every time I run the test). Oops - that was in decimal mode. Much the same with D=0. What causes the randomness? Probably it is that it is marginal logic levels - when too much wired-anding goes on, some of the signals get very close to the threshold. Perhaps we're seeing some of them step over it. The low bit of each nybble is special, since it has to cope with carry differently (remember decimal mode). We never see a '0' turn into a '1'. Since these instructions are unpredictable, they should not be used. There is still very strange instruction left, the one named SHA/X/Y, which is the only one with only indexed addressing modes. Actually, the commands 'SHA', 'SHX' and 'SHY' are generated by the indexing algorithm. While using indexed addressing, effective address for page boundary crossing is calculated as soon as possible so it does not slow down operation. As a result, in the case of SHA/X/Y, the address and data are prosessed at the same time making AND between the to take place. Thus, the value to be stored by SAX, for example, is in fact (AC & IX & (ADDR_HI + 1)). On page boundary crossing the same value is copied also to high byte of the effective address. Register selection for load and store bit1 bit0 AC IX IY 0 0 x 0 1 x 1 0 x 1 1 x x So, AC and IX are selected by bits 1 and 0 respectively, while ~(bit1 | bit0) enables IY. Indexing is determined by bit4, even in relative addressing mode, which is one kind of indexing. Lines containing opcodes xxx000x1 (01 and 03) are treated as absolute after the effective address has been loaded into CPU. Zeropage,y and Absolute,y (codes 10x1 x11x) are distinquished by bit5. Decimal mode in NMOS 6500 series Most sources claim that the NMOS 6500 series sets the N, V and Z flags unpredictably. Of course, this is not true. While testing how the flags are set, I also wanted to see what happens if you use illegal BCD values. ADC works in Decimal mode in a quite complicated way. It is amazing how it can do that all in a single cycle. Here's a pseudo code version of the instruction: AC accumulator AL low nybble of accumulator AH high nybble of accumulator C Carry flag Z Zero flag V oVerflow flag N Negative flag s value to be added to accumulator AL = (AC & 15) + (s & 15) + C; ! Calculate the lower nybble. if (AL > 9) ! BCD fixup AL += 6; ! for lower nybble AH = (A >> 4) + (s >> 4) + (AL > 15); ! Calculate the upper nybble. Z = (AC + s + C != 0); ! Zero flag is set just ! like in Binary mode. ! Negative and Overflow flags are set with the same logic than in ! Binary mode, but after fixing the lower nybble. N = (AH & 8 != 0); V = ((AH & 8) ^ (A >> 4)) && (!(A ^ s) & 128); if (AH > 9) ! BCD fixup AH += 6; ! for upper nybble ! Carry is the only flag set after fixing the result. C = (AH > 15); AC = ((AH << 4) | (AL & 15)) & 255; The C flag is set as the quiche eaters expect, but the N and V flags are set after fixing the lower nybble but before fixing the upper one. They use the same logic than binary mode ADC. The Z flag is set before any BCD fixup, so the D flag does not have any influence on it. Proof: The following test program tests all 131072 ADC combinations in Decimal mode, and aborts with BRK if anything breaks this theory. If everything goes well, it ends in RTS. All programs in this chapter have been successfully tested on a Vic20 and a Commodore 64. They should run on C16, +4 and on the PET series as well. If not, please report the problem to Marko M"akel"a. Each test in this chapter should run in less than a minute at 1 MHz. SBC is much easier. Just like CMP, its flags are not affected by the D flag. Proof: The only difference in SBC's operation in decimal mode from binary mode is the result-fixup: AC accumulator AL low nybble of accumulator AH high nybble of accumulator C Carry flag Z Zero flag V oVerflow flag N Negative flag s value to be added to accumulator AL = (AC & 15) - (s & 15) - !C; ! Calculate the lower nybble. if (AL & 16) ! BCD fixup AL -= 6; ! for lower nybble AH = (AC >> 4) - (s >> 4) - (AL > 15); ! Calculate the upper nybble. if (AH & 16) ! BCD fixup AH -= 6; ! for upper nybble ! Flags are set just like in Binary mode. C = (AC - s - !C > 255); Z = (AC - s - !C != 0); V = ((AC - s - !C) ^ s) && ((AC ^ s) & 128); N = ((AC - s - !C) & 128); AC = ((AH << 4) | (AL & 15)) & 255; Again Z flag is set before any BCD fixup. The N and V flags are set at any time before fixing the high nybble. The C flag may be set in any phase. Decimal subtraction is easier than decimal addition, as you have to make the BCD fixup only when a nybble flows over. In decimal addition, you had to verify if the nybble was greater than 9. The processor has an internal "half carry" flag for the lower nybble, and it uses it to trigger the BCD fixup. When calculating with legal BCD values, the lower nybble cannot flow over again when fixing it. So the processor does not handle overflows while performing the fixup. Similarly, the BCD fixup occurs in the high nybble only if the value flows over, i.e. when the C flag will be cleared. Because SBC's flags are not affected by the Decimal mode flag, you could guess that CMP uses the SBC logic, only setting the C flag first. But the SBX instruction shows that CMP also temporarily clears the D flag, although it is totally unnecessary. The following program, which tests SBC's result and flags, contains the 6502 version of the pseudo code example above. Obviously the undocumented instructions RRA (ROR+ADC) and ISB (INC+SBC) have inherited also the decimal operation from the official instructions ADC and SBC. The program droradc shows this statement for ROR, and the dincsbc test shows this for ISB. Finally, dincsbc-deccmp shows that ISB's and DCP's (DEC+CMP) flags are not affected by the D flag. 6510 features o PHP always pushes the Break (B) flag as a `1' to the stack. Jukka Tapanim"aki claimed in C=lehti issue 3/89, on page 27 that the processor makes a logical OR between the status register's bit 4 and the bit 8 of the stack register (which is always 1). o Indirect addressing modes do not handle page boundary crossing at all. When the parameter's low byte is $FF, the effective address wraps around and the CPU fetches high byte from $xx00 instead of $xx00+$0100. E.g. JMP ($01FF) fetches PCL from $01FF and PCH from $0100, and LDA ($FF),Y fetches the base address from $FF and $00. o Indexed zero page addressing modes never fix the page address on crossing the zero page boundary. E.g. LDX #$01 : LDA ($FF,X) loads the effective address from $00 and $01. o The processor always fetches the byte following a relative branch instruction. If the branch is taken, the processor reads then the opcode from the destination address. If page boundary is crossed, it first reads a byte from the old page from a location that is bigger or smaller than the correct address by one page. o If you cross a page boundary in any other indexed mode, the processor reads an incorrect location first, a location that is smaller by one page. o Read-Modify-Write instructions write unmodified data, then modified (so INC effectively does LDX loc;STX loc;INX;STX loc) o -RDY is ignored during writes (This is why you must wait 3 cycles before doing any DMA - the maximum number of consecutive writes is 3, which occurs during interrupts except -RESET.) o Some undefined opcodes may give really unpredictable results. o All registers except the Program Counter remain the same after -RESET. (This is why you must preset D and I flags in the RESET handler.) Different CPU types The Rockwell data booklet 29651N52 (technical information about R65C00 microprocessors, dated October 1984), lists the following differences between NMOS R6502 microprocessor and CMOS R65C00 family: 1. Indexed addressing across page boundary. NMOS: Extra read of invalid address. CMOS: Extra read of last instruction byte. 2. Execution of invalid op codes. NMOS: Some terminate only by reset. Results are undefined. CMOS: All are NOPs (reserved for future use). 3. Jump indirect, operand = XXFF. NMOS: Page address does not increment. CMOS: Page address increments and adds one additional cycle. 4. Read/modify/write instructions at effective address. NMOS: One read and two write cycles. CMOS: Two read and one write cycle. 5. Decimal flag. NMOS: Indeterminate after reset. CMOS: Initialized to binary mode (D=0) after reset and interrupts. 6. Flags after decimal operation. NMOS: Invalid N, V and Z flags. CMOS: Valid flag adds one additional cycle. 7. Interrupt after fetch of BRK instruction. NMOS: Interrupt vector is loaded, BRK vector is ignored. CMOS: BRK is executed, then interrupt is executed. 6510 Instruction Timing The NMOS 6500 series uses a sort of pipelining. It always reads two bytes for each instruction. If the instruction was only two cycles long, the opcode for the next instruction can be fetched during the third cycle. As most instructions are two or three bytes long, this is quite efficient. But one-byte instructions take two cycles, even though they could be performed in one. The following tables show what happens on the bus while executing different kinds of instructions. The tables having "???" marks at any cycle may be totally wrong, but the rest should be absolutely accurate. Interrupts NMI and IRQ both take 7 cycles. Their timing diagram is much like BRK's. IRQ will be executed only when the I flag is clear. The processor will usually wait for the current instruction to complete before executing the interrupt sequence. There is one exception to this rule: If a NMI occurs while the processor is executing a BRK, the two interrupts may take 7 to 14 cycles to execute, and the processor may totally lose the BRK instruction. Probably the results are similar also with IRQ. Marko M"akel"a experimented with BRK/NMI, but he still hasn't analyzed the results. RESET does not push program counter on stack, and we don't know how long it lasts. But we know that RESET preserves all registers (except PC). Accumulator or implied addressing BRK # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away), increment PCR 3 $0100,S W push PCH on stack (with B flag set), decrement S 4 $0100,S W push PCL on stack, decrement S 5 $0100,S W push P on stack, decrement S 6 $FFFE R fetch PCL 7 $FFFF R fetch PCH RTI # address R/W description --- ------- --- ----------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R read next instruction byte (and throw it away), increment PCR 3 $0100,S R increment S 4 $0100,S R pull P from stack, increment S 5 $0100,S R pull PCL from stack, increment S 6 $0100,S R pull PCH from stack RTS # address R/W description --- ------- --- ----------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R read next instruction byte (and throw it away), increment PCR 3 $0100,S R increment S 4 $0100,S R pull PCL from stack, increment S 5 $0100,S R pull PCH from stack 6 PCR R increment PCR PHA, PHP # address R/W description --- ------- --- ----------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R read next instruction byte (and throw it away), increment PCR 3 $0100,S W push register on stack, decrement S PLA, PLP # address R/W description --- ------- --- ----------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R read next instruction byte (and throw it away), increment PCR 3 $0100,S R increment S 4 $0100,S R pull register from stack Note: The 3rd cycle does NOT read from PCR. Maybe it reads from $0100,S. Other instructions # address R/W description --- ------- --- ----------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R read next instruction byte (and throw it away), increment PCR Immediate addressing # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch value, increment PCR Absolute addressing JMP # address R/W description --- ------- --- ------------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address's low byte to latch, increment PCR 3 PCR R copy latch to PCL, fetch address's high byte to latch, increment PCR, copy latch to PCH JSR # address R/W description --- ------- --- ------------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address's low byte to latch, increment PCR 3 $0100,S R store latch 4 $0100,S W push PCH on stack, decrement S 5 $0100,S W push PCL on stack, decrement S 6 PCR R copy latch to PCL, fetch address's high byte to latch, increment PCR, copy latch to PCH Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, NOP) # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch low byte of address, increment PCR 3 PCR R fetch high byte of address, increment PCR 4 address R read from effective address Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch low byte of address, increment PCR 3 PCR R fetch high byte of address, increment PCR 4 address R read from effective address 5 address W write the value back to effective address, and do the operation on it 6 address W write the new value to effective address Write instructions (STA, STX, STY, SAX) # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch low byte of address, increment PCR 3 PCR R fetch high byte of address, increment PCR 4 address W write register to effective address Zero page addressing Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, NOP) # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address, increment PCR 3 address R read from effective address Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address, increment PCR 3 address R read from effective address 4 address W write the value back to effective address, and do the operation on it 5 address W write the new value to effective address Write instructions (STA, STX, STY, SAX) # address R/W description --- ------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address, increment PCR 3 address W write register to effective address Zero page indexed addressing Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, NOP) # address R/W description --- --------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address, increment PCR 3 address R read from address, add index register to it 4 address+I* R read from effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address is always zero, i.e. page boundary crossings are not handled. Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- --------- --- --------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address, increment PCR 3 address R read from address, add index register X to it 4 address+X* R read from effective address 5 address+X* W write the value back to effective address, and do the operation on it 6 address+X* W write the new value to effective address Note: * The high byte of the effective address is always zero, i.e. page boundary crossings are not handled. Write instructions (STA, STX, STY, SAX) # address R/W description --- --------- --- ------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R fetch address, increment PCR 3 address R read from address, add index register to it 4 address+I* W write to effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address is always zero, i.e. page boundary crossings are not handled. Absolute indexed addressing Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, LAE, SHS, NOP) # address R/W description --- --------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch low byte of address, increment PCR 3 PCR R fetch high byte of address, add index register to low address byte, increment PCR 4 address+I* R read from effective address, fix the high byte of effective address 4+ address+I R re-read from effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. + This cycle will be executed only if the effective address was invalid during cycle #4, i.e. page boundary was crossed. Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- --------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch low byte of address, increment PCR 3 PCR R fetch high byte of address, add index register X to low address byte, increment PCR 4 address+X* R read from effective address, fix the high byte of effective address 5 address+X R re-read from effective address 6 address+X W write the value back to effective address, and do the operation on it 7 address+X W write the new value to effective address Notes: * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Write instructions (STA, STX, STY, SHA, SHX, SHY) # address R/W description --- --------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch low byte of address, increment PCR 3 PCR R fetch high byte of address, add index register to low address byte, increment PCR 4 address+I* R read from effective address, fix the high byte of effective address 5 address+I W write to effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Because the processor cannot undo a write to an invalid address, it always reads from the address first. Relative addressing (BCC, BCS, BNE, BEQ, BPL, BMI, BVC, BVS) # address R/W description --- --------- --- --------------------------------------------- 1 PCR R fetch opcode, increment PCR 2 PCR R fetch operand, increment PCR 3 PCR R Fetch opcode of next instruction, If branch is taken, add operand to PCL. Otherwise increment PCR. 3+ PCR* R Fetch opcode of next instruction. Fix PCH. If it did not change, increment PCR. 3! PCR R Fetch opcode of next instruction, increment PCR. Notes: * The high byte of Program Counter (PCH) may be invalid at this time, i.e. it may be smaller or bigger by $100. + If branch is taken, this cycle will be executed. ! If branch occurs to different page, this cycle will be executed. Indexed indirect addressing Read instructions (LDA, ORA, EOR, AND, ADC, CMP, SBC, LAX) # address R/W description --- ----------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address, add X to it, increment PCR 3 ??? R internal operation 4 pointer+X R fetch effective address low 5 pointer+X+1 R fetch effective address high 6 address R read from effective address Note: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ----------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address, add X to it, increment PCR 3 ??? R internal operation 4 pointer+X R fetch effective address low 5 pointer+X+1 R fetch effective address high 6 address R read from effective address 7 address W write the value back to effective address, and do the operation on it 8 address W write the new value to effective address Note: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. Write instructions (STA, SAX) # address R/W description --- ----------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address, add X to it, increment PCR 3 ??? R internal operation 4 pointer+X R fetch effective address low 5 pointer+X+1 R fetch effective address high 6 address W write to effective address Note: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. Indirect indexed addressing Read instructions (LDA, EOR, AND, ORA, ADC, SBC, CMP) # address R/W description --- ----------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address, increment PCR 3 pointer R fetch effective address low 4 pointer+1 R fetch effective address high, add Y to low byte of effective address 5 address+Y* R read from effective address, fix high byte of effective address 5+ address+Y R read from effective address Notes: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. + This cycle will be executed only if the effective address was invalid during cycle #5, i.e. page boundary was crossed. Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ----------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address, increment PCR 3 pointer R fetch effective address low 4 pointer+1 R fetch effective address high, add Y to low byte of effective address 5 address+Y* R read from effective address, fix high byte of effective address 6 address+Y W write to effective address 7 address+Y W write the value back to effective address, and do the operation on it 8 address+Y W write the new value to effective address Notes: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Write instructions (STA, SHA) # address R/W description --- ----------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address, increment PCR 3 pointer R fetch effective address low 4 pointer+1 R fetch effective address high, add Y to low byte of effective address 5 address+Y* R read from effective address, fix high byte of effective address 6 address+Y W write to effective address Notes: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Absolute indirect addressing (JMP) # address R/W description --- --------- --- ------------------------------------------ 1 PCR R fetch opcode, increment PCR 2 PCR R fetch pointer address low, increment PCR 3 PCR R fetch pointer address high, increment PCR 4 pointer R fetch low address to latch 5 pointer+1* R fetch PCH, copy latch to PCL Note: * The PCH will always be fetched from the same page than PCL, i.e. page boundary crossing is not handled. MEMORY MANAGEMENT normal ultimax 1111 101x 011x 001x 1110 0100 1100 xx01 1000 00x0 10000 ------------------------------------------------------------------------ F000 Kernal RAM Kernal RAM Kernal Kernal Kernal module E000 ------------------------------------------------------------------------ D000 I/O I/O** I/O RAM I/O I/O I/O I/O ------------------------------------------------------------------------ C000 RAM RAM RAM RAM RAM RAM RAM - ------------------------------------------------------------------------ B000 BASIC RAM RAM RAM BASIC module module - A000 ------------------------------------------------------------------------ 9000 RAM RAM RAM RAM module RAM module module 8000 ------------------------------------------------------------------------ 7000 6000 RAM RAM RAM RAM RAM RAM RAM - 5000 4000 ------------------------------------------------------------------------ 3000 2000 RAM RAM RAM RAM RAM RAM RAM RAM 1000 0000 ------------------------------------------------------------------------ **) Chargen not accessible by the CPU AUTOSTART CODE If memory places $8004 to $8008 contain 'CBM80' (C3 C2 CD 38 30), the RESET routine jumps to ($8000) and the default NMI handler jumps to ($8002). HOW REAL PROGRAMMERS ACKNOWLEDGE INTERRUPTS With RMW instructions: ; beginning of combined raster/timer interrupt routine LSR $D019 ; clear VIC interrupts, read raster interrupt flag to C BCS raster ; jump if VIC caused an interrupt ... ; timer interrupt routine Operational diagram of LSR $D019: # data address R/W --- ---- ------- --- --------------------------------- 1 4E PCR R fetch opcode 2 19 PCR+1 R fetch address low 3 D0 PCR+2 R fetch address high 4 xx $D019 R read memory 5 xx $D019 W write the value back, rotate right 6 xx/2 $D019 W write the new value back The 5th cycle acknowledges the interrupt by writing the same value back. If only raster interrupts are used, the 6th cycle has no effect on the VIC. With indexed addressing: ; acknowledge interrupts to both CIAs LDX #$10 LDA $DCFD,X Operational diagram of LDA $DCFD,X: # data address R/W description --- ---- ------- --- --------------------------------- 1 BD PCR R fetch opcode 2 FD PCR+1 R fetch address low 3 DC PCR+2 R fetch address high, add X to address low 4 xx $DC0D R read from address, fix high byte of address 5 yy $DD0D R read from right address ; acknowledge interrupts to CIA 2 LDX #$10 STA $DDFD,X Operational diagram of STA $DDFD,X: # data address R/W description --- ---- ------- --- --------------------------------- 1 9D PCR R fetch opcode 2 FD PCR+1 R fetch address low 3 DC PCR+2 R fetch address high, add X to address low 4 xx $DD0D R read from address, fix high byte of address 5 ac $DE0D W write to right address With branch instructions: ; acknowledge interrupts to CIA 2 LDA #$00 ; clear N flag JMP $DD0A DD0A BPL $DC9D ; branch DC9D BRK ; return You need the following preparations to initialize the CIA registers: LDA #$91 ; argument of BPL STA $DD0B LDA #$10 ; BPL STA $DD0A STA $DD08 ; load the ToD values from the latches LDA $DD0B ; jam the ToD display LDA #$7F STA $DC0D ; assure that $DC0D is $00 Operational diagram of BPL $DC9D: # data address R/W description --- ---- ------- --- --------------------------------- 1 10 $DD0A R fetch opcode 2 91 $DD0B R fetch argument 3 xx $DD0C R fetch opcode, add argument to PCL 4 yy $DD9D R fetch opcode, fix PCH ( 5 00 $DC9D R fetch opcode ) ; acknowledge interrupts to CIA 1 LDA #$00 ; clear N flag JMP $DCFA DCFA BPL $DD0D DD0D BRK ; Again you need to set the ToD registers of CIA 1 and the ; Interrupt Control Register of CIA 2 first. Operational diagram of BPL $DD0D: # data address R/W description --- ---- ------- --- --------------------------------- 1 10 $DCFA R fetch opcode 2 11 $DCFB R fetch argument 3 xx $DCFC R fetch opcode, add argument to PCL 4 yy $DC0D R fetch opcode, fix PCH ( 5 00 $DD0D R fetch opcode ) ; acknowledge interrupts to CIA 2 automagically ; preparations LDA #$7F STA $DD0D ; disable CIA 2's all interrupt sources LDA $DD0E AND #$BE ; ensure that $DD0C remains constant STA $DD0E ; and stop the timer LDA #$FD STA $DD0C ; parameter of BPL LDA #$10 STA $DD0B ; BPL LDA #$40 STA $DD0A ; RTI/parameter of LSR LDA #$46 STA $DD09 ; LSR STA $DD08 ; load the ToD values from the latches LDA $DD0B ; jam the ToD display LDA #$09 STA $0318 LDA #$DD STA $0319 ; change NMI vector to $DD09 LDA #$FF ; Try changing this instruction's operand STA $DD05 ; (see comment below). LDA #$FF STA $DD04 ; set interrupt frequency to 1/65536 cycles LDA $DD0E AND #$80 ORA #$11 LDX #$81 STX $DD0D ; enable timer interrupt STA $DD0E ; start timer LDA #$00 ; To see that the interrupts really occur, STA $D011 ; use something like this and see how LOOP DEC $D020 ; changing the byte loaded to $DD05 from BNE LOOP ; #$FF to #$0F changes the image. When an NMI occurs, the processor jumps to Kernal code, which jumps to ($0318), which points to the following routine: DD09 LSR $40 ; clear N flag BPL $DD0A ; Note: $DD0A contains RTI. Operational diagram of BPL $DD0A: # data address R/W description --- ---- ------- --- --------------------------------- 1 10 $DD0B R fetch opcode 2 11 $DD0C R fetch argument 3 xx $DD0D R fetch opcode, add argument to PCL 4 40 $DD0A R fetch opcode, (fix PCH) With RTI: ; the fastest possible interrupt handler in the 6500 family ; preparations SEI LDA $01 ; disable ROM and enable I/O AND #$FD ORA #$05 STA $01 LDA #$7F STA $DD0D ; disable CIA 2's all interrupt sources LDA $DD0E AND #$BE ; ensure that $DD0C remains constant STA $DD0E ; and stop the timer LDA #$40 STA $DD0C ; store RTI to $DD0C LDA #$0C STA $FFFA LDA #$DD STA $FFFB ; change NMI vector to $DD0C LDA #$FF ; Try changing this instruction's operand STA $DD05 ; (see comment below). LDA #$FF STA $DD04 ; set interrupt frequency to 1/65536 cycles LDA $DD0E AND #$80 ORA #$11 LDX #$81 STX $DD0D ; enable timer interrupt STA $DD0E ; start timer LDA #$00 ; To see that the interrupts really occur, STA $D011 ; use something like this and see how LOOP DEC $D020 ; changing the byte loaded to $DD05 from BNE LOOP ; #$FF to #$0F changes the image. When an NMI occurs, the processor jumps to Kernal code, which jumps to ($0318), which points to the following routine: DD0C RTI How on earth can this clear the interrupts? Remember, the processor always fetches two successive bytes for each instruction. A little more practical version of this is redirecting the NMI (or IRQ) to your own routine, whose last instruction is JMP $DD0C or JMP $DC0C. If you want to confuse more, change the 0 in the address to a hexadecimal digit different from the one you used when writing the RTI. Or you can combine the latter two methods: DD09 LSR $xx ; xx is any appropriate BCD value 00-59. BPL $DCFC DCFC RTI This example acknowledges interrupts to both CIAs. If you want to confuse the examiners of your code, you can use any of these techniques. Although these examples use no undefined opcodes, they do not run correctly on CMOS processors. However, the RTI example should run on 65C02 and 65C816, and the latter branch instruction example might work as well. The RMW instruction method has been used in some demos, others were developed by Marko M"akel"a. His favourite is the automagical RTI method, although it does not have any practical applications, except for some time dependent data decryption routines for very complicated copy protections. MAKING USE OF THE I/O REGISTERS If you are making a resident program and want to make as invisible to the system as possible, probably the best method is keeping most of your code under the I/O area (in the RAM at $D000-$DFFF). You need only a short routine in the normally visible RAM that pushes the current value of the processor's I/O register $01 on stack, switches I/O and ROMs out and jumps to this area. Returning from the $D000-$DFFF area is easy even without any routine in the normally visible RAM area. Just write a RTS to an I/O register and return through it. But what if your program needs to use I/O? And how can you write the RTS to an I/O register while the I/O area is switched off? You need a swap area for your program in normally visible memory. The first thing your routine at $D000-$DFFF does is copying the I/O routines (or the whole program) to normally visible memory, swapping the bytes. For instance, if your I/O routines are initially at $D200-$D3FF, exchange the bytes at $D200-$D3FF with the contents of $C000-$C1FF. Now you can call the I/O routines from your routine at $D000-$DFFF, and the I/O routines can switch the I/O area temporarily on to access the I/O circuitry. And right before exiting your program at $D000-$DFFF swaps the old contents of that I/O routine area in, e.g. exchanges the memory areas $D200-$D3FF and $C000-$C1FF again. What I/O registers can you use for the RTS? There are two alternatives: 8-bit VIC sprite registers or CIA serial port register. The CIA register is usually better, as changing the VIC registers might change the screen layout. However, also the SP register has some drawbacks: If the machine's CNT1 and CNT2 lines are connected to a frequency source, you must stop either CIA's Timer A to use the SP register method. Normally the 1st CIA's Timer A is the main hardware interrupt source. And if you use the Kernal's RS232, you cannot stop the 2nd CIA's Timer A either. Also, if you don't want to lose any CIA interrupts, remember that the RTS at SP register causes also the Interrupt Control Register to be read. Also keep in mind that the user could press RESTORE while the Kernal ROM and I/O areas are disabled. You could write your own NMI handler (using the NMI vector at $FFFA), but a fast loader that uses very tight timing would still stop working if the user pressed RESTORE in wrong time. So, to make a robust program, you have to disable NMI interrupts. But how is this possible? They are Non-Maskable after all. The NMI interrupt is edge-sensitive, the processor jumps to NMI handler only when the -NMI line drops from +5V to ground. Just cause a NMI with CIA2's timer, but don't read the Interrupt Control register. If you need to read $DD0D in your program, you must add a NMI handler just in case the user presses RESTORE. And don't forget to raise the -NMI line upon exiting the program. This can be done automatically by the latter two of the three following examples. ; Returning via VIC sprite 7 X coordinate register Initialization: ; This is executed when I/O is switched on LDA #$60 STA $D015 ; Write RTS to VIC register $15. Exiting: ; NOTE: This procedure must start at VIC register ; $12. You have multiple alternatives, as the VIC ; appears in memory at $D000+$40*n, where $0<=n<=$F. PLA ; Pull the saved 6510 I/O register state from stack STA $01 ; Restore original memory bank configuration ; Now the processor fetches the RTS command from the ; VIC register $15. ; Returning via CIA 2's SP register (assuming that CNT2 is stable) Initialization: ; This is executed when I/O is switched on LDA $DD0E ; CIA 2's Control Register A AND #$BF ; Set Serial Port to input STA $DD0E ; (make the SP register to act as a memory place) LDA #$60 STA $DD0C ; Write RTS to CIA 2 register $C. Exiting: ; NOTE: This procedure must start at CIA 2 register ; $9. As the CIA 2 appears in memory at $DD00+$10*n, ; where 0<=n<=$F, you have sixteen alternatives. PLA STA $01 ; Restore original memory bank configuration ; Now the processor fetches the RTS command from ; the CIA 2 register $C. ; Returning via CIA 2's SP register, stopping the Timer A ; and forcing SP2 and CNT2 to output Initialization: ; This is executed when I/O is switched on LDA $DD0E ; CIA 2's Control Register A AND #$FE ; Stop Timer A ORA #$40 ; Set Serial Port to output STA $DD0E ; (make the SP register to act as a memory place) LDA #$60 STA $DD0C ; Write RTS to CIA register $C. Exiting: ; NOTE: This procedure must start at CIA 2 register ; $9. As the CIA 2 appears in memory at $DD00+$10*n, ; where, 0<=n<=$F, you have sixteen alternatives. PLA STA $01 ; Restore original memory bank configuration ; Now the processor fetches the RTS command from ; the CIA 2 register $C. For instance, if you want to make a highly compatible fast loader, make the ILOAD vector ($0330) point to the beginning of the stack area. Remember that the BASIC interpreter uses the first bytes of stack while converting numbers to text. A good address is $0120. Robust programs practically never use so much stack that it could corrupt this routine. Usually only crunched programs (demos and alike) use all stack in the decompression phase. They also make use of the $D000-$DFFF area. This stack routine will jump to your routine at $D000-$DFFF, as described above. For performance's sake, copy the whole byte transfer loop to the swap area, e.g. $C000-$C1FF, and call that subroutine after doing the preliminary work. But what about files that load over $C000-$C1FF? Wouldn't that destroy the transfer loop and jam the machine? Not necessarily. If you copy those bytes to your swap area at $D000-$DFFF, they will be loaded properly, as your program restores the original $C000-$C1FF area. If you want to make your program user-friendly, put a vector initialization routine to the stack area as well, so that the user can restore the fast loader by issuing a SYS command, rather than loading it each time he has pressed STOP & RESTORE or RESET. NOTES See MCS 6500 Microcomputer Family Programming Manual for more information. There is also a table showing functional description and timing for complete 6510 instruction set on C=Hacking magazine issue 1/92 (available via FTP at ccosun.caltech.edu:/pub/rknop/hacking.mag/ and nic.funet.fi:/pub/cbm/c=hacking/). References: C64 Memory Maps C64 Programmer's Reference Guide pp. 262-267 6510 Block Diagram C64 Programmer's Reference Guide p. 404 Instruction Set C64 Programmer's Reference Guide pp. 416-417 C=Hacking Volume 1, issue #1, 1/92 C=Lehti magazine 4/87 =============================================================================