==============================================================================
64 Documentation
by Jarkko Sonninen, Jouko Valta, John West, and Marko M"akel"a
   (sonninen@lut.fi, jopi@stekt.oulu.fi, john@ucc.gu.uwa.edu.au,
    msmakela@hylk.helsinki.fi)

[Ed's Note: I'm leaving this file as is because of its intention to
serve as a reference guide, and not necessarily to be presented in
article format. The detail and clarity with which the authors have
presented the material is wonderful!!]

#
# $Id: 64doc,v 1.3 93/06/21 13:37:18 jopi Exp $
#
# This file is part of Commodore 64 emulator
#      and Program Development System.
#
# See README for copyright notice
#
# This file contains documentation for 6502/6510/8502 instruction set.
#
# Written by 
#   Jarkko Sonninen (sonninen@lut.fi)
#   Jouko Valta     (jopi@stekt.oulu.fi)
#   John West       (john@ucc.gu.uwa.edu.au)
#   Marko M"akel"a  (msmakela@hylk.helsinki.fi)
#
# $Log: 64doc,v $
# Revision 1.3  93/06/21  13:37:18  jopi
#  X64 version 0.2 PL 0
# 
# Revision 1.2  93/06/21  13:07:15  jopi
# *** empty log message ***
# 
#
#

                6510 Instructions by Addressing Modes

        ++++++++ Positive ++++++++++    -------- Negative ----------
        00      20      40      60      80      a0      c0      e0      mode

+00     BRK     JSR     RTI     RTS     NOP*    LDY     CPY     CPX  Impl/immed
+01     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC  (indir,x)
+02      t       t       t       t      NOP*t   LDX     NOP*t   NOP*t  ? /immed
+03     SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB* (indir,x)
+04     NOP*    BIT     NOP*    NOP*    STY     LDY     CPY     CPX  Zeropage
+05     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     -"-
+06     ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     -"-
+07     SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    -"-

+08     PHP     PLP     PHA     PLA     DEY     TAY     INY     INX  Implied
+09     ORA     AND     EOR     ADC     NOP*    LDA     CMP     SBC  Immediate
+0a     ASL     ROL     LSR     ROR     TXA     TAX     DEX     NOP  Accu/impl
+0b     ANC**   ANC**   ASR**   AR
7[7m Cancel [m8

/duck/mailserv/hacking> 
7[7m Interrupt [m8

/duck/mailserv/hacking> 

                6510 Instructions by Addressing Modes

        ++++++++ Positive ++++++++++    -------- Negative ----------
        00      20      40      60      80      a0      c0      e0      mode

+00     BRK     JSR     RTI     RTS     NOP*    LDY     CPY     CPX  Impl/immed
+01     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC  (indir,x)
+02      t       t       t       t      NOP*t   LDX     NOP*t   NOP*t  ? /immed
+03     SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB* (indir,x)
+04     NOP*    BIT     NOP*    NOP*    STY     LDY     CPY     CPX  Zeropage
+05     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     -"-
+06     ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     -"-
+07     SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    -"-

+08     PHP     PLP     PHA     PLA     DEY     TAY     INY     INX  Implied
+09     ORA     AND     EOR     ADC     NOP*    LDA     CMP     SBC  Immediate
+0a     ASL     ROL     LSR     ROR     TXA     TAX     DEX     NOP  Accu/impl
+0b     ANC**   ANC**   ASR**   ARR**   ANE**   LXA**   SBX**   SBC* Immediate
+0c     NOP*    BIT     JMP     JMP     STY     LDY     CPY     CPX  Absolute
+0d     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     -"-
+0e     ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     -"-
+0f     SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    -"-

+10     BPL     BMI     BVC     BVS     BCC     BCS     BNE     BEQ  Relative
+11     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC  (indir),y
+12      t       t       t       t       t       t       t       t      ?
+13     SLO*    RLA*    SRE*    RRA*    SHA**   LAX*    DCP*    ISB* (indir),y
+14     NOP*    NOP*    NOP*    NOP*    STY     LDY     NOP*    NOP* Zeropage,x
+15     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     -"-
+16     ASL     ROL     LSR     ROR     STX y)  LDX y)  DEC     INC     -"-
+17     SLO*    RLA*    SRE*    RRA*    SAX* y) LAX* y) DCP     ISB     -"-

+18     CLC     SEC     CLI     SEI     TYA     CLV     CLD     SED  Implied
+19     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC  Absolute,y
+1a     NOP*    NOP*    NOP*    NOP*    TXS     TSX     NOP*    NOP* Implied
+1b     SLO*    RLA*    SRE*    RRA*    SHS**   LAS**   DCP*    ISB* Absolute,y
+1c     NOP*    NOP*    NOP*    NOP*    SHY**   LDY     NOP*    NOP* Absolute,x
+1d     ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     -"-
+1e     ASL     ROL     LSR     ROR     SHX**y) LDX y)  DEC     INC     -"-
+1f     SLO*    RLA*    SRE*    RRA*    SHA**y) LAX* y) DCP     ISB     -"-

        Legend:

        t       Jams the machine
        *t      Jams very rarely
        *       Undocumented command
        **      Unusual operation
        y)      indexed using IY instead of IX



                6510/8502 Undocumented Commands

         -- A brief explanation about what may happen while
                using don't care states.


        ANE $8B         AC = (AC | #$EE) & IX & #byte
                        same as
                        AC = ((AC & #$11 & IX) | ( #$EE & IX)) & #byte

                        In real 6510/8502 the internal parameter #$11 may
                        occasionally be #$10, #$01 or even #$00. This occurs
                        probably when the VIC halts the processor right between
                        the two clock cycles of this instruction.

        LXA $AB         C=Lehti:   AC = IX = ANE
                        Alternate: AC = IX = (AC & #byte)

                        TXA and TAX have to be responsible for these.


        SHA $93,$9F     Store (AC & IX & (ADDR_HI + 1))
        SHX $9E         Store (IX & (ADDR_HI + 1))
        SHY $9C         Store (IY & (ADDR_HI + 1))
        SHS $9B         SHA and TXS, where X is replaced by (AC & IX).

                        Note: The value to be stored is copied also
                        to ADDR_HI if page boundary is crossed.


        SBX $CB         Carry and Decimal flags are ignored but set in 
                        substraction. This is due to the CMP command,
                        which is executed instead of the real SBC.


 Many undocumented commands do not use AND between registers, the CPU just
 throws the bytes to a bus simultaneously and lets the open-collector drivers
 perform the AND. I.e. the command called 'SAX', which is in the STORE section
 (opcodes $A0...$BF), stores the result of (AC & IX) by this way.

 More fortunate is its opposite, 'LAX' which just loads a byte simultaeously
 into both AC and IX.


        $CB  SBX   IX <- (AC & IX) - Immediate

 The 'SBX' ($CB) may seem to be very complex operation, even though it is
 combination of subtraction of accumulator and parameter, as in the 'CMP'
 instruction, and the command 'DEX'. As a result, both AC and IX are connected
 to ALU but only the subtraction takes place. Since the comparison logic was
 used, the result of subtraction should be normally ignored, but the 'DEX' now
 happily stores to IX the value of (AC & IX) - Immediate.
 That is why this instruction does not have any decimal mode, and it does not
 affect the V flag. Also Carry flag is ignored in the subtraction but set
 according to the result.

 Proof:

 These test programs show if your machine is compatible with ours
 regarding the opcode $CB. The first test, vsbx, shows that SBX does
 not affect the V flag. The latter one, sbx, shows the rest of our
 theory. The vsbx test tests 33554432 SBX combinations (16777216
 different AC, IX and Immediate combinations, and two different V flag
 states), and the sbx test doubles that amount (16777216*4 D and C flag
 combinations). Both tests have run successfully on a C64 and a Vic20.
 They ought to run on C16, +4 and the PET series as well. The tests
 stop with BRK, if the opcode $CB does not work expectedly. Successful
 operation ends in RTS. As the tests are very slow, they print dots on
 the screen while running so that you know that the machine has
 not jammed. On computers running at 1 MHz, the first test prints
 approximately one dot every four seconds and a total of 2048 dots,
 whereas the second one prints half that amount, one dot every seven seconds.

 If the tests fail on your machine, please let us know your processor's part
 number and revision. If possible, save the executable (after it has stopped
 with BRK) under another name and send it to us so that we know at which stage
 the program stopped.

 The following program is a Commodore 64 executable that Marko M"akela
 developed when trying to find out how the V flag is affected by SBX.
 (It was believed that the SBX affects the flag in a weird way, and this
 program shows how SBX sets the flag differently from SBC.)
 You may find the subroutine at $C150 useful when researching other 
 undocumented instructions' flags. Run the program in a machine language
 monitor, as it makes use of the BRK instruction. The result tables will be
 written on pages $C2 and $C3.

 Other undocumented instructions usually cause two preceding opcodes being
 executed. However 'NOP' seems to completely disappear from 'SBC' code $EB.

 The most difficult to comprehend are the rest of the instructions located on
 the '$0B' line.

 All the instructions located at the positive (left) side of this line should
 rotate either memory or the accumulator, but the addressing mode turns out
 to be immediate!
 No problem. Just read the operand, let it be ANDed with the accumulator
 and finally use accumulator addressing mode for the instructions above them.

 The rest two instructions on the same line, called 'ANE' and 'LXA' ($8B and
 $AB respectively) often give quite unpredictable results.
 However, the most usual operation is to store ((A | #$ee) & X & #$nn) to
 accumulator. Note that this does not work reliably in a real 64!
 On 8502 opcode $8B uses values 8C,CC, EE, and occasionally 0C and 8E for the
 OR instead of EE,EF,FE and FF used by 6510. With 8502 running at 2 MHz #$EE is
 always used.
 Opcode $AB does not cause this OR taking place on 8502 while 6510 always 
 performs it.  Note that this behaviour depends on chip revision.

 Let's take a closer look at $8B (6510).

        AC <- IX & D & (AC | VAL)

        where VAL comes from this table:

       IX high  D high  D low   VAL
        even     even    ---    $EE (1)
        even     odd     ---    $EE
        odd      even    ---    $EE
        odd      odd      0     $EE
        odd      odd     not 0  $FE (2)

 (1) If the bottom 2 bits of AC are both 1, then the LSB of the result may
    be 0. The values of IX and D are different every time I run the test.
    This appears to be very rare.
 (2) VAL is $FE most of the time. Sometimes it is $EE - it seems to be random,
    not related to any of the data. This is much more common than (1).

  In decimal mode, VAL is usually $FE.


 Two different functions has been discovered for LAX, opcode $AB. One is
 AC = IX = ANE (see above) and the other, encountered with 6510 and 8502,
 is less complicated AC = IX = (AC & #byte). However, according to what is
 reported, the version altering only the lowest bits of each nybble seems to
 be more common.

 What happens, is that $AB loads a value into both AC and IX, ANDing
 the low bit of each nybble with the corresponding bit of the old AC. However,
 there are exceptions. Sometimes the low bit is cleared even when AC contains
 a '1', and sometimes other bits are cleared. The exceptions seem random (they
 change every time I run the test). Oops - that was in decimal mode. Much
 the same with D=0.

 What causes the randomness?  Probably it is that it is marginal logic levels -
 when too much wired-anding goes on, some of the signals get very close to
 the threshold. Perhaps we're seeing some of them step over it. The low bit
 of each nybble is special, since it has to cope with carry differently
 (remember decimal mode). We never see a '0' turn into a '1'.

 Since these instructions are unpredictable, they should not be used.


 There is still very strange instruction left, the one named SHA/X/Y, which is
 the only one with only indexed addressing modes. Actually, the commands 'SHA',
 'SHX' and 'SHY' are generated by the indexing algorithm.

 While using indexed addressing, effective address for page boundary crossing
 is calculated as soon as possible so it does not slow down operation.
 As a result, in the case of SHA/X/Y, the address and data are prosessed at the
 same time making AND between the to take place. Thus, the value to be stored
 by SAX, for example, is in fact (AC & IX & (ADDR_HI + 1)).
 On page boundary crossing the same value is copied also to high byte of the
 effective address.



  Register selection for load and store

   bit1 bit0    AC IX IY
    0   0             x
    0   1          x
    1   0       x
    1   1       x  x

 So, AC and IX are selected by bits 1 and 0 respectively, while ~(bit1 | bit0)
 enables IY.

 Indexing is determined by bit4, even in relative addressing mode, which
 is one kind of indexing.

 Lines containing opcodes xxx000x1 (01 and 03) are treated as absolute after
 the effective address has been loaded into CPU.

 Zeropage,y and Absolute,y (codes 10x1 x11x) are distinquished by bit5.


                 Decimal mode in NMOS 6500 series

   Most sources claim that the NMOS 6500 series sets the N, V and Z flags
 unpredictably. Of course, this is not true. While testing how the flags are
 set, I also wanted to see what happens if you use illegal BCD values.

   ADC works in Decimal mode in a quite complicated way. It is amazing how it
 can do that all in a single cycle. Here's a pseudo code version of the
 instruction:

	AC	accumulator
	AL	low nybble of accumulator
	AH	high nybble of accumulator

	C	Carry flag
	Z	Zero flag
	V	oVerflow flag
	N	Negative flag

	s	value to be added to accumulator

	AL = (AC & 15) + (s & 15) + C;		! Calculate the lower nybble.

	if (AL > 9)				! BCD fixup
	  AL += 6;				! for lower nybble

	AH = (A >> 4) + (s >> 4) + (AL > 15);	! Calculate the upper nybble.

	Z = (AC + s + C != 0);			! Zero flag is set just
						! like in Binary mode.

	! Negative and Overflow flags are set with the same logic than in
	! Binary mode, but after fixing the lower nybble.

	N = (AH & 8 != 0);
	V = ((AH & 8) ^ (A >> 4)) && (!(A ^ s) & 128);

	if (AH > 9)				! BCD fixup
	  AH += 6;				! for upper nybble

	

	! Carry is the only flag set after fixing the result.

	C = (AH > 15);
	AC = ((AH << 4) | (AL & 15)) & 255;


   The C flag is set as the quiche eaters expect, but the N and V flags 
 are set after fixing the lower nybble but before fixing the upper one.
 They use the same logic than binary mode ADC. The Z flag is set before
 any BCD fixup, so the D flag does not have any influence on it.

 Proof: The following test program tests all 131072 ADC combinations in
        Decimal mode, and aborts with BRK if anything breaks this theory.
        If everything goes well, it ends in RTS.

   All programs in this chapter have been successfully tested on a Vic20
 and a Commodore 64. They should run on C16, +4 and on the PET series as
 well. If not, please report the problem to Marko M"akel"a. Each test in
 this chapter should run in less than a minute at 1 MHz.

   SBC is much easier. Just like CMP, its flags are not affected by
 the D flag.

 Proof:

   The only difference in SBC's operation in decimal mode from binary mode
 is the result-fixup:

	AC	accumulator
	AL	low nybble of accumulator
	AH	high nybble of accumulator

	C	Carry flag
	Z	Zero flag
	V	oVerflow flag
	N	Negative flag

	s	value to be added to accumulator

	AL = (AC & 15) - (s & 15) - !C;		! Calculate the lower nybble.

	if (AL & 16)				! BCD fixup
	  AL -= 6;				! for lower nybble

	AH = (AC >> 4) - (s >> 4) - (AL > 15);	! Calculate the upper nybble.

	if (AH & 16)				! BCD fixup
	  AH -= 6;				! for upper nybble

	! Flags are set just like in Binary mode.

	C = (AC - s - !C > 255);
	Z = (AC - s - !C != 0);
	V = ((AC - s - !C) ^ s) && ((AC ^ s) & 128);
	N = ((AC - s - !C) & 128);

	AC = ((AH << 4) | (AL & 15)) & 255;


   Again Z flag is set before any BCD fixup. The N and V flags are set
 at any time before fixing the high nybble. The C flag may be set in any
 phase.

   Decimal subtraction is easier than decimal addition, as you have to
 make the BCD fixup only when a nybble flows over. In decimal addition,
 you had to verify if the nybble was greater than 9. The processor has
 an internal "half carry" flag for the lower nybble, and it uses it to
 trigger the BCD fixup. When calculating with legal BCD values, the
 lower nybble cannot flow over again when fixing it. So the processor
 does not handle overflows while performing the fixup. Similarly, the
 BCD fixup occurs in the high nybble only if the value flows over,
 i.e. when the C flag will be cleared.

   Because SBC's flags are not affected by the Decimal mode flag, you
 could guess that CMP uses the SBC logic, only setting the C flag
 first. But the SBX instruction shows that CMP also temporarily clears
 the D flag, although it is totally unnecessary.

   The following program, which tests SBC's result and flags,
 contains the 6502 version of the pseudo code example above.


   Obviously the undocumented instructions RRA (ROR+ADC) and ISB
 (INC+SBC) have inherited also the decimal operation from the official
 instructions ADC and SBC. The program droradc shows this statement
 for ROR, and the dincsbc test shows this for ISB. Finally,
 dincsbc-deccmp shows that ISB's and DCP's (DEC+CMP) flags are not
 affected by the D flag.


                                 6510 features
   o  PHP always pushes the Break (B) flag as a `1' to the stack.
      Jukka Tapanim"aki claimed in C=lehti issue 3/89, on page 27 that the
      processor makes a logical OR between the status register's bit 4 
      and the bit 8 of the stack register (which is always 1).

   o  Indirect addressing modes do not handle page boundary crossing at all.
      When the parameter's low byte is $FF, the effective address wraps 
      around and the CPU fetches high byte from $xx00 instead of $xx00+$0100.
      E.g. JMP ($01FF) fetches PCL from $01FF and PCH from $0100,
      and LDA ($FF),Y fetches the base address from $FF and $00.

   o  Indexed zero page addressing modes never fix the page address on
      crossing the zero page boundary.
      E.g. LDX #$01 : LDA ($FF,X) loads the effective address from $00 and $01.

   o  The processor always fetches the byte following a relative branch
      instruction. If the branch is taken, the processor reads then the
      opcode from the destination address. If page boundary is crossed, it
      first reads a byte from the old page from a location that is bigger
      or smaller than the correct address by one page.

   o  If you cross a page boundary in any other indexed mode,
      the processor reads an incorrect location first, a location that is
      smaller by one page.

   o  Read-Modify-Write instructions write unmodified data, then modified
      (so INC effectively does LDX loc;STX loc;INX;STX loc)

   o  -RDY is ignored during writes
      (This is why you must wait 3 cycles before doing any DMA -
      the maximum number of consecutive writes is 3, which occurs
      during interrupts except -RESET.)

   o  Some undefined opcodes may give really unpredictable results.

   o  All registers except the Program Counter remain the same after -RESET.
      (This is why you must preset D and I flags in the RESET handler.)


                Different CPU types

 The Rockwell data booklet 29651N52 (technical information about R65C00 
 microprocessors, dated October 1984), lists the following differences between
 NMOS R6502 microprocessor and CMOS R65C00 family:

 1. Indexed addressing across page boundary.
        NMOS: Extra read of invalid address.
        CMOS: Extra read of last instruction byte.

 2. Execution of invalid op codes.
        NMOS: Some terminate only by reset. Results are undefined.
        CMOS: All are NOPs (reserved for future use).

 3. Jump indirect, operand = XXFF.
        NMOS: Page address does not increment.
        CMOS: Page address increments and adds one additional cycle.

 4. Read/modify/write instructions at effective address.
        NMOS: One read and two write cycles.
        CMOS: Two read and one write cycle.

 5. Decimal flag.
        NMOS: Indeterminate after reset.
        CMOS: Initialized to binary mode (D=0) after reset and interrupts.

 6. Flags after decimal operation.
        NMOS: Invalid N, V and Z flags.
        CMOS: Valid flag adds one additional cycle.

 7. Interrupt after fetch of BRK instruction.
        NMOS: Interrupt vector is loaded, BRK vector is ignored.
        CMOS: BRK is executed, then interrupt is executed.



                6510 Instruction Timing

   The NMOS 6500 series uses a sort of pipelining. It always reads two
 bytes for each instruction. If the instruction was only two cycles long,
 the opcode for the next instruction can be fetched during the third cycle.
 As most instructions are two or three bytes long, this is quite efficient.
 But one-byte instructions take two cycles, even though they could be
 performed in one.

   The following tables show what happens on the bus while executing different
 kinds of instructions. The tables having "???" marks at any cycle may be
 totally wrong, but the rest should be absolutely accurate.


  Interrupts

     NMI and IRQ both take 7 cycles. Their timing diagram is much like
     BRK's. IRQ will be executed only when the I flag is clear.
     The processor will usually wait for the current instruction to
     complete before executing the interrupt sequence.

     There is one exception to this rule: If a NMI occurs while the
     processor is executing a BRK, the two interrupts may take 7 to 14
     cycles to execute, and the processor may totally lose the BRK
     instruction. Probably the results are similar also with IRQ.
     Marko M"akel"a experimented with BRK/NMI, but he still hasn't
     analyzed the results.

     RESET does not push program counter on stack, and we don't know how
     long it lasts. But we know that RESET preserves all registers
     (except PC).


  Accumulator or implied addressing

     BRK

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PC     R  fetch opcode, increment PC
        2    PC     R  read next instruction byte (and throw it away),
		       increment PCR
        3  $0100,S  W  push PCH on stack (with B flag set), decrement S
        4  $0100,S  W  push PCL on stack, decrement S
        5  $0100,S  W  push P on stack, decrement S
        6   $FFFE   R  fetch PCL
        7   $FFFF   R  fetch PCH


     RTI

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  read next instruction byte (and throw it away),
	      	       increment PCR
        3  $0100,S  R  increment S
        4  $0100,S  R  pull P from stack, increment S
        5  $0100,S  R  pull PCL from stack, increment S
        6  $0100,S  R  pull PCH from stack


     RTS

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  read next instruction byte (and throw it away),
		       increment PCR
        3  $0100,S  R  increment S
        4  $0100,S  R  pull PCL from stack, increment S
        5  $0100,S  R  pull PCH from stack
        6    PCR    R  increment PCR


     PHA, PHP

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  read next instruction byte (and throw it away),
		       increment PCR
        3  $0100,S  W  push register on stack, decrement S


     PLA, PLP

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  read next instruction byte (and throw it away),
		       increment PCR
        3  $0100,S  R  increment S
        4  $0100,S  R  pull register from stack

        Note: The 3rd cycle does NOT read from PCR.
              Maybe it reads from $0100,S.


     Other instructions

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  read next instruction byte (and throw it away),
		       increment PCR


  Immediate addressing

        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch value, increment PCR


  Absolute addressing

     JMP

        #  address R/W description
       --- ------- --- -------------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch address's low byte to latch, increment PCR
        3    PCR    R  copy latch to PCL, fetch address's high byte to
		       latch, increment PCR, copy latch to PCH


     JSR

        #  address R/W description
       --- ------- --- -------------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch address's low byte to latch, increment PCR
        3  $0100,S  R  store latch
        4  $0100,S  W  push PCH on stack, decrement S
        5  $0100,S  W  push PCL on stack, decrement S
        6    PCR    R  copy latch to PCL, fetch address's high byte to
		       latch, increment PCR, copy latch to PCH


     Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
                        LAX, NOP)

        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch low byte of address, increment PCR
        3    PCR    R  fetch high byte of address, increment PCR
        4  address  R  read from effective address


     Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
                                     SLO, SRE, RLA, RRA, ISB, DCP)

        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch low byte of address, increment PCR
        3    PCR    R  fetch high byte of address, increment PCR
        4  address  R  read from effective address
        5  address  W  write the value back to effective address,
                       and do the operation on it
        6  address  W  write the new value to effective address


     Write instructions (STA, STX, STY, SAX)
    
        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch low byte of address, increment PCR
        3    PCR    R  fetch high byte of address, increment PCR
        4  address  W  write register to effective address


  Zero page addressing

     Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
                        LAX, NOP)

        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch address, increment PCR
        3  address  R  read from effective address


     Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
                                     SLO, SRE, RLA, RRA, ISB, DCP)

        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch address, increment PCR
        3  address  R  read from effective address
        4  address  W  write the value back to effective address,
                       and do the operation on it
        5  address  W  write the new value to effective address


     Write instructions (STA, STX, STY, SAX)
    
        #  address R/W description
       --- ------- --- ------------------------------------------
        1    PCR    R  fetch opcode, increment PCR
        2    PCR    R  fetch address, increment PCR
        3  address  W  write register to effective address

  Zero page indexed addressing

     Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
                        LAX, NOP)

        #   address  R/W description
       --- --------- --- ------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch address, increment PCR
	3   address   R  read from address, add index register to it
        4  address+I* R  read from effective address

       Notes: I denotes either index register (X or Y).

              * The high byte of the effective address is always zero,
                i.e. page boundary crossings are not handled.


     Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
                                     SLO, SRE, RLA, RRA, ISB, DCP)

        #   address  R/W description
       --- --------- --- ---------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch address, increment PCR
	3   address   R  read from address, add index register X to it
        4  address+X* R  read from effective address
        5  address+X* W  write the value back to effective address,
                         and do the operation on it
        6  address+X* W  write the new value to effective address

       Note: * The high byte of the effective address is always zero,
               i.e. page boundary crossings are not handled.


     Write instructions (STA, STX, STY, SAX)

        #   address  R/W description
       --- --------- --- -------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch address, increment PCR
	3   address   R  read from address, add index register to it
        4  address+I* W  write to effective address

       Notes: I denotes either index register (X or Y).

              * The high byte of the effective address is always zero,
                i.e. page boundary crossings are not handled.


  Absolute indexed addressing

     Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
                        LAX, LAE, SHS, NOP)

        #   address  R/W description
       --- --------- --- ------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch low byte of address, increment PCR
        3     PCR     R  fetch high byte of address,
                         add index register to low address byte,
			 increment PCR
        4  address+I* R  read from effective address,
                         fix the high byte of effective address
        4+ address+I  R  re-read from effective address

       Notes: I denotes either index register (X or Y).

              * The high byte of the effective address may be invalid
                at this time, i.e. it may be smaller by $100.

              + This cycle will be executed only if the effective address
                was invalid during cycle #4, i.e. page boundary was crossed.


     Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
                                     SLO, SRE, RLA, RRA, ISB, DCP)

        #   address  R/W description
       --- --------- --- ------------------------------------------
        1    PCR      R  fetch opcode, increment PCR
        2    PCR      R  fetch low byte of address, increment PCR
        3    PCR      R  fetch high byte of address,
                         add index register X to low address byte,
		         increment PCR
        4  address+X* R  read from effective address,
                         fix the high byte of effective address
        5  address+X  R  re-read from effective address
        6  address+X  W  write the value back to effective address,
                         and do the operation on it
        7  address+X  W  write the new value to effective address

       Notes: * The high byte of the effective address may be invalid
                at this time, i.e. it may be smaller by $100.


     Write instructions (STA, STX, STY, SHA, SHX, SHY)

        #   address  R/W description
       --- --------- --- ------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch low byte of address, increment PCR
        3     PCR     R  fetch high byte of address,
                         add index register to low address byte,
			 increment PCR
        4  address+I* R  read from effective address,
                         fix the high byte of effective address
        5  address+I  W  write to effective address

       Notes: I denotes either index register (X or Y).

              * The high byte of the effective address may be invalid
                at this time, i.e. it may be smaller by $100. Because
		the processor cannot undo a write to an invalid address,
                it always reads from the address first.


  Relative addressing (BCC, BCS, BNE, BEQ, BPL, BMI, BVC, BVS)

        #   address  R/W description
       --- --------- --- ---------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch operand, increment PCR
        3     PCR     R  Fetch opcode of next instruction,
                         If branch is taken, add operand to PCL.
			 Otherwise increment PCR.
        3+    PCR*    R  Fetch opcode of next instruction.
                         Fix PCH. If it did not change, increment PCR.
        3!    PCR     R  Fetch opcode of next instruction,
			 increment PCR.

       Notes: * The high byte of Program Counter (PCH) may be invalid
                at this time, i.e. it may be smaller or bigger by $100.

              + If branch is taken, this cycle will be executed.

              ! If branch occurs to different page, this cycle will be
                executed.


  Indexed indirect addressing

     Read instructions (LDA, ORA, EOR, AND, ADC, CMP, SBC, LAX)

        #    address   R/W description
       --- ----------- --- ------------------------------------------
        1      PCR      R  fetch opcode, increment PCR
        2      PCR      R  fetch pointer address, add X to it,
			   increment PCR
        3      ???      R  internal operation
        4   pointer+X   R  fetch effective address low
        5  pointer+X+1  R  fetch effective address high
        6    address    R  read from effective address

       Note: The effective address is always fetched from zero page,
             i.e. the zero page boundary crossing is not handled.

     Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)

        #    address   R/W description
       --- ----------- --- ------------------------------------------
        1      PCR      R  fetch opcode, increment PCR
        2      PCR      R  fetch pointer address, add X to it,
			   increment PCR
        3      ???      R  internal operation
        4   pointer+X   R  fetch effective address low
        5  pointer+X+1  R  fetch effective address high
        6    address    R  read from effective address
        7    address    W  write the value back to effective address,
                           and do the operation on it
        8    address    W  write the new value to effective address

       Note: The effective address is always fetched from zero page,
             i.e. the zero page boundary crossing is not handled.

     Write instructions (STA, SAX)

        #    address   R/W description
       --- ----------- --- ------------------------------------------
        1      PCR      R  fetch opcode, increment PCR
        2      PCR      R  fetch pointer address, add X to it,
			   increment PCR
        3      ???      R  internal operation
        4   pointer+X   R  fetch effective address low
        5  pointer+X+1  R  fetch effective address high
        6    address    W  write to effective address

       Note: The effective address is always fetched from zero page,
             i.e. the zero page boundary crossing is not handled.

  Indirect indexed addressing

     Read instructions (LDA, EOR, AND, ORA, ADC, SBC, CMP)

        #    address   R/W description
       --- ----------- --- ------------------------------------------
        1      PCR      R  fetch opcode, increment PCR
        2      PCR      R  fetch pointer address, increment PCR
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        5+  address+Y   R  read from effective address

       Notes: The effective address is always fetched from zero page,
              i.e. the zero page boundary crossing is not handled.

              * The high byte of the effective address may be invalid
                at this time, i.e. it may be smaller by $100.

              + This cycle will be executed only if the effective address
                was invalid during cycle #5, i.e. page boundary was crossed.


     Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)

        #    address   R/W description
       --- ----------- --- ------------------------------------------
        1      PCR      R  fetch opcode, increment PCR
        2      PCR      R  fetch pointer address, increment PCR
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address
        7   address+Y   W  write the value back to effective address,
                           and do the operation on it
        8   address+Y   W  write the new value to effective address

       Notes: The effective address is always fetched from zero page,
              i.e. the zero page boundary crossing is not handled.

              * The high byte of the effective address may be invalid
                at this time, i.e. it may be smaller by $100.


     Write instructions (STA, SHA)

        #    address   R/W description
       --- ----------- --- ------------------------------------------
        1      PCR      R  fetch opcode, increment PCR
        2      PCR      R  fetch pointer address, increment PCR
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address

       Notes: The effective address is always fetched from zero page,
              i.e. the zero page boundary crossing is not handled.

              * The high byte of the effective address may be invalid
                at this time, i.e. it may be smaller by $100.


  Absolute indirect addressing (JMP)

        #   address  R/W description
       --- --------- --- ------------------------------------------
        1     PCR     R  fetch opcode, increment PCR
        2     PCR     R  fetch pointer address low, increment PCR
        3     PCR     R  fetch pointer address high, increment PCR
        4   pointer   R  fetch low address to latch
        5  pointer+1* R  fetch PCH, copy latch to PCL

       Note: * The PCH will always be fetched from the same page
               than PCL, i.e. page boundary crossing is not handled.



                MEMORY MANAGEMENT

       normal                                                   ultimax
        1111    101x    011x    001x    1110    0100    1100    xx01
                1000            00x0
10000
------------------------------------------------------------------------
 F000
        Kernal  RAM     Kernal  RAM     Kernal  Kernal  Kernal  module
 E000
------------------------------------------------------------------------
 D000   I/O     I/O**   I/O     RAM     I/O     I/O     I/O     I/O
------------------------------------------------------------------------
 C000   RAM     RAM     RAM     RAM     RAM     RAM     RAM      -
------------------------------------------------------------------------
 B000
        BASIC   RAM     RAM     RAM     BASIC   module  module   -
 A000
------------------------------------------------------------------------
 9000
        RAM     RAM     RAM     RAM     module  RAM     module  module
 8000
------------------------------------------------------------------------
 7000

 6000
        RAM     RAM     RAM     RAM     RAM     RAM     RAM      -
 5000

 4000
------------------------------------------------------------------------
 3000    

 2000
        RAM     RAM     RAM     RAM     RAM     RAM     RAM      RAM    
 1000
    
 0000
------------------------------------------------------------------------

                **) Chargen not accessible by the CPU


                AUTOSTART CODE

  If memory places $8004 to $8008 contain 'CBM80' (C3 C2 CD 38 30),
  the RESET routine jumps to ($8000) and the default NMI handler jumps to
  ($8002).


                HOW REAL PROGRAMMERS ACKNOWLEDGE INTERRUPTS

  With RMW instructions:

        ; beginning of combined raster/timer interrupt routine
        LSR $D019       ; clear VIC interrupts, read raster interrupt flag to C
        BCS raster      ; jump if VIC caused an interrupt
        ...             ; timer interrupt routine

        Operational diagram of LSR $D019:

          #  data  address  R/W  
         --- ----  -------  ---  ---------------------------------
          1   4E     PCR     R   fetch opcode
          2   19    PCR+1    R   fetch address low
          3   D0    PCR+2    R   fetch address high
          4   xx    $D019    R   read memory
          5   xx    $D019    W   write the value back, rotate right
          6  xx/2   $D019    W   write the new value back

	The 5th cycle acknowledges the interrupt by writing the same
	value back. If only raster interrupts are used, the 6th cycle
	has no effect on the VIC.


  With indexed addressing:

        ; acknowledge interrupts to both CIAs
        LDX #$10
        LDA $DCFD,X

        Operational diagram of LDA $DCFD,X:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   BD     PCR     R   fetch opcode
          2   FD    PCR+1    R   fetch address low
          3   DC    PCR+2    R   fetch address high, add X to address low
          4   xx    $DC0D    R   read from address, fix high byte of address
          5   yy    $DD0D    R   read from right address


        ; acknowledge interrupts to CIA 2
        LDX #$10
        STA $DDFD,X

        Operational diagram of STA $DDFD,X:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   9D     PCR     R   fetch opcode
          2   FD    PCR+1    R   fetch address low
          3   DC    PCR+2    R   fetch address high, add X to address low
          4   xx    $DD0D    R   read from address, fix high byte of address
          5   ac    $DE0D    W   write to right address


  With branch instructions:

        ; acknowledge interrupts to CIA 2
                LDA #$00  ; clear N flag
                JMP $DD0A
        DD0A    BPL $DC9D ; branch
        DC9D    BRK       ; return

        You need the following preparations to initialize the CIA registers:

                LDA #$91  ; argument of BPL
                STA $DD0B
                LDA #$10  ; BPL
                STA $DD0A
                STA $DD08 ; load the ToD values from the latches
                LDA $DD0B ; jam the ToD display
                LDA #$7F
                STA $DC0D ; assure that $DC0D is $00

        Operational diagram of BPL $DC9D:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   10    $DD0A    R   fetch opcode
          2   91    $DD0B    R   fetch argument
          3   xx    $DD0C    R   fetch opcode, add argument to PCL
          4   yy    $DD9D    R   fetch opcode, fix PCH
        ( 5   00    $DC9D    R   fetch opcode )


        ; acknowledge interrupts to CIA 1
                LDA #$00  ; clear N flag
                JMP $DCFA
        DCFA    BPL $DD0D
        DD0D    BRK

        ; Again you need to set the ToD registers of CIA 1 and the
        ; Interrupt Control Register of CIA 2 first.

        Operational diagram of BPL $DD0D:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   10    $DCFA    R   fetch opcode
          2   11    $DCFB    R   fetch argument
          3   xx    $DCFC    R   fetch opcode, add argument to PCL
          4   yy    $DC0D    R   fetch opcode, fix PCH
        ( 5   00    $DD0D    R   fetch opcode )


        ; acknowledge interrupts to CIA 2 automagically
                ; preparations
                LDA #$7F
                STA $DD0D       ; disable CIA 2's all interrupt sources
                LDA $DD0E
                AND #$BE        ; ensure that $DD0C remains constant
                STA $DD0E       ; and stop the timer
                LDA #$FD
                STA $DD0C       ; parameter of BPL
                LDA #$10
                STA $DD0B       ; BPL
                LDA #$40
                STA $DD0A       ; RTI/parameter of LSR
                LDA #$46
                STA $DD09       ; LSR
                STA $DD08       ; load the ToD values from the latches
                LDA $DD0B       ; jam the ToD display
                LDA #$09
                STA $0318
                LDA #$DD
                STA $0319       ; change NMI vector to $DD09
                LDA #$FF        ; Try changing this instruction's operand
                STA $DD05       ; (see comment below).
                LDA #$FF
                STA $DD04       ; set interrupt frequency to 1/65536 cycles
                LDA $DD0E
                AND #$80
                ORA #$11
                LDX #$81
                STX $DD0D       ; enable timer interrupt
                STA $DD0E       ; start timer

                LDA #$00        ; To see that the interrupts really occur,
                STA $D011       ; use something like this and see how
        LOOP    DEC $D020       ; changing the byte loaded to $DD05 from
                BNE LOOP        ; #$FF to #$0F changes the image.

        When an NMI occurs, the processor jumps to Kernal code, which jumps to
        ($0318), which points to the following routine:

        DD09    LSR $40         ; clear N flag
                BPL $DD0A       ; Note: $DD0A contains RTI.

        Operational diagram of BPL $DD0A:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   10    $DD0B    R   fetch opcode
          2   11    $DD0C    R   fetch argument
          3   xx    $DD0D    R   fetch opcode, add argument to PCL
          4   40    $DD0A    R   fetch opcode, (fix PCH)


  With RTI:

        ; the fastest possible interrupt handler in the 6500 family
                ; preparations
		SEI
		LDA $01		; disable ROM and enable I/O
		AND #$FD
		ORA #$05
		STA $01
                LDA #$7F
                STA $DD0D       ; disable CIA 2's all interrupt sources
                LDA $DD0E
                AND #$BE        ; ensure that $DD0C remains constant
                STA $DD0E       ; and stop the timer
                LDA #$40
                STA $DD0C       ; store RTI to $DD0C
                LDA #$0C
                STA $FFFA
                LDA #$DD
                STA $FFFB       ; change NMI vector to $DD0C
                LDA #$FF        ; Try changing this instruction's operand
                STA $DD05       ; (see comment below).
                LDA #$FF
                STA $DD04       ; set interrupt frequency to 1/65536 cycles
                LDA $DD0E
                AND #$80
                ORA #$11
                LDX #$81
                STX $DD0D       ; enable timer interrupt
                STA $DD0E       ; start timer

                LDA #$00        ; To see that the interrupts really occur,
                STA $D011       ; use something like this and see how
        LOOP    DEC $D020       ; changing the byte loaded to $DD05 from
                BNE LOOP        ; #$FF to #$0F changes the image.

        When an NMI occurs, the processor jumps to Kernal code, which jumps to
        ($0318), which points to the following routine:

        DD0C    RTI

        How on earth can this clear the interrupts? Remember, the processor
        always fetches two successive bytes for each instruction.

        A little more practical version of this is redirecting the NMI (or IRQ)
        to your own routine, whose last instruction is JMP $DD0C or JMP $DC0C.
        If you want to confuse more, change the 0 in the address to a
        hexadecimal digit different from the one you used when writing the RTI.

        Or you can combine the latter two methods:

        DD09    LSR $xx         ; xx is any appropriate BCD value 00-59.
                BPL $DCFC
        DCFC    RTI

        This example acknowledges interrupts to both CIAs.


   If you want to confuse the examiners of your code, you can use any of
 these techniques. Although these examples use no undefined opcodes, they do
 not run correctly on CMOS processors. However, the RTI example should run on
 65C02 and 65C816, and the latter branch instruction example might work as
 well.

   The RMW instruction method has been used in some demos, others were
 developed by Marko M"akel"a. His favourite is the automagical RTI method,
 although it does not have any practical applications, except for some
 time dependent data decryption routines for very complicated copy protections.



                MAKING USE OF THE I/O REGISTERS

   If you are making a resident program and want to make as invisible to the
 system as possible, probably the best method is keeping most of your code
 under the I/O area (in the RAM at $D000-$DFFF). You need only a short routine
 in the normally visible RAM that pushes the current value of the processor's
 I/O register $01 on stack, switches I/O and ROMs out and jumps to this area.
 Returning from the $D000-$DFFF area is easy even without any routine in the
 normally visible RAM area. Just write a RTS to an I/O register and return
 through it.

   But what if your program needs to use I/O? And how can you write the RTS
 to an I/O register while the I/O area is switched off? You need a swap area
 for your program in normally visible memory. The first thing your routine at
 $D000-$DFFF does is copying the I/O routines (or the whole program) to
 normally visible memory, swapping the bytes. For instance, if your I/O
 routines are initially at $D200-$D3FF, exchange the bytes at $D200-$D3FF
 with the contents of $C000-$C1FF. Now you can call the I/O routines from
 your routine at $D000-$DFFF, and the I/O routines can switch the I/O area
 temporarily on to access the I/O circuitry. And right before exiting your
 program at $D000-$DFFF swaps the old contents of that I/O routine area in,
 e.g. exchanges the memory areas $D200-$D3FF and $C000-$C1FF again.

   What I/O registers can you use for the RTS? There are two alternatives:
 8-bit VIC sprite registers or CIA serial port register. The CIA register is
 usually better, as changing the VIC registers might change the screen layout.
 However, also the SP register has some drawbacks: If the machine's CNT1 and
 CNT2 lines are connected to a frequency source, you must stop either CIA's
 Timer A to use the SP register method. Normally the 1st CIA's Timer A is the
 main hardware interrupt source. And if you use the Kernal's RS232, you cannot
 stop the 2nd CIA's Timer A either. Also, if you don't want to lose any CIA
 interrupts, remember that the RTS at SP register causes also the Interrupt
 Control Register to be read.

   Also keep in mind that the user could press RESTORE while the Kernal ROM
 and I/O areas are disabled. You could write your own NMI handler (using
 the NMI vector at $FFFA), but a fast loader that uses very tight timing
 would still stop working if the user pressed RESTORE in wrong time. So, to
 make a robust program, you have to disable NMI interrupts. But how is this
 possible? They are Non-Maskable after all. The NMI interrupt is
 edge-sensitive, the processor jumps to NMI handler only when the -NMI line
 drops from +5V to ground. Just cause a NMI with CIA2's timer, but don't
 read the Interrupt Control register. If you need to read $DD0D in your
 program, you must add a NMI handler just in case the user presses RESTORE.
 And don't forget to raise the -NMI line upon exiting the program. This can
 be done automatically by the latter two of the three following examples.
 

        ; Returning via VIC sprite 7 X coordinate register

        Initialization:   ; This is executed when I/O is switched on
                LDA #$60
                STA $D015 ; Write RTS to VIC register $15.

        Exiting:          ; NOTE: This procedure must start at VIC register
                          ; $12. You have multiple alternatives, as the VIC
                          ; appears in memory at $D000+$40*n, where $0<=n<=$F.

                PLA       ; Pull the saved 6510 I/O register state from stack
                STA $01   ; Restore original memory bank configuration
                          ; Now the processor fetches the RTS command from the
                          ; VIC register $15.


        ; Returning via CIA 2's SP register (assuming that CNT2 is stable)

        Initialization:   ; This is executed when I/O is switched on
                LDA $DD0E ; CIA 2's Control Register A
                AND #$BF  ; Set Serial Port to input
                STA $DD0E ; (make the SP register to act as a memory place)
                LDA #$60
                STA $DD0C ; Write RTS to CIA 2 register $C.

        Exiting:          ; NOTE: This procedure must start at CIA 2 register
                          ; $9. As the CIA 2 appears in memory at $DD00+$10*n,
                          ; where 0<=n<=$F, you have sixteen alternatives.
                PLA
                STA $01   ; Restore original memory bank configuration
                          ; Now the processor fetches the RTS command from
                          ; the CIA 2 register $C.


        ; Returning via CIA 2's SP register, stopping the Timer A
        ; and forcing SP2 and CNT2 to output

        Initialization:   ; This is executed when I/O is switched on
                LDA $DD0E ; CIA 2's Control Register A
                AND #$FE  ; Stop Timer A
                ORA #$40  ; Set Serial Port to output
                STA $DD0E ; (make the SP register to act as a memory place)
                LDA #$60
                STA $DD0C ; Write RTS to CIA register $C.

        Exiting:          ; NOTE: This procedure must start at CIA 2 register
                          ; $9. As the CIA 2 appears in memory at $DD00+$10*n,
                          ; where, 0<=n<=$F, you have sixteen alternatives.
                PLA
                STA $01   ; Restore original memory bank configuration
                          ; Now the processor fetches the RTS command from
                          ; the CIA 2 register $C.

        
   For instance, if you want to make a highly compatible fast loader, make
 the ILOAD vector ($0330) point to the beginning of the stack area. Remember
 that the BASIC interpreter uses the first bytes of stack while converting
 numbers to text. A good address is $0120. Robust programs practically never
 use so much stack that it could corrupt this routine. Usually only crunched
 programs (demos and alike) use all stack in the decompression phase. They
 also make use of the $D000-$DFFF area.

   This stack routine will jump to your routine at $D000-$DFFF, as described
 above. For performance's sake, copy the whole byte transfer loop to the swap
 area, e.g. $C000-$C1FF, and call that subroutine after doing the preliminary
 work. But what about files that load over $C000-$C1FF? Wouldn't that destroy
 the transfer loop and jam the machine? Not necessarily. If you copy those
 bytes to your swap area at $D000-$DFFF, they will be loaded properly, as
 your program restores the original $C000-$C1FF area.

   If you want to make your program user-friendly, put a vector initialization
 routine to the stack area as well, so that the user can restore the fast
 loader by issuing a SYS command, rather than loading it each time he has
 pressed STOP & RESTORE or RESET.



                NOTES

 See MCS 6500 Microcomputer Family Programming Manual for more information.
 There is also a table showing functional description and timing for complete
 6510 instruction set on C=Hacking magazine issue 1/92 (available via FTP at
 ccosun.caltech.edu:/pub/rknop/hacking.mag/ and
 nic.funet.fi:/pub/cbm/c=hacking/).


 References:
  C64 Memory Maps       C64 Programmer's Reference Guide pp. 262-267
  6510 Block Diagram    C64 Programmer's Reference Guide  p. 404
  Instruction Set       C64 Programmer's Reference Guide pp. 416-417
                        C=Hacking Volume 1, issue #1, 1/92
                        C=Lehti magazine 4/87
=============================================================================