Acorn User

ARM CODE TUTORIAL

by Martin Penny

Part 2 - The ARM Instruction Set I

After the description of BASIC's inline assembler in the first part of this introduction, we can now move on to a potentially more interesting subject - the ARM instruction set. I shall be going over most of the ARM instructions, including some not directly supported by the BASIC assembler; however, I feel that the floating-point maths instructions are perhaps a bit beyond the scope of an "introduction" to ARM assembly language. If you feel otherwise, let me know!

To start off with, we first need to know a little more about the ARM processors themselves - in particular, an overview of the internal design, and how they see the "outside world".

The ARM processors are 32-bit CPUs - all internal and external operations are based on 32-bit data. Transferring data to or from memory is done on a four byte, 32-bit word basis, with each and every transfer being done on a word-aligned basis - the address of the first byte is divisible by four. However, there are two ways of storing the bytes in memory - little-endian and big-endian. The ARM processors are little-endian, storing the least-significant byte in the first (lowest) address, the most-significant in the last (highest) address; big-endian processors store the bytes in the reverse order.
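As a rough illustration of the two byte orders, here is a short Python sketch (Python is used purely as a modelling language here; it is not part of the original article). The value 0x12345678 and the helper name is_word_aligned are arbitrary choices for the example.

```python
import struct

value = 0x12345678

# "<I" packs an unsigned 32-bit word least-significant byte first (little-endian)
little = struct.pack("<I", value)
# ">I" packs the same word most-significant byte first (big-endian)
big = struct.pack(">I", value)

def is_word_aligned(address):
    """Return True if the address is divisible by four."""
    return address % 4 == 0

print(little.hex())  # 78563412 - the order in which the ARM stores the bytes
print(big.hex())     # 12345678
```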

There are sixteen main 32-bit integer registers, labelled "R0" to "R15". The registers "R0" to "R12" have no preset uses, though many are used to pass parameters to or from BASIC or RISC OS routines - for example, BASIC copies the integer variables "A%" to "H%" into "R0" to "R7" on use of either "CALL" or "USR", and "USR" returns the contents of "R0" to the BASIC program on return from the machine code routine. Of the remaining three registers, "R13" is the stack pointer, "R14" the link register and "R15" the program counter. The stack is used to hold data temporarily, the link register holds the return address from subroutines, and the program counter is used to mark the current position in the program. Strictly speaking, "R15" doesn't hold the address of the current instruction, but the address of the instruction two places ahead of the current instruction - in other words, instead of holding "address", it holds "address+8"; this is down to the ARM's pipelining - the overlapping of the decoding and execution of instructions. Further private, "hidden" registers are used by the ARM processor during privileged operating modes - for example, handling interrupts - but, as they are not normally accessible to the programmer, they will not be covered in great detail here. The exact uses of the three dedicated registers will be covered in more detail during the instruction set description.

There is also a processor status register, which holds a number of flags; these are used to indicate the result of the most recent compare or test instruction, or to pass results back from RISC OS system calls. The main four accessible to the user are the negative ("N"), zero ("Z"), overflow ("V") and carry ("C") flags. The "N" flag is effectively a mirror of the most-significant bit of the result of the last test, and is used primarily in signed-value tests; the "Z" flag has a similarly obvious use - if the result from the last test is zero, the "Z" flag is set, otherwise it is cleared. The "V" flag is related to the "N" flag: it is set if the carry into the most-significant bit of the result differs from the carry out of it, and cleared otherwise; it is most often used in signed-value tests, to indicate an incorrectly signed result, though it is also used by RISC OS system calls to indicate an error of some kind. Lastly, the "C" flag holds the carry out of the most-significant bit of an addition; after a subtraction it works as an inverted borrow, being set when no borrow was needed and cleared when one was. The ARM processors have a number of other flags, some used to indicate which of the processor's interrupt modes are enabled, and others to indicate the current privilege mode; these are usually of relevance only when writing modules - for example, device drivers, or adding extra system calls - so shall be mentioned only in passing as appropriate.
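To make the four user flags a little more concrete, the following Python sketch models how they would come out of a 32-bit addition. It is only a model written for this explanation, not ARM code; the function name add_with_flags is hypothetical, and subtraction is left out to keep it short.

```python
MASK = 0xFFFFFFFF  # 32-bit register width

def add_with_flags(a, b):
    """Model a 32-bit addition and return (result, flags)."""
    a &= MASK
    b &= MASK
    full = a + b                 # unbounded Python integer
    result = full & MASK         # what would fit in a 32-bit register
    flags = {
        "N": result >> 31,                    # copy of bit 31 of the result
        "Z": int(result == 0),                # set when the result is zero
        "C": int(full > MASK),                # carry out of bit 31
        # signed overflow: both operands differ in sign from the result
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags

# Largest positive signed value plus one gives an incorrectly signed result
print(add_with_flags(0x7FFFFFFF, 1))
```

In the example call, the result has bit 31 set even though both inputs were positive, which is exactly the situation the "V" flag reports.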

A further point of importance is that the earlier ARM processors - the ARM2, ARM250 and ARM3 - have a 26-bit address bus, giving a 64Mbyte address range. As all memory operations are done on a 32-bit word basis, only 24 of these 26 address lines are significant. This allowed the designers to combine the program counter and processor status register into one; the two aspects can still be accessed separately, and these ARM processors do not get "mixed up" about the situation. This particular operating mode is usually referred to as "26-bit" mode, from the address bus size.
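The article does not spell out how the combined register is laid out, so the Python sketch below assumes the usual arrangement for 26-bit ARM processors: the "N", "Z", "C" and "V" flags in bits 31 to 28, the two interrupt-disable flags ("I" and "F") in bits 27 and 26, the word address in bits 25 to 2, and the processor mode in bits 1 and 0. The function name split_r15 is hypothetical.

```python
def split_r15(r15):
    """Split a 26-bit-mode R15 value into (pc, flags, mode).

    Assumed layout: NZCV in bits 31-28, I and F in bits 27-26,
    program counter in bits 25-2, processor mode in bits 1-0.
    """
    bit_positions = (("N", 31), ("Z", 30), ("C", 29), ("V", 28),
                     ("I", 27), ("F", 26))
    flags = {name: (r15 >> bit) & 1 for name, bit in bit_positions}
    pc = r15 & 0x03FFFFFC    # 24 significant address bits, word-aligned
    mode = r15 & 0x3
    return pc, flags, mode

pc, flags, mode = split_r15(0x60008007)
print(hex(pc), flags, mode)
```

The mask 0x03FFFFFC keeps exactly the 24 significant address bits mentioned above; the bottom two bits of the address are always zero because instructions are word-aligned.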

On later designs - from the ARM6-based CPUs onwards - the processors have two operating modes, a "26-bit" mode, and a "32-bit" mode. In the "26-bit" mode, the program counter and processor status register are combined, as far as the program is concerned, in the same way as on earlier processors; they also have a "32-bit" mode that has a full 32-bit program counter (and hence 4Gbyte address range), with the flags accessible through a separate register.

This can, and does, lead to some awkward programming problems, especially if one wants the code being written to work equally well on all RISC OS computers, but most of the time, as long as one is careful, the differences between the two modes can be largely avoided. Alternatively, the program code can be written just for specific computers, usually the "32-bit"-capable systems. The differences between the "26-bit" and "32-bit" modes will be mentioned as and when appropriate.

The easiest way for me to describe the ARM instruction set is to start from the response to typing "HELP [" at the BASIC prompt, as given in figure 1.

-- Figure 1 --

I'll begin by covering the directives, except for "ADR", which I'll include with the data processing instructions, as directives are used as control statements rather than as "proper" instructions. The first directive is "OPT", found just after each "[", and taking a numerical expression as an argument; it was covered more fully in part one of this series, and figure 2 is a refresher of its uses.

-- Figure 2 --

The remaining directives, listed in figure 3, involve placing data into memory; "=" and "EQUS" store the characters in the order they are given, while "EQUW", "DCW", "EQUD" and "DCD" are all little-endian. (If you think that these last four directives are mixed up, they are a hang-over from the 8-bit Acorn era, when a word was a two byte, 16-bit value.) Finally, "ALIGN" "rounds up" the address to the next four-byte boundary.
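As a rough picture of the bytes these directives leave in memory, here is a Python model. The function names simply mirror the directive names and are not real assembler calls; the sizes assumed are two bytes for "EQUW"/"DCW" and four bytes for "EQUD"/"DCD", as the note about 16-bit words implies.

```python
import struct

def equs(text):
    """Characters are stored in the order given."""
    return text.encode("ascii")

def equw(value):
    """Two-byte value, least-significant byte first."""
    return struct.pack("<H", value & 0xFFFF)

def equd(value):
    """Four-byte value, least-significant byte first."""
    return struct.pack("<I", value & 0xFFFFFFFF)

def align(address):
    """Round an address up to the next four-byte boundary."""
    return (address + 3) & ~3

print(equw(0x1234).hex())      # 3412
print(equd(0x12345678).hex())  # 78563412
print(hex(align(0x8001)))      # 0x8004
```

An address that is already on a four-byte boundary is left unchanged by align.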

-- Figure 3 --

Now for the "program" instructions. Just about all of them have a number of options, some of which are common to most, if not all, instructions, so I will go into them before going into the instructions in detail. The first option to take notice of is the "[<cond>]" alongside all such instruction mnemonics; this means that each such instruction can be made to execute conditionally - not just branches, but all bar the directives. In use, the condition codes are two-letter suffixes added to the basic instruction mnemonic; if none is explicitly stated, the assembler will use "AL" ("always") by default. There are a further sixteen main condition codes (two being synonyms of two others); figure 4 lists all the legitimate condition codes, along with meanings and notes.

-- Figure 4 --

When the condition code is "true", the associated instruction is executed, with the corollary being that if the condition code is "false", the associated instruction is not executed - it is effectively a "NOP". The condition codes marked "signed" are used after tests of signed data, while those marked "unsigned" are used (not surprisingly) after tests of unsigned data.
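Figure 4 is not reproduced here, so the Python sketch below relies on the standard ARM definitions of the condition codes in terms of the four flags; treat the table as a summary under that assumption. The helper name executes is hypothetical.

```python
# Each condition code maps to a test on the N, Z, C and V flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,                       # equal
    "NE": lambda f: f["Z"] == 0,                       # not equal
    "CS": lambda f: f["C"] == 1,                       # carry set
    "HS": lambda f: f["C"] == 1,                       # synonym of CS (unsigned)
    "CC": lambda f: f["C"] == 0,                       # carry clear
    "LO": lambda f: f["C"] == 0,                       # synonym of CC (unsigned)
    "MI": lambda f: f["N"] == 1,                       # negative
    "PL": lambda f: f["N"] == 0,                       # positive or zero
    "VS": lambda f: f["V"] == 1,                       # overflow set
    "VC": lambda f: f["V"] == 0,                       # overflow clear
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,       # unsigned higher
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,        # unsigned lower or same
    "GE": lambda f: f["N"] == f["V"],                  # signed >=
    "LT": lambda f: f["N"] != f["V"],                  # signed <
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],  # signed >
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],   # signed <=
    "AL": lambda f: True,                              # always
}

def executes(cond, flags):
    """Return True if an instruction with this condition would execute."""
    return CONDITIONS[cond](flags)

flags = {"N": 1, "Z": 0, "C": 0, "V": 0}
print([c for c in CONDITIONS if executes(c, flags)])
```

With the example flags (a negative, non-zero result with no carry or overflow), the list printed contains "LT" but not "GE", and "LO" but not "HI".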

One condition code not yet mentioned is no longer used. This last condition code, "NV" ("never"), was introduced in the original ARM processor design, but as it prevents the execution of the instructions it is used with, it is not of much use. Because of this, it has been removed from general use; this allows for redevelopment of these instructions in future designs.

One major advantage of the ARM's implementation of condition codes is that explicit branches can often be eliminated; as branches have relatively large performance penalties, due to pipelining and caching, this is particularly useful. Keep in mind, though, that it is extremely easy to "overdose" on conditional execution; not all combinations are sensible, or are of any practical use, and overly long sequences should be avoided in order to simplify error handling and debugging - a point that will be covered in more detail in a later part of this series.

The next point to be covered is the "[S]" option. Something that is essential to keep in mind is that no instructions, by default, update the ARM's status register; adding "S" to the end of an instruction (that supports it) then forces that individual instruction to write updated values into the user flags. This may seem a rather curious point to anyone familiar with other processor designs - for example, with the 6502, or with the 68000 series - but it dovetails neatly with the ARM's conditional execution, allowing some "quite nice" sequences of code; this shall become clearer in both this, and later, parts of this introduction. It is extremely easy to miss out the "S" accidentally - I've done it myself plenty of times - but this will be covered along with more general debugging in a future article.

The third common option of ARM instructions is the barrel shifter, a circuit within the ARM that has the job of shifting a 32-bit value left or right a given number of bit places; the part of this value shifted off the "top" or "bottom" may be either re-inserted at the other end of the value, or just simply masked out.

In figure 1, many instructions' operands are given as "<reg>", in which case, the operand can only be a "simple" register - "R0" to "R15" - or a constant or variable that evaluates to a register number. However, some operands are given as "<shift>"; in these cases, the operand - also referred to as "Op2" in "official" literature - has a range of formats, as listed in figure 5. In each of the formats, "<reg>" again means, as before, a "simple" register; the register used is not fixed, and need not be the same as other operands; indeed, in expansion #3, the two registers need not be identical.

-- Figure 5 --

The "<s-op>" given is the shift type, again as listed in figure 5. Each shift has a different effect, as follows; in each case, "C in" is the existing state of the carry flag, while "C out" would be the new state of the carry flag, if the "S" option were specified.

The shifts are split neatly into two groups, depending on which direction they move the contents of the operand, with the one I will cover first being the left-to-right group. The simplest shift - expansion #2 in figure 5 - is "RRX", which shifts the contents of the operand right by one bit-place, the value of bit 31 being set to "C in", and the old value of bit 0 becoming "C out". Shift type "a", "ROR", is the next step up - indeed, "RRX" uses the encoding for "ROR #0", which cannot be used - and shifts the operand round by the given number of places, each bit "dropping off" through bit 0 being written to both bit 31 and "C out" at the same time; a side-effect is that "ROR #32" leaves the operand effectively unaltered, with "C out" set to the same as bit 31. Shift type "b", "LSR", shifts the operand right by the specified number of places, the last value "dropping out" of bit 0 being written to "C out"; however, this value is not what is written to bit 31, as is the case with "ROR" - the value "0" is written instead. The side-effect is that an "LSR" shift of 32 places zeroes the operand, putting the original value of bit 31 in "C out", with any greater a shift zeroing that as well. The last right-shift, "ASR" (shift type "c"), works in a similar fashion to "LSR", the only difference being that bit 31 has its former value, rather than a "0", written to it; the upshot of an "ASR" shift of 32 or more places is that all bits of both the resulting operand and "C out" end up the same, either "0" or "1".
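The four right-moving shifts can be modelled in a few lines of Python. This is only a sketch for shift counts of 1 to 31 places (the special cases at 32 and above are deliberately left out), and the function names simply reuse the shift mnemonics. Each function returns the shifted value together with "C out".

```python
MASK = 0xFFFFFFFF  # 32-bit register width

def lsr(value, n):
    """Logical shift right: zeroes enter at bit 31."""
    assert 1 <= n <= 31
    value &= MASK
    carry_out = (value >> (n - 1)) & 1    # last bit to drop out of bit 0
    return value >> n, carry_out

def asr(value, n):
    """Arithmetic shift right: copies of the old bit 31 enter at the top."""
    assert 1 <= n <= 31
    value &= MASK
    carry_out = (value >> (n - 1)) & 1
    result = value >> n
    if value >> 31:                       # negative: fill the top n bits with 1s
        result |= (MASK << (32 - n)) & MASK
    return result, carry_out

def ror(value, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    assert 1 <= n <= 31
    value &= MASK
    result = ((value >> n) | (value << (32 - n))) & MASK
    return result, result >> 31           # C out matches the new bit 31

def rrx(value, carry_in):
    """Rotate right by one place through the carry flag."""
    value &= MASK
    return (carry_in << 31) | (value >> 1), value & 1

for shift in (lsr, asr, ror):
    result, carry = shift(0x80000001, 1)
    print(shift.__name__, hex(result), carry)
```

Running the loop on the value &80000001 shows the difference between the three: "LSR" clears bit 31, "ASR" keeps it, and "ROR" refills it from bit 0.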

The second group of shifts all work on a right-to-left basis, and contains equivalents to most, but not all, left-to-right shifts. For instance, there is no equivalent to "RRX", nor to "ROR"; attempting to use "ROL" in BASIC's assembler generates a "Syntax error" message. Because of this, instructions have to be written to use "ROR" rather than "ROL"; for example, "ROL #8" becomes "ROR #24". That leaves "LSL" and "ASL", shifts "d" and "e" in figure 5; unlike "LSR" and "ASR", both "LSL" and "ASL" do the self-same thing. They work by shifting the operand left by the specified number of bit-places, with the contents of bit 31 being written to "C out" and "0" to bit 0 each bit-shift; a shift of "LSL #32" zeroes the operand, putting the original value of bit 0 in "C out", with any greater a shift zeroing that as well.
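The same kind of Python model covers the left shift, and shows why a rotate left can always be rewritten as a rotate right. Again this is a sketch for counts of 1 to 31 places only; rol is a hypothetical helper that exists purely for the comparison, since the ARM itself has no such shift.

```python
MASK = 0xFFFFFFFF  # 32-bit register width

def lsl(value, n):
    """Logical (or arithmetic) shift left: zeroes enter at bit 0."""
    assert 1 <= n <= 31
    value &= MASK
    carry_out = (value >> (32 - n)) & 1   # last bit to drop out of bit 31
    return (value << n) & MASK, carry_out

def ror(value, n):
    """Rotate right by n places."""
    assert 1 <= n <= 31
    value &= MASK
    return ((value >> n) | (value << (32 - n))) & MASK

def rol(value, n):
    """Rotate left by n places (not an ARM shift; for comparison only)."""
    assert 1 <= n <= 31
    value &= MASK
    return ((value << n) | (value >> (32 - n))) & MASK

value = 0x12345678
print(hex(rol(value, 8)))    # 0x34567812
print(hex(ror(value, 24)))   # 0x34567812 - the same result
```

In general, a rotate left by n places gives the same result as a rotate right by 32-n places, which is the rewriting described above.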

Figure 5 lists a couple of expansions where the operand "Rm" is, in fact, a numerical expression, "<expr>". In the case of expansion #4, the constant is reduced to the range 0 to 31, regardless of the situation, while expansion #5 has a much greater flexibility. For the "LDR" and "STR" instructions - covered more fully in a later part - the constant lies between -4095 and +4095 inclusive, held as a 12-bit magnitude and separate 1-bit sign, while the data processing instructions' constant is calculated from the rule given in figure 6; "<x>" is an 8-bit value in the range 0 to 255, while "<y>" is a 4-bit value in the range 0 to 15.

-- Figure 6 --

Any attempt to use a constant that cannot be expressed this way is faulted by BASIC with a "Bad immediate constant" error. The barrel shifter is used to generate a full 32-bit result from these two values, and, as with "ROR", any part of "x" shifted off the right-hand end of the 32-bit result is wrapped round to the left-hand end. This method may seem limiting, but does allow a large range of constants to be generated by individual instructions. And, on top of this, "proper" 32-bit constants may be loaded into registers, but this will be covered along with the data processing instructions.
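To get a feel for which constants are acceptable, here is a Python sketch that searches for a suitable pair of values. It assumes the rule in figure 6 is the standard ARM one, namely that the constant is "x" rotated right by twice "y" places; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit register width

def ror32(value, n):
    """Rotate a 32-bit value right by n places (n taken modulo 32)."""
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK

def encode_immediate(constant):
    """Return (x, y) such that x ROR (2*y) equals the constant, or None.

    x must fit in 8 bits (0 to 255) and y in 4 bits (0 to 15).
    """
    constant &= MASK
    for y in range(16):
        # Rotating left by 2*y undoes a rotate right by 2*y
        x = ror32(constant, 32 - 2 * y)
        if x <= 0xFF:
            return x, y
    return None   # BASIC would report "Bad immediate constant"

print(encode_immediate(0xFF000000))  # (255, 4)
print(encode_immediate(0x101))       # None
```

The second example fails because its two set bits are nine places apart, so no rotation can bring them both within an 8-bit value.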

That wraps it up for now, but I shall continue next time with a look at the data processing instructions.


This CD and its design is Copyright © 2000 Tau Press Limited. It may not be copied or distributed without the prior consent of Tau Press. Failure to abide by this may result in prosecution. (That doesn't mean the contents are our copyright, just the linking pages that we created and the CD itself.)