by Martin Penney
Now that the barrel shifter has been covered, we can get on with the data processing instructions, the biggest single group of ARM instructions. Figure 1 lists these instructions, but perhaps in an overly complex fashion; figure 2 lists the same information more clearly. Each instruction is listed with a description of how it uses its operands, along with notes. Similar instructions are listed together as a group.
Three of the four "logical" instructions work in the same way as the equivalent BASIC functions, doing bitwise (bit-by-bit) operations. The fourth instruction, "BIC", does a bitwise "AND", using the ones-complement of "Op2". One major use for "BIC" is as the complement of "ORR" - whereas "ORR" can be used to set individual flags, "BIC" can be used to clear them. Also, one use of "EOR" is to swap the contents of two registers - figure 3 is an annotated fragment of code that does just that. Lastly, if the "S" option is used with these instructions, the flags are set according to the contents of "Rd"; "N" mirrors bit 31, "Z" is set if "Rd" is zero and cleared otherwise, while "C" is set to the "C out" output from the barrel shifter (it is left unchanged if the barrel shifter is not used). None of these four instructions alter the "V" flag in any way.
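As a rough illustration of the paragraph above, the following Python sketch models "ORR", "BIC" and the three-"EOR" swap on 32-bit values. It is not a copy of figure 3; the function names are made up for this example, and only the "N" and "Z" flags are modelled.

```python
MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide


def orr(rn, op2):
    """ORR: set in Rn every bit that is set in Op2."""
    return (rn | op2) & MASK


def bic(rn, op2):
    """BIC: AND Rn with the ones-complement of Op2, clearing those bits."""
    return rn & ~op2 & MASK


def eor_swap(ra, rb):
    """Swap two registers with three EORs and no temporary register.

    Mirrors the sequence:  EOR Ra,Ra,Rb : EOR Rb,Ra,Rb : EOR Ra,Ra,Rb
    """
    ra ^= rb
    rb ^= ra
    ra ^= rb
    return ra, rb


def logical_flags(rd):
    """N and Z as the "S" option would leave them after a logical instruction."""
    return {"N": (rd >> 31) & 1, "Z": int(rd == 0)}
```

For instance, `bic(orr(0, 0b0100), 0b0100)` sets a flag bit and then clears it again, giving zero.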
Next, onto the arithmetic instructions, the second group in figure 2. "ADD" is the simplest, adding the two operands together, while "ADC" adds the two operands together, along with the current value of the "C" flag. If the "S" option is used, the flags are subsequently updated from the result, "Rd". The "N" and "Z" flags are updated in the same way as with the logical instructions, while "C" is updated to show whether or not there was a carry out of bit 31 - "C out" from the barrel shifter is ignored. Lastly, "V" shows whether or not a signed overflow occurred, that is, whether the carry into bit 31 of "Rd" differed from the carry out of it. Figures 4 and 5 are code fragments showing the difference the "S" option makes - the code in figure 4 does not propagate the carry from one addition to the next, while the code in figure 5 does.
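The flag behaviour described above can be sketched in Python as follows. This is only a model under the assumption that operands are unsigned 32-bit values; the names `adds` and `add64` are invented here and do not come from figures 4 or 5.

```python
MASK = 0xFFFFFFFF


def adds(rn, op2, carry_in=0):
    """Model ADDS (carry_in=0) or ADCS (carry_in = current C flag).

    Returns (rd, flags) for unsigned 32-bit operands.
    """
    total = rn + op2 + carry_in
    rd = total & MASK
    flags = {
        "N": rd >> 31,                 # copy of bit 31
        "Z": int(rd == 0),
        "C": total >> 32,              # carry out of bit 31
        # V: operands share a sign that the result does not have
        "V": ((~(rn ^ op2) & (rn ^ rd)) >> 31) & 1,
    }
    return rd, flags


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit addition built from two 32-bit additions."""
    lo, flags = adds(a_lo, b_lo)           # ADDS: the "S" records the carry
    hi, _ = adds(a_hi, b_hi, flags["C"])   # ADC: the carry is added in
    return lo, hi
```

Without the "S" on the first addition the carry would never reach the second one, which is the difference the two figures are described as showing.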
The remaining four arithmetic instructions are the opposites of "ADD" and "ADC"; "SUB" will (not surprisingly) subtract the second operand from the first, while "SBC" also takes into account the current state of the carry flag; the "S" option, if specified, updates the flags from the result. Note that, for these two instructions, the "C" flag is a "not-borrow", not a "carry" - it is set if no borrow has occurred, and cleared on a borrow. Figure 6 gives an idea of how these two instructions can be used.
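To make the "not-borrow" convention concrete, here is a small Python model. It assumes unsigned 32-bit operands and treats subtraction as adding the ones-complement of "Op2" plus the carry; `subs` and `sub64` are names chosen for this sketch, not taken from figure 6.

```python
MASK = 0xFFFFFFFF


def subs(rn, op2, carry_in=1):
    """Model SUBS (carry_in=1) or SBCS (carry_in = current C flag).

    Computes Rn - Op2 - NOT(C) as Rn + NOT(Op2) + C.
    Returns (rd, c) where c == 1 means "no borrow occurred".
    """
    total = rn + (~op2 & MASK) + carry_in
    return total & MASK, total >> 32


def sub64(a_lo, a_hi, b_lo, b_hi):
    """64-bit subtraction: SUBS on the low words, SBC on the high words."""
    lo, c = subs(a_lo, b_lo)
    hi, _ = subs(a_hi, b_hi, c)
    return lo, hi
```

Subtracting 3 from 5 leaves "C" set (no borrow), while subtracting 5 from 3 clears it.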
That leaves "RSB" and "RSC", but why have four instructions for subtraction, when there are only two for addition? The answer lies in the format of the instructions, as given in figure 2; the two operands, "Rn" and "Op2", are asymmetric, and this asymmetry allows for two types of subtractions; "SUB" and "SBC" do "Rn-Op2", while "RSB" and "RSC" do "Op2-Rn". This is useful, as it keeps the ARM instruction set flexible, and reduces the amount of "fiddling" that would be required to get the right information into the right registers to allow the equivalents of "RSB" and "RSC". Figure 7 is a code fragment that shows one use of "RSB" and "RSC" - changing the sign of a signed number by subtracting it from zero. Lastly, if the "S" option is used with "RSB" and "RSC", the flags are updated in the same way as with "SUB" and "SBC".
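The sign-change idea can be modelled in Python like this, here for a 64-bit number held in two 32-bit words. It is a sketch under the same assumptions as before (unsigned 32-bit register values, "C" as not-borrow); `rsbs` and `negate64` are illustrative names, not the contents of figure 7.

```python
MASK = 0xFFFFFFFF


def rsbs(rn, op2, carry_in=1):
    """Model RSBS (carry_in=1) or RSCS (carry_in = current C flag).

    Computes Op2 - Rn - NOT(C); returns (rd, c), c == 1 meaning no borrow.
    """
    total = op2 + (~rn & MASK) + carry_in
    return total & MASK, total >> 32


def negate64(lo, hi):
    """Negate a 64-bit two's-complement number by subtracting it from zero."""
    lo, c = rsbs(lo, 0)        # RSBS lo,lo,#0
    hi, _ = rsbs(hi, 0, c)     # RSC  hi,hi,#0
    return lo, hi
```

Because the constant zero has to be "Op2" (only "Op2" can be an immediate), the reversed forms are what make this possible without first loading zero into a register.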
As the multiplication instructions - covered later - tend to be rather time-consuming on most ARM processors, combinations of arithmetic instructions tend to be used to produce the desired result more quickly, especially if one of the operands is a known, fixed number; this is extremely useful in, for example, screen-update code. This works because the second operand, "Op2", can be shifted by the barrel shifter before it is used. A common use is for converting a coordinate to a pixel number, ready for direct access to the screen memory. Mode 13 is a 320-by-256 pixel screen, with a one-byte-per-pixel linear layout; figure 8 is a code snippet that converts a coordinate to an offset into the screen memory. Note, in particular, the first "ADD", which relies on five equalling one plus four, and uses the barrel shifter to that effect.
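The arithmetic behind this trick is that 320 equals 5 times 64, and 5 equals 1 plus 4. The Python sketch below follows that decomposition; it is not figure 8 itself, and it assumes that `y` counts pixel rows from the start of screen memory. The instruction sequences in the comments are one plausible way of writing it.

```python
def lsl(value, shift):
    """Logical shift left, truncated to 32 bits as the barrel shifter would."""
    return (value << shift) & 0xFFFFFFFF


def mode13_offset(x, y):
    """Byte offset of pixel (x, y) on a 320-pixel-wide, one-byte-per-pixel screen."""
    # ADD t,y,y,LSL #2      ; t = y + 4*y = 5*y
    t = (y + lsl(y, 2)) & 0xFFFFFFFF
    # ADD offset,x,t,LSL #6 ; offset = x + 64*(5*y) = x + 320*y
    return (x + lsl(t, 6)) & 0xFFFFFFFF
```

Two additions with shifted operands replace one multiplication by 320.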
From addition and subtraction, we move neatly on to the pseudo-instruction, "ADR"; as an instruction, it loads the address of a label into the nominated destination register. To demonstrate "ADR", figure 9 is a listing of the first example program given back in the first part of the series, with figure 10 being its output. If the output corresponding to the "ADR" is disassembled manually - using a data-sheet or similar - the instruction that the assembler actually generated is revealed to be an "ADD", with "R15" and the constant "4" as operands. This is what the assembler does for each "ADR" it comes across - it calculates a program counter-relative offset, and generates the appropriate "ADD" or "SUB"; the assembler "knows" the program counter points to the address two words ahead of the "current" one, and takes this into account when calculating the offset. The only downside is that if the offset falls outside the allowable range for immediate constants, the assembler will generate an error, as with other instructions. The "S" option cannot be used with "ADR", as it makes no sense to do so.
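The offset calculation the assembler performs for "ADR" can be sketched as below. The addresses used are hypothetical, and the function only picks the instruction and constant; it does not check whether the constant is a valid immediate.

```python
def adr_encoding(instr_addr, label_addr):
    """Return the (mnemonic, constant) an assembler could use for ADR.

    instr_addr is the address of the ADR itself; when that instruction
    executes, R15 reads as instr_addr + 8 (two words ahead).
    """
    pc = instr_addr + 8
    offset = label_addr - pc
    if offset >= 0:
        return ("ADD", offset)    # ADD Rd,R15,#offset
    return ("SUB", -offset)       # SUB Rd,R15,#-offset
```

A label three words after the "ADR" gives `("ADD", 4)`, matching the constant described above.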
The next instructions to consider are "MOV" and "MVN", probably two of the most commonly used ARM instructions of all. Figure 2 lists the two assignment instructions, and what they do; if the "S" option is used with either instruction, the flags are updated in a similar way to the "logical" group of instructions - "N" and "Z" according to the value assigned to "Rd", "C" is set to "C out" from the barrel shifter, if used, and "V" is unchanged. There are a few points about these two instructions that need to be kept in mind. Firstly, "MVN" is "move-not"; compare that with "CMN". Secondly, neither instruction makes use of the "Rn" operand, and never will. Thirdly, the ARM processors don't have a "NOP" instruction as such, but "MOV R0,R0" is used as a general-purpose "do-nothing" instruction instead, as it doesn't alter any registers or flags.
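Because "MVN" is a bitwise "NOT" rather than an arithmetic negation, the value that ends up in "Rd" is one less than the negative of "Op2". A minimal Python model, with helper names invented for this sketch:

```python
MASK = 0xFFFFFFFF


def mov(op2):
    """MOV: Rd = Op2."""
    return op2 & MASK


def mvn(op2):
    """MVN: Rd = NOT Op2 (ones-complement, not negation)."""
    return ~op2 & MASK


def to_signed(value):
    """Interpret a 32-bit register value as a two's-complement number."""
    return value - (1 << 32) if value & 0x80000000 else value
```

So `MVN R0,#0` yields -1, and loading -5 takes `MVN R0,#4`, not `MVN R0,#5`.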
Fourthly is the effect on "R15", the program counter - "MOV R15,Op2" acts as an analogue to "GOTO" in BASIC. Indeed, "MOV R15,R14" is used as the ARM's replacement for "RTS"-style instructions on other processors; see also the branch instructions in this context. Don't worry about the effects of the pipeline on these instructions - the address given is the literal, absolute new execution address; it is only when reading from "R15" that pipelining has to be taken into account. There is a side-effect from trying to use the "S" option with "MOV R15,Op2"; in "26-bit" mode, "R15" is - as already stated - used as a combined program counter and processor status register. "MOVS R15,Op2" is then used to update both parts of "R15" at the same time, so a subroutine can be "invisible" to the calling code, returning with the flags apparently unaltered. As the flags are held separately in "32-bit" mode and not as part of "R15", "MOVS R15,Op2" does not have the same result under these conditions. Because of this difference, "MOVS R15,Op2" should be avoided if at all possible, so that the program code will work equally well in both "26-bit" and "32-bit" modes.
Fifthly, and lastly, is a technique for getting round the "limitation" on immediate constants; normally, one is restricted to assigning shifted 8-bit constants to registers, but this can be overcome by using a combination of instructions - figure 11 is the listing of a program that uses a function to allow full 32-bit constants to be assigned to registers. Type the program in, and run it; it will keep asking for test hexadecimal values - try "11223344" for starters. In normal use, though, you might want to change line 500 to read "[OPT (OPT% AND NOT 1)", to hide the output from the function.
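One simple way to build an arbitrary constant is a "MOV" followed by up to three "ORR"s, one per non-zero byte. The Python sketch below generates such a sequence; this byte-by-byte method is an assumption made for illustration and may well differ from the function in figure 11. The names `is_immediate` and `load_constant` are hypothetical.

```python
MASK = 0xFFFFFFFF


def is_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK
        if rotated < 256:
            return True
    return False


def load_constant(value, reg="R0"):
    """Return assembler source lines that leave value in reg."""
    value &= MASK
    if is_immediate(value):
        return [f"MOV {reg},#&{value:X}"]
    # Each chunk is one byte at an even bit position, so always encodable
    chunks = [value & (0xFF << shift) for shift in (0, 8, 16, 24)]
    chunks = [chunk for chunk in chunks if chunk]
    lines = [f"MOV {reg},#&{chunks[0]:X}"]
    lines += [f"ORR {reg},{reg},#&{chunk:X}" for chunk in chunks[1:]]
    return lines
```

For the test value "11223344" this produces one "MOV" and three "ORR"s; a value that already fits the immediate format needs only the single "MOV".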
The fourth set of instructions from figure 2 is the test group, with two points standing out concerning all four of the instructions. The first is that none of them make use of "Rd" - all the instructions do is the test, update the flags, and discard the result. Because of this, the second point is that the "S" option is always implied, and the assembler will act as if it were specified, whether or not it is; for this reason, I personally have got into the habit of explicitly adding the "S" to these instructions, mainly to remind me that they alter the flags. When it comes to updating the flags, "TEQ" and "TST" update the flags in the same manner as the "logical" group - "N" and "Z" from the discarded result, "C" from "C out", if the barrel shifter is used, with "V" unchanged - while "CMP" and "CMN" update the flags the same way as the "arithmetic" group - "N", "Z", "V" and "C" are all updated from the discarded result. Remember, though, that "CMN" is "compare negative"; compare this with "MVN".
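The four tests can be modelled in Python as operations that compute a result, derive flags from it, and throw the result away. This sketch assumes unsigned 32-bit operands and, for "TST" and "TEQ", ignores the barrel shifter so that only "N" and "Z" are reported; the trailing underscore in `cmp_` merely avoids clashing with other names.

```python
MASK = 0xFFFFFFFF


def _add_flags(a, b, carry_in):
    """Flags from a + b + carry_in, as the arithmetic group sets them."""
    total = a + b + carry_in
    rd = total & MASK
    return {"N": rd >> 31, "Z": int(rd == 0), "C": total >> 32,
            "V": ((~(a ^ b) & (a ^ rd)) >> 31) & 1}


def _logic_flags(rd):
    return {"N": rd >> 31, "Z": int(rd == 0)}


def cmp_(rn, op2):
    """CMP: flags of Rn - Op2 (C set means no borrow)."""
    return _add_flags(rn, ~op2 & MASK, 1)


def cmn(rn, op2):
    """CMN: flags of Rn + Op2, i.e. Rn compared with minus Op2."""
    return _add_flags(rn, op2, 0)


def tst(rn, op2):
    """TST: flags of Rn AND Op2."""
    return _logic_flags(rn & op2 & MASK)


def teq(rn, op2):
    """TEQ: flags of Rn EOR Op2; Z set when the two are equal."""
    return _logic_flags((rn ^ op2) & MASK)
```

Note that `cmn(0xFFFFFFFF, 1)` reports "Z" set: -1 compared with the negative of 1 is a match, which is arithmetic negation, unlike the bitwise "NOT" of "MVN".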
Back in figure 1, the output from "HELP [" in BASIC, the "P" option is mentioned; in use, either the "S" option or the "P" option is used, but not both together - they are mutually exclusive. The "P" option is used on the "26-bit" ARM processors, and in the "26-bit" mode on the later processors, to easily alter the "N", "Z", "V" and "C" flags individually; however, the flags can be - and are - accessed through a separate register in the "32-bit" mode of the later processors, and the "P" option does not work as expected under these conditions. Hence, the "P" option is best avoided in new programs, for the same reasons as "MOVS R15,Op2" - ensuring that the software will work properly on a wide range of RISC OS-based computers.
The last two data processing instructions are used for multiplication - "MUL" and "MLA". Both of these instructions have their general formats listed in figure 1, with more details listed in figures 2 and 12, but a more specific description is as follows.
"MUL" is the basic multiplication instruction; "MLA" does the multiplication, followed by an addition. "MLA" has its uses over and above "MUL" in producing running totals, since it allows separate "ADD" instructions to be eliminated. Some later processors - most notably the StrongARM - have additional multiply instructions that simplify some calculations, but they are not supported by BASIC's assembler, nor by the processor's "26-bit" mode.
Not completely surprisingly, there are a couple of things to keep in mind. Firstly, the multiplication makes no assumption about signs; both "Rm" and "Rs" are taken to be numbers in the range "0" to "2^32-1", so you need to be careful with interpreting the sign of "Rd". If you want to use either instruction for "clean" signed (32-bit times 32-bit giving 64-bit) multiplication, then ideally both "Rm" and "Rs" have to have their signs stripped from them before the multiplication takes place. Figure 13 is a short program which does just this; type in decimal numbers, and the program will multiply them.
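The sign-stripping approach can be sketched in Python as follows. This is not the program from figure 13; it models registers as unsigned 32-bit values, keeps only the low 32 bits of the product as "MUL" does, and uses invented names throughout.

```python
MASK = 0xFFFFFFFF


def mul(rm, rs):
    """MUL: unsigned multiply, keeping only the low 32 bits of the product."""
    return (rm * rs) & MASK


def signed_mul32(rm, rs):
    """Multiply two two's-complement register values via their magnitudes."""
    negative = (rm ^ rs) >> 31      # 1 if exactly one operand is negative
    if rm >> 31:
        rm = (-rm) & MASK           # RSB rm,rm,#0 strips the sign
    if rs >> 31:
        rs = (-rs) & MASK
    rd = mul(rm, rs)
    if negative:
        rd = (-rd) & MASK           # restore the sign of the result
    return rd


def to_u32(value):
    """Helper for the examples: a Python int as a 32-bit register pattern."""
    return value & MASK
```

For example, `signed_mul32(to_u32(-3), 7)` gives the register pattern for -21.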
Reading through - and using - this example program, the eagle-eyed among you will have noticed some oddities. The most obvious oddity is that the assembly language never uses the word at location "Result%+4". This stems from the multiplication taking a pair of 32-bit operands, multiplying them and returning, not the expected 64-bit result, but only the 32-bit least-significant half of it. This can be overcome by splitting the operands into halves, multiplying the individual halves together, and recombining them, as in figure 14.
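The split-and-recombine technique works because the product of two 16-bit halves always fits in 32 bits. The Python sketch below is one way of arranging the four partial products; it is not the listing from figure 14, it handles unsigned operands only, and `mul64_by_halves` is a hypothetical name.

```python
MASK = 0xFFFFFFFF


def mul(rm, rs):
    """MUL: keeps only the low 32 bits of the product."""
    return (rm * rs) & MASK


def mul64_by_halves(a, b):
    """Unsigned 32x32 -> 64-bit multiply using only 32-bit MULs.

    Returns (lo, hi), the two 32-bit words of the product.
    """
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16

    ll = mul(a_lo, b_lo)            # weight 2^0
    lh = mul(a_lo, b_hi)            # weight 2^16
    hl = mul(a_hi, b_lo)            # weight 2^16
    hh = mul(a_hi, b_hi)            # weight 2^32

    mid = lh + hl                   # can overflow 32 bits
    mid_carry = mid >> 32           # that overflow has weight 2^48
    mid &= MASK

    lo = ll + ((mid << 16) & MASK)  # low half of mid joins the low word
    carry = lo >> 32
    lo &= MASK

    hi = (hh + (mid >> 16) + (mid_carry << 16) + carry) & MASK
    return lo, hi
```

Each carry that a 32-bit register would lose is tracked explicitly here, which is the work "ADDS" and "ADC" would do in the assembly language version.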
The other oddity is in the order of the operands; due to the way the multiplication is done on older processors, "Rd" and "Rm" must be different, otherwise you will end up with an "undefined" answer - an answer that is meaningless. Additionally, "R15" should not be used for any of the operands; as it is the program counter, any use of it would produce similarly meaningless answers, not to mention possible crashes if specified as "Rd".
The last points to note about the "MUL" and "MLA" instructions are to do with the flags and speed. Due to the algorithm used - a shift-and-sum procedure - if the "S" option is specified, "N" and "Z" are taken from "Rd", while "C" ends up with a meaningless value and "V" is left unaltered. Speed-wise, both instructions are extremely time-consuming on the original "26-bit" processors, using up many clock cycles' worth of processing power. The upshot of this was that explicit multiply instructions were rarely used in the most speed-critical code, especially if one operand was fixed; there is more on this back with the arithmetic instructions.
That's all for the data processing instructions, and this part of the introduction; in the fourth part, I'll go over the remaining ARM instructions.
This CD and its design is Copyright © 2000 Tau Press Limited. It may not be copied or distributed without the prior consent of Tau Press. Failure to abide by this may result in prosecution. (That doesn't mean the contents are our copyright, just the linking pages that we created and the CD itself.)