Acorn User

ARM CODE TUTORIAL

by Martin Penny

Part 4 - The ARM Instruction Set III

In the last two parts of this series, I covered most of the ARM instructions, as given in figure 1, the output from "HELP [" in BASIC. That leaves only a few more to do; some others I'll mention in passing are not properly supported by BASIC's assembler, making them difficult to use. To overcome this problem, functions can be added to BASIC, or alternative assemblers can be used.

-- Figure 1 --

The first of these instructions I'll cover are "LDR" and "STR". Both have been used in the example programs in previous parts without much in the way of explanation, so here goes. These two instructions are the ones most commonly used for moving data between registers and memory, and options can be used with either of them to specify how this is done.

The first of these options is the "B" suffix, which controls whether a word or a byte is transferred. The default is for a 32-bit word to be transferred, but if "B" is used, just a single 8-bit byte is read from or written to memory. There are certain points concerning addresses that need to be remembered. The first is that addresses for word accesses must be word-aligned, while those for byte accesses need only be byte-aligned - they do not have to be word-aligned. Using an "unaligned" address for a word transfer will either be trapped, or will not give the "expected" result, depending on which type of ARM processor the code is used on.

A second point is that an "LDRB" instruction will load the specified byte into the least-significant byte of the destination register, clearing the remaining three bytes of that register. The third point is that, as stated in a previous part of this series, RISC OS computers are set up for little-endian operation, so the least-significant byte of a word ends up in the first (lowest) of the four byte addresses.
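
To make these three points concrete, here is a small Python model of a little-endian memory. It is only a sketch, not ARM code: the names "memory", "store_word", "ldr" and "ldrb" are invented for the illustration, and an unaligned word access is simply rejected rather than imitating any particular processor.

```python
# Toy model of byte-addressed, little-endian memory (not ARM code).
memory = bytearray(16)

def store_word(addr, value):
    """Model of STR: write a 32-bit word to a word-aligned address."""
    if addr % 4:
        raise ValueError("word access must be word-aligned")
    memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def ldr(addr):
    """Model of LDR: read a 32-bit word from a word-aligned address."""
    if addr % 4:
        raise ValueError("word access must be word-aligned")
    return int.from_bytes(memory[addr:addr + 4], "little")

def ldrb(addr):
    """Model of LDRB: read one byte; the top 24 bits of the result are zero."""
    return memory[addr]

store_word(4, 0x11223344)
print(hex(ldrb(4)))   # least-significant byte, held at the lowest address
print(hex(ldrb(7)))   # most-significant byte, held at the highest address
print(hex(ldr(4)))    # the whole word back again
```

The byte at the lowest address is the least-significant one, and a byte load never disturbs the upper bits of the result because they are always cleared.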

The StrongARM processors used in the RiscPCs also support an "H" for 16-bit half-word transfers. However, it is not supported by either the BASIC assembler or the RiscPC design; this note is of more relevance for users of other assemblers.

When it comes to specifying the address, a number of different modes may be used, as listed in figure 2; this is a slightly expanded version of the descriptions in figure 1. All of the modes may be used with either "LDR" or "STR".

-- Figure 2 --

The first of the modes is the simplest, with the contents of "Rn" being the address used by the "LDR" or "STR"; "Rn" is not altered in any way.

With both the pre-indexed and post-indexed modes, an offset is added to, or subtracted from, "Rn"; also, writeback may occur, with the contents of "Rn" being updated by the value of the offset. The sequence of events - the order in which the memory access, the factoring-in of the offset, and the writeback take place - depends on whether pre- or post-indexed addressing is being used. The offset itself is much the same as the "Op2" operand with the data-processing instructions; however, the shift amount cannot be given in a register. Also, the expression option has a different range - in this case, from -4095 to +4095.

The three pre-indexed modes add the offset to - or subtract it from - "Rn" before the transfer takes place; writeback is optional, and indicated by "!" after the closing square bracket. This makes these modes most useful for accessing tables or lists of data. With the three post-indexed modes, the offset is still added to - or subtracted from - "Rn", but only after the memory read or write has taken place. On top of this, writeback is "always on" - "Rn" is always updated.
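
The difference between the modes is easiest to see by following the address and the base register through one transfer. The Python sketch below models only the immediate-offset forms; the function names are invented for the illustration, and "R8" in the labels is just an example register.

```python
def check_offset(offset):
    """The immediate form of the offset must lie between -4095 and +4095."""
    if not -4095 <= offset <= 4095:
        raise ValueError("offset out of range")
    return offset

def pre_indexed(rn, offset, writeback=False):
    """[Rn,#offset] or [Rn,#offset]! - returns (address used, Rn afterwards)."""
    address = (rn + check_offset(offset)) & 0xFFFFFFFF
    return address, (address if writeback else rn)

def post_indexed(rn, offset):
    """[Rn],#offset - returns (address used, Rn afterwards)."""
    return rn, (rn + check_offset(offset)) & 0xFFFFFFFF

for label, result in [
    ("[R8,#4]  ", pre_indexed(0x8000, 4)),
    ("[R8,#4]! ", pre_indexed(0x8000, 4, writeback=True)),
    ("[R8],#4  ", post_indexed(0x8000, 4)),
]:
    address, rn_after = result
    print(label, hex(address), hex(rn_after))
```

With "Rn" starting at &8000, the two pre-indexed forms access &8004 and differ only in whether "Rn" is updated, while the post-indexed form accesses &8000 and always leaves "Rn" at &8004.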

As a side-note, this is where the "T" option comes into play. Due to the way addresses are translated from "logical" to "physical" memory locations, an application does not necessarily have a true view of the layout of memory - only what RISC OS allows it to see. To make the situation more complicated, not all the possible levels of translation are in effect when the ARM is in one of its privileged operating modes. The "T" option forces the ARM to convert the address in the same way as for an application. The "T" option is only valid when used with post-indexed addressing, and then only in these privileged modes; most people will only come across the "T" option if they are writing - for example - modules or filing systems.

To round things off for "LDR" and "STR", here are a couple of example programs to demonstrate them in use. Figure 3 copies 256Kbytes of memory byte-by-byte, while figure 4 copies word-by-word instead. On the machine I'm using at the moment - based on an 8MHz ARM2 - figure 3 takes 64 centiseconds to finish; figure 4 takes 16. Figure 5 does the same type of copy as figure 4, but uses post-indexed addressing instead; it too takes 16 centiseconds to finish. Note that examples 3 and 4 leave the pointers in "R8" and "R9" unchanged, while example 5 leaves the registers pointing to just beyond the ends of the data blocks. If you are going to try these examples out on a faster computer, use larger amounts of memory in order to get accurate timings - for example, on a Kinetic RiscPC, try changing "S%" to claim either 16Mbytes or 32Mbytes of RAM per block, depending on the total amount of RAM fitted.

-- Figure 3 --

-- Figure 4 --

-- Figure 5 --

There are some occasions when combinations of "LDR" and "STR" do not always do what you expect them to do. Take, for example, a two-processor system. There will be times when only one of the processors should be allowed to access particular areas of memory or sections of code. To overcome this problem, "semaphore" flags can be used to control such accesses; figure 6 gives a section of pseudo-code that could be used for such a task.

-- Figure 6 --

There is a slight snag with the code in figure 6, though. The "LDR" and "STR" are separate instructions - as indicated by the "marker" comment - and they are executed with no reference to each other. That means that there is nothing stopping both processors effectively interleaving the memory reads and writes - the first processor reads the flag, and, before it can store an updated version, the second processor also reads it. This has the effect of both processors having a copy of the old value of the flag, and therefore both running the "one processor at a time" code simultaneously.

One way to get round this is to have an instruction that does a read-write combination without releasing access to memory between the read and the write phases. On ARM processors, that's the "SWP" instruction, and figure 7 is a version of figure 6, updated to make use of it.

-- Figure 7 --

With the "SWP" instruction, the default is a word swap; using the "B" option - as per the "LDR" and "STR" instructions - allows a byte swap instead. Also, the address can only ever be given in the form "[Rn]" - no offset addressing is possible. The two "data" operands can be the same, or may be different, as in figure 7; however, "R15" cannot be used in any operand position. Finally, "SWP" is not available on either the ARM2 or ARM250 processors, nor is it supported by the BASIC assembler.
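
The idea behind using a swap as a semaphore can be sketched in Python, with threads standing in for the two processors. This is only a model under stated assumptions: the "Memory" class, the lock standing in for the held memory bus, the address &1000 and the convention that 0 means "free" and 1 means "claimed" are all invented for the illustration, and are not taken from figure 7.

```python
import threading

FLAG_ADDRESS = 0x1000   # hypothetical address of the semaphore word
FREE, CLAIMED = 0, 1    # assumed convention, not taken from figure 7

class Memory:
    """Word-addressed memory whose swap cannot be interrupted part-way."""

    def __init__(self):
        self._words = {}
        self._bus = threading.Lock()   # stands in for the locked memory bus

    def swp(self, new_value, address):
        """Model of SWP: read the old word and store the new one in one step."""
        with self._bus:
            old_value = self._words.get(address, FREE)
            self._words[address] = new_value
            return old_value

def try_claim(memory):
    """Return True if this caller found the flag free and now owns it."""
    return memory.swp(CLAIMED, FLAG_ADDRESS) == FREE

def release(memory):
    """Give the flag back."""
    memory.swp(FREE, FLAG_ADDRESS)

memory = Memory()
winners = []
threads = [
    threading.Thread(target=lambda: winners.append(try_claim(memory)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(winners.count(True))   # number of threads that claimed the flag
```

Because the read and the write happen as one indivisible step, only one caller can ever see the "free" value; every other caller reads back "claimed" and knows it must wait.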

Okay - now for "LDM" and "STM". These two instructions are most commonly used for stacks; before I get to that, though, an explanation of how they work is in order. "LDM" and "STM" are used to transfer data between one or more registers and memory, with figure 1 giving the basic description of these instructions' format. All transfers are based on a whole number of words, and all addresses involved are word-aligned.

"{reg_list}" is the list of registers to be read from or written to memory. Individual registers are specified as usual, with several registers being separated by commas; for example, both "{R14}" and "{R0,R1,R2,R3}" are valid. If a series of registers is being transferred, just the first and last need be explicitly listed, separated by a hyphen; this means that "{R0,R1,R2,R3}" and "{R0-R3}" are equivalent, and specify the same registers. The two formats may be mixed, so "{R0-R12,R14}" covers all registers bar "R13" and "R15"; if the registers are quoted "out-of-order", the assembler will internally rearrange the list as required.

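Inside the instruction, the register list ends up as sixteen bits, one per register, which is why the order in which the registers are written makes no difference. The Python sketch below models this; the function name "register_mask" is invented for the illustration, and rejecting a back-to-front range such as "R3-R0" is this model's choice rather than a statement about any particular assembler.

```python
import re

ITEM = re.compile(r"R(\d+)(?:-R(\d+))?", re.IGNORECASE)

def register_mask(reg_list):
    """Turn a list such as "{R0-R3,R14}" into a 16-bit mask (bit n = Rn)."""
    mask = 0
    for item in reg_list.strip().strip("{}").split(","):
        match = ITEM.fullmatch(item.replace(" ", ""))
        if match is None:
            raise ValueError("bad register list item: " + item)
        first = int(match.group(1))
        last = first if match.group(2) is None else int(match.group(2))
        if not 0 <= first <= last <= 15:
            raise ValueError("bad register range: " + item)
        for number in range(first, last + 1):
            mask |= 1 << number
    return mask

print(hex(register_mask("{R0,R1,R2,R3}")))
print(hex(register_mask("{R0-R3}")))
print(hex(register_mask("{R3,R1,R2,R0}")))
print(hex(register_mask("{R0-R12,R14}")))
```

The first three lists all give the same mask, and the last has every bit set except those for "R13" and "R15".
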
The optional "^" has a number of meanings. If used with an "LDM" where "R15" is in the register list, the flags are updated, the "how" depending on the type of processor in use. On "26-bit" processors, they are restored from the non-program counter section of the value written to "R15", but on "26/32-bit" processors, the flags are restored from the separate "stored" processor status register, mentioned more fully later. In other cases where "^" is used - an "STM" with "R15" listed, or an "LDM" or "STM" with no "R15" listed - the instruction is almost always being used from an ARM privileged mode, and it is the "user mode" registers that are transferred to or from memory. These forms of the transfer are used by RISC OS to save or restore the user registers between task switches.

The "<reg>" refers to the "base register", the register containing the initial address used for the transfer; only "R15" cannot be used in this position. For stack operations, the base register is usually "R13"; other registers may be used instead of "R13", though this is not recommended. For non-stack operations, other registers are, by convention, used. If the optional "!" is used, the modified address is written back into the base register after the transfer. There are some quirks involved in using a register as the base register while also including it in the register list. The expected value does not always end up being written to memory - if an "STM" is used - or to the base register, so it is better to avoid this case.

That leaves just the last two options to be explained; in essence, the first controls whether the base address is incremented ("I") or decremented ("D"), and the second whether this occurs before ("B") or after ("A") each individual word is transferred. Due to the way the ARM processors order the registers during a transfer, the lowest-numbered register always matches up with the lowest address involved, and the highest-numbered register is similarly matched up to the highest address; the registers are always transferred in low-to-high numerical order. Using these options directly is most useful for performing block data transfers; base register writeback need not be used, depending on circumstances, and (almost) any register may be used as the base register.
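
The four combinations can be worked through with a small Python model that, for a given base address and set of registers, reports which address each register is matched with and what the base register would hold after writeback. The function name "block_addresses" is invented for the illustration, and registers are given as plain numbers.

```python
def block_addresses(mode, base, registers):
    """Return ({register: address}, base after writeback) for IA, IB, DA or DB."""
    regs = sorted(set(registers))   # lowest register always gets lowest address
    size = 4 * len(regs)
    if mode == "IA":
        start, final = base, base + size
    elif mode == "IB":
        start, final = base + 4, base + size
    elif mode == "DA":
        start, final = base - size + 4, base - size
    elif mode == "DB":
        start, final = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    addresses = {reg: start + 4 * i for i, reg in enumerate(regs)}
    return addresses, final

for mode in ("IA", "IB", "DA", "DB"):
    addresses, final = block_addresses(mode, 0x8000, [14, 0, 1])
    print(mode, {reg: hex(a) for reg, a in addresses.items()}, hex(final))
```

Whatever the mode, "R0" is matched with the lowest of the three addresses and "R14" with the highest; only the position of the block relative to the base address changes.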

However, "LDM" and "STM" are, as already mentioned, used to replace the "PUSH" and "PULL" stack instructions found on other processors. When used for this purpose, base register writeback is, by necessity, always used. Four types of stack may be created, as follows. In a "full" ("F") stack, the stack pointer holds the address of the last used entry on the stack, while in an "empty" ("E") stack, it holds the address of the first free memory slot. A "descending" ("D") stack grows from high to low addresses, and an "ascending" ("A") stack grows from low to high addresses. Stacks under RISC OS are almost always of the "full, descending" ("FD") type, with "R13" being used as the stack pointer.

The "stack" options are, in use, translated by the assembler into the "block-transfer" options, as remembering the "LDM" and "STM" "matching pairs" can be difficult; for example, "LDMFD" is translated into "LDMIA", while "STMFD" corresponds to "STMDB". The potential problems arising from matching the wrong pairs of options are why there are separate "stack" and "block-transfer" options. Figure 8 lists these "matching pairs".
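
The translation follows a simple rule, which the Python sketch below spells out: a store (push) moves in the direction the stack grows, "before" for a full stack and "after" for an empty one, while a load (pull) does exactly the opposite on both counts. The function name "block_form" is invented for the illustration.

```python
def block_form(mnemonic):
    """Translate a stack-style mnemonic such as "STMFD" to its block form."""
    op, kind = mnemonic[:3].upper(), mnemonic[3:].upper()
    if op not in ("LDM", "STM") or len(kind) != 2:
        raise ValueError("expected LDM or STM plus two letters")
    if kind[0] not in "FE" or kind[1] not in "DA":
        raise ValueError("stack type must be FD, FA, ED or EA")
    full = kind[0] == "F"
    descending = kind[1] == "D"
    if op == "STM":
        # Pushing: move the way the stack grows; a full stack moves first.
        direction = "D" if descending else "I"
        when = "B" if full else "A"
    else:
        # Pulling undoes a push, so both choices are reversed.
        direction = "I" if descending else "D"
        when = "A" if full else "B"
    return op + direction + when

for kind in ("FD", "FA", "ED", "EA"):
    print("STM" + kind, "=", block_form("STM" + kind), "  ",
          "LDM" + kind, "=", block_form("LDM" + kind))
```

Each stack type gives a store and a load that move in opposite directions, which is exactly what makes them a "matching pair".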

-- Figure 8 --

Now for an example program, as given in figure 9. As BASIC places values into "R8" to "R12" - values that ought to be preserved - these registers are pushed onto the stack at the beginning of the code, and restored from it at the end. For safety's sake, "R14" is also stacked; at the other end of the program, the original value of "R14" is unstacked directly into "R15". This may seem a little strange, but it eliminates the need for a separate "MOV R15,R14" instruction, and produces shorter, faster code.

-- Figure 9 --

The central loop of the program in figure 9 uses eight of the ARM's registers for holding the data during the copy; hence, thirty-two bytes of data are copied per pass. On my machine, the program copied the block of data in 4 centiseconds, by far the best of the block-copy routines. Users of later processors may well notice less of a performance hike through the example programs, due to changes in memory handling, but the same principles apply.
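
A quick piece of arithmetic shows where much of the saving comes from: the more bytes moved per pass, the fewer times the loop has to go round. The Python sketch below only counts passes for the 256Kbyte block used in the examples; the names are invented for the illustration, and it makes no attempt to predict timings.

```python
BLOCK_BYTES = 256 * 1024   # the 256Kbyte block copied by the examples

def passes_needed(bytes_per_pass):
    """Number of trips round the copy loop needed for one block."""
    return BLOCK_BYTES // bytes_per_pass

for name, size in [
    ("byte-by-byte", 1),
    ("word-by-word", 4),
    ("eight registers", 32),
]:
    print(name, passes_needed(size))
```

Going from bytes to words cuts the number of passes by a factor of four, and going from single words to eight registers at a time cuts it by a further factor of eight.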

Now, onto the two branch instructions, "B" and "BL"; as they work in much the same way, they can be grouped together. "B" is the equivalent of "GOTO", while "BL" is the analogue of "GOSUB". The new execution address is calculated from the branch's 24-bit operand; this is not an absolute address, but a signed offset that has to be added to "R15". As it stands, this would suggest a range of 8Mbytes forwards or backwards from the branch. However, as all ARM instructions are word-aligned, the offset counts words, not bytes; this gives an increased range of 32Mbytes forwards or backwards. Remember, though, that "R15" doesn't point at the current instruction, but two words ahead of it; the assembler takes this into account when calculating the offset.
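
The calculation the assembler performs can be sketched in Python as follows. The function names are invented for the illustration, addresses are treated as plain numbers, and no account is taken of the flag bits that share "R15" on "26-bit" processors.

```python
def branch_field(branch_address, target_address):
    """Return the 24-bit operand for a branch at one address to another."""
    if branch_address % 4 or target_address % 4:
        raise ValueError("instructions are word-aligned")
    # R15 reads as two words (8 bytes) beyond the branch itself.
    words = (target_address - (branch_address + 8)) // 4
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target is out of range")
    return words & 0xFFFFFF

def branch_target(branch_address, field):
    """Recover the target address from the 24-bit operand."""
    words = field - (1 << 24) if field & (1 << 23) else field
    return branch_address + 8 + 4 * words

print(hex(branch_field(0x8000, 0x8008)))   # branch to two words ahead
print(hex(branch_field(0x8000, 0x8000)))   # branch to itself
print(hex(branch_target(0x8000, 0xFFFFFE)))
```

A branch to the instruction two words further on has an operand of zero, while a branch to itself needs an offset of minus two words, which appears in the 24-bit field as &FFFFFE.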

These two instructions differ in the way they handle "R14". "B" leaves "R14" alone, while "BL" copies the address of the instruction following the "BL" into "R14"; this can then be used as a return address - just use "MOV R15,R14" to do the equivalent of an "RTS". You must preserve the contents of "R14" on the stack if you wish to nest subroutines, otherwise return addresses will be lost. And, although both "B" and "BL" may be conditional, the differences between "26-bit" and "32-bit" modes make preserving flags across a "generic" subroutine difficult.

Finally, the last "regular" ARM instruction is "SWI" - "software interrupt". It is used to allow easy access to operating system routines through a standard, consistent interface. I won't go into too much detail about individual routines here and now, especially as they are well-documented elsewhere. All I shall say is that the instruction has a 24-bit operand that the ARM processors themselves ignore, but which is used to indicate which routine to call; figure 10 gives a breakdown of how it is divided up. RISC OS uses the registers to pass and return parameters, and the flags to return status information; this last point means using conditional "SWI" instructions can prove problematic. Some useful aspects of using software interrupts will be covered in a later part.
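
Pulling the operand back out of an assembled instruction is a matter of masking off the bottom 24 bits, as the Python sketch below shows. The function names are invented for the illustration. The only part of the RISC OS breakdown modelled here is the convention that bit 17 of the operand selects the error-returning "X" form of a call, so that "XOS_WriteC" is "OS_WriteC" plus &20000; the rest of the field is left alone.

```python
X_BIT = 1 << 17   # RISC OS convention: set for the "X" form of a SWI

def swi_operand(instruction):
    """Return the 24-bit operand of a 32-bit SWI instruction word."""
    if (instruction >> 24) & 0xF != 0xF:
        raise ValueError("not a SWI instruction")
    return instruction & 0xFFFFFF

def split_x(operand):
    """Return (operand with the X bit cleared, whether the X bit was set)."""
    return operand & ~X_BIT, bool(operand & X_BIT)

# &EF000000 is an unconditional SWI with operand 0 (OS_WriteC);
# &EF020000 is the same call with the X bit set (XOS_WriteC).
for word in (0xEF000000, 0xEF020000):
    operand = swi_operand(word)
    print(hex(word), hex(operand), split_x(operand))
```

The top four bits of the instruction hold the condition code and the next four mark it as a "SWI"; everything below that is left for the operating system to interpret.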

-- Figure 10 --

The only other instructions left to mention relate to coprocessors, "extensions" to the basic ARM architecture. The floating-point maths hardware appears as a coprocessor, as do the later processors' cache controller and MMU. As I've already said, I'm not going to cover the floating-point instructions here, but I will cover aspects of the "system" coprocessor.

On the "32-bit" ARM processors - ARM6-based and later CPUs - the status register is accessed through this "system" coprocessor via the "MRS" and "MSR" instructions. There are two types of processor status register, holding the "current" and "stored" copies of the flags; these are the "CPSR" and "SPSR" respectively. There is a copy of the "CPSR" for each of the ARM's privilege modes, including the "user" mode; similarly, there is a separate "SPSR" for each mode, bar "user" mode.

Figure 11 covers the "MRS" and "MSR" instruction formats. In each case, "[<cond>]" is the usual optional ARM condition code and "Rd" a regular integer register. As the processor status registers contain not just the arithmetic flags, but also the interrupt mask flags and privilege mode bits, there are two ways of accessing the status registers. "psr" means "CPSR", "CPSR_all", "SPSR" or "SPSR_all", and will access the entire register. "psrf" means either "CPSR_flg" or "SPSR_flg", and will access just the arithmetic flags. Finally, the "SPSR" registers may only be accessed in non-"user" modes.
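
The difference between the "whole register" and "flags only" forms can be modelled in Python. The sketch assumes the usual 32-bit layout, with the N, Z, C and V flags in bits 31 to 28, the IRQ and FIQ disable bits in bits 7 and 6, and the mode in the bottom five bits; the function names are invented for the illustration, and the model treats a "_flg" write as touching only those top four bits.

```python
FLAG_MASK = 0xF0000000   # N, Z, C and V occupy bits 31 to 28

def decode_psr(psr):
    """Split a status register value into its main fields."""
    return {
        "N": bool(psr & (1 << 31)),
        "Z": bool(psr & (1 << 30)),
        "C": bool(psr & (1 << 29)),
        "V": bool(psr & (1 << 28)),
        "I": bool(psr & (1 << 7)),   # IRQ disabled when set
        "F": bool(psr & (1 << 6)),   # FIQ disabled when set
        "mode": psr & 0x1F,
    }

def write_flg(psr, rd):
    """Model of a "_flg" write: only the arithmetic flags come from Rd."""
    return (psr & ~FLAG_MASK & 0xFFFFFFFF) | (rd & FLAG_MASK)

old = 0x000000D3                       # mode &13, IRQ and FIQ both disabled
new = write_flg(old, 0x60000010)       # Rd asks for Z and C, and mode &10
print(hex(new), decode_psr(new))
```

Although the value written asks for a different mode, the mode and interrupt bits are left as they were; only the Z and C flags change.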

-- Figure 11 --

That - finally - wraps up the description of the instruction set itself. In the next part of this series, we get down to a bit of programming - more specifically, covering the handling of BASIC "CALL" and "USR" parameters, and how to convert BASIC control statements into assembly language.

This CD and its design is Copyright © 2000 Tau Press Limited. It may not be copied or distributed without the prior consent of Tau Press. Failure to abide by this may result in prosecution. (That doesn't mean the contents are our copyright, just the linking pages that we created and the CD itself.)