Acorn User

ARM CODE TUTORIAL

by Martin Penny

Part 1 - The BASICs

When I began to write this series of articles, I started by asking myself who would read them - or, better yet, who would want to read them. I decided that the readers would probably have owned a RISC OS computer for a while, and that they would also have used BBC BASIC to some extent. So, with that in mind, I'm going to assume a moderate knowledge of programming - in principle, if not in practice. As I go along, I'll highlight various points with figures containing descriptions, pseudo-code or example programs.

As this is intended as just an introduction to assembly language, you will probably want to do some additional background reading. The BBC BASIC Guide I have (for my A420/1) doesn't contain much information about assembly language itself, but does have information about passing variables between BASIC and "local" machine code routines; it is more useful as a BASIC guide (no surprises there!). The RISC OS Programming Reference Manuals ("PRMs") are essential reading when it comes to the ins-and-outs of RISC OS, and also have appendices with formal descriptions of ARM assembly language and pitfalls you may encounter when writing it.

Other books perhaps worth a look include The ARM RISC Chip - A Programmer's Guide (Alex van Someren and Carol Atack, pub. by Addison-Wesley, ISBN 0-201-62410-9), Archimedes Assembly Language (Mike Ginns, pub. by Dabs Press, ISBN 1-870336-20-8), and Archimedes Operating System (Alex and Nic van Someren, pub. by Dabs Press, ISBN 1-870336-48-8). The first is a rather detailed look at the ARM6-based CPUs, while the other two are more general guides; they are quite old now, and their operating system-related sections are badly out of date, but they may still be of use.

So, why bother to use assembly language and machine code? The main reason is speed. BASIC is an interpreted language, and this makes execution fairly slow. At the other end of the scale, C and C++ are compiled languages, so gain most of the speed benefits of machine code. In both cases, however, assembly language can provide fine control for speed-critical routines - for example, screen update routines in games. Be careful over this though, as you will need to start off with "good" code - translating a slow algorithm into assembly language won't achieve a great deal.

As assembly language is not directly understood by the computer, it needs to be translated into machine code - and that's where an assembler comes in. Fortunately for RISC OS users, the built-in BASIC interpreter has an inline assembler - in other words, assembly language can be typed in as part of a BASIC program, and translated to machine code during the execution of the program. The easiest way to demonstrate both ARM assembly language and BASIC's inline assembler is with an example - figure 1.

-- Figure 1 --

A lot of the program will be familiar - it is fairly plain BASIC - the rest being assembly language. If the program is typed in - with lines 70, 90 and 100 being changed for computers that can't easily support mode 12 - and then run, you will end up with something very similar to figure 2 on the screen.

-- Figure 2 --

At this point, if you type "CALL Code%" at the prompt, the machine code created from the assembly language will be run. I'll leave it to you to find out what happens! This brings me on to the next point: there are two ways of accessing machine code programs from BASIC - "CALL" and "USR". In both cases, some BASIC variables are copied into the ARM's registers by the BASIC interpreter, so details can easily be passed to a machine code program. "CALL" also allows a parameter list to be passed to the program; however, as it is a command, it does not return anything directly to the BASIC program that called it. On the other hand, "USR", as a function, returns a value to the calling program, but cannot take any parameters. More details will be included in the following parts of this series, and both keywords are covered in more detail in the BBC BASIC Guide.

But how does the program work? How does BASIC know which part of the program is assembly language, and how is it translated into machine code? Typing "HELP [" at the prompt is a good start; it will come up with a response similar to figure 3.

-- Figure 3 --

Reading through this, the first thing that can be picked up is that the assembly language in the program is from lines 170 to 330, between the square brackets. The "OPT" statement on line 170, and the associated line 140, control the output of the assembler; the numerical expression that "OPT" takes as an argument is best viewed as a binary number, and figure 4 lists the relevant details. Lines 180, 250, 300 and 320 are spacers, in the program just to make it a little more readable, while lines 190, 230, 260 and 310 each create a label. A label is analogous to a procedure or function name - it is set by the assembler to the address of the corresponding machine code. BASIC allows either integer or floating-point variables to be used as labels; but as fractional addresses don't exist, I prefer using integer variables for this purpose.

-- Figure 4 --
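As a rough illustration only, the "OPT" bits that the text of this article names (bit zero for listing, bit one for error reporting, bit two for offset assembly and bit three for range checking) can be modelled as flags. The sketch below is in Python rather than BASIC, and its constant and function names are hypothetical; it is not a copy of figure 4.

```python
# Bit positions are the ones named in the article text.
# The constant and function names are hypothetical, chosen for this sketch.
OPT_LISTING = 1 << 0  # bit 0: display the assembly listing
OPT_ERRORS = 1 << 1   # bit 1: report assembler errors
OPT_OFFSET = 1 << 2   # bit 2: offset assembly
OPT_RANGE = 1 << 3    # bit 3: range checking


def describe_opt(value):
    """Return which assembler options a given OPT value switches on."""
    return {
        "listing": bool(value & OPT_LISTING),
        "errors": bool(value & OPT_ERRORS),
        "offset": bool(value & OPT_OFFSET),
        "range_check": bool(value & OPT_RANGE),
    }


# A quiet first pass and a reporting second pass, both with range checking.
first_pass = describe_opt(0b1000)
second_pass = describe_opt(0b1011)
```

Viewing the argument in binary, as suggested above, makes it easy to see that the two passes differ only in bits zero and one.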

The assembly language itself is straightforward: line 200 sets a pointer to the following text, line 210 calls the RISC OS routine to print it, and line 220 calls the RISC OS "newline" routine. Line 240 returns control to BASIC. The text to be printed is on line 270, and is followed by a zero byte end-of-text marker on line 280; line 290 rounds the address up to the next multiple of four.
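The rounding-up step can be checked with a little arithmetic. The following Python sketch is only a model of the address calculation, with hypothetical function names and example addresses; it does not say anything about what the assembler writes into the padding bytes.

```python
def align4(address):
    """Round an address up to the next multiple of four."""
    return (address + 3) & ~3


def end_of_string(start, text):
    """Address after the text, its zero terminator and any alignment gap."""
    unaligned = start + len(text) + 1  # +1 for the zero byte end marker
    return align4(unaligned)
```

For instance, a five-character string starting at a word-aligned address occupies six bytes with its terminator, so the next usable address is eight bytes on from the start.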

The assembler needs some reserved memory where it can put the machine code it generates from this assembly language; this can be done through a form of the "DIM" keyword. "DIM variable size" reserves size+1 bytes of memory. The start address of the memory is assigned to the variable used in the statement, as with "D%" in line 110. To prevent the assembler from using more memory than has been reserved - range checking - the variable "L%" can be set to the first byte of memory beyond that reserved, as in line 120; to enable range checking, set bit three of the "OPT" argument.
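The size arithmetic is easy to get wrong by one, so here is a small Python model of it. It assumes a simple rule for the range check (refuse any store that would pass the limit); the real assembler's exact test may differ, and the names and addresses used are hypothetical.

```python
class RangeError(Exception):
    """Raised by this sketch when a store would pass the limit."""


def reserve(start, size):
    """Model 'DIM variable size': return (start, limit) for size+1 bytes."""
    limit = start + size + 1  # first byte beyond the block, the value for L%
    return start, limit


def check_store(address, length, limit):
    """Refuse a store of 'length' bytes that would run past 'limit'."""
    if address + length > limit:
        raise RangeError(f"store at {address:#x} passes limit {limit:#x}")
    return address + length  # the next free address
```

With a size of 255 the block is 256 bytes long, so the limit is the start address plus 256.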

A technique called two-pass assembly is required by all but the simplest programs. It allows labels, and references to them, to be created correctly. No labels are defined until the first pass through the assembly language, so on a forward reference - use of a label not yet encountered - the assembler would normally generate an "Unknown or missing variable" error. However, if error reporting is disabled during the first pass, the assembler does not generate errors, and the second pass produces the correct machine code. A consequence of this is that you would not normally want the assembler to display anything during the first pass, but only (optionally) during the second pass. Control of both error reporting and listing is via the "OPT" argument - bits one and zero, respectively. Lines 130 to 350 are the loop used for two-pass assembly.
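To see why the second pass fixes forward references, it may help to look at a toy model. The Python sketch below is not the BASIC assembler: it understands only labels (lines starting with a full stop) and a made-up "REF label" instruction that stores a label's address in one four-byte word. All names and the origin address are hypothetical.

```python
class UnknownLabel(Exception):
    """Stands in for the assembler's error on an undefined label."""


def assemble_pass(source, origin, labels, report_errors):
    """Run one pass; return a list of (address, word) pairs."""
    address = origin  # reset at the start of every pass
    output = []
    for line in source:
        if line.startswith("."):
            labels[line[1:]] = address  # a label takes the current address
            continue
        op, _, name = line.partition(" ")
        if op != "REF":
            raise ValueError(f"unknown toy instruction: {line}")
        if name in labels:
            value = labels[name]
        elif report_errors:
            raise UnknownLabel(name)
        else:
            value = 0  # placeholder; corrected on the second pass
        output.append((address, value))
        address += 4
    return output


def two_pass(source, origin):
    """First pass with errors suppressed, second pass with them reported."""
    labels = {}
    for report_errors in (False, True):
        output = assemble_pass(source, origin, labels, report_errors)
    return output, labels


program = ["REF end", ".mid", "REF mid", ".end"]
words, labels = two_pass(program, 0x9100)
```

The first pass leaves a placeholder for "REF end" but records where "end" lives; the second pass, starting again from the same origin, fills in the real address.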

Line 160 sets up the variable "P%", used by the assembler as a pointer to the start of the machine code in memory - the origin, as it is otherwise called. "P%" has to be defined within the two-pass assembly loop to allow the machine code to end up in the right place in memory.

So, what do the various parts of the output listing (figure 2) mean? The first column is the hexadecimal address of each machine code instruction in memory - in this example, from "00009100" onwards - and the second is the data placed at that location. The third and fourth columns are the assembly language generating the data. The main use of the output listing is to ensure that the assembly language and machine code are as expected, but - as already mentioned - it can be turned off completely if not required.

A last feature - for now - of BASIC's assembler is offset assembly, turned on by bit two of the "OPT" argument. This generates machine code in one place in memory, only for it to be run from somewhere else. Offset assembly may not immediately appear to be useful, but it can be used to create stand-alone machine code programs; figure 5 lists an updated program to illustrate the point. If typed in and run, it produces a listing like that in figure 6, and saves a small program to the RAM drive. The first difference is the routine to return control to you, as it is not called from BASIC, but from the command line or the desktop - double-click on it to try it! The second difference is in the use of the variables "O%" and "P%"; "O%" is equivalent to "P%" from the first example program - it points to where the machine code is to be stored in memory - while "P%" now becomes a pointer to where the machine code is to be run - here, "00008000", the start of application space.

-- Figure 5 --

-- Figure 6 --
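The split between "O%" and "P%" can be pictured with a short model. This Python sketch simply keeps two counters in step, one for where each word is stored and one for the address it will have when run; the function name, the word values and the storage address of &9100 are hypothetical, and only the run address of &8000 comes from the text above.

```python
def assemble_offset(words, store_at, run_at):
    """Store words at 'store_at' while numbering them from 'run_at'."""
    memory = {}
    run_addresses = []
    o, p = store_at, run_at  # stand-ins for O% and P%
    for word in words:
        memory[o] = word         # data is written where O% points
        run_addresses.append(p)  # labels and listing follow P%
        o += 4
        p += 4
    return memory, run_addresses


memory, run_addresses = assemble_offset([0x11111111, 0x22222222], 0x9100, 0x8000)
```

The data ends up in the reserved block, ready to be saved, while every address worked out during assembly belongs to the place the program will eventually run from.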

With regard to the BASIC assembler, one final topic that I haven't yet covered is the use of comments. On a line of BASIC code, comments follow the keyword "REM", with the BASIC interpreter ignoring everything from the "REM" to the end of that line, even if an end-of-statement colon (":") is used on that line. Comments can also be added to lines of assembly language, with the backslash ("\") used to indicate the start of the comment. Beware, though, as the BASIC assembler will take notice of colons within comments, and will try to interpret anything after the colon as a new statement.
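The colon pitfall is easier to spot with an example. The Python sketch below is a simplified model, assuming the line contains no quoted strings: it splits an assembly language line at every colon first, and only then strips each piece's backslash comment. The function name and the sample line are hypothetical.

```python
def assembler_statements(line):
    """Split on colons first, then drop each statement's backslash comment."""
    statements = []
    for part in line.split(":"):
        code = part.split("\\", 1)[0].strip()
        if code:
            statements.append(code)
    return statements


# The colon inside the comment ends the comment early.
pieces = assembler_statements(r"MOV R0,#0 \ flag: zero means off")
```

Here the words after the colon come back as a second "statement", which is exactly the text the assembler would then try, and fail, to assemble.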

That effectively wraps up this part of the series; in the next part, I will pick up from here, and start covering the ARM instruction set.


This CD and its design is Copyright © 2000 Tau Press Limited. It may not be copied or distributed without the prior consent of Tau Press. Failure to abide by this may result in prosecution. (That doesn't mean the contents are our copyright, just the linking pages that we created and the CD itself.)