OS_ReadArgs
A very useful OS call explained by Harriet Bazley

Reading options from the command line

A process rarely attempted by beginners

It is perhaps fair to say that in the RISC OS world, novice programmers rarely use the command line to pass data and/or options to their programs. Indeed, generalising from my own somewhat limited observations, I should say that the learning curve begins with single-tasking programs that request keyboard input:
and then proceeds, as the programmer begins to write 'useful' utility programs, to the use of global variables set up at the start of the code in order to supply data and control how it works:
then followed by the first tentative steps into multitasking WIMP applications, with option icons and data files dragged to the window. Only at the stage when the programmer is defining his own filetypes and double-clicking on them to load them into his program does he need to start worrying about reading data passed on the command line; and even then it is more often than not a case of simply adding the magical characters %*0 into the !Run file of his application after the

Potential offered by the command line

In the GNU/Linux world, however, where just about everything can be done from the command line, with a graphical desktop very much an optional extra, this is not the case, and programs such as unzip or diff are designed to be invoked on the command line by name, followed by a long string of options:
For many of us, this sort of thing may well have been our first encounter with the idea that data could be passed to a program other than simply by manipulating its icons! However, while such an interface makes life difficult for the user (with the result that RISC OS versions of such programs rapidly acquire WIMP front ends), from a programmer's point of view it offers the considerable advantage that such programs can be run without any further input. In other words, the program doesn't need to wait while the user sets up its options correctly and drags a file to its icon. It can be run from within another program, or as a command in an Obey or Desktop file, or as a task alarm.

RISC OS-specific facilities

In view of all this, it is surprising that the sophisticated SWI interface offered by RISC OS for parsing and interpreting command line options remains relatively little-known. Many people are aware of the use of OS_GetEnv (or the argv[] array in C) to read the contents of the command line used to invoke a program, but rather fewer have even heard of the SWI OS_ReadArgs, which will do almost all the hard work of interpreting the command line for you.

OS_ReadArgs will identify any number of keywords (options preceded by a dash) in a string passed to it, in any order, whether they are abbreviated to initial letters or supplied in full (-d, -debug), and copy any parameters following a given keyword (e.g. -infile "RAM:test") into a buffer supplied by the program. Pointers to the parameters will always be in a specified order within the buffer, no matter in which order they appear in the input string. Optionally, OS_ReadArgs will also evaluate parameters (e.g. "-bytes 12*1024" will supply the option bytes in the form of the numeric value 12288) or GSTrans them (e.g. "-init <Sys$Time>" would be translated to supply the current time as a parameter to the option init).
How it works - basic theory

OS_ReadArgs requires four values to be passed in, using registers R0-R3: the syntax string, the input string (normally the command line used to run the program in question), the address of an output buffer, and the size of this buffer. On exit, R3 is updated to return the number of bytes now remaining in the output buffer. If the output buffer is too small, or the contents of the input string don't seem to match up with the syntax string specified, an error will be returned instead.

Syntax string

The most important (and difficult) part of using OS_ReadArgs lies in creating a suitable syntax string. This is where you specify what is and is not acceptable input; it also controls the order in which values will appear in the output buffer.

A 'syntax string' consists of an arbitrary number of elements, each of which will correspond to one word in the output buffer, and any or all of which may be found in the expected input string. Elements are separated by commas, and normally consist of a keyword followed by one or more qualifiers, each preceded by a slash. Qualifiers are single-letter codes which define the way in which this particular element is expected to work.
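The layout of a syntax string can be modelled outside RISC OS. The following is a minimal Python sketch, not the SWI's own implementation: it splits a syntax string into elements and qualifiers and lists the word offset each element would occupy in the output buffer. The helper names `parse_syntax` and `word_offsets` are hypothetical, and the example string uses the keywords of the test program discussed later, in an assumed order.

```python
def parse_syntax(syntax):
    """Split a syntax string into (keyword, qualifiers) pairs, one per element."""
    elements = []
    for element in syntax.split(","):
        # The keyword comes first; each qualifier is preceded by a slash.
        keyword, *qualifiers = element.strip().split("/")
        # Qualifier case is not significant, so normalise to capitals.
        elements.append((keyword, [q.upper() for q in qualifiers]))
    return elements


def word_offsets(syntax):
    """Pair each keyword with the byte offset of its 4-byte word in the buffer."""
    return [(4 * index, keyword)
            for index, (keyword, _) in enumerate(parse_syntax(syntax))]


# Illustrative syntax string (element order is an assumption).
EXAMPLE = "symbol,debug/S,filename/A,width/e/a"
```

With this model, `word_offsets(EXAMPLE)` places symbol at offset 0, debug at 4, filename at 8 and width at 12, which is the sense in which the syntax string fixes the order of the output words.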
The case of the qualifiers is not important (capitals and lowercase letters may be mingled with impunity) but I have chosen to standardise on the use of capitals for the purposes of this article.

Input string

The input string will normally be obtained by calling OS_GetEnv, which returns the value of the program's command line, but may be any valid string. A sample command line used to invoke a BASIC program might read:
Output buffer

When the SWI returns, the output buffer will contain a series of 4-byte values, one for every element specified in the syntax string. They can be found in the order in which they appear in the syntax string irrespective of how the input string may have been arranged. Having the options in a predictable order obviously makes it much easier to write code to interpret them, which is one of the major advantages of using OS_ReadArgs!

In the case of a switch, any value other than zero will indicate that the keyword was present in the input string. In the case of all other options, a non-zero value in the relevant word of the output block can be interpreted as a pointer to the parameter which was associated with that keyword; a zero value indicates that the keyword was not used. Normally this will be a pointer to a zero-terminated string.

In the case of an evaluated parameter this word is instead a pointer to a five-byte block, where the first byte indicates the type of data in the block (a zero byte for integer data) and the next four bytes form the integer value itself. Beware: this data is not word-aligned! BASIC will read a 32-bit value correctly from a non-word-aligned address, but for assembler (and by extension C) programs special coding will be required.

Note that the actual data supplied for any string or evaluated parameters will also be written to the output buffer, after the pointers, so it has to be big enough to cope with the maximum length of input you are expecting. If the buffer is not long enough for the input string provided, you will get a 'buffer overflow' error.

How it works - basic practice

The easiest way to explain how something works is by example. All sample code will be given in BASIC, but the same principles apply to other languages.
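For languages that cannot load a word from an arbitrary address, the five-byte block of an evaluated parameter has to be read a byte at a time. The Python sketch below models that idea on a plain byte string; it is an illustration rather than RISC OS code. It assumes little-endian byte order and a signed 32-bit result, and treats the "pointer" as an offset into the byte string. The function names are hypothetical.

```python
def read_unaligned_int(buffer, address):
    """Assemble a little-endian signed 32-bit integer one byte at a time."""
    value = (buffer[address]
             | (buffer[address + 1] << 8)
             | (buffer[address + 2] << 16)
             | (buffer[address + 3] << 24))
    # Assumption: the value is signed, so fold the top bit into a negative.
    return value - (1 << 32) if value & (1 << 31) else value


def read_evaluated(buffer, pointer):
    """Decode a five-byte evaluated block: one type byte, then the value."""
    if buffer[pointer] != 0:
        # A non-zero type byte means the block does not hold an integer.
        raise ValueError("evaluated parameter is not an integer")
    return read_unaligned_int(buffer, pointer + 1)


# One padding byte, then a type byte of zero, then 12288 (&3000) stored
# at an address that is not a multiple of four.
SAMPLE_BLOCK = bytes([0xAA, 0x00, 0x00, 0x30, 0x00, 0x00])
```

Here `read_evaluated(SAMPLE_BLOCK, 1)` yields 12288, the value that "-bytes 12*1024" would produce after evaluation.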
To illustrate the basics, I'll use a small program which doesn't actually read its own command line but simply passes a user-specified string to OS_ReadArgs; this will make it quicker and easier to test from the desktop! It will recognise four keywords, -debug, -width, -filename and -symbol. For the sake of argument I shall suppose that filename represents the name under which to save the output and must therefore always be supplied, that width is an integer value likewise required, that debug represents a switch which will activate debugging output if present, and that symbol may be one of either "square", "disc" or "triangle". I leave it to your imagination to come up with any useful function for this program...!

The syntax string required will thus contain four elements. The order in which they are specified is not important, save to determine the order in which the output pointers will appear in the block. symbol will take a string parameter if one is supplied; this is the default element type, and therefore it requires no qualifier codes. debug, as a switch, requires the /S code, and filename, which must always be supplied, requires an /A. This leaves width, which is both compulsory and numeric (evaluated) and therefore requires two qualifiers, /E/A. The beginning of the program will thus read as follows:
Acceptable input

Run this program, typing in a random test string when prompted. You will almost certainly get a 'Bad parameters' error. This is OS_ReadArgs warning you that the data it was given doesn't conform to the required syntax you specified; in other words, it's not in a form usable by your program!

Try a selection of different input strings (it's probably easier to set input$ directly each time by editing the third line of the program than to keep on typing strings in at the prompt). You will find, for example, that "-symbol square -filename Test -width 19" and "-debug -filename Test -width 19" both work, but that "-symbol -filename Test -width 19" (because symbol requires a parameter) and "-symbol square -width 19" (because filename, defined as a non-optional keyword, has been omitted) will both result in an error.

Note also that while "-s square -f Test -d -w 19" is perfectly valid, "-sym square -file Test -de -widt 19" is not; keywords must be supplied either in full or as initial letters. For this reason, when defining names for your keywords it is a good idea to try to ensure that they all start with a different initial. If a single-letter option is supplied which will match more than one keyword, OS_ReadArgs will interpret it as matching the first keyword to appear in the syntax string, which may not be what was meant.

A slightly unexpected result can be obtained by supplying a string parameter to width, e.g. "-width two"; instead of 'Bad parameters', OS_ReadArgs will report 'Unknown operand'. This makes more sense when you remember that the /E qualifier, although almost invariably used to indicate numeric input, actually implies a

Decoding the input

OS_ReadArgs would be of little use if all it did was check the validity of the input string; every time the SWI is successfully called, the output buffer is filled as described above.
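The rules for reading the filled buffer can be sketched in Python on a plain byte string. This is a model only: on RISC OS the words hold absolute addresses, whereas here they are treated as offsets into the byte string, and the helper names `read_string` and `read_switch` are hypothetical. The sample buffer assumes three elements in the order symbol, debug, filename.

```python
import struct


def read_word(buffer, offset):
    """Read one little-endian 4-byte word from the buffer."""
    return struct.unpack_from("<I", buffer, offset)[0]


def read_switch(buffer, word_offset):
    """A switch is present if its word is anything other than zero."""
    return read_word(buffer, word_offset) != 0


def read_string(buffer, word_offset):
    """Follow a string pointer, or return None if the keyword was not used."""
    pointer = read_word(buffer, word_offset)
    if pointer == 0:
        # Always test for zero before treating the word as a pointer.
        return None
    end = buffer.index(0, pointer)  # strings are zero-terminated
    return bytes(buffer[pointer:end]).decode("latin-1")


# Three words (symbol absent, debug set, filename at offset 12),
# followed by the parameter data itself.
SAMPLE = struct.pack("<III", 0, 0xFFFFFFFF, 12) + b"Test\x00"
```

In this sample `read_string(SAMPLE, 0)` returns None because symbol was omitted, `read_switch(SAMPLE, 4)` is true, and `read_string(SAMPLE, 8)` returns "Test".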
Provided you are certain which type of parameter is represented by each word in the buffer (beware subsequent changes to the syntax string that change the keyword order...) it becomes relatively simple to interpret the values from within your program. The other thing to be careful about is that you have checked for a zero value in any given word in the buffer (indicating that this keyword was not used in the input string) before attempting to use it as a pointer. Given that
If you test this you should find that the values supplied in input$ for these three keywords are reported as expected, and always in the same order no matter how the input is arranged. Note how there is no need to check for a zero pointer in the case of filename, since the latter has already been defined using the /A qualifier — this keyword can be guaranteed always to be present, since if it is missing OS_ReadArgs will already have reported it as an error. Handling width is slightly more complicated. Again in this example I'm not checking for a zero pointer because this parameter has been defined as always present.
Note the complicated byte-by-byte memory access and shifting used in order to load a 32-bit value from an address that doesn't align with a 4-byte offset! Strictly speaking, when accessing memory from BASIC this isn't actually necessary:

Specifying valid string input

You may remember that the initial specification for the test program stated that symbol could take one of only three parameters: "square", "disc" or "triangle". This was included as an example of a restriction that OS_ReadArgs can't handle for you! Anything on this level is up to the program itself to check; there is no way of specifying a permitted range of inputs.
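Independently of the article's own listing, a check of this kind can be sketched in a few lines of Python. The function name `check_symbol` is hypothetical, and the comparison is case-sensitive purely as a simplifying assumption.

```python
VALID_SYMBOLS = ("square", "disc", "triangle")


def check_symbol(value):
    """Return the symbol if it is permitted, or None if none was supplied."""
    if value is None:
        # symbol is optional, so a missing value is not an error.
        return None
    if value not in VALID_SYMBOLS:
        raise ValueError("symbol must be one of: " + ", ".join(VALID_SYMBOLS))
    return value
```

The point is simply that this test happens after OS_ReadArgs has returned, on the string it supplied.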
In this example, I've simply used a

Example program

Here's the final version of the demonstration program:
How it works - advanced theory

Alternative or missing keywords

In reality, the syntax string supplied to OS_ReadArgs can be somewhat more complex than my original description of it as a series of elements separated by commas, each consisting of a keyword followed by one or more qualifiers, where each element corresponds to a possible parameter in the command line. One important feature that I have so far ignored is the fact that an element can have more than one alternative keyword associated with it, or even none at all!

The idea of using OS_ReadArgs to identify parameters not preceded by keywords might seem strange; but in fact, any such 'unlabelled' parameters are simply assigned to the currently unused elements in order of occurrence, irrespective of whether they have keywords or not. For example, in the case of our familiar syntax string

It can happen that a keyword which is 'unused' at the time unlabelled input is encountered will subsequently occur towards the end of the input string. If, for instance, the example above had been written as "Test -w 9 -d -symbol triangle", the initial "Test" would have been assigned to the first vacant slot, under symbol, before the actual -symbol keyword was reached, and the confusing result would be an 'Argument repeated' error. If your input is at all likely to mix labelled and unlabelled elements, it is a good idea to ensure that even the labelled elements are arranged in the syntax string in the expected order.

It is possible, on the other hand, to provide more than one keyword for a given element. This has a purely cosmetic effect, and is done by separating the alternative keywords using the equals sign '='. The syntax element "debug=errordump/S" allows the user to specify any of -d, -e, -debug or -errordump in order to activate this switch, according to preference. You can have more than two alternatives (even three or four), but you may find that you run out of spare initial letters for your other options!
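The interaction between labelled and unlabelled parameters is easier to follow with a toy model. The Python sketch below imitates the allocation rules described in this article; it is not the SWI's algorithm. Its assumptions: parameters are split on spaces with no quoting, unlabelled parameters skip switch elements, only the /S qualifier is interpreted, and the syntax string order after symbol is a guess. All function names are hypothetical.

```python
def parse_elements(syntax):
    """Turn a syntax string into elements with their keywords and switch flag."""
    elements = []
    for element in syntax.split(","):
        names, *qualifiers = element.strip().split("/")
        elements.append({
            "names": [name for name in names.split("=") if name],
            "switch": "S" in [q.upper() for q in qualifiers],
        })
    return elements


def find_element(token, elements):
    """Return the index of the element selected by a '-keyword' token, or None."""
    if not token.startswith("-"):
        return None
    option = token[1:]
    for index, element in enumerate(elements):
        for name in element["names"]:
            # A keyword matches in full or as its initial letter only;
            # the first element in syntax order wins.
            if option == name or option == name[0]:
                return index
    return None


def assign(syntax, command_line):
    """Model how labelled and unlabelled parameters are allotted to elements."""
    elements = parse_elements(syntax)
    slots = [None] * len(elements)
    tokens = command_line.split()  # assumption: no quoted parameters
    position = 0
    while position < len(tokens):
        token = tokens[position]
        position += 1
        index = find_element(token, elements)
        if index is None:
            # Unlabelled: first unused element (switches skipped by assumption).
            free = [i for i, e in enumerate(elements)
                    if slots[i] is None and not e["switch"]]
            if not free:
                raise ValueError("Bad parameters")
            slots[free[0]] = token
        elif slots[index] is not None:
            raise ValueError("Argument repeated")
        elif elements[index]["switch"]:
            slots[index] = True
        else:
            # A non-switch keyword must be followed by its parameter.
            if (position == len(tokens)
                    or find_element(tokens[position], elements) is not None):
                raise ValueError("Bad parameters")
            slots[index] = tokens[position]
            position += 1
    return slots


SYNTAX = "symbol,debug/S,filename/A,width/E/A"  # order after symbol assumed
```

In this model `assign(SYNTAX, "-symbol triangle Test -w 9 -d")` places "Test" under filename, whereas `assign(SYNTAX, "Test -w 9 -d -symbol triangle")` raises 'Argument repeated', because "Test" has already occupied the symbol slot by the time -symbol is reached.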
Additional qualifiers

There are two further qualifiers which have not so far been mentioned, the most important of which is the /K option.
How it works - advanced practice

Reading the command line

In order to read the actual command line used to invoke your program from BASIC, the easiest way is to substitute a line:
In place of the "

How to supply test data

The easiest way to test command line input is to create an Obey (or TaskObey) file in the same directory as your test program, like this:
where 'TestProg' is the name of your BASIC file, and the options follow. If you then run this file, you will see that the command line received by your program not only includes the options you supplied, but also the filename of the program, together with the BASIC commands used to run your program! Given that the syntax string is suddenly being asked to cope with input like
none of which it was expecting, and all of which occurs at the beginning of the input string, meaning that it gets assigned to the first available keywords, it is perhaps not surprising that the result is a plethora of 'Bad operand' (as a filename gets evaluated under width) or 'Bad parameters' errors.

Using 'dummy' elements to skip over unwanted parameters

In order to get OS_ReadArgs even to look at the options we are interested in, it is necessary to create 'dummy' elements that can be used by the SWI to parse the new data at the start of the line. The start of the program can be changed to look like this:
As you can see, I've added three new elements to the syntax string. The first one is a 'nameless' element with no qualifier codes either; this is designed to pick up the "BASIC" which will always appear at the start of any BASIC program's command line. This isn't preceded by anything that OS_ReadArgs can interpret as a keyword, so it will be put in the first available slot, which I haven't bothered to name.

The second element is designed to catch the "-quit" option which follows the "BASIC" command; this is actually part of the definition for the filetype BASIC, meaning that when you double-click on a BASIC file you will be returned to the desktop once the program has finished, instead of being left at the BASIC ">" prompt in a command window. Technically it's a switch, since BASIC may or may not have been started with this option, so I've defined it as such in the syntax string. In practice, it is almost always certain to be present.

The third new element is designed to intercept the filename of the BASIC program. Here I've assigned it a named keyword in order to remind myself why this element is present, although strictly speaking, since a keyword will never be supplied for this element either, it has no more need of a name than the first element. If you wanted to be sneaky and save a few bytes, you could even treat the "-quit" parameter immediately preceding this element as its keyword, and supply a syntax string along the lines of

Whichever you decide to do, it's a good idea to make a firm choice for one or the other, in order to minimise the infuriating next stage of altering your syntax string, which is...

Changing your block offsets

The single most annoying thing about using OS_ReadArgs from BASIC is that every time you change the order of the elements in your syntax string, the order of the pointers in the output block will change.
In the example program I gave, I was assuming that the data for the first keyword, symbol, was to be found at the start of the block, at offset block%!0. Adding three new elements to the start of the syntax string means that the data for this keyword has now been pushed back to block%!12, and all the references to it need to be changed. So do all the references to subsequent keywords!

Reading GSTrans'd data

Just as a test of the new qualifier, I added a

The format is that of an unterminated string preceded by a two-byte length value; the pointer for this element points to the start of the length bytes, so these have to be read first.
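Decoding this length-prefixed format can be sketched in Python on a plain byte string. This is an illustration, not the article's BASIC routine: it assumes the two length bytes are stored low byte first, treats the pointer as an offset into the byte string, and uses the hypothetical name `read_counted_string`.

```python
def read_counted_string(buffer, pointer):
    """Decode a two-byte length followed by an unterminated string."""
    # Assumption: the length is little-endian (low byte first).
    length = buffer[pointer] | (buffer[pointer + 1] << 8)
    start = pointer + 2
    # There is no terminator, so the length alone decides where to stop.
    return bytes(buffer[start:start + length]).decode("latin-1")


# One unrelated byte, a length of 5, then "Hello" followed by other data.
SAMPLE = bytes([0xFF, 5, 0]) + b"HelloXYZ"
```

Because the string has no terminator, reading it as a zero-terminated string would run on into whatever follows; `read_counted_string(SAMPLE, 1)` stops after exactly five characters and returns "Hello".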
The element VDU will now be GSTrans'd — you can test this by editing your command line to pass in system variables:
or, if feeling really venturous, VDU codes —
will clear the (command) window and change the text colour, though you'll probably have to use an Obey file rather than a TaskObey file to get the full effect!

Evaluated data - further dark corners

Coping with non-numeric values

You may remember that I stated that the data block for an element defined with the /E qualifier consists of five bytes; one byte to hold the value type and four (non-word-aligned!) bytes to point to the value itself. You may also remember that my sample program checked to see if the value type was zero, representing an integer, and rejected anything else as being unsuitable data. As it happens, neither of these is necessarily true. It is in fact possible for OS_EvaluateExpression legitimately to return a string result, and I demonstrated this with the example of typing

If the first byte of the data block is non-zero, this signifies that the element was a string value. In this case, the following bytes do not constitute a pointer but consist of the string itself in GSTrans-type format: two length bytes followed by the unterminated string. Given the existence of a
for
Non-integer values

It seems worth including a warning at this point that, unlike BASIC's VAL and EVAL commands, OS_EvaluateExpression (and by extension the /E qualifier to OS_ReadArgs) cannot handle non-integer values. Input containing a decimal point will simply lead to an 'unknown operand' error; expressions whose result would be fractional (22/7) are rounded down to the preceding integer. If you are likely to need to pass fractions as arguments to your program, the best way to do so, from BASIC at least, is simply to pass them as string values and evaluate them from within the program.
Example program

Here is the updated version of the demonstration program, including new procedures and altered block offsets:
Error checking

Perhaps the most irritating thing about using OS_ReadArgs in a program is the infuriating error messages which appear if the input is not precisely as expected. For example, if the user innocently supplies "5.5" as a numerical (i.e. evaluated) argument, the SWI will instead report 'bad operand'! So far, I have simply been allowing the SWI to generate error messages to stop the program if it cannot interpret its input. However, this is not really desirable behaviour; if you examine how UNIX-type command-line programs handle bad input, you will see that the norm is to respond by giving a summary of the correct syntax.
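The general pattern of trapping a parsing failure and answering with a syntax summary can be sketched in Python. Everything here is hypothetical: the `parse` argument stands in for whatever routine interprets the command line, and the wording of `USAGE` is invented for the test program's four keywords.

```python
# Hypothetical syntax summary for the test program's four keywords.
USAGE = ("Syntax: TestProg [-symbol square|disc|triangle] [-debug] "
         "-filename <name> -width <number>")


def parse_or_usage(parse, command_line):
    """Call parse(command_line); on failure return a message with the syntax."""
    try:
        return parse(command_line), None
    except ValueError as error:
        # Keep the original complaint, but add the summary the user needs.
        return None, f"{error}\n{USAGE}"
```

The caller receives either the parsed result or a message to display, so the raw error never reaches the user on its own.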
X-SWIs

In order to be able to provide more helpful error messages, it is first necessary to suppress the automatic error generated by OS_ReadArgs itself.

Suppressing errors

To do this, simply prefix an X to the name of the SWI. Instead of causing an error which will halt the program, the SWI will signal that an error has occurred by setting the ARM processor's overflow flag. Methods of reading the processor flags will obviously differ according to how high-level the language is that you are using. From assembler you can check the overflow flag directly via the VS/VC (overflow set/overflow clear) condition codes; from C, where SWIs are called via library functions, it is conventional for such functions to return a pointer to an error block in place of the normal NULL value in order to signal that an error has occurred. BASIC uses a non-intuitive extension of the
Causing your own errors

You can then generate your own error report, using whatever text you prefer, using BASIC's
Additional validity checks

If you want to be helpful to the user, you can also check for invalid data that can't be detected by the syntax string, such as numbers that are out of range, strings that don't match a restricted set of valid options (such as those for symbol above), unexpected string values from OS_EvaluateExpression, or filenames that don't exist. The more checks you do on your input, the less likely your program is to fall over due to invalid data later on!

Harriet Bazley <harriet@bazley.freeuk.com>